I’m having trouble using the text search and the autocomplete because I have a piece type with more than 87k documents, some of them large (~3.4 MB of text).
I already:
- Removed every field from the text index except title, searchBoost and seoDescription; these are the only fields copied to highSearchText, and the field lowSearchText is always set to an empty string.
- Modified the standard text index to include the fields type, published and trash at the beginning of it. I also modified the queries to have equality conditions on these fields. The result returned by the command db.aposDocs.stats() shows:
  type_1_published_1_trash_1_highSearchText_text_lowSearchText_text_title_text_searchBoost_text: 12201984 (~11 MB, fits nicely in memory)
- Verified that this index is being used, both in the ‘toDistinct’ query and in the final ‘toArray’ query.
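For reference, the index and query shape described in the list above could look roughly like this with the Node.js MongoDB driver. This is only a sketch, not the actual Apostrophe code: the helper names buildTextFilter and searchPieces are hypothetical, and the values published: true and trash: false are assumptions about how the equality conditions are written.

```javascript
// Key order matches the index name reported by db.aposDocs.stats():
// equality-prefix keys first, then the text keys.
// It could be created with something like aposDocs.createIndex(textIndexKeys).
const textIndexKeys = {
  type: 1,
  published: 1,
  trash: 1,
  highSearchText: 'text',
  lowSearchText: 'text',
  title: 'text',
  searchBoost: 'text'
};

// A $text query on a compound text index needs an equality condition
// on every key that precedes the text keys.
// Assumption: published and trash are stored as plain booleans.
function buildTextFilter(type, term) {
  return {
    type: type,
    published: true,
    trash: false,
    $text: { $search: term }
  };
}

// Hypothetical helper: `aposDocs` is assumed to be a driver Collection.
// Returns at most `limit` matches, best textScore first.
async function searchPieces(aposDocs, type, term, limit) {
  return aposDocs
    .find(buildTextFilter(type, term))
    .project({ title: 1, textScore: { $meta: 'textScore' } })
    .sort({ textScore: { $meta: 'textScore' } })
    .limit(limit)
    .toArray();
}
```

Running the same filter through explain() is one way to confirm that the winning plan uses this index.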
What I think is the biggest problem
The documents have many repeated words in the title, so if the user types a word present in 5k document titles, the server suffers.
Idea I’m testing
The MongoDB docs say that performance improves when the entire collection fits in RAM (https://docs.mongodb.com/manual/core/index-text/#storage-requirements-and-performance-costs, last bullet).
So I created a separate collection named “search” with just the fields highSearchText (string, indexed as text) and highSearchWords (array, also indexed), which results in a total size of ~19 MB.
By running the same operations as the standard Apostrophe autocomplete against this collection, I got similar results, but much faster.
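A minimal sketch of how a document in the search collection could be derived from an aposDocs document, and of a prefix lookup against highSearchWords. The helper names (toSearchWords, toSearchDoc, buildPrefixFilter) are hypothetical, the word-splitting rule is an assumption, and the anchored regex is only one plausible way to do the autocomplete lookup; none of it is taken from Apostrophe itself.

```javascript
// Split text into unique lowercase words (letters and digits only).
// Assumption: this is close enough to how highSearchWords is populated.
function toSearchWords(text) {
  const words = text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter(Boolean);
  return Array.from(new Set(words));
}

// Build the slim document stored in the "search" collection.
// Reusing the original _id keeps it easy to join back to aposDocs.
function toSearchDoc(doc) {
  const highSearchText = doc.highSearchText || '';
  return {
    _id: doc._id,
    highSearchText: highSearchText,
    highSearchWords: toSearchWords(highSearchText)
  };
}

// Filter for an autocomplete prefix: an anchored, escaped regex
// can use the regular index on the highSearchWords array.
function buildPrefixFilter(prefix) {
  const escaped = prefix
    .toLowerCase()
    .replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return { highSearchWords: new RegExp('^' + escaped) };
}
```

Keeping the same _id in both collections means the matching ids can be used to fetch the full pieces from aposDocs afterwards.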
Issues
- I’ll have to update this collection every time I update a piece in the aposDocs collection.
- I’m losing part of the textScore because I copied only the field highSearchText, not other weighted fields like searchBoost. I could copy all of them, but that would increase the size of the search collection.
- I’m testing this search collection for the autocomplete. For the simple text search, I’m limiting the sorted response to 50 results. I may have to use the search collection there as well, because the search could still break.
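The first issue could be handled with an upsert that runs whenever a piece is saved, plus a delete when a piece is removed. The sketch below assumes the Node.js MongoDB driver; buildSearchUpsert, syncSearchDoc and removeSearchDoc are hypothetical names, and where they get called from (whatever hook runs after a piece is saved) is left open.

```javascript
// Unique lowercase words, used for the highSearchWords array.
// Assumption: a simple split on non-letter, non-digit characters.
function splitWords(text) {
  const words = text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  return Array.from(new Set(words));
}

// Describe the upsert that mirrors one aposDocs document.
// More weighted fields (for example searchBoost) could be added to $set,
// at the cost of a larger search collection.
function buildSearchUpsert(doc) {
  const highSearchText = doc.highSearchText || '';
  return {
    filter: { _id: doc._id },
    update: {
      $set: {
        highSearchText: highSearchText,
        highSearchWords: splitWords(highSearchText)
      }
    },
    options: { upsert: true }
  };
}

// `search` is assumed to be a driver Collection for the "search" collection.
async function syncSearchDoc(search, doc) {
  const { filter, update, options } = buildSearchUpsert(doc);
  return search.updateOne(filter, update, options);
}

// Remove the mirrored document when the piece is deleted for good.
async function removeSearchDoc(search, id) {
  return search.deleteOne({ _id: id });
}
```

Because the upsert is keyed on _id, running it again for the same piece simply overwrites the previous copy, so a one-off backfill over aposDocs could reuse the same function.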
Is there some easier approach I’m missing? Please, any ideas are welcome.