Autocomplete and text search memory issues: need ideas

souzabrs · February 20, 2020, 5:02pm

I’m having trouble using text search and autocomplete because I have a piece type with more than 87k documents, some of them large (~3.4 MB of text).

I already:

Removed every field from the text index, except title, searchBoost and seoDescription; these are the only fields copied to highSearchText and the field lowSearchText is always set to an empty string.
Modified the standard text index to include the fields type, published and trash at the beginning of it. I also modified the queries to have equality conditions on these fields. The output of db.aposDocs.stats() shows:
type_1_published_1_trash_1_highSearchText_text_lowSearchText_text_title_text_searchBoost_text: 12201984 (~11 MB, fits nicely in memory)
Verified that this index is being used, both in the ‘toDistinct’ query and in the final ‘toArray’ query.
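
For reference, the index layout described above can be written down as a key specification. This is only a sketch in Python, not Apostrophe's own code: default_index_name and search_filter are hypothetical helpers, the example type and search term are made up, and storing trash as an explicit boolean is an assumption.

```python
# Key specification for the compound text index described above, in the
# (field, direction) form that drivers such as PyMongo accept.
INDEX_KEYS = [
    ("type", 1),
    ("published", 1),
    ("trash", 1),
    ("highSearchText", "text"),
    ("lowSearchText", "text"),
    ("title", "text"),
    ("searchBoost", "text"),
]


def default_index_name(keys):
    """Join field/direction pairs the way MongoDB names an index by default."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


def search_filter(piece_type, term):
    """Build a filter with equality conditions on every prefix field.

    A compound text index with keys before the text keys can only serve a
    $text query that has equality matches on all of those preceding keys.
    """
    return {
        "type": piece_type,
        "published": True,
        "trash": False,  # assumption: trash is stored as an explicit boolean
        "$text": {"$search": term},
    }


if __name__ == "__main__":
    print(default_index_name(INDEX_KEYS))
    print(search_filter("article", "budget"))
```

The generated name should match the one reported by db.aposDocs.stats() above, which is a quick way to confirm the key order.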

What I think is the biggest problem
The documents have many repeated words in the title, so if the user types a word present in 5k document titles, the server suffers.

Idea I’m testing
The MongoDB docs say that, to improve performance, the entire collection should fit in RAM (https://docs.mongodb.com/manual/core/index-text/#storage-requirements-and-performance-costs, last bullet).

So, I created a separate collection named “search” with just the fields highSearchText (string, indexed as text) and highSearchWords (array, also indexed), which results in a total size of ~19 MB.

By running the same operations as the standard Apostrophe autocomplete against this collection, I got similar results, much faster.
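
A rough sketch of that idea in Python, with the caveat that everything here is illustrative: build_search_doc, autocomplete_filter and complete are hypothetical helpers, the lowercase-and-split normalization is an assumption rather than what Apostrophe does internally, and complete is an in-memory stand-in for a distinct query on highSearchWords.

```python
import re

# Fields copied into the reduced record, per the description above.
SOURCE_FIELDS = ("title", "searchBoost", "seoDescription")


def build_search_doc(doc):
    """Reduce a full document to the small record kept in the search collection."""
    text = " ".join(str(doc.get(field) or "") for field in SOURCE_FIELDS).lower()
    return {
        "_id": doc["_id"],
        "highSearchText": " ".join(text.split()),  # collapse extra whitespace
        "highSearchWords": sorted(set(re.findall(r"\w+", text))),
    }


def autocomplete_filter(prefix):
    """Anchored, case-sensitive regex, so an index on highSearchWords can help."""
    return {"highSearchWords": {"$regex": "^" + re.escape(prefix.lower())}}


def complete(search_docs, prefix):
    """In-memory stand-in for a distinct query on highSearchWords."""
    prefix = prefix.lower()
    return sorted(
        {w for d in search_docs for w in d["highSearchWords"] if w.startswith(prefix)}
    )


if __name__ == "__main__":
    docs = [
        build_search_doc({"_id": "a", "title": "Budget Report 2019"}),
        build_search_doc({"_id": "b", "title": "Budget plan", "seoDescription": "Annual report"}),
    ]
    print(complete(docs, "bu"))
    print(autocomplete_filter("Bu"))
```

Because each word is stored once per document, a word that appears in thousands of titles still costs one array entry per document, but the documents being scanned are a few hundred bytes instead of megabytes.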

Issues

I’ll have to update this collection every time I update a piece in the aposDocs collection.
I’m losing part of the textScore because I copied only the field highSearchText, not other weighted fields like searchBoost. I could copy all of them, but that would increase the search collection size.
I’m testing this search collection for the autocomplete. For the simple text search, I’m limiting the sorted response to 50 results. Maybe I’ll have to use the search collection there as well, because the search could still break.
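
On the first issue, one possible shape for the sync step is sketched below in Python. It is an assumption-laden illustration, not an Apostrophe hook: to_search_doc, sync_operation and apply_operation are hypothetical names, the plain dict stands in for the search collection, and dropping trashed or unpublished pieces from the search collection is a design choice made here for the example.

```python
def to_search_doc(doc):
    """Hypothetical reducer: keep only what the search collection needs."""
    title = (doc.get("title") or "").lower()
    return {
        "_id": doc["_id"],
        "highSearchText": title,
        "highSearchWords": sorted(set(title.split())),
    }


def sync_operation(doc):
    """Describe the write needed in the search collection after a piece changes.

    Trashed or unpublished pieces are removed; everything else is upserted.
    """
    key = {"_id": doc["_id"]}
    if doc.get("trash") or not doc.get("published"):
        return ("delete", key, None)
    return ("replace", key, to_search_doc(doc))


def apply_operation(store, operation):
    """Apply an operation to a plain dict standing in for the collection."""
    kind, key, replacement = operation
    if kind == "delete":
        store.pop(key["_id"], None)
    else:
        store[key["_id"]] = replacement
    return store


if __name__ == "__main__":
    store = {}
    piece = {"_id": "a", "title": "Budget Report", "published": True}
    apply_operation(store, sync_operation(piece))
    print(store)
    apply_operation(store, sync_operation({**piece, "trash": True}))
    print(store)
```

With a real driver such as PyMongo, the two operation kinds would map to replace_one(key, replacement, upsert=True) and delete_one(key) on the search collection.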

Is there some easier approach I’m missing? Please, any ideas are welcome.

boutell · February 24, 2020, 4:31pm

This forum is primarily for announcements and general discussion of Apostrophe’s future. For how-to questions, it’s best to use:

chat.apostrophecms.org (chat with the community on discord)
stackoverflow, tagged apostrophe-cms (for clearly defined how-to questions - provide specific details so others can reproduce your issue)
Enterprise Support (support@apostrophecms.com; we’ll reach out and arrange for you to work with our engineers directly on a priority basis)

souzabrs · February 24, 2020, 6:38pm

Sorry about that. I republished the topic on stackoverflow, with minor updates: https://stackoverflow.com/questions/60382003/autocomplete-and-text-search-memory-issues-in-apostrophe-cms-need-ideas

Thanks.