Improved Multi-Term Search Relevancy

Last modified by Vincent Massol on 2021/04/06

This feature is commonly referred to as sloppy phrase matching. It allows more "Google-like" query strings, because the query terms no longer have to appear in a document as one exact, contiguous phrase. The implementation also supports partial multi-term matching, so searching for a long phrase won’t “hide” documents that match only a shorter subset of its terms.
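To make "sloppy" concrete, here is a minimal Python sketch under a simplified, assumed definition: a multi-term phrase matches if all of its terms occur somewhere inside one window of `len(terms) + slop` consecutive document tokens, in any order. The function name `sloppy_match` and this window rule are illustrative only; they are not the exact slop semantics of the search engine behind this feature.

```python
def sloppy_match(doc_tokens, phrase_terms, slop):
    """Return True if every phrase term occurs inside one window of
    len(phrase_terms) + slop consecutive tokens, in any order.
    Simplified illustration, not a search engine's real slop rule."""
    width = len(phrase_terms) + slop
    needed = set(phrase_terms)
    for start in range(len(doc_tokens)):
        window = set(doc_tokens[start:start + width])
        if needed <= window:
            return True
    return False


doc = "install the extension manager before you upgrade".split()
print(sloppy_match(doc, ["upgrade", "extension"], slop=0))  # False: terms are 5 tokens apart
print(sloppy_match(doc, ["upgrade", "extension"], slop=3))  # True: both fit in a 5-token window
```

With `slop=0` only adjacent terms match; raising the slop lets nearby but non-adjacent terms match, and the reversed order in the document does not matter.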

For those interested in the more technical aspects:

  • The query string is split into groups of multi-term contiguous sequences (word shingles), ignoring stopwords where appropriate.
  • The more query terms that appear in close proximity in a document, the more relevant that document is ranked.
  • While the order of the terms in the query string is important for forming word shingles, the order in which the terms appear in the document doesn’t affect the score.
  • Stemming is still applied, so different forms of a word (for example "install" and "installing") count as matches, even in multi-term relevancy calculations.
  • Exact matches are still returned when terms are enclosed in quotes.
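The points above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not this feature's actual implementation: the stopword list and the crude suffix-stripping `stem` function are hypothetical stand-ins for real analyzers, each query shingle contributes its length to the score when its terms all fall within a window of `len(shingle) + slop` document terms (in any order), and quoted exact-phrase handling is left out.

```python
import re

# Hypothetical stopword list and a deliberately crude suffix stemmer;
# a real deployment would rely on the search engine's own analyzers.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "how"}


def stem(word):
    """Strip a few common English suffixes (illustration only)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, tokenize, drop stopwords, then stem each remaining term."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


def shingles(terms, min_size=2):
    """All contiguous runs of query terms with at least min_size terms,
    longest first. Query order matters here."""
    n = len(terms)
    return [tuple(terms[i:i + size])
            for size in range(n, min_size - 1, -1)
            for i in range(n - size + 1)]


def score(query, document, slop=2):
    """Add len(shingle) for every query shingle whose terms all occur within
    a window of len(shingle) + slop document terms. Sets are used, so the
    order of the terms in the document does not affect the score."""
    doc_terms = analyze(document)
    total = 0
    for sh in shingles(analyze(query)):
        width = len(sh) + slop
        if any(set(sh) <= set(doc_terms[i:i + width])
               for i in range(len(doc_terms))):
            total += len(sh)
    return total


query = "installing extensions in the wiki"
docs = [
    "The wiki lets you install an extension",
    "Extension manager: installing an extension requires admin rights on the wiki",
    "Release notes",
]
for d in docs:
    print(score(query, d), d)  # scores: 5, 2, 0
```

The second document only matches the shorter shingle ("install", "extension") but still receives partial credit instead of being hidden, while the first document, which holds all three terms close together, ranks higher.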