How Meilisearch handles typos
Meilisearch stores all indexed terms in a single Finite State Transducer (FST) built at index time. At query time, the engine generates a Levenshtein automaton from your search term and intersects it with the pre-built FST in a single streaming pass. This finds all indexed terms within the allowed edit distance efficiently, without scanning the entire dictionary. Meilisearch uses Damerau-Levenshtein distance, meaning transpositions (swapped adjacent characters, like "teh" → "the") count as a single edit, not two.
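The distance itself can be sketched with the standard dynamic-programming recurrence. This is an illustrative implementation of the optimal-string-alignment variant, not Meilisearch's actual FST-based code:

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Optimal-string-alignment distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost 1 edit."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

print(damerau_levenshtein("teh", "the"))  # 1: a single transposition
```

Under plain Levenshtein distance, "teh" → "the" would cost 2 edits (two substitutions); counting the transposition as 1 is what keeps swapped-letter typos inside the budget.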
Typo tolerance is on by default for every index and every query. No query-level parameters are required.
Word length thresholds
Meilisearch does not apply typo tolerance uniformly. The number of typos allowed depends on the length of the query word:

| Query word length | Typos allowed |
|---|---|
| 1–4 characters | 0 (prefix match only) |
| 5–8 characters | 1 |
| 9+ characters | 2 |
These thresholds can be customized per index via the `minWordSizeForTypos` setting.
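As a sketch, the thresholds reduce to a simple length check. The parameter names `one_typo` and `two_typos` mirror the fields of `minWordSizeForTypos`, with their default boundary values:

```python
def typos_allowed(word: str, one_typo: int = 5, two_typos: int = 9) -> int:
    """Typo budget for a query word, per the table above:
    below `one_typo` chars -> 0 typos, below `two_typos` -> 1, else 2."""
    if len(word) < one_typo:
        return 0
    if len(word) < two_typos:
        return 1
    return 2

print(typos_allowed("cat"), typos_allowed("iphone"), typos_allowed("television"))
# 0 1 2
```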
Two special typo counting rules
First-character typo costs 2. A typo on the first character of a word is counted as two typos, not one. This means "caturday" does not match "saturday": a single substitution at position 1 costs 2, exceeding the 1-typo budget for 8-character words. This prevents a class of false positives where only the initial character differs.

Concatenation costs 1 typo. When two words are separated by a space, Meilisearch also considers them as a single concatenated candidate with 1 typo. For example, searching for "any way" will match documents containing "anyway". No other engine in this comparison handles word-split typos this way.
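One way to approximate the first-character rule is to surcharge a mismatched initial character on top of the plain edit distance. This is a simplification of the engine's real cost model, for illustration only:

```python
from functools import lru_cache

def levenshtein(a: str, b: str) -> int:
    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return i + j
        return min(
            d(i - 1, j) + 1,
            d(i, j - 1) + 1,
            d(i - 1, j - 1) + (a[i - 1] != b[j - 1]),
        )
    return d(len(a), len(b))

def effective_typos(query: str, term: str) -> int:
    """Approximation: an edit that changes the first character
    costs one extra typo."""
    surcharge = 1 if query[:1] != term[:1] else 0
    return levenshtein(query, term) + surcharge

# "caturday" vs "saturday": one substitution, but on the first character,
# so the effective cost is 2 -- over the 1-typo budget for 8-char words.
print(effective_typos("caturday", "saturday"))  # 2
```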
Typo tolerance is a ranking rule, not a filter
When a query term matches an indexed term via a typo, that result is not discarded or penalized with a separate score modifier. Instead, typo count feeds directly into the `typo` ranking rule, one of the criteria in Meilisearch's bucket sort pipeline.
This means:
- A document matching with 0 typos always ranks above one matching with 1 typo, all else being equal
- A document matching with 1 typo always ranks above one matching with 2 typos
- A result with 0 typos in a less important attribute (body) outranks a result with 2 typos in a more important attribute (title), because `typo` comes before `attribute` in the ranking pipeline
Disabling typo tolerance effectively neutralizes the `typo` ranking rule, since every returned document would have 0 typos by definition.
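The consequence of rule ordering can be illustrated with a tuple sort, where earlier tuple positions correspond to earlier ranking rules. This is a toy model of the bucket sort, with invented documents:

```python
# Toy documents: lower `attribute` rank = more important field (0 = title).
docs = [
    {"id": "A", "typos": 2, "attribute": 0},  # 2 typos, matched in title
    {"id": "B", "typos": 0, "attribute": 3},  # exact match, but in body
    {"id": "C", "typos": 1, "attribute": 0},  # 1 typo, matched in title
]

# `typos` is compared before `attribute`, mirroring the rule order.
ranked = sorted(docs, key=lambda d: (d["typos"], d["attribute"]))
print([d["id"] for d in ranked])  # ['B', 'C', 'A']
```

The exact match in a body field (B) outranks both title matches, because no amount of attribute importance can compensate for a worse typo count.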
Prefix search and typo tolerance work together
Meilisearch applies prefix search and typo tolerance simultaneously on the last word of a query. This means a partial, misspelled word still returns results. For example, searching "iphoe" (5 characters, 1-typo budget) can match "iphone" as a prefixed, typo-corrected term in a single pass.
Elasticsearch can approximate this by combining an edge_ngram tokenizer (for prefix expansion at index time) with a fuzzy query at search time, but the two mechanisms work on different levels and require careful coordination. In Meilisearch, prefix and typo tolerance are a single unified step with no extra configuration.
You can disable prefix search independently from typo tolerance if needed.
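A minimal model of the combined step: the last query word matches an indexed term if it is within its typo budget of some prefix of that term. This is illustrative only; the engine does this in one automaton pass rather than by enumerating prefixes:

```python
from functools import lru_cache

def levenshtein(a: str, b: str) -> int:
    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return i + j
        return min(d(i - 1, j) + 1, d(i, j - 1) + 1,
                   d(i - 1, j - 1) + (a[i - 1] != b[j - 1]))
    return d(len(a), len(b))

def prefix_typo_match(query: str, term: str, budget: int) -> bool:
    """True if `query` is within `budget` edits of some prefix of `term`."""
    return any(levenshtein(query, term[:i]) <= budget
               for i in range(len(term) + 1))

print(prefix_typo_match("iphoe", "iphone", budget=1))  # True: 1 edit from "ipho"
```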
Split and concatenate: handling word boundary mistakes
Beyond character-level edits, Meilisearch handles a class of mistakes that Levenshtein distance cannot catch: wrong word boundaries.

Concatenation: when a user types multiple words, Meilisearch also searches their concatenated forms. For a query "the news paper", it additionally tries "thenews paper", "the newspaper", and "thenewspaper". Concatenation is applied to up to 3 consecutive words, and each concatenated candidate counts as 1 typo in the ranking pipeline.
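Candidate generation for concatenation can be sketched as joining every run of 2 or 3 adjacent words (illustrative, not the engine's actual code):

```python
def concatenation_candidates(query: str, max_run: int = 3) -> list:
    """All variants of `query` with one run of 2..max_run adjacent
    words joined together (each such candidate costs 1 typo)."""
    words = query.split()
    out = []
    for i in range(len(words)):
        for j in range(i + 2, min(i + max_run, len(words)) + 1):
            out.append(" ".join(words[:i] + ["".join(words[i:j])] + words[j:]))
    return out

print(concatenation_candidates("the news paper"))
# ['thenews paper', 'thenewspaper', 'the newspaper']
```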
Splitting: when a user types a single word, Meilisearch considers frequency-based splits. For "newspaper", it finds that "news" and "paper" both have meaningful frequency in the index and tries the split candidate. The split is data-driven: it picks the boundary that maximizes the frequency of both halves in the index dictionary, not a fixed linguistic rule. A split into "new" + "spaper" is rejected because "spaper" has no frequency.
Split words must remain adjacent. A document with "news" and "paper" separated by other words will not match.
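The frequency-driven split can be modeled by scoring each boundary with the frequency of its rarer half. This is one plausible scoring, and the frequencies below are invented for illustration:

```python
def best_split(word, freq):
    """Pick the boundary where the less frequent half is still as
    frequent as possible; return None if no split has any support."""
    best, best_score = None, 0
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        score = min(freq.get(left, 0), freq.get(right, 0))
        if score > best_score:
            best, best_score = (left, right), score
    return best

freq = {"news": 120, "paper": 80, "new": 200}  # "spaper" never occurs -> 0
print(best_split("newspaper", freq))  # ('news', 'paper')
```

The "new" + "spaper" boundary scores 0 because "spaper" has no frequency, so "news" + "paper" wins even though "new" is the more frequent single token.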
Together, these two mechanisms handle the common real-world case where users omit or add spaces within compound words or multi-word phrases. Elasticsearch can handle compound words through custom token filters (like the word_delimiter_graph filter or language-specific compound word decomposers), but this requires upfront index configuration per language and does not cover the query-side concatenation case. See Concatenated and split queries for more detail.
Language-aware tokenization before typo matching
Meilisearch’s tokenizer, Charabia, normalizes and segments text before typo tolerance runs. This matters because typo matching operates on tokens, not raw characters, and what counts as a token depends on the language. Key transformations that affect typo matching:

| Language / Feature | What Charabia does | Why it matters for typos |
|---|---|---|
| All Latin scripts | Lowercase, decompose accents, remove diacritics | "café" and "cafe" are the same token (no typo budget wasted on accents) |
| CamelCase | Splits "iPhone" into "i" + "phone" | Searching "iphoen" can match the "phone" token with 1 typo |
| German | Decomposes compound words ("Krankenhaus" → "kranken" + "haus") | Each part is independently typo-matchable |
| Arabic | Removes the definite article "ال" | "الكتاب" and "كتاب" are treated as the same root |
| Turkish | Specialized case folding (dotted/dotless i) | "I" and "ı" don’t incorrectly cost a typo |
| Chinese / Japanese / Korean | Dictionary-based segmentation (jieba, lindera) | Words are correctly isolated before character-level matching |
| Greek | Final sigma handling | "λόγος" and "λόγοσ" normalize to the same form |
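The Latin-script and Greek rows can be approximated with Unicode normalization in a few lines. This is a rough stand-in for Charabia using Python's unicodedata; language-specific rules like Turkish casing or CJK segmentation need more than this:

```python
import unicodedata

def normalize(token: str) -> str:
    """Casefold, decompose (NFKD), and strip combining marks."""
    folded = unicodedata.normalize("NFKD", token.casefold())
    return "".join(c for c in folded if not unicodedata.combining(c))

print(normalize("Café"))                         # cafe
print(normalize("λόγος") == normalize("λόγοσ"))  # True: final sigma folds away
```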
PostgreSQL's pg_trgm, by contrast, always operates on raw strings: trigrams of "café" and "cafe" differ regardless of configuration.
Surgical disable controls
Meilisearch gives you four independent knobs to turn typo tolerance off for specific situations, without affecting the rest:

| Setting | Scope | Use case |
|---|---|---|
| `enabled: false` | Entire index | Massive or multilingual datasets where false positives dominate |
| `disableOnWords` | Specific query terms | Brand names, proper nouns, product codes you want exact |
| `disableOnAttributes` | Specific document fields | SKU, barcode, serial number fields where precision matters |
| `disableOnNumbers` | All numeric tokens | Prevents 2024 matching 2025, improves indexing performance |
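Assembled into a settings payload, the four knobs look roughly like this (the field names come from the table above; the example values are illustrative, not defaults to copy):

```json
{
  "typoTolerance": {
    "enabled": true,
    "minWordSizeForTypos": { "oneTypo": 5, "twoTypos": 9 },
    "disableOnWords": ["shure", "kubernetes"],
    "disableOnAttributes": ["sku", "serial_number"],
    "disableOnNumbers": true
  }
}
```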
Elasticsearch can approximate some of this with per-field `fuzziness` overrides, but it requires per-query code changes or separate index mappings. Meilisearch exposes all of these as index-level settings applied consistently across every query.
How other engines handle typos
Elasticsearch and OpenSearch
Elasticsearch (and OpenSearch, which shares the same Lucene core) uses fuzzy queries based on Levenshtein distance, but they must be explicitly enabled per query with the `fuzziness` parameter.
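For example, a match query with fuzzy matching enabled (the field name `title` is an assumption):

```json
{
  "query": {
    "match": {
      "title": {
        "query": "iphoe",
        "fuzziness": "AUTO"
      }
    }
  }
}
```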
fuzziness: "AUTO" applies similar length-based thresholds, but they differ from Meilisearch’s defaults:
| Word length | Elasticsearch AUTO | Meilisearch default |
|---|---|---|
| 1–2 chars | 0 edits | 0 typos |
| 3–4 chars | 1 edit | 0 typos |
| 5 chars | 1 edit | 1 typo |
| 6–8 chars | 2 edits | 1 typo |
| 9+ chars | 2 edits | 2 typos |
- Opt-in: if you forget to add `fuzziness` to a query, typos return zero results
- Score modifier: fuzzy matches lower the BM25 score, but the score is still a single number mixing term frequency, IDF, and fuzziness penalty into an opaque value
- Not a ranking rule: there is no way to say “always prefer 0-typo matches over 1-typo matches regardless of term frequency.” A frequent misspelled term can outscore a rare exact match
- Prefix queries are separate: `fuzzy` and `prefix` are two distinct query types in Elasticsearch. Combining them requires a `bool` query with both a `fuzzy` clause and a `prefix` clause, or an `edge_ngram` tokenizer at index time. It is achievable, but requires deliberate setup and adds complexity to every query
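A sketch of that `bool` combination (the field name is an assumption, and the scoring interplay between the two clauses still needs tuning):

```json
{
  "query": {
    "bool": {
      "should": [
        { "fuzzy":  { "title": { "value": "iphoe" } } },
        { "prefix": { "title": { "value": "iphoe" } } }
      ]
    }
  }
}
```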
Apache Solr
Solr is built on the same Lucene engine as Elasticsearch. Fuzzy matching uses the `~` tilde syntax in query strings (e.g. `word~1`), or the fuzzy query type in JSON.
The `~N` suffix sets the maximum edit distance (0, 1, or 2). Behavior is identical to Elasticsearch at the Lucene level:
- Opt-in per query: not automatic
- Lucene fuzzy query: edit distance computed at query time, Levenshtein automata generated on the fly
- BM25 score modifier: fuzzy matches reduce the document’s relevance score; no strict bucket ordering
- No prefix fuzzy: the tilde syntax does not combine prefix expansion with fuzzy matching
MongoDB Atlas Search
MongoDB Atlas Search is built on Lucene and exposes a `fuzzy` option within the `text` operator:
- Opt-in: the `fuzzy` option must be added explicitly; standard `text` queries do not tolerate typos
- `prefixLength`: the first N characters must match exactly before fuzzy expansion applies, which improves performance but reduces coverage for early-position typos
- Lucene scoring: fuzzy matches lower the relevance score, same BM25 mechanics as Elasticsearch and Solr
- Computed at query time: automata are generated on the fly per query
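A sketch of the `$search` aggregation stage with the `fuzzy` option (the index and field names are assumptions):

```json
{
  "$search": {
    "index": "default",
    "text": {
      "query": "iphoe",
      "path": "title",
      "fuzzy": { "maxEdits": 1, "prefixLength": 1 }
    }
  }
}
```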
Manticore Search
Manticore Search (a fork of Sphinx) supports fuzzy matching via the MATCH function with a fuzzy flag, or by using levenshtein() in expressions. Its Elasticsearch-compatible API layer also accepts a `fuzziness` parameter in a way similar to Elasticsearch.
- Opt-in: fuzzy matching must be explicitly invoked per query
- Levenshtein distance: computed at query time
- Score modifier: fuzzy matches reduce the BM25-based relevance weight
- No automatic prefix+fuzzy: prefix and fuzzy are separate matching modes
PostgreSQL (pg_trgm)
PostgreSQL’s pg_trgm extension uses trigram similarity rather than edit distance. It splits strings into overlapping 3-character substrings and measures how many trigrams two strings share:
- Statistical, not edit-based: "iphone" and "iphoen" share several trigrams, such as `iph` and `pho`, so they score well. But short-word mismatches are common because short strings produce few trigrams in general
- Threshold tuning required: the similarity threshold (default 0.3) must be manually tuned per use case
- Not automatic: requires explicit `similarity()` calls or GIN/GiST indexes with the `%` operator
- No ranking integration: similarity is a plain score on top of SQL `WHERE` clauses, not a search ranking rule
- No prefix awareness: trigram similarity is not prefix-aware. "prog" does not naturally match "programming" via trigrams the way a prefix DFA does
Learn more
- Typo tolerance settings: configure thresholds, disable on words or numbers, and more
- Typo tolerance calculations: how edit distance is computed in detail
- Concatenated and split queries: how Meilisearch handles word boundary mistakes
- Prefix search: how prefix matching works and how it interacts with typo tolerance
- Language support: Charabia’s tokenization and normalization per language
- Ranking rules: how the `typo` rule fits into the full ranking pipeline
- Ranking vs BM25: why Meilisearch’s multi-criteria system produces better results for application search