Most search engines treat typo handling as an optional, query-level feature you opt into. Meilisearch treats it as a first-class ranking criterion that works automatically on every query. This page explains the technical differences and why they matter.

How Meilisearch handles typos

Meilisearch stores all indexed terms in a single Finite State Transducer (FST) built at index time. At query time, the engine generates a Levenshtein automaton from your search term and intersects it with the pre-built FST in a single streaming pass. This finds all indexed terms within the allowed edit distance efficiently, without scanning the entire dictionary. Meilisearch uses Damerau-Levenshtein distance, meaning transpositions (swapped adjacent characters, like "teh" → "the") count as a single edit, not two. Typo tolerance is on by default for every index and every query. No query-level parameters are required.
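The distance metric can be sketched in plain Python. This is a textbook dynamic-programming implementation (the optimal string alignment variant), not Meilisearch's automaton-based one, but it shows why a transposition costs 1 edit instead of 2:

```python
def damerau_levenshtein(a, b):
    """Edit distance where insertions, deletions, substitutions,
    and adjacent transpositions each cost 1 (OSA variant)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # adjacent transposition counts as a single edit
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]

print(damerau_levenshtein("teh", "the"))  # 1 -- plain Levenshtein would say 2
```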

Word length thresholds

Meilisearch does not apply typo tolerance uniformly. The number of typos allowed depends on the length of the query word:
Query word length    Typos allowed
1–4 characters       0 (prefix match only)
5–8 characters       1
9+ characters        2
The hard cap is 2 typos per word, regardless of length. Words with 3 or more differences will never match. These thresholds are configurable via minWordSizeForTypos.
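The threshold logic is simple to model. A minimal sketch, using the default minWordSizeForTypos values (oneTypo=5, twoTypos=9) as parameters:

```python
def allowed_typos(word, one_typo=5, two_typos=9):
    """Typo budget for a query word under Meilisearch's default
    minWordSizeForTypos thresholds. Hard cap is 2."""
    n = len(word)
    if n >= two_typos:
        return 2
    if n >= one_typo:
        return 1
    return 0

print([allowed_typos(w) for w in ("cat", "phone", "chocolate")])  # [0, 1, 2]
```

Raising oneTypo to 6 would, for example, make 5-character words exact-match only.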

Two special typo counting rules

First-character typo costs 2. A typo on the first character of a word is counted as two typos, not one. This means "caturday" does not match "saturday": the single substitution at position 1 costs 2, exceeding the 1-typo budget for an 8-character word. This prevents a class of false positives where only the initial character differs.

Concatenation costs 1 typo. When two words are separated by a space, Meilisearch also considers them as a single concatenated candidate with 1 typo. For example, searching for "any way" will match documents containing "anyway". No other engine in this comparison handles word-split typos this way.
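The first-character rule can be approximated as a surcharge on top of the edit distance. A simplified sketch (it assumes the first-character edit is a substitution, which is the common case):

```python
from functools import lru_cache

def first_char_cost(query, candidate):
    """Simplified model of the first-character rule: plain Levenshtein
    distance, plus 1 extra when the initial characters differ, so a
    first-position typo effectively costs 2."""
    @lru_cache(maxsize=None)
    def lev(i, j):
        if i == 0:
            return j
        if j == 0:
            return i
        return min(lev(i - 1, j) + 1,
                   lev(i, j - 1) + 1,
                   lev(i - 1, j - 1) + (query[i - 1] != candidate[j - 1]))
    cost = lev(len(query), len(candidate))
    if query[:1] != candidate[:1]:
        cost += 1  # first-character edit is surcharged
    return cost

# One substitution at position 1 costs 2, which exceeds the
# 1-typo budget of an 8-character word:
print(first_char_cost("caturday", "saturday"))  # 2
```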

Typo tolerance is a ranking rule, not a filter

When a query term matches an indexed term via a typo, that result is not discarded or penalized with a separate score modifier. Instead, typo count feeds directly into the typo ranking rule, one of the six criteria in Meilisearch’s default bucket sort pipeline. This means:
  • A document matching with 0 typos always ranks above one matching with 1 typo, all else being equal
  • A document matching with 1 typo always ranks above one matching with 2 typos
  • A result with 0 typos in a less important attribute (body) outranks a result with 2 typos in a more important attribute (title), because typo comes before attribute in the ranking pipeline
There is no score blending or weighting. The ordering is strict and transparent. Disabling typo tolerance entirely also disables the typo ranking rule, since every returned document would have 0 typos by definition.
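The strict ordering falls out of a plain lexicographic sort. A toy sketch (the attribute order is hypothetical, and real documents carry more ranking signals):

```python
# Bucket sort: order by typo count first, then by matched attribute.
# No score blending -- earlier criteria fully dominate later ones.
ATTRIBUTE_RANK = {"title": 0, "body": 1}  # hypothetical attribute order

docs = [
    {"id": 1, "typos": 2, "attr": "title"},
    {"id": 2, "typos": 0, "attr": "body"},
    {"id": 3, "typos": 1, "attr": "title"},
]

ranked = sorted(docs, key=lambda d: (d["typos"], ATTRIBUTE_RANK[d["attr"]]))
print([d["id"] for d in ranked])  # [2, 3, 1]
```

Document 2 wins despite matching only in the body: 0 typos beats 2 typos before attribute importance is ever consulted.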

Prefix search and typo tolerance work together

Meilisearch applies prefix search and typo tolerance simultaneously on the last word of a query. This means a partial, misspelled word still returns results. For example, searching "iphoe" (5 characters, 1 typo budget) can match "iphone" as a prefixed, typo-corrected term in a single pass. Elasticsearch can approximate this by combining an edge_ngram tokenizer (for prefix expansion at index time) with a fuzzy query at search time, but the two mechanisms work on different levels and require careful coordination. In Meilisearch, prefix and typo tolerance are a single unified step with no extra configuration. You can disable prefix search independently from typo tolerance if needed.
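One way to think about the combined behavior: the last query word matches an indexed term if some prefix of that term is within the typo budget. A minimal sketch of that idea (not Meilisearch's actual automaton intersection):

```python
def prefix_typo_match(query, term, budget):
    """Does `query` match `term` as a possibly typo'd prefix?
    Checks every prefix of the indexed term against the edit budget."""
    def lev(a, b):
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                               prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]
    return any(lev(query, term[:k]) <= budget for k in range(len(term) + 1))

# "iphoe" is within 1 edit of the prefix "iphon":
print(prefix_typo_match("iphoe", "iphone", budget=1))  # True
```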

Split and concatenate: handling word boundary mistakes

Beyond character-level edits, Meilisearch handles a class of mistakes that Levenshtein distance cannot catch: wrong word boundaries.

Concatenation: when a user types multiple words, Meilisearch also searches their concatenated forms. For a query "the news paper", it additionally tries "thenews paper", "the newspaper", and "thenewspaper". Concatenation is applied to up to 3 consecutive words, and each concatenated candidate counts as 1 typo in the ranking pipeline.

Splitting: when a user types a single word, Meilisearch considers frequency-based splits. For "newspaper", it finds that "news" and "paper" both have meaningful frequency in the index and tries the split candidate. The split is data-driven: it picks the boundary that maximizes the frequency of both halves in the index dictionary, not a fixed linguistic rule. A split into "new" + "spaper" is rejected because "spaper" has no frequency. Split words must remain adjacent: a document with "news" and "paper" separated by other words will not match.

Together, these two mechanisms handle the common real-world case where users omit or add spaces within compound words or multi-word phrases. Elasticsearch can handle compound words through custom token filters (like the word_delimiter_graph filter or language-specific compound word decomposers), but this requires upfront index configuration per language and does not cover the query-side concatenation case. See Concatenated and split queries for more detail.
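The frequency-driven split can be sketched with a toy dictionary (the frequencies below are hypothetical, and the tie-breaking heuristic is an assumption):

```python
def best_split(word, freq):
    """Pick the split boundary where both halves occur in the index
    dictionary, preferring the split whose rarer half is most frequent."""
    candidates = [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if freq.get(word[:i], 0) > 0 and freq.get(word[i:], 0) > 0
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: min(freq[pair[0]], freq[pair[1]]))

# Hypothetical term frequencies from the index. "spaper" is absent,
# so the "new" + "spaper" split is rejected:
freq = {"news": 50, "paper": 80, "new": 120}
print(best_split("newspaper", freq))  # ('news', 'paper')
```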

Language-aware tokenization before typo matching

Meilisearch’s tokenizer, Charabia, normalizes and segments text before typo tolerance runs. This matters because typo matching operates on tokens, not raw characters, and what counts as a token depends on the language. Key transformations that affect typo matching:
  • All Latin scripts: lowercased, accents decomposed, diacritics removed. "café" and "cafe" become the same token, so no typo budget is wasted on accents
  • CamelCase: "iPhone" is split into "i" + "phone". Searching "iphoen" can match the "phone" token with 1 typo
  • German: compound words are decomposed ("Krankenhaus" → "kranken" + "haus"). Each part is independently typo-matchable
  • Arabic: the definite article "ال" is removed. "الكتاب" and "كتاب" are treated as the same root
  • Turkish: specialized case folding for dotted/dotless i. "I" and "ı" don’t incorrectly cost a typo
  • Chinese / Japanese / Korean: dictionary-based segmentation (jieba, lindera). Words are correctly isolated before character-level matching
  • Greek: final sigma handling. "λόγος" and "λόγοσ" normalize to the same form
In contrast, engines like Elasticsearch, Solr, and Manticore apply edit distance after their configured analyzer runs. If the analyzer includes ASCII folding, accents are normalized before matching. But normalization is opt-in and per-field: without explicit configuration, an accent, a case difference, or a language-specific ligature can consume part of the typo budget or cause misses entirely. Charabia applies the right normalization automatically based on the detected language, with no per-field setup required. PostgreSQL pg_trgm is always raw: trigrams of "café" and "cafe" differ regardless of configuration.
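The Latin-script case is easy to illustrate with the standard library. A simplified sketch of accent folding (Charabia's actual normalizers do considerably more per language):

```python
import unicodedata

def normalize_latin(token):
    """Accent folding for Latin scripts: NFD-decompose, drop combining
    marks, lowercase -- so accents never consume typo budget."""
    decomposed = unicodedata.normalize("NFD", token)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()

print(normalize_latin("Café"))  # cafe
```

After normalization, "café" and "cafe" are the identical token, so the full typo budget remains available for genuine misspellings.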

Surgical disable controls

Meilisearch gives you four independent knobs to turn typo tolerance off for specific situations, without affecting the rest:
  • enabled: false (entire index): massive or multilingual datasets where false positives dominate
  • disableOnWords (specific query terms): brand names, proper nouns, product codes you want matched exactly
  • disableOnAttributes (specific document fields): SKU, barcode, serial number fields where precision matters
  • disableOnNumbers (all numeric tokens): prevents 2024 from matching 2025 and improves indexing performance
Elasticsearch can achieve similar granularity through per-field analyzer configuration and query-level fuzziness overrides, but it requires per-query code changes or separate index mappings. Meilisearch exposes all of these as index-level settings applied consistently across every query.
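How the four knobs interact can be modeled in a few lines. A toy sketch (the setting names mirror Meilisearch's; the decision logic is a simplification):

```python
# Toy model of the four index-level disable knobs.
settings = {
    "enabled": True,
    "disableOnWords": {"iphone"},    # exact-match brand names
    "disableOnAttributes": {"sku"},  # precision-critical fields
    "disableOnNumbers": True,
}

def typo_tolerance_applies(word, attribute):
    """Return False when any disable rule matches this word/field."""
    if not settings["enabled"]:
        return False
    if word.lower() in settings["disableOnWords"]:
        return False
    if attribute in settings["disableOnAttributes"]:
        return False
    if settings["disableOnNumbers"] and word.isdigit():
        return False
    return True

print(typo_tolerance_applies("iphoen", "title"))  # True
print(typo_tolerance_applies("2024", "title"))    # False: numeric token
print(typo_tolerance_applies("ABC123", "sku"))    # False: disabled attribute
```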

How other engines handle typos

Elasticsearch and OpenSearch

Elasticsearch (and OpenSearch, which shares the same Lucene core) uses fuzzy queries based on Levenshtein distance, but they must be explicitly enabled per query with the fuzziness parameter:
{
  "query": {
    "match": {
      "title": {
        "query": "iphoen",
        "fuzziness": "AUTO"
      }
    }
  }
}
fuzziness: "AUTO" applies similar length-based thresholds, but they differ from Meilisearch’s defaults:
Word length    Elasticsearch AUTO    Meilisearch default
1–2 chars      0 edits               0 typos
3–4 chars      1 edit                0 typos
5 chars        1 edit                1 typo
6–8 chars      2 edits               1 typo
9+ chars       2 edits               2 typos
Elasticsearch is more permissive for short words (allows 1 edit from 3 characters vs Meilisearch’s threshold of 5), which increases recall but also false positives on short terms. However:
  • Opt-in: if you forget to add fuzziness to a query, typos return zero results
  • Score modifier: fuzzy matches lower the BM25 score, but the score is still a single number mixing term frequency, IDF, and fuzziness penalty into an opaque value
  • Not a ranking rule: there is no way to say “always prefer 0-typo matches over 1-typo matches regardless of term frequency.” A frequent misspelled term can outscore a rare exact match
  • Prefix queries are separate: fuzzy and prefix are two distinct query types in Elasticsearch. Combining them requires a bool query with both a fuzzy clause and a prefix clause, or using an edge_ngram tokenizer at index time. It is achievable, but requires deliberate setup and adds complexity to every query
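The bool-query workaround from the last bullet looks like this as a request body. A sketch only: the field name "title" and the clause weighting are assumptions, and production setups often prefer the edge_ngram approach instead:

```python
import json

# Combining fuzzy and prefix matching on the last query word requires
# an explicit bool query with two clauses:
body = {
    "query": {
        "bool": {
            "should": [
                {"match": {"title": {"query": "iphoe", "fuzziness": "AUTO"}}},
                {"prefix": {"title": {"value": "iphoe"}}},
            ],
            "minimum_should_match": 1,
        }
    }
}

print(json.dumps(body, indent=2))
```

Note that the two clauses score independently: a document can match the fuzzy clause, the prefix clause, or both, and BM25 blends the contributions into a single number.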
Normalization and custom tokenizers. Where Elasticsearch has a genuine advantage is in its analyzer system. You can build a fully custom pipeline: any combination of character filters (strip HTML, map characters), tokenizers (standard, whitespace, ngram, edge-ngram, pattern, language-specific), and token filters (lowercase, stemmer, synonym, ASCII folding, stop words, phonetic). This makes Elasticsearch extremely powerful for domain-specific normalization: a medical search engine can apply specialized stemming, a legal platform can expand abbreviations, a multilingual product catalog can use the ICU analyzer with Unicode-aware case folding and decomposition across all scripts. Charabia provides built-in normalization for the most common languages, but Elasticsearch’s analyzer framework is more flexible for advanced or unusual requirements. The trade-off is that getting it right requires significant configuration expertise, and misconfigured analyzers are a common source of relevance bugs.

Apache Solr

Solr is built on the same Lucene engine as Elasticsearch. Fuzzy matching uses the ~ tilde syntax in query strings, or the fuzzy query type in JSON:
q=title:iphoen~1
The ~N suffix sets the maximum edit distance (0, 1, or 2). Behavior is identical to Elasticsearch at the Lucene level:
  • Opt-in per query: not automatic
  • Lucene fuzzy query: edit distance computed at query time, Levenshtein automata generated on the fly
  • BM25 score modifier: fuzzy matches reduce the document’s relevance score; no strict bucket ordering
  • No prefix fuzzy: the tilde syntax does not combine prefix expansion with fuzzy matching

MongoDB Atlas Search

MongoDB Atlas Search is built on Lucene and exposes a fuzzy option within the text operator:
{
  "$search": {
    "text": {
      "query": "iphoen",
      "path": "title",
      "fuzzy": {
        "maxEdits": 2,
        "prefixLength": 3
      }
    }
  }
}
  • Opt-in: the fuzzy option must be added explicitly; standard text queries do not tolerate typos
  • prefixLength: the first N characters must match exactly before fuzzy expansion applies, which improves performance but reduces coverage for early-position typos
  • Lucene scoring: fuzzy matches lower the relevance score, same BM25 mechanics as Elasticsearch and Solr
  • Computed at query time: automata are generated on the fly per query
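The prefixLength trade-off is easy to see in a sketch: an exact-prefix gate followed by edit distance on the remainder (a simplification of Lucene's actual automaton):

```python
def fuzzy_applies(query, term, max_edits=2, prefix_length=3):
    """Sketch of prefixLength semantics: the first N characters must
    match exactly before fuzzy matching is even considered."""
    if query[:prefix_length] != term[:prefix_length]:
        return False
    a, b = query[prefix_length:], term[prefix_length:]
    prev = list(range(len(b) + 1))  # plain Levenshtein on the remainder
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1] <= max_edits

print(fuzzy_applies("iphoen", "iphone"))  # True: "iph" matches exactly
print(fuzzy_applies("xphone", "iphone"))  # False: typo in the guarded prefix
```

This is why prefixLength improves performance (the term dictionary can be range-scanned on the exact prefix) but silently drops matches whose typo lands in the first few characters.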

Manticore Search

Manticore Search (a fork of Sphinx) supports fuzzy matching via a query-time fuzzy option on full-text matches, or using the LEVENSHTEIN() function in expressions:
SELECT * FROM movies WHERE MATCH('@title iphoen') OPTION fuzzy=1, distance=2;
Or with the HTTP API using the fuzziness parameter in a way similar to Elasticsearch (Manticore offers an Elasticsearch-compatible API layer).
  • Opt-in: fuzzy matching must be explicitly invoked per query
  • Levenshtein distance: computed at query time
  • Score modifier: fuzzy matches reduce the BM25-based relevance weight
  • No automatic prefix+fuzzy: prefix and fuzzy are separate matching modes

PostgreSQL (pg_trgm)

PostgreSQL’s pg_trgm extension uses trigram similarity rather than edit distance. It splits strings into overlapping 3-character substrings and measures how many trigrams two strings share:
SELECT * FROM movies
WHERE similarity(title, 'iphoen') > 0.3
ORDER BY similarity(title, 'iphoen') DESC;
This is a fundamentally different approach:
  • Statistical, not edit-based: “iphone” and “iphoen” share several trigrams (iph and pho, plus the boundary trigrams from padding) so they score reasonably well. But short-word false positives are common because short strings share few trigrams in general
  • Threshold tuning required: the similarity threshold (default 0.3) must be manually tuned per use case
  • Not automatic: requires explicit similarity() calls or GIN/GIST indexes with the % operator
  • No ranking integration: similarity is a plain score on top of SQL WHERE clauses, not a search ranking rule
  • No prefix awareness: trigram similarity is not prefix-aware. “prog” does not naturally match “programming” via trigrams the way a prefix automaton does
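The trigram mechanics can be reproduced in a few lines. A simplified sketch of pg_trgm's behavior (lowercase, pad with two leading spaces and one trailing space, then compare trigram sets):

```python
def trigrams(s):
    """pg_trgm-style trigram set: lowercase, pad with two leading
    and one trailing space, take every 3-character window."""
    padded = "  " + s.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    """Shared trigrams divided by the size of the union."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# 4 shared trigrams out of a union of 10:
print(round(similarity("iphone", "iphoen"), 2))  # 0.4 -- above the 0.3 default
```

At 0.4, "iphoen" clears the default 0.3 threshold against "iphone", but a 3-character word with a single typo usually shares almost no trigrams with its correction, which is the short-word weakness noted above.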
