Function fuzzy_match_positions_graphemes 

pub fn fuzzy_match_positions_graphemes(
    query: &str,
    candidate: &str,
) -> Option<(Vec<usize>, i64)>

Returns (positions, score) for a case-insensitive fuzzy match of query against candidate.

“Fuzzy” here means ordered subsequence matching over grapheme clusters, plus a deterministic score to help sort candidates.

  • positions are grapheme indices (not byte offsets) in candidate where query graphemes were matched.
  • score is higher for:
    • consecutive matches
    • matches at the start of the candidate
    • matches at word boundaries
    • shorter candidates (mild bonus)

Returns None if query cannot be matched in order.
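
The ordered-subsequence part of the description can be sketched as follows. This is a minimal illustration, not the crate's implementation: `subsequence_positions` and `fold` are hypothetical names, `char`s stand in for grapheme clusters (real segmentation would need a Unicode library, for example the unicode-segmentation crate), and the greedy leftmost strategy is an assumption. The actual function may pick different positions when several alignments exist.

```rust
/// Hypothetical helper: case folding that always yields exactly one `char`,
/// so indices stay aligned with the original string.
fn fold(c: char) -> char {
    c.to_lowercase().next().unwrap_or(c)
}

/// Hypothetical helper: greedy, leftmost ordered-subsequence match.
/// ASSUMPTION: `char`s are used as a stand-in for grapheme clusters.
/// Returns the matched indices, or `None` if `query` cannot be matched in order.
fn subsequence_positions(query: &str, candidate: &str) -> Option<Vec<usize>> {
    let mut wanted = query.chars().map(fold).peekable();
    let mut positions = Vec::new();

    for (i, c) in candidate.chars().map(fold).enumerate() {
        match wanted.peek() {
            // The next query character matches here: record the index and advance.
            Some(&q) if q == c => {
                positions.push(i);
                wanted.next();
            }
            // No match at this index: keep scanning the candidate.
            Some(_) => {}
            // The whole query has been matched: stop early.
            None => break,
        }
    }

    // Any leftover query characters mean the match failed.
    if wanted.peek().is_none() {
        Some(positions)
    } else {
        None
    }
}
```

Under these assumptions, matching "FB" against "foo_bar" yields positions 0 and 4, while "bf" yields None because no "f" follows the "b".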

Scoring model (intentionally simple and stable):

  • Base: +10 per matched grapheme
  • Consecutive bonus: +20 for each match that immediately follows the prior match (by grapheme index)
  • Start-of-string bonus: +30 if the first match is at index 0
  • Word-boundary bonus: +15 for each match whose position is a word boundary (start of string, or preceded by a non-word grapheme)
  • Gap penalty: -1 per non-matching grapheme between successive matches
  • Length penalty: -1 per grapheme in candidate (mild; favors shorter candidates)

Word characters are ASCII letters, ASCII digits, and the underscore. A boundary is detected at the transition from a non-word grapheme to a word grapheme.
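
To make the arithmetic concrete, the rules above can be applied to a set of already-known match positions. This is a sketch under stated assumptions rather than the crate's code: `is_word` and `score_positions` are hypothetical names, `char`s again stand in for grapheme clusters, and the boundary test follows the wording "start of string, or preceded by a non-word grapheme" without checking the matched character itself.

```rust
/// Hypothetical helper: ASCII letters, ASCII digits, and underscore are word characters.
fn is_word(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Hypothetical helper: scores known match positions using the listed rules.
/// ASSUMPTIONS: `char`s stand in for graphemes, and `positions` holds strictly
/// increasing, in-range indices into `candidate`.
fn score_positions(candidate: &str, positions: &[usize]) -> i64 {
    let chars: Vec<char> = candidate.chars().collect();
    let mut score = 0i64;
    let mut prev: Option<usize> = None;

    for &pos in positions {
        // Base: +10 per matched grapheme.
        score += 10;

        // Word-boundary bonus: start of string, or preceded by a non-word character.
        if pos == 0 || !is_word(chars[pos - 1]) {
            score += 15;
        }

        match prev {
            // Consecutive bonus: this match immediately follows the prior one.
            Some(p) if pos == p + 1 => score += 20,
            // Gap penalty: -1 per skipped character between successive matches.
            Some(p) => score -= (pos - p - 1) as i64,
            None => {}
        }
        prev = Some(pos);
    }

    // Start-of-string bonus: the first match sits at index 0.
    if positions.first() == Some(&0) {
        score += 30;
    }

    // Length penalty: -1 per character in the candidate.
    score - chars.len() as i64
}
```

Worked through for candidate "foo-bar" with positions 0 and 4: base 10 + 10, boundary 15 + 15 (start of string, then after "-"), gap penalty -3, start bonus +30, length penalty -7, for a total of 70. With "foo_bar" the second boundary bonus disappears because the underscore is a word character, giving 55.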

Versioning

This function is an alias for the stable scorer fuzzy_match_positions_graphemes_v1.