Wildcard queries

apple i*

Notes:


Wildcard queries

Expand query:

i*

| Term   | Documents  |
|--------|------------|
| galaxy | #2, #4     |
| iphone | #2, #3     |
| ipad   | #2, #3, #4 |
| lumia  | #1         |

iphone OR ipad

#2, #3, #4
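
The expansion step can be sketched in a few lines of Python. This is a minimal illustration, not a real search engine: the dictionary mirrors the table above, and the function names `expand_prefix` and `search_prefix` are made up for this example. The linear scan over all terms is only acceptable because the index is tiny.

```python
# Toy inverted index copied from the table above: term -> set of document IDs.
index = {
    "galaxy": {2, 4},
    "iphone": {2, 3},
    "ipad": {2, 3, 4},
    "lumia": {1},
}


def expand_prefix(index, prefix):
    """Return every indexed term that starts with `prefix` (hypothetical helper)."""
    return sorted(term for term in index if term.startswith(prefix))


def search_prefix(index, prefix):
    """OR together the postings of all expanded terms."""
    docs = set()
    for term in expand_prefix(index, prefix):
        docs |= index[term]
    return sorted(docs)


print(expand_prefix(index, "i"))  # ['ipad', 'iphone']
print(search_prefix(index, "i"))  # [2, 3, 4]
```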

Notes:

  • How do we support prefix queries, e.g. salz*? Think of the search tree.

salz*

Comes free with a search tree

<script class="tree" type="application/json"> { "name": "S", "children": [ { "name": "SA", "children": [ { "name": "SAL", "children": [ { "name": "SALB" }, { "name": "SALZ", "children": [ { "name": "Salzburg", "fill": "#1b91ff" }, { "name": "Salzach", "fill": "#1b91ff" } ] } ] }, { "name": "SAR" } ] }, { "name": "SE" } ] } </script>
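
One way to see why the prefix lookup is cheap: in any sorted term structure, all terms sharing a prefix sit next to each other. The sketch below uses a sorted Python list with `bisect` as a stand-in for the search tree. The term list is invented for illustration, and `prefix_range` is a hypothetical helper name.

```python
from bisect import bisect_left


def prefix_range(sorted_terms, prefix):
    """Return the contiguous slice of terms that start with `prefix`.

    Assumes `sorted_terms` is sorted. The upper bound appends the largest
    Unicode code point, so terms containing that character right after the
    prefix would be missed (an accepted simplification for this sketch).
    """
    lo = bisect_left(sorted_terms, prefix)
    hi = bisect_left(sorted_terms, prefix + "\U0010ffff")
    return sorted_terms[lo:hi]


# Hypothetical term dictionary, lowercased.
terms = sorted(["salbe", "salzach", "salzburg", "sardine", "see"])
print(prefix_range(terms, "salz"))  # ['salzach', 'salzburg']
```

Both boundaries are found by binary search, so the lookup cost grows with the logarithm of the dictionary size plus the number of matches.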

Notes:

  • How do we support suffix queries, e.g. *burg?

*burg

Build index with reversed terms

<script class="tree" type="application/json"> { "name": "G", "children": [ { "name": "GR", "children": [ { "name": "GRU", "children": [ { "name": "grubuenrok", "fill": "#1b91ff" }, { "name": "grubzlas", "fill": "#1b91ff" } ] }, { "name": "…" } ] }, { "name": "…" } ] } </script>
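
The reversed-term idea turns a suffix query into a prefix query on a second dictionary. A minimal sketch, again with a sorted list standing in for the tree; the term list and the names `build_reversed` and `suffix_search` are assumptions made for this example.

```python
from bisect import bisect_left


def build_reversed(terms):
    """Store every term backwards, sorted, so suffixes become prefixes."""
    return sorted(term[::-1] for term in terms)


def suffix_search(reversed_terms, suffix):
    """Answer `*suffix` by a prefix lookup on the reversed dictionary."""
    key = suffix[::-1]
    lo = bisect_left(reversed_terms, key)
    hi = bisect_left(reversed_terms, key + "\U0010ffff")
    # Reverse the hits again to get the original terms back.
    return sorted(r[::-1] for r in reversed_terms[lo:hi])


# Hypothetical term dictionary.
reversed_terms = build_reversed(["korneuburg", "salzburg", "salzach", "graz"])
print(reversed_terms[:2])                    # ['grubuenrok', 'grubzlas']
print(suffix_search(reversed_terms, "burg"))  # ['korneuburg', 'salzburg']
```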

Notes:

  • How do we support infix queries, e.g. sal*urg?

sal*urg

Intersect results of sal* and *urg
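
A small sketch of the intersection. The two set comprehensions stand in for the prefix lookup and the reversed-term lookup, and the term list is invented. The length check is an addition of this sketch rather than something stated on the slide: it guards against terms where prefix and suffix would have to overlap.

```python
def infix_search(terms, prefix, suffix):
    """Answer `prefix*suffix` by intersecting a prefix and a suffix match set."""
    starts = {t for t in terms if t.startswith(prefix)}  # stands in for the tree lookup
    ends = {t for t in terms if t.endswith(suffix)}      # stands in for the reversed index
    # Assumption of this sketch: reject terms too short to hold both parts
    # without overlap (e.g. "aba" must not match "ab*ba").
    min_len = len(prefix) + len(suffix)
    return sorted(t for t in starts & ends if len(t) >= min_len)


# Hypothetical term dictionary.
terms = ["salzburg", "salzach", "korneuburg", "salamanca"]
print(infix_search(terms, "sal", "urg"))  # ['salzburg']
```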

Notes:

  • Audience question

N-gram queries

corona

(3-gram)

[^co, cor, oro, ron, ona, na^]
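
Splitting a term into n-grams is a sliding window over the padded term. A minimal sketch that follows the slide's convention of using `^` as the boundary marker on both ends; the function name `ngrams` and its parameters are chosen for this example.

```python
def ngrams(term, n=3, marker="^"):
    """Return all n-grams of `term`, padded with a boundary marker on both sides."""
    padded = marker + term + marker
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


print(ngrams("corona"))  # ['^co', 'cor', 'oro', 'ron', 'ona', 'na^']
```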

Notes:


N-gram index

| Term | Doc IDs    |
|------|------------|
| ^co  | #1, #3, #5 |
| cor  | #1, #2     |
| oro  | #1, #5     |
| ron  | #1, #4     |
| ona  | #1, #2, #4 |
| na^  | #1, #2, #3 |

Notes:

  • Can the original contents be reconstructed from the index?

N-gram queries

Expand query

cor*

^co AND cor

Notes:


N-gram queries

^co AND cor

| Term | Doc IDs    |
|------|------------|
| ^co  | #1, #3, #5 |
| cor  | #1, #2     |
| oro  | #1, #5     |
| ron  | #1, #4     |
| ona  | #1, #2, #4 |
| na^  | #1, #2, #3 |

#1
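
The lookup above can be reproduced with a short sketch. The dictionary copies the n-gram index from the table; `prefix_grams` and `candidates` are hypothetical helper names. Only the leading boundary marker is added, because the wildcard leaves the end of the term open.

```python
# N-gram index copied from the table above: n-gram -> set of document IDs.
ngram_index = {
    "^co": {1, 3, 5},
    "cor": {1, 2},
    "oro": {1, 5},
    "ron": {1, 4},
    "ona": {1, 2, 4},
    "na^": {1, 2, 3},
}


def prefix_grams(prefix, n=3, marker="^"):
    """N-grams of a prefix query such as cor* (start marker only)."""
    padded = marker + prefix
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def candidates(index, grams):
    """AND together the postings of all query n-grams."""
    postings = [index.get(gram, set()) for gram in grams]
    return sorted(set.intersection(*postings)) if postings else []


grams = prefix_grams("cor")
print(grams)                           # ['^co', 'cor']
print(candidates(ngram_index, grams))  # [1]
```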

Notes:

  • Audience question
  • How can this lead to false positives?

False N-gram positives

cor*

^co AND cor

^concord
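
The n-gram test only checks that each n-gram occurs somewhere in the term, not that the n-grams are adjacent. A common remedy, shown here as a sketch and not necessarily what any particular engine does, is to post-filter the candidates against the original wildcard pattern. The candidate list and helper names are made up for this example.

```python
def ngrams(term, n=3, marker="^"):
    """All n-grams of `term`, padded with a boundary marker on both sides."""
    padded = marker + term + marker
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def gram_match(term, query_grams):
    """True if the term contains every query n-gram, in any position."""
    return set(query_grams) <= set(ngrams(term))


query_grams = ["^co", "cor"]      # expansion of cor*
terms = ["corona", "concord"]     # hypothetical indexed terms

hits = [t for t in terms if gram_match(t, query_grams)]
print(hits)  # ['corona', 'concord'] - concord is a false positive

# Post-filter: verify each candidate against the real pattern.
verified = [t for t in hits if t.startswith("cor")]
print(verified)  # ['corona']
```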

Notes: