section_field_weights.md

File metadata and controls

123 lines (78 loc) · 4.39 KB

Use document structure for ranking

Notes:

How can we exploit the document structure to improve ranking? Think of a typical Wikipedia article.


Document structure

[Figure: Document Structure]

Notes: How can we exploit this information for ranking purposes?

Field weights

[Figure: Document Structure]

Notes:

How can we determine the field weights?


Index with Fields

| Doc | Author | Title |
|-----|--------|-------|
| #1 | Arthur McAuthor | A book providing information about information retrieval |
| #2 | Shakesbeer | A book about the search for King Arthur |

| Term | Doc IDs |
|------|---------|
| arthur | #1:Author, #2:Title |
| book | #1:Title, #2:Title |
| information | #1:Title |
| mcauthor | #1:Author |
| shakesbeer | #2:Author |
| ... | |
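
As a minimal sketch of how such a fielded index could be built, the Python snippet below tokenizes each field by lowercasing and splitting on whitespace, then records the field name alongside each doc ID. The names `DOCS` and `build_field_index` are hypothetical, and the tokenizer is an assumption rather than the one used in these slides.

```python
from collections import defaultdict

# Toy corpus from the table above; field names are lowercased for convenience.
DOCS = {
    1: {"author": "Arthur McAuthor",
        "title": "A book providing information about information retrieval"},
    2: {"author": "Shakesbeer",
        "title": "A book about the search for King Arthur"},
}

def build_field_index(docs):
    """Map each term to a set of (doc_id, field) postings."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            # Assumed tokenizer: lowercase, split on whitespace.
            for term in text.lower().split():
                index[term].add((doc_id, field))
    return index

index = build_field_index(DOCS)
print(sorted(index["arthur"]))  # [(1, 'author'), (2, 'title')]
```

Storing the field with each posting is what makes both field weighting and field queries possible later on.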

Notes:

Audience question


Field weights

| Term | Doc IDs |
|------|---------|
| arthur | #1:Author, #2:Title |
| book | #1:Title, #2:Title |
| ... | |

$$\begin{aligned} \text{weight}(\text{author}) & = 10 \\ \text{weight}(\text{title}) & = 1 \end{aligned}$$


Query: arthur book?

  • #1 → author + title = 10 + 1 = 11
  • #2 → title + title = 1 + 1 = 2
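
The scoring above can be sketched in a few lines of Python: for every query term, add the weight of the field in which the term occurs to the score of that document. The names `INDEX`, `FIELD_WEIGHTS`, and `score` are hypothetical, and the postings are copied by hand from the index on this slide.

```python
from collections import defaultdict

# Postings copied from the index above: term -> list of (doc_id, field).
INDEX = {
    "arthur": [(1, "author"), (2, "title")],
    "book": [(1, "title"), (2, "title")],
}
# Field weights from the slide.
FIELD_WEIGHTS = {"author": 10, "title": 1}

def score(query, index=INDEX, weights=FIELD_WEIGHTS):
    """Sum the field weight of every posting that matches a query term."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, field in index.get(term, []):
            scores[doc_id] += weights[field]
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = score("arthur book")
print(ranking)  # [(1, 11), (2, 2)]
```

Document #1 wins because its match for "arthur" is in the heavily weighted author field, while both of #2's matches are in the title.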

Notes:

Audience question


Field weights

  • Determining good weights by hand is hard
  • Use an annotated corpus and machine learning to learn them
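
To make the idea concrete, here is a deliberately simple sketch that picks weights from relevance judgments. It uses a brute-force grid search in place of a real learning-to-rank method, and the single judgment in `JUDGMENTS` is invented purely for illustration; `doc_score`, `pairwise_accuracy`, and the candidate values are likewise hypothetical.

```python
from itertools import product

# Postings from the earlier slides: term -> list of (doc_id, field).
INDEX = {
    "arthur": [(1, "author"), (2, "title")],
    "book": [(1, "title"), (2, "title")],
}

# Hypothetical judgments: (query, preferred doc, less relevant doc).
JUDGMENTS = [("arthur book", 1, 2)]

def doc_score(query, doc_id, weights):
    """Sum of field weights over all postings of doc_id matching the query."""
    return sum(weights[field]
               for term in query.split()
               for d, field in INDEX.get(term, [])
               if d == doc_id)

def pairwise_accuracy(weights):
    """Fraction of judged pairs that the weights order correctly."""
    correct = sum(doc_score(q, good, weights) > doc_score(q, bad, weights)
                  for q, good, bad in JUDGMENTS)
    return correct / len(JUDGMENTS)

# Try every combination of a few candidate weights and keep the best one.
candidates = [1, 2, 5, 10]
best = max(
    ({"author": a, "title": t} for a, t in product(candidates, repeat=2)),
    key=pairwise_accuracy,
)
print(best, pairwise_accuracy(best))
```

With realistic data there would be many judged queries, and the grid search would be replaced by a proper learning algorithm, but the objective is the same: choose the weights under which the annotated preferences are ranked correctly.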

Notes:

What else can be done with field information? → Field queries!

Field queries

Term Doc IDs
arthur #1:Author, #2:Title
shakesbeer #2:Author
...

  • Query: title:arthur?
    • #2
  • Query: author:shakesbeer?
    • #2
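
A field query of this form can be sketched as a filter over the postings: look up the term, then keep only the postings whose field matches the one named in the query. The `field:term` parsing and the name `field_query` are assumptions for illustration, not the syntax of any particular search engine.

```python
# Postings copied from the index above: term -> list of (doc_id, field).
INDEX = {
    "arthur": [(1, "author"), (2, "title")],
    "shakesbeer": [(2, "author")],
}

def field_query(query, index=INDEX):
    """Evaluate a single 'field:term' query and return matching doc IDs."""
    field, _, term = query.partition(":")
    field, term = field.strip().lower(), term.strip().lower()
    # Keep only postings where the term occurs in the requested field.
    return sorted({doc_id for doc_id, f in index.get(term, []) if f == field})

print(field_query("title:arthur"))       # [2]
print(field_query("author:shakesbeer"))  # [2]
```

Note that `title:arthur` excludes document #1 even though it contains "arthur", because that occurrence is in the author field.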

Notes:

Audience question