Use document structure for ranking

Notes:

How can we exploit the document structure to improve ranking? Think of a typical Wikipedia article.

Document structure

Notes: How can we exploit this information for ranking purposes?

Field weights

Notes:

How can we determine the field weights?

Index with Fields

Doc	Author	Title
#1	Arthur McAuthor	A book providing information about information retrieval
#2	Shakesbeer	A book about the search for King Arthur

Term	Doc IDs
arthur	#1:Author, #2:Title
book	#1:Title, #2:Title
information	#1:Title
mcauthor	#1:Author
shakesbeer	#2:Author
...
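
A minimal sketch of how such a field-aware index could be built from the two example documents. The tokenization (lowercase, split on whitespace, no stop-word removal) and the function name `build_field_index` are assumptions made for illustration, not something the slides prescribe.

```python
from collections import defaultdict

# The two example documents from the table above, one string per field.
docs = {
    1: {"Author": "Arthur McAuthor",
        "Title": "A book providing information about information retrieval"},
    2: {"Author": "Shakesbeer",
        "Title": "A book about the search for King Arthur"},
}

def build_field_index(docs):
    """Map each term to a sorted list of (doc_id, field) postings."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            # Simplistic tokenization (assumption): lowercase, split on whitespace.
            for term in text.lower().split():
                index[term].add((doc_id, field))
    return {term: sorted(postings) for term, postings in index.items()}

index = build_field_index(docs)
print(index["arthur"])  # [(1, 'Author'), (2, 'Title')]
```

Each posting records the field in which the term occurred, so the ranking step can later treat an author match differently from a title match.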

Notes:

Audience question

Field weights

Term	Doc IDs
arthur	#1:Author, #2:Title
book	#1:Title, #2:Title
...

$$\begin{aligned} \text{weight}(\text{author}) & = 10 \\ \text{weight}(\text{title}) & = 1 \end{aligned}$$

arthur book?

#1 → author + title = 10 + 1 = 11
#2 → title + title = 1 + 1 = 2
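
The arithmetic above can be sketched in a few lines: every occurrence of a query term adds the weight of the field it was found in. The function name `score` and the dictionary layout are assumptions for illustration; the postings and weights are the ones from this slide.

```python
from collections import defaultdict

# Postings copied from the table above: term -> [(doc_id, field), ...].
index = {
    "arthur": [(1, "Author"), (2, "Title")],
    "book": [(1, "Title"), (2, "Title")],
}

# Field weights from the slide.
weights = {"Author": 10, "Title": 1}

def score(query, index, weights):
    """Sum the weight of every field in which a query term occurs."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, field in index.get(term, []):
            scores[doc_id] += weights[field]
    return dict(scores)

print(score("arthur book", index, weights))  # {1: 11, 2: 2}
```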

Notes:

Audience question

Field weights

Determining good weights by hand is hard
Use an annotated corpus and machine learning to learn them
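
As a rough illustration of the idea, the sketch below picks the Author weight by plain grid search, a deliberately simple stand-in for a real learning-to-rank method. The relevance judgments are hypothetical toy data invented for this example, and the names `top_doc` and `fit_author_weight` are likewise assumptions.

```python
# Postings from the example index: term -> [(doc_id, field), ...].
index = {
    "arthur": [(1, "Author"), (2, "Title")],
    "book": [(1, "Title"), (2, "Title")],
    "shakesbeer": [(2, "Author")],
}

# HYPOTHETICAL relevance judgments (query, relevant doc id); not from the slides.
judgments = [("arthur book", 1), ("shakesbeer book", 2)]

def top_doc(query, weights):
    """Return the highest-scoring doc id, or None on no match or a tie."""
    scores = {}
    for term in query.split():
        for doc_id, field in index.get(term, []):
            scores[doc_id] = scores.get(doc_id, 0) + weights[field]
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]

def fit_author_weight(candidates):
    """Grid search: keep the Author weight that ranks most judged docs first."""
    def hits(w):
        weights = {"Author": w, "Title": 1}  # Title weight fixed at 1
        return sum(top_doc(q, weights) == rel for q, rel in judgments)
    return max(candidates, key=hits)

best = fit_author_weight([0, 1, 10])
print(best)  # 10
```

With a realistic corpus one would replace the grid search with a proper learning algorithm and evaluate on held-out judgments; the point here is only that the weights are chosen to agree with annotated data.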

Notes:

What else can be done with field info? -> Field queries!

Field queries

Term	Doc IDs
arthur	#1:Author, #2:Title
shakesbeer	#2:Author
...

title:arthur?
- #2
author:shakesbeer?
- #2
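
A field query can be answered by filtering a term's postings on the requested field. This is a minimal sketch that handles a single `field:term` query; the function name `field_query` and the case-insensitive field matching are assumptions.

```python
# Postings copied from the table above: term -> [(doc_id, field), ...].
index = {
    "arthur": [(1, "Author"), (2, "Title")],
    "shakesbeer": [(2, "Author")],
}

def field_query(query, index):
    """Answer a single 'field:term' query with the matching doc ids."""
    field, term = query.split(":", 1)
    return [doc_id for doc_id, f in index.get(term.lower(), [])
            if f.lower() == field.lower()]

print(field_query("title:arthur", index))       # [2]
print(field_query("author:shakesbeer", index))  # [2]
```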

Notes:

Audience question
