Notes:
How can we exploit the document structure to improve ranking? Think of a typical Wikipedia article.
Notes:
How can we determine the field weights?
Doc | Author | Title |
---|---|---|
#1 | Arthur McAuthor | A book providing information about information retrieval |
#2 | Shakesbeer | A book about the search for King Arthur |
Term | Doc IDs |
---|---|
arthur | #1:Author, #2:Title |
book | #1:Title, #2:Title |
information | #1:Title |
mcauthor | #1:Author |
shakesbeer | #2:Author |
... |
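
The index above can be rebuilt from the two example documents. This is a minimal sketch, assuming whitespace tokenization and lowercasing; the names `DOCS` and `build_field_index` are made up for illustration.

```python
from collections import defaultdict

# The two example documents from the table above.
DOCS = {
    1: {"Author": "Arthur McAuthor",
        "Title": "A book providing information about information retrieval"},
    2: {"Author": "Shakesbeer",
        "Title": "A book about the search for King Arthur"},
}

def build_field_index(docs):
    """Map each lowercased term to a set of (doc_id, field) postings."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            # Assumption: split on whitespace only, no stemming or stop words.
            for term in text.lower().split():
                index[term].add((doc_id, field))
    return index

index = build_field_index(DOCS)
print(sorted(index["arthur"]))  # [(1, 'Author'), (2, 'Title')]
print(sorted(index["book"]))    # [(1, 'Title'), (2, 'Title')]
```

Storing the field next to the document ID in each posting is what later makes field weighting and field-restricted queries possible.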
Notes:
Audience question
Term | Doc IDs |
---|---|
arthur | #1:Author, #2:Title |
book | #1:Title, #2:Title |
... |
Query: arthur book (field weights: author = 10, title = 1)
- #1 → arthur in author + book in title = 10 + 1 = 11
- #2 → arthur in title + book in title = 1 + 1 = 2
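
The arithmetic above can be reproduced with a few lines of code. This is a sketch, assuming the weights implied by the sums (author = 10, title = 1) and a score that adds one field weight per matching posting; `INDEX`, `FIELD_WEIGHTS`, and `score` are illustrative names.

```python
# Postings copied from the index excerpt above: term -> [(doc_id, field), ...]
INDEX = {
    "arthur": [(1, "Author"), (2, "Title")],
    "book": [(1, "Title"), (2, "Title")],
}

# Assumption: weights read off the slide's arithmetic (10 + 1 and 1 + 1).
FIELD_WEIGHTS = {"Author": 10, "Title": 1}

def score(query, index, weights):
    """Sum the field weight of every posting matched by a query term."""
    scores = {}
    for term in query.lower().split():
        for doc_id, field in index.get(term, []):
            scores[doc_id] = scores.get(doc_id, 0) + weights[field]
    return scores

scores = score("arthur book", INDEX, FIELD_WEIGHTS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {1: 11, 2: 2}
print(ranking)  # [1, 2]
```

With these weights, document #1 ranks first because its single author match outweighs both title matches in #2.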
Notes:
Audience question
- Determining good field weights by hand is hard
- Learn them from an annotated corpus (relevance judgments) using machine learning
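
One possible way to learn weights from annotations is a pairwise perceptron. This is only a sketch of the general idea, not necessarily the method meant here. The training pairs below are invented for illustration, and each document is reduced to two assumed features: the number of query terms matched in the author field and in the title field.

```python
# Hypothetical training data (made up for illustration, not from the lecture).
# Each document is (author_matches, title_matches) for some query; each pair
# says the first document was judged more relevant than the second.
PAIRS = [
    ((1, 0), (0, 1)),
    ((1, 1), (0, 2)),
    ((1, 0), (0, 2)),
    ((0, 1), (0, 0)),
]

def score(weights, features):
    """Weighted sum of per-field match counts."""
    return sum(w * x for w, x in zip(weights, features))

def learn_field_weights(pairs, epochs=100, lr=1.0):
    """Pairwise perceptron: adjust weights until every preferred
    document scores strictly higher than its counterpart."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        updates = 0
        for better, worse in pairs:
            diff = [b - w for b, w in zip(better, worse)]
            if score(weights, diff) <= 0:  # pair is ranked wrongly or tied
                weights = [w + lr * d for w, d in zip(weights, diff)]
                updates += 1
        if updates == 0:  # all pairs ordered correctly
            break
    return weights

author_w, title_w = learn_field_weights(PAIRS)
print(author_w, title_w)
```

On this toy data the learned author weight ends up larger than the title weight, because the invented judgments consistently prefer author matches.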
Term | Doc IDs |
---|---|
arthur | #1:Author, #2:Title |
shakesbeer | #2:Author |
... |
- title:arthur → #2
- author:shakesbeer → #2
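
A field-restricted lookup over the same postings could look as follows. This is a sketch, assuming a single `field:term` query and case-insensitive matching; `field_search` is a hypothetical helper name.

```python
# Postings copied from the index excerpt above: term -> [(doc_id, field), ...]
INDEX = {
    "arthur": [(1, "Author"), (2, "Title")],
    "shakesbeer": [(2, "Author")],
}

def field_search(query, index):
    """Evaluate one 'field:term' query and return the matching doc IDs."""
    field, _, term = query.partition(":")
    return sorted(
        doc_id
        for doc_id, posting_field in index.get(term.lower(), [])
        if posting_field.lower() == field.lower()
    )

print(field_search("title:arthur", INDEX))       # [2]
print(field_search("author:shakesbeer", INDEX))  # [2]
```

Note that `title:arthur` skips document #1, where "arthur" appears only in the author field.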
Notes:
Audience question