
Implementing Adaptive SIFT #92

Open

chris-aeviator opened this issue Nov 23, 2024 · 5 comments

@chris-aeviator commented Nov 23, 2024

Congrats to the release of the paper and library.

Could you point out which of the return values of activeft hints at an approximation of the model's uncertainty?

What's also not clear to me is how "SIFT estimates the uncertainty about the response to a given prompt after having been fine-tuned on some data (§3)". The emphasis is on after the model has been fine-tuned, since this library is used to determine the dataset to use before fine-tuning.

chris-aeviator changed the title from "Adaptive SIFT" to "Implementing Adaptive SIFT" Nov 23, 2024
@jonhue (Owner) commented Nov 23, 2024

Hi Chris!

Until now, I had not added adaptive SIFT to this library; it is a relatively straightforward post-hoc processing of the returned outputs based on their "values". But I understand that in its current form the meaning of the returned "values" is a bit cryptic: the estimated posterior uncertainty is sqrt(-values).

I have now added direct support for adaptive SIFT in #93. You can simply pass your alpha parameter when instantiating the sift.Retriever object.
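For illustration, here is a minimal sketch of this kind of post-hoc processing (the function name and the exact stopping rule below are illustrative assumptions, not the library's actual API):

```python
import numpy as np

def adaptive_cutoff(indices, values, alpha):
    """Illustrative post-hoc processing of a retriever's output.

    The estimated posterior uncertainty after fine-tuning on the first
    n retrieved points is sqrt(-values[n-1]); here we keep retrieving
    until that uncertainty falls below the threshold alpha (an assumed
    stopping rule, for the sake of illustration).
    """
    uncertainty = np.sqrt(-np.asarray(values, dtype=float))
    for n, sigma in enumerate(uncertainty, start=1):
        if sigma <= alpha:
            return indices[:n]
    return indices  # threshold never reached: keep all retrieved points
```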

Regarding how the uncertainty estimation works: the key point is that we use the "surrogate model" to estimate the uncertainty after fine-tuning. This way, we can estimate this uncertainty without actually having to fine-tune the model.
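As a rough sketch of the principle (assuming a linear surrogate over embeddings with a dot-product kernel; this is an illustration, not the library's implementation), the posterior uncertainty at a prompt after fine-tuning on some data has a closed form that requires no fine-tuning at all:

```python
import numpy as np

def surrogate_uncertainty(prompt_emb, data_embs, lam=1.0):
    """Estimated posterior std at the prompt after (hypothetically)
    fine-tuning on data_embs, under a linear surrogate model.

    Standard closed form for kernel/linear regression:
        sigma^2(x) = k(x, x) - k(x, X) (K + lam * I)^{-1} k(X, x)
    with the dot-product kernel k(a, b) = a @ b (an assumption here).
    """
    K = data_embs @ data_embs.T       # kernel matrix of the selected data
    k_xX = data_embs @ prompt_emb     # covariances between prompt and data
    prior = prompt_emb @ prompt_emb   # prior variance at the prompt
    reduction = k_xX @ np.linalg.solve(K + lam * np.eye(len(data_embs)), k_xX)
    return float(np.sqrt(max(prior - reduction, 0.0)))
```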

Hope this helps
Jonas

jonhue closed this as completed Nov 23, 2024
@chris-aeviator (Author) commented

Hey, thanks for the fast implementation. I tried the alpha values from the paper (0.15, 0.55, 2.0), but whenever I set alpha I don't get any results from the retriever, in contrast to when I set it to None. I'm working with Phi-3, and the embeddings are also generated from Phi-3.

from activeft.sift import Retriever  # assuming the activeft package layout

retriever = Retriever(index, also_query_opposite=False, alpha=0.15)

(P.S. Sorry for the double post; I posted from the wrong account at first.)

@jonhue (Owner) commented Nov 27, 2024

Hi Chris, very sorry about this.

For the adaptive results in the paper, we ran the experiments post-hoc. I must have made a mistake in the PR where I ported the code into this library. I'll try to get this fixed in the next few weeks!

jonhue reopened this Nov 27, 2024
@jonhue (Owner) commented Nov 27, 2024

One thing that might be happening, since you mentioned that you use embeddings from Phi-3, is that they are on a different scale. Are your embeddings normalized? If not, you might have to set very different (much smaller) alphas.
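If that's the case, a quick check is to L2-normalize the embeddings before building the index (a sketch, assuming embeddings is an (n, d) numpy array):

```python
import numpy as np

# L2-normalize each embedding row so that dot products (and hence the
# resulting uncertainties) are on a comparable scale across models.
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
```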

@chris-aeviator (Author) commented

OK, this is helpful. I just realized you've been using different embedding models. Any chance you can share the experiments/code around GPT-2 & Phi that are part of the paper? Is your codebase based on https://github.com/socialfoundations/tttlm?
