Problem in candidate-based generation on GENRE using transformers >= 4.36.0 #103

Open
y0uCeF opened this issue Jan 20, 2024 · 1 comment

Comments


y0uCeF commented Jan 20, 2024

Since version v4.36.0 of Hugging Face transformers, `prefix_allowed_tokens_fn` is no longer allowed to return an empty set of tokens (huggingface/transformers#27797).
When doing non-free, candidate-based generation on GENRE, the lambda assigned to `prefix_allowed_tokens_fn` may indeed return an empty list, raising a ValueError with the following message:

f"`prefix_allowed_tokens_fn` returned an empty list for batch ID {batch_id}."
"This means that the constraint is unsatisfiable. Please check your implementation"
f"of `prefix_allowed_tokens_fn` "

I have not reproduced the case for free generation, and the code I use for per-mention Trie creation no longer appears in examples.ipynb:

trie = Trie([
    [2] + model.encode(e).tolist()[1:]  # 2 = decoder start token id for BART
    for e in doc["candidates"]
]) if doc["candidates"] else Trie([[2] + model.encode("NIL").tolist()[1:]])

Anyway, it is recommended to pin transformers to a version < v4.36.0 (such as v4.35.2) if one runs into this error.
This constraint could be added to requirements.txt.
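
For instance, the pin in requirements.txt could read (version bound as discussed above):

transformers<4.36.0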

@Klaifer

Klaifer commented Jul 19, 2024

I had this same problem with the GENRE huggingface implementation, and I developed a workaround.

I'm posting it here because I think it might help someone, or someone might adapt it to work in this implementation.

The implementation I'm referring to is: https://huggingface.co/facebook/mgenre-wiki

The model card includes the following instruction (tokenizer, model, and sentences are defined earlier on that page):

outputs = model.generate(
    **tokenizer(sentences, return_tensors="pt"),
    num_beams=5,
    num_return_sequences=5,
    # OPTIONAL: use constrained beam search
    prefix_allowed_tokens_fn=lambda batch_id, sent: trie.get(sent.tolist()),
)

The workaround is to replace it with:

outputs = model.generate(
    **tokenizer(sentences, return_tensors="pt"),
    num_beams=5,
    num_return_sequences=5,
    # OPTIONAL: use constrained beam search
    # fall back to EOS when the trie has no continuation left, so the beam
    # can terminate instead of raising the empty-list ValueError
    prefix_allowed_tokens_fn=lambda batch_id, sent: trie.get(sent.tolist()) or [tokenizer.eos_token_id],
)
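
This works because an empty list is falsy in Python, so the `or` fallback kicks in exactly when the trie has no allowed continuation, i.e. when a complete candidate has been generated. Forcing EOS at that point lets the beam terminate cleanly instead of triggering the ValueError.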
