Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

model = HfApiModel(model_id=model_id, timeout=300) - Timeout parameter seems ineffective #61

Open
joaopauloschuler opened this issue Jan 4, 2025 · 1 comment

Comments

@joaopauloschuler
Copy link
Contributor

Hello,
First of all, congrats for your work.

model = HfApiModel(model_id=model_id, timeout=300)

When creating a new model, the timeout setting seems ineffective. When running the agent, I frequently get:

HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://api-inference.huggingface.co/models/Qwen/Qwen2.5-72B-Instruct/v1/chat/completions (Request ID: 21DsMRrush2AMn6er2XMD)

Model too busy, unable to get response in less than 120 second(s)```
@aymeric-roucher
Copy link
Collaborator

Since the class HfApiModel uses self.client = InferenceClient(self.model_id, token=token, timeout=timeout) under the hood, maybe this could be due to huggingface_hub.InferenceClient not respecting timeouts. Did you try calling directly the InferenceClient with the same Qwen model and a long message list?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants