[Bug]: Output from tool calls are not very observable #9770
Comments
🤖 It's great that you've identified the issue with the `function_call_response` string not adhering to the OpenInference Tracing conventions. If you have a proposed solution or fix in mind, I would encourage you to go ahead and create a pull request with your changes. Your contribution will be valuable in improving LlamaIndex and ensuring it aligns with the OpenInference Tracing conventions. Thank you for your dedication to improving LlamaIndex!

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
@tslmy do you have any idea on what the fix here would be? Looking at Arize's spec, I actually have no idea 😆 We should be populating a …
@logan-markewich, I opened an issue in the Open Inference Specifications repo (linked above), where I asked the maintainers there to come up with a conventional way to specify tool outputs. Let's see what they say.
Hi, @tslmy, I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale.

From what I understand, the issue highlighted a bug where the output from function calls is not observable in Arize Phoenix and ChainLit, due to a non-standard string used to identify responses from generic function calls with callback handlers. The resolution involved encouraging the author to create a pull request with their proposed changes, and there is ongoing discussion about the need for a conventional way to specify tool outputs, with an issue opened in the Open Inference Specifications repo for further discussion.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
@dosubot shall we wait till Arize-ai/open-inference-spec#37 is closed before closing this?
Hello, I am also trying to get a non-OpenAI model working. Is OpenAILike the only way to do this, like you did? Or can I just use HuggingFaceLLM?
Based on the context provided, `OpenAILike` targets models served behind an OpenAI-compatible API. On the other hand, `HuggingFaceLLM` runs a Hugging Face model directly. So, depending on the specific requirements of your project and the ecosystem you are working with, you can choose either.
@teis-e `OpenAILike` is for models hosted on a server that exposes the same API as OpenAI. `HuggingFaceLLM` is specifically for running a model locally (no server, just in the same process/program as LlamaIndex).
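For illustration, here is a minimal sketch of that distinction, assuming llama-index 0.9.x import paths; the server URL and model names are placeholders, not anything from this thread:

```python
from llama_index.llms import HuggingFaceLLM, OpenAILike

# Model hosted elsewhere behind an OpenAI-style API (vLLM, llama.cpp server, etc.).
remote_llm = OpenAILike(
    model="HuggingFaceH4/zephyr-7b-beta",  # whatever name the server exposes
    api_base="http://localhost:8000/v1",   # placeholder URL for the local server
    api_key="not-needed",
    is_chat_model=True,
)

# Model loaded locally, in the same Python process as LlamaIndex
# (requires transformers and torch to be installed).
local_llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
)
```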
Right! Thanks, but OpenAILike gives the opportunity to run agents and function calling etc. without OpenAI. So just a local model, like the Zephyr from above?
Using `OpenAILike` does not magically give access to function calling (I wish, haha). The server running the API has to actually implement handling of the `tools` kwarg when running the model.
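To make that concrete, here is a rough client-side sketch of what "handling the `tools` kwarg" means, assuming an OpenAI-compatible server at a placeholder URL and a hypothetical `search_wikipedia` tool; the request only does anything useful if the server actually implements tool calling:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "search_wikipedia",  # hypothetical tool, for illustration only
        "description": "Search Wikipedia for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="local-model",  # placeholder model name
    messages=[{"role": "user", "content": "What is a cheesecake?"}],
    tools=tools,  # the server must recognize this field for tool calling to work
)
print(response.choices[0].message)
```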
Hi @teis-e, do you have any update on this issue? I'm experiencing the same problem while attempting to track token usage from LLM calls. I'm using Phoenix, and I'm finding it difficult to determine the precise number of input, output, and total tokens used for each LLM call. For context, I'm working with Bedrock Converse. Any insights or progress on this matter would be greatly appreciated, mate.
I think you can tokenize the messages and then the len() of that is the token amount.
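As a rough sketch of that suggestion: tokenize the message text yourself and take the length. This assumes tiktoken's `cl100k_base` encoding, which is only an approximation for models like Bedrock Converse that use their own tokenizers:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """Approximate token count for a piece of message text."""
    return len(encoding.encode(text))

print(count_tokens("Search Wikipedia and summarize what is a cheesecake"))
```

Depending on your setup, LlamaIndex's `TokenCountingHandler` callback may also be worth a look for tracking per-call input/output token counts.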
Bug Description
What seems wrong
I can't see responses from function calls in Arize Phoenix, nor in ChainLit.
What I was expecting
I usually can see results from other types of spans, such as `embedding`, in Arize Phoenix, and results from Query Engine tools in ChainLit, so I was expecting the same level of observability for `FunctionTool`s.

My guess
I think this is because `"function_call_response"`, the string LlamaIndex uses to identify responses from generic function calls with callback handlers, is non-standard with respect to the OpenInference Tracing conventions.

The `"function_call_response"` string is defined here:
llama_index/llama_index/callbacks/schema.py, line 61 (at commit 3b52215)
It is sent to callback managers here:
llama_index/llama_index/agent/react/step.py, line 230 (at commit 3b52215)
And the conventions are here. Indeed, Arize Phoenix doesn't know `"function_call_response"`, as shown here:
https://github.com/Arize-ai/phoenix/blob/363e8913a3a0f7dafca5dc6bba6bf0e9776c1158/app/src/openInference/tracing/semanticConventions.ts#L51-L58
Version
0.9.23
Steps to Reproduce
Save the script above as `main.py` and run it in a terminal. Ask the agent "Search Wikipedia and summarize what is a cheesecake", and observe.
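The original script is not reproduced in this report; as a hedged sketch only, a comparable setup on llama-index 0.9.x, with a stand-in `search_wikipedia` function instead of the author's actual tool and Phoenix tracing enabled via the global handler, might look like:

```python
import phoenix as px
from llama_index import set_global_handler
from llama_index.agent import ReActAgent
from llama_index.llms import OpenAI
from llama_index.tools import FunctionTool

px.launch_app()                       # start the local Phoenix UI
set_global_handler("arize_phoenix")   # send LlamaIndex traces to Phoenix

def search_wikipedia(query: str) -> str:
    """Hypothetical stand-in for a real Wikipedia search tool."""
    return f"(summary of Wikipedia results for {query!r})"

tool = FunctionTool.from_defaults(fn=search_wikipedia)
agent = ReActAgent.from_tools([tool], llm=OpenAI(model="gpt-3.5-turbo"), verbose=True)
print(agent.chat("Search Wikipedia and summarize what is a cheesecake"))
```

Per the description above, the tool's output would then be emitted under the non-standard `function_call_response` payload key, which is what Phoenix fails to display.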
Relevant Logs/Tracebacks
No response