Replies: 2 comments
-
This could be done with a streaming output mode that returns one JSONL line per output token. That would allow including detailed info about each token, and it would make things much easier for front-ends that want to work with the output stream in more detail. For example: { "id": 413, "string": "turn", "top_picks": [{ (token id, string and probability for each) }] } This would allow a bunch of things:
I think it'd be really interesting.
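To make the proposal concrete, here is a minimal Python sketch of what a one-record-per-token JSONL stream could look like on both the producing and consuming side. The "id", "string" and "top_picks" keys come from the example above; the "prob" key, the token ids other than 413, and all the values are made up for illustration, and none of this is an existing koboldcpp API.

```python
import json
from typing import Dict, Iterable, Iterator

def encode_token_stream(steps: Iterable[Dict]) -> Iterator[str]:
    """Serialize each generated token as one JSON document per line (JSONL)."""
    for step in steps:
        # json.dumps escapes newlines inside strings, so each record stays on one line.
        yield json.dumps(step, ensure_ascii=False)

def decode_token_stream(lines: Iterable[str]) -> Iterator[Dict]:
    """Parse a JSONL stream back into per-token records, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Hypothetical data: two generated tokens with their top candidates.
steps = [
    {"id": 413, "string": "turn", "top_picks": [
        {"id": 413, "string": "turn", "prob": 0.62},
        {"id": 2121, "string": "move", "prob": 0.21},
    ]},
    {"id": 987, "string": " left", "top_picks": [
        {"id": 987, "string": " left", "prob": 0.55},
        {"id": 988, "string": " right", "prob": 0.40},
    ]},
]

lines = list(encode_token_stream(steps))
records = list(decode_token_stream(lines))
text = "".join(record["string"] for record in records)
```

A front-end that only wants plain text can join the "string" fields as in the last line, while a richer UI can read "top_picks" to show the alternatives for each position.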
-
What you're asking for is basically returning token logprobs, which is a desired feature, but I have not had the time to prioritize it. For reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-logprobs It's on the low-priority backlog.
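As a rough illustration of how the two ideas relate, the sketch below maps one token record of the kind proposed earlier in the thread onto an entry shaped like the logprobs items in the linked OpenAI reference ("token", "logprob", "top_logprobs"). The "prob" key and the sample values are assumptions, the "bytes" field from that reference is left out, and the helper name is hypothetical.

```python
import math
from typing import Dict

def to_logprob_entry(record: Dict) -> Dict:
    """Convert a per-token record with probabilities into a logprobs-style entry.

    Assumes the chosen token also appears in record["top_picks"].
    """
    chosen = next(p for p in record["top_picks"] if p["id"] == record["id"])
    return {
        "token": record["string"],
        "logprob": math.log(chosen["prob"]),  # natural log of the probability
        "top_logprobs": [
            {"token": pick["string"], "logprob": math.log(pick["prob"])}
            for pick in record["top_picks"]
        ],
    }

# Hypothetical record; the probabilities are made up.
record = {"id": 413, "string": "turn", "top_picks": [
    {"id": 413, "string": "turn", "prob": 0.5},
    {"id": 2121, "string": "move", "prob": 0.25},
]}

entry = to_logprob_entry(record)
```

Going the other way is just `math.exp(logprob)`, so a front-end could display percentages from either representation.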
-
This may be too "blue sky", but would it be possible to return all the discarded tokens from a generation in some form of multidimensional array via the API?
The aim would be to allow the user to view and possibly regenerate using a discarded token.
An example of this is from NovelAI:
https://github.com/LostRuins/koboldcpp/assets/84193813/884cbe47-e5ec-4373-bc50-59bb54d4f39c
My naive idea would be to store all the tokens generated (candidates->data ??) when running sample_tokens and then expose them through a new "generate_fulltokens" endpoint.
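A toy Python sketch of that idea, under heavy assumptions: it keeps every step's candidate list in a two-dimensional structure (history[step][rank]) and lets the caller swap in a discarded candidate and continue from there. The lookup-table "model", the greedy selection, and every name here are hypothetical stand-ins; this does not touch koboldcpp's actual sampling code.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[int, str, float]  # (token id, string, probability)
CandidateFn = Callable[[List[Candidate]], List[Candidate]]

def generate(next_candidates: CandidateFn, prefix: List[Candidate], max_new: int):
    """Greedy generation that also records every step's full candidate list."""
    chosen = list(prefix)
    history: List[List[Candidate]] = []  # history[step][rank]
    for _ in range(max_new):
        cands = sorted(next_candidates(chosen), key=lambda c: c[2], reverse=True)
        if not cands:
            break
        history.append(cands)
        chosen.append(cands[0])  # keep the most probable candidate
    return chosen, history

def regenerate_from(next_candidates: CandidateFn, chosen, history, step, rank, max_tokens):
    """Replace the token at `step` with a discarded candidate and continue."""
    new_prefix = chosen[:step] + [history[step][rank]]
    new_chosen, tail = generate(next_candidates, new_prefix, max_tokens - len(new_prefix))
    return new_chosen, history[:step + 1] + tail

# Toy stand-in for a model: candidates depend only on the previous token's string.
TOY = {
    None: [(1, "The", 0.7), (2, "A", 0.3)],
    "The": [(3, " cat", 0.6), (4, " dog", 0.4)],
    "A": [(4, " dog", 0.9), (3, " cat", 0.1)],
    " cat": [(5, " sat", 1.0)],
    " dog": [(6, " ran", 1.0)],
}

def toy_candidates(chosen: List[Candidate]) -> List[Candidate]:
    last = chosen[-1][1] if chosen else None
    return TOY.get(last, [])

chosen, history = generate(toy_candidates, [], 3)
first_text = "".join(c[1] for c in chosen)

# Pick the discarded " dog" at step 1 and regenerate the rest.
alt_chosen, alt_history = regenerate_from(toy_candidates, chosen, history, step=1, rank=1, max_tokens=3)
alt_text = "".join(c[1] for c in alt_chosen)
```

An endpoint like the proposed "generate_fulltokens" could return something like `history` directly; with a real vocabulary each step would need to be truncated to the top few candidates to keep the response small.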