Local Models integration #27

Open
TheWhiteWord opened this issue Sep 8, 2023 · 58 comments
Labels: good first issue (Good for newcomers)

Comments

@TheWhiteWord

Hey Devs,
let me start by saying that this programme is great. Well done on your work, and thanks for sharing it.

My question is: is there any plan to allow for the integration of local models?
Even just a section in the documentation would be great.

Have a good day
theWW

@andraz

andraz commented Sep 8, 2023

+1 to this question

It makes no sense to shovel money into a closed-source service when we have powerful GPUs that can run a 13B Llama model without problems using some of the other open-source projects.

@thedualspace

I'd also be very eager to use local models with ChatDev. Llama-based models show great promise.

@j-loquat

j-loquat commented Sep 8, 2023

Local model use, and perhaps as a more advanced feature, assigning different models to different agents in the company, so we could use a local Python-optimized model for an engineer, a Llama 2 model for the CEO, etc.

@TheWhiteWord
Author

@j-loquat I love that idea. It's something I have been considering more and more: AI becoming more and more like the Greek gods, each with its own character and function, complementing each other. It was the original vision of Altman too, kind of, but they lost their way.

@andraz

andraz commented Sep 8, 2023

No need to have one "god AGI" (which cannot be run locally, as it demands crazy hardware) if we can have 20 agents with 20 different local narrow AI models that can be loaded one after another.

@TheWhiteWord
Author

TheWhiteWord commented Sep 8, 2023

Oh god, sorry Devs but this conversation is too interesting. You may need to turn notifications off XD

I was trained as an artist, and the first thing to know is that limitations are the generator of creativity. A big AI with all the knowledge of the world may just become the most boring thing to touch the planet. And this may be controversial, but I think that bad qualities are needed too. Everything has its meaning and use in order to create balance. Just my opinion.

@hemangjoshi37a

This has been referenced in #33

@starkdmi

  1. Install LocalAI - OpenAI compatible server.
  2. Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.
  3. Start LocalAI server locally and run:
OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"
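
The inline `VAR=value command` form is POSIX shell syntax, so a small Python launcher is one shell-independent way to pass the same two variables. This is a minimal sketch, not part of ChatDev: `build_launch` is a hypothetical helper, and the URL, key, and task are simply the values from the command above.

```python
import os
import sys
from typing import Dict, List, Mapping, Tuple


def build_launch(base_url: str, api_key: str, task: str, name: str,
                 environ: Mapping[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return the environment and argv for one ChatDev run (hypothetical helper)."""
    env = dict(environ)                 # copy, so the caller's environment is untouched
    env["OPENAI_API_BASE"] = base_url   # variable names taken from the command above
    env["OPENAI_API_KEY"] = api_key     # local servers usually accept any placeholder
    argv = [sys.executable, "run.py", "--task", task, "--name", name]
    return env, argv


env, argv = build_launch("http://127.0.0.1:8000/v1", "dummy",
                         "Snake game in pure html", "WebSnake", os.environ)
print(argv[1:])
```

Passing the result to `subprocess.run(argv, env=env, check=True)` from inside the ChatDev checkout would start the run with those variables set, on Windows as well as macOS or Linux.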

@andraz

andraz commented Sep 13, 2023

The command above did not work in Anaconda Prompt, but this version did:

(chatdev_conda_env) C:\chatdev>set OPENAI_API_BASE=http://127.0.0.1:5001/v1

(chatdev_conda_env) C:\chatdev>set OPENAI_API_KEY=123456

(chatdev_conda_env) C:\chatdev>python run.py --task "Hello world in python" --name "HelloWorld"
**[Preprocessing]**

**ChatDev Starts** (20230913191808)

**Timestamp**: 20230913191808

**config_path**: C:\chatdev\CompanyConfig\Default\ChatChainConfig.json

**config_phase_path**: C:\chatdev\CompanyConfig\Default\PhaseConfig.json

**config_role_path**: C:\chatdev\CompanyConfig\Default\RoleConfig.json

**task_prompt**: Hello world in python

**project_name**: HelloWorld

**Log File**: C:\chatdev\WareHouse\HelloWorld_DefaultOrganization_20230913191808.log

**ChatDevConfig**:
 ChatEnvConfig.clear_structure: True
ChatEnvConfig.brainstorming: False


**ChatGPTConfig**:
 ChatGPTConfig(temperature=0.2, top_p=1.0, n=1, stream=False, stop=None, max_tokens=None, presence_penalty=0.0, frequency_penalty=0.0, logit_bias={}, user='')

I am having a problem using it with the local API: it looks like all the API returns is 1 token.

Text-generation-webui side:

llm_load_print_meta: model size     = 13.02 B
llm_load_print_meta: general.name   = openassistant_llama2-13b-orca-8k-3319
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.12 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  =  128.35 MB (+ 1600.00 MB per state)
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 43/43 layers to GPU
llm_load_tensors: VRAM used: 11656 MB
...................................................................................................
llama_new_context_with_model: kv self size  = 1600.00 MB
llama_new_context_with_model: compute buffer total size =  191.47 MB
llama_new_context_with_model: VRAM scratch buffer: 190.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
2023-09-13 19:11:27 INFO:Loaded the model in 7.52 seconds.

Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 498 tokens and max_tokens is 15937.

llama_print_timings:        load time =   955.37 ms
llama_print_timings:      sample time =     0.23 ms /     1 runs   (    0.23 ms per token,  4424.78 tokens per second)
llama_print_timings: prompt eval time =   955.31 ms /   498 tokens (    1.92 ms per token,   521.30 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =   957.88 ms
Output generated in 1.41 seconds (0.00 tokens/s, 0 tokens, context 498, seed 1828391196)
127.0.0.1 - - [13/Sep/2023 19:11:42] "POST /v1/chat/completions HTTP/1.1" 200 -
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 551 tokens and max_tokens is 15885.
Llama.generate: prefix-match hit

llama_print_timings:        load time =   955.37 ms
llama_print_timings:      sample time =     0.16 ms /     1 runs   (    0.16 ms per token,  6410.26 tokens per second)
llama_print_timings: prompt eval time =   835.88 ms /   489 tokens (    1.71 ms per token,   585.01 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =   836.73 ms
Output generated in 1.24 seconds (0.00 tokens/s, 0 tokens, context 551, seed 192786861)
127.0.0.1 - - [13/Sep/2023 19:11:46] "POST /v1/chat/completions HTTP/1.1" 200 -
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 521 tokens and max_tokens is 15907.
Llama.generate: prefix-match hit

llama_print_timings:        load time =   955.37 ms
llama_print_timings:      sample time =     0.13 ms /     1 runs   (    0.13 ms per token,  7633.59 tokens per second)
llama_print_timings: prompt eval time =   884.39 ms /   459 tokens (    1.93 ms per token,   519.00 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =   885.63 ms
Output generated in 1.26 seconds (0.00 tokens/s, 0 tokens, context 521, seed 1288396660)
127.0.0.1 - - [13/Sep/2023 19:11:53] "POST /v1/chat/completions HTTP/1.1" 200 -
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 574 tokens and max_tokens is 15854.
Llama.generate: prefix-match hit

ChatDev side



Chief Executive Officer: **Chief Product Officer<->Chief Executive Officer on : DemandAnalysis, turn 0**

[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Product Officer. we are both working at ChatDev. We share a common interest in collaborating to successfully complete a task assigned by a new customer.
You are responsible for all product-related matters in ChatDev. Usually includes product design, product strategy, product vision, product innovation, project management and product marketing.
Here is a new customer's task: Hello world in python.
To complete the task, you must write a response that appropriately solves the requested instruction based on your expertise and customer's needs.]



**[OpenAI_Usage_Info Receive]**
prompt_tokens: 521
completion_tokens: 1
total_tokens: 522


**[OpenAI_Usage_Info Receive]**
prompt_tokens: 574
completion_tokens: 1
total_tokens: 575


Chief Product Officer: **Chief Product Officer<->Chief Executive Officer on : DemandAnalysis, turn 1**

[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Executive Officer. Now, we are both working at ChatDev and we share a common interest in collaborating to successfully complete a task assigned by a new customer.
Your main responsibilities include being an active decision-maker on users' demands and other key policy issues, leader, manager, and executor. Your decision-making role involves high-level decisions about policy and strategy; and your communicator role can involve speaking to the organization's management and employees.
Here is a new customer's task: Hello world in python.
To complete the task, I will give you one or more instructions, and you must help me to write a specific solution that appropriately solves the requested instruction based on your expertise and my needs.]



Chief Executive Officer: **Chief Product Officer<->Chief Executive Officer on : DemandAnalysis, turn 1**

[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Product Officer. we are both working at ChatDev. We share a common interest in collaborating to successfully complete a task assigned by a new customer.
You are responsible for all product-related matters in ChatDev. Usually includes product design, product strategy, product vision, product innovation, project management and product marketing.
Here is a new customer's task: Hello world in python.
To complete the task, you must write a response that appropriately solves the requested instruction based on your expertise and customer's needs.]



**[OpenAI_Usage_Info Receive]**
prompt_tokens: 544
completion_tokens: 1
total_tokens: 545

@starkdmi

Yeah, the command above was for macOS; I had no trouble with the conda environment here.

@andraz, why don't you increase the context to 4K or 8K tokens? Based on your model name, it supports a context of up to 8K tokens.

Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 521 tokens and max_tokens is 15907.

As for the one-token response, I guess it's the streaming feature, so you don't need to wait for a full response.
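
The numbers in that warning can be worked through directly. The sketch below only illustrates the arithmetic: `completion_budget` is a hypothetical helper, and clamping the request is an assumption about what a well-behaved client could do, not a description of what ChatDev or text-generation-webui does.

```python
def completion_budget(context_length: int, prompt_tokens: int,
                      requested_max_tokens: int) -> int:
    """Largest completion that still fits in the model's context window."""
    room = max(context_length - prompt_tokens, 0)  # tokens left after the prompt
    return min(requested_max_tokens, room)         # never ask for more than fits


# Values from the warning: 2048-token context, 521 prompt tokens, 15907 requested.
print(completion_budget(2048, 521, 15907))  # 1527
# The same request against an 8K (8192-token) context.
print(completion_budget(8192, 521, 15907))  # 7671
```

With a 2048-token context, the requested 15907 completion tokens exceed the available room roughly tenfold, which is why raising the loaded context size is the first thing to try.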

@xkaraman

xkaraman commented Sep 20, 2023

Hello there,
I am trying to use the llama-2-7B model as described above.

I created a new YAML file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613. For the model itself, I downloaded one of the `llama-2*.bin` models from the Hugging Face model library.

I can run it successfully and receive answers to my questions in the returned object via curl, but the response also reports
"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

When I then try to run ChatDev on a simple task, i.e. python run.py --task "Hello world in python" --name "HelloWorld", ChatDev prints the startup prompts and then receives no objects from the local LLM, logging empty usage over and over:

...
Note that we must ONLY discuss the product modality and do not discuss anything else! Once we all have expressed our opinion(s) and agree with the results of the discussion unanimously, any of us must actively terminate the discussion by replying with only one line, which starts with a single word <INFO>, followed by our final product modality without any other words, e.g., "<INFO> PowerPoint".

**[OpenAI_Usage_Info Receive]**
prompt_tokens: 0
completion_tokens: 0
total_tokens: 0


**[OpenAI_Usage_Info Receive]**
prompt_tokens: 0
completion_tokens: 0
total_tokens: 0


**[OpenAI_Usage_Info Receive]**
prompt_tokens: 0
completion_tokens: 0
total_tokens: 0

After 3 retries it crashes with the following KeyError.

Traceback (most recent call last):
  File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/media/**/4TB_DATA/git/ChatDev/camel/utils.py", line 145, in wrapper
    return func(self, *args, **kwargs)
  File "/media/**/4TB_DATA/git/ChatDev/camel/agents/chat_agent.py", line 200, in step
    response["id"],
KeyError: 'id'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/media/**/4TB_DATA/git/ChatDev/run.py", line 111, in <module>
    chat_chain.execute_chain()
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/chat_chain.py", line 160, in execute_chain
    self.execute_step(phase_item)
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/chat_chain.py", line 130, in execute_step
    self.chat_env = self.phases[phase].execute(self.chat_env,
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/phase.py", line 292, in execute
    self.chatting(chat_env=chat_env,
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/utils.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/phase.py", line 131, in chatting
    assistant_response, user_response = role_play_session.step(input_user_msg, chat_turn_limit == 1)
  File "/media/**/4TB_DATA/git/ChatDev/camel/agents/role_playing.py", line 242, in step
    assistant_response = self.assistant_agent.step(user_msg_rst)
  File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7ff680181ac0 state=finished raised KeyError>]

I have already exported OPENAI_API_BASE (pointing to localhost) and OPENAI_API_KEY; without them it crashed.

What can I do to successfully use the local LLM?

Thanks for any help and sorry if this is the wrong place to ask it!
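
The traceback shows chat_agent.py reading `response["id"]`, so a reply without that key fails even when the server answers with HTTP 200. A minimal sketch for checking a local server's JSON reply before pointing ChatDev at it follows. `missing_fields` is a hypothetical helper, and the set of required keys is an assumption drawn from the traceback (`id`) and the usage logs above, not an exhaustive list of what ChatDev reads.

```python
from typing import Any, Iterable, List, Mapping

# Assumed minimum: "id" appears in the traceback, "usage" in the logs,
# and "choices" carries the generated message in OpenAI-style replies.
REQUIRED_KEYS = ("id", "choices", "usage")


def missing_fields(response: Mapping[str, Any],
                   required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required top-level keys that the reply does not contain."""
    return [key for key in required if key not in response]


# Example reply shaped like the one described above: content present, no "id".
reply = {
    "choices": [{"message": {"role": "assistant", "content": "Hello"}}],
    "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
}
print(missing_fields(reply))  # ['id']
```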

@GitSimply

GitSimply commented Sep 20, 2023

@starkdmi

  1. Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.

What model are you using?

@jacktang

  1. Install LocalAI - OpenAI compatible server.
  2. Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.

Hello @starkdmi , can you share the file gpt-3.5-turbo-16k.yaml?

@starkdmi

@jacktang, it depends on the model, but for Vicuna 1.5, for example, it looks like this: gpt-3.5-turbo-16k.txt (rename to .yaml).

@GitSimply, these work with many of the GPT tools on my setup: WizardLM, WizardCoder, WizardCoderPy, Wizard-Vicuna, Vicuna, CodeLLaMa.

@Egalitaristen

Egalitaristen commented Sep 26, 2023

I'm just going to add a bit about how I got ChatDev running locally with the LM Studio server, for anyone searching. It would have been really easy with clear instructions, but I had to read through all of the issues and tried to find things in the code, to no avail.

Anyway. The basics:

  • Windows 10
  • Following the installation instructions from the readme for steps 1-3 (git clone, conda, cd, install requirements)

On step 4 do this instead:

set OPENAI_API_BASE=http://localhost:1234/v1

And that's it (you'll need to start the LMS server and load a model), now you can just run ChatDev like you normally would but locally.

@sankalp-25

sankalp-25 commented Sep 28, 2023

Hey @starkdmi, while using LocalAI

git clone https://github.com/go-skynet/LocalAI
cd LocalAI
git checkout -b build
cp your-model.bin models/
docker compose up -d --pull always
curl http://localhost:8080/v1/models

After doing this in LocalAI, I am directly executing this in ChatDev
OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"

and I am getting the following error:

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/root/ChatDev/camel/utils.py", line 145, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/ChatDev/camel/agents/chat_agent.py", line 191, in step
    response = self.model_backend.run(messages=openai_messages)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/ChatDev/camel/model_backend.py", line 69, in run
    response = openai.ChatCompletion.create(*args, **kwargs,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
                           ^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 298, in request
    resp, got_stream = self._interpret_response(result, stream)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 700, in _interpret_response
    self._interpret_response_line(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 763, in _interpret_response_line
    raise self.handle_error_response(
openai.error.APIError: rpc error: code = Unknown desc = inference failed {"error":{"code":500,"message":"rpc error: code = Unknown desc = inference failed","type":""}} 500 {'error': {'code': 500, 'message': 'rpc error: code = Unknown desc = inference failed', 'type': ''}} {'Date': 'Tue, 26 Sep 2023 06:20:10 GMT', 'Content-Type': 'application/json', 'Content-Length': '94'}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/ChatDev/run.py", line 111, in <module>
    chat_chain.execute_chain()
  File "/root/ChatDev/chatdev/chat_chain.py", line 160, in execute_chain
    self.execute_step(phase_item)
  File "/root/ChatDev/chatdev/chat_chain.py", line 130, in execute_step
    self.chat_env = self.phases[phase].execute(self.chat_env,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/ChatDev/chatdev/phase.py", line 292, in execute
    self.chatting(chat_env=chat_env,
  File "/root/ChatDev/chatdev/utils.py", line 77, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/ChatDev/chatdev/phase.py", line 131, in chatting
    assistant_response, user_response = role_play_session.step(input_user_msg, chat_turn_limit == 1)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/ChatDev/camel/agents/role_playing.py", line 242, in step
    assistant_response = self.assistant_agent.step(user_msg_rst)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f272c7ca710 state=finished raised APIError>]

how can I fix this?

@starkdmi

@sankalp-25, the problem is that the local OpenAI-compatible server is responding incorrectly. Do you have a config file for your model in the models/ directory, next to the .bin file?

It should look like that one, so that it simulates the gpt-3.5 model instead of hosting your-model.

On startup, LocalAI lists the models it hosts, and you should see the correct name there (gpt-3.5/4).
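
The same check can be done against the server's model list instead of the startup log. This is a minimal sketch: `hosted_model_ids` and `serves_model` are hypothetical helpers, and the sample payload mirrors the shape of a `/v1/models` reply that a commenter in this thread got back from curl. Other servers may return a different shape.

```python
import json
from typing import List


def hosted_model_ids(models_json: str) -> List[str]:
    """Extract model ids from an OpenAI-style /v1/models JSON reply."""
    payload = json.loads(models_json)
    return [entry["id"] for entry in payload.get("data", [])]


def serves_model(models_json: str, wanted: str) -> bool:
    """True if the server advertises exactly the name the client will request."""
    return wanted in hosted_model_ids(models_json)


sample = '{"object":"list","data":[{"id":"gpt-3.5-turbo-16k-0613","object":"model"}]}'
print(hosted_model_ids(sample))
print(serves_model(sample, "gpt-3.5-turbo-16k-0613"))
```

A matching name only shows that the server will accept the request. As the rest of the thread shows, inference can still fail if the model config behind that name is wrong.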

@sankalp-25

sankalp-25 commented Sep 28, 2023

Hey @starkdmi, I have renamed the .yaml file to gpt-3.5-turbo-16k.yaml and the model file to gpt-3.5-turbo-16k-0613, and then ran the commands below. If I am not mistaken, the config file is the .yaml file, which I renamed from docker-compose.yaml to gpt-3.5-turbo-16k.yaml. If I am wrong, please let me know what the mistake is.

Please check the log below:

$ docker compose -f gpt-3.5-turbo-16k.yaml up -d --pull always
[+] Running 1/1
✔ api Pulled 2.9s
[+] Running 1/0
✔ Container localai-api-1 Running

$ curl http://localhost:8000/v1/models
{"object":"list","data":[{"id":"gpt-3.5-turbo-16k-0613","object":"model"}]}

After this I am trying to run the following in ChatDev:

$ OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"

The error I am getting is the one shown in my previous comment.

Thank you

@starkdmi

@sankalp-25, we can test whether the model is working with this Python code:

import openai # https://github.com/openai/openai-python#installation

openai.api_key = "sk-dummy"
openai.api_base = "http://127.0.0.1:8000/v1"

chat_completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo-16k-0613",
  messages=[{"role": "user", "content": "Calculate 20 minus 5."}]
)

completion = chat_completion.choices[0].message.content
print(completion) # The result of 20 minus 5 is 15. 

@sankalp-25

@starkdmi, what do you mean by config file?
If I am not mistaken, the config file is the .yaml file, which I renamed from docker-compose.yaml to gpt-3.5-turbo-16k.yaml.
I only have gpt-3.5-turbo-16k-0613 and gpt-3.5-turbo-16k-0613.tmpl in /models.
When I run the code for checking the model, I get the following error:

Traceback (most recent call last):
  File "/root/FGPT/LocalAI/models/infer.py", line 6, in <module>
    chat_completion = openai.ChatCompletion.create(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
                           ^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 298, in request
    resp, got_stream = self._interpret_response(result, stream)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 700, in _interpret_response
    self._interpret_response_line(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 763, in _interpret_response_line
    raise self.handle_error_response(
openai.error.APIError: rpc error: code = Unknown desc = unimplemented {"error":{"code":500,"message":"rpc error: code = Unknown desc = unimplemented","type":""}} 500 {'error': {'code': 500, 'message': 'rpc error: code = Unknown desc = unimplemented', 'type': ''}} {'Date': 'Thu, 28 Sep 2023 11:08:00 GMT', 'Content-Type': 'application/json', 'Content-Length': '91'}

@starkdmi

@sankalp-25, wow, the docker-compose.yaml is a completely different thing. The docs are here.

The correct content of the file named gpt-3.5-turbo-16k.yaml may look like:

name: gpt-3.5-turbo-16k # or gpt-3.5-turbo-16k-0613

parameters:
  model: vicuna-13b-v1.5-16k.Q5_K_M.gguf
  temperature: 0.2
  top_k: 80
  top_p: 0.7
  max_tokens: 2048
  f16: true

context_size: 16384

template:
  chat: vicuna

f16: true
gpu_layers: 32
mmap: true

@asfandsaleem

I'm just going to add a bit about how I got ChatDev running locally with the LM Studio server, for anyone searching. It would have been really easy with clear instructions, but I had to read through all of the issues and tried to find things in the code, to no avail.

Anyway. The basics:

  • Windows 10
  • Following the installation instructions from the readme for steps 1-3 (git clone, conda, cd, install requirements)

On step 4 do this instead:

set OPENAI_API_BASE=http://localhost:1234/v1

And that's it (you'll need to start the LMS server and load a model), now you can just run ChatDev like you normally would but locally.

Correct. You also need one more step.
set OPENAI_API_KEY="xyz"

@travelhawk

I tried to use LM Studio as a local OpenAI substitute. It works well with the environment-variable setup suggested here.

OPENAI_API_BASE=http://127.0.0.1:1234/v1 OPENAI_API_KEY="xyz" python run.py --task "A drawing app" --name "Draw App"

However, the run doesn't complete; it terminates with an error saying the max tokens are exceeded:

Traceback (most recent call last):
  File "C:\Users\falk\repos\AI\ChatDev\run.py", line 114, in <module>
    chat_chain.execute_chain()
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\chat_chain.py", line 163, in execute_chain
    self.execute_step(phase_item)
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\chat_chain.py", line 133, in execute_step
    self.chat_env = self.phases[phase].execute(self.chat_env,
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 291, in execute
    self.chatting(chat_env=chat_env,
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\utils.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 165, in chatting
    seminar_conclusion = "<INFO> " + self.self_reflection(task_prompt, role_play_session, phase_name,
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 219, in self_reflection
    self.chatting(chat_env=chat_env,
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\utils.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 136, in chatting
    if isinstance(assistant_response.msg, ChatMessage):
  File "C:\Users\falk\repos\AI\ChatDev\camel\agents\chat_agent.py", line 53, in msg
    raise RuntimeError("error in ChatAgentResponse, info:{}".format(str(self.info)))
RuntimeError: error in ChatAgentResponse, info:{'id': None, 'usage': None, 'termination_reasons': ['max_tokens_exceeded_by_camel'], 'num_tokens': 17171}

For inference I'm using the zephyr-7B-beta. Does anyone know how to fix this or what to do?

@jamiemoller

My first thought is that this is a max token problem.

@travelhawk

Obviously it is. Is it because of the model? How do I raise the max tokens?
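
The reported numbers make the failure concrete. The sketch below is an assumption-laden illustration, not ChatDev's code: it assumes the `max_tokens_exceeded_by_camel` reason is raised when the conversation's token count reaches the client-side limit for the configured model, and it takes that limit to be the 16384-token window implied by the gpt-3.5-turbo-16k model name. `exceeds_limit` and `overshoot` are hypothetical helpers.

```python
ASSUMED_TOKEN_LIMIT = 16384  # assumption: 16k window of the configured model name


def exceeds_limit(num_tokens: int, token_limit: int = ASSUMED_TOKEN_LIMIT) -> bool:
    """True when the conversation no longer fits under the assumed limit."""
    return num_tokens >= token_limit


def overshoot(num_tokens: int, token_limit: int = ASSUMED_TOKEN_LIMIT) -> int:
    """How many tokens the conversation is over the assumed limit (0 if it fits)."""
    return max(num_tokens - token_limit, 0)


# num_tokens from the RuntimeError above.
print(exceeds_limit(17171))  # True
print(overshoot(17171))      # 787
```

If that assumption holds, the check happens on the ChatDev side before any request is sent, so the conversation history would already be over the limit regardless of how the local model is configured.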

@sammcj

sammcj commented Nov 26, 2023

Looks like there's an open PR to add this - #53

@acbp

acbp commented Nov 28, 2023

Is it possible to use Ollama?

Yes, using the LiteLLM OpenAI proxy, like this:

litellm --api_base http://localhost:11434 --add_key OPENAI_API_KEY=dummy --drop_params --model ollama/orca2:7b

This starts an OpenAI-compatible proxy server that forwards requests to Ollama.
Then run any OpenAI-compatible app, such as ChatDev:

OPENAI_API_BASE=http://localhost:8000/v1 OPENAI_API_KEY=dummy python3 run.py --task "<task>" --name "<title>"

Docs: LiteLLM proxy

@BackMountainDevil

@xkaraman

tenacity.RetryError: RetryError

Have you fixed the error? I hit the same error when using chatglm3-6b as the LLM server. The server printed some red log lines at "POST /send_message HTTP/1.1" 404 Not Found. So I think the code fails because the LLM server does not respond to /send_message correctly, and the code keeps retrying until it reaches the retry limit.

@davidxll

davidxll commented Dec 3, 2023

  1. Install LocalAI - OpenAI compatible server.
  2. Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.
  3. Start LocalAI server locally and run:
OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"

This should be added to the wiki or documented somewhere

@godshades

Can someone guide me on how to run this as a full Docker stack, e.g. one container for the local models and one container for ChatDev?

@tecno14

tecno14 commented Dec 16, 2023

To save the base URL and/or key in the conda environment, run this before activating it (or deactivate and reactivate it afterwards):

conda env config vars set OPENAI_API_BASE=http://localhost:1234/v1 --name ChatDev_conda_env
conda env config vars set OPENAI_API_KEY=any --name ChatDev_conda_env

@BackMountainDevil

BackMountainDevil commented Dec 16, 2023

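Use the Langchain-Chatchat project and call local port 2000

I tried your suggestion. The port, unsurprisingly, is 20000, so it should not be 2000 (probably a typo). I am also using chatglm3-6b-32k, with BAAI/bge-large-zh for the knowledge base. It runs, but strangely the responses are very slow: it does respond, but only after quite a while. The GPU's 80 GB of memory is enough. In the end it took 93 minutes to produce a "Snake game in pure html" that does not run.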

@mroxso

mroxso commented Dec 26, 2023

For me it wasn't OPENAI_API_BASE, but BASE_URL.
After setting this, everything works fine with LiteLLM + Ollama

@evmond1

evmond1 commented Feb 13, 2024

FYI: it's not OPENAI_API_BASE. If you are using Anaconda on Windows, run SET BASE_URL="http://localhost:1234/v1" and then SET OPENAI_API_KEY="not needed". This is for LM Studio. All working on my end using Mistral Instruct 7B.


@hemangjoshi37a

If anyone has Ollama integrated with this, please let me know. Thanks a lot. Happy coding.

@akhil3417

How do I go about using other services, such as Together.ai, that offer an OpenAI-compatible API? How do I set the host?

@opencoca

If the API is OpenAI compatible you can point at the API endpoint using --api_base as with local models.

@resdig3

resdig3 commented Mar 1, 2024

I'm just going to add a bit about how I got ChatDev running locally with the LM Studio server, for anyone searching. It would have been really easy with clear instructions, but I had to read through all of the issues and dig through the code, to no avail.

Anyway. The basics:

  • Windows 10
  • Follow the installation instructions from the readme for steps 1-3 (git clone, conda, cd, install requirements)

On step 4 do this instead:

set OPENAI_API_BASE=http://localhost:1234/v1

And that's it (you'll need to start the LMS server and load a model), now you can just run ChatDev like you normally would but locally.

Trying to get this running on a Win10 machine.
I keep getting this error, like it needs a working API key of some sort:

.conda\envs\ChatDev_conda_env\lib\site-packages\openai\_base_client.py", line 877, in _request
raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: abc123xyz. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
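
The 401 above points at platform.openai.com, which suggests the request reached OpenAI's hosted API rather than a local server, so the process running ChatDev may not have seen any base URL variable. A minimal sketch for checking what a Python process started from the same shell actually sees. The three variable names are the ones mentioned in this thread, and `endpoint_vars` is a hypothetical helper, not part of ChatDev:

```python
import os

# Base-URL variable names reported by different commenters in this thread.
# Which one is honoured depends on the installed ChatDev / openai versions.
CANDIDATE_VARS = ("OPENAI_API_BASE", "OPENAI_BASE_URL", "BASE_URL")


def endpoint_vars(environ=None):
    """Return {name: value} for every candidate variable that is set and non-empty."""
    env = os.environ if environ is None else environ
    return {name: env[name] for name in CANDIDATE_VARS if env.get(name)}


if __name__ == "__main__":
    found = endpoint_vars()
    if not found:
        print("No base URL variable is set; requests will probably go to api.openai.com.")
    for name, value in found.items():
        # In cmd.exe, `set NAME="value"` stores the quote characters in the value.
        flag = "  (value contains quote characters)" if '"' in value else ""
        print(f"{name}={value}{flag}")
```

Run it in the same terminal, with the same conda environment active, right before starting ChatDev.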

@vishusinghal

FYI: it's not OPENAI_API_BASE. If you are using Anaconda on Windows, run SET BASE_URL="http://localhost:1234/v1" and then SET OPENAI_API_KEY="not needed". This is for LM Studio. All working on my end using Mistral Instruct 7B.


I tried BASE_URL as well as OPEN_AI_BASE, but I am getting an APIConnectionError:

tenacity.RetryError: RetryError[<Future at 0x1cdcf0bce80 state=finished raised APIConnectionError>]

Can you help?
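
An APIConnectionError generally means nothing answered at the configured address, so it can help to test the server separately from ChatDev. A rough sketch using only the standard library. It assumes the server exposes the OpenAI-style model listing route (`/models` under the base URL), which not every local server may do, and `server_reachable` is a hypothetical helper:

```python
import urllib.error
import urllib.request


def models_url(base_url):
    """Build the model-listing URL for a base such as http://localhost:1234/v1."""
    return base_url.rstrip("/") + "/models"


def server_reachable(base_url, api_key="dummy", timeout=5.0):
    """Return True if anything answers HTTP at base_url, False if the connection fails."""
    request = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with 2xx (e.g. 404 for an unknown route).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout: nothing usable is listening.
        return False


if __name__ == "__main__":
    base = "http://localhost:1234/v1"  # LM Studio address used in this thread; adjust as needed
    print(base, "reachable" if server_reachable(base) else "NOT reachable")
```

If this reports the server as not reachable, the problem is the server or the port, not the variable name.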

@maramowicz

It's not BASE_URL, it's OPENAI_BASE_URL! I also saw at the beginning that OPENAI_API_BASE is used (maybe I'm wrong here, but it works for me), so the correct command is:

OPENAI_API_BASE=http://localhost OPENAI_BASE_URL=http://localhost OPENAI_API_KEY="anything" python run.py --task "Snake game in pure html" --name "WebSnake"
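
Since commenters here report three different variable names working for them, one low-effort workaround is to set all of them to the same URL when launching ChatDev. A sketch under that assumption: `local_env` and `run_chatdev` are hypothetical helpers, the default URL is the LM Studio address used earlier in the thread, and `run.py` with `--task` and `--name` is the invocation already shown in the comments above:

```python
import os
import subprocess
import sys

# Names reported in this thread; setting all three avoids guessing which one
# the installed ChatDev / openai combination reads.
BASE_URL_VARS = ("OPENAI_API_BASE", "OPENAI_BASE_URL", "BASE_URL")


def local_env(base_url, api_key="dummy", environ=None):
    """Copy the environment and point every known base-URL variable at base_url."""
    env = dict(os.environ if environ is None else environ)
    for name in BASE_URL_VARS:
        env[name] = base_url
    # Commenters in this thread use placeholder keys with their local servers.
    env["OPENAI_API_KEY"] = api_key
    return env


def run_chatdev(task, name, base_url="http://localhost:1234/v1"):
    """Run ChatDev's run.py from the current directory against a local server."""
    cmd = [sys.executable, "run.py", "--task", task, "--name", name]
    return subprocess.run(cmd, env=local_env(base_url), check=False)


# Example, to be run from the ChatDev checkout:
# run_chatdev("Snake game in pure html", "WebSnake")
```

Passing the environment explicitly to the child process also sidesteps shell differences between `set` on Windows and inline `NAME=value` prefixes on Linux and macOS.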

@hemangjoshi37a

Hi all contributors,
I found a GitHub app that helps solve issues using AI. Its approach is very interesting, so I am sharing it here. Could someone please add it to this repo: https://github.com/apps/sweep-ai
Not sponsored or anything, I just found it helpful.

@BradKML

BradKML commented Jul 24, 2024

@acbp has anyone written documentation specifically for getting things like Ollama (or LocalAI, Xorbit, or OpenLLM) to work with ChatDev? LiteLLM maybe?

@hemangjoshi37a

Has anyone achieved the following, or can anyone suggest an approach? We want to continuously index the updated code base in a HippoRAG index, query the updated index, make code changes, and keep repeating that loop. We want to do this offline with Ollama-type models only, without OpenAI or Claude.

Can anyone suggest how I can do this?

@hgftrdw45ud67is8o89

hgftrdw45ud67is8o89 commented Jul 26, 2024

Has anyone had success with llama.cpp?

I got tenacity.RetryError: RetryError[<Future at 0x2595e8f1e10 state=finished raised NotFoundError>]

and

raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'code': 404, 'message': 'File Not Found', 'type': 'not_found_error'}}

@syedshahzebhasnain

syedshahzebhasnain commented Jul 30, 2024

On a Mac (M3), I used this with a local install (Anaconda):

BASE_URL=http://localhost:1234/v1 OPENAI_API_KEY=dummy python3 run.py --task "< task >" --name "<title>"
