Hi, I tried running the voice chat on my home computer, and it ended up with this:
llm-1 | {"timestamp":"2023-12-29T10:38:56.854053Z","level":"WARN","fields":{"message":"Unable to use Flash Attention V2: GPU with CUDA capability 7.5 is not supported for Flash Attention V2\n"},"target":"text_generation_launcher"}
llm-1 | {"timestamp":"2023-12-29T10:38:56.868376Z","level":"WARN","fields":{"message":"Could not import Mistral model: Mistral model requires flash attn v2\n"},"target":"text_generation_launcher"}
llm-1 | {"timestamp":"2023-12-29T10:38:57.633639Z","level":"ERROR","fields":{"message":"Error when initializing model\nTraceback (most recent call last):\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n File \"/opt/conda/lib/python3.9/site-packages/typer/main.py\", line 311, in __call__\n return get_command(self)(*args, **kwargs)\n File \"/opt/conda/lib/python3.9/site-packages/click/core.py\", line 1157, in __call__\n return self.main(*args, **kwargs)\n File \"/opt/conda/lib/python3.9/site-packages/typer/core.py\", line 778, in main\n return _main(\n File \"/opt/conda/lib/python3.9/site-packages/typer/core.py\", line 216, in _main\n rv = self.invoke(ctx)\n File \"/opt/conda/lib/python3.9/site-packages/click/core.py\", line 1688, in invoke\n return _process_result(sub_ctx.command.invoke(sub_ctx))\n File \"/opt/conda/lib/python3.9/site-packages/click/core.py\", line 1434, in invoke\n return ctx.invoke(self.callback, **ctx.params)\n File \"/opt/conda/lib/python3.9/site-packages/click/core.py\", line 783, in invoke\n return __callback(*args, **kwargs)\n File \"/opt/conda/lib/python3.9/site-packages/typer/main.py\", line 683, in wrapper\n return callback(**use_params) # type: ignore\n File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py\", line 83, in serve\n server.serve(\n File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 207, in serve\n asyncio.run(\n File \"/opt/conda/lib/python3.9/asyncio/runners.py\", line 44, in run\n return loop.run_until_complete(main)\n File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 634, in run_until_complete\n self.run_forever()\n File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 601, in run_forever\n self._run_once()\n File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 1905, in _run_once\n handle._run()\n File \"/opt/conda/lib/python3.9/asyncio/events.py\", line 80, in _run\n 
self._context.run(self._callback, *self._args)\n> File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 159, in serve_inner\n model = get_model(\n File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py\", line 259, in get_model\n raise NotImplementedError(\"Mistral model requires flash attention v2\")\nNotImplementedError: Mistral model requires flash attention v2\n"},"target":"text_generation_launcher"}
llm-1 | {"timestamp":"2023-12-29T10:38:57.962449Z","level":"ERROR","fields":{"message":"Shard complete standard error output:\n\nTraceback (most recent call last):\n\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n\n File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py\", line 83, in serve\n server.serve(\n\n File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 207, in serve\n asyncio.run(\n\n File \"/opt/conda/lib/python3.9/asyncio/runners.py\", line 44, in run\n return loop.run_until_complete(main)\n\n File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 647, in run_until_complete\n return future.result()\n\n File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 159, in serve_inner\n model = get_model(\n\n File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py\", line 259, in get_model\n raise NotImplementedError(\"Mistral model requires flash attention v2\")\n\nNotImplementedError: Mistral model requires flash attention v2\n"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
So I guess my GPU is too old... Which brings me to the question: what kind of hardware do I need for this voice chat to run smoothly? What have you tested it on?
Any help will be appreciated :)
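For anyone hitting the same wall, here is a quick sketch of how you might check whether a GPU clears the Flash Attention V2 bar before launching. This is my own helper, not part of text-generation-inference; it assumes (based on the warning above rejecting capability 7.5) that the minimum is compute capability 8.0 (Ampere), and the `--query-gpu=compute_cap` flag requires a reasonably recent NVIDIA driver:

```python
# Hypothetical pre-flight check: query GPU 0's CUDA compute capability via
# nvidia-smi and compare it against Flash Attention V2's assumed minimum (8.0).
import subprocess


def compute_capability():
    """Return (major, minor) for GPU 0, or None if nvidia-smi is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout.strip().splitlines()[0]
        major, minor = out.split(".")
        return int(major), int(minor)
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
        return None


def supports_flash_attn_v2(cap):
    # Assumption: FA2 needs Ampere (8.0) or newer; 7.5 (Turing) is rejected,
    # which matches the launcher warning in the logs.
    return cap is not None and cap >= (8, 0)


if __name__ == "__main__":
    cap = compute_capability()
    print(f"compute capability: {cap}, FA2 ok: {supports_flash_attn_v2(cap)}")
```

On my machine this reports `(7, 5)` and `False`, which lines up with the `NotImplementedError: Mistral model requires flash attention v2` shard failure.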