Replies: 1 comment
-
What ChatGPT misses is that this is much slower, but it is already what we do when you don't assign a layer to the GPU's memory, so I don't think the answer it gave you is a good one. We keep all the intermediate data on the GPU itself, since that is the fastest, and then put the parts of the model that didn't fit into shared (system) memory so they are still accessible from there. So it can already be done; it is exactly what happens when you put things in CPU memory.
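To make the idea concrete, here is a rough sketch in plain PyTorch (an illustration only, not KoboldAI's actual implementation): layers assigned to the GPU live in VRAM, the rest live in ordinary system memory, and the activations stay on the GPU for the whole forward pass.

```python
import torch
import torch.nn as nn

# Rough illustration only (not KoboldAI's actual code) of splitting a model
# between VRAM and system memory. Layers 0..gpu_layers-1 live in VRAM; the
# rest live in ordinary system RAM.
layers = nn.ModuleList([nn.Linear(4096, 4096) for _ in range(8)])
gpu_layers = 4  # comparable to the "GPU layers" setting in the UI

for i, layer in enumerate(layers):
    layer.to("cuda" if i < gpu_layers else "cpu")

def forward(x: torch.Tensor) -> torch.Tensor:
    x = x.to("cuda")  # intermediate data starts, and stays, on the GPU
    for i, layer in enumerate(layers):
        if i < gpu_layers:
            x = layer(x)  # weights already in VRAM: fast path
        else:
            # Weights live in system RAM, so they have to cross the PCIe bus
            # before this layer can run. That transfer is what makes the
            # offloaded part of the model so much slower.
            x = layer.to("cuda")(x)
            layer.to("cpu")
    return x
```

The exact mechanics differ, but the point is the same as above: the layers that don't fit in VRAM are still usable, they just sit behind a much slower path than the GPU's own memory.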
-
I will preface this by saying I know nothing about CUDA programming and little to nothing about Python or what makes KoboldAI tick; most of my knowledge comes from ChatGPT tutoring me. But I noticed something about my system, and it probably applies to others as well.
I have an RTX 3070. It has 8GB of VRAM, but, according to ChatGPT, the card also has an additional 16GB of "shared memory", giving a theoretical total of 24GB. According to ChatGPT, this memory is accessible and can be used to increase the performance of chatbots like KoboldAI.
Is it possible to modify KoboldAI to access this additional memory? If it is, performance (at least on my system) could be improved, since the available memory would triple.
(This of course assumes KoboldAI doesn't already do this. Whenever I get memory errors they seem to imply that 8GB is the maximum, so I am assuming it doesn't.)
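For what it's worth, here is the quick check I ran (a minimal sketch, assuming PyTorch with CUDA is installed, which I believe KoboldAI uses) of what CUDA itself reports for the card; it lines up with the memory errors I mentioned:

```python
import torch

# Ask CUDA how much memory it actually sees on the card. On an RTX 3070 this
# reports roughly 8 GB; the extra 16 GB of "shared GPU memory" that Windows
# lists on top of that is ordinary system RAM, not additional VRAM.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB of dedicated VRAM")
```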
A sample from ChatGPT regarding this idea: