-
We already have an option for that, it's called |
-
Likewise, it appears that os.nice() is non-portable and does not work on Windows or OS X.
-
I just ran some tests and was able to massively increase generation speed by increasing the thread count.
I have an i7-12700H, with 14 cores and 20 logical processors. Under the default rule (logical processors / 2 - 1) that gives 9 threads, so 5 physical cores were going unused.
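To make the arithmetic concrete, here is a minimal sketch of that rule. The function name `default_threads`, the use of integer division, and the clamp to at least one thread are my assumptions for illustration; the project's actual implementation may differ.

```python
def default_threads(logical_processors: int) -> int:
    """Thread count under the 'logical processors / 2 - 1' rule.

    Hypothetical helper: integer division and the floor of 1 are assumptions.
    """
    return max(1, logical_processors // 2 - 1)


logical = 20   # logical processors on the i7-12700H
physical = 14  # physical cores on the i7-12700H

threads = default_threads(logical)
unused = physical - threads

print(f"default threads: {threads}, physical cores left idle: {unused}")
```

For 20 logical processors this yields 9 threads, which is where the 5 idle physical cores come from.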
The thread count also seems to massively increase BLAS speed when using CLBlast. My GPU doesn't reach 100% utilization at the default thread count.
Instead of capping the thread count below the number of logical processors, wouldn't it be better to just lower the process priority, using "os.nice" and "psutil"?
Or could I set a separate thread count (and a separate default) for BLAS than for token generation?
Do you have any ideas for tests I could run to improve this feature?
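For reference, the priority idea could look roughly like the sketch below. It is not taken from the project: `lower_priority` is a hypothetical helper, and I am assuming `psutil` is installed for the Windows path. Python documents `os.nice` as Unix-only, so the code checks for it before calling it. I have not tested whether a lower priority actually prevents the slowdowns the thread cap is meant to avoid.

```python
import os
import sys


def lower_priority(increment: int = 10) -> str:
    """Try to lower this process's scheduling priority.

    Hypothetical helper. Returns the mechanism used: "os.nice", "psutil",
    or "unchanged" if neither was available.
    """
    # os.nice is only present on Unix-like platforms, so probe for it first.
    if hasattr(os, "nice"):
        os.nice(increment)  # adds `increment` to the current niceness
        return "os.nice"

    if sys.platform == "win32":
        try:
            import psutil  # third-party; assumed to be installed
        except ImportError:
            return "unchanged"
        # On Windows, psutil takes a priority class rather than a nice value.
        psutil.Process().nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)
        return "psutil"

    return "unchanged"
```

Calling `lower_priority()` once at startup, before spawning worker threads, would be the natural place for it. An unprivileged process usually cannot raise its priority again afterwards.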