Releases: Macoron/whisper.unity
1.3.2
Minor release. Fixed Metal support on MacOS.
What's Changed
- Update version string in package.json to 1.3.1 by @from2001 in #86
- Use the new WHISPER_METAL_EMBED_LIBRARY flag to embed the metal lib by @injeniero in #93
- Updated MacOS binaries (fix Metal support) by @Macoron in #94
New Contributors
- @from2001 made their first contribution in #86
- @injeniero made their first contribution in #93
Full Changelog: 1.3.1...1.3.2
1.3.1
New minor release. Includes an update of whisper.cpp to 1.5.5 and bug fixes.
What's Changed
- Fixed out of bounds exception during resampling by @Macoron in #74
- Add visionOS support by @Macoron in #75
- Added missing Accelerate framework by @Macoron in #76
- Update README.md with VisionOS support by @yosun in #77
- Updated whisper.cpp to 1.5.5 by @Macoron in #84
New Contributors
Full Changelog: 1.3.0...1.3.1
1.3.0 - GPU Support
This release introduces an update of whisper.cpp to 1.5.1, GPU inference support, and other minor improvements.
Whisper.cpp updated to 1.5.1
whisper.cpp 1.5.1 brings many improvements and bug fixes, including better GPU usage.
Check the original release notes for more information.
GPU Support
Whisper now supports GPU acceleration. This can drastically improve performance on some hardware.
| Model | CPU | CUDA |
|---|---|---|
| tiny | 1188 ms | 185 ms |
| small | 8992 ms | 517 ms |
| large-v2 | 60325 ms | 1946 ms |
Transcription times for "jfk.wav" on Windows with an Intel Core i5-12400F and an Nvidia GeForce RTX 2070 Super.
| Model | CPU | Metal |
|---|---|---|
| tiny | 1113 ms | 189 ms |
| small | 6319 ms | 860 ms |
| large-v2 | 40608 ms | 3888 ms |
Transcription times for "jfk.wav" on an Apple M1 Pro.
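To put the two tables in perspective, the speedup for each model is simply the CPU time divided by the GPU time. The short Python sketch below does that arithmetic on the numbers reported above; the dictionary and function names are illustrative only and are not part of whisper.unity.

```python
# Timings in milliseconds, copied from the two tables above as (cpu_ms, gpu_ms).
CUDA_MS = {"tiny": (1188, 185), "small": (8992, 517), "large-v2": (60325, 1946)}
METAL_MS = {"tiny": (1113, 189), "small": (6319, 860), "large-v2": (40608, 3888)}


def speedups(timings):
    """Return {model: cpu_ms / gpu_ms}, rounded to one decimal place."""
    return {model: round(cpu / gpu, 1) for model, (cpu, gpu) in timings.items()}


if __name__ == "__main__":
    for backend, timings in (("CUDA", CUDA_MS), ("Metal", METAL_MS)):
        for model, factor in speedups(timings).items():
            print(f"{backend:5} {model:8} {factor}x faster than CPU")
```

On these measurements the gain grows with model size: roughly 6x to 31x with CUDA and roughly 6x to 10x with Metal. The figures apply only to the hardware listed in the captions.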
For Windows and Linux you need an Nvidia GPU and an installed CUDA Toolkit (tested with 12.2.0). A Unity project compiled with CUDA enabled expects your end users to have an Nvidia GPU and the CUDA libraries. Running the build without them will result in an error.
For macOS you need an ARM CPU, such as M1 or newer. iOS Metal inference isn't supported yet. On Intel or older hardware, whisper.cpp should fall back to CPU inference.
To activate GPU inference, go to Project Settings => Whisper => Enable CUDA or Enable Metal. For more information, check the README.
Other
Ubuntu libs are now compiled on Ubuntu 20.04. This might cause problems with Ubuntu 18.04. If you need support for earlier versions of Ubuntu or other distros, consider recompiling the libs from source.
A new loop mode for the microphone was added. It creates an endless stream using Unity's built-in circular microphone loop, which is very useful for whisper streaming transcription. To activate it, set `Loop` in `MicrophoneRecord` to "true".
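The idea behind reading from a circular microphone buffer can be sketched in a few lines. The Python below is a language-neutral illustration of the general technique, not the package's actual code: `read_new_samples`, `ring`, `last_pos`, and `write_pos` are hypothetical names, and it assumes the reader polls often enough that the writer never laps it.

```python
def read_new_samples(ring, last_pos, write_pos):
    """Return the samples written since `last_pos` in a circular buffer.

    `ring` is the fixed-size buffer, `last_pos` is where the previous read
    stopped, and `write_pos` is where the recorder will write next.
    Assumes fewer than len(ring) samples arrived between two reads.
    """
    if write_pos >= last_pos:
        # No wrap-around: the new data is one contiguous slice.
        return ring[last_pos:write_pos]
    # The writer wrapped past the end: take the tail, then the head.
    return ring[last_pos:] + ring[:write_pos]


if __name__ == "__main__":
    ring = list(range(8))  # stand-in for an 8-sample audio buffer
    print(read_new_samples(ring, 2, 5))  # contiguous read
    print(read_new_samples(ring, 6, 2))  # read that wraps around
```

Because the buffer is reused forever, the stream never has to stop and restart, which is what makes this mode a good fit for streaming transcription.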
What's Changed
- Endless loop microphone and memory leak fix by @Macoron in #55
- Updated whisper.cpp to 1.5.0 by @Macoron in #60
- Add CUDA support for Windows by @Macoron in #61
- Add CUDA support for Linux by @Macoron in #63
- Metal support for MacOS by @Macoron in #64
- Updated whisper.cpp to 1.5.1 by @Macoron in #65
Full Changelog: 1.2.1...1.3.0
1.2.1
This release introduces VAD and some other minor improvements.
Voice Activity Detection (VAD)
Voice Activity Detection (VAD) was added to this project. It lets you check whether the current audio contains speech. For example, you can stop microphone input when the user stops speaking.
The VAD implementation is very basic: it is a direct port of the energy-based VAD from whisper.cpp. Don't expect it to be very robust, but it should work fine as a proof of concept.
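For readers unfamiliar with energy-based VAD, the following Python sketch shows the general idea: compare the average energy of the most recent audio window with the average energy of the whole buffer. It is a simplified illustration, not the code shipped in whisper.unity or whisper.cpp; the function name `speech_ended`, its parameters, and the default values are assumptions, and any pre-filtering of the signal is omitted.

```python
def speech_ended(samples, sample_rate, last_ms=1000, vad_thold=0.6):
    """Return True if the last `last_ms` of audio is much quieter than the rest.

    `samples` is a sequence of floats in [-1, 1]. Energy is approximated by
    the mean absolute amplitude. All names and defaults are illustrative.
    """
    n_samples = len(samples)
    n_last = sample_rate * last_ms // 1000
    if n_last <= 0 or n_last >= n_samples:
        # Not enough audio to compare the tail against the whole buffer.
        return False

    energy_all = sum(abs(s) for s in samples) / n_samples
    energy_last = sum(abs(s) for s in samples[n_samples - n_last:]) / n_last

    # A tail that is still loud relative to the whole buffer suggests
    # the speaker has not stopped yet.
    return energy_last <= vad_thold * energy_all


if __name__ == "__main__":
    rate = 16000
    talking_then_quiet = [0.5] * rate + [0.01] * rate
    still_talking = [0.5] * (2 * rate)
    print(speech_ended(talking_then_quiet, rate))  # tail is quiet
    print(speech_ended(still_talking, rate))       # tail is as loud as the rest
```

A relative threshold like this adapts to the recording level, but it can be fooled by steady background noise, which is one reason the approach is not very robust.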
VAD Streaming
Streaming now supports VAD. This should drastically reduce hallucinations caused by silent audio regions.
Video (output_novad.mp4): ggml.base.en, VAD disabled
Video (output_vad.mp4): ggml.base.en, VAD enabled
What's Changed
- Added VAD and VAD Stop by @Macoron in #44
- Better logging by @Macoron in #48
- VAD for streaming by @Macoron in #49
- New stream events and more documentation by @Macoron in #53
Full Changelog: 1.2.0...1.2.1
1.2.0
New major release with a lot of changes.
whisper.cpp updated to 1.4.2
While 1.4.2 is technically still in beta, it has been available for several months and seems to be stable. The quality of transcription shouldn't have changed, but some results look different compared to previous versions. If this is critical for you, consider using a previous release.
Prompting
Whisper.unity now supports prompting. Prompting helps you "guide" the transcription style, names, or specific terminology. It isn't as powerful as prompting an LLM, but you can get really interesting results with it.
Streaming
The first version of transcription streaming was added. Transcription now updates in real time from a microphone or audio stream. This is mostly a direct port of the original whisper.cpp demo, except for VAD.
What's Changed
- Update whisper.cpp to 1.4.2 by @Macoron in #30
- Add prompting support by @SharafeevRavil in #25
- Fixed string conversion error by @Macoron in #34
- Add progress callback by @Macoron in #35
- setter for modelPath by @achimmihca in #37
- Quick-fix of il2cpp by @Macoron in #38
- Sliding window streaming support by @Macoron in #40
- Fixed some initialize by @Macoron in #41
- Samples cleanup by @Macoron in #43
New Contributors
- @achimmihca made their first contribution in #37
Full Changelog: 1.1.1...1.2.0
1.1.1
Minor release. Adds prebuilt Linux binaries and GitHub Actions tests/builds.
What's Changed
- Linux support by @Macoron in #21
- Add github actions for test runner by @Macoron in #24
- CI for build whisper.cpp libraries by @Macoron in #27
- Fix unity test runner by @Macoron in #28
Full Changelog: 1.1.0...1.1.1
1.1.0
This release adds timestamps and confidence data for segments and tokens. It changes the signature of the `OnNewSegment` event and the `WhisperResult` class, so make sure to update your code if you used them.
What's Changed
Demos
Segments timestamps prediction
Video (subtitles.mp4): `whisper.tiny` in the subtitles demo; color shows the confidence level for each token.
Full Changelog: 1.0.3...1.1.0
1.0.3
What's Changed
- Language API by @Macoron in #7
- Input selector for microphone demo (+some refactoring) (#11) by @SharafeevRavil in #12
- Support for Unity 2019.4 and newer by @Macoron in #15
- Set prepare iOS for recording by @Macoron in #16
New Contributors
- @SharafeevRavil made their first contribution in #12
Language detection example
Full Changelog: 1.0.2...1.0.3