Alternative & new OSS-TTS-engines #627
Thank you very much for this study and summary!
I would vote for using Mimic 3: it sounds the best to me and supports a decent number of languages. eSpeakNG is a very decent choice too, given how many languages it supports.
But if mimic3 is not much more than eSpeakNG (specifically
It supports Polish. 😉 More seriously: it all comes down to the quality of the synthesised speech relative to the resources (CPU, RAM, I/O, mass storage) it uses.
One more, suggested by piggz: https://github.com/coqui-ai/TTS. It looks to be distributed via pip, at least on PC.
New contenders: RHVoice & eSpeakNG
I remembered the side-tracked discussion about alternative and maintained OSS-TTS-engines, when I came across these two TTS engines at F-Droid ([1], [2]):
Both are maintained, but eSpeakNG may output a low-quality voice, as most improvements relative to the original eSpeak ([1], [2], [3]) did not address the voice engine proper.
OTOH, Python bindings have now been contributed to eSpeakNG, and the eSpeakNG-based Mimic3 listening samples sound fine. eSpeakNG is well documented: [1] & [2]
In comparison to eSpeak (and maybe also eSpeakNG), RHVoice seems to provide higher-quality voice synthesis and supports a set of languages that often lack good-quality speech synthesis. RHVoice is documented in multiple languages.
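To compare voices by ear, listening samples can be generated in bulk. The following is a minimal sketch, assuming the espeak-ng command-line tool is installed; it deliberately drives the CLI via subprocess instead of the Python bindings mentioned above, whose API is not covered here. The function names `build_espeak_ng_command` and `synthesize` are hypothetical; only the standard espeak-ng options `-v` (voice), `-s` (rate in words per minute) and `-w` (write WAV file) are used.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_espeak_ng_command(text: str, voice: str = "en", speed_wpm: int = 175,
                            wav_path: Optional[Path] = None) -> List[str]:
    """Assemble an espeak-ng command line (hypothetical helper).

    -v selects the voice/language, -s sets the speaking rate in words per
    minute, and -w writes the audio to a WAV file instead of playing it.
    """
    cmd = ["espeak-ng", "-v", voice, "-s", str(speed_wpm)]
    if wav_path is not None:
        cmd += ["-w", str(wav_path)]
    cmd.append(text)  # the text to speak is passed as the final argument
    return cmd


def synthesize(text: str, voice: str = "en",
               wav_path: Optional[Path] = None) -> bool:
    """Run espeak-ng if it is on PATH; return False when it is missing."""
    if shutil.which("espeak-ng") is None:
        return False
    subprocess.run(build_espeak_ng_command(text, voice, wav_path=wav_path),
                   check=True)
    return True
```

For example, `synthesize("In 200 metres, turn left", voice="pl", wav_path=Path("pl.wav"))` would write a Polish sample to `pl.wav`, which could then be compared against Mimic3 or RHVoice output for the same sentence.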
flite
Then I went on to look up descendants of CMU's flite (Carnegie Mellon University's Festival-lite vox (voice encoder)), so I first looked for flite proper: latest source code v2.1 - v2.3+, original source code (alternative site) v1.0 - v2.1.0, research context and Festival vox documentation, original webpage with slide deck and scientific paper (alternative site, direct link to paper as HTML pages and Postscript file).
Side note: It is interesting to still see occasional commits (as of 2022-08-23, the latest one on 2022-05-16) and releases (2.2 on 2020-08-13 and 2.3 in "March 2022", but untagged, hence the master branch as of 2022-05-16 is better considered the "flite 2.3+" release) for the original flite.
mimic 1, 2 & 3
Then I looked at the well-known flite-based mimic. As we knew, its first incarnation ceased development with release v1.3.0.1 (also the state of the master branch since 2020-03-06), with a few additional bug fixes in the development branch; it works well (solely for English) and is well documented ([1] & [2]).
@rinigus had hopes for mimic2 (see also), which died quickly in 2020, despite having the cool ability to deploy one's own voice.
Furthermore, it was "designed to run in the cloud", which likely means that it uses a lot of resources when the server and client component are running on a single machine.
Mimic2 is also very well documented: [1] & [2]
Mimic-3 is now Mycroft's focus and seems to be actively developed, but it has had only a single proper release so far (v0.2.3), and it uses libespeak-ng1. It is also supported by the cool Mimic Recording Studio for recording one's own voice.
Mimic-3 is nicely documented, too: [1] and [2]
Still, it needs to be analysed which functions mimic-3 provides over a direct use of libespeak-ng, and evaluated whether these are worth the additional dependency (technically and WRT sustainability). Interestingly, Mycroft's top-level mimic documentation page provides listening samples in which the libespeak-ng1-based mimic3 output sounds quite good.
FreeTTS
FreeTTS is also flite-based, but written in Java. Furthermore, its la(te)st release is v1.2.2 from 2009-03-09, and its la(te)st commit to SVN trunk happened on 2012-05-08. Thus it is not worth pursuing.
NanoTTS, a command-line front-end for PicoTTS
NanoTTS ceased development in 2019, while the la(te)st commit to PicoTTS happened on 2018-02-14 (there are many downstream packages, e.g., this one). Like mimic1, NanoTTS and PicoTTS are clearly EOL, but they work fine (in my experience) and support more languages than mimic1.
TL;DR
RHVoice (documentation) seems worth evaluating for integration in Pure Maps, as do eSpeakNG / libespeak-ng (documentation: [1] & [2]) and / or mimic-3 (documentation: [1], [2], [3]), in order to provide maintained and improved TTS synthesis compared to the extant choices mimic1 and NanoTTS (which includes PicoTTS). As these legacy components work fine (well, mimic1 never did for me, but it does for many others), there is currently no need to rush the evaluation and potential adoption of RHVoice and / or eSpeakNG / libespeak-ng / mimic-3.