In the first part we set up a very simple but efficient voice assistant that listens to your commands and reacts to them. It starts as soon as internet and audio become available on the system. And yes, that is the limitation of our very first implementation: it needs internet because it uses Google's TTS and STT engines.
This time we'll try to replace them with something that can run 100% offline. I have a couple of candidates, apart from the ones we talked about before.
Mozilla DeepSpeech
The most recent version of DeepSpeech right now is v0.9.3, and its aarch64 wheel requires Python 3.7, whereas we're running 3.9 on our Debian. I found that out when I tried to install deepspeech-0.9.3-cp37-cp37m-linux_aarch64.whl and got a confusing error message.
As I figured out later, I should have used the --verbose flag: with it, it became apparent that it was trying to install things under the 3.7 directories, whereas the current version of Python was 3.9. Python is such a mess.
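The mismatch is actually visible in the wheel file name itself: the cp37 part means CPython 3.7. Below is a minimal sketch of that check, assuming the standard wheel naming scheme (name-version-pythontag-abitag-platform.whl). The helper names are hypothetical, and it compares only a simple Python tag, not the ABI or platform tags that pip also checks.

```python
import sys


def wheel_python_tag(filename):
    """Return the Python tag (e.g. 'cp37') from a wheel file name."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # A wheel name ends with: python tag, ABI tag, platform tag.
    # An optional build tag may sit before them, so count from the end.
    if len(parts) < 5:
        raise ValueError(f"not a wheel file name: {filename!r}")
    return parts[-3]


def interpreter_tag(version_info=None):
    """Return the CPython tag for an interpreter, e.g. 'cp39' for 3.9."""
    if version_info is None:
        version_info = sys.version_info
    return f"cp{version_info[0]}{version_info[1]}"


def wheel_matches(filename, version_info=None):
    """True if the wheel's Python tag equals the interpreter's CPython tag."""
    return wheel_python_tag(filename) == interpreter_tag(version_info)


wheel = "deepspeech-0.9.3-cp37-cp37m-linux_aarch64.whl"
print(wheel_python_tag(wheel), "vs", interpreter_tag())
print("matches this interpreter (Python tag only):", wheel_matches(wheel))
```

Run with the system's Python 3.9, this would report a cp37 wheel against a cp39 interpreter, which is the whole problem in one line.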
Since the Debian repo doesn't have 3.7, I decided to build it myself. It's pretty straightforward: download -> configure -> make -> make install. I just followed the very first hit I found on Google, and it went just fine.
To be continued
https://deepspeech.readthedocs.io/en/v0.9.3/
Coqui-ai TTS and STT
https://tts.readthedocs.io/en/latest/
Coqui-Ai TTS - Text to speech
I was playing with TTS (text-to-speech) on my Intel Atom Z8350 tablet, an Asus Transformer Book T102HA, as I'm on vacation right now, away from my little SBC device at home. It was almost a smooth ride. I say "almost" because on Windows 10 it took me a while to figure out the two main dependencies:
1) You need Python 3.9, not 3.10.
2) You need to install the Windows 10 SDK and build tools using the "Visual Studio Installer". That thing brings about 5 GB of additional DLLs, header files and tools onto your Windows 10 machine. Kind of crazy.
Eventually you can do:
pip3 install tts
And check this out:
tts --text "This is another test using tacotron2" --out_path speech_tacotron2.wav --model_name "tts_models/en/ek1/tacotron2"
tts --text "This is another test using tacotron2-DDC" --out_path speech_tacotron2-DDC.wav --model_name "tts_models/en/ljspeech/tacotron2-DDC"
tts --text "This is another test using tacotron2-DDC_ph" --out_path speech_tacotron2-DDC_ph.wav --model_name "tts_models/en/ljspeech/tacotron2-DDC_ph"
tts --text "This is another test using glow-tts" --out_path speech_glow.wav --model_name "tts_models/en/ljspeech/glow-tts"
tts --text "This is another test using speedy-speech" --out_path speech_speedy.wav --model_name "tts_models/en/ljspeech/speedy-speech"
tts --text "This is another test using tacotron2-DCA" --out_path speech_tacotron2-DCA.wav --model_name "tts_models/en/ljspeech/tacotron2-DCA"
tts --text "This is another test using vits" --out_path speech_vits.wav --model_name "tts_models/en/ljspeech/vits"
tts --text "This is another test using fast_pitch" --out_path speech_fast_.wav --model_name "tts_models/en/ljspeech/fast_pitch"
tts --text "This is another test using vits" --out_path speech_vctk_vits.wav --model_name "tts_models/en/vctk/vits"
tts --text "This is another test using fast_pitch" --out_path speech_vctk_fast_pitch.wav --model_name "tts_models/en/vctk/fast_pitch"
tts --text "This is another test using tacotron-DDC" --out_path speech_tacotron.wav --model_name "tts_models/en/sam/tacotron-DDC"
tts --text "This is another test using capacitron-t2-c50" --out_path speech_capacitron-t2-c50.wav --model_name "tts_models/en/blizzard2013/capacitron-t2-c50"
tts --text "This is another test using capacitron-t2-c150" --out_path speech_capacitron-t2-c150.wav --model_name "tts_models/en/blizzard2013/capacitron-t2-c150"
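Typing one command per model gets repetitive, and it is easy to reuse an output file name by accident so that one result overwrites another. Here is a minimal sketch that builds the same kind of tts command lines from a list of model names and derives a unique .wav name from each one. The helper names and the naming scheme are my own assumptions; only the --text, --out_path and --model_name flags from the commands above are used, and the script just prints the commands instead of running them.

```python
import shlex

# A few of the model names from the commands above.
MODELS = [
    "tts_models/en/ljspeech/tacotron2-DDC",
    "tts_models/en/ljspeech/glow-tts",
    "tts_models/en/ljspeech/vits",
    "tts_models/en/vctk/vits",
]


def out_path_for(model_name):
    """Derive a unique file name from the dataset and model parts."""
    # "tts_models/en/vctk/vits" -> "speech_vctk_vits.wav"
    dataset, model = model_name.split("/")[-2:]
    return f"speech_{dataset}_{model}.wav"


def build_command(model_name):
    """Build the argument list for one tts invocation."""
    model = model_name.split("/")[-1]
    return [
        "tts",
        "--text", f"This is another test using {model}",
        "--out_path", out_path_for(model_name),
        "--model_name", model_name,
    ]


# Dry run: print shell-quoted commands, do not execute anything.
for name in MODELS:
    print(shlex.join(build_command(name)))
```

To actually synthesize the files, each argument list could be passed to subprocess.run(cmd, check=True), assuming the tts executable is on the PATH.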
TensorFlowTTS
https://github.com/TensorSpeech/TensorFlowTTS
AprilASR STT
https://github.com/abb128/april-asr
Used in LiveCaptions, a wonderful GNOME application that generates English captions from whatever audio is playing.
OpenAI Whisper STT
https://github.com/openai/whisper
Whisper.cpp STT
A re-implementation of OpenAI's Whisper in C++, which is lighter on resources
https://github.com/ggerganov/whisper.cpp
Mimic3 TTS
https://github.com/MycroftAI/mimic3