Speech Recognition and Text-to-Speech using ML. Part 1: Build your own Alexa / Siri

As you know, one of the project goals is to run our own "Alexa" or "Siri" replacement on our little SBC. A number of talented folks have already done a lot of the work, so we won't be shy about taking the results and reusing them.

But first, let's start with some very basic stuff: making sure we can record and play back audio. Then we'll play with a couple of entirely offline text-to-speech programs.

Check some basic stuff

arecord / aplay

These two worked for me out of the box. Recording uses the internal microphone (the quality is so-so); playback goes through the currently selected device in PulseAudio, in my case a TV connected over HDMI.

arecord -d 5 -r 48000 test.wav
aplay test.wav
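
One thing worth knowing: when no -f option is given, arecord records 8-bit (U8) samples, so it may be worth spelling the format out before blaming the microphone. Here is a minimal sketch, assuming the stock alsa-utils tools; the function name mic_check and the default output path are made up for illustration:

```shell
#!/bin/sh
# mic_check: record a short clip with an explicit sample format, then play it back.
# The function name and the default output path are illustrative only.
mic_check() (
  seconds="${1:-5}"
  out="${2:-/tmp/mic_check.wav}"
  # -d: duration in seconds
  # -f S16_LE: signed 16-bit samples (arecord uses 8-bit U8 when -f is omitted)
  # -c 1: mono, -r 48000: 48 kHz sample rate
  arecord -d "$seconds" -f S16_LE -c 1 -r 48000 "$out" || exit 1
  aplay "$out"
)

# If the defaults pick the wrong hardware, list what ALSA can see:
#   arecord -l
#   aplay -l
# Then run a 3-second test:
#   mic_check 3
```

If the 16-bit recording sounds the same as before, the so-so quality is probably down to the microphone itself.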

 

Text-to-speech

espeak

Espeak has been around for a long while. It sounds very robotic, no matter which voice you choose.

apt install espeak

espeak "Hello there!"

espeak --voices

espeak -v en-gb "Text will be converted to speech."
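
Besides the voice, espeak also lets you change the speed and pitch, and can write a WAV file instead of playing through the speakers. A small sketch, assuming the standard -s, -p and -w options; the function name, the default path and the chosen values are arbitrary picks for illustration, not settings I'd call tuned:

```shell
#!/bin/sh
# espeak_to_wav: render text to a WAV file instead of playing it.
# Function name, default path and the speed/pitch values are illustrative.
espeak_to_wav() (
  text="$1"
  out="${2:-/tmp/espeak.wav}"
  voice="${3:-en-gb}"
  # -v: voice, -s: speed in words per minute, -p: pitch (0-99), -w: output WAV file
  espeak -v "$voice" -s 140 -p 40 -w "$out" "$text"
)

# Usage:
#   espeak_to_wav "Text will be converted to speech." /tmp/hello.wav
#   aplay /tmp/hello.wav
```

Slowing it down a bit makes it easier to follow, but don't expect it to stop sounding like a robot.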

Festival

Festival is also a long-standing player. It also sounds very robotic by default, but we can install some additional "language packages" which make it a bit better. There are also commercial packages on the market, but they're not as good as their price tag suggests.

apt install festival festvox-ru festvox-en1 festvox-us1 festvox-us2 festvox-us3 festlex-cmu festlex-poslex

The commands below must be run as an unprivileged (non-root) user:

echo "This is the best as it can get" | festival --tts

echo '(SayText "This is the best as it can get")' | festival --pipe

I can clearly see when software was designed by morons. With Festival you cannot easily switch between voice packs: you have to update configs or use interactive mode:

festival
(voice.list)
(voice_us1_mbrola)
(SayText "This is now a different voice")
(voice_us2_mbrola)
(SayText "This is now a different voice")
(voice_us3_mbrola)
(SayText "This is now a different voice")
(voice_en1_mbrola)
(SayText "This is now a different voice")
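
A possible way around the interactive session is to pipe the voice-selection command in ahead of SayText, the same way as the --pipe example earlier. This is a sketch only: the function names are made up, and I'm assuming the voice packs installed above and that Festival accepts backslash-escaped quotes inside a string.

```shell
#!/bin/sh
# festival_scheme: print the Scheme commands that select a voice and speak a text.
#   $1 = voice name without the "voice_" prefix (e.g. us1_mbrola)
#   $2 = text to speak
# Both function names are illustrative only.
festival_scheme() {
  # Escape backslashes and double quotes so the text stays a single Scheme string
  escaped=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '(voice_%s)\n(SayText "%s")\n' "$1" "$escaped"
}

# festival_say: feed those commands to Festival on stdin.
festival_say() {
  festival_scheme "$1" "$2" | festival --pipe
}

# Usage:
#   festival_say us1_mbrola "This is now a different voice"
#   festival_say en1_mbrola "This is now a different voice"
```

Keeping the Scheme generation in its own function means you can print the commands and check them before Festival ever sees them.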

If you want a voice to become the default, edit /usr/share/festival/siteinit.scm and search for the line containing voice_default.

Watch this video, where a nice guy explains how to download more voices - https://www.youtube.com/watch?v=Ju_X11JyRSE

I checked the website he mentioned; all the voice packs are as old as dinosaurs' poo, so to me it doesn't really make sense to spend any more time on it.

pico2wave 

I got tired of those robotic voices, so I just watched a couple of YouTube videos to hear how it sounds and wasn't really impressed. Skipping it.

Google text-to-speech engine

If you're on Debian, don't install it from the repo. The version in the repo is outdated and doesn't work anymore. Install it via pip, which pulls a much fresher version:

pip install gTTS
 
Let's play with it now:
~/.local/bin/gtts-cli 'This is the test from Google text to speech services' --output /tmp/test.mp3
ffplay -nodisp -autoexit /tmp/test.mp3

Well, the quality is just incredible. I'll probably go online :) but we need to understand that Google can close this backdoor at any time, so if we're building something we don't want to maintain all the time, we'll need a fallback option, like one of the above. So far I've just made a simple and stupid wrapper; we'll extend it later on, if needed, to check the exit code from gtts-cli and switch to something different, like espeak:

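cat /usr/local/bin/sayit
#!/bin/sh
# Speak the first argument through Google TTS
if [ -z "$1" ]
then
  # Report the error on stderr and exit with a non-zero status
  echo "Error: no input parameter is given" >&2
  exit 1
fi
# Temporary file for the generated MP3
tfile="$(mktemp /tmp/tts.XXXXXXXXX.mp3)" || exit 1
~pi/.local/bin/gtts-cli "$1" --output "$tfile"
ffplay -nodisp -autoexit "$tfile"
rm -f "$tfile"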
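
The fallback idea could look roughly like the sketch below. It is not the final wrapper, just an illustration: the function name sayit_fallback and the GTTS_CLI override variable are my own inventions, and I'm assuming gtts-cli returns a non-zero exit code when the online service is unreachable.

```shell
#!/bin/sh
# sayit_fallback: try Google TTS first, fall back to offline espeak if it fails.
# The function name and the GTTS_CLI variable are illustrative only.
sayit_fallback() (
  if [ -z "$1" ]; then
    echo "Error: no input parameter is given" >&2
    exit 1
  fi
  gtts="${GTTS_CLI:-$HOME/.local/bin/gtts-cli}"
  # No .mp3 suffix here, so the template also works with non-GNU mktemp;
  # ffplay detects the format from the file contents.
  tfile="$(mktemp /tmp/tts.XXXXXXXXX)" || exit 1
  # Treat a non-zero exit code or an empty output file as a failure
  if "$gtts" "$1" --output "$tfile" && [ -s "$tfile" ]; then
    ffplay -nodisp -autoexit "$tfile"
    status=$?
  else
    echo "gtts-cli failed, falling back to espeak" >&2
    espeak "$1"
    status=$?
  fi
  rm -f "$tfile"
  exit "$status"
)

# Usage:
#   sayit_fallback "This is the test from Google text to speech services"
```

The whole body runs in a subshell, so the exit calls end only the function and not the shell that called it.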

Speech recognition

Let's start with something simple, like this project - https://github.com/ProgrammingHero1/romantic-alexa
 

I've made a small fork of it - https://github.com/kha84/pgttsjaisrar - fixing some obvious issues, swapping the ugly text-to-speech Python lib for our lovely shell script above, and making it run as a systemd service. And, hell, it works well out of the box.


Stay tuned for part 2, where we're going to make it work offline - https://orange-pi-4-lts.blogspot.com/2022/09/speech-recognition-and-text-to-speech_3.html
