Building a Google Photos replacement using your own hardware

If you think about it long enough, you'll figure out that there are a number of components you'll need.

But before we go into all the details, I need to share my view on the best way of organizing a media library. The most important thing: it should use EXIF tags as the primary source of metadata for organization.

Why EXIF tags?

Simply because they can be, or already are, written inside your media files, both photos and videos. That way you won't depend on some specific media library application that stores all your organization entities (which files belong to which events / dates / folders) somewhere deep inside its own proprietary database. We want flexibility and we don't want to be trapped in a vendor lock-in, so we need to make sure we'll be able to migrate from one app to another in the future, when the time comes.

Speaking of EXIF tags, I'm not a big fan of digging into things like ISO, exposure, saturation and all that camera-technical shit. I'm not that kind of crazy photographer person and I don't care about any of that. What I do need is to be able to mark a bunch of photos with some human-readable descriptive tag like "2022.09 - Going far east", so I can later see which tags I have in my library and look only at the photos having a given tag.

We can, for sure, organize our library using the good old folders approach, storing together the photos/videos that belong to the same event. But that approach alone lacks flexibility if we want the same photo/video to appear in more than a single "folder". Or if we want to find all photos/videos made in a particular place. Or if we want to find all the photos we tagged as "retro cars" or "skiing in mountains". So folders might be nice for organizing the physical storage, but not as a way to access your media library.
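To make the difference between folders and tags concrete, here's a tiny Python sketch of tag-based lookup. The file names and tags are made up for illustration, and build_tag_index is a hypothetical helper; in a real library the tags would be read from the metadata embedded in the files rather than typed in by hand.

```python
from collections import defaultdict

# Hypothetical sample data: file path -> tags stored inside the file.
# In a real library these would be read from the embedded metadata.
library = {
    "2022/09/IMG_0001.jpg": ["2022.09 - Going far east", "retro cars"],
    "2022/09/IMG_0002.jpg": ["2022.09 - Going far east"],
    "2021/01/VID_0042.mp4": ["skiing in mountains", "retro cars"],
}


def build_tag_index(files):
    """Invert {path: [tags]} into {tag: sorted list of paths}."""
    index = defaultdict(list)
    for path, tags in files.items():
        for tag in tags:
            index[tag].append(path)
    return {tag: sorted(paths) for tag, paths in index.items()}


tag_index = build_tag_index(library)
# One file can show up under several tags, no matter which folder it lives in.
print(tag_index["retro cars"])
```

The same file sits in exactly one folder on disk, but appears under every tag it carries, which is the flexibility plain folders can't give you.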

Mobile app to view / modify media

First of all, we need a photo gallery application on our mobile phone, as most of our shots are taken with it. We need to navigate through our photos/videos and modify them as needed.

Requirements:

- modify dates of photos / videos
- modify GPS locations for photos / videos
- assign/modify custom EXIF tags for photos / videos
- recognize faces and objects in photos and assign tags for them automatically
- explore your media library on your phone, utilizing all the above - dates, GPS location, custom text tags
- built-in map, so we could use it for navigation
- bulk editing features (to assign GPS, date and text tags to a bunch of items)
- built-in video player
- showing thumbnails for video

The best app I found is the open-source Aves. It's almost perfect, just missing support for modifying dates / tags for videos, but the developer is there, he's super responsive and says this feature is already planned. I raised a couple of small bugs and feature requests and they were all processed. I do admire that guy. And he is among the minority (unfortunately) who really understand how to make a proper user-friendly UI.

The next best thing might have been a proprietary app called Piktures, but it has an idiotic bug they don't acknowledge. And it's proprietary shit. And it's missing half of the features Aves has.

There's also a proprietary Android app called "F-Stop", but it's buggy as hell and the UI is made for goblins. And the free version cannot do much.

So really, there aren't many alternatives to Aves, so we stick with it. Thibault Deckers, mate, you're the best!

A way to sync our media across devices

The Google Photos mobile app does that for you: it uploads photos from the device's internal storage to Google's own servers. So we need a way to do the same ourselves, as we're building a Google Photos replacement.

Ideally you want all your devices to be synced with each other: if you shoot a new photo on your phone, it should appear on your PC and on your tablet. If you later modify that photo on your PC, say you set the proper GPS coordinates your phone "forgot" to set, the photo should be synced back to your phone and tablet. And all of that should work securely over the internet, as you might be away from your home WiFi. This is called two-way sync.
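To illustrate what "two-way" means, here's a deliberately naive Python sketch that compares two snapshots of a folder and decides in which direction each file has to travel. The function name plan_two_way_sync and the sample data are made up, and real sync tools do a lot more than this (deletions, conflicts, partial transfers), so treat it as an illustration of the idea only.

```python
def plan_two_way_sync(left, right):
    """Compare two {path: mtime} snapshots and return a list of
    (path, action) pairs saying which side should receive the file.
    Naive on purpose: the newer copy wins, deletions are not handled."""
    plan = []
    for path in sorted(set(left) | set(right)):
        l, r = left.get(path), right.get(path)
        if l == r:
            continue  # same timestamp on both sides, nothing to do
        if r is None or (l is not None and l > r):
            plan.append((path, "copy left -> right"))
        else:
            plan.append((path, "copy right -> left"))
    return plan


# Hypothetical snapshots: the phone has a new photo, the PC has an
# edited (newer) copy of an older one plus a file the phone lacks.
phone = {"a.jpg": 100, "b.jpg": 200}
pc = {"a.jpg": 150, "c.jpg": 50}
for path, action in plan_two_way_sync(phone, pc):
    print(path, action)
```

A one-way backup would only ever produce "left -> right" actions; the edited a.jpg travelling back to the phone is what makes it two-way.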

In my previous post I explained how this can be achieved using Tor and apps like FolderSync. We need Tor here just to bypass NAT, in case our home PC is running behind the ISP's CGNAT and our own router's NAT. If you have a public ("white") IP address at home (or IPv6), you might not need it.

But there's also an open-source app called Syncthing. You can build a two-way synchronization with it too, and the developers claim it doesn't require anything specific to bypass NAT if you want to sync over the internet. If you're a Windows user, give it a shot with SyncTrayzor, which is a nice GUI wrapper for Syncthing. The setup is quick and easy, so you can share any folders from your Android device and keep them permanently in sync with folders on your PC.

Desktop app to view / modify your media library

This is something that Google Photos doesn't have. Google only offers you a web application. But we want to be able to browse our photo library (which is kept synced to our PC and all other devices, as explained in the previous section) on a PC in the very same convenient way we do on our phone using Aves.

Requirements for this app are the same as for the phone app I listed above.

Given we're on Linux, we can benefit from a variety of open-source apps. I tested a lot of them and the only one I can suggest is DigiKam, so we take it.

Web application to access your photos from a remote PC and share them

This is the last piece of the puzzle. It's optional and might be needed:

- if you're planning to share your photos by giving readonly access to specific albums to your friends & family (by just giving them some URL)

- if you're planning to view your whole media library while being away from your home network

XXXX

Ultimate guide to Linux Photo / Media managers

gThumb 😉

https://gitlab.gnome.org/GNOME/gthumb

(+) UI is good. Minimalistic and sleek.
(+) It recognized my tags on photos I set in Aves
(+) I was able to create "virtual folders", as they call it, based on those tags - with all my photos in it. So it does support a good navigation based on tags
(+) It does support bulk tags assignment
(+) it does support proper sorting, based on Exif information
(+) it does have a mini-map which shows current photo location

(*) It has a built-in video player, but it doesn't quite work for me, so I need to investigate
(*) Video thumbnails are not shown, need to investigate it

(-) Tags on video files are not considered - #249
(-) Doesn't have any support for face recognition
(-) You cannot change GPS location for photos
(-) You cannot use map to navigate through your photos

I tested all that using backported version 3.12.2:
apt install gthumb=3:3.12.2-2~bpo11+1 gthumb-data=3:3.12.2-2~bpo11+1

Initially this package was missing the "map view" plugin. It looked like it simply wasn't enabled because a library hadn't been provided during package creation. I asked the guy who maintains this package in Debian for clarification and he fixed that. So now we have version 3:3.12.2-3~bpo11+1

deepin-album

https://github.com/linuxdeepin/deepin-album

Part of a Chinese GNU/Linux distribution called Deepin.
It doesn't appear in Debian 11 Stable, only in Testing.
And, as always, the version there is a bit behind the current stable version from the developer.

When I checked what the effect of taking that package from testing would be, I was amazed by the number of packages that were about to be upgraded on my system. So I'd rather search for some alternative ways to get it installed. Neither flathub,

shotwell 😉

https://wiki.gnome.org/Apps/Shotwell

Looks very promising. It indexes the folders you tell it to and keeps that index in an SQLite file at ~/.local/share/shotwell/data/photo.db
Plus it also generates thumbnails in ~/.cache/shotwell

Upon startup it asked to process all photos from ~/Pictures, and it took a few minutes to process my library. It immediately picked up all the tags, and you can browse images by dates / events and tags.

(+) very nice UI, like iPhoto
(+) read tags from photos, can navigate using tags
(+) automatically create "events" - virtual folders you can use for navigation
(+) reads photos geodata
(+) reads face tags
(+) can create face tags manually (0.30.16+)

(-) there's no map - no way to navigate photos using map
(-) you cannot modify tags and GPS on videos
(-) thumbnails for videos are not shown
(-) it doesn't have built-in video player
(-) doesn't show dates/GPS/tags for video files at all
(-) doesn't have face recognition
(-) I also noticed it aggressively writes back to files, changing their "last modified" file attribute, even when it's not asked to. For example, on the most recent startup it went and updated a lot of my files, just because it found that these files had got some additional tags I had set in another program. To me it's a bug.
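If you want to check this kind of behavior yourself, one way is to take a snapshot of modification times before starting the app and compare it with a snapshot taken afterwards. A minimal Python sketch; snapshot_mtimes and changed_files are illustrative names, and the ~/Pictures path in the usage comment is just the folder mentioned above:

```python
import os


def snapshot_mtimes(root):
    """Return {relative path: modification time in ns} for every file under root."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = os.stat(full).st_mtime_ns
    return result


def changed_files(before, after):
    """List files present in both snapshots whose mtime differs."""
    return sorted(p for p in before if p in after and before[p] != after[p])


# Usage idea (not executed here):
#   before = snapshot_mtimes(os.path.expanduser("~/Pictures"))
#   ... start the photo manager, let it finish scanning, close it ...
#   after = snapshot_mtimes(os.path.expanduser("~/Pictures"))
#   print(changed_files(before, after))
```

Any path printed at the end was touched while the app was running, even if you never edited anything yourself.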

It looks like it got some additional features in the latest version, but compiling it manually is a pain. Flatpak has version 0.30.16, but it was far from ideal and failed to read GPS from some of my photos properly, while version 0.30.11 from the Debian repo did that just fine.
Update: Flatpak got 0.30.17 version, and it's slightly better.

KPhotoAlbum 😖

https://www.kphotoalbum.org/user-support/3minutetour/

(*) The UI is very "nontraditional", to say the least. You're forced to navigate your library in a step-by-step manner, applying filter on top of filter, using some fullscreen "Pick a filter" window.

(*) The initial indexing process takes ages and runs on all CPU cores. I think the guys could have looked up how the same is done in Shotwell. Subsequent runs are faster, but it still takes too long

(*) it shits all over the place by leaving its internal directories everywhere. It creates .comments directories in all the folders with pictures. It creates a lot of hidden directories and files in the root folder you set it to scan. I don't like this kind of abusive behavior inside my media library

(-) kphotoalbum failed to recognize my custom description EXIF tags
(-) it has no ability to navigate by those EXIF tags
(-) it doesn't write updates you've made to tags down to media files

(?) I haven't tested if it can update geodata in media files, but most probably it cannot. It looks like whatever changes you make are stored outside the files

(+) it can show photos on map
(+) it has built-in video player
(+) it looks like it has some Face tagging features, but they're not very automatic, so you have to process all your images one by one

Geeqie 😏

https://www.geeqie.org/

https://github.com/BestImageViewer/geeqie#features

Quite an advanced application. It's closer to Digikam than any of the apps above. It has a decent number of configurable options, but the UI is full of craziness. Not KPhotoAlbum-level craziness, but it's definitely not Aves or iPhoto.

An interesting note: it looks like it doesn't use any kind of database to index your media. Such an approach has both pros and cons. The main con is that when you enter a folder with a lot of items, it takes a while for Geeqie to load everything. And you won't be able to search for photos across all your folders.

== version 1:1.6-9+deb11u1 (from debian repo) ==

(+) generates thumbnails for video
(-) doesn't have built-in video player
(+) recognized tags on photos
(-) but doesn't show all of them in a single place, so no tag-based navigation
(+) recognized GPS from photo and shows them on map
(-) you cannot pick a photo / number of photos on map to see them closer. Map navigation is very limited
(-) doesn't recognize dates, tags and GPS from video - raised #1034
(+) you can set/update GPS coordinates for photos
(-) crashes a lot
(-) searching is over-complicated

After playing with it for half an hour I found the bugs below:
1) Preferences -> keywords -> search -> fill the list with duplicates
2) Cannot affect the starting folder. App always starts from home folder
3) Pressing "F" to go into fullscreen shows just black screen
4) I have sorting applied by "Exif date original". When I enter a folder with a large number of items for the first time during the session, the very first file gets selected, then all the metadata gets loaded, the sorting is applied, and the file I was standing on ends up in some random place, not the first one. This is somewhat expected from a logic point of view, but very inconvenient. How it should be done instead:
- the file picking pane should be disabled, with some refresh animation shown, while all the images are being loaded
- the focus should not be set on the very first item that loads before all the metadata from that folder has been processed

I decided to give a more recent version a try. Flathub has 1.7.3, so I installed it.

== version 1.7.3 (from flathub) ==

(+) crashes less than 1.6.9
(-) missing mini-map for GPS data (1.6.9 had it) - #996 has workaround
(-) missing thumbnails for video, cannot play video
(-) has the same bugs as listed above, except for the black fullscreen

== 2.0.1 (latest version on github) ==

I haven't tested the latest version (2.0.1), as they don't publish an AppImage for aarch64; I raised #1035 for that. I'll probably return to it one day and compile it from source. But for now I'm moving on.

Geotagging ☝

https://github.com/jmlich/geotagging

Added here as a kind of honorable mention. It's not really a photo / video manager, but it's a nice tool to enrich your photos with GPS data if you had any kind of GPS tracker with you that was writing a .gpx track file.
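The general idea behind geotagging from a track can be sketched in a few lines of Python: take the photo's timestamp, find the two track points around it and interpolate between them. This is only my assumption of how such matching can work, not necessarily what the Geotagging tool does internally; interpolate_position and the sample coordinates are made up, and a real tool would first parse the points out of the .gpx file.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def interpolate_position(track, when):
    """Estimate (lat, lon) at time `when` from a time-sorted list of
    (datetime, lat, lon) track points, using linear interpolation.
    Returns None if `when` is outside the recorded track."""
    times = [t for t, _lat, _lon in track]
    if not track or when < times[0] or when > times[-1]:
        return None
    i = bisect_left(times, when)
    if times[i] == when:
        return track[i][1], track[i][2]
    (t0, lat0, lon0), (t1, lat1, lon1) = track[i - 1], track[i]
    f = (when - t0) / (t1 - t0)  # fraction of the way from point 0 to point 1
    return lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0)


# Hypothetical track: two points ten minutes apart.
track = [
    (datetime(2022, 9, 1, 10, 0, tzinfo=timezone.utc), 50.0, 14.0),
    (datetime(2022, 9, 1, 10, 10, tzinfo=timezone.utc), 50.1, 14.2),
]
photo_time = datetime(2022, 9, 1, 10, 5, tzinfo=timezone.utc)
print(interpolate_position(track, photo_time))
```

The catch in practice is the clock: the camera time and the tracker time have to agree (including the timezone), otherwise every photo lands in the wrong spot.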

Gnome-photos 😡

https://wiki.gnome.org/Apps/Photos
https://gitlab.gnome.org/GNOME/gnome-photos

The current version is 43 and flathub has it, wow! I pay my respects to the gnome-photos developers for integrating their project into flathub properly. There's an issue with the flatpak version of the application though, it doesn't work properly: whenever I click to open a folder with pictures, nothing happens, as if some call to the file picking dialog fails. Yes, I know, the developer left quite extensive instructions on how to troubleshoot it, but I can see from the screenshots that the app doesn't do any of what we listed as our requirements.

gwenview 😑

https://apps.kde.org/gwenview/

(-) Even though it supports tags, it failed to recognize my tags on photos. The tags I set in Aves appear as EXIF / XMP / Subject, but it looks like gwenview uses its own place to store its own tags

(-) you cannot use tags for navigation - like to search by them

(-) all current versions are prone to high memory consumption, which leads to system crashes. I already reported a couple of bugs and they were even fixed, but you need to take the latest DEV version to get those fixes

(-) It doesn't have mini-map, out-of-the-box

Memories / showmypictures 😡

https://anufrij.org/memories/
https://github.com/artemanufrij/showmypictures

Looks nice, was initially made for Elementary OS, but the project looks completely abandoned. Last update was made in 2018.

Debian repository doesn't have it, flathub doesn't have it, appimagehub doesn't have it, snapcraft doesn't have it. Even modern Elementary OS releases don't have it. So skipping it.

Phototonic 😒

https://github.com/oferkv/phototonic/

Last update in 2018 :( project is dead

PhotoQT 💔

https://photoqt.org/
https://gitlab.com/lspies/photoqt

Actively developed, claims to support EXIF tags, video and face tagging. The Debian repository doesn't have it, but the good thing is that flathub has it, and it's the same latest stable version as the one developed on gitlab, so I do pay my respects to the developers for doing packaging properly. I've been testing version 2.9.1

(*) the app doesn't use any database to store metadata. This is what it has in common with Geeqie. So be prepared for limitations

(+) very-very nice looking UI, really one of the best
(+) it can do manual face tagging

(-) sorting of photos only by dates from files and not Exif date - #166
(-) it doesn't show all Exif tags, like Exif subject I set in Aves
(-) you cannot modify Exif dates, Exif tags or GPS coordinates
(-) it cannot navigate using Exif tags, nor find images by tags

(-) it doesn't have a built-in map. If you click on GPS coordinates in the Exif pane, it opens OpenStreetMap in a web browser

Bugs:

- when I assign face tags to a photo, PhotoQT overwrites the existing EXIF tags I had in the file (EXIF Subject). This is scary shit - #164
- (probably my own local bug) it doesn't play video correctly, only the sound was playing
- small UI bug - #165

QView 💀

https://interversehq.com/qview/

Cannot compete at all, as it's just a single-file viewer with no browsing capabilities

Photoprism

https://photoprism.app/

xxx

KDE Pix

https://apps.kde.org/pix/

xxx

KGeoTag

https://apps.kde.org/kgeotag/

https://kgeotag.kde.org/

Mint PIX 💀

https://github.com/linuxmint/pix

Fork of gThumb for Linux Mint. Some minor things are getting polished, but nothing big

Darktable 💔

https://www.darktable.org/

Debian 11 stable has an outdated version, 3.4.1 or so, so we're installing it from flathub. Good to see it has the most recent version, 4.0.1

I played with it for about 20 minutes and the first impression is: WOW! Compared to all the applications above, it's like a super champ. But it has a few major flaws which make it impossible for us to use. Not that they're really flaws, they're more like principles the developers have put into its basic design. I'll list them here:

(-) it doesn't support video in any form, because it's a photo editor.
(-) it deliberately doesn't write any metadata back to files as EXIF. It claims to be a non-destructive app, no matter what. So all the changes are written to sidecar XMP files, which is bullshit.

Digikam 💓💓💓

https://www.digikam.org/
Source code: https://invent.kde.org/graphics/digikam
Forum: https://forum.kde.org/viewforum.php?f=255
Bugzilla: https://bugs.kde.org/buglist.cgi?list_id=2164853&product=digikam&resolution=---

This is a well-known monster I know and love. It can do all the stuff we need.
We could have started with it, but hey, then you would never have looked at the software above, right?

However, there are a few important bits here:

1) Make sure to take at least version 7.3, this is where Digikam started to use exiftool. With it you'll be able to read/write EXIF tags not only in photos but also in videos.

2) Don't take the flatpak version until #44 is solved, as you won't be able to use exiftool. See also this

3) Considering #416516, #436876 and #366348 we might need to wait till version 8 (or build it ourselves), as exiv2 is still being used

4) I did my own tests with version 7.9 compiled from source, and whatever tags I put on my mp4 video files weren't written back to the files by Digikam. I tested the same with the last stable version, 7.8. Raised #459928
I tried to build version 8.0 (from master), but the build was failing.

Overall comparison

app name	tag support	gps support	face recognition	comments
gThumb	Only for photos	Only for photos. Cannot be used for navigation. Cannot update	No	Good, but no metadata support for video
deepin-album	?	?	?	Wasn't tested as it's hard to get
Shotwell	Only for photos	Partially for photos	Will be in 0.31/32	Very promising but currently lacks video support and geotag-based navigation
KPhotoAlbum	No	Yes	Very manual	without proper tag support it's useless for my purposes
Geeqie	Partially, for photos. You can't navigate by them	Partially, for photos. You can't navigate by them	Very manual	Without proper video support and navigation it's useless
Geotagging	?	?	?	Wasn't tested
Gnome-photos	?	?	?	Wasn't tested, as flatpak version doesn't work properly
Gwenview	Not really	No	No	Useless for us
Memories / showmypictures	?	?	?	Wasn't tested as it seems to be abandoned in 2018. Screenshots are awesome, like iPhoto :)
Phototonic	?	?	?	Wasn't tested as it seems to be abandoned in 2018
PhotoQT	Not really	No	Manual tagging with overriding your existing EXIF tags because of bug	Looks nice, but has some critical bugs
QView	tag_support	gps_support	faces_recognition	comments
Photoprism	tag_support	gps_support	faces_recognition	comments
KDE Pix	tag_support	gps_support	faces_recognition	comments
KGeoTag	tag_support	gps_support	faces_recognition	comments
Mint PIX	tag_support	gps_support	faces_recognition	comments
Darktable	Only for photos	Only for photos	?	High quality app but cannot write metadata back to photos and lacks support for any video files at all
DigiKam	Yes	Yes	Yes	The most feature-rich program

Exploring open-source android applications

Today I have just a few words for you. They are ...

Aves

URL: https://github.com/deckerst/aves

This is basically the best media gallery you can find for Android. Not only because it's open source, but because it's actually one of the most feature-rich applications, and it's being actively developed. What it can do now:

- read GPS data from your photos / videos and display them on a map, like Google Photos does. But unlike Google Photos, Aves doesn't leak your data anywhere outside of your phone. You can then use the map to find which photos you've taken in specific places.

- update GPS data (so far only for photos). Typical use case: if you have a bunch of photos shared by your friends via messengers (like WhatsApp), they're usually missing GPS data. With Aves you can set it to whatever you need

- assign any custom free-text tags to your photos and search photos by them. Typical example: instead of putting your photos into different directories, you can assign one or more tags to them, like "2022 - Trip to Europe" or "Outdoor", and then you'll be able to see all the photos carrying a given tag. It's a much more convenient and flexible way of organizing your media library, given the growing support for these tags in desktop applications like DigiKam.

- fix date & time on your photos

More features are coming:

- object and face recognition (offline! no Google involved)

- assign GPS and tags for videos

Speaking of features, the closest application I could find was an app named "Piktures", but it's proprietary, paid, and it has an idiotic bug which basically prevents you from using the free version. I reported it to the developers, but they were too lazy to fix it.

LibreTube

URL: https://github.com/libre-tube/LibreTube

Google is known for its controversial habit of artificially limiting features in its free apps and unlocking them only if you pay a subscription fee. Take the YouTube mobile app: without paying, you won't be able to download videos to watch them later when you have no internet connection, or to listen to videos in the background with your phone screen turned off.

But that's not the case if you're using LibreTube. Not only does it offer these features for free, out of the box, but it also lets you watch YouTube videos without annoying ads and without being registered at all. You can still have your subscription lists and watch history, but they'll be stored locally, on your phone.

I think it's just awesome!

Speech Recognition and Text-to-Speech using ML. Part 2: going offline

In the first part we set up a very simple but efficient voice assistant which can listen to your commands and react to them. It starts up as soon as internet and audio become available in the system. And yes, this is the limitation of our very first implementation: it needs internet, as it uses Google's TTS and STT engines.

But this time we'll try to replace them with something that can be executed 100% offline. And I have a couple of candidates, apart from the ones we talked about before

Mozilla Deep Speech

The most recent version of DeepSpeech right now is v0.9.3 and its aarch64 wheel is built for Python 3.7, whereas we're running 3.9 on our Debian. I found that out when I attempted to install deepspeech-0.9.3-cp37-cp37m-linux_aarch64.whl and it gave me this confusing message:

pi@orangepi4-lts:~/projects $ pip3 install https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-cp37-cp37m-linux_aarch64.whl

ERROR: deepspeech-0.9.3-cp37-cp37m-linux_aarch64.whl is not a supported wheel on this platform.

As I figured out, I should have checked with the --verbose flag; then it became more apparent that it was attempting to install things under 3.7 directories, whereas the current version of Python was 3.9. Python is such a mess.
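The mismatch is actually visible in the wheel's file name itself: the cp37 part is the Python tag. Here's a small Python sketch that pulls that tag out and compares it with the interpreter you're running. The helper names wheel_python_tag and running_cpython_tag are mine, and the comparison assumes plain CPython; pip's real compatibility check is more involved (it also looks at the ABI and platform tags).

```python
import sys


def wheel_python_tag(filename):
    """Return the Python tag (e.g. 'cp37') from a wheel file name.

    Wheel names follow name-version[-build]-pythontag-abitag-platform.whl,
    so the Python tag is always the third field from the end."""
    stem = filename.rsplit("/", 1)[-1]
    if not stem.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    return stem[: -len(".whl")].split("-")[-3]


def running_cpython_tag():
    """Tag of the interpreter running this script, e.g. 'cp39' (CPython assumed)."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


wheel = "deepspeech-0.9.3-cp37-cp37m-linux_aarch64.whl"
print(wheel_python_tag(wheel), "vs", running_cpython_tag())
```

If the two tags differ, as with cp37 against Python 3.9 here, pip refuses the wheel with the "not a supported wheel on this platform" message.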

As the Debian repo doesn't have 3.7, I decided to build it myself. It's pretty straightforward: download -> configure -> make -> make install. I just followed the very first hit I found on Google and it went just fine.

To be continued

https://deepspeech.readthedocs.io/en/v0.9.3/

coqui-ai TTS and STT

https://github.com/coqui-ai

https://tts.readthedocs.io/en/latest/

Coqui-Ai TTS - Text to speech

I was playing with TTS (text-to-speech) using my Intel Atom Z8350 tablet - Asus Transformer Book T102HA, as I'm on vacation right now, away from my little SBC device at home. It was almost a smooth ride. I say "almost" because on Windows 10 it took me a while to figure out the two main dependencies:

1) you need to have python 3.9, not 3.10

2) you need to install the Windows 10 SDK and build tools using the "Visual Studio Installer". And that thing brings like 5 additional GB of various DLLs, header files and tools to your Windows 10. Kind of crazy.

Eventually you can do:

pip3 install tts

And check this out:

tts --text "This is another test using tacotron2" --out_path speech_tacotron2.wav --model_name "tts_models/en/ek1/tacotron2"

tts --text "This is another test using tacotron2-DDC" --out_path speech_tacotron2-DDC.wav --model_name "tts_models/en/ljspeech/tacotron2-DDC"

tts --text "This is another test using tacotron2-DDC_ph" --out_path speech_tacotron2-DDC_ph.wav --model_name "tts_models/en/ljspeech/tacotron2-DDC_ph"

tts --text "This is another test using glow-tts" --out_path speech_glow.wav --model_name "tts_models/en/ljspeech/glow-tts"

tts --text "This is another test using speedy-speech" --out_path speech_speedy.wav --model_name "tts_models/en/ljspeech/speedy-speech"

tts --text "This is another test using tacotron2-DCA" --out_path speech_tacotron2-DCA.wav --model_name "tts_models/en/ljspeech/tacotron2-DCA"

tts --text "This is another test using vits" --out_path speech_vits.wav --model_name "tts_models/en/ljspeech/vits"

tts --text "This is another test using fast_pitch" --out_path speech_fast_.wav --model_name "tts_models/en/ljspeech/fast_pitch"

tts --text "This is another test using vits" --out_path speech_vctk_vits.wav --model_name "tts_models/en/vctk/vits"

tts --text "This is another test using fast_pitch" --out_path speech_vctk_fast_pitch.wav --model_name "tts_models/en/vctk/fast_pitch"

tts --text "This is another test using tacotron-DDC" --out_path speech_tacotron.wav --model_name "tts_models/en/sam/tacotron-DDC"

tts --text "This is another test using capacitron-t2-c50" --out_path speech_capacitron-t2-c50.wav --model_name "tts_models/en/blizzard2013/capacitron-t2-c50"

tts --text "This is another test using capacitron-t2-c150" --out_path speech_capacitron-t2-c150.wav --model_name "tts_models/en/blizzard2013/capacitron-t2-c150"

This will download a number of different voice and vocoder models (if something fails for you, check the output of the tts --list_models command first and adjust the above) and synthesize the phrase given in the --text parameter using each of them.

You can then go and listen to the result using your favorite media player, or download SoX and use its play command.

Given I was getting very unstable performance figures (how long it takes to generate the same sentence using different combinations of models/vocoders), I drafted a very simple script to try them all out multiple times, so I'd have more data to compare - https://github.com/kha84/tts-comparison
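The idea of such a comparison can be sketched in Python like this. To be clear, this is not the script from the repository above, just a minimal illustration of the approach: run the same tts command several times per model and look at the spread. The helper names time_command, tts_command and summarize are made up; the command line itself is the same one used in the examples above.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=3):
    """Run `cmd` (a list of arguments) `runs` times and return the
    wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def tts_command(model_name, text, out_path):
    """Build the same `tts` command line used above for one model."""
    return ["tts", "--text", text, "--out_path", out_path,
            "--model_name", model_name]


def summarize(durations):
    """Return (mean, minimum, maximum) of a list of durations."""
    return statistics.mean(durations), min(durations), max(durations)


# Usage idea (not executed here, it needs the tts package installed):
#   cmd = tts_command("tts_models/en/ljspeech/glow-tts", "Hello", "out.wav")
#   print(summarize(time_command(cmd, runs=5)))
```

Keep in mind that the first run for each model also includes the model download, so it's better to throw it away or do a warm-up run before measuring.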

I figured out that, for my hardware, the best balance between performance and pronunciation accuracy was tts_models/en/ljspeech/glow-tts

Tensor Flow TTS

https://github.com/TensorSpeech/TensorFlowTTS

AprilASR STT

https://github.com/abb128/april-asr

Used in LiveCaptions, a wonderful application for GNOME that generates English captions from the audio being played

OpenAI Whisper STT

https://github.com/openai/whisper

Whisper.CPP STT

A re-implementation of OpenAI's Whisper in C++, which is lighter on resources

https://github.com/ggerganov/whisper.cpp

Mimic3 TTS

https://github.com/MycroftAI/mimic3

MaryTTS

http://mary.dfki.de/

Speech Recognition and Text-to-Speech using ML. Part 1: Build your own Alexa / Siri

As you know, one of the project goals is to have our own "Alexa" or "Siri" replacement on our little SBC. A lot of work has already been done by a number of talented folks, so we won't be shy to just take the results of their work and reuse them in our favor.

But let's start with some very basic stuff, like making sure we can record and play back audio. Then we'll play with a couple of entirely offline text-to-speech programs.

Check some basic stuff

arecord / aplay

These two guys worked for me out of the box. Recording happens via the internal microphone (the quality is so-so), playback goes via the currently selected device in PulseAudio, in my case through a TV connected over HDMI

arecord -d 5 -r 48000 test.wav
aplay test.wav

Text-to-speech

espeak

Espeak has been around for a long while. It sounds very robotic, no matter what kind of voice you choose.

apt install espeak

espeak "Hello there!"

espeak --voices

espeak -v en-gb "Text will be converted to speech."

Festival

Festival is also a long-standing player. It also sounds very robotic by default, but we can install some additional "language packages" which make it a bit better. There are also commercial packages on the market, but they're not as good as their price tag

apt install festival festvox-ru festvox-en1 festvox-us1 festvox-us2 festvox-us3 festlex-cmu festlex-poslex

The commands below must be executed as an unprivileged (non-root) user:

echo "This is the best as it can get" | festival --tts

echo '(SayText "This is the best as it can get")' | festival --pipe

I can clearly see when software was designed by morons. Speaking of Festival, you cannot easily switch between different voice packs: you have to update configs or use interactive mode:

festival
(voice.list)
(voice_us1_mbrola)
(SayText "This is now a different voice")
(voice_us2_mbrola)
(SayText "This is now a different voice")
(voice_us3_mbrola)
(SayText "This is now a different voice")
(voice_en1_mbrola)
(SayText "This is now a different voice")
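If typing into the interactive prompt is inconvenient, the same expressions could in principle be fed to festival --pipe, the way the SayText example above is. Here's a minimal Python sketch of that idea. It rests on an untested assumption that --pipe evaluates a voice-selection expression like (voice_us1_mbrola) the same way the interactive session does; festival_script and say_with_voice are illustrative names, not part of Festival.

```python
import subprocess


def festival_script(voice, text):
    """Build the Scheme snippet: select a voice, then speak the text."""
    # Escape backslashes and double quotes so the text stays one string literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '({})\n(SayText "{}")\n'.format(voice, escaped)


def say_with_voice(voice, text):
    """Pipe the snippet into `festival --pipe` (assumes festival is installed)."""
    subprocess.run(["festival", "--pipe"],
                   input=festival_script(voice, text), text=True, check=True)


print(festival_script("voice_us1_mbrola", "This is now a different voice"))
```

Calling say_with_voice("voice_us2_mbrola", "Hello") would then speak with a specific voice without touching any config, provided the assumption about --pipe holds on your system.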

If you want some voice to become the default one, edit /usr/share/festival/siteinit.scm and search for the line with voice_default

Watch this video, where a nice guy explains how to download more voices - https://www.youtube.com/watch?v=Ju_X11JyRSE

I checked the website he mentioned; all the voice packs are as old as dinosaurs' poo, so to me it doesn't really make sense to spend any additional time on it

pico2wave

I got tired of those robotic voices. So I just looked up a couple of YouTube videos to hear how it sounds and wasn't really impressed. So skipping it.

Google text-to-speech engine

If you're on Debian, don't install it from the repo. The version from the repo is outdated and doesn't work anymore. Install it via pip, which will pull a much fresher version:

pip install gTTS

Let's play with it now:

~/.local/bin/gtts-cli 'This is the test from Google text to speech services' --output /tmp/test.mp3
ffplay -nodisp -autoexit /tmp/test.mp3

Well, the quality is just incredible. I'll probably go online :) But we need to understand that Google can close this backdoor at any time, so if we're building something we don't want to keep maintaining all the time, we'll need a fallback option, like one of the above. So far I've just made a simple and stupid wrapper; we'll extend it later on, if needed, to analyze the exit code from gtts-cli and switch to something different, like espeak:

cat /usr/local/bin/sayit
#!/bin/sh
# Speak the text given as the first argument using Google TTS
if [ -z "$1" ]
then
echo "Error: no input parameter is given" >&2
exit 1
fi
# temporary file for the synthesized speech
tfile="$(mktemp /tmp/tts.XXXXXXXXX.mp3)" || exit 1
~pi/.local/bin/gtts-cli "$1" --output "$tfile"
ffplay -nodisp -autoexit "$tfile"
rm "$tfile"
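The fallback idea (check whether gtts-cli succeeded and switch to espeak if it didn't) could look roughly like this in Python. This is a sketch, not the wrapper actually used here: run_ok and say are made-up names, and it assumes gtts-cli, ffplay and espeak are all reachable via PATH.

```python
import os
import subprocess
import tempfile


def run_ok(cmd):
    """Return True if the command starts and exits with code 0."""
    try:
        return subprocess.run(cmd).returncode == 0
    except OSError:
        # The program is not installed or cannot be started.
        return False


def say(text, run=run_ok):
    """Speak `text` with gTTS, falling back to espeak when gTTS fails."""
    fd, mp3 = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)
    try:
        if run(["gtts-cli", text, "--output", mp3]):
            return run(["ffplay", "-nodisp", "-autoexit", mp3])
        # gTTS failed (no internet, service changed, ...): go offline.
        return run(["espeak", text])
    finally:
        os.remove(mp3)
```

The run parameter is there only so the logic can be tried out without any of the tools installed, by passing in a fake function instead of run_ok.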

Speech recognition

Let's start with something simple like this project - https://github.com/ProgrammingHero1/romantic-alexa

I've made a small fork of it - https://github.com/kha84/pgttsjaisrar - fixing some obvious issues, swapping the ugly text-to-speech Python lib for our lovely shell script above, and making it run as a systemd service. And, hell, it works just fine out of the box.

Stay tuned for part 2, where we're going to make it work offline - https://orange-pi-4-lts.blogspot.com/2022/09/speech-recognition-and-text-to-speech_3.html