How to Lichess Voice

14 Jun 2023

Straight Talk from the Top Expert on all of the Things

Lichess voice recognition is the preeminent way to play chess while woodworking, welding, performing delicate surgical procedures, or folding laundry. If you have a microphone, why not give it a try?

Unlike Alexa or Siri, where voice data is stored remotely and is ripe for hacking and misuse, we do everything on your device. Not a single audio file is created. The Lichess server doesn't know whether you make moves via speech, mouse, keyboard, or touch. It stores a single account preference (true or false) for visibility of the voice UI. That is all we know.

Getting started

First you need a free lichess.org account, so get one and log in. Enable voice input in Preferences -> Game Behavior or use the hamburger menu in the move list button row on any game page or the main puzzles page. This setting does not turn your microphone on, but it does toggle display of the voice button that lets you do so.

Let's turn that sucker on, shall we? It is time to name some squares in the coordinate trainer.

The first time you click the voice button, your browser will ask whether lichess.org can use your microphone. Grant permission and a 40 MB download of the speech software will begin. This only happens once (unless you clear your cache). Feel free to do a few rounds with the keyboard while you wait. Once the voice button pulses blue and says Listening, try saying "start". And this time, speak the squares as they light up.

Pretty cool, right? How many can you get in 30 seconds? Probably not as many as this guy.

Voice settings for games and puzzles

Sure, you can stop reading this now and start making voice moves in games and puzzles. But for the best results, you should know a bit about how it works.

Voice recognition is tricky business. Sometimes you say "a8" but the recognizer just hears "8". Maybe your move has a "b", but we hear "c", "d", or "e". Or you call out "green" but we think it's "queen". Across all mics, voices, accents, and background noise levels we sampled, the recognition accuracy for chess phrases is roughly 70%. To make this work, we need a few tricks up our sleeves.

Some of you participated in the crowdsourced voice data gathering operation where Howard the octopus asked you to speak 50 chess phrases, colors, and assorted commands. We used that data to build substitution mappings based on hard statistics, and we use those mappings to fuzzy match every heard phrase to one or more commands. When we're not sure (which happens a lot), we give you multiple arrows to choose from. Here's what it looks like:

We call this disambiguation. Notice the white outline around the green arrow. That outline always indicates the preferred move when we can identify a single best guess for your command. By default this move is played once the sweeping arc (shown on the d2 square above) becomes a full circle. End the countdown early by saying "yes" to play the outlined move, "pink", "blue", "red", etc. to play a different move, or "no" to cancel and clear the arrows. You can also configure the timer duration, disable the timer, or switch to numbered arrows if colors are tough to tell apart. Use the voice gear menu to configure these options. Find it near the top right of your browser window in landscape layouts.
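If it helps to see the countdown rules as code, here is a minimal sketch. It is not the Lichess implementation. The function name, the action strings, and the internal "timeout" event are made up for illustration, and the arrow labels could just as well be numbers.

```python
from typing import Dict, Optional, Tuple

def resolve_countdown(
    command: str, arrows: Dict[str, str], preferred: str
) -> Tuple[str, Optional[str]]:
    """Decide what a word heard during the countdown does.

    arrows maps an arrow label (a color or a number) to a move.
    preferred is the label of the outlined arrow.
    Returns (action, move) where action is "play", "cancel", or "ignore".
    """
    # "yes" ends the countdown early; "timeout" is a hypothetical internal
    # event fired when the sweeping arc becomes a full circle.
    if command in ("yes", "timeout"):
        return ("play", arrows[preferred])
    # Naming another arrow plays that move instead.
    if command in arrows:
        return ("play", arrows[command])
    # "no" cancels and clears the arrows.
    if command == "no":
        return ("cancel", None)
    # Anything else is outside the countdown vocabulary.
    return ("ignore", None)
```

With arrows such as `{"green": "e2e4", "pink": "d2d4"}` and green as the preferred label, saying "pink" plays d2d4, while a full move phrase like "knight f3" is ignored until the arrows are cleared.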

The clarity slider controls how many disambiguation arrows you are shown. It adjusts the threshold for error tolerance between the phrase that we heard and the move phrases we match it with. Select fuzzy to match more moves, clear to match only one, or normal for something in between. There is no real downside to choosing fuzzy when in timer mode, but you might prefer normal or clear if you frequently make the preferred move.
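For the curious, the idea behind substitution mappings and the clarity threshold can be sketched as a weighted edit distance. This is only an illustration under stated assumptions: the cost values, the per-clarity limits, and every name below are invented, and the real matcher is surely more involved.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical substitution costs: the cheaper the pair, the more often
# the recognizer is assumed to confuse the two words.
SUB_COSTS: Dict[Tuple[str, str], float] = {
    ("b", "d"): 0.2,
    ("d", "e"): 0.5,
    ("green", "queen"): 0.2,
}
GAP_COST = 1.0  # assumed cost of a dropped or extra word

# Hypothetical (max cost, max matches) per clarity setting.
CLARITY: Dict[str, Tuple[float, Optional[int]]] = {
    "fuzzy": (0.6, None),
    "normal": (0.25, None),
    "clear": (0.6, 1),
}

def sub_cost(heard: str, intended: str) -> float:
    if heard == intended:
        return 0.0
    return SUB_COSTS.get((heard, intended), SUB_COSTS.get((intended, heard), 1.0))

def phrase_cost(heard: List[str], phrase: List[str]) -> float:
    """Weighted edit distance between the heard words and a move phrase."""
    rows, cols = len(heard) + 1, len(phrase) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * GAP_COST
    for j in range(1, cols):
        dist[0][j] = j * GAP_COST
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + GAP_COST,  # heard an extra word
                dist[i][j - 1] + GAP_COST,  # missed a word
                dist[i - 1][j - 1] + sub_cost(heard[i - 1], phrase[j - 1]),
            )
    return dist[-1][-1]

def match_moves(heard: List[str], phrases: List[List[str]], clarity: str) -> List[List[str]]:
    """Return the phrases close enough to what was heard, best match first."""
    max_cost, max_matches = CLARITY[clarity]
    ranked = sorted(((phrase_cost(heard, p), p) for p in phrases), key=lambda cp: cp[0])
    within = [p for cost, p in ranked if cost <= max_cost]
    return within[:max_matches]
```

Say the recognizer hears "d 4" and the legal phrases are "b 4", "d 4", "e 4", and "h 4". With these made-up costs, fuzzy keeps three candidates, normal keeps two, and clear keeps only "d 4".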

The timer slider controls the countdown duration. Choose the leftmost setting to disable it entirely, but be careful. When the timer is off and clarity is set to clear, the preferred move will be played immediately. This can cause misplays due to recognition error as many of the chessboard files sound alike. Use the phonetic alphabet to differentiate them or enable move confirm. Unless your pronunciation is flawless and the recognizer hears perfectly, playing clear with no timer and no move confirm is dangerous.
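The dangerous combination boils down to a single check. Here is a tiny sketch of it; the `VoiceSettings` class, its field names, and the default values are hypothetical and only mirror the options described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VoiceSettings:
    clarity: str = "normal"      # "fuzzy", "normal", or "clear"
    timer_seconds: float = 3.0   # 0 means the countdown is disabled (assumed default)
    move_confirm: bool = False

def plays_without_review(settings: VoiceSettings) -> bool:
    """True when a heard move is played with no countdown and no confirmation."""
    return (
        settings.clarity == "clear"
        and settings.timer_seconds == 0
        and not settings.move_confirm
    )
```

Any one of a timer, a fuzzier clarity, or move confirm is enough to give you a chance to catch a misheard move.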

The fastest way to play error-free is at fuzzy clarity with a timer. The available vocabulary is limited to arrow selection during countdowns, but the recognition latency is much lower so "yes", "no", colors, and numbers are processed in the blink of an eye. Lower quality microphones, non-native accents, and a bit of noise shouldn't hold you back. The only downside is the limited countdown vocabulary: you must first say "no" to clear the timer before you can issue a new command.

Lastly, there is push to talk. When this setting is enabled, you speak commands while holding the Shift key. The microphone is ignored when Shift is not pressed. lichess.org must be the foreground window to capture key presses, and of course it's no longer hands-free. Push to talk is most useful when you have company or wish to stream some voice chess and talk to your stream viewers without triggering an arrow apocalypse.

Vocabulary in games and puzzles

There are many ways to speak your moves, and the disambiguation system makes voice input forgiving for beginners. If a command is exploratory or ambiguous (such as "bishop" when both your bishops are still on the board), we show indefinite (untimed) arrows or selection circles. With no preferred move and thus no countdown, any arrows or circles remain on the board until dismissed and the complete move vocabulary remains available. You can say "bishop" (lighting up a bunch of squares), then immediately say "g8 promote queen" after changing your mind.

Selection circles (pictured above) choose a single piece when there are multiple options. Name the circle's color (or number) to select it, showing up to eight move arrows. If more than eight squares can be reached, we show a big gray dot on each. Name the destination square to complete your move when dots are shown.
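The arrows versus dots rule can be sketched in a few lines. Treat this as an illustration only: the function name and the color list are assumptions (the post names just green, pink, blue, and red), and the real labels may differ.

```python
from typing import Dict, List, Tuple

MAX_ARROWS = 8
# Hypothetical palette; only the first four colors are named in this post.
ARROW_COLORS = ["green", "pink", "blue", "red", "yellow", "orange", "purple", "grey"]

def destination_hints(
    destinations: List[str], numbered: bool = False
) -> Tuple[str, Dict[str, str]]:
    """Map each spoken label to a destination square for the selected piece."""
    if len(destinations) > MAX_ARROWS:
        # Too many targets for arrows: show dots and ask for the square name.
        return "dots", {square: square for square in destinations}
    if numbered:
        labels = [str(i + 1) for i in range(len(destinations))]
    else:
        labels = ARROW_COLORS
    # zip stops at the shorter list, so unused colors are simply dropped.
    return "arrows", dict(zip(labels, destinations))
```

A knight with two legal squares gets two arrows, while a centralized queen with more than eight targets gets dots and is moved by naming the destination square.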

Here are some examples of allowed phrases:

"pawn" - Show pawn moves or selection circles on every pawn
"takes" - Up to eight available captures of opponent pieces
"knight" - Available knight moves, or moves that capture an opponent knight
"take the rook" - Moves that capture an opponent rook
"e4" - If occupied, select the piece on that square. If unoccupied, show pieces that can move to that square. Pawn moves are always preferred for unoccupied squares as in SAN.
"bishop e7" - Move your bishop to e7
"bishop takes knight" - Take a knight with your bishop
"rook 1 a1" - Move your rank 1 rook to the a1 square
"g8 promote queen" - Move your pawn to g8 and promote to queen
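To make the shape of these phrases concrete, here is a rough sketch of how they might be broken into parts. It is an assumption-laden toy: `VoiceCommand`, `parse_phrase`, and the field names are invented, and it does not look at the board, so it cannot decide whether a lone "knight" means your knight or a capture of theirs.

```python
import re
from dataclasses import dataclass
from typing import Optional

PIECES = {"pawn", "knight", "bishop", "rook", "queen", "king"}
CAPTURE_WORDS = {"take", "takes"}
SQUARE = re.compile(r"[a-h][1-8]")
HINT = re.compile(r"[a-h1-8]")  # a lone file or rank, as in "rook 1 a1"

@dataclass
class VoiceCommand:
    piece: Optional[str] = None      # the piece that moves
    hint: Optional[str] = None       # file or rank that picks among like pieces
    capture: bool = False
    target: Optional[str] = None     # the piece being captured
    square: Optional[str] = None
    promotion: Optional[str] = None

def parse_phrase(phrase: str) -> VoiceCommand:
    cmd = VoiceCommand()
    promoting = False
    for word in phrase.lower().split():
        if word in CAPTURE_WORDS:
            cmd.capture = True
        elif word == "promote":
            promoting = True
        elif word in PIECES:
            if promoting:
                cmd.promotion = word
            elif cmd.capture:
                cmd.target = word
            else:
                cmd.piece = word
        elif SQUARE.fullmatch(word):
            cmd.square = word
        elif HINT.fullmatch(word):
            cmd.hint = word
        # Anything else (such as "the") is treated as filler and skipped.
    return cmd
```

For example, `parse_phrase("bishop takes knight")` fills in the piece, the capture flag, and the target, and leaves the square empty for the board to resolve.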

You can see every possible phrase given the board state by saying "vocabulary". The list will include general commands such as "oops" and "rematch" (in games), "thumbs up", "solution", and "next" (in puzzles), or "stop listening" and "flip" (in both).

See the video

https://youtu.be/Ibfk4TyDZpY

Looking forward

We have barely scratched the surface of speech recognition in Lichess. Maybe one day it will be possible to do puzzle storms, navigate the lobby, analysis, studies and more in multiple languages.

Speaking of languages - on testy.lichess.dev, the voice settings menu has a language popup where you can try French (be warned). You can also try Turkish, German, Russian, Italian, Vietnamese, and Swedish which are untested and probably unusable. Use the SHOW ME EVERYTHING button from the voice help dialog to explore their vocabularies. If you can get a single word to work, that's a win. Congrats!

To help fix them, or add your own language, have a look at this readme. If you've soiled your undergarments, that's OK. You may now go change into something dry. But you'll need to become familiar with lexicons, patch jsons, and substitution costs to have a shot at being helpful.

Of course, the best way to support a new language is with solid data instead of playing whack-a-mole and patching substitutions in manually. We made a lot of noise on social media and the Lichess blog to get around 500 participants for our English language data gathering operation. It was barely enough. If you think we can get at least that many in your native language, and there's a small model available for it, maybe we can talk. Stay tuned for ways to volunteer!

Acknowledgements

Lichess speech recognition would not exist without the fine folks at Alpha Cephei. We use their Vosk API and small US English model, both of which are downloaded the first time you click the voice button. Creating a good acoustic model requires meticulous annotation of many thousands of hours of audio samples and countless hours of testing. We are very grateful that Alpha Cephei provides their speech models openly and free of charge.

No less important are the many contributors to the Kaldi speech recognition toolkit. This open source library contains state-of-the-art algorithms that make real-time speech processing possible. We should also mention Denis Treskunov and Ciaran O'Reilly who brought it all to your web browser!
