Trying ReSpeaker Mic Array v2.0
While the LUMOSUR environment does its thing based on sensors, touch screens and mobile controllers, voice control is also part of the mix. We are not developing an Alexa clone and you won’t be able to order a pizza through LUMOSUR, but you can entrust it with your home automation, air conditioners and even door locks, because it doesn’t require a cloud or even an Internet connection. However, a good microphone is required to reliably recognize voice commands. So we tried the ReSpeaker v2.0 microphone array.
A normal microphone or microphone array (we normally use the Sony PlayStation Eye) works just fine in a small room. Larger rooms are tough to deal with: walls reflect and distort the sound waves, the noise floor is higher, and it’s almost impossible to get the speech-to-text (STT) software to recognize anything, even from just a few feet away.
The ReSpeaker is supposed to help with those problems. It has 4 microphones and processes their inputs to provide DOA (direction of arrival) estimation, beamforming, noise suppression and de-reverberation, so that it delivers a workable voice signal even in large rooms and from many feet away. It also has acoustic echo cancellation, which is very helpful (more about that later). It comes with a USB connector and is supposed to work “out of the box”.
Well, it didn’t. The default firmware on the microphone array delivers the raw (unprocessed) signals as well as the processed signal. This of course confused both PocketSphinx and Kaldi. But it was easy to flash the one-channel (processed audio) firmware, and we were able to start the tests.
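If flashing is not an option, the processed signal can also be picked out of the multi-channel stream on the host. The following is only a sketch: the function name, the channel count of 6 and the position of the processed signal (channel 0) are assumptions for illustration and should be checked against the firmware documentation. It expects interleaved signed 16-bit little-endian samples.

```python
import struct


def extract_channel(interleaved: bytes, channels: int, index: int = 0) -> bytes:
    """Return one channel from interleaved signed 16-bit little-endian PCM.

    `channels` and `index` are assumptions about the stream layout,
    for example 6 channels with the processed signal in channel 0.
    """
    frame_size = 2 * channels  # 2 bytes per sample
    if len(interleaved) % frame_size:
        raise ValueError("buffer does not hold a whole number of frames")
    if not 0 <= index < channels:
        raise ValueError("channel index out of range")

    count = len(interleaved) // 2
    samples = struct.unpack("<%dh" % count, interleaved)
    picked = samples[index::channels]  # every `channels`-th sample
    return struct.pack("<%dh" % len(picked), *picked)
```

The resulting mono buffer can then be handed to the STT engine instead of the full stream.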
Here are two test recordings (2 channels, 16 kHz). Both recordings start about 5 ft away from the microphones, with the speaker talking directly towards them. The speaker slowly walks past the devices to a distance of about 10 ft, turns around and faces the microphones again. At that point, the speaker is at a 90-degree angle to the Sony PlayStation Eye, so it can no longer capture the sound directly.
Test 1: The Sony PlayStation Eye
Test 2: The ReSpeaker Microphone Array
Well, I think the differences are small but obvious. The Sony microphone is already pretty good, and the differences would be much more pronounced if one compared the ReSpeaker against a low-quality microphone. However, the Sony only works well at close range and only as long as one is in front of the array, whereas the ReSpeaker does its thing even from a distance and in a 360-degree circle around the device.
But there’s another cool feature that makes a big difference. Imagine playing music or other sounds and trying to keep talking to your device. The ReSpeaker comes with an integrated sound card and actively filters everything played through it out of the microphone input. So even if you’re listening to your favorite radio station or audio stream, the ReSpeaker will still deliver a good-quality microphone signal to your STT engine.
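To illustrate the idea behind echo cancellation, here is a minimal sketch of a textbook NLMS (normalized least mean squares) adaptive filter. This is not the ReSpeaker’s actual algorithm, which runs on the device and is not described in this post; the function name and the parameters `taps`, `mu` and `eps` are assumptions chosen for illustration. The filter learns how the played-back signal shows up in the microphone signal and subtracts that estimate.

```python
def nlms_echo_cancel(mic, playback, taps=16, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of `playback` from `mic`.

    Both inputs are equally long sequences of float samples.
    Returns the microphone signal with the estimated echo removed.
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent playback samples, newest first
    cleaned = []
    for m, p in zip(mic, playback):
        history = [p] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate  # what is left after removing the echo
        # Normalizing by the input power keeps the step size stable.
        power = sum(x * x for x in history) + eps
        step = mu * error / power
        weights = [w + step * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned
```

With a voice present in `mic`, the voice is not correlated with the playback signal, so it largely survives in the output while the echo is reduced.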
Let’s talk about the pros and cons:
The Sony costs between $5 and $10 and delivers good audio quality in small rooms, as long as the voice source is in front of it. It doesn’t do any processing, de-reverb or gain control, so all processing must be done on the host. The setup is a bit tricky because ALSA doesn’t really like the four channels provided by the Sony, but this can be mitigated via .asoundrc (see below). Of course, external noise sources (radio, music) will drown out the voices.
The ReSpeaker costs around $70 and is able to pick up the voice from any direction, even from 10 ft or more away. It provides extensive processing, so there’s no need for any additional work on the host. The integrated USB audio device not only delivers crystal clear sound, but also (mostly) cancels the played audio out of the microphone signal.
Here are a few more things to consider: If you want to control the LED ring on the ReSpeaker manually, update the software or do other tasks, you need to do it in Python. I would have liked a nice C library that could be packaged with an application, but, well, Python it is. The sound card is set to the S24_3LE format. I haven’t been able to make it accept more than one audio stream at a time under ALSA, even with dmix and other tricks.
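For reference, S24_3LE means signed 24-bit samples stored in 3 bytes each, least significant byte first. If you ever need to handle such data on the host, a small sketch like the following can decode it; the function names are hypothetical and not part of any ReSpeaker software.

```python
def decode_s24_3le(raw: bytes) -> list:
    """Decode packed signed 24-bit little-endian samples into Python ints."""
    if len(raw) % 3:
        raise ValueError("buffer length is not a multiple of 3 bytes")
    return [
        int.from_bytes(raw[i:i + 3], "little", signed=True)
        for i in range(0, len(raw), 3)
    ]


def s24_to_s16(samples: list) -> list:
    """Reduce 24-bit samples to 16-bit by dropping the lowest 8 bits."""
    return [s >> 8 for s in samples]
```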
Conclusion:
Don’t expect wonders. If your STT engine (such as Sphinx or Kaldi) doesn’t understand you with a cheap Sony PlayStation Eye, it will most likely not understand you with the ReSpeaker attached either.
Configuring ALSA for Sony PlayStation Eye (Raspberry Pi):
We have the Sony PlayStation Eye set up in udev to find it reliably:
/etc/udev/rules.d/70-alsa-permanent.rules:

    SUBSYSTEM!="sound", GOTO="my_usb_audio_end"
    ACTION!="add", GOTO="my_usb_audio_end"
    ATTRS{idVendor}=="1415", ATTRS{idProduct}=="2000", ATTR{id}="VOICE"
    LABEL="my_usb_audio_end"
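After re-plugging the camera, it can be worth checking that the card really shows up under the id VOICE. A minimal sketch, assuming the usual layout of /proc/asound/cards (index, id in square brackets, then a description); the function name is hypothetical:

```python
import re

# Matches lines such as " 1 [VOICE          ]: USB-Audio - ..."
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[([^\]\s]+)\s*\]:")


def find_card_index(cards_text: str, card_id: str):
    """Return the ALSA card index for `card_id`, or None if it is absent."""
    for line in cards_text.splitlines():
        match = CARD_LINE.match(line)
        if match and match.group(2) == card_id:
            return int(match.group(1))
    return None
```

On the Raspberry Pi this could be called as `find_card_index(open("/proc/asound/cards").read(), "VOICE")`.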
Now, in .asoundrc, we have to address two issues: the downmixing of the 4 separate microphone streams and the adjustment of the recording levels.
That’s the device we’re using:
    pcm.array {
        type hw
        card VOICE
    }
We route the signal through a softvol stage that allows us to increase or decrease the recording volume via the simple “Mic Gain” control:
    pcm.array_gain {
        type softvol
        slave {
            pcm "array"
        }
        control {
            name "Mic Gain"
            count 2
            card 0
        }
        min_dB -40.0
        max_dB 10.0
        resolution 80
    }
Finally, we use a third stage to combine all 4 microphone streams into a single one:
    pcm.cap {
        type plug
        slave {
            pcm "array_gain"
            channels 4
        }
        route_policy sum
    }
You can now record from “cap” or define it as default.
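Conceptually, the gain and downmix stages scale the samples by a dB value and add the 4 channels of each frame into one sample. The sketch below only illustrates that arithmetic for 16-bit samples; it is not how ALSA implements it internally, and the function names and the clipping behavior are assumptions.

```python
def db_to_gain(db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def downmix_sum(frames, gain_db: float = 0.0) -> list:
    """Sum the channels of each frame into one 16-bit sample.

    `frames` is an iterable of per-frame channel tuples, for example
    (mic1, mic2, mic3, mic4). The gain is applied to the sum, which is
    equivalent to applying it to each channel before summing.
    """
    gain = db_to_gain(gain_db)
    mono = []
    for frame in frames:
        value = int(round(sum(frame) * gain))
        # Clip to the signed 16-bit range instead of wrapping around.
        mono.append(max(-32768, min(32767, value)))
    return mono
```

Summing four channels can clip quickly on loud input, which is one reason to have a gain control that goes well below 0 dB.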
About the author:
Michaela Merz is an entrepreneur and first-generation hacker. Her career started even before the Internet was available. She invented and developed a number of technologies now considered to be standard in modern web environments. Among other things, she developed, founded, managed and sold Germany’s third-largest Internet online service, “germany.net”. She is very much active in the Internet business and enjoys “hacking” modern technologies like blockchain, IoT and mobile-, voice- and web-based services.