Michaela Merz's personal blog site

A coder’s perspective on the webAudioAPI

The new webAudioAPI has been hailed as the holy grail of modern JavaScript technology. While it is impressive, it lacks a lot of features necessary to really advance JavaScript into a truly multimedia future.

So, what exactly is the webAudioAPI? It is a high-level JavaScript API for processing and synthesizing audio in web applications. In other words, you can capture audio sources (like your microphone), visualize or process audio streams, and download audio and play it on your computer, even MP3s or other audio formats. Firefox (version 25 and above) finally makes the webAudioAPI available, so it was time for me to give it a shot.

Though everything is still marked as experimental, I took an excited, in-depth look at the possibilities of the webAudioAPI from a coder’s perspective. But after a couple of days, I am quite a bit disenchanted. The webAudioAPI leaves a lot to be desired.

Playing audio:

Playing audio works fine IF you download the complete audio. Streaming is a completely different matter, especially if you want to handle the streaming yourself.

While the webAudioAPI allows you to decode, say, MP3s or Oggs, it’s way too picky about the data it accepts.

Pseudo-Code:

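while (true) {
      // Fetch the next piece of the encoded (mp3 or ogg) stream
      chunk = readAChunkOfDataFromWeb();
      context.decodeAudioData(chunk, function(buffer) {
                // Success: play the decoded chunk
                playBuffer(buffer);
      }, function(error) {
                console.log("decodeAudioData err: " + error);
      });
}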

This would read a chunk of audio (mp3 or ogg) from a website, then decode and play it while other chunks are being downloaded. Voila, easy streaming. But unfortunately, it doesn’t work. “decodeAudioData” tries to “sniff out” the audio type, which works fine for the first chunk but fails miserably for all other chunks. Even a “brute-force” approach of aligning “ogg” audio by splitting the downloaded chunks along their “OggS” boundaries didn’t work.
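To make the “OggS” splitting concrete, here is a minimal sketch of what such a brute-force alignment could look like. The function names (findOggPageOffsets, splitAtOggPages) are made up for illustration and are not part of any API. The sketch only searches for the four-byte “OggS” capture pattern; a real parser would read each page header, because the same four bytes can also turn up inside the audio payload by coincidence.

```javascript
// Return every byte offset where the 4-byte Ogg capture pattern "OggS" begins.
function findOggPageOffsets(bytes) {
  const offsets = [];
  for (let i = 0; i + 3 < bytes.length; i++) {
    if (bytes[i] === 0x4f &&      // 'O'
        bytes[i + 1] === 0x67 &&  // 'g'
        bytes[i + 2] === 0x67 &&  // 'g'
        bytes[i + 3] === 0x53) {  // 'S'
      offsets.push(i);
    }
  }
  return offsets;
}

// Cut a downloaded chunk into pieces that each start at an "OggS" marker.
// Bytes in front of the first marker belong to a page from the previous
// chunk and are not returned here.
function splitAtOggPages(bytes) {
  const offsets = findOggPageOffsets(bytes);
  const pages = [];
  for (let k = 0; k < offsets.length; k++) {
    const end = k + 1 < offsets.length ? offsets[k + 1] : bytes.length;
    pages.push(bytes.subarray(offsets[k], end));
  }
  return pages;
}
```

One plausible reason this still fails, and that is my reading rather than something verified here: an Ogg Vorbis stream carries its codec setup headers only in its first pages, so a later page, however cleanly cut, is not a decodable file on its own.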

My suggestion: Make it much more robust. Skip ahead to the magic bytes if you can’t figure it out. Or give us a way to retrieve the decoding parameters from a successful decode and allow us to set those parameters for subsequent decodes.

On the same topic: Even if we were able to decode all chunks, how are we going to play them?

function playBuffer(buf) {       
          var source    = context.createBufferSource();
          source.buffer = buf;
          source.connect(context.destination);
          source.start();
}

We *could* set start time or offset parameters in “start” – but that sure looks like a facepalm detour.
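For what it’s worth, the detour via the start time could look roughly like the sketch below. createChunkScheduler and scheduleBuffer are hypothetical names; the only API members assumed are context.currentTime, context.destination, createBufferSource(), source.start(when) and buffer.duration. It still creates a new source node and a new connect per chunk, which is exactly the overhead complained about here.

```javascript
// Build a function that queues decoded AudioBuffers back to back.
// `context` is expected to be an AudioContext (or any object that
// exposes the same members used below).
function createChunkScheduler(context) {
  let nextStartTime = 0; // context time at which the next chunk should begin

  return function scheduleBuffer(buffer) {
    const source = context.createBufferSource();
    source.buffer = buffer;
    source.connect(context.destination);

    // Never schedule in the past: if we fell behind, start from "now".
    const startAt = Math.max(nextStartTime, context.currentTime);
    source.start(startAt);

    // The following chunk begins where this one ends.
    nextStartTime = startAt + buffer.duration;
    return startAt;
  };
}
```

In a browser, the returned function could be handed to decodeAudioData as the success callback in place of playBuffer. Whether the seams between chunks are actually inaudible is something I have not tested.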

My suggestion: Allow the buffers to grow so that we don’t need to create new buffers, new connects and new plays.

The same might be true for video streams as well, though I didn’t check it.

Recording audio:

The webAudioAPI lets you record your microphone, process or visualize the data and upload it to some remote server. It works. I’ve tried it. But unfortunately, all is not well, because the sample rate is not changeable and, in my case, was set to 48 kHz. As there’s no way to compress or convert the audio into, say, Speex, the size of the recorded stream is ridiculously large: half a megabyte of WAV data for just a few seconds of recording.
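The arithmetic behind that number, plus a crude workaround, can be sketched in plain JavaScript. Everything here is an assumption for illustration: the post does not state channel count or bit depth, so mono 16-bit is assumed, and the helper names (pcmByteSize, downsampleByAveraging, floatTo16BitPCM) are made up. Averaging neighbouring samples is a very rough way to lower the sample rate, tolerable for speech at best, and it is no substitute for a real codec like Speex.

```javascript
// Size in bytes of uncompressed PCM audio (the WAV payload, header excluded).
function pcmByteSize(sampleRate, channels, bitsPerSample, seconds) {
  return sampleRate * channels * (bitsPerSample / 8) * seconds;
}

// Lower the sample rate by an integer factor by averaging each group of
// `factor` samples. Leftover samples at the end are dropped.
function downsampleByAveraging(samples, factor) {
  const out = new Float32Array(Math.floor(samples.length / factor));
  for (let i = 0; i < out.length; i++) {
    let sum = 0;
    for (let j = 0; j < factor; j++) {
      sum += samples[i * factor + j];
    }
    out[i] = sum / factor;
  }
  return out;
}

// Convert float samples in [-1, 1] to 16-bit signed integers,
// clamping anything outside that range.
function floatTo16BitPCM(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// 5 seconds, mono, 16-bit:
console.log(pcmByteSize(48000, 1, 16, 5)); // 480000 bytes at 48 kHz
console.log(pcmByteSize(16000, 1, 16, 5)); // 160000 bytes at 16 kHz
```

Under these assumptions, about five seconds at 48 kHz already comes to roughly half a megabyte, which matches what I saw. Dropping to 16 kHz cuts that to a third, still far from what a speech codec would achieve.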

My suggestion: Give us a way to compress the recorded data. Without it, the recording functionality might be a nice gimmick, but it has no real use.

Conclusion:

I am aware that the webAudioAPI is currently a moving target, and I might have overlooked functionality that would address my issues above. The webAudioAPI offers a lot of promising functionality. But it lacks the flexibility to push modules into the data stream (Unix R4 anybody?) and it doesn’t offer some very basic necessities. Being able to stream data on my own terms would be one. Being able to compress captured audio data into something useful would be another.

I will continue to monitor the development of this technology. And maybe it will become the holy grail. But for now, it’s not ready.

5 thoughts on “A coder’s perspective on the webAudioAPI”

  1. Your pseudocode above has odd bracketing. 🙂

    Now for the one important question I have in mind after reading all of this: How do I prevent websites from surveilling, recording and broadcasting me and my computer’s/TV’s/smartphone’s/game console’s/… environment without always having a fat, bug-prone proxy between client and server?

    In other words: Webrecording! Yeah! How do I kill that crap? >;)

    1. Every access to the microphone/camera requires the user’s approval. And I have a switch on my mic to turn it off. The browser is not the primary attack vector if somebody wants to access your mic.

      1. I have a switch too. And a plug. But there’s a fashion out there. 🙂 Hardwired webcams and mics in all kinds of web-enabled devices. I really doubt a “smart” TV with voice control will ask for approval before executing JavaScript.

        “The browser is not the primary attack vector if somebody wants to access your mic.”

        Not yet. Malware-infested websites are already an important attack vector for a range of intrusions: Trojan horses, worms, online banking attacks, ID theft, …
        Smart appliances like TVs, game boxes, smartphones, navigators, you-name-it make this vector increasingly attractive.
        This sound-API might offer additional and quite spectacular opportunities for attackers.

        BTW, is there a video API too? 🙂 It’s almost impossible today to use a laptop or tablet without looking straight into a camera.
