Introducing the Audio API extension

  • Revision slug: Introducing_the_Audio_API_Extension
  • Revision title: Introducing the Audio API extension
  • Revision id: 39349
  • Created:
  • Creator: sebmozilla
  • Is current revision? No
  • Comment: 3 words added

Revision Content

The Audio API extension extends the HTML5 specification of the <audio> and <video> media elements by exposing audio metadata and raw audio data. This enables users to visualize audio data, process it, and create new audio data.

Reading Audio Streams

The 'loadedmetadata' Event

When the metadata of the media element is available, it triggers a 'loadedmetadata' event. This event has the following attributes:

  • mozChannels: Number of Channels
  • mozSampleRate: Sample rate (in samples per second)
  • mozFrameBufferLength: Number of samples collected in all channels

This information is needed later to decode the audio data stream. The following example extracts the metadata from an audio element:

<!DOCTYPE html>
<html>
  <head>
    <title>JavaScript Metadata Example</title>
  </head>
  <body>
    <audio id="audio-element"
           src="song.ogg"
           controls="true"
           style="width: 512px;">
    </audio>
    <script> 
      var channels, rate, frameBufferLength;
      function loadedMetadata() {
        channels          = audio.mozChannels;
        rate              = audio.mozSampleRate;
        frameBufferLength = audio.mozFrameBufferLength;
      }
      var audio = document.getElementById('audio-element');
      audio.addEventListener('loadedmetadata', loadedMetadata, false);
    </script>
  </body>
</html>

The 'MozAudioAvailable' Event

As the audio is played, sample data is made available to the audio layer, and the audio buffer (whose size is defined by mozFrameBufferLength) is filled with those samples. Once the buffer is full, the MozAudioAvailable event is triggered. The event therefore contains the raw samples for a period of time. Those samples may or may not have been played yet at the time of the event, and they have not been adjusted for the mute/volume settings on the media element. Playing, pausing, and seeking the audio also affect the streaming of this raw audio data.

The MozAudioAvailable event has two attributes:

  • frameBuffer: Framebuffer (i.e., an array) containing the decoded audio sample data (floats)
  • time: Timestamp for these samples, measured from the start in seconds

The framebuffer contains an array of audio samples. It's important to note that the samples are not separated by channel; they are all delivered interleaved. For example, a 2-channel signal is delivered as: Channel1-Sample1, Channel2-Sample1, Channel1-Sample2, Channel2-Sample2, Channel1-Sample3, Channel2-Sample3.
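As an illustration, a minimal sketch that de-interleaves a framebuffer into one array per channel might look like this (the helper name splitChannels is hypothetical; channels is assumed to come from the loadedmetadata event):

// De-interleave a framebuffer into one Float32Array per channel.
function splitChannels(frameBuffer, channels) {
  var samplesPerChannel = frameBuffer.length / channels;
  var result = [];
  for (var c = 0; c < channels; c++) {
    var channelData = new Float32Array(samplesPerChannel);
    for (var i = 0; i < samplesPerChannel; i++) {
      // Sample i of channel c sits at interleaved index i * channels + c.
      channelData[i] = frameBuffer[i * channels + c];
    }
    result.push(channelData);
  }
  return result;
}

// Usage inside a MozAudioAvailable handler:
// var perChannel = splitChannels(event.frameBuffer, channels);
// perChannel[0] is the first channel, perChannel[1] the second, and so on.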

We can extend the previous example to display the timestamp and the first two samples in a <pre> element:

<!DOCTYPE html>
<html>
  <head>
    <title>JavaScript Visualization Example</title>
  </head>
  <body>
    <audio id="audio-element"
           src="revolve.ogg"
           controls="true"
           style="width: 512px;">
    </audio>
    <pre id="raw">hello</pre>
    <script> 
      var channels, rate, frameBufferLength;
      function loadedMetadata() {
        channels          = audio.mozChannels;
        rate              = audio.mozSampleRate;
        frameBufferLength = audio.mozFrameBufferLength;
      }

      function audioAvailable(event) {
        var frameBuffer = event.frameBuffer;
        var t = event.time;

        var text = "Samples at: " + t + "\n";
        text += frameBuffer[0] + "  " + frameBuffer[1];
        raw.innerHTML = text;
      }
      
      var raw = document.getElementById('raw');
      var audio = document.getElementById('audio-element'); 
      audio.addEventListener('MozAudioAvailable', audioAvailable, false); 
      audio.addEventListener('loadedmetadata', loadedMetadata, false);
      
    </script>
  </body>
</html>

Creating an Audio Stream

It is also possible to create and set up an <audio> element for raw writing from script (i.e., without a src attribute). Content scripts can specify the audio stream's characteristics and then write audio samples to it. To do this, create an Audio object and use the mozSetup function to specify the number of channels and the sample rate (in Hz). See the example below:

// Create a new audio element
var audioOutput = new Audio();
// Set up the audio element with a 2-channel, 44.1 kHz audio stream.
audioOutput.mozSetup(2, 44100);

Once this is done, the samples need to be created. These samples have the same format as the ones delivered in the MozAudioAvailable event. The samples are then written to the audio stream with the mozWriteAudio function. It's important to note that not all of the samples may get written to the stream; the function returns the number of samples actually written, which is useful for the next write. You can see an example below:

// Write samples using a JS Array
var samples = [0.242, 0.127, 0.0, -0.058, -0.242, ...];
var numberSamplesWritten = audioOutput.mozWriteAudio(samples);

// Write samples using a Typed Array
var samples = new Float32Array([0.242, 0.127, 0.0, -0.058, -0.242, ...]);
var numberSamplesWritten = audioOutput.mozWriteAudio(samples);
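If not all of the samples were written, the remainder can be kept and passed to the next call. A minimal sketch of that bookkeeping (assuming samples is a Float32Array as above):

// Write as much as the stream will accept; keep the rest for the next call.
var written = audioOutput.mozWriteAudio(samples);
if (written < samples.length) {
  // Hypothetical leftover buffer to write on the next pass.
  var remaining = samples.slice(written);
}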

The mozCurrentSampleOffset() method gives the audible position of the audio stream, that is, the position of the sample that was last heard.

// Get current audible position of the underlying audio stream, measured in samples.
var currentSampleOffset = audioOutput.mozCurrentSampleOffset();

Audio data written using the mozWriteAudio() method needs to be written at regular intervals in equal portions, in order to keep a little ahead of the current sample offset (the sample offset currently being played by the hardware, which can be obtained with mozCurrentSampleOffset()), where "a little" means something on the order of 500 ms of samples. For example, when working with 2 channels at 44100 samples per second, a writing interval of 100 ms, and a pre-buffer of 500 ms, one would write an array of (2 * 44100 / 10) = 8820 samples per interval, and keep writing until a total of (currentSampleOffset + 2 * 44100 / 2) samples have been written.
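As an illustration, a minimal sketch of such a write loop, which generates a sine wave and keeps roughly 500 ms of samples ahead of the playback position (the tone frequency, variable names, and 100 ms interval are arbitrary illustrative choices):

var channels = 2, sampleRate = 44100;
var audioOutput = new Audio();
audioOutput.mozSetup(channels, sampleRate);

var currentWritePosition = 0;
var prebufferSize = sampleRate * channels * 0.5; // 500 ms worth of samples
var frequency = 440, phase = 0;                  // arbitrary test tone

setInterval(function () {
  // How many samples are needed to stay 500 ms ahead of playback?
  var playPosition = audioOutput.mozCurrentSampleOffset();
  var samplesNeeded = playPosition + prebufferSize - currentWritePosition;
  if (samplesNeeded <= 0) return;

  // Generate the next chunk of interleaved samples (whole frames only).
  var samples = new Float32Array(Math.floor(samplesNeeded / channels) * channels);
  for (var i = 0; i < samples.length; i += channels) {
    var value = Math.sin(phase);
    for (var c = 0; c < channels; c++) {
      samples[i + c] = value;
    }
    phase += 2 * Math.PI * frequency / sampleRate;
  }

  // mozWriteAudio() may accept fewer samples than offered.
  currentWritePosition += audioOutput.mozWriteAudio(samples);
}, 100); // writing interval of 100 ms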

It's also possible to auto-detect the minimal duration of the pre-buffer, such that the sound plays without interruptions and the lag between writing and playback is minimal. To do this, start writing the data in small portions and wait for the value returned by mozCurrentSampleOffset() to become greater than 0.

var prebufferSize = sampleRate * 0.020; // Initial buffer is 20 ms
var autoLatency = true, started = new Date().valueOf();
...
// Auto latency detection
if (autoLatency) {
  prebufferSize = Math.floor(sampleRate * (new Date().valueOf() - started) / 1000);
  if (audio.mozCurrentSampleOffset()) { // Play position moved?
    autoLatency = false;
  }
}

Processing an Audio Stream

Since the MozAudioAvailable event and the mozWriteAudio() method both use Float32Array, it is possible to take the output of one audio stream and pass it directly (or process it first and then pass it) to a second. The first audio stream needs to be muted so that only the second audio element is heard.

<audio id="a1" 
       src="song.ogg"
       controls>
</audio>
<script>
var a1 = document.getElementById('a1'),
    a2 = new Audio(),
    buffers = [];

function loadedMetadata() {
  // Mute a1 audio.
  a1.volume = 0;
  // Setup a2 to be identical to a1, and play through there.
  a2.mozSetup(a1.mozChannels, a1.mozSampleRate);
}

function audioAvailable(event) {
  // Write the current framebuffer
  var frameBuffer = event.frameBuffer;
  writeAudio(frameBuffer);
}

a1.addEventListener('MozAudioAvailable', audioAvailable, false);
a1.addEventListener('loadedmetadata', loadedMetadata, false);

function writeAudio(audio) {
  buffers.push(audio);

  // If there's buffered data, write that
  while(buffers.length > 0) {
    var buffer = buffers.shift();
    var written = a2.mozWriteAudio(buffer);
    // If all data wasn't written, keep it in the buffers:
    if(written < buffer.length) {
      buffers.unshift(buffer.slice(written));
      return;
    }
  }
}
</script>
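To process the audio instead of passing it through unchanged, the framebuffer can be transformed before it is handed to writeAudio(). As a minimal sketch, the audioAvailable handler above could apply a simple gain step (the 0.5 factor is an arbitrary illustration):

function audioAvailable(event) {
  var frameBuffer = event.frameBuffer;
  // Copy and scale each sample before writing it to the second element.
  var processed = new Float32Array(frameBuffer.length);
  for (var i = 0; i < frameBuffer.length; i++) {
    processed[i] = frameBuffer[i] * 0.5; // halve the volume
  }
  writeAudio(processed);
}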

See also:

  • Wikipedia article on digital audio: http://en.wikipedia.org/wiki/Digital_audio