
Web Audio and 3D Soundscapes: Introduction

This post is the first part of a series called HTML5 Web Audio and 3D Soundscapes; the second part is Web Audio and 3D Soundscapes: Implementation.

In this tutorial we will be taking a close look at the fundamental Web Audio elements that are used to construct 3D soundscapes for immersive interactive applications including, but not limited to, 3D games.

The Web Audio API and the terminology it uses can sometimes be confusing, but this tutorial aims to strip away the complexity and provide a simpler explanation of the Web Audio elements and how they work together to form a 3D soundscape.


This demonstration contains three sounds rotating around a listener, whose direction is indicated by the arrow. If you imagine looking down on a game character (the listener), the rotating sounds could easily represent friends or foes circling the character.

The demonstration source code and resources are attached to this tutorial.


The Audio Context

The AudioContext interface is the heart and soul of Web Audio: it provides the functions required to create various Web Audio elements, as well as a way to send all of the audio to hardware and onwards to someone's speakers or headphones.

It's important to make sure the AudioContext interface is available because Web Audio is still fairly new and it might not be available in some web browsers.

As well as providing the functions required to create various Web Audio elements, the AudioContext interface has two important read-only properties: destination and listener. The destination property can be thought of as the connection to the audio hardware; it's where all of the generated audio will eventually end up. The listener property (we will look at this in more detail later) represents the thing that is listening to all of the audio, e.g. a character, or more precisely a camera, in a game.
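As a rough sketch, creating an AudioContext and checking for its availability might look like the following (the webkitAudioContext fallback for older WebKit-based browsers is an assumption made for this example):

```javascript
// Grab the AudioContext constructor, falling back to the prefixed
// version used by older WebKit-based browsers.
var AudioContextClass = null;

if (typeof AudioContext !== "undefined") {
    AudioContextClass = AudioContext;
} else if (typeof webkitAudioContext !== "undefined") {
    AudioContextClass = webkitAudioContext;
}

if (AudioContextClass === null) {
    // Web Audio is not available in this browser.
    console.warn("Web Audio is not supported");
} else {
    var audioContext = new AudioContextClass();

    // The two read-only properties described above.
    var destination = audioContext.destination; // connection to the audio hardware
    var listener = audioContext.listener;       // the thing listening to the audio
}
```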


Audio Buffers

The AudioBuffer and AudioBufferSourceNode interfaces allow us to play audio. AudioBuffer objects contain the raw audio (sound samples) that are tweaked, crunched, and crushed as they make their way through Web Audio before reaching someone's speakers or headphones. AudioBufferSourceNode objects are used to start and stop the audio contained in AudioBuffer objects.

The standard way to load audio into an AudioBuffer object is to use an XMLHttpRequest object with its responseType set to arraybuffer. When the audio file has been loaded, the array buffer is sent to the AudioContext object for decoding and, if the decoding is successful, we are provided with an AudioBuffer object.

The decodeAudioData() function also accepts a third parameter, a second callback, which is called when the loaded audio file cannot be decoded.
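A sketch of such a loader might look like this (the function name loadAudioBuffer and the audioContext parameter are assumptions made for this example):

```javascript
// Loads an audio file via XMLHttpRequest and decodes it into an
// AudioBuffer. "onLoaded" receives the decoded AudioBuffer;
// "onError" is called if the audio data cannot be decoded.
function loadAudioBuffer(audioContext, url, onLoaded, onError) {
    var request = new XMLHttpRequest();

    request.open("GET", url, true);
    request.responseType = "arraybuffer";

    request.onload = function () {
        // Hand the raw array buffer to Web Audio for decoding.
        audioContext.decodeAudioData(request.response, onLoaded, onError);
    };

    request.send();
}
```

You could call this with, for example, loadAudioBuffer(audioContext, "sound.ogg", onDecoded, onDecodeFailed), using the error callback to load an alternative format such as MP3 if the first one cannot be decoded.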

Not all web browsers support the same audio formats, so you might want to use the second callback to fall back to an alternative audio format if needed. For example, Internet Explorer doesn't support OGG Vorbis but it does support MP3. The only real problem with MP3 is that it doesn't allow seamlessly looped audio like OGG Vorbis does.

When you have an AudioBuffer object available you can play it using an AudioBufferSourceNode object.

It's important to remember that AudioBufferSourceNode objects are single-shot audio players; in other words, you can only call the start() function once. You will need to create a new AudioBufferSourceNode object and connect it (directly or indirectly) to the destination object exposed by the AudioContext object whenever you want to play audio from an AudioBuffer object.

You could make life a little simpler by creating a small utility function that creates, connects, and starts an AudioBufferSourceNode object for you.
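Such a utility might look like this (the name playAudioBuffer and its signature are hypothetical):

```javascript
// Creates, connects, and starts an AudioBufferSourceNode that
// plays the given AudioBuffer once.
function playAudioBuffer(audioContext, audioBuffer) {
    var source = audioContext.createBufferSource();

    source.buffer = audioBuffer;
    source.connect(audioContext.destination);
    source.start();

    return source;
}
```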

When an AudioBufferSourceNode object finishes playing, and if you have no references to the object anywhere (e.g. you don't have it stored in an array), Web Audio will automatically disconnect the object for you. This is extremely handy when you only need to fire and forget short sound effects.

If you decide to loop the audio, using the AudioBufferSourceNode loop property, you will need to keep a reference to the AudioBufferSourceNode object somewhere so you can call stop() to stop the audio playing.
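One hypothetical way to structure that is a variant of the utility that enables looping and returns the source so it can be stopped later:

```javascript
// Creates, connects, and starts a looping AudioBufferSourceNode.
// The returned reference must be kept so stop() can be called later.
function playLoopingAudioBuffer(audioContext, audioBuffer) {
    var source = audioContext.createBufferSource();

    source.buffer = audioBuffer;
    source.loop = true;
    source.connect(audioContext.destination);
    source.start();

    return source;
}
```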

So at this point we are using buffers to play audio, but the audio is being played directly without any panning or spatialization being applied to it. This is where PannerNode objects come into play.


Panner Nodes

PannerNode objects allow us to position audio in 3D space, within a Cartesian coordinate system. This is where most of the 3D magic happens.

A PannerNode object has quite a few properties that allow us to fine-tune the behavior of the audio, but for this tutorial we are only interested in two of them: maxDistance and panningModel. The maxDistance property is the distance from the listener at which the audio volume will be zero. This is an arbitrary value and will only have meaning within your application, but it defaults to 10000. The panningModel property tells Web Audio how to process the audio passing through a PannerNode object. For 3D soundscapes you will probably want to set the value to HRTF (head-related transfer function).

To set the position of an AudioBufferSourceNode we use the setPosition() function exposed by a PannerNode object.

To make things a little clearer let's update the utility function we created previously.
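A sketch of such an updated utility might look like the following (again, the function name and signature are assumptions made for this example):

```javascript
// Creates a PannerNode, positions it in 3D space, and routes an
// AudioBufferSourceNode through it to the destination.
function playAudioBufferAt(audioContext, audioBuffer, x, y, z) {
    var panner = audioContext.createPanner();

    panner.panningModel = "HRTF";
    panner.maxDistance = 10000; // the default, shown here for clarity
    panner.setPosition(x, y, z);
    panner.connect(audioContext.destination);

    var source = audioContext.createBufferSource();

    source.buffer = audioBuffer;
    source.connect(panner);
    source.start();

    return source;
}
```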

At this point we are playing audio and positioning it in 3D space, but there is one more important element we need to look at: the audio listener.

The Audio Listener

Every AudioContext object exposes a listener object that represents the position and orientation of the thing that's listening to the audio. Usually that thing is a virtual camera attached to a game character's head, the bumper of a car, the tail of an aircraft, or anything else that makes sense to the viewer from their perspective.

The listener object has a setPosition() function and a setOrientation() function. The setPosition() function places the listener somewhere within the 3D space, and the setOrientation() function rotates the listener (imagine a camera panning and tilting).

The setPosition() function works in exactly the same way as the PannerNode setPosition() function and accepts three coordinates.

The setOrientation() function is a bit more complex: it accepts two unit vectors. The first vector represents the listener's rotation (the direction the camera is pointing), and the second vector represents the listener's up direction (it points out of the top of the camera).
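For example, a small helper (the name resetListener is hypothetical) could place the listener at the origin, facing into the screen along negative z with the up vector along positive y:

```javascript
// Places the listener at the origin of the 3D space, facing into
// the screen (negative z), with the up vector along positive y.
function resetListener(listener) {
    listener.setPosition(0, 0, 0);
    listener.setOrientation(
        0, 0, -1, // front vector: the direction the listener is facing
        0, 1, 0   // up vector: points out of the top of the listener
    );
}
```

In a real application you would call this with audioContext.listener.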

If you only need to rotate the listener around one axis the vector calculations are relatively simple. For example, if you are using the same coordinate system that WebGL uses where positive x points to the right of the screen, positive y points to the top of the screen, and positive z points out of the screen, then you can rotate the listener around the y axis (pan the camera) using one cos() function call and one sin() function call.
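A sketch of such a rotation (the function name is hypothetical; an angle of zero leaves the listener facing into the screen, along negative z):

```javascript
// Rotates the listener around the y axis, i.e. pans the camera.
// "radians" is the rotation angle; the up vector stays (0, 1, 0).
function rotateListenerY(listener, radians) {
    var frontX = Math.sin(radians);
    var frontZ = -Math.cos(radians);

    listener.setOrientation(frontX, 0, frontZ, 0, 1, 0);
}
```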

The demonstration for this tutorial (source code is attached) does a similar thing and rotates the PannerNode objects around a single axis.


Conclusion

In this tutorial we took a look at the fundamental Web Audio elements that are used to construct 3D soundscapes for immersive interactive applications including, but not limited to, 3D games. Hopefully this tutorial has been of use to you and has provided enough information for you to understand how audio buffers, panners, and listeners work together to produce 3D soundscapes.

If you have any feedback or any questions please feel free to post a comment below.


The Next Step: Implementation

In the next tutorial, Web Audio and 3D Soundscapes: Implementation, we will take all of the above (and more) and wrap it in a simplified API. The main focus of the next tutorial will be 3D games, but the API will be generic enough to be used in various immersive interactive applications.
