If you are anything like me, you are curious about the current state and the future of Web Audio. So I asked one of the Web Audio API spec editors, Mozilla’s Paul Adenot, if I could shoot him some questions. He said sure, and was kind enough to take some time and answer them elaborately. Here are his answers, stuffed with lots of useful information.
1. I see there is a lot going on right now with the Web Audio specification on GitHub. How is the spec work coming along?
For now, we’ve scoped out something like a v1. The list of issues that we plan to address can be found at . In short:
- Some changes around edge cases, to make things consistent
- Ability for the `AudioDestinationNode` to be connected to something (it simply copies its input), so that you can easily take existing code and pipe it to WebRTC or a `MediaRecorder`.
- Ability to request high-latency and high-buffering on an AudioContext, for example to make a music playback app
- Some mostly cosmetic changes around interfaces: the ability to use `new` to create nodes, etc.
- Changes on the `DynamicsCompressor` (also, see 4.)
I’ve personally been super busy with other things in Gecko in the past few months, but I’ll be back to working full-time on the spec soon, and I plan to address a lot of those issues, along with a big pull request that explains the innards of the Web Audio API: the processing model.
Members of the working group have issues assigned to them, and we have a bi-weekly call to synchronize and to talk about ongoing issues.
2. A question by an outsider: How does the recommendation process work? Do you have to propose a final draft which the W3C then will have to accept?
The process is described in a document, and it’s a rather complicated thing.
Basically, we as working group members work on an “Editor’s Draft”, and sometimes publish “Working Drafts”. When we feel we’ve addressed everything and we have a coherent text with good features, good implementations and tests, we can go to CR (Candidate Recommendation).
At this time, more people will review the document, as a kind of final review. The formal review from the Advisory Committee can then begin. There can be multiple CRs, depending on the outcome of this review.
Once it’s accepted, the text becomes a “Proposed Recommendation”. At this time, no big changes are expected to happen. It can become a REC (W3C recommendation) some time later after some more review cycles.
3. Do you have an idea when the work on the first final version of the spec (v1) will be finished, and when we can expect the recommendation by the W3C?
We plan to hit CR (Candidate Recommendation) status before September 2016, which is the end of the current charter. The plan is then to re-charter the group and kick off a v2 of the spec. I’d like it to have more lower-level features, but it’s a bit early to know exactly what it will look like.
4. You have pointed out a concrete example of what is still to do: The exact specification of DynamicsCompressor. What is your stance on how a generic Web Audio compressor should work? Should there be a pre-delay by default, should it be adjustable?
For now, the DynamicsCompressor has the following features:
- Fixed look-ahead
- Fixed knee shape
- Fixed attack and release slope
- User-settable attack, release, knee level, threshold and ratio
- Pre-emphasis filter
- Post de-emphasis filter
In my opinion, it lacks, at least:
- The ability to side-chain
- A way to set the ratio and look-ahead to make it a brick-wall limiter
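To make these parameters concrete, here is a rough sketch of my own (not the spec’s actual algorithm) of the static gain curve behind the threshold and ratio settings, using a hard knee for simplicity. Pushing the ratio toward infinity is exactly what turns a compressor into the brick-wall limiter mentioned above:

```javascript
// Illustrative sketch (not the spec's algorithm): a feed-forward
// compressor's static gain curve, with a hard knee for simplicity.
// Levels are in dBFS; `ratio` is the usual N:1 compression ratio.
function gainReductionDb(inputDb, thresholdDb, ratio) {
  if (inputDb <= thresholdDb) return 0; // below threshold: no reduction
  const overshoot = inputDb - thresholdDb;
  // Output rises by only 1 dB for every `ratio` dB of input overshoot,
  // so the reduction is the remainder.
  return overshoot - overshoot / ratio;
}

// A brick-wall limiter is the limiting case: with ratio = Infinity the
// output never exceeds the threshold (look-ahead then hides the attack).
console.log(gainReductionDb(-6, -12, 4));        // 4.5 dB of reduction
console.log(gainReductionDb(-6, -12, Infinity)); // 6 dB: clamped at -12 dBFS
```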
The plan for now is to reverse-engineer the current compressor into a spec, and to decide which features to keep and which to drop. I think I’m currently assigned to this issue, and my plan is to run a little survey among musicians and developers to determine a good feature set for a general-purpose compressor, but it’s unclear whether such a thing can exist.
Also, I’ve gathered a number of software compressors (VST and the like), and plan to do some listening tests and measurements.
Then comes the spec writing, probably overlapping with some test implementations, and then it gets shipped in browsers.
For now, all Web Audio API implementations I know of use the same code. Gecko forked Blink’s compressor, which dates from before the Blink/WebKit fork. I believe IE’s implementation uses a variation of the same code, which would make sense, considering the number of magic numbers in the code that define the knee shape, for example.
5. With things like spatial panning, Web Audio is predestined to be used in conjunction with the upcoming WebVR standard. Did you have VR in mind when designing the spec?
I haven’t personally designed this part (I took my current role as an editor after the grand design of the API was made). The Web Audio API was meant to be quite generic, and was based on the Use Cases and Requirements document. This document does not mention VR, but there are a number of features that were designed for spatialized audio (for FPS games, for example). Specifically, the HRTF (Head-Related Transfer Function) panning model is great for such use cases, although the HRTF database is fixed, so it’s possible that the effect is not as immersive as it could be with a custom set of HRTF impulses recorded on the actual listener. I seem to recall the BBC has made something based on the ConvolverNode that accepts a custom set of HRTF impulses; maybe something to explore.
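For context on that last idea: a ConvolverNode essentially convolves its input signal with an impulse response, and convolving each ear’s signal with a head-related impulse is what produces the spatialized effect. Here is a minimal direct-form sketch of that operation (real implementations use FFT-based partitioned convolution for speed):

```javascript
// Minimal sketch of what a ConvolverNode does under the hood: direct-form
// convolution of a signal with an impulse response. Feeding it per-ear
// HRTF impulses is the basis of custom HRTF spatialization.
function convolve(signal, impulse) {
  const out = new Float32Array(signal.length + impulse.length - 1);
  for (let n = 0; n < out.length; n++) {
    let acc = 0;
    for (let k = 0; k < impulse.length; k++) {
      const i = n - k;
      if (i >= 0 && i < signal.length) acc += signal[i] * impulse[k];
    }
    out[n] = acc;
  }
  return out;
}

// Convolving with a unit impulse leaves the signal unchanged (identity):
const dry = Float32Array.of(1, 0.5, -0.25);
console.log(convolve(dry, Float32Array.of(1))); // [1, 0.5, -0.25]
```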
6. We all know that the days of ScriptProcessorNode are numbered. But what’s the deal with the newly introduced AudioWorklets? Why are they called Worklets, not Workers? And what about the latency of these AudioWorklets?
I had written a first text for running scripts on a thread that is not the main thread and not a Worker, but I later found out that the CSS WG Houdini task force had a better text that did exactly that: they wanted to run arbitrary scripts on the compositor for scroll effects, or code to paint elements, and such. We felt that it was unnecessary to duplicate the effort, and I dropped my text.
These Worklets will not add latency to the Web Audio graph (unless one is designed to, for example by implementing a delay line using a Worklet), because they will process audio in blocks of 128 sample-frames.
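To illustrate why no latency is added: each processing callback consumes and produces exactly one 128-frame block, the same render quantum native nodes use, so no extra buffering is needed. Here is a plain-JS sketch of that callback shape (the names are illustrative; the Worklet API was still being specified at the time of this interview):

```javascript
// The Web Audio render quantum: every node processes audio in
// fixed-size blocks of 128 sample-frames.
const QUANTUM = 128;

// A trivial "processor" with the shape of a per-block callback:
// copy input to output with a gain applied. Because each call
// produces exactly one block, it adds no buffering latency.
function process(inputBlock, outputBlock, gain) {
  for (let i = 0; i < QUANTUM; i++) {
    outputBlock[i] = inputBlock[i] * gain;
  }
}

// Drive it the way the rendering thread would: one block at a time.
const input = new Float32Array(QUANTUM).fill(1);
const output = new Float32Array(QUANTUM);
process(input, output, 0.5);
console.log(output[0]); // 0.5
```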
7. Do you think Web Audio will ever reach performance parity with native audio applications?
I talked about this very topic last year, when I did a keynote during the first Web Audio Conference. It’s very possible that we kind of close the gap, but it seems not very likely that we reach complete parity with what native can do in the near future, mainly for security reasons (which are outlined in the talk).
Shared memory and atomics can be cool for audio, if you really need to use multiple rendering threads.
8. Despite efforts to increase performance, can you name some other long-term goals for Web Audio, beyond V1?
In a somewhat random order, and probably incomplete:
- Some ways to deal with high memory usage (to make things easy on the garbage collector, or to lower the size of the in-memory buffers, with, for example, lazy resampling, or in-memory audio buffer compression)
- A way to get the state of an `AudioParam` timeline in the future
- Direct access to the audio output stream, à la Audio Data API, to write custom DSP code easily
- Partial audio buffer decoding, for example to enable low-latency custom audio streaming
- Timing information (output and input latency, audio clock/system clock)
- More integration with other specs such as MediaRecorder, Streams, and getUserMedia.
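The `AudioParam` wishlist item above is feasible because automation values at any future time are fully determined by the already-scheduled events. For example, the value of a `linearRampToValueAtTime` automation follows the spec’s linear interpolation formula, sketched here (the helper name is mine, not a proposed API):

```javascript
// Value of a linear AudioParam ramp at time t, per the Web Audio spec's
// interpolation formula: v(t) = v0 + (v1 - v0) * (t - t0) / (t1 - t0).
// Because this is deterministic, a "read the timeline in the future"
// API could simply evaluate the scheduled events.
function linearRampValueAt(t, t0, v0, t1, v1) {
  if (t <= t0) return v0; // before the ramp starts
  if (t >= t1) return v1; // after the ramp ends
  return v0 + (v1 - v0) * ((t - t0) / (t1 - t0));
}

// Gain ramping from 0 to 1 between t = 0 s and t = 2 s: halfway at t = 1 s.
console.log(linearRampValueAt(1, 0, 0, 2, 1)); // 0.5
```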
Thank you for taking the time, Paul. Keep up the good work.