Audio playback and processing on the iPhone

Please see the update post for the latest information after reading this

I’ve been experimenting with sound synthesis on the iPhone using CoreAudio and music playback using the higher level Celestial framework.

Please let me know if you can help — brian.whitman at variogr.am. Thanks to everyone on #iphone-uikit!

Summary

This stuff is out of date — Please see the update post for the latest information after reading this

As of Monday August 13

  • We can send arbitrary samples out either the headphone jack (stereo) or the speakerphone out (mono) at 44kHz only if we kill the mediaserverd launchd process, which has the unfortunate effect of killing all other audio on the phone, including rings and calls. (see the update post)
  • We cannot yet receive samples coming in from the microphone (without going through the very high level file recording API)
  • We can play media files (movies and songs) just like the iPod portion without killing mediaserverd
  • We can access the iPod media library and perform queries for tracks.
  • We can play media files over the internet just like the web browser does, but with our own UI
  • The media file playback can seek, timestretch (but it sounds awful) and detect the state of playback (which output, volume, etc.). It cannot play more than one thing at a time (no mixing).
  • We have disassembled the mediaserver and higher level audio playback and recording APIs. They communicate over mach ports, and the mediaserverd sets hog mode on startup — only its PID can access audio while it is running. We are currently trying to figure out how to pipe arbitrary sample buffers into and out of the high level API. (see the update post)
  • The phone has enough scalar fixed-point power to do a lot of interesting audio work, and with an always-on internet connection, good battery, microphone input, and direct access to a user’s music library, there are a lot of cool apps to be written.

iPhone synthesis

I got my old friend CoreAudio working enough on my new telephone to make a beautiful sine wave out the headphone jack. If you’ve got one of these things and write audio software you probably want to check out the little zip below. It’s just a start, but at the rate people have been hacking on the iPhone, in a few hours I expect first a Colecovision emulator, then a clone of Absynth with multi-touch, then late August I’m looking at some poor student at Berklee porting CSound over. I’m sorry, whoever that is.

iPhone Sine Wave example (out of date, see the update post)

Instructions: I assume you have “broken jail” or whatever. scp or putfile the compiled app in the zip somewhere you can run code (use scp -p if you can), but also move or remove /System/Library/LaunchDaemons/coreaudiod.plist so that mediaserverd doesn’t hog the audio output. (You can also move or rename mediaserverd temporarily and kill its process.) Restart the phone, then run the app. You should hear a beautiful sound over the headphone jack (only). That’s the sound of fourteen kids in Norway rushing over to Google “303 resonant filter source code.”

Update: NerveGas’s stellar work on the iPhone NES Emulator forced me to track down a huge bug in the callback. I assumed it was using Float32s, but hey, look at that: signed shorts. (O M A) Anyway, our sine wave friends sound a lot better, and they’re in stereo. I’ve now contributed my immensely tiny gift to society: help building an audio backend that will surely be replaced in three days for a console emulator on a hacked cellular phone.
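To make the Float32 versus signed short mix-up concrete, here is a minimal sketch of the conversion a callback would need if the synthesis code works in floats but the hardware wants interleaved 16-bit stereo. The function name and the mono-in, stereo-out layout are assumptions for illustration, not what the zip actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the zip): converts mono Float32 samples in
 * [-1.0, 1.0] to interleaved stereo signed 16-bit samples (L, R, L, R, ...). */
static void float_to_s16_stereo(const float *in, int16_t *out, size_t frames)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < frames; i++) {
        double x = in[i];
        /* Clamp so out-of-range input cannot wrap around. */
        if (x > 1.0)  x = 1.0;
        if (x < -1.0) x = -1.0;
        /* Scale to the int16 range and round to nearest. */
        double scaled = x * 32767.0;
        int16_t s = (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
        out[2 * i]     = s; /* left channel  */
        out[2 * i + 1] = s; /* right channel */
    }
}
```

Writing raw floats into a buffer the device reads as shorts produces noise rather than a tone, which would fit the kind of bug described above.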

Audio processing / recording with CoreAudio

The latest iPhone Sine Wave example has code that *should* ring modulate the input coming in through the microphone (uncomment the lines and try it with the OS X 10.4 makefile.) However, on the iPhone, the call to start the input audio device hangs for a few seconds and then returns an IOKit “Device not ready” error. I’m looking into this.
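For anyone unfamiliar with the term, ring modulation is just a sample-by-sample multiply of the input with a carrier. The sketch below is an illustration of that idea in integer math, assuming both signals are signed 16-bit buffers of equal length (the carrier could be a sine buffer); the function name is made up and this is not the code from the zip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: ring modulation is a sample-by-sample multiply of the
 * input with a carrier. Both are treated as Q15 fixed point, where 32768
 * stands for 1.0, so the product is scaled back down by 32768. */
static void ring_modulate_s16(const int16_t *in, const int16_t *carrier,
                              int16_t *out, size_t n)
{
    assert(in != NULL && carrier != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        int32_t product = (int32_t)in[i] * (int32_t)carrier[i] / 32768;
        /* -32768 * -32768 is the one case that overflows int16. */
        if (product > INT16_MAX) product = INT16_MAX;
        out[i] = (int16_t)product;
    }
}
```

Staying in integers avoids any float work per sample, which seems like the safer default on this hardware.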

The devices the phone reports are:

device id:257 name:Baseband Input DEFAULT INPUT
device id:261 name:WM8758 Output Device audio0 DEFAULT OUTPUT
device id:259 name:Baseband Output

The baseband output device is the speaker. You can route audio there as well as the headphones.

The only audio device that reports input channels (via kAudioDevicePropertyStreamConfiguration) is 257, the baseband input, which reports 1 channel.

Of course, we can record with Celestial (higher level API); see Erica’s tutorial. But that’s AMR file export only and you can’t do anything in real time with it. Say we wanted a VoIP app, or, I don’t know, a guitar effects processor, or a real-time music fingerprint detector. For those we need lower level access.

Update: I tried shutting down every process that I thought would be using the baseband audio input (the phone app, etc.). Still no go.

Hacking Celestial to do sample-level synthesis

Celestial is the higher level media manager on the phone. It’s dead simple to play back audio or video files in ObjC, four lines if you’re verbose like me:

AVController * av = [[AVController alloc] init];
AVItem * item = [[AVItem alloc] initWithPath:[t path] error:&error];
[av setCurrentItem:item preservingRate:NO];
BOOL ok = [av play:nil];

It works great, but that’s not very useful for synthesis. Even playing back one mp3 file after another has the dreaded 0.3 s gap of silence. But it plays without killing the mediaserver, which means somewhere hidden in Celestial is the magic incantation to send samples to a !hog buffer (one owned by another process). Celestial takes care of the volume going down if the phone rings, following system volume settings, etc. All very useful stuff that any serious audio app will want. But no serious audio app is going to use this very high level file playback API.

I’ve been looking into Celestial and found some nice debug messages that confirm my suspicions:

NOTE TO FIG TEAM: Unexpectedly found that we already have
the hog mode for audio device %d.  (Process %d speaking.)
NOTE TO FIG TEAM: Process %d already has exclusive
access to audio device %d -- failing with error '!hog'.  (Process %d speaking.)

NOTE TO FIG TEAM: Tell %s how to set !hog on my own, thanks

And also some very familiar methods:

_FigSampleBufferGetAudioStreamPacketDescriptions
_FigSampleBufferSetDataBufferFromAudioBufferList
_FigAudioOutputUnitInitialize

If I have it right, Fig is the C-style QuickTime-ish API and Celestial is the ObjC QTKit. Both Fig and Celestial call CoreAudio and AudioToolbox for decoding/encoding and hardware access, but they are both started by a system-level sound server (mediaserverd), and on initialization they !hog the hardware in both directions. And then there’s MeCCA, which appears to be a C++ lib for doing translation between CoreAudio and the telephone bits, including Bluetooth, AMR recording, etc. Update: see below for more info on this…

I see two tacks to get sample-level access via Celestial.

1) is to figure out these Fig (QT) C methods. Some of them seem ripped right out of QuickTime with Fig at the front. Some of them seem brand new. But the fundamentals should be the same: init the subsystem (EnterMoviesOnThread?), set up your ASBD, start the audio output default unit with an IOProc, and fill up your buffers. Update: see below for more info on this…

2) is to trick the hell out of the thing. My first idea was to create a dummy .caf file of “infinite length” (you give it a -1 in the frame count header) but otherwise empty, then tell Celestial to start playing it after I fill up one second’s worth of audio. Celestial should keep playing it, slightly delayed, as long as I feed it more data. But here’s the kicker: Celestial doesn’t want to play any uncompressed audio file! Unless I’m missing something, I cannot get Celestial to play AIFF, WAV or CAF. Does anyone know what’s up here? Update: I don’t know what I was doing wrong, but I am able to get Celestial to play aif and wav (44K stereo) files no problem. I am even able to get wavs to “loop,” but damn if there isn’t a gap in between plays, even for a wave file.
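For reference, here is a rough sketch of what the header of such an open-ended CAF file could look like. It assumes the layout from Apple’s CAF spec as I read it, where the “unknown length” marker is a size of -1 on the final data chunk; the function name, the 16-bit big-endian integer format and the 68-byte buffer are illustrative choices, and this is not the exact file used in the experiment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append a big-endian integer to a byte buffer; CAF headers are big-endian. */
static size_t put_be(uint8_t *buf, size_t pos, uint64_t value, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--)
        buf[pos++] = (uint8_t)(value >> (8 * i));
    return pos;
}

/* Append a four-character chunk or format tag. */
static size_t put_tag(uint8_t *buf, size_t pos, const char *tag)
{
    assert(strlen(tag) == 4);
    memcpy(buf + pos, tag, 4);
    return pos + 4;
}

/* Hypothetical helper: writes the header of a 16-bit big-endian integer PCM
 * CAF file whose data chunk has unknown length (size field of -1).
 * Returns the number of header bytes written; buf must hold at least 68. */
static size_t write_open_ended_caf_header(uint8_t *buf, double sample_rate,
                                          uint32_t channels)
{
    uint64_t rate_bits;
    size_t pos = 0;
    assert(sizeof(double) == sizeof(uint64_t)); /* assumes IEEE 754 doubles */
    memcpy(&rate_bits, &sample_rate, sizeof rate_bits);

    pos = put_tag(buf, pos, "caff");          /* file type */
    pos = put_be(buf, pos, 1, 2);             /* file version */
    pos = put_be(buf, pos, 0, 2);             /* file flags */

    pos = put_tag(buf, pos, "desc");          /* audio description chunk */
    pos = put_be(buf, pos, 32, 8);            /* chunk size in bytes */
    pos = put_be(buf, pos, rate_bits, 8);     /* sample rate as Float64 */
    pos = put_tag(buf, pos, "lpcm");          /* linear PCM */
    pos = put_be(buf, pos, 0, 4);             /* flags: big-endian integer */
    pos = put_be(buf, pos, 2 * channels, 4);  /* bytes per packet */
    pos = put_be(buf, pos, 1, 4);             /* frames per packet */
    pos = put_be(buf, pos, channels, 4);      /* channels per frame */
    pos = put_be(buf, pos, 16, 4);            /* bits per channel */

    pos = put_tag(buf, pos, "data");          /* audio data chunk, must be last */
    pos = put_be(buf, pos, UINT64_MAX, 8);    /* size -1: length unknown */
    pos = put_be(buf, pos, 0, 4);             /* edit count */
    return pos;
}
```

Sample data would be appended after these 68 bytes for as long as the writer keeps going.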

Cozy with a disassembler: What Celestial is doing

I spent some time recently with a disassembler figuring out what exactly Celestial is doing to get hog mode and then write bytes to the IOProcs. Much, much love to the fiveforty wiki guys for their macho ldw and the python “aqualung” decompiler. Here’s what I learned:

  • mediaserverd starts up Celestial’s FigMediaServer, which eventually gets around to calling FigAudioOutputUnitStart. In a non-exported function figHDAC_Start(), before calling AudioOutputUnitStart, lots of mach port creation is done. I am able to query the service port for “com.apple.audio.systemsoundserver2” and “com.apple.fig.movies” using task_for_pid() and bootstrap_look_up(). I’m assuming what happens is a call like [AVController play] sends an RPC to the version of Fig that was linked into mediaserverd over the mach port.

    Here’s some mach source:

    // Get the task and port for the systemsoundserver.
    // (mediaserverdpid is the PID of the running mediaserverd.)
    kern_return_t err;
    mach_port_t mediaserverdtask, bootstrapPort = MACH_PORT_NULL, soundPort;
    err = task_for_pid(mach_task_self(), mediaserverdpid, &mediaserverdtask);
    err = task_get_bootstrap_port(mach_task_self(), &bootstrapPort);
    err = bootstrap_look_up(bootstrapPort, "com.apple.audio.systemsoundserver2", &soundPort);
    

    But now what? Can I call AudioDeviceStart through this?

  • FigAudioOutputUnitStart in Celestial is the only place where hog mode is set. There’s a getpid() and then an AudioDeviceSetProperty with the “oink” property code for hog mode. So that means we can’t start that code again on our own. I was so resourcefully naive that I tried to replace the opcode for moving the pid into the AudioDeviceSetProperty stack with the MOVL R3, 0xFFFFFFFF instruction (which would turn off the hog), but no go. (Stunningly, the phone still plays sound and still has hog mode, so I’m probably missing something.)
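A side note on why these codes read like words: CoreAudio property selectors and error codes are 32-bit values built from four ASCII characters, which is how hog mode ends up as “oink” and the permissions error as '!hog'. The sketch below packs and unpacks such codes so they are easier to spot in a disassembly; the helper names are made up, and the mapping to kAudioDevicePropertyHogMode and kAudioDevicePermissionsError is taken from the desktop CoreAudio headers, which I am assuming carry over to the phone.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: packs four ASCII characters into a 32-bit code,
 * first character in the most significant byte. */
static uint32_t fourcc(const char *s)
{
    assert(strlen(s) == 4);
    return ((uint32_t)(unsigned char)s[0] << 24) |
           ((uint32_t)(unsigned char)s[1] << 16) |
           ((uint32_t)(unsigned char)s[2] << 8)  |
            (uint32_t)(unsigned char)s[3];
}

/* Hypothetical helper: unpacks a 32-bit code back into a printable string. */
static void fourcc_to_string(uint32_t code, char out[5])
{
    out[0] = (char)(code >> 24);
    out[1] = (char)(code >> 16);
    out[2] = (char)(code >> 8);
    out[3] = (char)code;
    out[4] = '\0';
}
```

So fourcc("oink") is 0x6F696E6B and fourcc("!hog") is 0x21686F67, which are the constants to look for when they show up as immediates.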

I’m still poking around. I want to see how MusicPlayer or anybody else sets up Celestial to play back audio.

AudioQueue and libIOAudio2User

NerveGas told me about libIOAudio2User.dylib, which for some reason I never checked out. It’s almost all in C++ (class IOAudio2Transformer) except for some mach port stuff (hmm…). (The iPhone toolchain is pretty awful about C++ at the moment; I can’t get vector to work, for starters.) I admit to being confused as to what Audio2Transformer does. Only MeCCA and AudioToolbox call it. It’s got functions like “EnqueueBuffer” and “GetNumberAvailableInputBuffers.” The name sounds promising (audio to userland?) but I don’t know what it could be for.

AudioQueue, along with the private-to-AudioToolbox AQClient and AQServer, is another good looking one. AudioToolbox has public APIs like “AudioQueueEnqueueBuffer,” _AudioQueueServerInit and _AudioQueueStart. There’s a mach service probably controlled by mediaserverd called “com.apple.audio.AudioQueueServer.” I need to look into the disassembly for these. Celestial links AudioQueue but not any of the IOAudio2Transformer stuff. Update: AudioQueue was our savior after all; see the update post.

iPod music playback / library access

Sine waves are cool and all, but I know what you really want is to play a random song out of your iPod library on your phone at 1.5x. By using the very high level media playback API Apple calls Celestial, you can do that now:

iPhone Music Playback

Not surprisingly, Celestial is something very close to QTKit. Make sure coreaudiod is running this time!

Super Shuffle 2007


Did you ever want to hear 4 random seconds of every song on your iPod in shuffle at random speeds FOREVER?

Super Shuffle 2007 iPhone App

The source for this app is inside the .app bundle. (Can I meekly suggest that all iPhone app devs do this? There’s so much we’re all learning that always bundling the source will speed up everything for everyone, and I promise your first early stage attempts are not “critical IP.”) This is hilarious, but like QTKit, there’s that 0.5s of silence in between each track. There’s got to be a way to force AVController to load and play two tracks. I’ve tried two processes and it caught me.

Caution: on my walk home just now after listening to 15 minutes of this I (a) wanted to go insane and (b) had to restart my iPhone. It hung up hard right in the middle of an excellent rendition of “I Can’t Go For That.”

Update: the Super Shuffle app now also shows you how to detect the presence of the headphones using AVSystemController’s notification process. It’ll show you the current volume and playback channel on the main screen (it only updates every four seconds, just FYI).

Math & DSP power

I spent a while looking for any hint of an Accelerate or MKL/IPP style library in the OS, but can’t find anything as of yet. There are some FFT routines in AudioToolbox (for decoding, I imagine) but nothing API-accessible. The chip is a 667 MHz ARM, so even scalar code should be relatively OK. First compilations of FFTW are pretty disastrous, though:

mbp-2.0# ./bench 1024
Problem: 1024, setup: 116.92 ms, time: 34.12 us, ``mflops'': 1500.7

iphone# ./bench 1024
Problem: 1024, setup: 3.60 s, time: 5.38 ms, ``mflops'': 9.5256

I found a fixed-point implementation tuned for the ARM but it only does 64 and 80 points. It’s significantly faster at 64 points on the phone than FFTW:

iphone# ./bench 64
Problem: 64, setup: 82.80 ms, time: 166.50 us, ``mflops'': 11.532

iphone# ./ffttest-arm
Timing FFT speed... 18.97 us per 64-point FFT, or 6.75 Mbps with QPSK,

So, with the fixed-point FFT example and a 64 “bin” real FFT, we can do about 80 FFTs in real time (assuming nothing else is running). With the slower float FFT at 1024 points we can do 4 or so FFTs in real time. Probably should update the arm-fft code to do 1024 points.
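The back-of-envelope math behind those numbers can be sketched like this, assuming a 44.1 kHz stream processed in non-overlapping blocks of N samples: the budget is the time it takes to play one block divided by the time one FFT of that block takes. The function name is just for illustration.

```c
#include <assert.h>

/* How many FFTs of n_points fit into the time it takes to play n_points
 * samples at sample_rate_hz, given one FFT costs fft_time_us microseconds. */
static double ffts_per_block(int n_points, double sample_rate_hz,
                             double fft_time_us)
{
    assert(n_points > 0 && sample_rate_hz > 0.0 && fft_time_us > 0.0);
    double block_time_us = 1e6 * n_points / sample_rate_hz;
    return block_time_us / fft_time_us;
}
```

Plugging in the benchmark timings, ffts_per_block(64, 44100.0, 18.97) comes out at roughly 76 and ffts_per_block(1024, 44100.0, 5380.0) at roughly 4.3, before any overhead for windowing, overlap or the rest of the app.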

What I think we know

  • Input sample rate seems to be limited to 8000Hz. (Trying to set it at anything higher sets it back to 8K. Uggggllllyyyy)
  • No audiounits — except for mentions of the DefaultOutputUnit in Fig. No MusicPlayer API, no ScheduledSoundPlayer (but there’s a ScheduledFilePlayer in Fig and a ScheduledSlicePlayer in AudioToolbox), no AUTimePitch, no DefaultOutputUnit, no AUGraphs, etc. Which also probably means no MovieAudioExtraction either, although AudioToolbox is there and we probably still have the audioconversion API if we need it.
  • Celestial acts like QTKit but does not seem to want to play two files at once from the same process. The time stretcher sounds like QuickTime’s, probably just changing the mp3 frame playback speed.
  • Celestial also has a recording feature, which is how Erica at TUAW made her Voice Recorder app. By default it creates AMR files, which Celestial can easily play back. This API screams “removed feature at the last minute.”
  • MusicLibrary has a nice MLQuery api for pulling down tracks. Unfortunately, the predicates in the query are under this hidden “property->value” API and I don’t know any properties. I’ve tried the EyeTunes 4-char-int properties like ‘pArt’ but they don’t work. Creating a default query just by MLQuery *q = [[MLQuery alloc] init] returns all tracks (including video!) There’s a big data section in the MusicLibrary lib for propertyStringDescriptions but I can’t seem to extract it.

Unanswered Questions

If you know this stuff I could use your help on the following — please get in touch at brian.whitman at variogr.am

  • Input code? Why is starting the default (and only) input device hanging? (I should check the AVRecorder disassembly)
  • Any clues on a vectorized math lib for ARMs? Anything in the video portion?
  • Why is the default audio output device !hog’ged?
  • Can we figure out how to write to Celestial so we don’t have to kill it? (see the update post)
  • What’s up with the default stream ASBD on the default output device? L channel is always hissing, although it’s reporting 2 channel interleaved. update - I uploaded a new version with an OSX makefile. Problem is not on normal OSX. (see above, YAY!)
  • Can you play more than one AVItem in Celestial at once? I would think so — the ringtone crossfades with a music track… (update: I tried this a few times. I don’t think there’s ever a moment where the ringtone is playing while other audio is playing. I could be wrong though.)
  • MLPredicate properties — please? Anyone? (I’m pretty sure I found these in my disassembly but I need to double check)
  • Why can’t Celestial play uncompressed files? (update: it does, just not CAF files.)
