empirical mode decomposition

Hi everyone,

I just posted some code from a project I worked on this summer:

http://code.google.com/p/realtime-emd/

You won’t find any OF code on the site, but I wanted to share this because OF was key to the development.

The final “core” code was surprisingly straightforward: 150 lines.

http://code.google.com/p/realtime-emd/source/browse/trunk/emd/EmpiricalModeDecomposition.c

But it wasn’t obvious that the implementation would end up looking like that when I started out, staring at pages like this:

http://en.wikipedia.org/wiki/Hilbert%E2%80%93Huang_transform

The basic idea is to take a signal (an audio input, for example) and decompose it into its components. A Fourier transform does something similar when it breaks a sound into its different frequency bins. Empirical mode decomposition (EMD) takes this to another level by breaking a sound into its noisy parts, pitched parts, frequency modulation envelopes, etc. At least, that’s how the theory goes – getting all of that stuff in practice requires a lot of fine-tuning!
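
To give a sense of what those 150 lines are doing, here’s a condensed sketch of a single “sifting” pass, the core operation of EMD. This is not the code from the repo: it uses linear interpolation between extrema for brevity, where a real implementation would normally use cubic splines, and it skips endpoint handling.

#include <vector>
#include <cstddef>

// Linearly interpolate an envelope through the given extrema
// (if there are fewer than two extrema the envelope stays at zero).
static std::vector<float> envelope(const std::vector<float>& x,
                                   const std::vector<std::size_t>& ext) {
    std::vector<float> env(x.size(), 0.0f);
    for (std::size_t k = 0; k + 1 < ext.size(); k++) {
        std::size_t a = ext[k], b = ext[k + 1];
        for (std::size_t i = a; i <= b; i++) {
            float t = float(i - a) / float(b - a);
            env[i] = (1.0f - t) * x[a] + t * x[b];
        }
    }
    return env;
}

// One sifting pass: subtract the mean of the upper and lower
// envelopes. This is repeated until the result satisfies the IMF
// criteria; the IMF is then subtracted from the signal and the
// process repeats on the residue to extract the next mode.
std::vector<float> siftOnce(const std::vector<float>& x) {
    std::vector<std::size_t> maxima, minima;
    for (std::size_t i = 1; i + 1 < x.size(); i++) {
        if (x[i] > x[i - 1] && x[i] > x[i + 1]) maxima.push_back(i);
        if (x[i] < x[i - 1] && x[i] < x[i + 1]) minima.push_back(i);
    }
    std::vector<float> upper = envelope(x, maxima);
    std::vector<float> lower = envelope(x, minima);
    std::vector<float> h(x.size());
    for (std::size_t i = 0; i < x.size(); i++)
        h[i] = x[i] - 0.5f * (upper[i] + lower[i]);
    return h;
}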

The final result here was an external for Max/MSP. But rather than writing the Max/MSP external from the ground up, I started with a C++ implementation using OF. OF made it easy to visualize things like this:


http://www.flickr.com/photos/kylemcdonald/3697376314/in/photostream/

http://www.flickr.com/photos/kylemcdonald/3696568349/in/photostream/

It also made it easy to test the algorithm with microphone input instead of synthesized signals.
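
The OF side isn’t in the repo, but a minimal sketch in that spirit might look like the following. It’s written against a current OF sound API (ofSoundStreamSettings / ofSoundBuffer), and the names are mine, not the actual project code: it just grabs the mic and draws the raw buffer, and the envelopes and IMFs can be overlaid the same way.

#include "ofMain.h"

class ofApp : public ofBaseApp {
public:
    std::vector<float> buffer;
    ofSoundStream stream;

    void setup() {
        ofSoundStreamSettings settings;
        settings.setInListener(this);
        settings.sampleRate = 44100;
        settings.numInputChannels = 1;
        settings.numOutputChannels = 0;
        settings.bufferSize = 512;
        stream.setup(settings);
    }

    // called from the audio thread (no locking here; fine for a sketch)
    void audioIn(ofSoundBuffer& input) {
        buffer.assign(input.getBuffer().begin(), input.getBuffer().end());
    }

    void draw() {
        ofBackground(0);
        ofPolyline wave;
        for (std::size_t i = 0; i < buffer.size(); i++)
            wave.addVertex(ofMap(i, 0, buffer.size(), 0, ofGetWidth()),
                           ofGetHeight() / 2 + buffer[i] * 200);
        wave.draw();
    }
};

int main() {
    ofSetupOpenGL(1024, 480, OF_WINDOW);
    ofRunApp(new ofApp());
}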

The other implementations of this algorithm are mostly for Matlab, in forms that could never run in realtime. With OF I was able to start with a basic C++ outline of the idea, fill it out and test it, experiment with optimizations, and finally convert everything to C and wrap it as a Max external. Way easier than writing straight for Max and restarting Max incessantly!

Hi Kyle,
This seems really interesting, but I’m kinda lost about what sort of applications it’s good for.
Can it be used, for example, to isolate a specific element in a sound stream, like a voice or a piano? Would that be a suitable use for this algorithm?
What are some common uses for this technique?

thanks

rui

If well tuned, this could be used to isolate a noisy signal from pitched signals without removing the high-frequency components of the pitched signals. I’m not sure you could get down to the “voice” or “piano” distinction, though. You could also use it to isolate lower-frequency control parameters. For example, if you have an AM wave, you should be able to recover a wave that describes the modulation envelope. I’ve heard you can use it for FM and other kinds of demodulation as well, but I haven’t been able to use it that way myself.
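
As a toy illustration of the AM case (my own throwaway code, arbitrary constants, not the repo’s API): if you build an amplitude-modulated sine and pick out the peaks of the carrier, those peaks sample the modulation envelope, and connecting them is essentially the upper-envelope step that sifting performs.

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const float PI = 3.14159265f;
    const int n = 4096;
    const float fc = 0.05f;   // carrier frequency, cycles per sample
    const float fm = 0.002f;  // modulator frequency, cycles per sample

    // AM signal: a slow positive envelope multiplied onto a fast carrier
    std::vector<float> x(n);
    for (int i = 0; i < n; i++) {
        float env = 1.0f + 0.5f * std::sin(2.0f * PI * fm * i);
        x[i] = env * std::sin(2.0f * PI * fc * i);
    }

    // the carrier's local maxima approximate the modulation envelope
    for (int i = 1; i + 1 < n; i++) {
        if (x[i] > x[i - 1] && x[i] > x[i + 1])
            std::printf("%d\t%f\n", i, x[i]);
    }
    return 0;
}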

hi pelintra,

the problem with separating ‘voice’ or ‘piano’ is that what makes ‘voice’ and ‘piano’ unique and separable is largely psychological and horribly complicated.

a wonderful source book on this is Bregman, A. S. (1990) Auditory scene analysis. MIT Press: Cambridge, MA.

cheers
d

Here’s an example of how similar a piano and voice can be.

http://www.youtube.com/watch?v=muCPjK4nGY4