approximate a sound from fft/mfcc?

I’m trying to figure out how to approximate a sound. I’m running it through an fft and getting an mfcc (using maximilian), and I have a good approximate frequency for each bin, but I’m not sure how to put that data back together. I thought about doing some bluntly simple synthesis using a sine wave set at the approximate frequency of each bin and then summing them, but it sounds really wrong. Is there a general/simple approach that someone knows of or has used before? Thanks!

that sounds to me like a really hard problem to solve and a really naive approach : ) that said, take into account that the fft returns complex values (magnitude and phase, in polar form), and the phase could hold useful info that you are losing when you convert to just amplitude bins.

and also… ask kyle : )

It’s definitely a hard problem, which is why I said “approximate”, but maybe I should say “really roughly approximate”, because I just want it to be identifiably similar in (perceived) pitch rather than an actual replication :slight_smile: I have it working to a satisfactory degree by just selecting the loudest bin from the fft, so if that’s where the “easily explainable code” road ends I’m ok with that, but I thought it might be fun to add together some of the louder bins to get slightly closer.
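For the "add together some of the louder bins" idea, a minimal sketch of picking the loudest bins from a magnitude array could look like this. The function name `pick_loudest_bins` and its parameters are made up for illustration; nothing here comes from maximilian.

```c
#include <stddef.h>

/* Hypothetical helper: write the indices of the `count` largest values of
   mag[0..nbins-1] into out[], loudest first. Returns how many indices were
   written (count is clamped to nbins). Simple repeated scan, which is fine
   for a handful of bins. */
size_t pick_loudest_bins(const double *mag, size_t nbins,
                         size_t *out, size_t count)
{
    size_t found = 0;
    if (count > nbins) count = nbins;
    while (found < count) {
        size_t best = nbins; /* sentinel: nothing chosen yet in this pass */
        for (size_t k = 0; k < nbins; k++) {
            int taken = 0;
            for (size_t j = 0; j < found; j++) {
                if (out[j] == k) { taken = 1; break; }
            }
            if (taken) continue;
            if (best == nbins || mag[k] > mag[best]) best = k;
        }
        out[found++] = best;
    }
    return found;
}
```

Keep in mind that the bins right next to a strong peak are usually loud too, so the top few indices often describe one partial rather than several.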

I’ll definitely ask kyle though :slight_smile:

btw, post if you find something else, seems really interesting : )

If you only use the magnitude (modulus) information of the FFT, you are missing the phase information that is also necessary for signal reconstruction.

The FFT is a discrete transform, and using only magnitude information can only work if the actual sinusoidal components of the sound exactly match the discrete frequencies of the bins (n·Fs/N, where Fs is the sampling frequency, N is the number of points of the FFT, and n goes from 0 to N/2).
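As a quick illustration of the bin grid, bin k of an N-point FFT is centered at k·Fs/N. The two helpers below are hypothetical names for this example; they show that, for instance, 440 Hz at Fs = 44100 and N = 1024 does not land on a bin center.

```c
#include <stddef.h>

/* Center frequency in Hz of bin k for an fft_size-point FFT. */
double bin_center_hz(size_t k, size_t fft_size, double sample_rate)
{
    return (double)k * sample_rate / (double)fft_size;
}

/* Fractional bin position of a frequency. A non-integer result means the
   partial sits between two bins instead of exactly on one. */
double hz_to_bin_position(double freq_hz, size_t fft_size, double sample_rate)
{
    return freq_hz * (double)fft_size / sample_rate;
}
```

With those numbers the bin spacing is about 43 Hz, and 440 Hz lands a little above bin 10, between bins 10 and 11.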

Otherwise, each partial of the sound spreads across several bins, so you need the magnitude and phase coefficients of all those contributing bins to recover that partial.

If you want to sum the loudest components while keeping phase coherence, you can compute the true frequency of each bin from the phase difference between successive FFT frames in that bin, as in this code:

/* Assumed context (not shown in this snippet): fftFrameSize2 is half the
   FFT size, freqPerBin = sampleRate/fftFrameSize, osamp is the frame
   overlap factor, expct = 2*pi/osamp is the expected phase advance per
   frame for bin 1, qpd is a long, and the other scalars are doubles. */
for (k = 0; k <= fftFrameSize2; k++) {

    /* de-interlace FFT buffer */
    real = gFFTworksp[2*k];
    imag = gFFTworksp[2*k+1];

    /* compute magnitude and phase */
    magn = 2.*sqrt(real*real + imag*imag);
    phase = atan2(imag, real);

    /* compute phase difference against the previous frame of this bin */
    tmp = phase - gLastPhase[k];
    gLastPhase[k] = phase;

    /* subtract expected phase difference */
    tmp -= (double)k*expct;

    /* map delta phase into +/- Pi interval */
    qpd = (long)(tmp/M_PI_VAL);
    if (qpd >= 0) qpd += qpd&1;
    else qpd -= qpd&1;
    tmp -= M_PI_VAL*(double)qpd;

    /* get deviation from bin frequency from the +/- Pi interval */
    tmp = osamp*tmp/(2.*M_PI_VAL);

    /* compute the k-th partial's true frequency */
    tmp = (double)k*freqPerBin + tmp*freqPerBin;

    /* store magnitude and true frequency in analysis arrays */
    gAnaMagn[k] = magn;
    gAnaFreq[k] = tmp;
}

Now you can pick the bins with the highest values in gAnaMagn[] and reconstruct each partial as its magnitude multiplied by the cosine of a running phase, where the phase advances by 2·Pi·gAnaFreq[k]/sampleRate on every output sample.

Hope it helps …