Hi,
there seems to be a lot of confusion in this topic.
I checked out the code posted by Yonas, and there are several things interfering with correct pitch detection. I'll try to correct those later.
Its main problem has to do with understanding pitch versus frequency, as Kyle explained very well, but also with how the FFT bins are distributed in relation to the notes.
The pitches (notes) have their frequencies logarithmically spaced, while the FFT has its bins linearly spaced. Although converting from a linear to a logarithmic distribution is trivial, you cannot reliably detect pitch just by taking the highest-valued bin; it might work for single-note sounds, but with lots of false matches.
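To make the linear/logarithmic relationship concrete, here is a minimal sketch (assuming a 44.1 kHz sample rate and a 1024-point FFT, both illustrative values) that converts a bin index to its center frequency and then to the nearest MIDI note number:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double sampleRate = 44100.0; // assumed sample rate
    const int fftSize = 1024;          // assumed FFT size

    // FFT bins are linearly spaced: bin k sits at k * sampleRate / fftSize.
    int k = 10;
    double freq = k * sampleRate / fftSize;

    // Notes are logarithmically spaced: convert frequency to a MIDI
    // note number (A4 = 440 Hz = MIDI 69), 12 notes per octave.
    double midi = 69.0 + 12.0 * std::log2(freq / 440.0);
    std::printf("bin %d -> %.1f Hz -> MIDI %.2f\n", k, freq, midi);
    return 0;
}
```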
Real-world sounds are quite complex, composed of several overlapping sine waves at the harmonics (integer multiples) of the fundamental frequency. In many cases you get a very high-valued peak at one of the harmonics, which can be detected as a different “note”.
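For example (a small sketch; the note-name table is just for printing), the harmonics of A2 land on entirely different notes, so a naive “loudest bin” detector that latches onto the 3rd harmonic would report E4 instead of A2:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const char* names[12] = {"C","C#","D","D#","E","F",
                             "F#","G","G#","A","A#","B"};
    double f0 = 110.0; // A2
    for (int h = 1; h <= 5; ++h) {
        double f = f0 * h; // h-th harmonic
        // Nearest MIDI note (A4 = 440 Hz = MIDI 69).
        int midi = (int)std::lround(69.0 + 12.0 * std::log2(f / 440.0));
        std::printf("harmonic %d: %6.1f Hz -> %s%d\n",
                    h, f, names[midi % 12], midi / 12 - 1);
    }
    return 0;
}
```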
To detect a single pitched sound, you can find the peaks of the FFT and take the lowest bin among them; ignoring the first 2 or 3 bins (DC and very low frequencies) will probably help.
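Something along these lines (a rough sketch; `spectrum`, `threshold`, and `skipBins` are illustrative names, and a real detector would also want smoothing and interpolation):

```cpp
#include <vector>

// Find the lowest-bin local peak in an FFT magnitude spectrum,
// skipping the first few bins (DC / very low frequencies).
// Returns the bin index, or -1 if no peak rises above the threshold.
int lowestPeakBin(const std::vector<float>& spectrum,
                  float threshold, int skipBins = 3) {
    for (int i = skipBins + 1; i + 1 < (int)spectrum.size(); ++i) {
        if (spectrum[i] > threshold &&
            spectrum[i] > spectrum[i - 1] &&
            spectrum[i] > spectrum[i + 1]) {
            return i; // first (lowest-frequency) peak found
        }
    }
    return -1;
}
```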
Keep in mind that there are several other algorithms that might be much more efficient and precise at getting a single pitch from a sound source. Just google “pitch detection algorithm”.
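One of the classic time-domain approaches is autocorrelation; a very rough sketch is below (the function name, thresholds, and lag range are illustrative, and serious implementations refine this considerably):

```cpp
#include <vector>

// Estimate pitch by finding the lag at which the frame best
// correlates with itself. Returns a frequency in Hz, or 0 if
// nothing convincing is found.
double autocorrPitch(const std::vector<float>& frame, double sampleRate) {
    int n = (int)frame.size();
    int bestLag = 0;
    double bestCorr = 0.0;
    int minLag = (int)(sampleRate / 1000.0); // ~1000 Hz upper bound
    int maxLag = (int)(sampleRate / 50.0);   // ~50 Hz lower bound
    for (int lag = minLag; lag < maxLag && lag < n; ++lag) {
        double corr = 0.0;
        for (int i = 0; i + lag < n; ++i)
            corr += frame[i] * frame[i + lag];
        if (corr > bestCorr) { bestCorr = corr; bestLag = lag; }
    }
    return bestLag > 0 ? sampleRate / bestLag : 0.0;
}
```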
Polyphonic pitch detection is another matter entirely; getting a reliable result is really complex. Consider that just a few years ago, Melodyne, a very well-known pitch-correction software in the music production business, introduced a polyphonic pitch detection feature, and it was really groundbreaking. It yields some impressive results. Google it and check it out.
Another issue with FFTs and pitch, due to the linear/log spacing, is that for low-pitched notes two or more semitones can fall into the same bin, making pitch detection impossible. The solution is to compute an FFT with more samples, and hence more bins, so each bin has a narrower bandwidth; but then the processing time is higher and the temporal resolution of the FFT is lower, which is mostly noticeable at higher frequencies.
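The numbers make this obvious; here is a quick check (again assuming 44.1 kHz and a 1024-point FFT):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double sampleRate = 44100.0; // assumed
    const int fftSize = 1024;          // assumed

    double binWidth = sampleRate / fftSize; // ~43 Hz per bin

    // Semitone spacing around A1 (55 Hz): one semitone is a
    // ratio of 2^(1/12), so the next note up is ~58.27 Hz.
    double a1 = 55.0;
    double aSharp1 = a1 * std::pow(2.0, 1.0 / 12.0);
    std::printf("bin width: %.1f Hz, A1 -> A#1 gap: %.2f Hz\n",
                binWidth, aSharp1 - a1);
    // The gap (~3.3 Hz) is much smaller than one bin (~43 Hz),
    // so many low notes collapse into the same bin.
    return 0;
}
```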
A solution to this is to use a constant-Q transform, in which the bins of an FFT are combined with a weighted average in such a way as to produce a transform whose bins correspond exactly to the notes of the musical scale.
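The key idea is that the bin center frequencies are geometrically spaced while the ratio of center frequency to bandwidth (the Q) stays constant. A small sketch of how the centers and bandwidths come out, assuming 12 bins per octave starting at C2 (illustrative choices):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const int binsPerOctave = 12;
    const double fMin = 65.41; // C2, chosen as an example

    // Constant Q: Q = f_k / bandwidth_k = 1 / (2^(1/b) - 1).
    const double Q = 1.0 / (std::pow(2.0, 1.0 / binsPerOctave) - 1.0);

    for (int k = 0; k < 24; ++k) {
        // Geometrically spaced centers: one per semitone.
        double fk = fMin * std::pow(2.0, (double)k / binsPerOctave);
        double bandwidth = fk / Q; // grows with frequency
        std::printf("bin %2d: %7.2f Hz (bw %.2f Hz)\n", k, fk, bandwidth);
    }
    return 0;
}
```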
Some years ago, while learning Processing, I decided to implement a constant-Q method for Processing (at that time the only piece of code I could find was in MATLAB). I published it on Google Code but haven't updated it since then.
http://code.google.com/p/p5cq/
Although it works and the constant-Q algorithm is correctly implemented, there are several flaws in the visualization (I just checked it and there's a vertical offset on the “piano roll” grid).
I’ll resurrect this project and port it into OF.
I hope this is of some help.
If my writing is somewhat confusing, it's because I'm tired, lol.
Cheers!