OpenFrameworks and speech synthesis (TTS) with FLITE

First of all, as is usual in forums, I’m sorry for my English. I’ll do my best.

I’ll begin by explaining how I managed (thanks to Arturo C) to use the FLITE library with openFrameworks.

If you are using Ubuntu (as I do), you can install flite the usual way:

  
sudo apt-get install 'flite*'

(so you get both flite and flite-dev; the quotes stop the shell from expanding the * against files in the current folder).

Otherwise, you can download flite here: http://www.speech.cs.cmu.edu/flite/ and install it.
Or you can just download the source and copy its “include” folder into the src folder of your project (the headers alone are not enough: you still need to build and link the flite libraries).

Then, an easy way to check that it works is:

  
  
#include "ofMain.h"
#include "ofAppGlutWindow.h"
#include "testApp.h"
#include "flite.h" // or #include "include/flite.h" if you copied the headers into src/

extern "C"
{
	// flite 1.4+ voices take a voice-directory argument; NULL is fine for the built-in ones
	cst_voice* register_cmu_us_kal(const char* voxdir); // you can change this to select a different voice
}

int main(void)
{
	// Speak before ofRunApp(): it does not return until the window is closed
	flite_init();
	cst_voice* voice = register_cmu_us_kal(NULL); // you can change this to select a different voice
	flite_text_to_speech("hello world, how are you?", voice, "play"); // "play" sends the audio to the sound device

	// The openFrameworks window is not needed for this test
	ofAppGlutWindow window;
	ofSetupOpenGL(&window, 492, 369, OF_WINDOW);
	ofSetFrameRate(25);
	ofRunApp(new testApp());
	return 0;
}
  
  

After doing this, add these lines to config.make in your project’s root folder:

  
  
  
# Path used by the Ubuntu package; I'm not sure it's needed if you didn't install FLITE system-wide
USER_CFLAGS = -I/usr/include/flite
  
  
# USER_LDFLAGS allows to pass custom flags to the linker  
# for example libraries like:  
  
#USER_LDFLAGS = -lFestival -lestools -lestbase -leststring  
USER_LDFLAGS = -lflite_cmu_us_kal -lflite_cmu_time_awb -lflite_cmu_us_kal16 -lflite_cmu_us_awb -lflite_cmu_us_rms -lflite_cmu_us_slt -lflite_usenglish -lflite_cmulex -lflite  
  

So far, so good.
Then I used a TTS class (again created by Arturo C) that converts the synthesized voice into an ofSoundBuffer object (also written by him)*. The main point of this sound buffer object is that it splits the waveform into “chunks”, so that, together with the ofSoundStream class, you can play the sound without blocking the program (otherwise, the program stops until the sound has finished playing).

* I’m not uploading those classes because I haven’t asked him yet.

NOW, MY QUESTION IS: is there any way to change the SPEED of the voice? The voices are too fast, and I couldn’t find any way of doing that in the FLITE documentation itself. But maybe with the soundBuffer object? I tried giving “less” data to the soundStream output (which is like increasing the output sample rate while keeping the same amount of data, so you get a lower frequency), and it works, but with terrible audio clipping.

Any ideas? Thanks!


Is it possible to share where you found Arturo’s TTS class? I’ve gotten TTS going with ofxSpeech, but would like to try this option out for a project. Thanks

milton, try the resample method in ofSoundBuffer: a speed < 1 makes the sound slower, > 1 makes it faster.

That will also change the pitch of the voice. If you want to avoid that, you’ll need a more complex method called time stretching; a quick Google search returns this, for example: http://www.surina.net/soundtouch/

But I think resampling should be enough for what you need.
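A minimal sketch of the resampling idea, assuming a mono buffer of floats: read the input at a fractional position that advances by `speed` per output sample and linearly interpolate between neighbouring samples. `resampleLinear` is a hypothetical helper written for illustration, not ofSoundBuffer’s own resample(); like any plain resampling, it changes pitch together with speed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: resample a mono buffer by `speed` with linear interpolation.
// speed < 1 -> more output samples (slower and lower-pitched)
// speed > 1 -> fewer output samples (faster and higher-pitched)
std::vector<float> resampleLinear(const std::vector<float>& in, float speed) {
    std::vector<float> out;
    if (in.empty() || speed <= 0.0f) return out;

    // Number of output samples needed to cover the whole input at this speed
    std::size_t outLen =
        static_cast<std::size_t>(std::floor((in.size() - 1) / static_cast<double>(speed))) + 1;
    out.reserve(outLen);

    for (std::size_t i = 0; i < outLen; ++i) {
        double pos = i * static_cast<double>(speed);      // fractional read position
        std::size_t i0 = static_cast<std::size_t>(pos);   // sample before pos
        std::size_t i1 = (i0 + 1 < in.size()) ? i0 + 1 : i0; // sample after pos (clamped)
        float frac = static_cast<float>(pos - i0);
        out.push_back(in[i0] + (in[i1] - in[i0]) * frac);
    }
    return out;
}
```

For example, `resampleLinear(samples, 0.8f)` returns a buffer about 25% longer, so the voice plays slower and lower.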

Thanks Arturo (again…) :)

jhochenbaum: here you can download the classes. I might have broken something here and there, but they work:

http://miltonlaufer.com.ar/of/TTS-arturo-c.rar

Best! m

Thanks Milton. A couple of quick questions as I try to get this running. Is it necessary to make the additions to config.make (which I don’t seem to have in my oF project)? I’ve downloaded flite 1.4 (OS X) and copied the include folder into the project, and TTS finds flite.h without a problem.

I noticed Arturo’s TTS class takes care of all the initialization. However, when I add an instance of TTS I get a bunch of linker errors, e.g.:

Undefined symbols for architecture i386:
  "_flite_text_to_wave", referenced from:
      TTS::convertToAudio(std::string, int, ofSoundBuffer&) in TTS.o
  "_flite_init", referenced from:
      TTS::initialize() in TTS.o
  "_register_cmu_time_awb", referenced from:
      _flite_set_voice_list in flite_voice_list.o
  "_register_cmu_us_kal16", referenced from:
      _flite_set_voice_list in flite_voice_list.o
  "_register_cmu_us_awb", referenced from:
      _flite_set_voice_list in flite_voice_list.o
  "_register_cmu_us_kal", referenced from:
      _flite_set_voice_list in flite_voice_list.o
  "_register_cmu_us_rms", referenced from:
      _flite_set_voice_list in flite_voice_list.o
      TTS::initialize() in TTS.o

Does this have to do with rebuilding flite for my system? Thanks for your help

Did you try the example from my first post, and did it work? That would be a good way to tell whether your problem is with the flite library or with the TTS class.

Yeah, I just tried it in a new project. If I copy the flite include folder into the project, it finds flite.h, but it still doesn’t know what “voice” is (unknown type name ‘voice’). Thanks for your help.

On a side note, I was able to build flite from source (make) and generate some .wav files in the terminal by calling flite directly (./bin/flite “say something” output.wav).

Let me know if you have any thoughts on getting oF talking with flite. Cheers!

I’ve managed to get TTS compiling and working, and now I’m trying to figure out how to tie in ofSoundStream to actually play back the audio. I noticed the event that is broadcast when text is added, which passes the ofSoundBuffer object. I’ve made my test app a listener, and it now receives the ofSoundBuffer (well, technically it’s a member of the TTSData object). Any insight into choosing a channel to output the buffer on and actually getting it to play?

All best,

The TTS class notifies that event. You should capture it (see the customEvents example) and keep the soundBuffer.
Then you have to use the ofSoundStream class. When you set up that object, it starts calling the audioOut function (which you have to write). So you’ll have:

  
  
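//--------------------------------------------------------------
// Called (through the TTS event) from the TTS thread whenever new speech is ready
void testApp::newSoundBuffer(const TTSData & tts){
        mutex.lock();
        soundBuffer = *tts.buffer; // keep our own copy of the synthesized audio
        mutex.unlock();
}

//--------------------------------------------------------------
// Called by ofSoundStream from the audio thread for every block of output
void testApp::audioOut(float * output, int buffersize, int nChannels, int deviceID, unsigned long long int tickCount){
        mutex.lock();
        // copy the next block starting at 'position'; the last argument enables looping
        soundBuffer.copyTo(output,buffersize,nChannels,position,true);
        if(soundBuffer.size()>0){
                // advance the read position, wrapping at the end of the buffer
                position += buffersize;
                position %= soundBuffer.bufferSize();
        }
        mutex.unlock();
}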
  

(Again, all this was provided to me by Arturo.)
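For readers without Arturo’s ofSoundBuffer, here is a rough standard-C++ sketch of what the copyTo() call in audioOut does, based on my assumptions about its semantics: copy one block of frames from the stored sound into the interleaved output buffer starting at `position`, either wrapping to the start (loop) or filling the rest with silence. `copyFrames` is a hypothetical helper, not the ofSoundBuffer API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not ofSoundBuffer::copyTo): copy `frames` frames from an
// interleaved source with `srcChannels` channels into an interleaved output with
// `outChannels` channels, starting at frame `position`.
// - Extra output channels repeat the last source channel (mono -> every speaker).
// - With loop == true reading wraps to the start; otherwise the tail is silence.
// Returns the new read position, in frames.
std::size_t copyFrames(const std::vector<float>& src, int srcChannels,
                       float* out, int frames, int outChannels,
                       std::size_t position, bool loop) {
    std::size_t srcFrames = srcChannels > 0 ? src.size() / srcChannels : 0;
    for (int f = 0; f < frames; ++f) {
        bool haveFrame = srcFrames > 0 && (loop || position < srcFrames);
        if (loop && srcFrames > 0) position %= srcFrames; // wrap before reading
        for (int c = 0; c < outChannels; ++c) {
            int sc = c < srcChannels ? c : srcChannels - 1;
            out[f * outChannels + c] = haveFrame ? src[position * srcChannels + sc] : 0.0f;
        }
        if (haveFrame) ++position;
    }
    return position;
}
```

Inside audioOut this would become something like `position = copyFrames(samples, 1, output, buffersize, nChannels, position, true);` while holding the mutex, where `samples` is a hypothetical std::vector<float> holding the TTS audio.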

Cheers, I was pretty close to that, but was making a minor mistake when copying the buffer into the output buffer. Working pretty well now!

However, one thing I’d like to do is alternate which output channel the current TTS buffer gets copied to-- so one line of text comes out channel 0 (speaker 1/L), the next channel 1 (speaker 2/R), etc, and wrapping depending on how many channels are present. I thought I could do this by changing the following in the tts callback:

  
soundBuffer = *tts.buffer;  

to

  
  
soundBuffer.setChannel(*tts.buffer, channelToAddTo);  
channelToAddTo++;  
channelToAddTo = channelToAddTo%numVoices;  
  

However, this seems to crash the audio callback when copying soundBuffer to the output buffer. Can you see why? Is it because only one channel is initialized/resized or something? The error is an EXC_BAD_ACCESS when copyTo gets called, although both the output buffer in the audio callback and the persistent ofSoundBuffer I’m using to copy the data from TTS to the output exist… Thanks!
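One thing worth checking (a guess, not a confirmed diagnosis): ofSoundBuffer::setChannel, as quoted later in this thread, starts writing at index targetChannel and then steps by the destination buffer’s own channel count. If soundBuffer has fewer channels than channelToAddTo + 1 (for example, if it was never given 2 channels, or it may have become mono after `soundBuffer = *tts.buffer`), the last writes land past the end of its storage. The sketch below does the same per-channel copy on a plain interleaved std::vector; `writeMonoToChannel` is a hypothetical helper, not part of ofSoundBuffer, and it refuses an out-of-range channel instead of overflowing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: write a mono signal into channel `target` of an interleaved
// buffer with `numChannels` channels, leaving the other channels untouched.
// Assumes interleaved.size() is a multiple of numChannels.
// Returns false instead of writing out of bounds when `target` is not a valid channel.
bool writeMonoToChannel(std::vector<float>& interleaved, int numChannels,
                        const std::vector<float>& mono, int target) {
    if (numChannels <= 0 || target < 0 || target >= numChannels) return false;
    std::size_t frames = std::max(interleaved.size() / numChannels, mono.size());
    interleaved.resize(frames * numChannels, 0.0f); // grow only; existing frames keep their data
    for (std::size_t i = 0; i < mono.size(); ++i)
        interleaved[i * numChannels + target] = mono[i];
    return true;
}
```

Alternating speakers then reduces to calling it with `channelToAddTo` and advancing `channelToAddTo = (channelToAddTo + 1) % numVoices`, as in the callback above.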

Making the persistent ofSoundBuffer (which receives the TTS data and copies it into the output buffer) a pointer (allocated with new) seemed to do the trick. I’m not exactly sure why having it as a regular object was causing the crash… Anyway, the channel alternation now works fantastically, and it sounds great.

However, even with loop set to false, it still seems to loop no matter what. I’m pretty sure this is because the data from TTS, once copied into the ofSoundBuffer, keeps getting recopied to the output stream. Should I just keep track of when the position (and last position) in the output buffer wraps, and call clear() on the ofSoundBuffer then? Thoughts appreciated, guys. Cheers

Something like this, in testApp::audioOut:

  
  
mutex.lock();  
soundBuffer->copyTo(output,buffersize,nChannels,position,false);  
          
if(soundBuffer->size()>0){  
    lastPosition = position;  
    position += buffersize;  
    position %= soundBuffer->getNumFrames();  
}  
if (position < lastPosition) {  
    soundBuffer->clear();  
}  
mutex.unlock();  
  

I don’t have the code here, but to avoid the loop I did something very similar to your last if.

Yeah, it made sense in my head and seems to be working. Everything is almost there, and I’ve made the ofSoundBuffer (which is filled with the TTS data in the TTS callback and passed to the output buffer in the audio callback) a regular, non-pointer instance variable again.

I think I’ve tracked down where the crash arises; not sure if you or Arturo can tell why this might be (it’s boggling me). In the TTS callback, if I use the regular buffer copy instead of the individual channel copy (see the commented-out line below), it seems to work just fine. However, using setChannel instead sometimes works fine, and other times causes an EXC_BAD_ACCESS, sometimes even locking up my entire system!

  
  
{        
        mutex.lock();  
        //soundBuffer.setChannel(*(tts.buffer), channelToAddTo); //what i want, but sometimes causes crash further down the line  
        soundBuffer = *tts.buffer; //seems to work okay all the time  
        mutex.unlock();  
        channelToAddTo++;  
        channelToAddTo = channelToAddTo%numVoices;  
    }  
  

When using setChannel and the EXC_BAD_ACCESS gets thrown, Xcode’s debugger takes me to this method, at the line marked below. Is there any case I need to account for that could cause this?

void ofSoundBuffer::setChannel(const ofSoundBuffer & inBuffer, int targetChannel){
	// resize ourself to match inBuffer
	resize(inBuffer.getNumFrames()*channels);
	// copy from inBuffer to targetChannel
	float * bufferPtr = &this->buffer[targetChannel];
	const float * inBufferPtr = &(inBuffer[0]);
	for(unsigned int i=0;i<getNumFrames();i++){
		*bufferPtr = *inBufferPtr; // <-- EXC_BAD_ACCESS here
		bufferPtr += channels;
		// inBuffer.getNumChannels() is probably 1 but let's be safe
		inBufferPtr += inBuffer.getNumChannels();
	}
}
  

Actually, looking at the call trace, I think it goes further back into the TTS class, to the ofNotifyEvent call in TTS::threadedFunction; I see the EXC_BAD_ACCESS thrown there… hmm

Upon further testing, it sometimes crashes whether I use ofSoundBuffer::setChannel or just copy the buffer. Any idea why that might be?

I init my sound stream and TTS objects like so in testApp::setup()

  
  
    numVoices = 2;
    soundBuffer.setNumChannels(numVoices);
    ofAddListener(ttsVoice.newSoundE, this, &testApp::soundGenerated);
    soundStream.setup(this, numVoices, 0, 44100, 512, 4);
    soundStream.setDeviceID(0);
    soundStream.start();
    ttsVoice.initialize();
    ttsVoice.start();
  

My TTS callback is as follows:

  
  
void testApp::soundGenerated(const TTSData& tts){  
   
     std::cout << "adding TTS buffer to: " << channelToAddTo << std::endl;  
     mutex.lock();  
     //soundBuffer.setChannel(*(tts.buffer), channelToAddTo);  
     soundBuffer = *tts.buffer;  
     mutex.unlock();  
     channelToAddTo++;  
     channelToAddTo = channelToAddTo%numVoices;  
}  
  

And finally, my main audio (out) callback:

  
  
void testApp::audioOut(float * output, int buffersize, int nChannels, int deviceID, unsigned long long int tickCount){  
      
    if (soundBuffer.getNumFrames() > 0)  
    {  
        mutex.lock();  
        soundBuffer.copyTo(output,buffersize,nChannels,position,false);  
          
        if(soundBuffer.size()>0){  
            lastPosition = position;  
            position += buffersize;  
            position %= soundBuffer.getNumFrames();  
        }  
          
        if (position < lastPosition) {  
            soundBuffer.clear();  
        }  
        mutex.unlock();  
    }  
}  
  

I honestly don’t see how that can cause an EXC_BAD_ACCESS on soundBuffer.copyTo(output,buffersize,nChannels,position,false);… any ideas?
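Two things in the callbacks above might explain it, offered as hypotheses rather than a diagnosis: `position` is never reset when soundGenerated() swaps in a new buffer, so if the new utterance is shorter than the old read position, copyTo() may start reading past its end; and audioOut() checks soundBuffer.getNumFrames() before taking the lock, so the TTS thread can replace the buffer between that check and the copy. The sketch below shows one way to structure the handoff in standard C++: every read and write of the sound and its read position happens under one lock, a new sound restarts playback from 0, and playback stops (instead of looping) at the end. `SpeechPlayer`, `newSound` and its `audioOut` are hypothetical names; like the code in this thread it locks a std::mutex in the audio callback, while lock-free queues are the usual alternative for strict real-time audio.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch of the TTS-callback / audio-callback handoff.
class SpeechPlayer {
public:
    // Called from the TTS thread: replace the current sound and restart playback.
    void newSound(std::vector<float> mono) {
        std::lock_guard<std::mutex> lock(mutex_);
        sound_ = std::move(mono);
        position_ = 0; // never keep a read position from an older, possibly longer buffer
    }

    // Called from the audio thread: fill `frames` interleaved frames, play once, then silence.
    void audioOut(float* output, int frames, int nChannels) {
        std::lock_guard<std::mutex> lock(mutex_); // size check and copy under the same lock
        for (int f = 0; f < frames; ++f) {
            float s = position_ < sound_.size() ? sound_[position_++] : 0.0f;
            for (int c = 0; c < nChannels; ++c) output[f * nChannels + c] = s;
        }
        if (position_ >= sound_.size()) { // finished: stop instead of looping
            sound_.clear();
            position_ = 0;
        }
    }

private:
    std::mutex mutex_;
    std::vector<float> sound_;
    std::size_t position_ = 0;
};
```

In testApp, soundGenerated() would pass the mono TTS samples to newSound(), and testApp::audioOut() would simply forward to the player.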

It makes all the difference to be able to harmonize flite voices at runtime, i.e. change their speeds and use the samples in further DSP processes.

I had to make this run smoothly for an upcoming project and thought I’d share my approach, which is apparently thread safe with no mutexes involved.

Here’s the process I’ve been using to get flite talking to my audio lib (below): basically, I ask flite to synthesize the text, then hold the relevant info in a class that is later accessed from the main DSP function. You could strip this down and work with it. It’s probably not the best approach, but it’s been working rock solid for me. Would love to hear about possible optimizations etc…

hope this helps.
A

// s373AVSystem
/*
 *
 *  Created by andré sier on 20120530.
 *  Copyright 2012 s373.net/x. All rights reserved.
 *
 */


// fixed 120721
// watch out for ** Error in `./example~00system': double free or corruption (fasttop): 0x00002b740c0026e0 ***
// Aborted (core dumped)
// make: *** [run] Error 134

#pragma once

#include <iostream>
#include <stdio.h>

#include "s373AVBase.h"
#include "ofMain.h"

#include "flite.h"
#ifdef __cplusplus
extern "C"{
#endif
cst_voice* register_cmu_us_kal();
#ifdef __cplusplus
}
#endif



class s373AVSpeakThread : public ofThread{
public:
    string            systemcall;

    cst_voice         *flitevoice;
    int                flitesamplerate;

    int numbuffersamples;
    int runningnumsamples;

    string fullbufferstr,bufferstr;
    int bufferhead, maxbufferhead;
    int oldfullbuffersize;
    float bufferlocf, bufferspeedf;

    void setup(string scall, int nsamples, bool calcreceive=true){

        flite_init();

        flitevoice = register_cmu_us_kal();

        systemcall = scall;
        numbuffersamples=nsamples;
        runningnumsamples = 0;

        fullbufferstr = "";
        bufferstr = "";

        bufferhead=0;
        maxbufferhead = 1;
        oldfullbuffersize = numbuffersamples;

        bufferlocf = 0.0f;
        bufferspeedf = 0.10f;
        flitesamplerate = 8000; ///!

        for(int i=0; i<numbuffersamples;i++){
            bufferstr+='\0';
            fullbufferstr+='\0';
        }

        setSystemCall(scall);
    }

    const string  loadFile(const string & fn){

        while (isThreadRunning()) {
            ofSleepMillis(100);
        }

        ifstream myfile;
        myfile.open (ofToDataPath(fn).c_str());
        if(!myfile){
            cout << "error opening "<< fn << endl;
            return ""; // empty string signals failure ('return false' is not a valid string)
        }
        cout << this << " opening "<< fn << endl;

        string line="";
        string fulltext="";

        int nlines=0;
        while(std::getline(myfile, line)){
            fulltext += line;
            nlines++;
        }

        cout << this << " speakloadfile nlines " << nlines << " nchars "<< fulltext.size() << endl;
        setSystemCall(fulltext);

        return fulltext;
    }



    void setSystemCall(const string & call){
        // stopThread();
        if(isThreadRunning())stopThread();
        systemcall = call;
        ofSleepMillis(250);
        if(isThreadRunning())stopThread();
        if(!isThreadRunning()) startThread();
    }

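    void setSpeakSpeed(float s){
        // cmu_us_kal synthesizes at 8000 Hz and the output runs at 44100 Hz:
        // 8000 / 44100 = 0.181405895692, so s = 1.0 plays at natural speed
        bufferspeedf = (s *  0.181405895692f);
    }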

    float getBufferLocPercent(){
        return bufferlocf / (float)oldfullbuffersize;
    }


    const string & readStr(int numsamptstoread){
        if(isThreadRunning()){
            return bufferstr;
        }

        if(oldfullbuffersize!=fullbufferstr.size()){
            oldfullbuffersize = fullbufferstr.size();
            maxbufferhead = oldfullbuffersize / numbuffersamples;
        }

        int maxlen = oldfullbuffersize-1;

        for(int i=0; i<numbuffersamples;i++){
            bufferlocf += bufferspeedf;
            if(bufferlocf>=maxlen){
                bufferlocf-=maxlen;
            }

            int idx = (int) bufferlocf;
            bufferstr[i] = fullbufferstr[idx];

        }

        return bufferstr;

    }

    const string & readBufferN(int nbuffer){
        if(isThreadRunning()){
            return bufferstr;
        }

        if(oldfullbuffersize!=fullbufferstr.size()){
            oldfullbuffersize = fullbufferstr.size();
            maxbufferhead = oldfullbuffersize / numbuffersamples;
        }

        if(nbuffer>=(maxbufferhead-1)){
            cout << this << " warning nbuffer > maxbufferhead "
            << nbuffer << " " << maxbufferhead << endl;

            nbuffer = (maxbufferhead-1);
        }


        int beginaddr = nbuffer * numbuffersamples;

        for(int i=0; i<numbuffersamples;i++){
            bufferstr[i] = fullbufferstr[beginaddr+i];
        }

        return bufferstr;

    }




    void threadedFunction(){

        while (isThreadRunning()) {

            if(systemcall.size()<=1){
                cout << this << " systemcall empty " << endl;
                stopThread();
                return; // nothing to synthesize
            }

            // typedef struct  cst_wave_struct {
            //     const char *type;
            //     int sample_rate;
            //     int num_samples;
            //     int num_channels;
            //     short *samples;
            // } cst_wave;

            cst_wave * wav = flite_text_to_wave(systemcall.c_str(),flitevoice);
            if(wav == NULL){
                stopThread();
                return;
            }

            runningnumsamples = wav->num_samples;

            flitesamplerate = wav->sample_rate;

            cout << "samples " << runningnumsamples <<  " sr "<<flitesamplerate << endl;

            // always match the new length, so no samples from a longer previous text remain
            fullbufferstr.resize(runningnumsamples,'\0');

            for(int i=0; i<runningnumsamples; i++){ // 16-bit short to 8-bit char
                fullbufferstr[i] = (char) ofMap(wav->samples[i],-32767,32767,-127,127);
            }

            delete_wave(wav); // free the wave allocated by flite_text_to_wave

            stopThread();

        } // while thread running

    } // func


};


class s373AVSpeak : public s373AVBase {
public:

    s373AVSpeakThread speakthread;

    string            systemcall;

    float speakspeed;

    int numsamples;


    ~s373AVSpeak(){

    }

    void setup(string call, float speed=1,
        float vol = 1, float fb = 0.0f,
        int imode=0 ){

        s373AVBase::setup();

        systemcall = call;
        numsamples = buffersize;

        createreaderdata(numsamples);
        setupReader(440,vol,fb,imode); // after createreaderdata, so it can access the shared buffer
        setSpeed(speed);

        setSpeakSpeed(speed);

        om = FLITE;//SHELL;


        speakthread.setup(systemcall, numsamples);
         // starts the thread; much faster than going through the terminal, watch out
    }


    void setText(const string & t){
        systemcall=t;
        speakthread.setSystemCall(t);
    }

    void loadFile(const string & t){
        systemcall = speakthread.loadFile(t);
        speakthread.setSystemCall(systemcall);

    }



    void calcreaderdata(){

        const string & data = speakthread.readStr(buffersize);

        for(int i=0; i<buffersize;i++){
            readersdata[i] = ofMap(data[i],-127,127,-1,1);
        }

    }


    s373AVSpeak* setSpeakSpeed(float speed){
        speakspeed=speed;
        speakthread.setSpeakSpeed(speed);
        return this;
    }

    float getSpeakSpeed(){
        return speakspeed;
        // return speakthread.bufferspeedf;
    }


    /*virtual*/ s373AChannel * processBuffer(  float inmastervol=1.0f  ){

        // if(shellthread.hasinfo){
            calcreaderdata();
        // }

        return s373AVBase::processBuffer(inmastervol);

    }



    void draw(int x, int y, int nchars=10){

        // ie, karaoke!;)

        string info ="";// systemcall+"\n\n";

        int strsize = systemcall.size()-1;

        if(nchars >= strsize){
            nchars = strsize;
        }

        float bufferpercent = speakthread.getBufferLocPercent();

        int halfchars = nchars / 2;

        int halfstrpos = (int) (bufferpercent * (float)strsize );
        int strpos = halfstrpos - halfchars;
        if(strpos<0){ strpos = 0; }

        for(int i=0; i<nchars; i++){
            int idx = strpos + i;
            if(idx >= (strsize)){
                idx -= strsize;
            }

            info += systemcall[idx];

        }

        ofDrawBitmapString(info, x, y);


    }



};
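A possible refinement of the approach above, sketched under stated assumptions: the class stores flite’s 16-bit samples as `char` via ofMap, which quantizes them to 8 bits (audible noise), and whether plain `char` is signed depends on the platform. It also hard-codes the 8000/44100 ratio, which only fits 8 kHz voices such as kal (kal16 is 16 kHz). The sketch below keeps the samples as float and derives the step from the wave’s own sample rate, reading block by block with a fractional position like readStr() does, but with linear interpolation instead of truncation. `VariableSpeedReader` is a hypothetical class; `load()` is meant to be fed from a cst_wave’s `samples`, `num_samples` and `sample_rate` fields.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: keep 16-bit samples as float and read them at a variable speed.
class VariableSpeedReader {
public:
    // samples/num/srcRate would come from a cst_wave (samples, num_samples, sample_rate).
    void load(const short* samples, std::size_t num, int srcRate) {
        data_.resize(num);
        for (std::size_t i = 0; i < num; ++i) data_[i] = samples[i] / 32768.0f;
        srcRate_ = srcRate;
        pos_ = 0.0;
    }

    // speed 1.0 = natural speed at any output rate; < 1 slower, > 1 faster (pitch follows).
    // The text loops when the end is reached, like readStr().
    void read(float* out, int frames, int outRate, double speed) {
        double step = speed * srcRate_ / outRate; // e.g. 8000 / 44100 ~ 0.1814 at speed 1
        if (!(step > 0.0)) step = 0.0;            // non-positive speed: hold the position
        double len = static_cast<double>(data_.size());
        for (int i = 0; i < frames; ++i) {
            if (data_.size() < 2) { out[i] = 0.0f; continue; }
            std::size_t i0 = static_cast<std::size_t>(pos_);
            std::size_t i1 = (i0 + 1) % data_.size(); // wrap to the start
            float frac = static_cast<float>(pos_ - i0);
            out[i] = data_[i0] + (data_[i1] - data_[i0]) * frac;
            pos_ += step;
            while (pos_ >= len) pos_ -= len;
        }
    }

private:
    std::vector<float> data_;
    int srcRate_ = 8000;
    double pos_ = 0.0;
};
```

After synthesizing, the thread would call `load(wav->samples, wav->num_samples, wav->sample_rate)` before `delete_wave(wav)`, and the DSP callback would call `read()` once per block.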

Hello @miltonlaufer @as1er, can I ask for your help with flite in C++?
I am trying to use flite in a C++ test with Visual Studio, and I have included the needed include files,
but when I use the code above for a test, it gives me errors.
The code is here:

#include "include/flite.h"

extern "C"
{
	voice* register_cmu_us_kal(); // you can change this to select a different voice
}

int main(void)
{
	flite_init();
	voice = register_cmu_us_kal(); // you can change this to select a different voice
	flite_text_to_speech("hello world, how are you?", voice, "play");
	return 0;
}

the errors are

Hi.

I managed to isolate the code; it’s only missing the Windows libs. Works nicely for orchestrating choirs ;)