Showing non-latin-1 characters

How can I show non-latin-1 characters, such as ő? drawString seems to treat its argument as being latin-1 encoded.

So, does a lack of replies mean that openframeworks can only deal with the very limited latin-1 character set?

In this case, it was very confusing to call the last argument of loadFont() “bFullCharacterSet” … it would be nice if openframeworks could at least support most Latin letters with diacritics …

i think the problem is: to support more than 256 different characters OF should be able to handle unicode-strings. this is not the case at the moment (please correct me if i am wrong).

maybe OF should support std::wstring instead of std::string?

best
joerg

[quote author=“jroge”]i think the problem is: to support more than 256 different characters OF should be able to handle unicode-strings. this is not the case at the moment (please correct me if i am wrong).

maybe OF should support std::wstring instead of std::string?[/quote]
Or how about using UTF-8 instead? That would be more beginner friendly, because it would be possible to type the non-ascii characters directly into the source file (with a UTF-8 supporting editor).

i don’t know if there’s a standard multi-platform UTF-8 support in C++.
wstring is C++-standard but does not support UTF-8 as i read.

I’m sorry for the delay - I was looking into researching this a bit more

I have to admit that I am not sure what we can support, and that is down to my ignorance of string vs. wstring issues. I actually couldn’t type that character “ő” into my compilers (codewarrior/devc++) to see what it is – it comes out as o", so I wasn’t able to check right away.

we are dependent on several things:

a) currently we use string, which is not wide (obviously an issue)
b) ofDrawBitmapString uses glut’s internal bitmap type, which I don’t think supports very many characters at all
c) ofTrueTypeFont likely *does* support fuller character sets, but we currently limit what gets loaded to either the non-full character set (i.e., English) or the full character set (Latin).

I am absolutely happy to make changes to allow for full character sets - but we may run into a lot of issues along the way if we just jump from strings to wstrings. I would like to suggest as a start - just adding some additional code to ofTrueTypeFont, like

ofLoadFont("myFont.ttf", OF_FONT_SETSIZE_REDUCED);
ofLoadFont("myFont.ttf", OF_FONT_SETSIZE_LATIN);
ofLoadFont("myFont.ttf", OF_FONT_SETSIZE_NONLATIN);

and then maybe:

ofTrueTypeFont::drawString( string s);
ofTrueTypeFont::drawString( wideString s);

in order to fix the problem, can you prepare a simple code example for us and also a font that includes the character you are interested in?

thanks,
zach

Thanks for the reply! :smiley:

It is not surprising that you couldn’t type it in dev-c++ because dev-c++ seems to only work with 8-bit encodings. This character (ő) is part of the ISO-8859-2 (latin-2) character set, but not part of ISO-8859-1. I don’t know about dev-c++ (as I haven’t tested this), but most non-unicode programs on windows interpret the characters above 127 according to the setting in Control Panel -> Regional And Language Options -> Advanced tab.

But even if we use an editor that allows us to type this character into the C++ source file, we have to decide which encoding to use when saving it.

If we save it as latin-2, openFrameworks’s drawString() will still interpret it as latin-1, and it will come out as õ (o with a tilde above—a similar looking, but different letter. Some other letters in latin-2 have character codes which represent completely different looking letters in latin-1).

If we save it as UTF-8, it will come out as two symbols, because this letter is represented with two bytes in UTF-8. But UTF-8 has the advantage that pure ASCII is a subset of it, so C++ source files can be safely saved as UTF-8. It would be very beginner friendly if drawString() could interpret the C strings passed to it as UTF-8: it would be possible to directly type non-ASCII characters (in a string) into the source code.

I don’t know what is the solution for this (typing non-ASCII into the source file) when using wstring with UTF-16.

And UTF-8 can be stored inside a C++ string for as long as we don’t want to know exactly which bytes represent a single character. Anything in the ASCII-range will still be exactly one character.
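
To make the idea concrete, here is a minimal sketch of how a drawString() that treats its argument as UTF-8 could first turn the bytes of a std::string into Unicode code points. The function name decodeUtf8 is hypothetical and not part of openFrameworks; the sketch is simplified and does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not an openFrameworks function):
// decode a UTF-8 byte string into Unicode code points.
// Malformed or truncated sequences are replaced with U+FFFD.
std::vector<std::uint32_t> decodeUtf8(const std::string& bytes) {
    std::vector<std::uint32_t> codePoints;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            codePoints.push_back(0xFFFD);
            ++i;
            continue;
        }

        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            // Every continuation byte must look like 10xxxxxx.
            if (i >= bytes.size() ||
                (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80) {
                ok = false;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
            ++i;
        }
        codePoints.push_back(ok ? cp : 0xFFFD);
    }
    return codePoints;
}
```

With this, the two bytes 0xC5 0x91 that make up ő in UTF-8 decode to the single code point 0x0151, which a font renderer could then use to look up the glyph.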

I’ll gladly help with with everything I can do, but what kind of code example are you looking for?

About the font: All recent operating systems should come with pre-installed fonts which have these characters. But if you’re looking for .ttf files (not .otf, which ships with WinXP), the simplest solution is to get DejaVu Sans from http://dejavu.sf.net/

EDIT:

P.S. ő has the character code 0x0151 in Unicode.
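
For reference, a small sketch of how that code point becomes the two UTF-8 bytes mentioned earlier. The function name encodeUtf8 is made up for illustration; it is a plain implementation of the standard UTF-8 bit layout, not something taken from openFrameworks.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Code points above U+10FFFF are encoded as U+FFFD (replacement character).
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx (plain ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += "\xEF\xBF\xBD";  // U+FFFD
    }
    return out;
}
```

Calling encodeUtf8(0x0151) yields the bytes 0xC5 0x91, so a string literal containing ő can also be written as "\xC5\x91" when the editor cannot save UTF-8.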

ok cool - that info is pretty helpful to start to dig around.

I am wondering if something like the code here:

http://www.codeguru.com/cpp/misc/misc/m-…-hp/c10451/

might come in handy, for encoding and decoding utf-8 into a byte array? the issue with UTF-8 (I think) is that it can contain ‘\0’ null bytes, so we would have to convert UTF-8 into some passable form.

you’ll have to forgive my ignorance but I still can’t see exactly how to code a test of this w/ the compilers I am using – I will check VS, perhaps that’s the way I need to test. I will take a look at freeType and the font you sent, and I will poke around the other font rendering libraries to see how they deal w/ it.

some useful info I found,
http://eyegene.ophthy.med.umich.edu/unicode/

also maybe useful to look at libraries, like this one:
http://pngwriter.sourceforge.net/
that have freeType and UTF-8 support to see how they get that to work.

it’s going to take a few days to get to, but I will work on this problem - in the meantime, if anyone wants to jump in on this w/ info, code or ideas, please jump in!

thanks!!
zach

[quote author=“zach”]the issues with UTF-8 (I think) is that it can contain ‘/0’ null bytes, so we would have to convert UTF-8 into some passable form.
[/quote]

no, that’s one of the good things about UTF-8, that it doesn’t contain 0-bytes.
so if there’s no encoding it still can be handled with C-string functions. it will just show some additional bytes with values from 128-255.

best
joerg

snaps - that was a misread of this
http://mail.nl.linux.org/linux-utf8/200-…-00009.html

ok cool! now to figure out how to test it …

hi there!

someone has a solution for this?

loading my xml-file:

<TEXT_BLOCK_1>
I want to write äüö ! and éèê
</TEXT_BLOCK_1>

gives me:

greetings ascorbin

my current workaround (working on mac) :

coding this:

text_XML.setValue("SONDERZEICHEN:letter_a", "à=\xE0 á=\xE1 â=\xE2 ã=\xE3 ä=\xE4 æ=\xE6");
text_XML.setValue("SONDERZEICHEN:letter_o", "ò=\xF2 ó=\xF3 ô=\xF4 õ=\xF5 ö=\xF6");
text_XML.setValue("SONDERZEICHEN:letter_u", "ù=\xF9 ú=\xFA û=\xFB ü=\xFC");
text_XML.setValue("SONDERZEICHEN:letter_e", "è=\xE8 é=\xE9 ê=\xEA ë=\xEB");
text_XML.setValue("SONDERZEICHEN:letter_i", "ì=\xEC í=\xED î=\xEE ï=\xEF");
text_XML.setValue("SONDERZEICHEN:letter_c", "ç=\xE7 Ç=\xC7");
text_XML.saveFile("text.xml");

saves my file as:

à=‡ á=· â=‚ ã=„ ä=‰ æ=Ê ò=Ú ó=Û ô=Ù õ=ı ö=ˆ ù=˘ ú=˙ û=˚ ü=¸ è=Ë é=È ê=Í ë=Î ì=Ï í=Ì î=Ó ï=Ô ç=Á Ç=«

now i can copy&paste the shown characters… :?

or i can use:

à=&#xE0; á=&#xE1; â=&#xE2; ã=&#xE3; ä=&#xE4; æ=&#xE6; ò=&#xF2; ó=&#xF3; ô=&#xF4; õ=&#xF5; ö=&#xF6; ù=&#xF9; ú=&#xFA; û=&#xFB; ü=&#xFC; è=&#xE8; é=&#xE9; ê=&#xEA; ë=&#xEB; ì=&#xEC; í=&#xED; î=&#xEE; ï=&#xEF; ç=&#xE7; Ç=&#xC7;

codepage here -> http://htmlhelp.com/de/reference/html40/entities/latin1.html
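
An alternative to copy&paste might be to convert the loaded text before drawing it. This is only a sketch under two assumptions: that the XML file is saved as UTF-8, and that drawString() treats its argument as latin-1, as described at the top of the thread. The function name utf8ToLatin1 is made up and is not part of openFrameworks.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-8 text to latin-1 (ISO-8859-1).
// Characters outside latin-1 and malformed input become '?'.
std::string utf8ToLatin1(const std::string& utf8) {
    std::string latin1;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        if (b0 < 0x80) {
            // Plain ASCII is identical in both encodings.
            latin1 += static_cast<char>(b0);
            ++i;
        } else if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < utf8.size() &&
                   (static_cast<unsigned char>(utf8[i + 1]) & 0xC0) == 0x80) {
            // U+0080..U+00FF are two-byte sequences starting with 0xC2 or 0xC3.
            const unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            latin1 += static_cast<char>(((b0 & 0x03) << 6) | (b1 & 0x3F));
            i += 2;
        } else {
            // Not representable in latin-1: emit one '?' and skip
            // the rest of the multi-byte sequence.
            latin1 += '?';
            ++i;
            while (i < utf8.size() &&
                   (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return latin1;
}
```

With this, the UTF-8 bytes 0xC3 0xA4 for ä become the single latin-1 byte 0xE4, the same value used in the \xE4 escapes above. A letter like ő, which does not exist in latin-1, comes out as a question mark.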

Hi everyone.

I was wondering if anyone had seen this:

http://sourceforge.net/projects/quesoglc/

It’s about quesoglc, which “is a free implementation of the OpenGL Character Renderer. QuesoGLC is based on the FreeType library, provides Unicode support, and is designed to be easily ported to any platform that supports both FreeType and the OpenGL API.”

Basically, it enables you to print multi-language strings on the glut window using simple commands like

glcRenderString()

I will look more into it, hopefully, creating an addon out of it.

Check out this thread:
http://forum.openframeworks.cc/t/extending-of-with-flgt/2232/0

The solution posted uses FTGL for font rendering. It supports UTF8 and has improved kerning and hinting compared to ofxTrueTypeFont. It is not perfect, but at least it is a step ahead of the current OF implementation.

Paul

[quote author=“droozle”]Check out this thread:
http://forum.openframeworks.cc/t/extending-of-with-flgt/2232/0

The solution posted uses FTGL for font rendering. It supports UTF8 and has improved kerning and hinting compared to ofxTrueTypeFont. It is not perfect, but at least it is a step ahead of the current OF implementation.

Paul[/quote]

@droozle:
Thank you so much. You were already way ahead of me. You saved me plenty of hours battling with fontconfig.

:slight_smile: We both should thank GameOver for his awesome work.

About UTF8-support: I recently tried using Arabic text and while the characters seemed to look OK, I found out that text is (of course) printed from left to right, instead of right to left. So we will probably have to hack FTGL a bit more to get full support for other languages than English and the likes.