Parsing an HTML file into structs/classes

Hello all,

I was wondering whether there is any library that can parse HTML into structs/classes holding the parent, children, attributes and content of each HTML tag. I’m having a boring time rewriting my own, and I need to extract information from Wikipedia for a new website I’m working on. I know it’s easy to do in JavaScript and jQuery, but I’m more familiar with openFrameworks, and I want a system that will update when the page is updated. I also plan to put the extracted data into MySQL, so I’m unsure how to handle that with PHP and JavaScript.

This is an example of the kind of struct I want. I’m not asking anyone to write code for me, though! I’m just wondering whether it has already been written.

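 #include <string>

 struct HTML {
 	std::string tag;          // element name, e.g. "div"
 	std::string attributes;   // raw attribute text of the tag
 	std::string contents;     // text inside the tag
 	HTML * parent = nullptr;  // enclosing element, nullptr for the root
 };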
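
The struct above only points upward to its parent. A fuller version of the same idea, with children and per-attribute lookup, might look like the sketch below. This is only an illustration of the data structure, not a parser and not the API of any existing library: `HtmlNode`, `addChild` and `findByTag` are hypothetical names, and it assumes each node owns its children through `std::unique_ptr`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type; the names are illustrative, not from any library.
struct HtmlNode {
    std::string tag;                                 // element name, e.g. "div"
    std::map<std::string, std::string> attributes;   // attribute name -> value
    std::string contents;                            // text directly inside the tag
    HtmlNode* parent = nullptr;                      // non-owning back pointer
    std::vector<std::unique_ptr<HtmlNode>> children; // owned child nodes

    // Take ownership of child, set its parent, and return a raw pointer to it.
    HtmlNode* addChild(std::unique_ptr<HtmlNode> child) {
        assert(child != nullptr);
        child->parent = this;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Depth-first search that appends every node with the given tag name to out.
void findByTag(HtmlNode& root, const std::string& tag,
               std::vector<HtmlNode*>& out) {
    if (root.tag == tag) out.push_back(&root);
    for (auto& child : root.children) findByTag(*child, tag, out);
}
```

A real parser library would fill a tree like this while reading the page. After that, something like `findByTag(root, "a", links)` could collect every link node before the data is written to the database.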

All the Best

Never mind, I found MyHTML and I’m going with that. I might make a wrapper for openFrameworks, depending on how far I get!

All the Best

Hey @zelm, did you get it working?
I tried with ofxGumbo, but I am stuck on those pure C integration errors …