HTML Parser - Gumbo for Windows

Does anyone have a functioning HTML parser working on Windows? Regex just is not going to cut it (or at least not for me and regex). I did find https://github.com/bakercp/ofxGumbo but I cannot get it going on Windows. Or maybe someone has some clues for this.

Hey @fresla,
did you find any workaround?

I cannot make it work here, on either Windows or macOS.
I also tried the thomasgeissl repo.

What I would like to do is to get palettes from here:

https://color.adobe.com/explore

I don’t know if some other add-ons or oF core could be used to do this…

Any help is appreciated.

Hi!

I have never used Gumbo and I don’t know all your needs, so I’ll probably go off on a tangent here, but I hope it’s useful: what I do is use Firefox and Selenium, via a Python script that I call on a thread inside OF.

Most websites are no longer basic HTML pages, so a lot of stuff needs to be dynamically retrieved and rendered, you have cookies and sessions to manage, etc. This approach also lets you test things in a WYSIWYG way, makes complex navigation much easier, and lets you retrieve any data you want.

You are able to get elements via XPath, CSS selectors, class/id names and more, and to use forms, inputs and search. So very quickly you can use the Firefox dev tools to find these elements and then use them with Selenium to automate the fetching of the data. No need for regex.
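A minimal sketch of that lookup step, assuming Selenium 4 with Firefox and Geckodriver installed. The function names and the `.palette-name` selector mentioned in the usage comment are hypothetical placeholders, not something taken from the Adobe Color page; you would swap in whatever the Firefox dev tools show you.

```python
def make_firefox(headless=True):
    """Start Firefox through Selenium.

    Selenium is imported lazily so the helpers below can be used
    (or tested with a stand-in driver) on a machine without a browser.
    """
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    if headless:
        options.add_argument("-headless")  # no visible Firefox window
    return webdriver.Firefox(options=options)


def texts_by_css(driver, css_selector):
    """Return the non-empty text of every element matching a CSS selector."""
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", css_selector)
    return [el.text.strip() for el in elements if el.text.strip()]


def texts_by_xpath(driver, xpath):
    """Same idea, but the elements are located with an XPath expression."""
    # "xpath" is the string value of Selenium's By.XPATH.
    elements = driver.find_elements("xpath", xpath)
    return [el.text.strip() for el in elements if el.text.strip()]


# Possible usage (the selector is a made-up example):
#   driver = make_firefox()
#   try:
#       driver.get("https://color.adobe.com/explore")
#       print(texts_by_css(driver, ".palette-name"))
#   finally:
#       driver.quit()
```

The helpers take the driver as an argument so the same code works whether the browser is headless or visible.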

Like I mentioned, I use a thread and a system call to run the script. The script fetches all the data I need (sometimes it is just a JSON file, other times images and videos), and when it is done I load the results in OF.
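The Python end of that handoff could look roughly like this. It is only a sketch: the `palettes.json` file name, the payload layout and the function names are assumptions made for illustration, and the scraping step is replaced by placeholder data.

```python
import json
import os
import tempfile


def write_result(path, data):
    """Write data as JSON to path, replacing any previous file in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
        # The reader never sees a half-written file: the rename happens last.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def main(argv):
    """Entry point: argv[1] is the output path chosen by the calling app."""
    out_path = argv[1] if len(argv) > 1 else "palettes.json"  # assumed default
    # Placeholder payload; a real script would fill this from the scraping step.
    data = {"palettes": [{"name": "example", "colors": ["#112233", "#445566"]}]}
    write_result(out_path, data)
    return 0  # exit code the caller can check after the system call
```

Ending the script with `sys.exit(main(sys.argv))` would let the OF side check the exit code of the system call before it loads the file.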

Because I deploy this in final applications, I run the driver headless, so it’s totally invisible to the end user. Another plus. But you can also just set the window on another monitor and see what is going on.

In your case, you probably need to wait for the page to load and run its JavaScript. You might need to scroll to lazy load more color palettes. And so on. All this is quickly done with Selenium.
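One way the scrolling part might be sketched, assuming a Selenium driver object and a page that grows as you scroll. The function name, the round limit and the fixed pause are arbitrary choices for illustration; whether this is enough for the Adobe Color page is untested.

```python
import time


def scroll_until_stable(driver, max_rounds=10, pause=1.0):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # crude wait for lazy-loaded content; tune per site
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the last scroll loaded nothing new
        last_height = new_height
    return scrolls
```

After it returns, the elements can be collected in one pass. Selenium's `WebDriverWait` is usually a more robust replacement for the fixed `time.sleep`.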

Hope it helps!

Thanks @hubris,
It looks very powerful!

And when you deploy the final user app,
how do you bundle the Selenium driver into the OF_APP installer?
Is that the only thing you need from the user when installing your app?

@moebiussurfing, it is indeed. I’ve been using this setup for two years, scraping websites, dealing with cookies in different countries, downloading assets (from the Python script I call FFmpeg), and it works like a charm.

On Windows you need: Firefox, Geckodriver, Python and Selenium (via pip). Don’t forget to add both Geckodriver and Python to PATH.
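A small preflight check along these lines could confirm the setup before the app runs the script. This is a sketch using only the Python standard library; the function name is made up, and Firefox itself is left out of the defaults because the list above only asks for Geckodriver and Python on PATH.

```python
import importlib.util
import shutil


def missing_requirements(executables=("geckodriver", "python"),
                         modules=("selenium",)):
    """Return the executables not found on PATH and the modules not installed."""
    # shutil.which returns None when the name cannot be resolved via PATH.
    missing = [name for name in executables if shutil.which(name) is None]
    # find_spec returns None when a top-level module cannot be imported.
    missing += [name for name in modules
                if importlib.util.find_spec(name) is None]
    return missing
```

An empty list means everything was found; otherwise the returned names tell you what still has to be installed or added to PATH.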

I’ve been doing it manually because my final users are art collectors, so it falls on me to prepare a plug-and-play experience.

But I want to script this, and other configs/settings. I checked some installers and they suck. So I will probably go for Rubber Ducky: this way I can install stuff making sure all the check-boxes I need are on, change Windows settings without getting into PowerShell and so on.

You didn’t like NSIS?

From what I recall, it didn’t fit all my needs, or some stuff I wanted to do was way too complicated. I now also need a multi-OS solution, so Ducky seems like a good approach.