webbureaucrat

webbureaucrat@programming.dev
1 posts • 6 comments

I wish I could be more help. My advice is to use a more robust general-purpose HTML parsing library, possibly even a browser emulator, rather than a library built specifically for XHTML 1.0 Transitional or a converter.

In my Python web automation course in college we used BeautifulSoup and I think maybe mechanize. I think either of those would probably be robust enough to do what you’re trying to do, but if it has to be Rust I’m not sure what’s out there. Otherwise you could upgrade to Selenium or something.

Or, if you’re trying to do something fairly simple that doesn’t require parsing the whole document but is still a little too complex for plain old regular expressions, you might be able to build a simple parser with the Rust pest crate. Of course, I would absolutely not recommend trying to build your own full-featured XHTML parser.
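
To make the "tolerant general-purpose parser" idea concrete, here is a rough sketch in Python. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup (a swap on my part, only so the example runs without installing anything). The `LinkCollector` class and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Deliberately sloppy markup: unclosed <p>, unclosed <a>, no </html>.
sloppy = '<html><body><p>unclosed paragraph<br><a href="/one">one<a href="/two">two</body>'

parser = LinkCollector()
parser.feed(sloppy)
parser.close()
print(parser.links)
```

The point is that an event-based, lenient parser keeps going past unbalanced tags instead of rejecting the document, which is roughly what you want when the input only claims to be XHTML.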


I’ll give my usual contribution to RSS feed discourse, which is that, news flash! RSS feeds support video!

It drives me crazy when podcasters are like, “Thanks for listening to our audio podcast. We also have a video feed for our YouTube subscribers.” Just let me have the video in Pocket Casts, please!
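
For anyone wondering what "RSS supports video" looks like in practice: in RSS 2.0 the media file is attached with an `<enclosure>` element carrying `url`, `length`, and `type` attributes, and the type can just as well be a video MIME type. A minimal sketch in Python follows; the `video_item` helper, the episode title, the example.com URL, and the byte count are all placeholders I made up.

```python
import xml.etree.ElementTree as ET


def video_item(title, url, length_bytes, mime="video/mp4"):
    """Hypothetical helper: build one RSS <item> with a video enclosure."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # RSS 2.0 enclosures take three attributes: url, length (bytes), type (MIME)
    ET.SubElement(
        item,
        "enclosure",
        {"url": url, "length": str(length_bytes), "type": mime},
    )
    return item


item = video_item("Episode 1", "https://example.com/ep1.mp4", 123456789)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

Whether a given podcast app actually plays the video is up to the app, but nothing in the feed format stops a show from publishing it this way.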


I live on Termux. I love being able to write software in my own custom emacs configuration from my phone.


I’d think they’d get it back by not having to share their ad revenue with Google. There’s something to be said for the economies of scale Google benefits from, but with cloud services that’s not as relevant as it once was.


I’m working on a fault-tolerant JSON5 parsing library in the service of a JSON5-to-JSON and JSON5-to-YAML transpiler.

My goal is to never write any more YAML ever again.


I am a YAML hater. The biggest thing about YAML that keeps biting me recently is this:

script:
    - echo "a key: a value"

throws parse errors because of the colon, even though it is inside a quoted string.
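
As far as I understand it, the catch is that those quotes are not YAML quotes: the scalar starts with `echo`, so YAML treats the whole thing as a plain scalar and reads the `: ` as a key/value separator. One workaround is to quote the entire command. Here is a small Python sketch that does so by leaning on `json.dumps`, on the assumption that a JSON string is also a valid YAML double-quoted scalar. The function names are hypothetical.

```python
import json


def yaml_script_item(command):
    """Hypothetical helper: render one shell command as a fully quoted YAML list item."""
    # json.dumps wraps the command in double quotes and escapes inner quotes,
    # so the colon ends up inside a real YAML quoted scalar.
    return "    - " + json.dumps(command)


def yaml_script_block(commands):
    """Hypothetical helper: render a `script:` key with one quoted item per command."""
    lines = ["script:"]
    lines.extend(yaml_script_item(c) for c in commands)
    return "\n".join(lines)


print(yaml_script_block(['echo "a key: a value"']))
```

The output is uglier than the original because of the escaped inner quotes, which is sort of the point of the complaint.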

But there are lots of reasons to hate YAML.

Honestly, an underrated one to me is I just hate significant whitespace. I don’t want to use any language that supports it.
