lemm.ee

298

The best answer on StackOverflow: Using RegEx to parse HTML (stackoverflow.com)

posted 27 days ago by BaumGeist@lemmy.ml in programmerhumor@lemmy.ml


schnurrito@discuss.tchncs.de

97 points

27 days ago

no, this is one of the worst answers on Stack Overflow

OP had a specific question to capture opening tags. The thing OP asked about can be done with regular expressions. It is true that arbitrarily nested languages like HTML cannot generally be parsed with regular expressions, but that is not what OP asked about.
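
To make the claim concrete, here is a minimal Python sketch of capturing opening tags with a regular expression. It only holds under stated assumptions: the markup is treated as a flat string, no `>` appears inside an attribute value, and there are no comments or scripts. The names `OPEN_TAG` and `opening_tag_names` are made up for this illustration and do not come from the linked answer.

```python
import re

# Assumption: attribute values never contain '<' or '>', and the input
# has no comments, CDATA sections, or script bodies.
OPEN_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9-]*)(\s[^<>]*)?>")

def opening_tag_names(markup):
    """Return the names of the opening tags found in markup, in order."""
    return [m.group(1) for m in OPEN_TAG.finditer(markup)]

sample = '<div class="box"><p>Hello <b>world</b></p></div>'
print(opening_tag_names(sample))  # ['div', 'p', 'b']
```

Closing tags such as `</b>` are skipped because the character after `<` must be a letter. Nothing here tracks nesting, which is the part regular expressions cannot do in general.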


0 points

27 days ago

It can be done with simple regex of the kind proposed in various answers there iff the HTML is known to be limited to a subset where that sort of thing can easily be made to work. The question does not tell us whether or not that is the case, so everyone is free to make their own assumptions and argue as if they know what’s going on.


moriquende@lemmy.world

7 points

26 days ago

It can’t be done, as an opening tag in HTML can contain anything in its attributes, even JavaScript (e.g. in an onclick handler).


schnurrito@discuss.tchncs.de

-1 points

26 days ago

??? Non sequitur


moriquende@lemmy.world

5 points

26 days ago

You can’t parse every HTML opening tag with regex, because an HTML opening tag doesn’t have a set structure. How would you match this opening tag with regex? <mytag myattribute="<value of \"myattribute\">" >
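
For reference, a hedged Python sketch of how a pattern can allow `<` and `>` inside quoted attribute values. Note that backslashes are not an escape mechanism in HTML attribute values, so this sketch deliberately uses a simpler input than the tag above and makes no claim about how a browser would tokenize that one. `OPEN_TAG_QUOTED` and `first_open_tag` are hypothetical names for illustration.

```python
import re

# Sketch only: attribute values may be double-quoted, single-quoted, or unquoted.
# Quoted values are allowed to contain '<' and '>'.
OPEN_TAG_QUOTED = re.compile(
    r"""
    <([A-Za-z][A-Za-z0-9-]*)        # tag name
    (?:
        \s+[^\s"'<>/=]+             # attribute name
        (?:\s*=\s*
            (?:"[^"]*"|'[^']*'|[^\s"'<>=`]+)   # attribute value
        )?
    )*
    \s*/?>                          # optional self-closing slash, then '>'
    """,
    re.VERBOSE,
)

def first_open_tag(markup):
    """Return the first opening tag matched in markup, or None."""
    match = OPEN_TAG_QUOTED.search(markup)
    return match.group(0) if match else None

sample = '''<a title="1 > 0" href='x'>link</a>'''
naive = re.search(r"<[^>]*>", sample).group(0)

print(naive)                   # <a title="1 >
print(first_open_tag(sample))  # <a title="1 > 0" href='x'>
```

The naive `<[^>]*>` pattern stops at the first `>` inside the quoted value, while the quote-aware pattern consumes the whole tag. This still says nothing about comments, scripts, or malformed markup.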


fartsparkles@sh.itjust.works

86 points

27 days ago

This is StackOverflow after all. Your question is wrong. Your problem is wrong. You are wrong. I am right. Thread locked. Go read this other post that is totally unrelated to your problem I’ve decided isn’t the problem you’re facing because. I. Am. Right.


Quetzalcutlass@lemmy.world

15 points

26 days ago

Could be worse. At least it’s not Microsoft’s support forums:

Hey, I see you’re having problems with <copy-paste key words from OP>. Try the following and see if it fixes your issue.

Open a command prompt and enter "sfc /scannow".

I hope this helps!

(Reply marked as solution, thread closed.)


deadbeef79000@lemmy.nz

4 points

22 days ago

I have X years experience with {keyword salad}.

Can you confirm {details already in the opening post}?


4 points

26 days ago

answers.microsoft.com is the worst. learn.microsoft.com can be decent at times though


Skull giver@popplesburger.hilciferous.nl

6 points

26 days ago

Deleted by creator


JackbyDev@programming.dev

3 points

26 days ago

I had a decade-old question marked as a duplicate and downvoted three times after years of no activity. SE is such a joke nowadays.


errer@lemmy.world

17 points

26 days ago

That’s why LLMs are so infuriatingly stubborn, they’re trained on these keyboard warriors


lnxtx@feddit.nl

2 points

27 days ago

xpath <3


3 points

27 days ago

Using a regex on html is like eating wild mushrooms that you found in the woods. There are times where it’s appropriate and safe, other times where it’s completely insane and possibly deadly, and it takes considerable experience to know how to tell the difference.


Nariom@lemmy.world

8 points

26 days ago

I once applied for an internship at a company doing job-offer aggregation. During the interview they explained that the core of what they did was parsing (partial) HTML with regex. When I asked why they didn’t develop a custom parser, they replied that they were working on it, but that the internship wouldn’t focus on that. I was not disappointed when I didn’t get the job.


communism@lemmy.ml

32 points

26 days ago

OP isn’t trying to parse HTML though… they are trying to detect opening XML tags. Which seems quite achievable with regex.


winterayars@sh.itjust.works

1 point

26 days ago

It’s still actually pretty sketchy, depending on exactly what you want to do. Strict regex still won’t be able to match correctly if you want to match what an HTML parser considers the opening tag, though fancier regex will. If you’re just looking for the tags in the HTML document as a flat document it’s doable, though. (Mostly.)
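
One way to see the difference between "tags in the flat text" and "what a parser considers an opening tag" is to ask an actual parser. Below is a small sketch using Python's standard-library `html.parser`; the class name `StartTagCollector` and the helper `start_tags` are invented for this example, and the standard-library parser is only a stand-in for whatever HTML parser is actually in play.

```python
from html.parser import HTMLParser

class StartTagCollector(HTMLParser):
    """Collect (tag, attributes) pairs for every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tags.append((tag, dict(attrs)))

def start_tags(markup):
    """Return the start tags found by the parser, in document order."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags

sample = '<!-- <b>not a tag</b> --><p title="a > b">text</p>'
print(start_tags(sample))  # [('p', {'title': 'a > b'})]
```

The `<b>` inside the comment is not reported, and the `>` inside the quoted attribute does not end the tag. A flat regex scan over the same string would need extra rules to get either of those right.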


