How far should a programming language aware diff go? (semanticdiff.com)

posted 4 months ago

DarkPlayer@lemmy.world

programming@programming.dev

22 comments

mox@lemmy.sdf.org

33 points

4 months ago

Using a tool like this to hide sections of code presented for review places a lot of trust in the automation. If Mallory were to discover a blind spot in the semantic diff logic, she could slip in a small change for eventual use in an exploit, and it would never be seen by another human.

For example, consider this part of the exploit used in the recent xz backdoor. In case you don’t see the problem, here’s the fix.

Rather than hiding code from review, if a tool figured out a way to use semantic understanding to highlight code that might be overlooked by a human (and should therefore be reviewed more carefully), it could conceivably help find such things.
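As a rough illustration of the “highlight, don’t hide” idea, the Python sketch below flags changed lines that are almost identical to the line they replace, since near-identical edits are the ones a reviewer is most likely to skim past. It is purely textual: it uses the standard `difflib` module rather than any semantic analysis. The function name `tiny_edits` and the 0.9 similarity threshold are illustrative assumptions, not part of any existing tool.

```python
import difflib


def tiny_edits(old_lines, new_lines, threshold=0.9):
    """Yield (old, new) pairs of replaced lines that are nearly identical.

    A similarity ratio close to 1.0 means only a character or two changed,
    which is the kind of edit a human reviewer is most likely to skim past.
    """
    line_matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    for tag, i1, i2, j1, j2 in line_matcher.get_opcodes():
        if tag != "replace":
            continue
        # Pair up the replaced lines positionally and score each pair.
        for old, new in zip(old_lines[i1:i2], new_lines[j1:j2]):
            if difflib.SequenceMatcher(a=old, b=new).ratio() >= threshold:
                yield old, new


old = ["if (x > 0) {", "    run();", "}"]
new = ["if (x >= 0) {", "    run();", "}"]
for before, after in tiny_edits(old, new):
    print(f"look closely: {before!r} -> {after!r}")
```

A real tool would need something smarter than a similarity ratio, but even this crude rule surfaces one-character edits instead of hiding them.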

Bogasse@lemmy.ml

5 points

4 months ago

I don’t have an opinion on the topic but I see a blind spot in your argument, so I have to be that kind of person … 🥺

One could use the exact same example to argue that humans are very bad at parsing code (especially if whitespace kicks in). In that regard a tool that allows them to reason on a standardized representation of the AST can be a protection against a whole class of attacks.
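To make the “standardized representation of the AST” idea concrete, here is a minimal Python sketch using the standard `ast` module. It answers only one question, whether two versions of a snippet parse to the same tree, and is an illustration under that narrow assumption rather than a description of how semanticdiff.com works. The function name `same_ast` is hypothetical.

```python
import ast


def same_ast(old_src: str, new_src: str) -> bool:
    """Return True if two Python sources parse to the same syntax tree.

    ast.dump() leaves out line and column numbers by default, and the parser
    drops comments and redundant parentheses, so formatting-only edits compare
    equal while changes to the logic do not.
    """
    return ast.dump(ast.parse(old_src)) == ast.dump(ast.parse(new_src))


old = "def area(w, h):\n    return w * h\n"
reformatted = "def area(w,h):  # width times height\n    return (w\n            * h)\n"
changed = "def area(w, h):\n    return w + h\n"

print(same_ast(old, reformatted))  # True
print(same_ast(old, changed))      # False
```

The flip side of mox’s warning shows up here too: anything the normalization throws away, such as a comment, is by construction invisible to this check.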

mox@lemmy.sdf.org

10 points

4 months ago

That’s not a blind spot in my comment. See my final paragraph.

It’s only one sentence. Maybe it was easy to miss. :)

Solemn@lemmy.dbzer0.com

1 point

4 months ago

I like the idea, but I can’t come up with any method that won’t devolve into most reviewers only checking the highlighted parts tbh.

Bogasse@lemmy.ml

1 point

4 months ago

Oh yeah, so I’m that other kind of guy 🥺

I kinda like your idea, but I think it can be difficult to detect some confusing situations. I think it would be a better idea, but I don’t think it’s a full replacement.

FizzyOrange@programming.dev

0 points

4 months ago

“If Mallory were to discover a blind spot in the semantic diff logic”

This is a very big stretch IMO. That xz change wasn’t actually the exploit; it was just used to make the exploit less detectable. And it was added by people with commit access, so it didn’t even have to go through code review.

On top of that, code review is not magic. It’s easy to get bugs past it hiding in plain sight (if that weren’t the case, Linux would be bug-free!).

Can you think of an actually realistic example?

floofloof@lemmy.ca

24 points

4 months ago

Interesting question. I’d be comfortable up to level 2 in this list, after which I want to have my eyes on the changes. Even where code is functionally or semantically equivalent, style can make a lot of difference for comprehension and maintainability.

Deebster@programming.dev

6 points

4 months ago

I’d agree, for the same reasons. Communicating intent is definitely one of the main things that separates mediocre from amazing developers (and software can’t check that).

It’s interesting to consider a tool that does all of levels 1-3 (and more) as a way to verify that a style refactoring hasn’t changed logic. I assume that’s what they meant when they wrote “modifications that were supposed to be no-ops but aren’t”.
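A tool that verifies a style refactoring could also point at *where* logic changed instead of giving a single yes/no answer. The sketch below does this per top-level definition with Python’s standard `ast` module. The function name `changed_definitions` is hypothetical, and only module-level functions and classes are compared; other top-level statements are ignored.

```python
import ast


def changed_definitions(old_src: str, new_src: str) -> set[str]:
    """Names of top-level functions/classes whose AST differs between versions.

    Formatting-only edits (whitespace, comments, line breaks) are ignored,
    because ast.dump() leaves out position information by default.
    """
    def defs(src: str) -> dict[str, str]:
        tree = ast.parse(src)
        return {
            node.name: ast.dump(node)
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        }

    old_defs, new_defs = defs(old_src), defs(new_src)
    # Added, removed, and modified definitions all count as changed.
    return {n for n in old_defs.keys() | new_defs.keys()
            if old_defs.get(n) != new_defs.get(n)}


old = "def f(x):\n    return x+1\n\n\ndef g(x):\n    return x*2\n"
new = "def f(x):\n    return x + 1  # tidied\n\n\ndef g(x):\n    return x * 3\n"
print(changed_definitions(old, new))  # {'g'}
```

An empty result suggests the refactoring was a no-op at the AST level; it says nothing about whether the new style communicates intent as well as the old one.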

Lung@lemmy.world

15 points

4 months ago

I was into this until I realized that it’s not open source and not even available outside of VS Code and the GitHub web interface.

atzanteol@sh.itjust.works

8 points

4 months ago

I’ll opt for “Level 0”.

Unless you’re just doing a diff for personal code or something, you should be reviewing everything a developer has done. Yes, whitespace changes too.

ZeldaFreak@lemmy.world

5 points

4 months ago

It really depends. Whitespace is something most languages don’t care about; the only people who care are those enforcing style guides. Level 2 is similar, but there it starts to get more critical: can you be sure that it makes no difference?

Level 3 is critical. While it can help to eliminate code that probably didn’t cause the problem, the change does make a difference, and in code review that matters. If a specific hex number is well known, for example 0x4711, and someone changes it to 18193 or even to binary, information gets hidden from the programmer. It matters for style too: for a flags enum, the thing to use is binary or bit shifts, because both are readable. Decimal is readable up to a point; 4 bytes is fine, but from the 5th on I don’t know the values by heart and can’t even spot them.

Level 4 is irrelevant when it’s at the top of the file; bothering to hide it isn’t necessary. It can also be relevant, though. For example, a while ago our company had code that needed to work with .NET 2 alongside parts using .NET 4, and at some point new files had the using directive for LINQ, which isn’t available in .NET 2. This happened a lot.
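The 0x4711 example is easy to check: 0x4711 is 4·4096 + 7·256 + 1·16 + 1 = 18193, and in Python both spellings parse to the same constant, so an AST-level comparison treats the rewrite as no change at all. The second half of the sketch uses Python’s `enum.Flag` as a stand-in for a .NET flags enum; the names `MAGIC` and `Permission` are made up for illustration.

```python
import ast
import enum

# The parser stores only the value of a numeric literal, not how it was
# written, so an AST-level diff reports no change when a well-known hex
# constant is rewritten in decimal.
hex_tree = ast.dump(ast.parse("MAGIC = 0x4711"))
dec_tree = ast.dump(ast.parse("MAGIC = 18193"))
print(hex_tree == dec_tree)  # True


class Permission(enum.Flag):
    # Writing flags as bit shifts keeps each flag's bit position visible.
    READ = 1 << 0
    WRITE = 1 << 1
    EXECUTE = 1 << 2
    DELETE = 1 << 3
    SHARE = 1 << 4  # same value as 16, but the bit position is explicit


print((Permission.READ | Permission.SHARE).value)  # 17
```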

The best solution is to have options and let the person using the tool decide. What I’m missing is a way to add my own ignore list. For example, our XML files contain a date. The XML class is badly written: instead of having one date attribute on the first node, every node has one. Showing these in a diff is pointless because the dates aren’t even used. Rewriting the class is a big task, though, because it’s a core feature and missing one thing could break everything.
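A user-defined ignore list like this could be approximated by stripping the listed attributes before comparing. This Python sketch uses the standard `xml.etree.ElementTree` module and assumes the noisy attribute is literally named `date`; the function names `normalize` and `differs` are hypothetical, and the sketch compares whole documents rather than producing a line-by-line diff.

```python
import xml.etree.ElementTree as ET


def normalize(xml_text: str, ignored_attrs: frozenset[str]) -> str:
    """Drop ignored attributes from every element, then canonicalize (C14N 2.0)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() includes the root element itself
        for name in ignored_attrs:
            elem.attrib.pop(name, None)
    # canonicalize() also sorts attributes, so attribute order does not matter.
    return ET.canonicalize(ET.tostring(root, encoding="unicode"))


def differs(old_xml: str, new_xml: str,
            ignored_attrs: frozenset[str] = frozenset({"date"})) -> bool:
    return normalize(old_xml, ignored_attrs) != normalize(new_xml, ignored_attrs)


old = '<config date="2023-01-01"><item date="2023-01-01" name="a"/></config>'
new_date = '<config date="2024-06-30"><item date="2024-06-30" name="a"/></config>'
new_name = '<config date="2023-01-01"><item date="2023-01-01" name="b"/></config>'

print(differs(old, new_date))  # False: only ignored attributes changed
print(differs(old, new_name))  # True
```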

Programming

!programming@programming.dev

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person’s post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

Follow the programming.dev instance rules
Keep content related to programming in some way
If you’re posting long videos, try to add some form of TL;DR for those who don’t want to watch them

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev

Community stats

3.1K monthly active users, 1.8K posts, 30K comments