The first programs were written directly in binary or hexadecimal machine code, and only later did we invent programming languages, with compilers to translate human-readable code into binary machine code.

So why can’t we just do the same thing in reverse? I hear a lot about devices, from audio streamers to footwear, rendered useless by abandonware. Couldn’t a very smart person (or AI) just take the existing program and turn it back into source code?

73 points

It is not. idk who told you it was.

Disassembling an executable is trivial to do. Everything is open source if you can read assembly. Obfuscation be damned.
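For a rough feel of what a disassembly listing gives you, here is a minimal sketch using Python's built-in `dis` module. Python bytecode is only a stand-in for native machine code here, and `apply_discount` and `opcode_names` are made-up names for illustration. Note that Python bytecode keeps more names than a stripped native binary would.

```python
import dis

def apply_discount(price, rate):
    # Business rule: loyalty customers get `rate` off the list price.
    return price - price * rate

def opcode_names(func):
    """Return the opcode names that the disassembler reports for func."""
    return [instruction.opname for instruction in dis.get_instructions(func)]

if __name__ == "__main__":
    # Prints the instruction listing; the comment above is nowhere in it.
    dis.dis(apply_discount)
```

Getting the listing takes one call. The comment explaining why the function exists is not part of what comes back.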

16 points

The hard part isn’t reading assembly. The hard part is figuring out why it’s doing what it’s doing with no comments or function names or anything useful to help.

This is like saying if you can read English you can understand an advanced math or physics paper written in English without having any knowledge or context of those subjects.

15 points

I’ve used a decompiler to peek at the source code of an app written in Visual Basic I wanted to recreate as a browser addon. It was mostly successful but some variable and function names were messed up.

30 points

Variable names, class names, package structure, method names, etc. won’t normally survive in disassembled native code. They are meaningless to the CPU, which only works with memory addresses. Where a method name does show up, it’s most likely a call into an existing library, which has to be looked up by name (that is a library call rather than a syscall). I’m not familiar with VB, but in .NET and .NET Framework, for example, System.Collections.Generic provides the implementation of List<string>, and when .Sort() is called, the call goes into that compiled .dll.
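To illustrate how names get lost, here is a toy sketch, not how any real compiler or decompiler works. The hypothetical `compile_expr` turns an arithmetic expression into instructions for a made-up stack machine, replacing each variable name with a numbered slot, and the hypothetical `decompile` can only hand back placeholder names.

```python
import ast

def compile_expr(source):
    """Compile an arithmetic expression to toy stack-machine code.

    Variable names are replaced by numbered slots, roughly the way a
    native compiler replaces them with registers or memory addresses.
    """
    slots = {}  # name -> slot number; thrown away once compilation ends
    code = []
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

    def emit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            emit(node.left)
            emit(node.right)
            code.append((ops[type(node.op)], None))
        elif isinstance(node, ast.Name):
            slot = slots.setdefault(node.id, len(slots))
            code.append(("LOAD", slot))
        elif isinstance(node, ast.Constant):
            code.append(("PUSH", node.value))
        else:
            raise ValueError("unsupported syntax in this toy compiler")

    emit(ast.parse(source, mode="eval").body)
    return code

def decompile(code):
    """Rebuild an expression; the best available names are v0, v1, ..."""
    symbols = {"ADD": "+", "SUB": "-", "MUL": "*"}
    stack = []
    for op, arg in code:
        if op == "LOAD":
            stack.append(f"v{arg}")
        elif op == "PUSH":
            stack.append(str(arg))
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {symbols[op]} {right})")
    return stack.pop()

if __name__ == "__main__":
    machine_code = compile_expr("price * quantity + shipping_fee")
    print(decompile(machine_code))  # ((v0 * v1) + v2)
```

The structure of the calculation comes back intact, but `price`, `quantity` and `shipping_fee` are gone. `a * b + c` compiles to exactly the same instructions, so nothing in the output says which names the author used.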

-21 points

You could chuck it at an AI to reverse compile it into something readable.

57 points

Well, decompiling is only one step in the reverse engineering process. I would recommend taking a look at the Legend of Zelda: Ocarina of Time decompilation project. They reverse engineered the whole thing, which took years and was a team effort.

In the end they got perfectly readable source code, fully documented. And the most amazing thing is, when compiled with the right compiler and the right flags, it recreates the original ROM perfectly.
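The "recreates the original ROM perfectly" check boils down to a byte-for-byte comparison. A minimal sketch, assuming both images are already loaded as bytes (the function names are made up, and this is not the project's actual tooling):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest of a blob, handy for comparing large files."""
    return hashlib.sha256(data).hexdigest()

def is_matching_build(original: bytes, rebuilt: bytes) -> bool:
    """True when the rebuilt image is identical to the original."""
    return sha256_hex(original) == sha256_hex(rebuilt)

def first_difference(original: bytes, rebuilt: bytes):
    """Offset of the first differing byte, or None if the images match."""
    for offset, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            return offset
    if len(original) != len(rebuilt):
        # One image is a prefix of the other; they differ where it ends.
        return min(len(original), len(rebuilt))
    return None
```

A matching hash says the reconstructed source compiles to the same binary. That is strong evidence the decompiled logic is faithful, even though the names and comments were written fresh by the team.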

I would also recommend a YouTuber called Kaze. He’s been working on Mario 64 for years, re-writing large parts of the engine to get some pretty cool stuff going.

2 points

Assuming you have all the source code… it is possible. It’s usually a huge pain in the ass though and software is so complicated that it’s extremely difficult to get anything useful.

2 points

So after reading through the answers…could compilation be used as a form of encryption?

5 points

No, because it would not be possible to retrieve the original information.

3 points

The hardest part is that it’s a lossy ‘encoder’, and the parts that get lost are the human-readable ones.
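A quick way to see the lossiness, again using Python bytecode as a stand-in for machine code (`bytecode_fingerprint` is a made-up helper): two sources that differ only in a comment compile to the same thing, so the output cannot tell you which one you started from.

```python
def bytecode_fingerprint(source: str):
    """Compile source and return the parts the interpreter actually runs."""
    code = compile(source, "<example>", "exec")
    return (code.co_code, code.co_consts, code.co_names)

documented = "total = 3  # retry limit agreed with the vendor\n"
bare = "total = 3\n"

# Same instructions, same constants, same names: the comment left no trace.
same = bytecode_fingerprint(documented) == bytecode_fingerprint(bare)
```

Many sources map to one output, and that mapping has no key that would let you recover the dropped parts.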

10 points

It’s not impossible, just expensive. How much money do you want to spend?

To your point, the programs are already in code: machine code. Taking arbitrary machine code and making it human readable, that’s the trick.

11 points

As others have mentioned, it’s possible but very complicated. Decompilers produce code that isn’t very readable for humans.

I am indeed awaiting the big news headlines that will for some reason catch everyone by surprise when an LLM comes along that’s trained to “translate” machine code into a nice, easily comprehensible high-level programming language. It’s going to be a really big development: even though it doesn’t make programs legally “open source”, it’ll make them all source-available.

4 points

I am indeed awaiting the big news headlines that will for some reason catch everyone by surprise when an LLM comes along that’s trained to “translate” machine code into a nice, easily comprehensible high-level programming language.

Another commenter dismissed the idea outright. WTF… What is implausible about an LLM that takes decompiled code, deals with the obfuscating bs, recognizes known libraries, and organizes the remaining code? That will totally happen, if it hasn’t already been done.

2 points

It’s easy to say that we should throw AI at a problem and in a few years it will solve it, but most of the time it doesn’t actually work that way. If you think about the Turing Test itself, where the history goes back to the 1950s, how many decades did it take for us to get to anything that could reasonably come close to passing it? So anytime you think to yourself that one of these days AI is going to get there, remember that one of these days might actually be a half century from now.

The other aspect to this challenge, or rather specifically with regards to this challenge, is that the setup involves humans organizing code in a certain way according to some kind of reasoning that the authors know about, and then that being compiled away, and then another computer program trying to get back what the original authors might have been thinking when they designed the thing originally. That’s a steep hill to climb. Can it be done on a small scale? It certainly can. On a large scale? Don’t hold your breath.

3 points

There’s a lot of outright rejection of the possibilities of AI these days, I think because it’s turning out to be so capable. People are getting frightened of it and so jump to denial as a coping mechanism.

I recalled reading about an LLM that had been developed just a couple of weeks ago for translating source code into intermediate representations (a step along the way to full compilation) and when I went hunting for a reference to refresh my memory I found this article from March about exactly what’s being discussed here - an LLM that translates assembly language into high-level source code. Looks like this one’s just a proof of concept rather than something highly practical, but prove the concept it does.

I wonder if there are research teams out there sitting on more advanced models right now, fretting about how big a bombshell it’ll be when this gets out.

5 points

I have a bunch of 16-bit applications that I would love to be able to do that with. Mostly DOS and Windows 3.1 games.

4 points

You might actually consider dipping your toes into trying to learn how to analyze/reverse those yourself. Relatively speaking, software that old can sometimes be easier to reverse.

2 points

Yeah, I’m not unfamiliar with the process (still a novice though) and have mostly used it to circumvent something obnoxious or tweak save files. It just takes a lot of effort when you’re only looking to spend a couple of hours playing a game before bed.

I’m currently experiencing a frustrating bug in Dolphin and I’m tempted to learn enough to dig into it. My MIPS buddy won’t help me with it because he thinks it’s a waste of time.

I like LLMs for the time they save you on something laborious or mundane. One day we’ll have general AI, fingers crossed.

~Love the toes pun


No Stupid Questions

!nostupidquestions@lemmy.world
