Yeah, that’s about it. I’ve thrown buggy code at it and told it to check it, and it says it’ll work just fine… scripts as well. You really can’t trust anything that thing outputs if it’s more than 1 or 2 lines long (hello world examples excluded, they work just fine in most cases).
Have you looked at the project that spins up multiple LLM “identities”? They are “told” the issue to solve, one is asked to generate code for it, and the others “critique” it. It generates new code based on the feedback, and then it can automatically run it. If the run fails, it gets the error message so it can fix the issues. Only once it has generated code that works and is “accepted” by the other identities is it given back to you.
It sounds a bit silly, but apparently it works quite well. Critiquing code is apparently easier than generating it, and iterating on code based on critiques and runtime feedback is much easier than producing correct code in one go.
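For anyone who wants to picture the loop described above, here is a rough Python sketch. It is not how any particular project implements it. The names `solve`, `run_code`, `generate` and `critics` are made up for illustration, and the generator and critics are plain callables that you would have to back with real LLM calls yourself. The round cap is also my own assumption, added so the loop cannot spin forever.

```python
import subprocess
import sys
from typing import Callable, List, Optional

# Hypothetical signatures: in a real setup each of these would wrap an LLM call.
Generator = Callable[[str, str], str]  # (task, feedback) -> candidate source code
Critic = Callable[[str, str], str]     # (task, code) -> critique, "" means accepted


def run_code(code: str, timeout: float = 10.0) -> str:
    """Run the candidate in a fresh interpreter and return an error message.

    Returns "" when the process exits cleanly. A real system should sandbox
    this step, since the code being run is untrusted.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timed out"
    if proc.returncode == 0:
        return ""
    return proc.stderr or f"exit code {proc.returncode}"


def solve(
    task: str,
    generate: Generator,
    critics: List[Critic],
    max_rounds: int = 5,
) -> Optional[str]:
    """Generate, critique and run code until everything passes or rounds run out."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate(task, feedback)

        # Collect non-empty critiques from the other "identities".
        problems = [note for note in (critic(task, code) for critic in critics) if note]

        # Add runtime feedback from actually executing the candidate.
        error = run_code(code)
        if error:
            problems.append("Runtime error:\n" + error)

        if not problems:
            return code  # runs cleanly and every critic accepted it
        feedback = "\n".join(problems)
    return None  # gave up after max_rounds


if __name__ == "__main__":
    # Stand-ins for LLM calls: the first attempt crashes, the second one runs.
    attempts = iter(["print(1/0)", "print(1/1)"])
    accepted = solve(
        "print the number one",
        generate=lambda task, feedback: next(attempts),
        critics=[lambda task, code: ""],
    )
    print(accepted)
```

Returning `None` after `max_rounds` is one simple way to handle the case where the agents never agree. What the caller does after that (retry, escalate to a human, and so on) is left open here.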
The multi-agent software is called ChatDev, and it’s significantly more capable than one agent working alone. The ability to critique and fix bugs in the code in an iterative process gives a massive step up to the ability of the AI to program.
Granted, it might still get stuck in a loop between the programming and testing departments, but it’s a solid step in the right direction.
Here ya go: https://github.com/Significant-Gravitas/AutoGPT
There is a (non-meme) reason why Prompt Engineer is a real title these days. It takes a measure of skill to get the model to focus on and attempt to answer the right question. This becomes even more apparent if you try to generate a product description: a newb will get something filled with superlative lies, and a pro will get something better than most human writers in the field can muster, at a much lower cost per text (compared to professional writers, often on par or more expensive than content farms). AI is a great tool, but it’s neither the only tool (don’t hammer in screws) nor is it perfect. The best approach is to let the AI do the easy boilerplate 80%, then add that human touch to the hard 20% and at most have the AI prepare the structure / stubs.
I’m totally willing to accept “the world is changing and new skills are necessary”, but at the same time, are a prompt engineer’s skills transferable across subject domains?
It feels to me like “prompt engineering” skills are just skills to complement the expertise you already have. Like the skill of Google searching. Or learning to use a word processor. These are skills necessary in the world today, but almost nobody’s job is exclusively to Google, or use a word processor. In reality, you need to get something done with your tool, and you need to know shit about the domain you’re applying that tool to. You can be an excellent prompt engineer, and I guess an LLM will allow you to BS really well, but subject matter experts will see through the BS.
I know I’m not really strongly disagreeing, but I’m just pushing back on the idea of prompt engineer as a job (without any other expertise).
We’re not talking small organizations here, nor small projects. In those cases it’s true that you can’t “only” do prompt engineering. Where I see it is in larger orgs, where you bring into the team the know-how about how to prompt efficiently, how to do refinement, where to do variable substitution and how, etc. The closest analogy is specific tech skills like, say, DBs: at a small firm it’s just something one backend dude knows decently, while at a large firm there are several DBAs and they help teams tackle complex DB questions. Same with, say, Search, first Solr and nowadays Elastic. Or for that matter Networks, where in many cases there might be absolutely no one at the whole firm who knows anything more than the basics, because you have another company doing it for you.
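To make “variable substitution” a bit more concrete, here is a minimal sketch using Python’s standard `string.Template`. The prompt wording, the `PRODUCT_PROMPT` name and the `build_prompt` helper are all invented for this example. The point is just that the fixed instructions live in one reviewed template and only the per-product values change.

```python
from string import Template
from typing import List

# Hypothetical prompt template: the wording is illustrative, not a known-good prompt.
PRODUCT_PROMPT = Template(
    "Write a $tone product description for $product.\n"
    "Audience: $audience.\n"
    "Only mention these verified facts: $facts.\n"
    "Do not use superlatives."
)


def build_prompt(product: str, audience: str, facts: List[str], tone: str = "plain") -> str:
    """Fill the template; substitute() raises KeyError if a placeholder is missing."""
    return PRODUCT_PROMPT.substitute(
        product=product,
        audience=audience,
        facts="; ".join(facts),
        tone=tone,
    )


if __name__ == "__main__":
    print(build_prompt("a steel water bottle", "hikers", ["holds 750 ml", "dishwasher safe"]))
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice in this sketch: a forgotten variable fails loudly instead of sending a prompt with a stray `$placeholder` to the model.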