Subverting Betteridge’s law of headlines. Yes.
To me this seems obvious: the models are trained off of GitHub as a whole. Most code on GitHub either is unsecure, or it was written without needing to be secure.
I’m already getting pull requests from juniors trying to sneak in AI generated code without actually reading it.
It seemed obvious to me as well, but studies like this are important, so that I have something to point to other than vibes.
Most code on GitHub either is unsecure, or it was written without needing to be secure.
That is a bit of a stretch imho. There are myriad open source projects hosted on GitHub that do need to be secure in the context where they are used. I am curious how you came to that conclusion.
I’m already getting pull requests from juniors trying to sneak in AI generated code without actually reading it.
That is worrisome though. I assume these people have had some background/education in the field before they were hired?
For the first, there are a lot of very valid projects you mention, but there’s way way way more things like CS201 projects hosted for review. For LLM training I do wonder if they assigned a weight, but I doubt it. For the second point I was trying to make, even then there’s probably a lot of good code that doesn’t have to be security aware. A login flow for a local game, for example, may be very simple, just enough to access your character, and the developer chose a naive way to do it knowing it was never going to be used in production. But to an LLM it’s just “here’s a login flow”, and how does it know it was never intended for prod?
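To make the login example concrete, here is a rough Python sketch of the difference. Everything in it is hypothetical and only for illustration: the function names, the iteration count, and the choice of PBKDF2 are my assumptions, not something taken from a real project or from the paper. The first function is the kind of shortcut that is fine for a throwaway local game; the other two are closer to what you would want if the same flow ever faced real users.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed iteration count for this sketch; tune it for your own hardware.
ITERATIONS = 200_000


def naive_check(stored_password: str, attempt: str) -> bool:
    """Toy login: plaintext password, plain comparison.

    Acceptable for a local game, but nothing in the code says so.
    """
    return stored_password == attempt


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a salted PBKDF2-HMAC-SHA256 digest; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Both versions are “a login flow” as far as a training corpus is concerned, which is the point: the context that made the naive one acceptable lives in the developer’s head, not in the code.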
For the second, absolutely. I don’t think it’s intentional, it’s misplaced trust in the system mixed with the naive hopes of a jr dev, which hey, we’ve all been through. Jr: “Hey it works! Awesome, task done!” Sr: “Yeah but does it work well? Does it work for our use case? Will it scale when we hit it with 100k users?”
For LLM training I do wonder if they assigned a weight, but I doubt it.
Given my experience with models I think they might actually assign a weight. Otherwise, I would get a lot more bogus results. It also isn’t as if it is that difficult to implement some basic, naive weighting based on the number of stars, forks, etc.
Of course it might differ per model and how they are trained.
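Just to illustrate what I mean by basic, naive weighting, here is a minimal Python sketch. To be clear, this is an assumption about how one could do it, not a description of how any actual model is trained; the formula, the 0.5 factor, and the field names `stars` and `forks` are all made up for the example.

```python
import math
import random
from typing import Dict, List


def repo_weight(stars: int, forks: int) -> float:
    """Hypothetical popularity weight.

    Log-damped so that a handful of huge repositories do not drown out
    everything else; the base of 1.0 keeps zero-star repos sampleable.
    """
    return 1.0 + math.log1p(stars) + 0.5 * math.log1p(forks)


def sample_repos(repos: List[Dict], k: int, seed: int = 0) -> List[Dict]:
    """Draw k repositories (with replacement), proportional to their weight."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    weights = [repo_weight(r["stars"], r["forks"]) for r in repos]
    return rng.choices(repos, weights=weights, k=k)


if __name__ == "__main__":
    # Invented example data, not real repositories.
    corpus = [
        {"name": "cs201-homework", "stars": 0, "forks": 0},
        {"name": "popular-crypto-lib", "stars": 12000, "forks": 900},
    ]
    for repo in sample_repos(corpus, k=5):
        print(repo["name"])
```

Note that stars and forks measure popularity, not security, so even a scheme like this would only shift the mix and would not filter out insecure code.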
Having said that, I wouldn’t trust the output from an LLM to write secure code either. For me it is a very valuable tool for helping me debug issues, on the level of a slightly more intelligent rubber ducky. But when you ask most models to create anything more than basic functions/methods, you had damn well better make sure it actually does what you need it to do.
I suppose there is some role there for seniors to train juniors in how to properly use this new set of tooling. In the end it is very similar to having to deal with people who copy-paste answers directly from Stack Overflow, expecting them to magically fix their problem as well.
That you need not only working code/tools but also an understanding of why and how they work is something I am constantly trying to teach juniors at my place. What I often end up asking them is something along the lines of “Do you want to have learned a trick that might be obsolete in a few years? Or do you want to have mastered a set of skills and understanding which allows you to tackle new challenges when they arrive?”
I wish I could double-upvote this for the use of “Betteridge’s law of headlines”. Once because I rarely see that referenced and again because I had forgotten what the adage was called.
Quoting the abstract (I added emphasis and paragraphs for readability):
AI code assistants have emerged as powerful tools that can aid in the software development life-cycle and can improve developer productivity. Unfortunately, such assistants have also been found to produce insecure code in lab environments, raising significant concerns about their usage in practice.
In this paper, we conduct a user study to examine how users interact with AI code assistants to solve a variety of security related tasks.
Overall, we find that participants who had access to an AI assistant wrote significantly less secure code than those without access to an assistant. Participants with access to an AI assistant were also more likely to believe they wrote secure code, suggesting that such tools may lead users to be overconfident about security flaws in their code.
To better inform the design of future AI-based code assistants, we release our user-study apparatus and anonymized data to researchers seeking to build on our work at this link.
Caveat: quoting from section 7.2 Limitations:
One important limitation of our results is that our participant group consisted mainly of university students which likely do not represent the population that is most likely to use AI assistants (e.g. software developers) regularly.