But the researchers then dive head-first into wild claims:
GameNGen answers one of the important questions on the road towards a new paradigm for game engines, one where games are automatically generated, similarly to how images and videos are generated by neural models in recent years.
To which the obvious reply is: no it doesn’t, where did you get any of this? You’ve generated three seconds of fake gameplay video where your player shoots something and it shoots back. None of the mechanics of the game work. Nothing other than what’s on-screen can be known to the engine.
Yeah, this was apparent immediately.
Diffusion models are just matrices of positional and temporal probabilities. That is absolutely incompatible with even the simplest demands of a game, since any player will reject a game that lacks reliable, input-deterministic outcomes. The only way to get that reliability is to create a huge amount of training data and spend exorbitant resources training on it, to the point of harshly over-fitting the model to the data. All of that requires the team to first make the game they’re emulating before they start training. It’s nonsense.
If someone is going to use AI to make a game, they would get a far higher ROI using AI to generate code that defines the relationships between the data once, rather than inferring the raw data of every individual pixel.
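To make that concrete, here is a rough sketch of what "rules written once as code" looks like. Everything in it (GameState, step, replay, the damage numbers) is made up for illustration and has nothing to do with how Doom or GameNGen actually work. The point is only that the state lives in the engine, including things that are not on screen, and that the same inputs always give the same result.

```python
from dataclasses import dataclass, replace


# Hypothetical toy state; field names and values are assumptions for illustration.
@dataclass(frozen=True)
class GameState:
    player_health: int = 100
    ammo: int = 50
    enemy_health: int = 30  # tracked by the engine even when the enemy is off-screen


def step(state: GameState, action: str) -> GameState:
    """Apply one player input. Same state + same input always gives the same result."""
    if action == "shoot" and state.ammo > 0:
        return replace(
            state,
            ammo=state.ammo - 1,
            enemy_health=max(0, state.enemy_health - 10),
        )
    # Unknown inputs, or shooting with no ammo, change nothing.
    return state


def replay(actions):
    """Run a list of inputs from a fresh state and return the final state."""
    state = GameState()
    for action in actions:
        state = step(state, action)
    return state


inputs = ["shoot", "turn_left", "shoot", "shoot", "shoot"]

# Replaying the same inputs gives an identical state, every time.
assert replay(inputs) == replay(inputs)
print(replay(inputs))
```

A few lines of rules like this cover every frame the game will ever show. A model that predicts pixels has to relearn the same relationship from footage, and still can't promise the ammo counter goes down by exactly one.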
The demo was always effectively a clickbait novelty for likes.