The unit economics of a generative image model resolve, in the end, to a question of how many seconds of graphics-processor time the house is willing to spend per user per picture. A post circulated this week on the ChatGPT subreddit, by a tester who briefly held access to the pre-release GPT Image 2 through a free account, offers a serviceable field report on where that arithmetic is presently coming to rest. The tester found the artistic level marginally superior to the prior generation. He also found that, in every image he produced, the fingers emerged deformed and the eyes, in a recurring share of cases, distorted. His forecast, delivered without heat, is that the version eventually released to non-paying accounts will run at a reduced number of inference steps, and that the hands will accordingly be worse.
The forecast is useful chiefly because of what it concedes. It concedes that anatomical fidelity is not a fixed property of the model but a variable cost, priced per generation, and therefore available for tiering. It concedes that the paying customer is not purchasing a better picture so much as a longer compute budget against which the same apparatus labours for a greater number of iterations. And it concedes, implicitly, that the difference between a hand with five fingers and a hand with seven is, to the operator, a line item.
Inference steps, for the reader unfamiliar with the term, are the successive passes a diffusion-style model makes over an initialising field of noise, each pass resolving the image toward its final state. More steps, more resolution; more steps, more seconds of a rented graphics processor; more seconds, more dollars. The tester is correct that he does not know whether the model in question is stepped in precisely this fashion. The pricing logic is indifferent to the architecture. Whatever the internal mechanism, the operator retains a dial, and the dial has a meter attached to it, and the meter reports to the finance department.
The tester's grievance, it should be noted, is not that the free images were deformed. He raises no objection to the fingers as such. His objection is prospective and distributional: that the deformities, having been observed on his free account, will be retained on the free tier and relieved on the paid tier, and that the relief will be sold as a feature. He is, in this respect, an unusually clear-eyed consumer. He has located the product.
A second observation, offered almost in passing, bears closer attention. The tester reports that another user, on the competing service, published an image generated by GPT Image 2 in which a Gemini watermark, rendered imperfectly, appeared within the frame. If accurate, the detail indicates that the training corpus for the model included output from a competitor's model, which had itself been watermarked, and that the watermark survived the training process as a recurring visual artefact. The industry term for this condition is not yet standardised. Training-set cannibalism will serve. The commercial implication is that the supply of genuinely human-made reference material, on which these systems were initially trained, is being progressively displaced in the crawl by material the systems themselves, or their competitors, have already produced. Each generation is fed, in part, on the leavings of the prior generation. The watermark is the tell.
Taken together, the two observations describe a market in a particular phase of its maturation. The headline product is being segmented: a free grade, with visible defects retained as a differentiator, and a paid grade, from which the defects are, for an additional fee, subtracted. The underlying inputs, meanwhile, are degrading, as the pool of training material grows thinner in the human portion and thicker in the synthetic, and the models begin, quietly, to inherit each other's signatures. One can read the subreddit post as a consumer complaint. It also reads, without alteration, as a note from the factory floor.
What the tester has noticed, and what the industry has not yet found a comfortable way to say aloud, is that the product is being sold twice: once as an image, and once as the absence of the errors that the free image was permitted to contain. The hand is the unit of account. The finger is the margin.
*Continued on Page 7*