As society grapples with escalating concerns about the relentless pursuit of artificial intelligence by wealthy corporations, it is worth evaluating what all that fervent activity actually produces.
A recent test by Ars Technica set four prominent AI coding agents a seemingly straightforward task: build a web-based version of the classic game Minesweeper.
Each clone had to include sound effects, support for mobile touchscreens, and an engaging gameplay twist.
For the uninitiated, Minesweeper is a game of logic whose mechanics are tightly interwoven with its user interface and experience, making it a deceptively formidable challenge.
While crafting a basic Minesweeper clone isn’t particularly daunting, the underlying logic demands a degree of ingenuity usually supplied by a human; after all, the ultimate objective of these companies is Artificial General Intelligence (AGI).
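To give a sense of what that logic entails, here is a minimal, illustrative sketch in TypeScript of the two routines every clone needs: counting the mines adjacent to each tile and flood-revealing empty regions. It is an assumption about a typical implementation, not code produced by any of the tested agents.

```typescript
// Illustrative Minesweeper core logic (an assumption about a typical
// implementation, not code produced by any of the tested agents).
type Cell = { mine: boolean; revealed: boolean; flagged: boolean; adjacent: number };

// All in-bounds neighbours of a tile, excluding the tile itself.
const neighbours = (grid: Cell[][], r: number, c: number): [number, number][] =>
  [-1, 0, 1]
    .flatMap(dr => [-1, 0, 1].map(dc => [r + dr, c + dc] as [number, number]))
    .filter(([nr, nc]) =>
      !(nr === r && nc === c) &&
      nr >= 0 && nr < grid.length && nc >= 0 && nc < grid[0].length);

// Precompute how many mines border each tile.
function computeAdjacency(grid: Cell[][]): void {
  for (let r = 0; r < grid.length; r++)
    for (let c = 0; c < grid[r].length; c++)
      grid[r][c].adjacent = neighbours(grid, r, c)
        .filter(([nr, nc]) => grid[nr][nc].mine).length;
}

// Reveal a tile; if it borders no mines, keep revealing outward (flood fill).
function reveal(grid: Cell[][], r: number, c: number): void {
  const cell = grid[r][c];
  if (cell.revealed || cell.flagged) return;
  cell.revealed = true;
  if (!cell.mine && cell.adjacent === 0)
    neighbours(grid, r, c).forEach(([nr, nc]) => reveal(grid, nr, nc));
}
```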
OpenAI Codex – 9/10

The standout performer was Codex, which not only delivered visually appealing results but was also the only agent to incorporate a “chording” feature. This mechanic lets a player click a revealed number to uncover all of its remaining neighbours once the matching number of flags has been placed around it, a technique beloved by seasoned players; its absence markedly diminishes the polish of any Minesweeper clone.
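For the curious, chording is typically implemented along the lines of the hedged sketch below, which reuses the grid types and helpers from the earlier sketch; it illustrates the mechanic and is not Codex’s actual code.

```typescript
// Illustrative chording logic (an assumption, not Codex's actual code), reusing
// the Cell grid, neighbours() and reveal() helpers from the sketch above.
// Clicking a revealed number uncovers its unflagged neighbours, but only once
// the number of adjacent flags matches the tile's mine count.
function chord(grid: Cell[][], r: number, c: number): void {
  const cell = grid[r][c];
  if (!cell.revealed || cell.adjacent === 0) return;
  const around = neighbours(grid, r, c);
  const flags = around.filter(([nr, nc]) => grid[nr][nc].flagged).length;
  if (flags !== cell.adjacent) return; // not enough flags placed yet
  around
    .filter(([nr, nc]) => !grid[nr][nc].flagged)
    .forEach(([nr, nc]) => reveal(grid, nr, nc)); // a misplaced flag means a mine gets revealed
}
```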
Codex boasted fully functional buttons, including a sound toggle that emitted antiquated yet nostalgic bleeps and bloops, complemented by on-screen instructions tailored for both mobile and desktop formats. Furthermore, it introduced a “Lucky Sweep” button that occasionally unveiled a safe tile when players earned it, adding an intriguing twist to gameplay.
The coding experience with Codex was notably smooth, featuring an animated command-line interface and local permission management, although the agent took considerable time to generate the code. Ars Technica deemed this output the closest to a product ready for deployment with minimal human intervention, awarding it an impressive score of 9/10.
Claude Code – 7/10

Following closely was Claude Code from Anthropic, which executed the task in half the time required by Codex and produced a more aesthetically refined product. Its graphics were particularly polished, featuring custom imagery for bombs and a universally appealing smile emoji at the top. The sound effects were pleasant, and functionality traversed both mobile and desktop devices seamlessly.
However, the experience faltered due to the absence of chording support, which the testers called “unacceptable.” A “Power Mode” supplied the gameplay twist, giving players simple power-ups that showed genuine creativity on the AI’s part. On mobile, a “Flag Mode” button offered an acceptable alternative to the long-press normally needed to mark tiles.
Overall, Claude Code delivered the most satisfying gameplay experience, crafting a Minesweeper clone in under five minutes and showcasing the cleanest coding interface. It garnered a solid score of 7/10, a rating that could have been higher with the inclusion of chording.
Mistral Vibe – 4/10

In third place was Mistral’s Vibe, which produced a game true to its name: functionally adequate yet deeply flawed.
While the clone ran and looked acceptable, it lacked the essential chording feature and provided no sound effects.
A non-functional “Custom” button added to the disappointment, and it offered no gameplay twist at all.
The all-black smiley emoji at the top was a visual misstep, while the “Expert” mode extended the grid beyond its designated boundaries, creating a distracting visual glitch.
On desktop, users could easily right-click to flag mines, whereas mobile users had to awkwardly long-press, risking accidental context menu activations. The coding interface was user-friendly, albeit sluggish.
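The context-menu problem the reviewers describe is a common pitfall for touch-based Minesweeper clones. One conventional way to handle it, sketched below as an assumption rather than Mistral Vibe’s actual approach, is to suppress the browser’s contextmenu event on desktop and use a timer-based long-press on touch screens.

```typescript
// Hypothetical flag-input handling: right-click on desktop, timed long-press on
// touch screens, with the browser's context menu suppressed (an illustration,
// not Mistral Vibe's actual code).
function attachFlagHandlers(tile: HTMLElement, toggleFlag: () => void): void {
  // Desktop: right-click flags the tile instead of opening the context menu.
  tile.addEventListener('contextmenu', (e) => {
    e.preventDefault();
    toggleFlag();
  });

  // Mobile: a press held for roughly 400 ms counts as a flag.
  let pressTimer: number | undefined;
  tile.addEventListener('touchstart', () => {
    pressTimer = window.setTimeout(toggleFlag, 400);
  }, { passive: true });
  const cancel = () => window.clearTimeout(pressTimer);
  tile.addEventListener('touchend', cancel);
  tile.addEventListener('touchmove', cancel);
}
```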
Despite these shortcomings, Ars Technica’s reviewers appreciated its performance relative to the resources available to Mistral, ultimately awarding a score of 4/10, perhaps lower than their commentary would suggest.
Google Gemini – 0/10

At the bottom of the rankings was Google’s Gemini CLI, a surprising outcome given the company’s historical prominence in technology benchmarks. The Minesweeper clone produced by Gemini simply failed to function—it had buttons, but lacked tiles, rendering it entirely unplayable.
Visually, Gemini’s output bore an uncanny resemblance to Claude Code’s final product, as though its coding process had been abruptly halted partway through.
Moreover, it took the longest time to run each code iteration—approximately one hour—while consistently requesting external dependencies. Even after granting it an additional chance with explicit instructions to utilize HTML5, Gemini could not yield a usable outcome.
Ars Technica reported that Gemini CLI lacked access to the latest Gemini 3 coding models and was confined to a cluster of Gemini 2.5 systems.
This raises the question of whether a higher-tier Google subscription would have produced better results, leaving the assessment somewhat incomplete. Nevertheless, the disappointment remains palpable.
This exploration puts into perspective the ethical dilemmas of an industry that is quadrupling memory costs while making computers less efficient. Codex emerged victorious, with Claude Code and Mistral Vibe behind it, while Google faltered dramatically.
The results of this experiment raise real questions about the future trajectory of AI in our digital landscape.
Source link: Tomshardware.com.