Bitcoin Magazine
Alpha Arena Reveals AI Trading Flaws: Western Models Lose 80% of Capital in One Week
Can AI trade crypto? Jay Azhang, a computer engineer and finance bro from New York, is putting this question to the test with Alpha Arena. The project pits the leading large language models (LLMs) against each other, each with $10,000 in capital, to see which can make the most money trading crypto. The models include Grok 4, Claude Sonnet 4.5, Gemini 2.5 Pro, ChatGPT 5, DeepSeek v3.1 and Qwen3 Max.
Now you might be thinking “wow, that’s a great idea!” and you’d be surprised to learn that, at the time of writing, three out of the five AIs are underwater, with Qwen3 and DeepSeek, the two Chinese open source models, leading the way.

That’s right, the Western world’s most powerful closed-source, proprietary AIs, backed by giants like Google and OpenAI, have lost over $8,000, or 80% of their crypto trading capital, in just over a week, while their Eastern open-source counterparts are in the green.
The most successful trade so far? Qwen3, unbothered and staying in its lane, with a simple 20x bitcoin long position. Grok 4, to no one’s surprise, has been long Dogecoin with 10x leverage for most of the competition. Having at one point topped the charts alongside DeepSeek, it is now close to 20% underwater. Maybe Elon Musk should tweet a Doge meme or something to, you know, get Grok out of the doghouse.

Meanwhile, Google’s Gemini is relentlessly bearish, shorting all the crypto assets available for trading, a position that echoes Google’s general crypto policy over the past 15 years.
Last but not least, ChatGibitty has made every bad trade possible for a week in a row, which is a remarkable achievement! It takes skill to be that bad, especially when Qwen3 just went long bitcoin and went fishing. If this is the best closed source AI has to offer, then maybe OpenAI should just keep it closed source and spare us.
A new benchmark for AI
All kidding aside, the idea of pitting AI models against each other in a crypto trading arena offers some profound insights. First, crypto markets are so unpredictable that an AI cannot be pretrained on the answers, as it can be for knowledge tests, a problem other benchmarks suffer from. To put it another way, many AI models see the answers to some of these tests in their training data, and then of course they perform well when tested. But some studies have shown that small changes to these tests lead to radically different benchmark results.
This controversy raises the question: What is the ultimate test of intelligence? Well, according to Elon Musk, Iron Man enthusiast and creator of Grok 4, predicting the future is the ultimate measure of intelligence.
And let’s face it, no future is more uncertain than the short-term price of crypto. In Azhang’s words: “Our goal with Alpha Arena is to make benchmarks more like the real world, and markets are perfect for this. They are dynamic, adversarial, open and infinitely unpredictable. They challenge AI in ways that static benchmarks cannot. Markets are the ultimate test of intelligence.”
This insight about markets is deeply embedded in the libertarian principles from which Bitcoin was born. Economists such as Murray Rothbard and Milton Friedman argued in the twentieth century that markets were fundamentally unpredictable by central planners, and that only individuals making real economic decisions with something to lose could make rational economic calculations.
In other words, the market is the hardest thing to predict, because it depends on the individual perspectives and decisions of intelligent people all over the world, and it is therefore the best test of intelligence.
Azhang mentions in his project description that the AIs are instructed to trade not just for gains, but for risk-adjusted returns. This dimension of risk is critical, as one bad trade can wipe out all previous returns, as seen for example in the decline in Grok 4’s portfolio.
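To make "risk-adjusted" concrete, here is a minimal sketch in Python. It is not Alpha Arena's actual scoring method, which the project description quoted here does not spell out. It assumes a Sharpe-style ratio (mean daily return divided by its standard deviation, with a risk-free rate of zero) and a simplified leverage model that ignores fees, funding costs and maintenance margin. The function names and the sample return series are hypothetical.

```python
import statistics

def risk_adjusted_return(daily_returns):
    """Sharpe-style ratio: mean daily return / standard deviation.

    Assumption: risk-free rate is zero and returns are simple daily returns.
    """
    if len(daily_returns) < 2:
        raise ValueError("need at least two daily returns")
    volatility = statistics.stdev(daily_returns)
    if volatility == 0:
        raise ValueError("volatility is zero, ratio is undefined")
    return statistics.mean(daily_returns) / volatility

def leveraged_equity(equity, leverage, price_change):
    """Equity after one price move on a fully invested leveraged position.

    Simplification: no fees or margin rules; equity is floored at zero,
    which stands in for liquidation.
    """
    return max(0.0, equity * (1 + leverage * price_change))

# Two hypothetical traders with the same average daily return (1%).
steady = [0.010, 0.012, 0.008, 0.011, 0.009]
wild = [0.15, -0.12, 0.20, -0.18, 0.00]

print("steady ratio:", round(risk_adjusted_return(steady), 2))
print("wild ratio:  ", round(risk_adjusted_return(wild), 2))

# A 20x long loses everything on a 5% move against it.
print("20x long after -5%:", leveraged_equity(10_000, 20, -0.05))
print("20x long after +5%:", leveraged_equity(10_000, 20, 0.05))
```

Both sample traders earn the same on average, but the steadier one scores far higher on a risk-adjusted basis. The last two lines show why leverage matters: under these simplified assumptions, a 20x position doubles on a 5% move in its favor and is wiped out by a 5% move against it.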
Another question remains: whether these models learn from their experience trading crypto. That is not technically easy to achieve, given that AI models are very expensive to pre-train in the first place. They could be fine-tuned on their own trading history or the history of others, and they can even keep recent trades in their short-term memory, or context window, but that can only take them so far. At the end of the day, a real AI trading model needs to learn from its own experiences, a technology that was recently announced in academic circles but has a long way to go before it becomes a product. MIT calls them self-adaptive AI models.
How do we know it’s not just luck?
Another reading of the project and its results so far is that they may be indistinguishable from a ‘random walk’. A random walk is equivalent to rolling dice for each decision. How would it look on a chart? Well, there is actually a simulator you can use to answer that question; it wouldn’t look too different.
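For readers who want to roll the dice themselves, the idea can be sketched in a few lines of Python. This is an illustration only, not the simulator mentioned above and not a model of how the Alpha Arena bots trade. The assumptions are hypothetical: six accounts starting at $10,000, one coin flip per day for seven days, and each flip moving the account up or down by 10%.

```python
import random

def random_walk_equity(start, steps, step_size, rng):
    """Return an equity path where every step is a coin flip up or down."""
    equity = start
    path = [equity]
    for _ in range(steps):
        direction = rng.choice((-1, 1))  # the "dice roll" for this decision
        equity = equity * (1 + direction * step_size)
        path.append(equity)
    return path

def simulate_arena(n_traders=6, start=10_000.0, steps=7, step_size=0.10, seed=0):
    """Simulate several independent coin-flipping traders.

    All defaults are assumptions chosen to loosely resemble the contest,
    not figures taken from Alpha Arena.
    """
    rng = random.Random(seed)
    return [random_walk_equity(start, steps, step_size, rng)
            for _ in range(n_traders)]

paths = simulate_arena()
ranked = sorted(paths, key=lambda path: path[-1], reverse=True)
for rank, path in enumerate(ranked, start=1):
    print(f"trader {rank}: final equity ${path[-1]:,.0f}")
```

Change the seed and the leaderboard reshuffles, even though no simulated trader has any skill. That is the heart of the random walk objection: a ranking with clear winners and losers is exactly what chance alone produces.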

This issue of luck in markets has also been described quite carefully by intellectuals such as Nassim Taleb in his book Fooled by Randomness. In it he argues that, from a statistical perspective, it is perfectly normal and possible for one trader, say Qwen3 in this case, to be lucky for a whole week in a row, which leads to the appearance of superior reasoning. Taleb goes much further than that, claiming that there are enough traders on Wall Street that one of them could easily get lucky for 20 years in a row and develop a god-like reputation, with everyone around them assuming that this trader is just a genius, until, of course, the luck runs out.
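The arithmetic behind this argument is simple enough to sketch in Python. This is an illustration under strong assumptions rather than Taleb's own calculation: every trader is treated as an independent coin flip with a 50% chance of a winning period, and the function name is hypothetical.

```python
def p_at_least_one_streak(n_traders, n_periods, p_win=0.5):
    """Probability that at least one trader wins every single period by luck.

    Assumption: traders and periods are independent, each period is won
    with probability p_win.
    """
    p_streak = p_win ** n_periods          # one given trader wins every period
    return 1 - (1 - p_streak) ** n_traders  # at least one trader does so

# Six traders, seven daily coin flips.
print(f"{p_at_least_one_streak(6, 7):.3f}")

# A 20-period streak with ever larger crowds of traders.
for crowd in (1_000, 100_000, 1_000_000):
    print(crowd, f"{p_at_least_one_streak(crowd, 20):.3f}")
```

Under these coin-flip assumptions, a perfect 20-period record is vanishingly rare for any single trader, yet becomes more likely than not once the crowd approaches a million. The exact numbers depend heavily on the assumed win rate, so treat them as a feel for the argument rather than an estimate of anything on Wall Street.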
For Alpha Arena to produce valuable data, it will need to run for a long time, and its patterns and results will need to be independently replicated, with real capital at stake, before they can be identified as anything other than a random walk.
Ultimately, it’s great to see cost-effective open source models like DeepSeek outperform their closed source counterparts so far. Alpha Arena has also been a great source of entertainment, having gone viral on X.com over the past week. Where it goes is anyone’s guess; we’ll have to see if the gamble its creator took, giving $50,000 to five chatbots to play crypto with, pays off in the end.
This post Alpha Arena Reveals AI Trading Flaws: Western Models Lose 80% Capital in One Week appeared first on Bitcoin Magazine and was written by Juan Galt.
