Is Artificial Intelligence Truly Coming for Your Job?
A closer look at Apple’s controversial study and what AI really means for your future at work.
The Apple Controversy
About a month ago, we were greeted with some good news, which felt like a breath of fresh air. Everyone who had been on edge from the fear-mongering that has surrounded AI since its advent collectively sighed in relief. For a moment, it seemed like AI was not coming for our jobs just yet.
Why?
Because Apple had published a research study, “The Illusion of Thinking,” claiming that the very models that have become the highlight of corporate calls and an instrument of intimidation for employees and economies alike do not think at all.
Instead, these models learn patterns from everything they come across in their training data. Any task you think AI is able to solve fluently? It is only doing so because it has memorized the solution, not because it reasoned its way through it.
So how do we know that AI models are powered by pattern-based rote learning and not actual reasoning?
Because, as per this headline-making research study, when these models encounter a task they have never seen before, they are stumped. In such cases, they hallucinate their way to nonsense.
They do not adapt as humans do. They do not figure things out the way we do. And they certainly do not think the way we do.
You know what’s the most surprising discovery of this paper? It’s that AI isn’t gritty at all. It gives up when faced with adversity, when it cannot find the answers in the massive repository of data it has access to. That certainly felt like a win for us mortals.
When I first read the study, I came to the conclusion that grit determines success even when we are competing with AI. (Damn, did I love this thought.)
With how heavily the corporate world is investing in AI, it was confounding that we were only now discovering that AI does not actually reason. So how exactly did Apple uncover the hidden reality of AI and its so-called “thinking” capabilities?
The answer, apparently, lies in a shift away from the typical math benchmarks used to evaluate models toward logic-based puzzles such as Checker Jumping, River Crossing, and Tower of Hanoi.
The study evaluated both Large Language Models (LLMs) and Large Reasoning Models (LRMs). LLMs actually did better at the easy difficulty level, while LRMs pulled ahead at the medium level. But both failed completely once the difficulty became hard.
And it wasn’t a gradual decline. Their performance didn’t taper off. It just collapsed.
That is when the researchers concluded that these models, whether it’s OpenAI’s ChatGPT, Google’s Gemini, or Anthropic’s Claude, are not actually thinking. When a pattern cannot be found, they don’t try harder. They simply give up.
Apple’s study observed that as puzzle difficulty increased, the number of tokens (roughly, words) the models spent on reasoning actually decreased. They didn’t say, “Let me figure this out.” They said, “Meh, too hard,” and stopped trying.
If you think this is bad, it only gets worse from here. These models were quite literally fed the answers. They were provided with the correct algorithm, including the well-known recursive solution for Tower of Hanoi, and they still couldn’t execute it correctly on the harder tasks. It’s like handing someone the answer key to a test and watching them fail anyway.
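For reference, the recursive solution in question fits in a few lines. Here is a minimal sketch in Python (the peg names and the exact prompt wording are my assumptions; the recursion itself is the textbook algorithm):

```python
def hanoi(n, source, target, spare, moves):
    """Move n disks from source to target, using spare as a buffer."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear n-1 disks onto the spare peg
    moves.append((source, target))              # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)  # restack the n-1 disks on top

moves = []
hanoi(8, "A", "C", "B", moves)
print(len(moves))  # 255, i.e. 2**8 - 1 moves for an 8-disk tower
```

An algorithm this short leaves little room for misunderstanding, which is what made the models’ failure to follow it look so damning.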
Given the eyebrow-raising findings of this study, I came to the opinion that AI is not going to replace all our jobs, at least not yet. Surely, I thought, such behavior, even from a machine, would not be tolerated in the corporate world. Hence, as long as we, as employees, were performing high-level thinking tasks involving some degree of novelty, we would remain unscathed.
Doomed Study Design
My fledgling sense of job security was shattered when AI experts began questioning the study, its methodology, and its results.
For instance, as Professor Seok Joon Kwon of Sungkyunkwan University pointed out, hundreds of studies have already shown that as language models get bigger (gaining more parameters with which to learn), their performance improves. Sometimes it plateaus, but it doesn’t get worse. The abrupt collapse in performance that Apple reported is therefore questionable.
It is entirely possible that Apple simply didn’t have a big enough GPU setup to test large models properly. After all, Google’s and Microsoft’s GPU systems are massive by comparison.
Further scrutiny of Apple’s research has revealed the possibility of a doomed study design. Another AI expert, Alex Lawsen, ripped into the methodology in great detail, discovering major flaws along the way, which he described in his rebuttal titled “The Illusion of the Illusion of Thinking.”
Lawsen found three major problems with Apple’s study. First, the models were running out of space to write out full answers. Claude, for example, has a 128,000-token limit, and Tower of Hanoi requires 2^n − 1 moves for n disks, so an 8-disk puzzle takes 255 moves and a 15-disk puzzle takes 32,767. Each move eats up tokens, so the models couldn’t finish even when they knew what to do.
Second, some of the River Crossing puzzles included were mathematically impossible to solve, because the boat was too small to carry everyone across under the puzzle’s constraints. But instead of giving the models credit for noticing that, the study still marked them wrong (a simple brute-force search, like the sketch after this list, would have flagged these instances). Talk about out of syllabus.
Third, the study didn’t check whether the models stopped because they didn’t understand the task or because they simply didn’t have enough space to finish. Either way, they were penalized.
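On the second point, unsolvable instances are mechanically detectable. Below is a minimal sketch of such a check, assuming the standard actor/agent formulation of River Crossing (an actor may not be left with another actor’s agent unless their own agent is present); the function names and rule encoding are my own illustration, not code from either paper:

```python
from collections import deque
from itertools import combinations

def safe(group):
    """A bank is safe if no actor is with another actor's agent
    while their own agent is absent (assumed rule encoding)."""
    agents = {i for kind, i in group if kind == "agent"}
    return all(
        not (agents - {i}) or i in agents
        for kind, i in group if kind == "actor"
    )

def solvable(n_pairs, boat_capacity):
    """Breadth-first search over all bank configurations.
    Returns True iff some legal sequence of crossings gets everyone across."""
    people = frozenset(
        (kind, i) for kind in ("actor", "agent") for i in range(n_pairs)
    )
    start = (people, "left")  # everyone, and the boat, starts on the left bank
    seen, queue = {start}, deque([start])
    while queue:
        left, boat = queue.popleft()
        if not left and boat == "right":
            return True  # everyone made it across
        bank = left if boat == "left" else people - left
        for size in range(1, boat_capacity + 1):  # the boat needs at least one rower
            for group in combinations(bank, size):
                new_left = left - set(group) if boat == "left" else left | set(group)
                state = (new_left, "right" if boat == "left" else "left")
                if state not in seen and safe(new_left) and safe(people - new_left):
                    seen.add(state)
                    queue.append(state)
    return False  # search space exhausted: this instance has no solution

print(solvable(3, 2))  # True: the classic three-pair, two-seat puzzle is solvable
print(solvable(6, 3))  # False: six pairs with a three-seat boat has no solution
```

If a cheap exhaustive search can prove an instance unsolvable, grading a model’s “failure” on that instance tells us nothing about its reasoning.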
All these criticisms have steeped the study in controversy and shown that its findings cannot be taken at face value.
How Real Is the AI Threat?
While we need more research before we can draw definitive conclusions or make generalizations, that does not mean AI is not useful. It may not be perfect, but it is still incredibly useful, and it is already shaping this era.
It also does not mean AI is not a threat. After all, AI already writes around 30 percent of Google’s new code. Even if AI’s “reasoning” capabilities are poor (which I doubt), it is still coming for our jobs.
And as Kelsey Piper noted in her newsletter, Future Perfect, AI will take over most jobs, irrespective of how well it can reason. The abundance of AI-driven layoffs is proof enough.
So what should be done? Should we make our peace with potential future unemployment? Not just yet. I am carrying out deep research on how to shield yourself against the havoc that AI is likely to cause. You will read about it soon enough.
For now, you can check out my article, “The Illusion of Invulnerability: The Optimism Bias & Layoffs” to understand why we are failing to protect ourselves against layoffs.
Until next time,