Deep Dives #6 - An essay on the history of AI
The coming of the robot overlord has been in the works since 1955.
Good Evening.
Yes, this one has been overdue. I have been coasting along, thinking of things to write but failing to find anything except AI. And AI has been done to death by assorted pundits of all shapes, sizes and stripes on the interwebs.
It makes sense - no development in recent history feels as impactful on our lives as the emergence of AI. The tech is accelerating at an almost incomprehensible pace. Punters are hoping that artificial general intelligence (AGI) will emerge by 2031. I guess one should take these predictions with a grain of salt - Skynet was supposed to have lobbed nukes on August 29, 1997. We are all still around. Maybe glass half full?
However, the need for cheap social media likes is leading to a lot of stupid stuff being written. Rah-rah-ing about how AI will make jobs obsolete seems a bit premature - it is a tool. It can help you think and work more efficiently. It cannot think for you.
One of the things that really blew my mind over the last few weeks was the launch of plugins for ChatGPT. It can now use Wolfram Alpha to answer questions it would otherwise get wrong. AI has a calculator!!! It is interesting to see it pulling live data and making sense of it.
This has also allowed me to pull off a bit of a hack. One of the personal projects I have been working on is an automatic summarizer for meeting notes. Here’s a high-level summary of what I have been doing (a rough code sketch follows the list):
Wrote an admittedly kludgy wrapper for sending call recordings to a speech-to-text (STT) transcription service.
Fed the transcript to ChatGPT.
Asked it to summarize the meeting for me.
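For the curious, here is what the hack looks like as a rough sketch - a minimal, illustrative version rather than my actual wrapper. It assumes the (pre-1.0) openai Python client, and the file name, model choices and prompt are stand-ins for whatever you prefer.

```python
# Minimal sketch of the recording -> transcript -> summary pipeline described above.
# Illustrative only: assumes the pre-1.0 `openai` client; swap in your own STT/LLM services.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def transcribe(path: str) -> str:
    """Send a call recording to a speech-to-text service and return the raw transcript."""
    with open(path, "rb") as audio_file:
        result = openai.Audio.transcribe("whisper-1", audio_file)
    return result["text"]

def summarize(transcript: str) -> str:
    """Ask the LLM to turn the raw transcript into meeting notes."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Summarize this meeting transcript into bullet-point notes with action items."},
            {"role": "user", "content": transcript},
        ],
    )
    return response["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(summarize(transcribe("call_recording.mp3")))  # hypothetical file name
```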
The results are pretty good and, as STT and the underlying LLMs improve, will only get better. I have twin niggles with it though:
STT struggles with accents.
The LLM struggles royally with context - e.g. it thinks I am literally going to drag the other person across the floor when I talk about drag-along rights.
But hey, it is ~90% there. Then again, ChatGPT can’t really run the meetings or ask questions for me. No one can. To paraphrase a Hindi idiom (badly) - one has to die oneself in order to enter heaven.
Shameless plug about questionable alpha products done, I thought we could explore the history of AI. Would you believe me if I told you that one of the very first attempts was made as far back as 1955? Think of this essay as an anthology of AI’s greatest hits.
Let’s go.
#1 - 1955 - Logic Theorist
Allen Newell, Cliff Shaw and Herbert Simon worked at RAND Corp, one of the most influential think tanks in the world. In late 1955, they devised a program capable of proving complex theorems. It used rules, probabilities and deductive logic to navigate a search tree. Their creation, Logic Theorist (LT), is widely regarded as the first AI program. That it was a technical tour de force was mostly lost on people at the time. At the 1956 Dartmouth conference, co-organized by Claude Shannon (the father of information theory), most attendees ignored Newell and Simon’s creation.
Perhaps to their detriment - LT successfully proved 38 of the first 52 theorems in Bertrand Russell and Alfred North Whitehead’s Principia Mathematica. In fact, LT found a more elegant proof of one theorem than the one Russell and Whitehead had published.
LT’s legacy lies in three key concepts it introduced:
Reasoning via exploration of a search tree;
Using rules to prune exploration of unprofitable branches (what we now call heuristics - this is where the term entered computing); and
Efficient list processing. If you have ever used LISP and been amazed by its compactness and sheer power, know that it drew heavily on a language called IPL, developed by Newell, Shaw and Simon.
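To make the first two ideas concrete, here is a toy sketch of heuristic search over a tree - not LT’s actual algorithm (which manipulated logical expressions in IPL), just an illustration of exploring branches while pruning the unpromising ones. The little number puzzle and the scoring function are made up for the example.

```python
# Toy illustration of ideas 1 and 2: explore a search tree, rank moves with a
# heuristic, and prune branches that look unprofitable. Not Logic Theorist itself.
import heapq

def heuristic_search(start, is_goal, expand, score, prune_below=float("-inf")):
    """Best-first search: `expand` yields child states, `score` is the heuristic,
    and children scoring below `prune_below` are never explored."""
    frontier = [(-score(start), start, [start])]
    seen = {start}
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for child in expand(state):
            s = score(child)
            if child in seen or s < prune_below:
                continue  # heuristic pruning: don't bother with this branch
            seen.add(child)
            heapq.heappush(frontier, (-s, child, path + [child]))
    return None

# Example: reach 20 from 1 using "double" and "add 3" moves, preferring states
# closer to the target and pruning anything that has overshot badly.
target = 20
path = heuristic_search(
    start=1,
    is_goal=lambda n: n == target,
    expand=lambda n: (n * 2, n + 3),
    score=lambda n: -abs(target - n),
    prune_below=-target,
)
print(path)  # a path such as [1, 4, 8, 11, 14, 17, 20]
```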
If AI research has OGs, these guys are right up there.
#2 - 1968 - SHRDLU and Blocks World
Terry Winograd is a legend. A CS professor at Stanford, he also happens to be an AI pioneer. Famously, Winograd told Larry Page to run with the idea that would eventually become Google. Long before he was nudging promising young minds in the direction of trillion-dollar ideas, Winograd was working on a pathbreaking program called SHRDLU. The name is a wink to the order of keys on Linotype machines (we have lorem ipsum today; back then they had etaoin shrdlu).
SHRDLU created a simulation called Blocks World. This was an environment populated with colored boxes, blocks, pyramids and so on. By way of natural-language queries, users could have the program manipulate this closed environment and move objects to comply with specific instructions. As an example, one could ask the program to “pick the blue block taller than the one you are holding”. A Stanford page on SHRDLU can be found here. It makes for some fascinating reading. The level of awareness the program had of its own internal state and interactions was deep.
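To get a feel for what a queryable closed world means, here is a toy, hypothetical sketch: a few blocks in a list and a hand-coded resolver for one description. Winograd’s real system parsed free-form English and planned the arm’s moves; this only shows how a world model lets a reference like “the blue block taller than the one you are holding” be resolved.

```python
# Toy Blocks World flavour - an illustration only, not Winograd's implementation.
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    colour: str
    height: int

world = [Block("b1", "blue", 2), Block("b2", "blue", 5), Block("b3", "red", 4)]
holding = world[0]  # pretend the robot arm is currently holding b1

def blue_block_taller_than_held():
    """Resolve the reference by filtering the world model against the constraints."""
    matches = [b for b in world
               if b.colour == "blue" and b is not holding and b.height > holding.height]
    return matches[0] if matches else None

print(blue_block_taller_than_held())  # Block(name='b2', colour='blue', height=5)
```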
SHRDLU’s key innovations included understanding natural language and a much deeper exploration of the symbolic approach to AI - the “intelligence” on display came via formal logic and not emergent behaviour.
If you have used GPT, you have seen the almost ubiquitous conversational interface. We can thank Prof. Winograd for it.
#3 - 1984 - Cyc
Yes, I know this is a bit on the nose with 1984 references.
Cyc is arguably the most notable failure in AI. However, like all “good” failures, Cyc explored and eliminated many unprofitable approaches. It is a project of audacious ambition - it set out to encode an entire model of reality, ab initio. In many ways Cyc was the predecessor to IBM’s Watson.
The origins of Cyc lie in a 1972 project called MYCIN. MYCIN was an expert system that used AI to identify bacteria causing infections and to recommend antibiotics. It was subsequently also used to identify clotting disorders. Written in LISP, MYCIN was basically a simple inference engine working off ~600 rules. However, Douglas Lenat, a professor at Stanford, found such expert systems to be little more than glorified flowcharts with a veneer of intellect.
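To see what Lenat was reacting to, here is a stripped-down toy rule engine. The rules and facts are invented, it forward-chains for brevity (MYCIN itself reasoned backwards from hypotheses and attached certainty factors to its conclusions), and it is nowhere near the real ~600-rule system.

```python
# Toy flavour of expert-system reasoning: made-up if-then rules chained until
# nothing new can be derived. Not MYCIN, just an illustration of the style.
rules = [
    ({"gram_negative", "rod_shaped", "anaerobic"}, "likely_bacteroides"),
    ({"likely_bacteroides"}, "recommend_antibiotic_X"),   # hypothetical recommendation
]

def forward_chain(facts, rules):
    """Keep firing rules whose conditions are all satisfied until a fixed point."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain({"gram_negative", "rod_shaped", "anaerobic"}, rules))
# -> the input facts plus 'likely_bacteroides' and 'recommend_antibiotic_X'
```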
So, in 1984, Lenat set out to solve what he thought was the single biggest limitation of all AI projects up to that point - their lack of common sense. Lenat believed that true AI would be able to build context. The approach was simple - give the program knowledge. Cyc created a comprehensive ontology and knowledge base spanning the basic concepts and rules of how the world works. Cyc focuses on the implicit knowledge that humans take for granted and other AI platforms simply lack. Basically - rules and assertions that reflect consensus reality.
Cyc has an ontology of over a million terms and ~25 million rules. Over 1,000 man-years of effort have gone into building it. And yet, Cyc is unable to evolve on its own. The separation of the epistemological problem (what content/rules should be in the Cyc KB) from the heuristic problem (how Cyc could efficiently navigate 25 million rules) means that its knowledge outside of math is only true by default. That puts a cap on its ability to evolve or update its knowledge.
The biggest contribution of Cyc has been its demonstration of the limits of expert systems and knowledge-based AI. Expert systems do very well in areas they are optimized for and where knowledge is easily codified (e.g. medicine and law). They can diagnose rare blood disorders but are hard pressed to learn concepts that 3-year-old children know, e.g. object permanence.
Which is not to say that Cyc is useless. It has been applied to more than 100 real-world applications, including the MIPT Terrorism Knowledge Base. It remains a brilliant tool for solving specific problems; it just isn’t AGI.
#4 - 1990 - Elephants Don’t Play Chess
In the 1980s, another approach to AI was emerging. Building on the idea of experiential learning, this approach held that humans do not learn and navigate life via painstakingly hand-coded rules - they do so by interacting with and learning from their environments. Ergo, if AI programs were to be useful, they needed to be situated in the real world.
Rodney Brooks is a professor of robotics at MIT. In his seminal 1990 paper titled “Elephants Don’t Play Chess”, Brooks laid down the ideas of this “behavioural AI” movement. It is a great read, if a little dry. Brooks’ architecture splits an environment-roving robot’s desired actions into smaller, discrete behaviours like “avoid obstacles” and “wander around” and assigns priorities to them. He calls it a “subsumption architecture”, and I would argue that the para detailing it is among the most important bits of text ever written.
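Here is a rough sketch of the subsumption idea, with made-up sensor readings and actions: each behaviour is a small, independent module, and the highest-priority behaviour that wants to act suppresses the ones below it.

```python
# Rough sketch of a subsumption-style controller. Sensor values, thresholds and
# actions are invented for illustration.
import random

def avoid_obstacles(sensors):
    if sensors["distance_ahead"] < 0.3:   # metres; something is in the way
        return "turn_left"
    return None                           # nothing to do; let a lower layer act

def wander(sensors):
    return random.choice(["forward", "turn_left", "turn_right"])

behaviours = [avoid_obstacles, wander]    # ordered from highest to lowest priority

def control_step(sensors):
    for behaviour in behaviours:
        action = behaviour(sensors)
        if action is not None:
            return action                 # this layer subsumes everything below it
    return "stop"

print(control_step({"distance_ahead": 0.2}))  # obstacle close -> 'turn_left'
print(control_step({"distance_ahead": 2.0}))  # path clear -> wander randomly
```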
Brooks’ version of behavioural AI turned out to have practical applications. Brooks happens to be a founder of iRobot, which makes Roombas. And while a Roomba is by no means a thinking machine, behavioural, embodied intelligence can have practical benefits.
Like a clean floor.
#5 - 2012 - AlexNet
Geoffrey Hinton is a man after my own heart. He charges forward even when everyone around him is telling him that his approaches are batshit insane and have zero chance of success.
Unlike the symbolic or behavioural camps of AI research, Hinton was part of a small group of researchers who believed that neural nets were the way to go. Simply put, the argument is this: replicating the brain’s structure and functionality is a more profitable route to get to the grail of AGI.
Now, neural nets are nothing new. As far back as 1943, researchers were modelling neurons as simple logical circuits. The first machine built on this approach - the Perceptron - arrived in 1958. Although the Perceptron initially seemed promising, it was later shown that single-layer perceptrons could not be trained to recognise many classes of patterns. This caused the field of neural network research to stagnate for many years.
Neural nets1 are supposed to work the way our brains do - various inputs lead to neurons firing, which produce certain outputs. Now, neural nets need layers. A single-layer neural net can only capture linear relationships - you need two, three, four layers to build complexity into the system. Each layer feeds forward into the next, and some architectures have feedback loops as well.
Anyway, while neural nets became a four-letter word after the failure of the Perceptron, Hinton kept pushing the idea, even though pretty much everyone else thought it a waste of time. His paper on backpropagation2, published in Nature in 1986, is one of the foundations of modern data science and AI.
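For a feel of what backpropagation actually does, here is a toy two-layer network trained on XOR - exactly the kind of non-linearly-separable problem a single-layer perceptron cannot learn. This is an illustrative NumPy sketch, not the 1986 formulation.

```python
# A tiny two-layer network trained with backpropagation on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10_000):
    # Forward pass: each layer feeds into the next.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error back through the layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```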
The key problem with neural nets was the lack of computing power. Sometime around 2005, this changed. Faster chips with vector mathematics became available, walls of PS3s were strung together to build clusters, and the concept of deep learning came about. Children of neural nets, deep learning networks were exponentially more powerful - they had more layers, more units and more connections.
Then came the 2012 ImageNet challenge, and deep learning had its moment in the sun. ImageNet is basically a large, freely available ontology of labelled images - a ready-made training set.
In 2012, two students of Hinton’s, Alex Krizhevsky and Ilya Sutskever, created an algorithm to compete in the ImageNet challenge. This algo - AlexNet - represented a turning point for deep learning and a validation of Hinton’s ideas.
AlexNet won the challenge in 2012 and made neural nets the rage again. AlexNet was a convolutional neural network (CNN), itself a development of the multi-layer perceptron. AlexNet conclusively proved a few ideas3:
Depth is critical
Computational power is a hard limit
Workloads can be split across compute units.
Today these ideas sound pedestrian. In 2012, the whole concept of two GPUs working together to train a large model was unorthodox.
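For scale, here is what “stack convolutional layers and feed them images” looks like in a few lines of modern PyTorch - a toy network, nowhere near AlexNet’s five convolutional and three fully connected layers (~60 million parameters, split across two GPUs).

```python
# A minimal convolutional net - an illustration of the "stack more layers" idea,
# not a reimplementation of AlexNet.
import torch
import torch.nn as nn

tiny_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # layer 1: learn simple edges and colours
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # layer 2: combine them into textures and parts
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                    # classify into 10 made-up categories
)

x = torch.randn(1, 3, 32, 32)   # one fake 32x32 RGB image
print(tiny_cnn(x).shape)        # torch.Size([1, 10])
```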
The results were pathbreaking. AlexNet owned the competition, achieving roughly 85% top-5 accuracy and outperforming every other entry by more than 10 percentage points. In the following years, programs inspired by AlexNet would blow past the human threshold.
More than its capabilities, AlexNet represented a turning point for AI and how one thought about building AI systems. The future would be made of deep networks powered by GPUs and trained on vast quantities of data.
#6 - 2013 - Playing Atari with Deep Reinforcement Learning
In 2013, DeepMind did something crazy. They hooked up an AI model to Atari games and let it run. To say that the results were a revelation would be a massive understatement.
AI has a pretty strong relationship with games. Rules-based games offer a great sandbox in which to benchmark human and machine behaviours. In 1950, Claude Shannon (yes, same guy as above, inventor of information theory) wrote a study called Programming a Computer for Playing Chess. It is a brilliant example of presenting a complex idea, but I digress. In 1997, building on the work of Shannon, IBM’s Deep Blue defeated Garry Kasparov - the first time a machine beat a reigning world chess champion in a match. Twenty years later, Go would be conquered by AlphaGo.
Anyway, as I said, in 2013, DeepMind did something crazy. They hooked up an AI model to Atari games. No one told the model anything about the games. No rules, no knowledge, no champion behaviours. Nothing at all. The program played and observed what actions increased the score.
This is among the first high-profile successes of deep reinforcement learning. Much like a brain, when the program does something good (the score goes up), it gets a reward, and when it does something bad (the score goes down), it gets a penalty. Through this iterative feedback process, the program learns to reach its goal. Because it doesn’t receive explicit instructions, it often wins by employing strategies a human might never have conceived of - and may not fully understand.
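Here is the reward-feedback loop in miniature - a toy tabular Q-learning agent on a made-up “walk to the end of a corridor” game. This is not DeepMind’s deep Q-network (which learned directly from pixels), just the core learn-from-score idea.

```python
# Toy tabular Q-learning: the agent only sees states, actions and rewards, and
# learns which action in which state tends to increase its score.
import random

n_states, actions = 6, [-1, +1]           # a corridor of 6 cells; move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1     # learning rate, discount, exploration rate

for _ in range(500):
    s = 0
    while s != n_states - 1:              # the rightmost cell is the goal
        if random.random() < epsilon:
            a = random.choice(actions)                    # explore
        else:
            a = max(actions, key=lambda x: Q[(s, x)])     # exploit what it has learned
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else -0.01      # reward for the goal, tiny penalty otherwise
        best_next = max(Q[(s_next, x)] for x in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

policy = [max(actions, key=lambda x: Q[(s, x)]) for s in range(n_states - 1)]
print(policy)  # should converge to [1, 1, 1, 1, 1]: always move right, toward the goal
```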
The engine learned to outperform humans at 29 of the 49 Atari titles initially outlined. In some instances, the program achieved “superhuman levels and demonstrated intelligent, novel techniques”.
Of course, DeepMind would demolish the human bastions of a few more games. In 2016, it released AlphaGo, which was followed by even more powerful programs like AlphaGo Zero and AlphaZero. AlphaZero learned to play chess in 9 hours, playing against itself. Think about it. 9 fucking hours. That’s it.
DeepMind would go on to apply these learnings to AlphaFold, arguably one of the most important AI projects on the planet.
To quote Mr. Bowie, sometimes the best results come if one just “lets the children boogie”.
#7 - 2017 - Attention Is All You Need
In 2017, a paper written by a bunch of folks from Google and the University of Toronto hit my inbox. The title seemed a lot less like a computer science paper and a lot more like a newly discovered rung at the top of Maslow’s hierarchy.
Then I started reading. And I was hooked. "Attention is All You Need" marked a significant turning point in AI language capabilities. While early models such as SHRDLU demonstrated some capability, the ability to understand and generate language developed more slowly than other AI skills. Deep Blue could defeat Kasparov, but it lacked the ability to craft a simple paragraph describing its own feat. Even as recently as 2015, AI language capabilities lagged behind the stunning abilities of deep learning models in areas such as image recognition and game-playing.
The paper introduced a novel architecture called the "transformer," which relied on a process called "attention." At the highest level, the transformer simultaneously pays attention to all of its inputs and uses them to predict the optimal output. By paying attention in this way, transformers can understand context and meaning much more effectively than previous models.
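At its core, that attention operation is a handful of matrix multiplications. Here is a rough NumPy sketch of scaled dot-product attention, the building block the paper stacks into the full transformer; the vectors below are random stand-ins for token embeddings.

```python
# Rough sketch of scaled dot-product attention: every position looks at every
# other position at once and takes a relevance-weighted average of the values.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])                    # how relevant is each token to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax: rows sum to 1
    return weights @ V                                         # context-aware mix of the values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # four "tokens", each an 8-dimensional vector
print(attention(x, x, x).shape)      # (4, 8): every token is now a blend of all tokens
```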
If you gave a traditional model a prompt such as "describe a sequoia tree," it would likely struggle. It would attempt to process each word individually and fail to recognize their connections to one another. For example, the word "sequoia" has multiple meanings, which include a type of tree and the VC firm that thought SBF was a genius. A model that fails to recognize the proximity of "tree" to "sequoia" could easily provide a description that rambles on about large trees before writing another paean to SBF4.
Transformers have vastly improved text prediction by recognizing and encoding context. This improvement has laid the groundwork for conversational AI that is vastly superior, such as GPT-4 and Claude5. Transformers may actually be emulating the brain more closely than originally thought, which once again supports Hinton's ideas.
Recent research indicates that the hippocampus, a critical part of memory function, is a "transformer in disguise." This finding represents yet another significant step forward in the pursuit of building an AGI that will hopefully meet and ultimately exceed human abilities.
Wrapping Up
All this research is headed to a point where AIs are becoming black boxes. We no longer understand how AI works. And that creates understandable angst among people like me who are more deterministic in their outlook.
In his book “A Thousand Brains”, Jeff Hawkins writes:
Intelligence is the ability of a system to learn a model of the world. However, the resulting model by itself is valueless, emotionless, and has no goals. Goals and values are provided by whatever system is using the model. It’s similar to how the explorers of the sixteenth through the twentieth centuries worked to create an accurate map of Earth. A ruthless military general might use the map to plan the best way to surround and murder an opposing army. A trader could use the exact same map to peacefully exchange goods. The map itself does not dictate these uses, nor does it impart any value to how it is used. It is just a map, neither murderous nor peaceful. Of course, maps vary in detail and in what they cover. Therefore, some maps might be better for war and others better for trade. But the desire to wage war or trade comes from the person using the map.
Similarly, the neocortex learns a model of the world, which by itself has no goals or values. The emotions that direct our behaviors are determined by the old brain. If one human’s old brain is aggressive, then it will use the model in the neocortex to better execute aggressive behavior. If another person’s old brain is benevolent, then it will use the model in the neocortex to better achieve its benevolent goals. As with maps, one person’s model of the world might be better suited for a particular set of aims, but the neocortex does not create the goals.
The old brain Hawkins alludes to is our animalistic brain, the region that governs our emotions and instincts for survival and reproduction, and controls the subsystems of our body. In contrast, it is the neocortex that enables learning, thinking, and prediction. Without the old brain, our intelligence would have neither the intention nor the drive to act. By the same logic, a machine intelligence on its own would be equally harmless.
It seems then, that the actual hazard of machine intelligence is the intentions of humans who control it.
Additional Reads:
Physical Symbol Systems as defined by Newell, Simon and Shaw.
The Age of AI by Eric Schmidt, Henry Kissinger, and Daniel Huttenlocher.
A Thousand Brains by Jeff Hawkins
A Brief History of Artificial Intelligence by Michael Wooldridge.
Machines Who Think by Pamela McCorduck.
Housekeeping:
As always, I look forward to hearing from you. If you liked this post, please feel free to share it or subscribe to this newsletter using the links below. While I have been tardy of late, I try to write a 1,000-2,000 word essay once every 4 weeks or so.
Actual paper is paywalled but here is a great article on it.
Can’t really resist.
Note the reference to Claude Shannon.