Tesla Bot | The Quasi-Infinite Economy
"The potential really boggles the mind. This means a future of abundance. A future where there is no poverty."
For as long as Star Wars nerds like me have dreamed of a future with our own personal C-3POs, it's never been much more than a pipe dream. Even if we could build a replica of C-3PO and make it walk or say something in his iconic worried voice, we were always limited by the ability of our C-3POs to see the world and interact with it. That is... until now.
AI has mostly lived behind our computer screens, surfacing only through some sort of chatbot. It hasn't been able to engage with much beyond the data we give it. It hasn't been able to "see" the world and interact with it.
The advancements in AI over the last year have acted as a forcing function to expedite the manufacturing of training chips, the development of powerful models, the investment of billions of dollars, and the conversations around safety, ethics and opportunity. The AI flywheel has begun to spin faster and faster, pushing the boundaries of what we thought was possible with new breakthroughs every week.
As a result, we’ve begun to see a wave of robotics companies like Tesla, Figure, Boston Dynamics, 1X, Agility, Clone and Sanctuary building robots powered by artificial intelligence. In much the same way that space exploration was relegated to government programs until companies like SpaceX began to make headway, robotics was largely the work of researchers in academia until now.
Though many of the companies I mentioned are focused on special purpose robots like Boston Dynamics' Spot and Stretch, some, like Tesla and Figure, are building general purpose robots with the ultimate vision of every person having their own humanoid robot in the same way we might have a car or a phone today.
And while that vision is fun to think about if you’ve ever wanted to tinker around with your own C-3PO like Anakin Skywalker, the impact goes far beyond a household bot that does your chores. In this article, I want to attempt to answer 1) why humanoid robots in particular are so important and 2) what their broader impact on our world might be.
What used to be a dream for nerds like me may soon be reality for our children and grandchildren. Let's dive in.
The Challenge of a General Purpose Robot
Over the years companies like Boston Dynamics have managed to build interesting robots, including a humanoid called Atlas.
Atlas is a research project that's meant to push our understanding of what bipedal, humanoid robots are capable of and, to be honest, it's quite impressive. It can stabilize itself when knocked off balance, maneuver through a complex obstacle course, and even do a backflip off a box with perfect form. So if a humanoid with this level of ability already exists, you might be asking yourself, "Where's my Atlas?"
"You've all seen very impressive humanoid robot demonstrations. And that's great, but what are they missing? They're missing a brain. They don't have the intelligence to navigate the world by themselves. They're also very expensive and made in low volume."
—Elon Musk, CEO of Tesla
There are a number of reasons why robots like Atlas haven't become commercialized:
Cost — Creating a robot like Atlas with all of its materials, sensors, cameras, actuators, processors and batteries gets really expensive. So much so that it becomes cost prohibitive for personal and even most commercial applications.
Practicality — Most tasks in industry and logistics can be performed more efficiently and cheaper by simpler, specialized machines. Putting a general purpose robot on these sorts of tasks would be a waste given the immense cost.
Intelligence — General purpose robots aren't simply programmed to do one repetitive action over and over again. They're meant to do a wide variety of tasks and adapt to their circumstances. This requires incredible decision-making abilities which simply haven't been developed yet.
To achieve the dream of giving every human their own C-3PO, we'd need to dramatically reduce the cost of production and make step-change advancements in artificial intelligence so that it could adapt itself to the world and make decisions.
The Case for a General Purpose Robot
Given the immense costs and complexity of building a humanoid robot like C-3PO, it stands to reason that we might ask "why?"
According to Figure, a company that’s working toward building a commercially ready humanoid robot, it comes down to two schools of thought:
…we could have millions of different types of robots serving unique tasks…
…or one humanoid robot with a general interface, serving millions of tasks.
Aside from the grit and will that drives us to do hard things like go to the moon, the biggest argument in favor of a humanoid is that the world we live in is a world built for humans. From door handles to stairs, utensils to hammers, grocery store aisles to roads. Sure, we could build a bunch of specialized robots, but rather than adapting the world to robots, the ideal scenario is to build a robot for the human world.
“With one product we can meet the complex human environment with human-like capabilities, and provide endless types of support across a variety of circumstances.”
If a robot has the same characteristics as a human, it can assist us with the tasks that a human might do. It could put dishes away, make the bed and take out the trash, all very dexterous tasks that would require quite a bit of home re-configuration to build a specialized robot for.
And while my understanding of computer vision is securely at n00b level, I'm sure that sharing a form factor with humans would also make it easier to learn the tasks it's meant to perform. If it's programmed to recognize the human form with a head, torso, arms and legs, then it can use computer vision to identify how the arms extend and the fingers grasp, how the knees and waist bend. Being able to show it a video or perform the action in front of the robot and have it learn what needs to happen based on the movements of the thing (us) that looks like itself would be a useful way to extend its capabilities quickly.
As you might imagine, for such a robot to be useful, it needs to engage with the world as a human might. This means it needs legs to walk, fingers to grab things, and eyes to "see" the world around it. When it "sees" something with its digital eyes, it needs to be able to identify it, understand what it's used for, walk over to it with its legs, bend down with its knees, and pick it up with its fingers.
That's a very different set of inputs and outputs than a robot like a Roomba. A Roomba can map your home by rolling around and bumping into walls and furniture. But it doesn't know what those walls and furniture are. It just knows they're in the way. It can't move an object, it can only roll around it. It can't "see" if the floor is dirty or clean, it simply follows a path until it's called back to its dock.
In order to create a general purpose, humanoid robot, there are three big things that we need:
The Data: We need a large data set to train the neural net on so that it can identify objects in the real world.
The Brain: We need a powerful neural net to process the training data into a useful digital twin of the real world - or as Kevin Kelly calls it, the Mirrorworld - that the robot can use to navigate.
The Machine: We need a human-sized, human-weight, human-looking robot with all of the cameras, sensors, batteries, motors, and actuators it needs to provide it with dexterity, strength, precision and movement and the supply chain to manufacture it at commercial scale.
The Humanoid Tech Stack
With the AI boom, companies like NVIDIA have ramped up production of AI training chips, organizations like Hugging Face are encouraging the rapid evolution and distribution of open-source AI models, and investors are pouring massive amounts of money into the companies that are pushing AI forward. The advancement we've seen in AI has begun to turn the flywheel for innovations in general purpose robots.
But the clearest answer to the question "why now?" is that there are companies that finally have all the pieces of the "Humanoid Tech Stack." I believe Google, Meta, Microsoft, and Amazon are all in a position to make progress in this space. However, the company I'm most bullish on is Tesla.
Let’s take a look at the three big things we need to successfully build a humanoid robot - the data, the brain and the machine - and see how Tesla is best poised to “win” this race.
1. Tesla’s Fleet
Tesla's fleet of over 4,000,000 vehicles on the road, each equipped with 9 cameras, is building the largest real-world dataset for computer vision. The vehicles are constantly recording, processing and sending video clips back to Tesla's data center for further training.
According to Tesla's website, the training data is processed for semantic segmentation, object detection and monocular depth estimation. Sounds like a bunch of big words so let's break that down:
Semantic Segmentation: The task of classifying each pixel in an image to a specific class. For example, in an image of a street, it would label which pixels belong to the road, which belong to cars, trees, buildings, and so on. The "semantic" part means that it's not just detecting objects, but also understanding the context and assigning a category to each part of the image.
Object Detection: This is the process of identifying and locating objects in an image or video. Object detection algorithms not only classify what objects are present in the image, but also provide a bounding box that indicates where in the image these objects are. For instance, in a picture of a park, object detection might identify and locate a dog, a bench, and a tree.
Monocular Depth Estimation: This is a technique used to estimate the depth (or distance) of objects in an image from a single camera or viewpoint (hence "monocular"). In real life, humans use two eyes to perceive depth. But with monocular depth estimation, a neural network can learn to estimate the distance of objects in an image using just one "eye" (or camera).
Essentially, the processing can categorize every pixel, identify and label every object, and estimate their distance in the real world from the camera.
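To make those three terms a bit more concrete, here is a toy Python sketch of what each output might look like. This is not Tesla's pipeline and no neural network is involved: the "segmentation" is a hand-written grid, the bounding box is simply derived from that grid, and the distance uses a classic pinhole-camera rule of thumb that assumes the object's real height is already known. The function names (`bounding_box`, `pinhole_distance_m`) and all the numbers are made up for illustration.

```python
# Toy illustration only: hand-written "predictions", not output from a real model.

# Semantic segmentation: one class label per pixel (here, a tiny 4x6 "image").
ROAD, CAR, TREE = "road", "car", "tree"
segmentation = [
    [TREE, TREE, ROAD, ROAD, ROAD, ROAD],
    [TREE, ROAD, ROAD, CAR,  CAR,  ROAD],
    [ROAD, ROAD, ROAD, CAR,  CAR,  ROAD],
    [ROAD, ROAD, ROAD, ROAD, ROAD, ROAD],
]


def bounding_box(label_grid, target):
    """Object detection output: (row_min, col_min, row_max, col_max) around `target`.

    Real detectors predict boxes directly; deriving one from a label grid
    is a simplification for this sketch.
    """
    cells = [
        (r, c)
        for r, row in enumerate(label_grid)
        for c, label in enumerate(row)
        if label == target
    ]
    if not cells:
        return None  # no pixel carries this label
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols), max(rows), max(cols))


def pinhole_distance_m(focal_length_px, real_height_m, pixel_height):
    """Single-camera distance via similar triangles.

    Assumes the object's real-world height is known. A learned depth model
    would have to infer this kind of cue from training data instead.
    """
    return focal_length_px * real_height_m / pixel_height


car_box = bounding_box(segmentation, CAR)
print(car_box)  # (1, 3, 2, 4)

# Hypothetical numbers: 1000 px focal length, a 1.5 m tall car that
# appears 100 px tall in the image.
print(pinhole_distance_m(1000, 1.5, 100))  # 15.0
```

The point is the shape of the outputs: a label for every pixel, a box for every object, and a distance for each of those objects, all from a flat image.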
2. Tesla’s Supercomputer
The second piece of the puzzle that Tesla has is their unique ability to process raw data into useful data. For example, Google has an incredibly powerful image recognition engine that they use to power Google Images. But good image processing isn't the same as real-time video processing and analysis.
Tesla, on the other hand, has been doing on-board processing of the footage captured by the 36,000,000+ cameras on production vehicles to enable partial and full self-driving (FSD). That processing has to be so good that it can ingest the footage, analyze it and make a mission-critical, life or death decision in real-time.
The work that Tesla has put into refining and beefing up their processing capabilities translates directly to a general purpose robot's ability to "see" the world and navigate within it, except rather than four wheels, it's moving on two legs.
They recently announced Dojo, Tesla's supercomputer designed specifically for computer vision video processing and recognition. Because Tesla's goal was to process millions of terabytes of video data, they designed a radically different architecture from traditional supercomputers. Dojo will have more than an exaflop of computing power. Let's put that into perspective...
“To match what a one exaFLOP computer system can do in just one second, you’d have to perform one calculation every second for 31,688,765,000 years,”
—Network World
My mind has a hard time wrapping itself around that number... but I know it's a lot!
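If you want to sanity-check that quote, the arithmetic fits in a few lines of Python. This is just a back-of-the-envelope sketch: it assumes one exaFLOP means 10^18 operations per second and uses a mean year of 365.2422 days, which is my assumption and not necessarily the year length Network World used.

```python
# One exaFLOP: 10**18 floating-point operations per second.
OPS_IN_ONE_SECOND = 10**18

# Assumption: a mean year of 365.2422 days.
SECONDS_PER_YEAR = 365.2422 * 24 * 60 * 60

# At one calculation per second, the number of seconds needed equals
# the number of operations, so convert that many seconds into years.
years = OPS_IN_ONE_SECOND / SECONDS_PER_YEAR

print(f"{years / 1e9:.2f} billion years")
```

The result lands at roughly 31.7 billion years, in line with the quoted figure and more than twice the commonly cited age of the universe.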
3. Tesla’s Gigafactory
The last big piece of the "Humanoid Tech Stack" is commercialization. No matter how cool a robot is, the era of every human having their own C-3PO will never arrive if you can't manufacture the robots on the scale of millions of units. While Google, Apple, Meta, Microsoft and others have set up the necessary supply chains and overseas manufacturing infrastructure for products like computers, phones and tablets, Tesla is unique in that it has built its own factories that are designed to build large, mechanically complex robots (e.g., electric vehicles). If anyone can adapt and scale existing manufacturing capacity to include humanoid robots, it's Tesla.
At Tesla's AI Day event, Elon stated that they're specifically designing Optimus (aka Tesla Bot) to be manufactured cost-effectively at scale, putting the price tag at sub-$20K. It's estimated that 91.7% of people in America own a vehicle. And that vehicle spends most of its time sitting in a garage or in a parking lot rather than doing the one thing it was built for - driving. If you could get a robot loan for less than a car loan and that robot could do your chores, take care of your pets, maintain your house and yard, and cook your meals while your car continues to sit in the garage, it would be the biggest no-brainer of the century.
The market demand wouldn't be on the order of millions, but billions of units, likely selling multiple robots per person. And that's not including the tens of millions of robots that would be ordered in bulk by corporations for dangerous, labor-intensive jobs.
The Impact of a General Purpose Robot
"As I'm walking about on Main Street, I'll see these - what they call CoCo robots - little 6-wheeled robots that are rolling down the sidewalk delivering six packs of Diet Coke or burgers. And at first they're an oddity when you see them and people are taking photos. And then you ignore them as you're walking by."
It will feel really odd the first time we see a humanoid robot walking down the frozen section of Walmart. But over time, it will become even stranger to see a human doing their own shopping in Walmart. Not any time soon, but directionally, that’s the future we’re heading toward.
Of course, we're probably still 10 years out from humanoid robots being commercially viable and something like 20 years out from robots in the homes of the early adopters. But with strong company builders like Elon Musk (Tesla) and Brett Adcock (Figure) leading the charge, who knows just how fast we can move and how it will shape our world.
When Karl Benz invented the first practical automobile in 1886 and when Henry Ford scaled manufacturing with the moving assembly line in the early 1900s, I don't think anyone anticipated the impact that vehicles would have on the global economy. The automobile unlocked commerce, communication, collaboration and exploration on a scale the world had never seen before. Personally, I believe a general purpose humanoid robot will have a 10x greater impact than the automobile. Here's how Elon explains it...
"The potential really boggles the mind. What is an economy? An economy is capita times productivity per capita. At the point at which there's not a limit on capita, it's not clear what an economy even means at that point. The economy becomes quasi-infinite. Taken to fruition, this means a future of abundance. A future where there is no poverty."
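The identity in that quote (economy = capita × productivity per capita) is simple enough to play with in code. Below is a minimal sketch with entirely hypothetical numbers. It also assumes each robot is exactly as productive as a human worker, which is my simplification and not something the quote claims.

```python
def economic_output(workers, productivity_per_worker):
    """Total output under the quote's simple model: capita x productivity per capita."""
    return workers * productivity_per_worker


humans = 1_000        # hypothetical labor force
productivity = 50.0   # hypothetical output units per worker per year

baseline = economic_output(humans, productivity)

# Humans stay fixed; only the robot headcount grows.
for robots in (0, 1_000, 10_000):
    total = economic_output(humans + robots, productivity)
    print(f"{robots:>6} robots -> {total / baseline:.0f}x baseline output")
```

In this toy model, output scales directly with headcount, so once headcount is no longer capped by the number of humans, output isn't capped either. That's the "quasi-infinite" part.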
Brett Adcock, the founder of Figure, has expressed a similar vision for the future.
“Manual labor could become optional and higher production could bring an abundance of affordable goods and services, creating the potential for more wealth for everyone. We will have the chance to create a future with a significantly higher standard of living, where people can pursue the lives they want.”
Beyond even the economic advantages that robots unlock, there’s an enormous application in exploration as well. Robots can withstand harsher climates, don’t age, don’t need food or water, and can exist outside of the emotional support of community. They can man the mining rigs on asteroids. They can go ahead of us to Mars and begin constructing a settlement. They can be sent in rockets to the farthest reaches of the galaxy. Despite our automated factories, robot vacuum cleaners and the occasional food delivery robot on the streets of San Francisco, we haven’t yet entered into an age of robotic abundance nor experienced the wealth and prosperity it can unlock.
It's a grandiose vision to be sure. Some of the assumptions that have to be made include:
Near-limitless, sustainable energy (like Helion)
Ubiquitous, high-speed internet connectivity (like Starlink)
Widespread, automated, precision manufacturing at scale (like Hadrian)
Sustainable and affordable materials needed for batteries (like KoBold Metals)
Despite leaving out a number of very large assumptions in the robot narrative, I love the way that visionaries like Elon and Brett can connect the dots in a simple way to drive home a bigger point. If we can pull it off, a general purpose robot that's built to navigate in a world designed for humans can provide a level of "productivity" that's never been known before. Humans will be freed up to devote their time to creative problem solving, invention, exploration and leisure.
That's the future that Tesla is building.
That’s all for this one - I’ll catch ya next week.