Will LLMs permanently change robotics or will the impact be minimal?

Welcome to the Robotics Roundtable

Welcome to our newsletter. In it, we take an article that piques our interest and discuss it from our unique perspectives: Sevy, the robotics hardware engineer; Connie, the robotics software engineer; and Sean, the social entrepreneur and marketing + finance dork.

This week we delved into an article examining the current state of robotics and its integration with learning-based components: "The Convergence of Classical and Learning-Based Robotics". The article captured our interest because the surge in integrating LLMs, VLMs, foundation models, and NeRFs into robotics promises remarkable progress in areas like natural language understanding, high-level planning, and improved perception modules. Yet challenges persist, especially in the "lower-level" aspects of robotics, underscoring the need for solutions that stay robust in novel scenarios.

If you enjoy our work, please consider sharing, visiting our blog, and subscribing to our newsletter where you can read more about our thinking, get updated on our latest events, hear about breaking robotics news coming out of Los Angeles, and follow our journey as we work to unite the Los Angeles robotics ecosystem.

CONNIE’s Corner

At this point the entire world has heard of ChatGPT. LLMs took the world by storm; when ChatGPT came out, it felt in many ways like the singularity. An AI that “knew everything”? ChatGPT could write for you, talk to you, and maybe even take your job? Absolutely bananas. This is almost too broad a topic to tackle in a single newsletter, but we’ll try to analyze the impact of Large Language Models (LLMs) - and their sudden popularity - on the world of robotics.

LLMs are not robots by our definition. We require robots to have reasoning ability and to move motors or actuators. LLMs only have the reasoning part, putting them squarely in the AI category. However, behind every actuator there is a full software stack powered by “AI”, and this is where LLMs could really shake things up. Majumdar points out six categories of research challenges, but I believe human-robot interaction (HRI) and task planning offer the most near-term rewards.

HRI has long struggled with how humans express intention to robots, and how robots come to understand that intention. What’s missing for robots is context, which is difficult to program ad hoc but is exactly what LLMs excel at. Majumdar points out that GPT’s responses are so human-esque that users attribute other human qualities to it, such as sentience or the ability to do arithmetic. If we can use LLMs to manage the reasoning behind an embodied system, then we might have something very similar to C-3PO from Star Wars. It’s very exciting that sci-fi seems almost within reach.

The vast amount of context and the ability to write code could also be useful for task planning, but there are challenges. It is unclear where an LLM might slot into the stack itself: too high, and it can’t generate workable code; too low, and the code it writes is too specific to reuse. Combine that with the lack of memory and an inability to truly understand the context it’s drawing on, and you may end up with more confusion than clarity. In addition, GPT’s lack of understanding has produced erratic responses, making it impossible to ensure any level of safety.

LLMs are not a panacea, and significant work is needed to adapt them to roles beyond editor and idea generator. At least in the near term, GPT and similar LLMs look more like an assistant than a dramatic new research paradigm. Like any tool, they may speed up the grunt work. With a human researcher in the loop, an LLM offers an advanced “rubber ducky” for academics, simplifies the arduous paper-writing process, and opens up interesting new avenues for computer science and robotics research. I believe LLMs have a future in robotics research, but they are already improving researchers’ lives in their current state.

SEVY’s Corner

What do LLMs and robots have in common? How can they relate? Let’s explore.

Let’s start with large language models (LLMs). For context, an LLM is the program behind a text box: a person types something in and the computer generates output text that is hopefully useful. You could ask it for help writing a poem, brainstorm ideas for the weekend, or let it summarize a long article. How does it work? The kindergarten answer is that it is a program that has read a large part of the internet, figuring out how people talk. Then, based on what it learned, it tries to guess the best next word, using the input text as a guide. Improve the prompt and the results will be more useful. There is a lot more to it, but that’s a good starting point.
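To make “guess the best next word” a bit more concrete, here is a toy sketch of our own (real LLMs use large neural networks and far more context, so this is only an illustration): count which word most often follows each word in a tiny corpus, then use those counts to predict.

```python
# Toy next-word "model": count which word most often follows each word in a tiny
# corpus, then predict by picking the most common follower. Purely illustrative;
# real LLMs learn these patterns with neural networks over huge amounts of text.
from collections import Counter, defaultdict

corpus = ("the robot picks up the cup . the robot puts down the cup . "
          "the robot picks up the spoon .").split()

following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1          # how often `nxt` appears right after `word`

def guess_next(word: str) -> str:
    return following[word].most_common(1)[0][0]

print(guess_next("robot"))  # -> "picks" (seen twice, vs. "puts" once)
print(guess_next("the"))    # -> "robot" (the most common word after "the")
```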

Okay, now for robots. To me, a mechanical engineer who has been building Lego, competition, and service robots, a robot is a device that senses the world around it and uses that knowledge, plus movement, to accomplish a task. So a Roomba senses wall hits, stair drops, and more, and uses that knowledge to move around and clean the floor > robot. A drone senses its height and tilt and uses that knowledge to stabilize its movement so it can get a picture of a wildfire > robot. A coffee maker senses water temperature and whether there are beans, and uses that knowledge to make a cup of coffee, but it does not move > not a robot.

So what do they have in common? The way I see it, they are both topical tools that enable humans to accomplish something. How do they relate? From the definitions above, I would argue they are different technologies with little direct overlap. The most natural way I can see them being combined is through human-robot interaction (HRI). My first inkling would be to have a human give a robot a task, and the LLM could expand on the simple phrase and give the robot a procedure of clear instructions. However, the real world is messy, LLMs are not consistent in their answers, and robots usually do well at one or two things and need to be meticulously programmed for consistency. Given those reasons, I think this direction is unlikely to work well.

However, if we flip the direction and use the LLM to communicate from the robot to a human, it becomes interesting. For example, a Roomba could put all its sensing and location data into an LLM, which could then parse and summarize the data for its user. The LLM could then output: “Your DJ Roomba cleaned the whole kitchen and dining room, your dog was lying in the hallway blocking the way to the front door, and it got stuck under the leaves of your plant in the living room, so it went into power save mode.” They could call it a Roomba Robot Report. I believe LLMs are a powerful way for robots to translate their data into meaningful insights for people.
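For what it’s worth, here is a rough sketch of what a “Roomba Robot Report” pipeline could look like. The event log, the prompt, and the `call_llm` helper are all our own invented placeholders; `call_llm` stands in for whatever chat or completion API you happen to use.

```python
# Sketch of the "Roomba Robot Report" idea: turn the robot's raw run log into a
# prompt and ask a language model to summarize it for the owner.
event_log = [
    {"time": "09:02", "room": "kitchen", "event": "cleaned", "coverage": "100%"},
    {"time": "09:17", "room": "dining room", "event": "cleaned", "coverage": "100%"},
    {"time": "09:31", "room": "hallway", "event": "path blocked", "detail": "obstacle near front door"},
    {"time": "09:45", "room": "living room", "event": "stuck under plant", "detail": "entered power save mode"},
]

prompt = (
    "You are a home robot writing a short, friendly end-of-run report for your owner. "
    "Summarize these events in two or three sentences:\n"
    + "\n".join(str(event) for event in event_log)
)

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: send `prompt` to your LLM provider and return its reply."""
    raise NotImplementedError

# report = call_llm(prompt)   # e.g. "Your Roomba cleaned the kitchen and dining room, ..."
# print(report)
```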

SEAN’s Corner

Artificial Intelligence is increasingly upending knowledge work. However, its effect on physical automation may be even more profound. In my section, I will discuss how this week’s article inspired me to think about three areas that I believe LLMs and other foundation models will radically reshape within two years.

  • Interaction

  • Control

  • Design & Upgrade

Interaction

Any robotic task that can be procedurally described has likely already been automated; CNC machines are one example. For dynamic, complex environments like those we humans navigate daily, AI is required. This fuzzy input doesn’t just apply to how a robot executes a task (i.e., control) but also to how a task is initially conceptualized (or reconceptualized, e.g., a robot discovering that there’s no water in the fridge). In complex-dynamic environments, even the initial tasking will eventually need to reflect the uncertainty of the tasker’s desire. For example, if we tell a robot, “go get me a spoon,” the task is conceptually well defined. But truly valuable robots will proactively predict and respond to conceptually vague commands like “take care of the kids.”

The models best positioned to handle this are LLMs, and they will likely feature heavily in future robotic tasking. However, additional foundation models will be needed that better understand the social & human dynamics at play (human-social embodiment). These models will allow robots to more fully understand the context in which they operate and therefore plan more effectively.
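As a sketch of what LLM-based tasking could look like in practice, here is one way to ask a model to expand a request into a structured plan a robot stack could consume. The step schema and the `call_llm` helper are our own illustrative assumptions, not anything proposed in the article.

```python
# Sketch: expand a vague human request into a structured, machine-readable plan.
import json

def plan_task(command: str, call_llm) -> list[dict]:
    """Ask a language model (via the hypothetical `call_llm`) for a JSON plan."""
    prompt = (
        "You control a home robot. Expand the user's request into a JSON list of steps, "
        'each an object like {"action": "navigate", "target": "kitchen"}.\n'
        f"Request: {command}"
    )
    return json.loads(call_llm(prompt))

# For "go get me a spoon", we would hope for a plan roughly like:
# [{"action": "navigate", "target": "kitchen"},
#  {"action": "open",     "target": "cutlery drawer"},
#  {"action": "pick",     "target": "spoon"},
#  {"action": "navigate", "target": "user"},
#  {"action": "handover", "target": "spoon"}]
```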

Control

I predict that new foundation models will be needed to effectively control the next generation of robots. In fact, several architectures have already shown promise:

  • Liquid Neural Networks (LNNs)

  • Physics-Informed Neural Networks (PINNs)

  • Spiking Neural Networks (SNNs)

These networks are better suited than current models to the highly complex-dynamic environments robots will increasingly find themselves navigating.
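To give a flavor of one of these, here is a minimal physics-informed neural network sketch of our own in PyTorch (not from the article): a small network u(t) is trained so that it satisfies the mass-spring equation u'' + u = 0 with u(0) = 1 and u'(0) = 0, i.e., the physics itself is part of the loss.

```python
# Minimal PINN sketch: fit u(t) to the ODE u'' + u = 0, u(0)=1, u'(0)=0 (true answer: cos t).
import math
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
t0 = torch.zeros(1, 1, requires_grad=True)                       # t = 0, for initial conditions

for step in range(3000):
    t = (torch.rand(64, 1) * 2 * math.pi).requires_grad_(True)   # random collocation times
    u = net(t)
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, t, torch.ones_like(du), create_graph=True)[0]
    physics = ((d2u + u) ** 2).mean()                             # residual of u'' + u = 0

    u0 = net(t0)
    du0 = torch.autograd.grad(u0, t0, torch.ones_like(u0), create_graph=True)[0]
    initial = (u0 - 1.0).pow(2).mean() + du0.pow(2).mean()        # enforce u(0)=1, u'(0)=0

    loss = physics + initial
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    print(net(torch.tensor([[math.pi]])).item())                  # ideally close to cos(pi) = -1
```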

These architectures will also impact robot design, as additional robotic proprioception will be needed to keep control systems updated. In the future, I believe the bulk of a robot’s input will be generated not by environmental signals but by internal attention. I could see robot control increasingly looking like human systems, where a model creates an internal estimation simulation (e.g., “I expect my arm to come in contact with the water glass on my next motor step”), which leads to a choice (e.g., “extend my arm more”), creating an action (e.g., the motors extend the robot’s arm). This leads to an evaluation of the previous estimation, then a correction (e.g., “oops, I missed the water glass; I should turn to the right”), and then the next simulation loop. Research Professor Lisa Feldman Barrett suggests that it is these internal predictions and prediction errors, our “feelings,” in combination with context and higher-level concepts that lead to what we generally term “emotion.” So it seems very plausible that robots will have something like feelings very soon.
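Here is a toy sketch (again our own, not a model from the article) of that predict, act, evaluate, correct loop for a one-dimensional “arm” reaching toward a water glass:

```python
# Toy predict -> act -> evaluate -> correct loop for a 1D arm reaching for a glass.
target = 10.0      # position of the water glass (cm)
position = 0.0     # current arm position (cm)
command = 3.0      # commanded motor step for the next move (cm)

for step in range(10):
    predicted = position + command         # internal simulation: "I expect to end up here"
    position += command * 0.8              # action: the real motor undershoots by 20%
    error = predicted - position           # evaluation: how wrong was my prediction?
    command = 0.5 * (target - position)    # correction: choose the next step toward the goal
    print(f"step {step}: at {position:4.1f} cm, prediction error {error:+.2f} cm")
```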

Design & Upgrade

I can also see a future where, as robots prove increasingly capable of navigating more “worlds,” demand for them, similar to the demand for AI in knowledge work, hits an inflection point and spikes.

In this world, the current, manual approach to designing robots will prove a bottleneck, and new approaches will be needed to rapidly develop, redevelop, reconfigure, and deploy robots. This new approach will have AI-based robotics engineers going from requirements generation to implemented robot in minutes, not months. LLMs will remain a critical piece of translating human needs into usable designs.

An AI-managed design and implementation pipeline will allow robots to be individualized to specific users and their unique use cases.

Further, we can look past a world where we go text-to-robot, to a world where our personal AIs predict and request the robots we don’t yet know we need.


Synthesis

Hold onto your robots — this discussion gets heated. During our conversation, we dove into the world of Large Language Models (LLMs) and their implications for robotics. We all agree that LLMs can enrich human interaction with robots. For years, the field of Human-Robot Interaction (HRI) has wrestled with how to make our exchanges with robots more authentic. Enter LLMs. They're so convincingly human that they risk misleading users into attributing consciousness to them—a real double-edged sword.

Then the conversation took a pragmatic turn. Imagine asking ChatGPT to fetch you a glass of water. Simple request, right? But anyone versed in robotics knows there's a chasm between asking and doing. Where are the cups? What's the optimal water temperature? Such contextual hurdles are second nature to humans, but a stumbling block for robots. LLMs, however, are context specialists. They may lack physical form (i.e., embodiment), but they can provide a humanistic perspective.

Then, the room divided - Sevy and Connie see LLMs as a powerful tool, not as human replacements. Sean, however, envisions a paradigm shift. He suggests that businesses could employ customized LLMs to pool knowledge usually distributed among various team members, thereby aiding engineers in making informed decisions, or making those decisions independently. Sevy and Connie counter with skepticism, arguing that LLMs merely offer "average" solutions by stringing commonly used words together.

Sean wasn’t convinced. He foresees rapid advancements in LLM capabilities, including integration with other foundation models, that he believes will significantly bolster their decision-making. It's a notion supported by ongoing research, including insights from the Majumdar article.

So where does this lead us? Will LLMs herald a utopian society where machines perform more than 90% of jobs and humans enjoy universal basic income? Or are we heading toward a dystopia, where robots serve only the elite, leading to widespread unrest? While the future remains uncertain, the consensus is clear: LLMs aren't going anywhere. For roboticists, we all agree it will be critical to integrate these language models into existing workflows while staying mindful of their current limitations.

We hope you enjoyed this blog post. If so, please share it with someone who would also enjoy it.

Thanks, 

Sevy, Sean, Connie

Next

The Sound of One Robot Arm Clapping - Dexter HDI: The Future of Robotics or Just Another Arm?