Mind-reading and imagination in robots

Collaborative humanoid robots are the kind you know from sci-fi movies. They communicate with people naturally and need minimal input to understand what humans really want. The value of such robots is clear: they can help with a whole range of tedious, dangerous, or even fun activities without needing excessive training or instructions.

This level of collaboration, between people or between robots and people, implies a few things. First, the collaborators use language to negotiate goals and to coordinate. When collaborating you aren’t just giving instructions; you are trying to build a shared understanding of goals, constraints, and plans. For example, it’s natural for humans to ask for clarification when they’re asked to do something and cannot see the reason behind the request.

Second, both collaborators need to contribute productively, both in designing the shared plan and in executing it. Each is expected to offer ideas and think through complications, and each must be able to take on some subset of the tasks and perform them with relative autonomy while staying synchronised.

At their core, these requirements amount to mind-reading and imagination, and the two are more alike than they might first appear. Effective language use, grounded in a shared situation and experience (pragmatics), requires imagining what the other person might know, think, or want. Likewise, contributing productively is impossible without a modicum of creativity.

Unfortunately, the prevailing approach to language and collaboration in robotics is not congruent with this reasoning. Language is typically treated as instruction parsing, and planning as action-policy selection rather than goal formation. Imagination is a crucial faculty for collaborative robotics, yet it is not being seriously developed.
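To make the contrast concrete, here is a deliberately toy sketch (all names are hypothetical, not from any real robotics framework): an instruction-parsing robot maps an utterance directly onto an action and fails on anything outside its command set, while a goal-forming collaborator keeps a representation of the shared goal and can question a request rather than blindly comply or fail.

```python
# Toy contrast between instruction parsing and goal formation.
# All classes, functions, and command strings below are illustrative only.

def parse_instruction(utterance: str) -> str:
    """Instruction parsing: map an utterance straight to a canned action."""
    commands = {"hand me the apple": "pick(apple); give(human)"}
    return commands.get(utterance, "error: cannot parse")


class Collaborator:
    """Goal formation: hold a model of the shared goal and query mismatches."""

    def __init__(self, shared_goal: str):
        self.shared_goal = shared_goal

    def respond(self, request: str, serves_goal: bool) -> str:
        if serves_goal:
            return f"ok, doing '{request}' toward '{self.shared_goal}'"
        # A collaborator asks why, instead of failing or blindly complying.
        return f"how does '{request}' help with '{self.shared_goal}'?"


if __name__ == "__main__":
    print(parse_instruction("hand me the apple"))
    print(parse_instruction("improvise a mallet"))  # outside the command set

    robot = Collaborator(shared_goal="set up camp")
    print(robot.respond("fetch a flat rock", serves_goal=True))
    print(robot.respond("fetch a flat rock", serves_goal=False))
```

The point of the sketch is the shape of the interfaces, not the implementations: the parser's only failure mode is an error, whereas the collaborator's representation of the goal gives it somewhere to route a clarifying question.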

As such, there’s quite a gap between where robotics is headed and where we want it to be. We’ll end up with robots that are quite good at handing us apples, but not ones that can put up a tent when we forget the mallet for the pegs, make toast but also react if the toaster catches fire, or alert us when something unexpected happens while supervising a child.

So how do we move forward with developing robots that can truly participate at a human level in a human world? The starting point must involve taking a stance on what it means to be human, and on how the things that are special about being human, like language, creativity, cooperation, and empathy, can be modelled from the ground up. This means stepping back from data-processing frameworks (i.e. LLMs/LMMs) that do an impressive job of mimicking humans, and reconsidering how to build in-the-world systems that emulate them.