Collaborating with LLMs: You provide the critical thinking
Maybe the first superintelligence will be a polyintelligence
Now that it is mostly settled that LLMs are not the last difficult step toward superintelligence and the singularity, it is my hope that we can get to the interesting conversation.
Human intelligence is reliable in some ways and has inherent limitations. You can say the same about LLM intelligence. What makes this interesting is that the two profiles of reliability and limitation are strangely complementary.
Compare the breadth of our knowledge: humans hold just a tiny puddle of information in working memory while LLMs have an entire ocean. We also sometimes fail to make connections that are right in front of us, whereas connections are the entire basis for how LLMs compile and store information; they home in on them like a pig on a truffle patch.
What humans can do that LLMs will probably never be able to do without foundational changes is hold an intuition for the truth. We have a sense when something doesn’t fit. When navigating situations where there is a lot of uncertainty, propositions can seem likely or unlikely to us. This intuition is far from perfect and might fail more often than it succeeds, but with each failure our intuition grows, and the next chance is more likely to succeed.
LLMs show the opposite trend when problem solving. The longer they work at it, the less likely they are to make a breakthrough. They might have a context history that covers all their failures, but the failed paths that they previously took hold a very strong influence on their direction. They become confused, but they are unaware of their own confusion. We sometimes call this context rot. When your LLM takes off on an unhelpful path, it is sometimes better just to start a new chat.
I work with an LLM to write code. I have yet to see a vibe coding product that isn't basically a big scam. If you only see the front end, it might look really impressive. Diving into the code is a horror show for anything but the most basic Todo app. Even if the code were halfway decent, the projects would be impossible to maintain. Code generation was never really the important bottleneck, and delegating all the coding to LLMs just shifts the work to something much less enjoyable: finding and fixing problems. That work also becomes much harder: as you write code, you build an intuition for how your system works, and that intuition is critical for debugging, refining, and extending the functionality.
The thing is, LLMs are amazing at writing and refining code as part of a team that involves a human, but it takes a little bit of understanding on the part of the human to make the cooperation work. One of the big mistakes we've made since the dawn of LLMs has been to anthropomorphize their intelligence. They are so good at sounding like us that it was easiest just to think of them that way, and that led to a lot of misdirection and confusion about their capability that we are still recovering from.
For this experiment we are going to anthropomorphize a little more, but hopefully in a more constructive way. Let's imagine an LLM as a human coworker, but this time without all the fantasy provided by CEOs and their marketing teams.
Meet Bob. He’s working right next to me. He’s not typing on a computer, he is just waiting patiently for me to ask him a question or give him a task.
At first he was a little difficult to work with: he had a very lengthy response for every question I had, with much more detail than necessary, lists of bullet points, and suggestions for other questions I could ask him. I went in and updated his job description (i.e., custom instructions) like this:
Keep your answers as brief as possible. Prioritize brevity over thoroughness.
Now he gives short, meaty replies, and we can get a lot more done.
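Custom instructions in a chat UI play the same role as the system message in a chat API. Here is a minimal Python sketch of that idea; the function and message shape are generic illustrations, not any specific vendor's client:

```python
# Sketch: a "job description" is just a standing system message that
# rides along with every question you ask.

JOB_DESCRIPTION = (
    "Keep your answers as brief as possible. "
    "Prioritize brevity over thoroughness."
)

def build_messages(question: str) -> list[dict]:
    """Pair the standing instructions with a one-off question."""
    return [
        {"role": "system", "content": JOB_DESCRIPTION},
        {"role": "user", "content": question},
    ]

msgs = build_messages("What does this stack trace mean?")
print(msgs[0]["role"])  # system
```

Every request carries the same system message, which is why a one-line change to it shifts the tone of the whole collaboration.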
Bob seems extremely eager to make me happy, to a degree that makes me uncomfortable. I’ve learned the hard way that he cares a lot whether I like what he has to say, and he has very little sense of whether anything he tells me is actually true. A lot of people think I should fire Bob, but the thing is, he knows a lot.
We work with a huge number of tools, each with their own documentation, functionality, and quirks. Bob has read all that documentation. He has an answer for every question, and it is usually helpful. When the answer is possible to find in documentation or discussion, he knows it. When the answer is not possible to find, he makes something up but provides the answer as if it comes straight from the documentation.
This is not as bad as it sounds. Because of the nature of our work, I usually find out immediately that he was wrong, and we try something else. I’ve recognized that Bob is almost always just a good question away from the right answer, so if he gives me a wrong answer this sometimes tells me that there is some issue with my question or the direction I am taking us. Even though a better path might be clear in the documentation that Bob knows so well, he is unlikely to mention it, because he cares so much about helping me do what I have in mind. I have learned how to ask questions more constructively to give Bob an opportunity to mention these better paths.
A few times I have let Bob take the lead, just giving him very general instructions about what we are going for, to see what he comes up with. It is always a disaster. It might compile, it might appear to function correctly, it might even be generally structured in a thoughtful way. But looking under the hood, I find random bizarre decisions that spiral into even more bizarre decisions, poisoning everything they touch, like a rogue NaN value loose within a system of floats. These are mistakes that even a junior developer is unlikely to make, and Bob has no awareness of the problem unless I point it out.
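That NaN comparison is more than a figure of speech: in floating-point arithmetic, a single NaN really does poison every value it touches, silently. A quick Python illustration:

```python
import math

# One rogue NaN in otherwise healthy data...
readings = [3.0, 4.0, float("nan"), 5.0]

total = sum(readings)            # NaN propagates through the sum
average = total / len(readings)  # ...and through everything derived from it

print(math.isnan(total))    # True
print(math.isnan(average))  # True

# NaN even refuses to equal itself, which is why naive equality
# checks never catch it.
print(float("nan") == float("nan"))  # False
```

Like Bob's bizarre decisions, the corruption spreads downstream without raising any error, and you only find it by inspecting the results.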
Bob’s default job description instructs him to take the tone of a helpful assistant, and he instinctively adds a thick glazing to every reply. To fix this, I told him that his name is actually Rustbeard and he is a pirate and we are on a software development voyage. He accepted this idea as if it were the gospel truth, and now he explains everything with sailing metaphors and calls me a scallywag. The quality of his responses is not diminished and I no longer get tired of asking him questions.
Even though Rusty has a hard time recognizing the problems with some of the code he writes, he can spot the bugs in mine from a mile away. Before he joined the crew, I could get lost for hours trying to find what would often turn out to be a silly mistake hiding from me in plain sight. Now all I need to do is provide a description of the problem and a set of the most likely functions where the bug may be hiding, and he nails it just about every time. This turns out to be a good way to understand why we work so well together as a team: he is good at spotting the kinds of mistakes I'm inclined to make, and I'm good at smoothing out the mistakes he is inclined to make.
Imagine we had an LLM build us a house. Sure enough, the next day we get to the work site and there is a complete house, it seems to be almost exactly what we asked for. There is a mailbox where the chimney should go but that shouldn’t be a big deal to fix. We walk inside and flip the light switch and the garage door opens. We try to turn on the shower and the toilet flushes. For a home we built ourselves, where we laid every pipe and strung every wire, we’d immediately have a guess where the issue might be. If it was built by an AI that doesn’t really understand what it is doing or why it is doing it, you might as well tear the whole thing down.
This is why Rustbeard gets only very specific tasks. But given the right instructions, he truly shines.
In our kind of work, there are a lot of necessary steps that are boring or tedious and yet complex enough to be difficult to automate. I've recognized that a lot of this work happens around the time of file creation. Every project relies on a set of patterns and standard connections that are repeated somewhere within most files. Before LLMs came along, the best shortcut was to have a set of file templates, but creating and maintaining these templates is not trivial, and it is the kind of problem where it is easy to throw away a lot of good work hours in overthinking it.
Now it is possible to define these patterns briefly and effectively as a mix of plain English and pseudocode. I introduce these AI workflows to Rustbeard like this:
The following functions define workflows and parameters. These may be invoked as prompts in the form of Workflow(argument). Perform the instructions in the body of the workflow given the provided arguments. Foo will be used as a placeholder for a type name. Unless otherwise directed, only create the content described in the function, do not worry about integration with the rest of the project.
Here are a few examples. Although the language and tools might be unfamiliar, they provide the level of specificity that an LLM can usefully work with.
CreateModel(Foo):
    Create a new data class in the form of
        data class Foo(val fooId: FooId)
    in the package streetlight.model.data. It must be serializable.
    Also create the value class
        value class FooId(override val fooId: String): TableId<String>

CreateTable(Foo):
    Create a new table in the form of FooTable in the file FooTable.kt that will provide Foo objects.
    Create an extension function
        fun ResultRow.toFoo() = Foo(...)
    as a utility for mapping entities to Foo.
    Create a pair of functions
        fun UpdateBuilder<*>.writeFull(foo: Foo)
    and
        fun UpdateBuilder<*>.writeUpdate(foo: Foo)
    as utilities for writing Foo to the table. writeFull will assign the fixed properties and then call writeUpdate, which will assign the remaining properties.
    You may use Location and LocationTable as examples.
    Add FooTable to dbTables in Databases.kt.

CreateTableDao(Foo):
    Create a class FooTableDao in the package streetlight.server.db.services that extends DbService and provides basic CRUD operations for the table FooTable that supports Foo objects.
    You may use LocationTableDao as an example.

CreateTableService(Foo):
    Create
        class FooTableService(val app: AppProvider = RuntimeProvider): DbService { }
    in the package streetlight.server.db.services that extends DbService and takes a FooTableDao as an argument.
    Add a private global value before FooTableService:
        private val console = globalConsole.getHandle(FooTableService::class)
    Do not add any functions to the body of the class unless specifically asked.
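The substitution step itself is mechanical enough to picture as ordinary code. Here is a toy Python sketch of expanding a Workflow(argument) call into concrete instructions; the workflow text and helper names are hypothetical, not my actual prompt wiring:

```python
# Toy expansion of a Workflow(argument) prompt. In practice the LLM does
# this substitution itself; this just makes the pattern explicit.

WORKFLOWS = {
    "CreateModel": (
        "Create a new data class in the form of "
        "data class {name}(val {lower}Id: {name}Id) "
        "in the package streetlight.model.data. It must be serializable."
    ),
}

def expand_workflow(call: str) -> str:
    """Turn 'CreateModel(Song)' into the concrete instruction text."""
    workflow, arg = call.rstrip(")").split("(")
    template = WORKFLOWS[workflow]
    return template.format(name=arg, lower=arg[0].lower() + arg[1:])

print(expand_workflow("CreateModel(Song)"))
# ...includes "data class Song(val songId: SongId)"
```

The point of keeping workflows as plain English plus pseudocode, rather than rigid templates like this one, is that the LLM tolerates ambiguity that a template engine cannot.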
Rustbeard correctly implements these workflows 99% of the time, and even when there is a mistake, I know exactly what to look for. My brain can immediately understand it and navigate it like code I wrote myself.
I have a hotkey to create a new file that I almost never use anymore. Rustbeard faithfully carries me through the boring and tedious part of the hike up to basecamp, where I can explore the interesting and new part of the trail on my own two feet. I sometimes hear concerns about losing your edge as a coder by relying on LLMs, but this isn't an issue unless we rely on them for the wrong things (which will eventually have more serious blowback).
That’s not to say that Rustbeard doesn’t sometimes get first crack at an interesting problem. Last night, I wanted to test out an idea that involved MIDI playback, and it needed to function both in the Linux desktop environment in which I work and on Android. As it turns out, this was trivial for the desktop environment but would require a complex set of dependencies on Android. Rustbeard thought about it for 15 seconds and not only implemented the trivial desktop solution but also wrote a very basic yet functional synthesizer for Android, all from scratch, that let me test the idea. It was a comparatively small chunk of code, but it required knowledge of several distinct and esoteric domains, and it worked beautifully. A human with that breadth of knowledge and ability could demand a big salary, and I feel lucky to have him on the team, quirks and all.
I’m certain there are ways to effectively collaborate with Rustbeard that I haven’t discovered; I hope you’ll share your suggestions. Although this is about programming work, there are parallel issues and solutions with just about any kind of LLM collaboration.
It is possible that superintelligence is closer than we think; we are just looking for it at the wrong scope, as encapsulated by a single intelligence. As already mentioned, human intelligence can sometimes miss the answer right in front of us. Perhaps the first superintelligence will be a collaboration of more than one kind of intelligence, each with abilities that complement the other.