Generative AI Agents Simulate Real Human Behavior

Stanford researchers have used generative AI to simulate believable human behavior in an interactive virtual world.

Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, researchers describe an architecture that extends a large language model to store a complete record of the agent’s experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. They instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors.

Generative Agents Demonstrate Generative AI That Plans and Has Memory

Agents perceive their environment, and all perceptions are saved in a comprehensive record of the agent’s experiences called the memory stream. Based on its perceptions, the architecture retrieves relevant memories and then uses those retrieved memories to determine an action. The retrieved memories are also used to form longer-term plans and to create higher-level reflections, both of which are entered back into the memory stream for future use.
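The paper describes the retrieval step as a combination of recency, importance, and relevance scores over the memory stream. Below is a minimal sketch of that idea in Python, assuming an exponential recency decay, a 1–10 importance rating, and cosine similarity for relevance; the `Memory` class, the equal weighting, and the 0.995 decay factor are illustrative assumptions, not the authors’ exact implementation.

```python
import math
import time
from dataclasses import dataclass, field
from typing import List

@dataclass
class Memory:
    """One natural-language observation, plan, or reflection in the memory stream."""
    description: str
    importance: float                 # e.g. rated 1-10 by the language model
    embedding: List[float]            # embedding of the description text
    last_accessed: float = field(default_factory=time.time)

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(memory_stream: List[Memory], query_embedding: List[float],
             now: float, top_k: int = 5, decay: float = 0.995) -> List[Memory]:
    """Score every memory by recency, importance, and relevance; return the top_k."""
    scored = []
    for m in memory_stream:
        hours_since_access = (now - m.last_accessed) / 3600.0
        recency = decay ** hours_since_access    # decays the longer a memory goes unused
        importance = m.importance / 10.0         # normalize the 1-10 rating to [0, 1]
        relevance = cosine_similarity(query_embedding, m.embedding)
        scored.append((recency + importance + relevance, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]
```

The retrieved memories would then be placed into the language-model prompt that decides the agent’s next action, plan, or reflection.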

Microsoft researchers had previously noted that ChatGPT-like AI could do much of what humans do (reason, solve problems, think abstractly, comprehend complex ideas) but could not plan and had little persistent memory. This work goes some way toward addressing planning, memory, and learning from experience.

Future Work and Limitations

In this work, the researchers have presented a first instantiation of generative agents. Future research can expand on the modules of the proposed generative agent architecture. The retrieval module, for example, could be enhanced to retrieve more relevant information for a given context by fine-tuning the relevance, recency, and importance functions that make up the retrieval function. Efforts can also be made to improve the architecture’s performance and make it more cost-effective: the present study required substantial time and resources to simulate 25 agents for two days, costing thousands of dollars in token credits and taking multiple days to complete. To enhance real-time interactivity, future work can explore parallelizing agents, as in the sketch below. Furthermore, with advances in the underlying models, the researchers expect the agents’ performance to improve.
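As a rough illustration of what parallelizing agents could look like, the sketch below steps several agents concurrently instead of one after another; the `Agent` class and its `step` method are hypothetical stand-ins for the perceive-retrieve-plan-act loop, since the paper does not specify such an interface.

```python
import asyncio

class Agent:
    """Hypothetical agent wrapper; a real implementation would call the language model."""
    def __init__(self, name: str):
        self.name = name

    async def step(self, world_state: dict) -> str:
        # Stand-in for perceive -> retrieve -> plan -> act. The real loop is dominated
        # by network-bound language-model calls, which is why concurrency helps.
        await asyncio.sleep(0.1)
        return f"{self.name} acted"

async def run_tick(agents: list, world_state: dict) -> list:
    """Advance every agent by one simulation tick concurrently rather than serially."""
    return await asyncio.gather(*(agent.step(world_state) for agent in agents))

if __name__ == "__main__":
    agents = [Agent(f"agent_{i}") for i in range(25)]
    actions = asyncio.run(run_tick(agents, world_state={}))
    print(actions)
```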

The evaluation of generative agents’ behavior in this study was limited to a relatively short timescale, and future research should aim to observe their behavior over an extended period to gain a more comprehensive understanding of their capabilities and limitations. Varying and contrasting the underlying models, as well as the hyperparameters used for the agents during future simulations, could provide valuable insights into the impact of these factors on the agents’ behavior. Additionally, given the known biases of language models, it is possible that generative agents may output behavior or stereotypes that reflect bias. To mitigate this, further work on value alignment will be necessary.

Generative agents may fail to generate believable behavior for some subpopulations, particularly marginalized populations, due to data deserts. The researchers have limited knowledge of the robustness of generative agents. They may be vulnerable to prompt hacking, memory hacking—where a carefully crafted conversation could convince an agent of the existence of a past event that never occurred—and hallucination, among other things. Future research can more comprehensively test these robustness issues, and as large language models become more resilient to such attacks, generative agents can adopt similar mitigations.

9 thoughts on “Generative AI Agents Simulate Real Human Behavior”

  1. All generative AI systems utilize huge databases scraped from the internet. As such, they present reality as depicted in the net content of the internet. That’s great for a lot of things. For example, such systems will easily replace sports writers, because their reports are formulaic. And such systems will surely generate believable behavior of normal people. I have to wonder, though, what could possibly be interesting about reading about the behavior of normal people. Who wants to read a tale of somebody waking up in the morning, brushing their teeth, eating breakfast, driving to work, etc, etc, etc? We already experience that every day!

    People are imagining these systems generating interesting stories. That won’t happen, because stories have far more complex architectures than people realize. The interconnections between events in a story are too extensive to be handled by an LLM. Imagine Luke Skywalker going back to Tatooine after he destroys the Death Star in order to visit his Uncle Owen and Aunt Beru. Wait a minute! They were killed by the Imperial Storm Troopers early in the movie! Do you really think that an LLM could span so long a period of the story to make these events consistent? Stories are full of complex interconnections between their components that MUST be maintained.

    They also contain lots of implicit information based on social intelligence. Why in the world would Luke react the way he did when Darth Vader revealed that he, Darth, was Luke’s father? Why wouldn’t Luke have said, “Well, gee, if that’s the case, Dad, let’s not fight; what do you say we go out and have a beer and catch up on the last twenty years?” How would an LLM know not to do that?

    If you built an LLM based exclusively on the huge amounts of textual material from dramatic stories, you’d get endless contradictions. For example, suppose you tried to build a murder mystery generator, and among the many murder mystery novels you put into your database, you included the work of Agatha Christie. In one of her Hercule Poirot novels, Poirot solves the mystery by attacking and tearing apart a clay model of a horse’s head that the murderess had in her art studio. Inside it he finds the hidden murder weapon. How did he know to tear apart the clay horse’s head? Because early in the story, the woman told him that she hated horses. He therefore concluded that she would not have sculpted a clay horse’s head for artistic reasons, and therefore MUST have made it to conceal the murder weapon. How in the world do you think that an LLM would be able to put those pieces together? Or to quote a more famous example, how would an LLM know that “the dog didn’t bark” would solve the mystery?

    • You know what this would be useful for, potentially?

      Generating fake people with plausible personalities and normal reactions, to download into bots that have to heavily interact with humans socially. You wouldn’t WANT such to be dramatic. You’d want them to be conventional and boring, but able to understand and respond to normal social cues, and make small talk as necessary. You know, like the ‘replicants’ in Blade Runner, who had fake back-stories?

      So your android geriatric nurse can swap stories with you about the grandkids, your grandkids’ robot nanny properly socializes your great grandkids to be sane, your dental assistant chats with you while cleaning your teeth… Half the people you interact with can be synthetic, but interact just like real people would.

      THAT is what you could use this for.

      And when we colonize the next star system over, with bots and artificial wombs, the new generation of humans produced on site can just be slotted into an already functional society.

  2. And by mid-century, when even home computers might be able to host thousands of virtual people, people that are much more people than these are, there are going to be some moral and ethical considerations. Possibly followed by some legal ones.

    I do recall how shocked I was when I discovered my lovely, sweet, caring daughters liked to occasionally take away the pool ladders when a sim was swimming, or delete the doors to a room where a sim was cooking (and had no fire extinguisher) just because they liked to occasionally grow the little cemetery behind the sim house.

      • For now? But a line may eventually be crossed.

        And the ability to create an intelligent being does not automatically confer the right to torment or destroy one; otherwise many more children would be at far greater risk from their own parents than they are now.

  3. I suspect the game industry is hard at work producing LLMs that are more compact and quick-running on players’ local machines, in order to cheaply support games with very realistic characters. Obvious paths would be monolingual training, or training only on dramatic materials in the genre (e.g. high-fantasy scripts and novels) plus game-world background. There’ve been some interesting gaming experiments already.

    • What requires big computational resources is the training, but once trained, I think the model can already run on a good PC. The video game industry is already taking notes.
