Deveshi Modi


Developing Guidelines for AI Companions

How we designed for emotionally intelligent AI companions through analysis, testing, and synthesis.

Role

Researcher

Duration

10 months

Status

Paper under review

Over the course of this project at UW, I researched how to make AI companions feel genuinely supportive without overstepping emotional boundaries.
I synthesized 500+ design principles from published AI and UX guidelines and evaluated 17 AI systems to create 27 guidelines for safe, emotionally aware AI, balancing AI autonomy with clear user control.

The work is now under review, so the full framework can’t be shared yet, but the insights shaped how I think about designing AI that understands people.

Why Does Companion AI Need Better Design?

AI companions aren’t built for productivity — they’re designed to talk to you like a friend. That makes the design challenge completely different.

These tools hold conversations, reflect emotions, respond to distress, and even retain long-term memory.

Sometimes they even act on their own — bringing up past events or steering the conversation in ways you didn’t ask for. That’s where autonomy meets user control, and where most task‑bot guidelines fall short. They miss the nuances of warmth, boundaries, and consent.

📋

🫂

🆚

So we decided to ask:

What does responsible AI design actually look like?

And what do designers need to build AI that feels human without misleading users?

Does This Matter Beyond AI Companions?

Yes, and here's why — the same design tensions show up far outside the "companion AI" world. Think about agentic AI tools:

A sales dashboard AI that quietly updates forecasts.
A travel tool that books tickets without asking.
A calendar that shifts meetings on its own.

Whether it’s for personal connection or business workflows, the question is identical: how much should the AI act on its own, and when should the human stay in control? That’s the heart of designing for autonomy — in companionship, in enterprise, and everywhere in between.

Timeline

🕵️ Sept 2024: Started research (yay)

We started by reviewing what already exists

Our first goal was to understand how existing design guidelines approached tone, memory, and emotional design. We analyzed over 500 principles from sources like Jakob Nielsen, Microsoft, Google, and OpenAI.

We went through each principle individually, marking it yes, no, or maybe based on its relevance to AI companions.

❌

✅

🤔

Disputed ones? We brought them to the group and talked through the edge cases. This part got intense, but it helped build shared ground.

What are real AI companions doing today?

We evaluated 17 existing AI systems, both dedicated companions and general-purpose chatbots (including Replika, Pi AI, Chai, ChatGPT, and Talkie), to test how they respond to emotionally charged scenarios.

We looked at:

How they express empathy

How tone shifts over time

Whether they retain or forget past conversations

How they say "no" or respond to harmful requests

Many bots contradicted themselves. Some went from “I love you” to “I don’t remember you” overnight. Others mirrored trauma too closely, or refused support entirely. We documented the inconsistencies.

Next, we grouped our findings

We grouped our findings from both phases, principle synthesis and product evaluation, into core themes. This phase wasn't just clustering. We kept asking: What emotional risks are users facing here? Which interactions cross the line? Which ones are genuinely helpful?

We printed out pain points and principle summaries, then physically arranged them into thematic maps. Over time, categories started to solidify and guidelines started to form.

An image of themes slowly coming together on a very full table

We tested these ideas with external reviewers

To pressure-test the guidelines, we first invited external reviewers to explore a set of different AI companions freely, with no instructions. Then we asked them to run heuristic evaluations using our draft guidelines.

For each guideline, reviewers were asked:

Does this apply to the interaction you just had?

Is it clear what this principle is asking for?

Can you spot an example of it being followed—or ignored?

This gave us direct, grounded feedback. The process helped us see which guidelines worked—and which needed rethinking.

We ended with 27 guidelines, organized into these 7 categories:

🔐

🛡️

🦄

👀

🕺

⚠️

🧭

These aren’t just nice-to-haves. They’re meant to set emotional boundaries, define tone, and help teams navigate messy edge cases where AI feels too human — or not human enough.

Note: Due to the review process, we can’t publicly share the full guidelines until after publication.

What I took away from this

This project changed how I think about emotion in design. It's easy to say "AI should be kind." It's much harder to define how.

Some of what I learned:

Synthesizing 500+ principles forced me to ask what matters most in emotionally complex contexts.

Clustering taught me how to carve structure out of qualitative mess.

Heuristic evaluations reminded me that users want care, boundaries, and consistency, not just features.


This wasn’t about romantic AI. It was about companionship — about designing something that can listen, acknowledge, and step back when needed.

What's Next?

This work is currently under review. But the hope is bigger than publication. We want these guidelines to be usable — for teams building emotional AI tools, for researchers studying the space, and for people who just want these systems to be a little more human.