Design in the agentic era: a Figma template to craft the purpose & personality of AI agents

Building AI products has many steps. Today, most of the work is engineering-heavy and engineering-led, and designers aren't always in the room. This stems from the early-stage nature of the industry: designers don't understand how AI products are built, there is no formalized design process, and engineers don't know when to include designers. This article aims to improve the situation by providing a simplified explanation of the steps required to build AI agents, and a Figma template that structures designers' contribution.

Published on
January 7, 2025

Download the Figma design template

The template is available on Figma. It is free. It should help you craft the purpose, personality, and functionality of AI-powered tools. Download the AI agent design template on Figma. Don’t hesitate to send feedback.

Key steps in the AI product building process

Whether you're adding AI features to an existing tool or designing a product where AI sits at the center, large language models are always raw in their standard form. This means thoughtful customization is necessary to align them with product requirements.

Let’s explore some key steps in the AI product building process (in no particular order) and see how design-heavy each one is.

Crafting system prompts

System prompts are different from feature prompts. System prompts centralize foundational product principles and will be programmatically added to feature prompts as guidance for downstream features.

Extract from an e-commerce AI agent system prompt: You are a helpful and empathetic customer support agent for an e-commerce platform. Your primary goal is to assist users with their shopping needs, address concerns politely, and provide clear solutions. Avoid using technical jargon, and ensure your tone remains friendly and approachable.

System prompts often contain user context, such as people's characteristics and goals, to ensure effective communication. They also define the agent's high-level capabilities - what it knows or can do, for instance. To ensure consistency in branding, elements like voice and tone are often defined in system prompts as well. Similarly, system prompts often contain guidance on how to prevent biases or harmful content. They may even contain very high-level principles, such as Anthropic's AI Constitution.

There are strategic considerations when building system prompts. One is modularity. System prompts of robust AI applications will often be modular. In fact, AI is often used in different ways inside a single product - say, LLM-powered autocomplete and LLM-powered customer chat. In that case, different system prompts will bring varying degrees of context. Another is prioritization. Context windows - how much an LLM can take in at prompt time - are limited, so you must pick the most critical content for your system prompts. Also, you can instruct an LLM through different means (system prompts, feature prompts, or fine-tuning data). It is therefore possible that guidance on voice and tone is already baked into the training data and need not be added to the system prompt.
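
To make modularity concrete, here is a minimal sketch, in Python, of how a team might assemble different system prompts from shared building blocks. The product, module names, and prompt text are all hypothetical:

```python
# A minimal sketch of modular system prompts for a hypothetical e-commerce
# product with two AI interventions: search autocomplete and support chat.

BRAND_VOICE = "Keep the tone friendly and approachable. Avoid technical jargon."
SAFETY = "Never reveal personal data. Politely decline requests unrelated to shopping."

# Each intervention gets its own mix of shared modules and specific context.
MODULES = {
    "autocomplete": [
        "You complete the user's search query for an e-commerce platform.",
        BRAND_VOICE,
    ],
    "support_chat": [
        "You are a helpful and empathetic customer support agent for an e-commerce platform.",
        BRAND_VOICE,
        SAFETY,
    ],
}

def build_system_prompt(intervention: str) -> str:
    """Assemble the system prompt for one intervention from its modules."""
    return "\n\n".join(MODULES[intervention])

print(build_system_prompt("support_chat"))
```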

The crafting of system prompts is filled with design decisions. We can add a lot of value here by bringing a nuanced understanding of user needs and brand voice options, for example.

Creating feature prompts

Feature prompts are the tactical instructions that guide an agent’s behavior for a specific task. A feature prompt is typically divided into three sections.

  1. System Prompt: you know what that is.
  2. User Input: contains context that is user-specific. This can be a list of past messages, or personal characteristics and preferences.
  3. Instructions: contains the task at hand, the processing or reasoning necessary, and the desired output.

Building feature prompts also brings its share of strategic considerations. One is temperature. Alongside the feature prompt, you can often specify a temperature, which controls the randomness of the answer. Lower temperatures give consistent responses, ideal for tasks with strict rules. Higher temperatures allow for creativity, suitable for tasks like brainstorming. Designers and engineers should collaborate to find the temperature setting that best fits the task. Another is formatting. Because of their probabilistic nature, LLMs may struggle to consistently generate outputs in a specific format. For tasks requiring structured outputs, such as JSON or tabular data, extra care (maybe even fine-tuning) must be taken.
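
For illustration, here is a minimal sketch of how temperature and output format are typically passed alongside a prompt, using the OpenAI Python SDK. The model name, prompts, and temperature value are placeholder assumptions, not a definitive implementation:

```python
# Hedged sketch: requires the openai package and an OPENAI_API_KEY in the
# environment; the model name and prompts are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0.2,  # low temperature: consistent answers for rule-bound tasks
    # a higher value (e.g. 1.0) would suit open-ended tasks like brainstorming
    response_format={"type": "json_object"},  # request structured JSON output
    messages=[
        {"role": "system", "content": "You are an AI travel assistant. Reply in JSON."},
        {"role": "user", "content": "Suggest one destination for a rainy weekend."},
    ],
)
print(response.choices[0].message.content)
```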

Example of a feature prompt for a travel recommendations agent:

  • System Prompt: You are an AI travel assistant. Your goal is to help users find travel destinations that match their preferences.
  • User Input: Preferred travel destination, budget, and travel dates.
  • Instructions: Recommend three destinations that meet the user’s criteria. Include a brief description of each destination, estimated costs, and two key highlights. Use a friendly and informative tone.

Like with system prompts, generating feature prompts is a task filled with design decisions. Designers are experienced at understanding needs and delivering fitting solutions, so we can help develop prompts that resonate with user needs. We are also trained in designing contextual solutions - ones that differ based on user characteristics - so we can be assets when building user-specific feature prompts.

Picking the right models

Models come in all shapes and sizes. Picking the right one for a feature is a foundational decision. 

The datasets they are trained on influence their strengths. For example, some are trained for general conversation (chat models), while others are trained to follow task-specific instructions (instruct models). Models also differ in their input and output modalities: some handle only text as input, while others understand audio, images, or video, and the same is true for output. Models also vary in size. Lightweight models are ideal for simple tasks where speed matters more than contextual understanding, and they tend to be much cheaper in energy and money. In contrast, heavyweight models are better for tasks requiring a higher degree of reasoning and more context.

To select the right model, consider the following:

  1. Resource constraints: how much computational power and budget is available?
  2. Input/output modalities: how do users prefer to communicate and receive information from the agent: text, audio, video?
  3. Interaction: Is the interaction a conversation or an instruction?
  4. Task complexity and context: does the task require extensive context and reasoning?
  5. Latency: how critical is speed in the UX?

The answers to the questions above undoubtedly have UX consequences. Different input and output modalities will lead to good or bad accessibility, usability, and even discoverability (the design task of making a product easy to discover for the first time). Designers can also provide an opinion on latency requirements, as well as loading solutions.
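
As a purely illustrative sketch, a team could make these trade-offs explicit by encoding them in a simple routing table. All feature and model names below are hypothetical placeholders:

```python
# Hypothetical mapping of product features to models, based on the questions
# above; in a real product this would come out of team discussion and testing.
MODEL_CHOICES = {
    # feature: (model, rationale)
    "autocomplete": ("small-instruct-model", "latency-critical, little context needed"),
    "support_chat": ("mid-size-chat-model", "conversational, moderate reasoning"),
    "trip_planner": ("large-multimodal-model", "heavy reasoning, image input"),
}

def pick_model(feature: str) -> str:
    """Return the model chosen for a feature, logging the rationale."""
    model, rationale = MODEL_CHOICES[feature]
    print(f"{feature} -> {model} ({rationale})")
    return model

pick_model("support_chat")
```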

Finding and generating fine-tuning data

For context, fine-tuning is a process you run once in a while; it does not happen every time an answer is generated. Fine-tuning adjusts a model, modifying it and making it unique compared to other models. You fine-tune a model by providing it with a large list of examples of successful outcomes - successful user interactions. For example, if your product involves customer support, you can fine-tune using transcripts of effective support interactions.

When done well, fine-tuning enhances user satisfaction and brings a competitive edge (especially with proprietary data). It can even reduce operational costs as a well fine-tuned lightweight model (cheaper) can achieve results comparable to generalist heavyweight models.

What is the general process of fine-tuning?

  1. Start by identifying what the goal is. Are you developing a consistent voice or tone, specializing your model in a particular domain, or optimizing for a specific interaction (instructional vs conversational)?
  2. Then gather your proprietary data. Is there an internal knowledge base or historical customer interactions your model could benefit from?
  3. Complement with high-quality public datasets. Platforms like Hugging Face list hundreds of datasets. Open-source books and research papers are also valuable resources here.
  4. Format the data. Fine-tuning requires a unified format that reflects real user interactions and abides by the requirements of your chosen model. This means you may need to transform a research paper into a Q&A format - good news, there are tools for that (see the sketch after this list).
  5. Train the model. Use tools like Hugging Face to fine-tune your model. 
  6. Measure success. Evaluating fine-tuning effectiveness, and model improvement in general, is challenging; we’ll discuss this later.
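
Here is a minimal sketch of step 4: formatting examples as JSONL in the chat format that several providers (e.g., OpenAI) expect for fine-tuning. The example interaction is made up:

```python
# A minimal sketch: write fine-tuning examples to a JSONL file, one complete
# (hypothetical) interaction per line, in the chat-message format.
import json

examples = [
    {
        "messages": [
            {"role": "system", "content": "You are an empathetic e-commerce support agent."},
            {"role": "user", "content": "My order arrived damaged."},
            {"role": "assistant", "content": "I'm sorry to hear that! I can ship a replacement today at no cost. Could you share your order number?"},
        ]
    },
    # ...hundreds or thousands more successful interactions
]

with open("finetune.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```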

If your AI product is robust, you’ll need to fine-tune multiple times. Why? Since different model interventions address different tasks, you’ll use more than one model, each fine-tuned with different data.

In my opinion, designers should be an integral part of the fine-tuning process. We can help define clear objectives for fine-tuning (tone, interaction style, personality…). We should also help validate datasets, ensuring the data aligns with user expectations. Finally, we should review success metrics and be heavily involved in QA.

Building filtering systems

AI agents can generate outputs that are offensive, biased, inaccurate, or even legally problematic (infringing on copyrights, for example). Filtering systems are therefore essential and will increase credibility and compliance.

There are multiple approaches to building filtering systems. Pre-generation techniques include adding instructions to the system or feature prompts that steer the AI away from sensitive topics. Fine-tuning with the right dataset is also a great pre-generation strategy. There are also post-generation techniques. Here, you analyze the AI’s output in real time and stop the generation if need be. This method involves models of another kind - text classifiers - to which you progressively feed the AI’s output to get a "safe" or "unsafe" verdict.
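
To illustrate the post-generation approach, here is a hedged sketch using a text classification model from Hugging Face. The model name is one example among many moderation models, and the label names are assumptions based on its model card:

```python
# Hedged sketch: requires the transformers package; unitary/toxic-bert is one
# example of a moderation classifier, and its label names are assumed here.
from transformers import pipeline

classifier = pipeline("text-classification", model="unitary/toxic-bert")

def is_safe(generated_text: str, threshold: float = 0.5) -> bool:
    """Return False when the classifier flags the output as toxic."""
    result = classifier(generated_text)[0]  # e.g. {"label": "toxic", "score": 0.97}
    return not (result["label"] == "toxic" and result["score"] >= threshold)

output = "Some model-generated text to check before showing it to the user."
if not is_safe(output):
    print("Response withheld: content flagged by the safety filter.")
```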

Creating thoughtful filtering and safety systems isn’t easy and requires input from diverse perspectives. Designers, engineers, product managers, and legal experts should brainstorm potential risks and establish guidelines. User interviews are also amazing for identifying real-world concerns you may have missed.

Here, having designers in the room is great because we are used to thinking through “unhappy paths”. We can help the team brainstorm potential risks and edge cases. And of course, we can also collaborate with engineers to build user feedback mechanisms, and ways to inform users when content has been filtered.

Measuring success of models

Evaluating the improvement of AI models is as challenging as it is crucial. The tech stack and methodology for doing so are just not there yet. Read Anthropic’s great piece on this topic for further details.

I see measuring the improvement of AI models as having two parts. First, you have to define the evaluation criteria. Are you trying to measure the tone of responses, task completion rates, output accuracy, or something else? Then, you pick the right evaluation methods.

  • Manual testing: This involves testing the product yourself in different scenarios. While quick, it’s not scalable or objective.
  • Third-party evaluation: If you are well funded, outsourcing the evaluation can provide labeled data at scale (companies will look through thousands of outputs and classify them). Warning: this approach requires oversight. Conduct a test before scaling.
  • User feedback systems: You can implement mechanisms that allow users to rate outputs within the product (RLHF). Example: thumbs up/down ratings.
  • Benchmark testing: You can use recognized evaluation benchmarks for specific tasks. Example: BLEU scores for language generation or F1 scores for classification. Generalist benchmarks will often fall short if you are trying to evaluate specific tasks.
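
To make the last point concrete, here is a minimal sketch of a benchmark-style check with an F1 score, using scikit-learn. The intent labels below are made-up classification results:

```python
# Minimal sketch: compare (hypothetical) ground-truth intent labels against
# model predictions on a small test set, using a macro-averaged F1 score.
from sklearn.metrics import f1_score

y_true = ["refund", "shipping", "refund", "other", "shipping"]
y_pred = ["refund", "shipping", "other", "other", "shipping"]

print(f1_score(y_true, y_pred, average="macro"))  # ~0.78
```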

Here, at a strategic level, designers can help define success criteria that users agree with. Tactically, we can also build intuitive, engaging, and bias-free user feedback mechanisms (RLHF).

Issues when building AI products

Collaboration and documentation

A big issue I see is the lack of efficient interdisciplinary collaboration: who is doing what, when, in which order. Engineers are not clearly communicating what they are building and don’t know what to ask of designers. Designers don’t understand the AI's technical constraints or potential, and aren’t sure of what to focus on.

As designers, we need to realize that design tasks are being redefined. And without an updated process to guide us, we will keep struggling to identify where we can have impact. To be fair, publicly available design guidance is inadequate. Most of it assumes that the only way to integrate AI into a product is by creating a chat interface, or that designers are only good at UI, neglecting our product-strategy capabilities.

Updating the design process for AI products

Designing for AI products requires recognizing the ongoing paradigm shift from deterministic digital solutions to open-ended digital solutions. 

Traditional customer journeys are now insufficient; we must design flexible systems that contain clusters of tasks and variations. How we address biases and risks will also need to change. And finally, designers will need to think beyond functionality and define human-like qualities for agents, such as communication style, personality, and psychology.

Here is an overview of the new tasks to add to the design process. Read the section below for an in-depth look.

  • Interventions & Postures: Define where and how the AI will interact within the product, considering its role and user expectations.
  • Knowledge Scope: Outline what the AI knows and its sources of reference.
  • Feature Scope: Clearly articulate what the AI can do using structured definitions, such as: "The agent will [perform task] by [processing/reasoning needed] and produce [expected output] based on [user inputs]."
  • Boundaries & limitations: Identify what the AI does not know or cannot do and decide how it will communicate its limitations to users.
  • Communication Style: Focus on tone, phrasing, and the delivery of information.
  • Conversation Style: Plan for engagement dynamics and how the AI interacts with users over time.
  • Personality & Psychology: Define traits such as empathy, optimism, or playfulness that shape the AI's interactions.
  • Safety & Biases: Proactively address how the AI will handle offensive, biased, or inaccurate content.

A template for designers crafting the purpose and personality of AI agents

Step 1: Interventions & Postures


In step 1, designers identify the product areas where AI will be involved. Then they classify the postures of these interventions (sovereign, transient, or auxiliary). This first step gives the team an overview. Also, AI functionalities (and the work required to build them) differ greatly depending on the posture of the AI intervention.

Step 2: Target audience


Here, designers outline users’ unique demographics, goals, problems, characteristics… and then identify key tasks users perform in their day-to-day that relate to the AI product being built. The outcome of this work is a set of simplified user persona documents, which will be fed to the model so that its output fits user needs more appropriately.

Step 3: Knowledge Scope


Then, designers define the agent’s knowledge boundaries by thinking through things like its domain expertise, authoritative sources, and preferred jargon. You can’t ask an AI to do something about a topic it knows nothing about. So this step creates a theoretical knowledge space inside which the team can build features. It will also inform how system prompts are created. Finally, engineers will use this to find the right fine-tuning data.

Step 4: Feature Scope


In step 4, designers break down key functionalities, specifying how the AI should handle inputs, what processing is necessary, and what output type is needed. Engineers will use this extensively, among other things, to pick the right model and build the right prompts.
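
As a hypothetical sketch, a feature-scope definition could even be captured as structured data that engineers consume directly, following the "the agent will [perform task] by [processing/reasoning needed] and produce [expected output] based on [user inputs]" structure mentioned earlier:

```python
# Hypothetical sketch of a feature-scope spec as a small data structure.
from dataclasses import dataclass, field

@dataclass
class FeatureScope:
    task: str          # what the agent will perform
    processing: str    # the processing or reasoning needed
    output: str        # the expected output
    inputs: list[str] = field(default_factory=list)  # the user inputs it relies on

recommend_trips = FeatureScope(
    task="recommend three travel destinations",
    processing="matching destinations against the user's budget and dates",
    output="a short list with descriptions, estimated costs, and highlights",
    inputs=["preferred destination", "budget", "travel dates"],
)
```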

Step 5: Boundaries & limitations


Defining what the agent does not know and cannot do is essential. This step is also about communication: how the agent communicates its boundaries and limitations matters greatly for user satisfaction. Finally, our world is not black and white, so we also need to define how the agent should handle ambiguity and uncertainty. This work will feed into the prompts and the dataset.

Step 6: Communication Style


AI agents often possess human-like capabilities, so designers should craft the communication itself: allowed humor, straightforwardness, modernity of phrasing, helpfulness, formal vs. casual register… The outcome of this work is a detailed communication style guide that keeps the agent on brand. Among other things, it will help engineers pick the right dataset for fine-tuning.

Step 7: Conversation Style


If communication is how you speak, conversation is how the exchange unfolds. Here, designers focus on interaction style by specifying the agent's approach to proactiveness, feedback giving, error handling, confidence-building… This work is used similarly to the communication style guide.

Step 8: Personality & Psychology


This part is optional and should only be worked on for generalist agents that closely mimic human behaviors. Here, questions include: Should the AI be more realistic or optimistic in its responses? Validate user emotions or maintain emotional distance? Be more empathetic or neutral? Be more analytical or intuitive? Be playful or serious? This work is a continuation of the communication and conversation style guides. It will also inform prompts and fine-tuning data.

Step 9: Safety & Biases


This is a crucial step. It involves outlining principles for fairness, building safeguards against harmful content, ensuring inclusion, and establishing ethical guidelines for ambiguous scenarios. A comprehensive safety framework will be used by many people on the team to design and build filtering systems. Your legal team will most likely want to collaborate on it.

Conclusion: how to use the answers from the AI design template

My AI Agent Design Template is a tool to guide designers in crafting purposeful and personality-rich AI agents. By addressing the template’s core sections (interventions, target audience, knowledge and feature scopes, limitations, communication and conversation styles, personality, and safety), I hope you and your team can build AI systems that are both functional and user-centered, eventually leading you to integrate AI profitably into your product. Revenue is the end goal, after all.

To make the most of the template:

  • Collaborate across teams. The template isn’t just for designers. Share it with engineers, product managers, and stakeholders to foster collaboration.
  • Iterate and refine. AI systems evolve over time, your users' needs will change, and the AI tech stack will get better. So create a feedback loop - a way to know if what you built is good - and revisit sections of the template over time.
  • Leave room for creativity. While the template provides structure, it also allows room for experimentation. Try personality traits or communication styles that align with your brand, while staying mindful of technical limitations and ethical considerations.

Thank you for taking the time to read this. I hope the template and insights provided help you in designing impactful and user-centered AI agents. Your feedback is incredibly valuable—if you have questions, suggestions, or ideas to share, feel free to reach out.
