A good system prompt is not a magic paragraph. It is an interface contract between your application and a language model: what role the model should play, what it must pay attention to, what it must never do, and what shape the output should take. This reference organizes system prompt examples by four common development use cases: customer support, information extraction, coding assistance, and retrieval-augmented generation (RAG). The goal is practical comparison. You will see how the structure changes by scenario, which constraints matter most, and how to choose a prompt pattern you can test, version, and revisit as models, policies, and product requirements change.
Overview
If you build LLM-powered features, the system prompt usually carries more responsibility than any other prompt layer. It sets the model’s role, defines boundaries, and creates stable expectations for downstream parsing and evaluation. User prompts still matter, and few-shot prompting examples can improve consistency, but the system prompt is the anchor that keeps the application behaving like a product instead of a chat toy.
A useful way to think about prompt engineering is to treat the prompt like a function signature. The model needs clear inputs, explicit output expectations, and handling for edge cases. Developer guidance consistently points to the same principles: be specific, define the task clearly, shape the output so code can consume it, and iterate rather than expecting one perfect prompt on the first pass. That is especially true for system prompts, because they are rarely universal. A support assistant, an extraction service, a coding assistant, and a RAG assistant each need different priorities.
Across use cases, strong system prompts tend to include five ingredients:
- Role definition: who the model is in this application.
- Objective: the main task to complete.
- Rules and boundaries: what the model must avoid, escalate, or refuse.
- Output contract: format, tone, schema, and response length.
- Fallback behavior: what to do when context is missing, ambiguous, or unsafe.
Here is a reusable base template before we compare by use case:
You are [role] for [product or workflow].
Your job is to [primary objective].
Follow these rules:
1. [highest-priority constraint]
2. [second constraint]
3. [formatting or safety rule]
4. If information is missing or uncertain, [fallback behavior].
Output requirements:
- Format: [JSON / bullets / markdown / plain text]
- Include: [required fields]
- Do not include: [forbidden content]
- Keep responses [brief / detailed / under N words]
That base works because it is explicit without being bloated. From there, the right comparison question is not “Which prompt is best?” but “Which prompt structure matches the job?” For broader prompt engineering guidance, our Prompt Engineering Best Practices for Developers: A Living Checklist is a useful companion.
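One way to keep the base template consistent across a prompt library is to store its parts as data and render the text from them. The following is a minimal sketch under that assumption; `PromptSpec`, `render_system_prompt`, and all field names are hypothetical and not taken from any library.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Structured form of the base template (all field names are illustrative)."""
    role: str
    product: str
    objective: str
    rules: list[str]
    fallback: str
    output_format: str
    include: list[str] = field(default_factory=list)
    exclude: list[str] = field(default_factory=list)
    length: str = "brief"


def render_system_prompt(spec: PromptSpec) -> str:
    """Render the spec into the plain-text layout of the base template."""
    # The fallback rule is always appended as the last numbered rule.
    rules = spec.rules + [f"If information is missing or uncertain, {spec.fallback}."]
    lines = [
        f"You are {spec.role} for {spec.product}.",
        f"Your job is to {spec.objective}.",
        "Follow these rules:",
    ]
    lines += [f"{i}. {rule}" for i, rule in enumerate(rules, start=1)]
    lines += [
        "Output requirements:",
        f"- Format: {spec.output_format}",
        f"- Include: {', '.join(spec.include) or 'n/a'}",
        f"- Do not include: {', '.join(spec.exclude) or 'n/a'}",
        f"- Keep responses {spec.length}",
    ]
    return "\n".join(lines)
```

Keeping the parts separate makes it easier to change one variable at a time, for example swapping only the fallback rule while the rest of the prompt stays fixed.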
How to compare options
Before borrowing any prompt template, compare options on behavior, not wording style. Two prompts can look similar and still fail in different ways. In practice, developers should review system prompts against a small set of criteria.
1. Instruction clarity
The prompt should tell the model what it is doing in one sentence and what success looks like in another. Vague instructions like “be helpful” or “answer accurately” are not wrong, but they are too weak to drive reliable behavior in production.
2. Boundary handling
Every useful system prompt defines what the model should do when it does not know, should not answer, or lacks enough context. In support workflows, that may mean escalating. In extraction workflows, it may mean returning null. In RAG, it may mean saying the answer is not present in the retrieved material rather than inventing one.
3. Output structure
If your application parses the response, the prompt must state a schema. Free-form prose may be acceptable for chat, but extraction, automation, and evaluation are much easier with stable formatting. This is where prompt templates become operational assets rather than examples.
4. Cost discipline
Longer prompts are not automatically better. Detailed prompts can improve consistency, but they also increase token use and sometimes introduce conflicting instructions. Keep only the instructions that change behavior. Remove decorative language.
5. Testability
A system prompt should be easy to evaluate with a repeatable test set. That means you can define pass and fail conditions. For example: “Returns valid JSON,” “Never answers outside retrieved documents,” or “Includes escalation only for account-specific requests.” This is where prompt engineering overlaps with LLM evaluation.
6. Portability across models
Different model families can respond differently to the same prompt. The safest evergreen approach is to use plain instructions, avoid hidden assumptions, and test on the actual model you plan to deploy. If you support multiple vendors, treat prompt portability as a requirement rather than an afterthought.
Use this quick comparison when reviewing system prompt examples:
- Support: optimize for tone, policy boundaries, escalation, and concise answers.
- Extraction: optimize for schema fidelity, null handling, and low verbosity.
- Coding: optimize for correctness, assumptions disclosure, and code-only or explanation-first modes.
- RAG: optimize for source grounding, citation behavior if needed, and strict “do not use outside knowledge” rules.
Feature-by-feature breakdown
This section compares four practical prompt patterns. Each includes a base system prompt example, what it is good at, and where it usually breaks.
Customer support prompt
Support assistants need to be helpful without pretending they can verify account details, override policy, or improvise risky instructions. The main job of the system prompt is to balance tone with operational limits.
You are a customer support assistant for a software product.
Your job is to answer product and policy questions clearly, briefly, and politely.
Rules:
1. Give step-by-step help for known product workflows.
2. Do not invent account-specific information, billing details, or policy exceptions.
3. If a request requires human verification, account access, or a policy decision, say so and recommend escalation.
4. If the product behavior is unclear from the provided context, say you are not certain.
5. Keep responses concise and action-oriented.
Output requirements:
- Use plain language.
- If giving instructions, use numbered steps.
- If escalating, explain why in one sentence.
Why this works: it sets a narrow role, makes escalation explicit, and reduces hallucinated certainty.
Common failure mode: teams often forget to specify when the assistant must stop being helpful and hand off. That creates overconfident answers. If you operate role-based agents, it is also worth reviewing attack-surface issues in When Your Chatbot Plays a Character: Understanding the Attack Surface and Safety Risks of Personas.
Information extraction prompt
Extraction prompts are less about eloquence and more about deterministic formatting. This is the pattern to use when you want the model to act like a parser for messy text.
You are an information extraction engine.
Extract the requested fields from the input text.
Rules:
1. Return only the requested fields.
2. If a field is missing, return null.
3. Do not infer facts that are not clearly stated.
4. Preserve exact values where possible.
5. Output valid JSON only.
Required schema:
{
  "person_name": string | null,
  "organization": string | null,
  "email": string | null,
  "phone": string | null,
  "summary": string | null
}
Why this works: it makes non-inference and null handling explicit. That improves reliability for downstream systems and avoids “best guess” behavior.
Common failure mode: prompts that ask for JSON but do not define what to do with missing values. The model then fills gaps creatively. In extraction work, creativity is usually a bug.
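Because the extraction prompt promises a fixed schema, the application can verify each reply before using it. The sketch below uses only the standard library and assumes the five field names from the schema above; `validate_extraction` is a hypothetical helper, not part of any framework.

```python
import json

# Field names mirror the schema in the extraction prompt above.
REQUIRED_FIELDS = ("person_name", "organization", "email", "phone", "summary")


def validate_extraction(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for name in REQUIRED_FIELDS:
        if name not in data:
            problems.append(f"missing field: {name}")
        elif data[name] is not None and not isinstance(data[name], str):
            problems.append(f"field {name} must be a string or null")
    for name in data:
        if name not in REQUIRED_FIELDS:
            problems.append(f"unexpected field: {name}")
    return problems
```

A check like this covers shape only. It cannot tell whether a value was inferred rather than stated, so the non-inference rule still needs test cases with known answers.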
Code assistant prompt
A code assistant prompt should reflect whether your product wants direct code generation, review, debugging help, or explanation. The biggest source of inconsistency is mixing too many coding roles into one system prompt.
You are a coding assistant for professional developers.
Help with debugging, implementation, refactoring, and code explanation.
Rules:
1. Prefer correct, minimal solutions over clever ones.
2. State assumptions when requirements are incomplete.
3. If you are not sure about an API or library behavior, say so.
4. When modifying code, preserve the user's stated constraints.
5. Do not include unrelated refactors.
Output requirements:
- For fixes, provide a short explanation followed by code.
- Use fenced code blocks.
- If there are tradeoffs, list them briefly.
Why this works: it aligns the model with professional development norms: correctness, explicit assumptions, and scoped changes.
Common failure mode: code assistants can overwhelm teams with overly broad rewrites. For operational concerns around AI-generated code, see Auditing AI-Generated Code at Scale and Managing Code Overload.
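The output requirements for fixes ("a short explanation followed by code" in fenced blocks) can be checked mechanically. This is a rough sketch, assuming replies use triple-backtick fences; `check_fix_response` is a hypothetical name, and the fence marker is built from a string only so the example contains no literal fence of its own.

```python
import re

TICKS = "`" * 3  # the fence marker, built indirectly to keep this example printable
# Opening fence with an optional language tag, then the body, then the closing fence.
FENCE = re.compile(re.escape(TICKS) + r"[\w+-]*\n(.*?)" + re.escape(TICKS), re.DOTALL)


def check_fix_response(text: str) -> list[str]:
    """Return format problems for a 'fix' reply; an empty list means it passed."""
    match = FENCE.search(text)
    if match is None:
        return ["no fenced code block found"]
    problems = []
    if not text[: match.start()].strip():
        problems.append("no explanation before the first code block")
    if not match.group(1).strip():
        problems.append("code block is empty")
    return problems
```

This verifies layout, not correctness. Whether the code actually fixes the bug still has to be judged by tests or review.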
RAG system prompt
RAG prompt engineering has a different center of gravity: grounding. The system prompt must tell the model how to use retrieved context, what to do when context is incomplete, and whether outside knowledge is allowed.
You are a retrieval-grounded assistant.
Answer the user's question using only the provided retrieved context.
Rules:
1. Base your answer only on the retrieved documents.
2. If the answer is not supported by the context, say the information is not available in the provided material.
3. Do not use unstated outside knowledge.
4. If multiple documents conflict, note the conflict briefly.
5. Be concise and cite document labels when available.
Output requirements:
- Answer in plain language.
- Include supporting document labels in parentheses when relevant.
- If insufficient evidence exists, say so clearly.
Why this works: it gives the model a clear policy for insufficiency and conflict, two conditions that commonly trigger hallucination in RAG systems.
Common failure mode: teams write a retrieval pipeline but leave the system prompt too permissive, so the model blends retrieval with prior knowledge. If your environment is regulated or high-risk, our Governance-Ready RAG guide goes deeper.
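Citing document labels only works if the labels appear in the context the model sees, and it becomes testable once the application checks which labels come back. The sketch below assumes labels of the form `doc-1`, `doc-2`, and so on; that naming scheme and both helper functions are illustrative choices rather than a standard.

```python
import re

LABEL = re.compile(r"\bdoc-\d+\b")  # assumed label format, e.g. doc-3


def build_context(documents: dict[str, str]) -> str:
    """Join retrieved passages, prefixing each with its label in brackets."""
    return "\n\n".join(f"[{label}]\n{text}" for label, text in documents.items())


def cited_labels(answer: str) -> set[str]:
    """Collect labels that appear inside parentheses in the answer."""
    cited = set()
    for group in re.findall(r"\(([^()]*)\)", answer):
        cited.update(LABEL.findall(group))
    return cited


def unknown_citations(answer: str, documents: dict[str, str]) -> set[str]:
    """Return cited labels that were not part of the retrieved set."""
    return {label for label in cited_labels(answer) if label not in documents}
```

A non-empty result from `unknown_citations` signals a fabricated source. An empty result only shows that the labels are real; it does not prove the cited passage supports the claim.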
Across all four examples, note the consistent pattern: role, task, constraints, output contract, fallback. That consistency is what makes system prompt examples reusable.
Best fit by scenario
If you are choosing a starting point, match the prompt pattern to your operational risk.
Choose the support pattern when
- the assistant interacts with end users,
- tone and escalation matter,
- policy limits must be respected,
- and concise guided answers are more useful than exhaustive ones.
This is the best fit for product support bots, internal IT help desks, and customer-facing FAQ agents.
Choose the extraction pattern when
- the output feeds code, automation, or analytics,
- valid JSON matters more than natural language,
- you need predictable handling of missing data,
- and the model should behave more like a structured text processor than a conversational assistant.
This is a strong pattern for entity extraction, document intake, ticket triage, and lightweight NLP tooling.
Choose the coding pattern when
- developers need help implementing or reviewing code,
- scope control matters,
- you want assumptions made explicit,
- and explanations should support maintainability rather than impressiveness.
This works well in IDE assistants, pull request helpers, and internal engineering copilots.
Choose the RAG pattern when
- answers must be grounded in a known corpus,
- you need a clean separation between retrieved evidence and model priors,
- compliance or governance requires source-bounded answers,
- or your domain changes often enough that static model knowledge is not enough.
This is usually the right choice for policy assistants, documentation chat, and knowledge base search.
In some products, you may combine patterns through prompt chaining. For example, a support assistant can first use an extraction step to identify intent and account-sensitive requests, then route to a RAG-backed answer generator with stricter grounding. That is often better than trying to force one giant system prompt to handle every mode.
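The chaining idea can be sketched as a small router: a first call classifies the message, and the application decides which prompt handles it next. Everything here is an assumption for illustration, including the triage prompt wording, the `account_sensitive` field, and the `call_model` callable that stands in for whatever client your stack uses.

```python
import json
from typing import Callable

# Hypothetical system prompt for the first (classification) step.
TRIAGE_PROMPT = (
    "You are a triage engine for support messages.\n"
    "Output valid JSON only with two fields:\n"
    '{"intent": string, "account_sensitive": boolean}'
)

ModelFn = Callable[[str, str], str]  # (system_prompt, user_message) -> raw reply


def route(message: str, call_model: ModelFn) -> str:
    """Return "escalate" or "rag_answer" for a support message."""
    raw = call_model(TRIAGE_PROMPT, message)
    try:
        triage = json.loads(raw)
    except json.JSONDecodeError:
        return "escalate"  # fail closed when the triage output is unparseable
    # Only an explicit false lets the message through to the RAG step.
    if not isinstance(triage, dict) or triage.get("account_sensitive") is not False:
        return "escalate"
    return "rag_answer"


def fake_model(system_prompt: str, user_message: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    sensitive = "refund" in user_message.lower()
    intent = "billing" if sensitive else "how_to"
    return json.dumps({"intent": intent, "account_sensitive": sensitive})
```

With the stand-in model, `route("How do I export a report?", fake_model)` takes the RAG path and `route("Please refund my last invoice", fake_model)` escalates. Failing closed on malformed triage output is a design choice that favors a human handoff over a risky automated answer.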
Also remember that prompt safety is not only about prohibited content. It includes manipulation resistance, persona control, and refusal behavior. If your agent adopts roles or characters, review Prompt Patterns to Limit Character Exploits and Designing Prompts to Combat AI Sycophancy in Enterprise Workflows.
When to revisit
The best system prompt is temporary. You should expect to revisit it when the surrounding system changes. That is not a failure of prompt engineering; it is part of operating an AI feature responsibly.
Re-test your prompt library when any of the following happens:
- You switch models or providers. Prompt behavior can shift even when the text stays the same.
- You add tools, retrieval, or function calling. The prompt may need new rules for when to answer directly versus invoke a tool.
- Your product policies change. Support, compliance, and escalation instructions should track the real business process.
- You see drift in evaluation results. If JSON validity, grounding, or support containment worsens, revisit the prompt before adding more complexity.
- New failure modes appear in production. Real conversations reveal edge cases that sandbox testing misses.
- New options appear. As model features and policies evolve, a simpler or more portable prompt structure may become possible.
A practical review workflow looks like this:
- Create a small benchmark set for each prompt type: ten to thirty representative cases are enough to start.
- Define pass criteria such as schema validity, correct escalation, grounded answers, or constrained code edits.
- Version your system prompts like application code.
- Change one variable at a time: wording, output format, fallback rule, or model.
- Log failures by category so you can see whether the issue is ambiguity, formatting, hallucination, or policy handling.
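The workflow above can be sketched as a small harness: each case carries its own pass check and failure category, and each report is tagged with a hash of the prompt text so results map to an exact version. The `Case` class, the report layout, and the `call_model` callable are hypothetical; substitute your own client and storage.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    user_input: str
    check: Callable[[str], bool]  # returns True when the reply passes
    category: str  # failure bucket, e.g. "formatting" or "policy"


def prompt_version(system_prompt: str) -> str:
    """Short content hash so results can be tied to an exact prompt text."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:12]


def run_benchmark(
    system_prompt: str,
    cases: list[Case],
    call_model: Callable[[str, str], str],
) -> dict:
    """Run every case once and count failures per category."""
    failures: Counter = Counter()
    for case in cases:
        reply = call_model(system_prompt, case.user_input)
        if not case.check(reply):
            failures[case.category] += 1
    failed = sum(failures.values())
    return {
        "version": prompt_version(system_prompt),
        "passed": len(cases) - failed,
        "failed": failed,
        "failures_by_category": dict(failures),
    }
```

Comparing two reports that differ in only one variable (prompt wording or model) shows whether a change helped, and the per-category counts point to whether the problem is formatting, hallucination, or policy handling.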
If you do only one thing after reading this article, do this: turn your favorite prompt examples into a tested prompt catalog. Label each one by use case, keep the structure consistent, and attach a small evaluation set. That makes your prompt engineering process repeatable, easier to hand off, and easier to update when model capabilities, policies, or product requirements shift.
System prompt examples are most valuable when they behave like living engineering assets, not snippets copied into a playground. Start with the smallest pattern that matches the job. Add only the constraints that solve observed failures. Then revisit the prompt when your model, retrieval stack, tooling, or governance needs change.