Prompt Tips · Feb 13, 2026 · 9 min read

How to Make AI Creative (Without Begging It to "Be Creative")

A practical, evidence-backed playbook for getting more novel, surprising, and useful outputs from AI using structure, sampling, and evaluation.


You've tried it: "Be more creative."

The model nods politely and hands you the same warmed-over listicle ideas, the same startup names, the same "In a world where…" story opener. It's not that the AI can't be creative. It's that you're asking for "creativity" as a vibe instead of designing for it as an outcome.

Here's the thing I wish more people said out loud: creativity in LLMs is usually an engineering problem, not a personality problem.

So let's treat it like one.


Creativity is a system property, not a magic word

Most LLM behavior is downstream of two levers you control: how you constrain the search space and how you sample inside it.

On the sampling side, decoding choices (temperature, top-p, etc.) literally reshape the next-token distribution. Research on decoding shows there's a measurable quality-diversity tradeoff, and common sampling knobs are blunt instruments that balance coherence against variety by smoothing or truncating probabilities [5]. That's why "turn the temperature up" sometimes helps, and sometimes just makes the model sloppy.
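None of this is mysterious at the code level. Here's a minimal, self-contained sketch (toy logits, not a real model) of how temperature scaling and top-p truncation reshape a next-token distribution:

```python
import math

def sample_probs(logits, temperature=1.0, top_p=1.0):
    """Reshape a next-token distribution with temperature, then truncate with top-p."""
    # Temperature scaling: <1 sharpens the distribution, >1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p (nucleus) truncation: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize what's left.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.5, 0.1]  # toy next-token logits
print(sample_probs(logits, temperature=0.5))            # sharper: mass concentrates on token 0
print(sample_probs(logits, temperature=1.5, top_p=0.9)) # flatter, with the tail cut off
```

Notice that both knobs act on the whole distribution at once, which is exactly why they're blunt: they can't tell a genuinely novel continuation from a sloppy one.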

Even more interesting: newer work argues there's a geometric failure mode where probability mass crowds into a tight region of embedding space. In effect, the model keeps picking from a cluster of near-synonyms instead of exploring distinct directions. That crowding correlates negatively with reasoning success, and a geometry-aware sampler can improve robustness and diversity without extra model calls [5]. You don't need to implement their sampler to benefit from the idea: if you want creative output, you need mechanisms that push the model out of local ruts.
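Their sampler is out of scope here, but even a greedy diversity filter over candidate embeddings captures the spirit: keep outputs that are far apart, drop the near-synonyms. A minimal sketch, assuming a hypothetical `embed` function you'd supply:

```python
import math

def diversity_filter(candidates, embed, k=3):
    """Greedy max-min selection: keep candidates that are far apart.

    embed is a stand-in for a real embedding function (hypothetical here);
    it maps a candidate string to a vector.
    """
    def cos_dist(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return 1 - dot / norm

    vecs = {c: embed(c) for c in candidates}
    chosen = [candidates[0]]  # seed with the first candidate
    while len(chosen) < k and len(chosen) < len(candidates):
        # Pick the candidate whose *nearest* chosen neighbor is farthest away.
        rest = [c for c in candidates if c not in chosen]
        chosen.append(max(rest, key=lambda c: min(cos_dist(vecs[c], vecs[s]) for s in chosen)))
    return chosen

# Toy 2-D "embeddings": a and b are near-synonyms, c points somewhere else.
toy = {"a": [1.0, 0.0], "b": [0.99, 0.1], "c": [0.0, 1.0]}
print(diversity_filter(["a", "b", "c"], toy.get, k=2))  # keeps a, then c
```

The point isn't this exact heuristic; it's that "push out of the cluster" can be a mechanical step in your pipeline rather than a hope.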

On the prompting side, "creative" outputs don't appear because you asked nicely. They appear because your prompt forces the model to search across diverse frames, then select and refine.

That "search → select → refine" loop is a recurring pattern in real human-AI collaboration. In deep scientific work, successful teams rarely get breakthroughs in one shot; they decompose, iterate, run adversarial review, correct errors, and keep going [2]. Creativity isn't one answer. It's a workflow.

And if you want consistent results, you can't just eyeball it. Prompt variants should be compared systematically. A tournament-style evaluation of prompts (pairwise judging, rankings) was shown to cleanly separate strong prompt templates from weak ones, even when they're all "reasonable" at first glance [1]. Creativity benefits from this too: you can evaluate novelty, usefulness, and surprise as explicit dimensions, not vibes.


My playbook for making an AI "creative"

I use three moves: structured divergence, controlled sampling, and ruthless evaluation.

First, I create divergence on purpose. Not "give me 10 ideas," but "give me ideas from different generative lenses." You'll notice this looks a lot like scaffolding in education prompts: establish a persona, manage the context, then ask for alternative approaches [1]. Those patterns aren't just for teaching. They're for forcing variation.

Second, I control sampling and variation without losing coherence. If you crank temperature and walk away, you get chaos. Instead, I'll keep generation moderately stochastic, but I'll force diversity through the prompt's structure: multiple categories, multiple constraints, multiple formats. That keeps the model "creative" even at conservative settings.

Third, I evaluate outputs like an engineer. I pick winners, feed back what worked, and rerun. The Gemini research case studies call this "iterative refinement" and "selection and refinement": the model can generate lots of candidates; humans should filter and steer [2]. That's not optional. That's the whole job.
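The loop itself is simple bookkeeping. Here's a minimal sketch of search → select → refine, with placeholder `generate` and `score` functions standing in for the model call and the human (or model) judge; both are hypothetical stubs, not real APIs:

```python
import random

def generate(topic, n, seed=None):
    """Placeholder for a model call: returns n candidate ideas (stubs here)."""
    rng = random.Random(seed)
    return [f"{topic} idea #{rng.randint(0, 9999)}" for _ in range(n)]

def score(idea):
    """Placeholder for a judge (human or model): rate an idea in [0, 1)."""
    return random.Random(idea).random()  # deterministic per idea, for the sketch

def search_select_refine(topic, rounds=3, pool=8, keep=2):
    survivors = []
    for r in range(rounds):
        # Search: generate a fresh pool alongside last round's winners.
        candidates = survivors + generate(topic, pool - len(survivors), seed=r)
        # Select: keep only the strongest few.
        survivors = sorted(candidates, key=score, reverse=True)[:keep]
        # Refine: in a real loop, feed the winners back into the next prompt.
    return survivors

print(search_select_refine("onboarding flow"))
```

Swap the stubs for a real model call and a real rubric and you have the workflow the case studies describe: generate wide, filter hard, iterate.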


Practical prompts you can steal

Below are a few prompts I actually use. They're written to force novelty through structure.

1) The "creative search space" prompt (divergence with constraints)

Use this when you're brainstorming anything: product ideas, stories, features, naming, marketing angles.

You are a creative director and a critical editor.

Task: Generate 24 ideas for: [TOPIC].

Rules:
1) Split ideas into 6 lenses (4 ideas each):
   - Contrarian (go against the default assumption)
   - Cross-domain transfer (borrow from an unrelated industry)
   - Constraint-driven (assume a hard constraint: time/money/legal/UX)
   - Extreme user (design for a weird edge case)
   - Reverse goal (optimize for the opposite of what people want, then flip it)
   - "Tool + Human" workflow (assume AI is a collaborator, not an autopilot)

2) Each idea must be 2 sentences:
   Sentence 1: the core concept.
   Sentence 2: why it might work + the main risk.

3) Avoid generic phrasing. If an idea sounds common, replace it.

Output as numbered items grouped by lens.

Why this works: you're combining the "alternative approaches" pattern with explicit framing and risk tradeoffs. You're not asking for creativity; you're demanding varied mechanisms for generating ideas [1], and then giving yourself selection hooks [2].

2) The metaphor forcing function (great for explanation + marketing)

This one is basically a "diversity by domains" hack. The community version that inspired it is a metaphor generator prompt that forces multiple metaphor domains [6]. I'm including it because it's a clean example of structure-induced creativity.

Explain [CONCEPT] using 5 distinct metaphors from 5 different domains:
Cooking, Architecture, War, Biology, Music.

For each metaphor:
- Metaphor (one line)
- Explanation (2-3 lines)
- Where it breaks (one line: what the metaphor fails to capture)

Keep it punchy and specific. No filler.

3) The "adversarial novelty" loop (generate, then attack, then regenerate)

This is borrowed from how researchers get models to do deeper critique: initial answer, self-critique, revised answer [2]. It's also how you keep "creative" from turning into "nonsense."

Generate 12 bold ideas for [TOPIC]. Then do this loop:

Step A: For each idea, write the harshest critique (market, feasibility, ethics, or execution).
Step B: Improve the idea to survive the critique while keeping it novel.
Step C: Rank the improved ideas by (1) novelty (2) usefulness (3) plausibility.

Output all steps.

4) The prompt tournament (stop arguing; measure)

If your team keeps debating which prompt is "more creative," run a lightweight tournament like the one used in prompt evaluation research: pairwise comparisons, judges pick a winner [1]. You can even make the judge another model (carefully), but I prefer at least one human in the loop.

We are testing prompt variants for creative output.

Generate 6 responses to the same task using 6 different prompt templates (A-F).
Then compare them pairwise and choose winners using this rubric:
- Novelty: not the obvious answer
- Usefulness: could we ship/use it?
- Coherence: internally consistent
- Specificity: not generic

Return:
1) A ranked list of templates
2) A short note on what made the top 2 win
3) One merged "best of both" template
Task: [TASK]
Templates:
A: ...
B: ...
C: ...
D: ...
E: ...
F: ...

This turns creativity into something you can iterate on, not a mood you chase.
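The bookkeeping for a tournament like this is tiny. Here's a minimal sketch that ranks templates by win count over all pairwise comparisons, with `judge` as a stand-in for your human or model judge:

```python
from itertools import combinations
from collections import Counter

def run_tournament(outputs, judge):
    """Rank prompt templates by pairwise wins.

    outputs: dict mapping template name -> its generated response.
    judge:   callable taking two responses, returning 0 if the first wins, 1 otherwise.
    """
    wins = Counter({name: 0 for name in outputs})  # ensure 0-win entries appear
    for a, b in combinations(outputs, 2):
        winner = a if judge(outputs[a], outputs[b]) == 0 else b
        wins[winner] += 1
    return [name for name, _ in wins.most_common()]

# Toy judge for illustration only: prefer the longer response.
outputs = {"A": "short", "B": "a much more specific answer", "C": "medium length"}
ranking = run_tournament(outputs, judge=lambda x, y: 0 if len(x) > len(y) else 1)
print(ranking)  # B first with this toy judge
```

In practice the `judge` is where the rubric lives (novelty, usefulness, coherence, specificity), and that's exactly the part worth keeping a human in.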


The big takeaway I keep coming back to

If you want AI to be creative, don't ask it to "be creative." Build a process that produces creativity: diversify intentionally, sample with care, then evaluate and refine.

When you do that, you stop getting "vanilla answers" and start getting usable weirdness: the kind that leads to real product decisions, not just entertaining outputs.


References

Documentation & Research

  1. LLM Prompt Evaluation for Educational Applications - arXiv (Holmes et al., 2026) http://arxiv.org/abs/2601.16134v1
  2. Accelerating Scientific Research with Gemini: Case Studies and Common Techniques - arXiv (Woodruff et al., 2026) http://arxiv.org/abs/2602.03837v1
  3. Iconix: Controlling Semantics and Style in Progressive Icon Grids Generation - arXiv (Sun et al., 2026) http://arxiv.org/abs/2602.00738v1
  4. AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research - arXiv (AgentCPM Team, 2026) https://arxiv.org/abs/2602.06540
  5. Decoding in Geometry: Alleviating Embedding-Space Crowding for Complex Reasoning - arXiv (Yang et al., 2026) https://arxiv.org/abs/2601.22536

Community Examples

  6. The "Metaphor Generator" prompt: Forces the AI to use 5 distinct, high-concept metaphors to explain any topic - r/PromptEngineering https://www.reddit.com/r/PromptEngineering/comments/1qp31az/the_metaphor_generator_prompt_forces_the_ai_to/
  7. Why AI Sounds Boring (and a Fix in Plain Sight) - r/PromptEngineering https://www.reddit.com/r/PromptEngineering/comments/1qyshl3/why_ai_sounds_boring_and_a_fix_in_plain_sight/
Ilia Ilinskii

Founder of Rephrase-it. Building tools to help humans communicate with AI.
