Understanding Gemini's thinking levels, multimodality, and Search Grounding for better prompt results.
Gemini is a different beast. I spent some time thinking it works like ChatGPT, and kept getting weird results. It doesn't. It has its own logic, and once you get it, things click.
The latest version - Gemini 3 Pro Preview from December 2025 - introduced this "thinking levels" system. Basically, you can tell the model how hard to think. Sounds gimmicky, but it's actually useful.
Here's what I learned works.
A few things that threw me off at first:
Thinking Levels instead of thinking budget - you pick a level (minimal, low, medium, high) and the model adjusts how deeply it reasons. More on this below.
Native multimodality - Gemini actually understands images, video, and audio at a deep level. Not just "describe this image" but actually working with the content.
Search Grounding - it can check facts through Google Search before answering. Useful for anything with current data.
Media resolution settings - you can optimize how many tokens different media types consume.
This is actually pretty clever. Instead of some abstract "thinking budget" number, you just pick a level:
| Level | When I Use It |
|---|---|
| minimal | Quick chat stuff, simple questions |
| low | Basic formatting, simple instructions |
| medium | Most everyday tasks |
| high | Complex analysis, math, reasoning |
The default is "high" which is often overkill. For simple stuff, use "minimal" or "low" - faster responses, fewer tokens burned.
Google says XML or Markdown both work. I've found XML works better for complex stuff:
<role>
You are a helpful assistant specializing in [domain].
</role>
<constraints>
1. Be objective
2. Cite sources
3. [Other rules]
</constraints>
<context>
[User data goes here]
</context>
<task>
[What you actually want]
</task>
The key insight: Gemini treats stuff inside <context> as data to analyze, not instructions to follow. This matters for security and for getting better results when you're feeding it user content.
Google recommends keeping temperature at 1.0. I tried lowering it once thinking it would make responses more consistent. Bad idea. The model started looping and math tasks got worse. Just leave it at 1.0.
| Content Type | Best Setting | Tokens Used |
|---|---|---|
| Images | media_resolution_high | ~1120 tokens |
| PDFs | media_resolution_medium | ~560 tokens |
| Video | media_resolution_low | ~70 tokens/frame |
For PDFs, don't use high - medium is the sweet spot. Video eats a lot of tokens so plan accordingly.
<role>
You are a professional e-commerce product photographer and marketing expert.
</role>
<task>
Analyze this product photo and provide:
1. Three specific improvements for the lighting
2. Composition suggestions for better conversion
3. Background recommendations
</task>
<output_format>
Structure your response with clear headers for each section.
Be specific and actionable.
</output_format>
<role>
You are a legal document analyst with expertise in contract review.
</role>
<constraints>
- Extract only factual information from the document
- Do not add interpretations or legal advice
- Quote relevant passages when possible
</constraints>
<context>
[Attached: contract.pdf]
</context>
<task>
Extract and summarize:
1. Key terms and conditions
2. Important dates and deadlines
3. Financial obligations
4. Termination clauses
</task>
<role>
You are a video content analyst.
</role>
<task>
Watch this video and provide:
1. Summary of main points (with timestamps)
2. Key quotes from speakers
3. Visual elements worth noting
</task>
<context>
[Attached: presentation.mp4]
</context>
Gemini Pro can automatically verify facts through Google Search. Useful for:
Example:
<task>
Generate an infographic about the current GDP of G7 countries
with accurate and up-to-date data visualization.
</task>
The model checks actual current data before generating. Pretty neat when you need accurate numbers.
| Aspect | Gemini 3 | ChatGPT | Claude |
|---|---|---|---|
| Prompt format | XML or Markdown | Markdown + XML | XML (recommended) |
| Reasoning | Thinking levels | reasoning_effort | Extended Thinking |
| Chain-of-Thought | Built-in | Needs explicit instruction | Built-in |
| Multimodality | Native, advanced | Good | Basic |
| Fact checking | Search Grounding | Web Search | Web Search |
Changing temperature - just leave it at 1.0
Ignoring thinking levels - use minimal for simple stuff, high for complex
Wrong media resolution - for PDFs use medium, not high
Mixing data and instructions - always wrap user data in <context>
Long prompts for video - remember video consumes lots of tokens
Gemini is powerful, especially for multimodal stuff. Key things: