What to actually do to improve your AEO/GEO standing — a practical playbook
Published June 16, 2026 · Updated June 19, 2026 · Quratic Team · 13 min read
Build a 20–30 prompt library, log weekly across ChatGPT, Perplexity, and Gemini, and know what to fix. Free spreadsheet method plus benchmarks and industry templates.
Most teams know they should “do GEO.” Few have a repeatable system for measuring it. Answer Engine Optimization (AEO) and Generative Engine Optimization (GEO) are not mysterious — they are a discipline of running the right prompts, logging the right signals, and acting on gaps. You can start for free with a spreadsheet and three AI tabs open.
This playbook is the operational version of our GEO strategy guide: what prompts to write, what to log, what benchmarks to aim for, and what to fix when you are invisible.
The free starting point: 20–30 prompts, three platforms, one spreadsheet
Before you buy a tool, prove the workflow manually:
- Build a prompt library of 20–30 queries across three intent categories (below)
- Run each prompt weekly in ChatGPT, Perplexity, and Gemini
- Log results in a spreadsheet — one row per prompt × platform × week
- Review monthly for patterns: which queries you win, which competitors dominate, where citations are missing
Expect 2–3 hours per weekly cycle for 25 prompts × 3 platforms if you work efficiently. That is enough to establish a baseline and justify (or avoid) paid tooling.
Assign each prompt to a target country and run it from that context — a Singapore buyer’s answer differs from a US default. Browser-based collection from local IPs matters; a VPN spot-check is better than nothing but not a substitute for scheduled local runs.
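The weekly workload can be enumerated up front so nothing is skipped. This is a minimal sketch, assuming a hypothetical library of 25 prompt IDs all assigned to Singapore; the function name and ID format are illustrative, not part of any tool.

```python
from itertools import product

PLATFORMS = ["ChatGPT", "Perplexity", "Gemini"]


def weekly_runs(prompts, platforms=PLATFORMS):
    """Return one (prompt_id, country, platform) tuple per run for one week."""
    return [
        (prompt_id, country, platform)
        for (prompt_id, country), platform in product(prompts, platforms)
    ]


# Hypothetical library: 25 prompt IDs, each tagged with its target country.
library = [(f"P{n:02d}", "SG") for n in range(1, 26)]
runs = weekly_runs(library)
print(len(runs))  # 25 prompts x 3 platforms = 75 runs
```

Each tuple maps to one spreadsheet row for that week, which makes missed runs easy to spot.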
Three prompt categories (and why each matters)
Organise every prompt into one of three buckets. Each bucket tells you something different.
1. Branded / direct — your sentiment baseline
These queries assume the buyer already knows your name. They reveal how AI describes and frames you — accuracy, tone, and positioning.
Templates:
- “What is [Brand]?”
- “Is [Brand] good for [use case]?”
- “What do people think of [Brand]?”
- “Is [Brand] legit and reliable for [specific context]?”
What you learn: Factual errors, outdated product descriptions, negative framing, missing differentiators. Fix these before investing in category discovery — if AI misdescribes you on branded queries, category prompts will inherit the same problems.
Benchmark: Strong brands should appear on 90%+ of branded prompts across platforms. Below 70% signals entity confusion or weak third-party validation.
2. Category / non-branded — highest discovery value
These are queries from buyers who do not know you yet. This is where GEO wins pipeline.
Templates:
- “What’s the best [category] for [specific use case / audience]?”
- “I need a [product/service] that does [specific need] — what do you recommend?”
- “What should I look for when choosing a [category]?”
- “Best [category] in [Singapore / Japan / city] for [audience]?”
What you learn: Whether you appear in the consideration set at all. Most brands are invisible here initially — that is normal. The goal is measurable improvement month over month.
Benchmark: WebFX’s GEO benchmarks cite 15–25% visibility on tracked queries as “good,” with 30–50%+ as strong performance on informational/category queries. Discovered Labs notes market leaders often exceed 30% citation rate on core category queries.
Aim for a 15–25% mention rate on your core category prompts as an initial target. With sustained effort, brands in competitive categories often reach 30%+.
3. Comparison — competitive positioning
These queries surface you alongside named alternatives. They drive shortlist decisions.
Templates:
- “[Brand] vs [Competitor]”
- “What are alternatives to [Competitor]?”
- “Which is better for [use case], [Brand] or [Competitor]?”
- “Compare [Brand] and [Competitor] for [specific need]”
What you learn: Whether AI recommends you, positions you as second choice, or omits you entirely when a competitor is named. Also surfaces which competitors AI treats as category defaults.
Benchmark: Track share of voice (SOV), defined as your mentions ÷ total brand mentions across comparison prompts. Above 25% SOV on comparison queries is a healthy starting point in contested categories.
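Share of voice is simple arithmetic once the brands named in each answer are logged. The sketch below assumes each comparison run is stored as a list of brand names; the brand names and data are hypothetical.

```python
def share_of_voice(brand, mentions_per_run):
    """Brand mentions divided by total brand mentions across comparison runs."""
    total = sum(len(names) for names in mentions_per_run)
    if total == 0:
        return 0.0  # no brands named at all, so SOV is reported as zero
    ours = sum(names.count(brand) for names in mentions_per_run)
    return ours / total


# Hypothetical: brands named in four comparison-prompt answers.
comparison_runs = [
    ["BrandA", "BrandB", "BrandC"],
    ["BrandB", "BrandC"],
    ["BrandA", "BrandB"],
    ["BrandB"],
]
sov = share_of_voice("BrandA", comparison_runs)
print(f"{sov:.0%}")  # 2 of 8 mentions -> 25%
```

Note that the denominator counts every brand mention, including your own, so SOV across all brands sums to 100%.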
What to log on every run
Create these columns in your spreadsheet:
| Column | What to record | Why it matters |
|---|---|---|
| Date | Week of run | Trend over time |
| Prompt | Exact query text | Reproducibility |
| Platform | ChatGPT / Perplexity / Gemini | Platforms diverge |
| Country | SG, JP, KR, etc. | Answers vary by market |
| Mentioned? | Y / N | Core visibility metric |
| Cited with link? | Y / N / N/A | Mention ≠ citation |
| Position | 1st, 2nd, 3rd named, or “listed” | Prominence in answer |
| Sentiment | Positive / neutral / negative / mixed | How AI frames you |
| Competitors named | List | SOV denominator |
| Sources cited | URLs if shown | PR/content targets |
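The "Sources cited" column becomes a list of PR and content targets once the URLs are grouped by domain. A minimal sketch, assuming the URLs have been copied out of the spreadsheet into a list; the example URLs are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse


def cited_domains(source_urls):
    """Count how often each domain appears among logged source URLs."""
    counts = Counter()
    for url in source_urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com as one domain
        if host:
            counts[host] += 1
    return counts


# Placeholder URLs standing in for a week of logged sources.
urls = [
    "https://www.example.com/best-tools",
    "https://example.com/reviews",
    "https://reviews.example.org/top-10",
]
top = cited_domains(urls).most_common(1)
print(top)  # [('example.com', 2)]
```

Subdomains are kept separate here on purpose, since a review subdomain and a main site may need different outreach.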
Mention vs citation: do not merge them
They are different signals:
- Mention — AI names your brand in the answer text, often without a link
- Citation — AI links to a specific URL as a source
Perplexity almost always cites with numbered links. ChatGPT often mentions brands without linking — especially on browse-enabled queries where it synthesises rather than footnotes.
Log both separately. A brand with high mentions but zero citations may have awareness without a click path. A brand with citations but no mentions on category prompts may only appear as a footnote on third-party listicles.
Spreadsheet template: starter rows
Copy this structure into Google Sheets or Notion:
Sheet 1 — Prompt library
| ID | Category | Prompt template | Filled example | Country | Priority |
|---|---|---|---|---|---|
| B1 | Branded | What is [Brand]? | What is Quratic? | SG | High |
| C1 | Category | Best [category] for [audience] in [country]? | Best AI visibility tool for marketing teams in Singapore? | SG | High |
| X1 | Comparison | [Brand] vs [Competitor] | Quratic vs Profound | SG | High |
Sheet 2 — Weekly log
One row per prompt × platform × week with the logging columns above.
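If you prefer to keep the weekly log as a CSV file instead of a sheet, the logging columns translate directly into a record type. This is a sketch only; the field names are one possible spelling of the columns, and the sample row values are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class LogRow:
    date: str               # week of run
    prompt: str             # exact query text
    platform: str           # ChatGPT / Perplexity / Gemini
    country: str            # SG, JP, KR, etc.
    mentioned: str          # Y / N
    cited_with_link: str    # Y / N / N/A
    position: str           # 1st, 2nd, 3rd named, or "listed"
    sentiment: str          # positive / neutral / negative / mixed
    competitors_named: str  # semicolon-separated list
    sources_cited: str      # URLs if shown


def to_csv(rows):
    """Serialise log rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LogRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Hypothetical run: one branded prompt on one platform.
row = LogRow("2026-06-15", "What is Quratic?", "ChatGPT", "SG",
             "Y", "N", "1st", "neutral", "Profound", "")
print(to_csv([row]))
```

The resulting text can be imported into Google Sheets, so the two workflows stay interchangeable.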
Sheet 3 — Summary dashboard
Pivot or manual counts:
- Mention rate by category (runs with a mention ÷ total runs)
- Citation rate by category
- SOV on comparison prompts
- Week-over-week delta
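The dashboard counts can be reproduced outside a pivot table. The sketch below assumes each log row is a dict with a `category` key and Y/N flags; the key names and sample weeks are hypothetical.

```python
from collections import defaultdict


def rate_by_category(rows, flag):
    """Share of runs per category where the given Y/N column equals 'Y'."""
    hits, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row["category"]] += 1
        if row[flag] == "Y":
            hits[row["category"]] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}


def delta_points(this_week, last_week):
    """Week-over-week change per category, in percentage points."""
    return {
        cat: round((rate - last_week.get(cat, 0.0)) * 100, 1)
        for cat, rate in this_week.items()
    }


# Hypothetical data: four category runs per week.
last_week = [{"category": "Category", "mentioned": m} for m in "YNNN"]
this_week = [{"category": "Category", "mentioned": m} for m in "YYNN"]

before = rate_by_category(last_week, "mentioned")
after = rate_by_category(this_week, "mentioned")
print(delta_points(after, before))  # {'Category': 25.0}
```

Passing a citation column name as `flag` gives the citation rate with the same function; rows marked N/A count toward the denominator in this version, which is an assumption you may want to change.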
Industry prompt packs (fill in your brand and market)
Adapt these for your vertical. Replace [country] with your primary market — Singapore, Indonesia, Japan, etc.
E-commerce and retail
| Type | Example prompt |
|---|---|
| Branded | “Is [Brand] legit and reliable for online shopping in [country]?” |
| Category | “What’s the best online shopping platform for [electronics/fashion] in [Singapore/Indonesia]?” |
| Category | “Where can I buy [product type] online with fast delivery in [city]?” |
| Comparison | “[Brand] vs [Competitor]: which is better for [fast delivery / returns / pricing]?” |
| Comparison | “What are good alternatives to [Competitor] in [country]?” |
Banking and fintech
| Type | Example prompt |
|---|---|
| Branded | “Is [Brand] safe and trustworthy for digital banking / payments?” |
| Category | “What’s the best digital bank or e-wallet in [country] for [freelancers / SMEs / students]?” |
| Category | “What’s the cheapest way to send money from [country A] to [country B]?” |
| Comparison | “[Brand] vs [Competitor] for lower fees / better savings rates?” |
Travel and hospitality
| Type | Example prompt |
|---|---|
| Branded | “What is [Brand] known for as a hotel / travel brand?” |
| Category | “Best budget-friendly hotels in [city] for [families / solo travellers]?” |
| Category | “Best flight booking app for travel within [Southeast Asia]?” |
| Comparison | “[Brand] vs [Competitor] for a trip to [city]?” |
Healthcare and wellness
| Type | Example prompt |
|---|---|
| Branded | “Is [Brand] a trusted clinic / telehealth provider in [country]?” |
| Category | “Best telehealth app in [country] for [mental health / general consultations]?” |
| Category | “Where can I find a good [dermatologist / dentist] in [city]?” |
| Comparison | “[Brand] vs [Competitor] for [specific service]?” |
Real estate and property
| Type | Example prompt |
|---|---|
| Branded | “What is [Brand] known for in the property market?” |
| Category | “Best platform to find a rental apartment in [city]?” |
| Category | “How do I find a reliable real estate agent in [city]?” |
| Comparison | “[Brand] vs [Competitor] for buying / renting property in [city]?” |
B2B SaaS (add-on pack)
| Type | Example prompt |
|---|---|
| Branded | “What is [Brand] and who is it for?” |
| Category | “Best [category] software for [SMEs / enterprises] in [country]?” |
| Category | “I need a tool that [specific job-to-be-done]. What do you recommend?” |
| Comparison | “[Brand] vs [Competitor] for [use case]?” |
Run each filled prompt through ChatGPT, Perplexity, and Gemini. Log mention, citation, position, sentiment, and competitors every week.
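Filling the bracketed placeholders by hand is error-prone once the library grows. A small sketch of one way to do it, assuming placeholders are written exactly as `[Name]`; the helper names are illustrative.

```python
import re


def fill_prompt(template, values):
    """Replace [Placeholder] tokens with values; unknown tokens stay as-is."""
    filled = template
    for name, value in values.items():
        filled = filled.replace(f"[{name}]", value)
    return filled


def unfilled(prompt):
    """List any [Placeholder] tokens still left in a prompt."""
    return re.findall(r"\[[^\]]+\]", prompt)


values = {"Brand": "Quratic", "Competitor": "Profound", "use case": "marketing teams"}
prompt = fill_prompt("[Brand] vs [Competitor] for [use case]?", values)
print(prompt)                                     # Quratic vs Profound for marketing teams?
print(unfilled("Best [category] in Singapore?"))  # ['[category]']
```

Running `unfilled` over the whole library before each weekly cycle catches templates that were pasted without being completed.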
What the numbers mean — and what to do next
Once you have four weeks of data, read the patterns:
Mention rate below 15% on category prompts
Diagnosis: Invisible to discovery queries.
Fixes (in order):
- Map which domains AI cites instead of you — run 10 prompts and record every source URL
- Publish or update comparison pages and “best X for Y” content with direct answers in the first 100 words
- Pursue third-party listicles — listicles are cited 3–5× more often than owned service pages for recommendation prompts
- Add FAQPage and Organization schema — structured markup correlates with higher citation rates (Averi benchmarks: +15–30% from clear H2/H3 structure)
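For the schema fix, FAQPage markup is a JSON-LD object built from question and answer pairs. The sketch below shows only the shape defined by schema.org; it makes no claim about how much any given engine weighs it, and the sample pair is taken from this article's FAQ.

```python
import json


def faq_jsonld(pairs):
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


markup = faq_jsonld([
    ("Is mention rate the same as citation rate?", "No. Report both."),
])
print(json.dumps(markup, indent=2))
```

The printed JSON goes inside a `<script type="application/ld+json">` tag on the page that visibly contains the same questions and answers.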
High mentions, low citations
Diagnosis: AI knows your name but does not link to you.
Fixes:
- Improve page extractability — TL;DR blocks, dated publish metadata, citation-friendly statistics
- Publish original data or benchmarks — Passionfruit’s research finds original research cited at 38–65% vs 6–15% for standard blog posts
- Check robots.txt and your CDN or firewall rules: retrieval bots may be blocked at either layer
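The robots.txt part of that check can be scripted with Python's standard library. This is a sketch under stated assumptions: the user-agent names in `AI_BOTS` are examples to verify against each vendor's current documentation, and the sample file is made up. It cannot detect CDN or firewall blocks, which need a separate check of server logs or CDN settings.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent names; confirm against each vendor's published list.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot"]


def blocked_bots(robots_txt, url, bots=AI_BOTS):
    """Return the bots that the given robots.txt text disallows for a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


# Made-up robots.txt that blocks one bot and allows everyone else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_bots(sample, "https://example.com/pricing"))  # ['GPTBot']
```

To test a live site, fetch its robots.txt text first and pass it in; keeping the parser separate from the fetch makes the check easy to run against saved copies.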
Negative or inaccurate sentiment on branded prompts
Diagnosis: Entity confusion or outdated third-party narrative.
Fixes:
- Correct factual errors on your About, product, and docs pages
- Strengthen Wikipedia/Wikidata/Crunchbase/G2 profiles if applicable
- Publish recent case studies and press that AI can retrieve
- Do not argue with AI — fix the source material it reads
Strong on ChatGPT, weak on Perplexity (or vice versa)
Diagnosis: Platform-specific retrieval gaps — expected, not a failure.
Fixes: Each platform uses different sources. Optimise for the platform where your buyers actually research — and track all three rather than averaging.
Competitor dominates comparison prompts
Diagnosis: AI treats them as category default.
Fixes:
- Dedicated “[You] vs [Them]” pages with fair, factual comparison tables
- Earn mentions on review sites where the competitor already appears
- Identify whether the competitor's advantage comes from Google rank, AI Overviews, or AI answers alone, then fix that layer
Benchmarks to report to leadership
Use these ranges when setting expectations (WebFX, Discovered Labs, Topify):
| Metric | Starting point | Good | Strong |
|---|---|---|---|
| Category mention rate | 0–10% | 15–25% | 30–50%+ |
| Branded mention rate | 70%+ target | 90%+ | 95%+ |
| Comparison SOV | Track relative | 25%+ | 40%+ |
| Citation rate (linked) | Lower than mention | 15–25% | 30%+ |
| Platforms covered | 1 | 3 | 3+ with Google AI Mode |
Frame improvement as month-over-month delta, not absolute perfection. Passionfruit’s 11.2M citation study found 68% of query citations disappear month-to-month — consistency of measurement matters more than any single week’s snapshot.
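For reporting, the category mention rate ranges from the benchmark table can be turned into a simple label. The table leaves gaps between bands (10–15% and 25–30%); this sketch assumes a rate belongs to the highest band whose lower bound it has reached, which is an interpretation rather than something the cited benchmarks state.

```python
def category_tier(mention_rate):
    """Label a category mention rate (0.0 to 1.0) using the benchmark bands.

    Assumption: rates falling in the gaps between bands take the lower label.
    """
    if mention_rate >= 0.30:
        return "strong"
    if mention_rate >= 0.15:
        return "good"
    return "starting point"


print(category_tier(0.05))  # starting point
print(category_tier(0.20))  # good
print(category_tier(0.35))  # strong
```

Given how volatile citations are month to month, it is safer to label a multi-week average than a single week's rate.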
When to move beyond the spreadsheet
Manual tracking breaks down around 30+ prompts × 3 platforms × weekly cadence — roughly 90+ runs per week before accounting for copy-paste and sentiment scoring.
Free starting points worth trying:
- Ahrefs’ free AI visibility checker — batch prompts against your brand automatically
- HubSpot’s AI Search Grader — automated brand visibility scan
Both are useful for a one-time baseline. They typically lack country-level collection, competitor SOV over time, and scheduled refresh — the gaps that matter for Asian markets.
Paid continuous monitoring (when manual cost exceeds tool cost):
- Global platforms: Profound, Peec, Otterly
- Asia-focused: Quratic — browser collection across ChatGPT, Perplexity, Google AI Mode, Gemini, and Google Rankings in SG, JP, KR, MY, ID, HK
The decision rule: if you are still acting on spreadsheet data and leadership asks for weekly SOV, upgrade. If you are not logging consistently yet, a paid tool will not fix the discipline problem.
90-day improvement loop
| Phase | Weeks | Focus |
|---|---|---|
| Baseline | 1–4 | Build prompt library, log weekly, no content changes yet |
| Diagnose | 5–6 | Identify top 5 invisible category prompts and top 3 cited competitor domains |
| Fix | 7–10 | Content refresh, comparison pages, third-party outreach, schema |
| Measure | 11–12 | Compare mention rate and SOV to baseline; report delta to leadership |
Discovered Labs cites 40–60% improvement in citation frequency within 3–6 months for teams executing systematically — realistic if you fix sources AI already trusts, not only your homepage.
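When comparing week 12 to the baseline, it helps to separate relative improvement from a change in percentage points, since an "improvement in citation frequency" of this kind is most naturally read as relative. A minimal sketch with hypothetical counts:

```python
def relative_improvement(baseline, current):
    """Relative change versus baseline; None when the baseline is zero."""
    if baseline == 0:
        return None  # growth from zero has no meaningful percentage
    return (current - baseline) / baseline


# Hypothetical: 10 cited runs out of 75 at baseline, 15 out of 75 at week 12.
change = relative_improvement(10, 15)
print(f"{change:.0%}")  # 50%
```

In this example the citation rate rose from about 13% to 20% of runs, which is roughly 7 percentage points but a 50% relative improvement. State which one you are reporting.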
FAQ
How many prompts do I really need?
20–30 to start — enough coverage without drowning in manual work. Expand when the first set drives decisions. Passionfruit’s benchmark protocols often use ~30 prompts across platforms for comparable audits.
ChatGPT, Perplexity, Gemini — is that enough?
Yes for a baseline. Add Google AI Mode / Overviews if your buyers use Google for category research. Add Copilot for enterprise B2B. See the platform guide.
Should prompts be in English or local language?
Match your buyer. English for Singapore business queries; Japanese for Tokyo B2B; Bahasa for Indonesia consumer. Same intent, different language = different answers.
Is mention rate the same as citation rate?
No. Report both. Mention rate = brand visibility. Citation rate = linked source visibility. Perplexity-heavy strategies skew citation; ChatGPT-heavy strategies skew mention.
Can I improve GEO without creating new content?
Partially. Fixing structured data, refreshing existing pages, earning third-party mentions, and correcting entity profiles can move numbers without a content sprint. Category discovery prompts usually require new or substantially updated comparison and listicle-aligned content.
How does this connect to Google rank?
GEO and SEO overlap but differ. Track both in a split-screen report — organic rank, AI Overview ownership, and AI mention rate on the same intents.
Skip the spreadsheet setup — start a free Quratic trial with scheduled prompt runs across six Asian markets, or use this playbook to baseline manually first.