Query Optimization Playbook

Most query optimization advice is wrong about where to start. It tells you to look at slow queries. The right starting point is expensive queries — the ones driving cost, which is rarely the same list.

This guide is the workflow we use inside QueryWise: find the right queries, read the plan, apply fixes from the detector library or the AI agent, and confirm savings.

1. Sort by cost, not duration

In Queries, the default sort is estimated cost. That ranking matters because:

A 30-second query that runs once a day is a smaller cost target than a 200ms query that runs 50,000 times an hour.
A short query that scans 8 TB is more expensive than a long query that scans 200 MB.

QueryWise computes estimated cost per query by combining execution count, bytes scanned, slot/credit time, and the vendor's pricing model. The top 20 by cost are almost always where the money is.

2. Open the execution profile

Click any query and you get the execution profile: the most recent explain plan, normalized into the same shape across vendors. We support plan collection on PostgreSQL, MySQL, Snowflake, BigQuery, Redshift, Synapse, Databricks, MongoDB, Oracle, and Cosmos DB.

The normalized plan exposes:

Operators — scans, joins, aggregations, exchanges
Cost signals — bytes scanned, rows produced, spill, partition pruning
Hot operators — the steps that drive >X% of total cost

You don't need to know each vendor's plan dialect to read it. The annotation does that for you.

3. Check the anti-pattern panel

Every query has an anti-pattern panel. We run 130+ detectors covering:

Generic SQL — SELECT * in production, missing predicates, Cartesian joins, N+1, unused indexes, spill to disk, repeated stale duplicates
Snowflake — warehouse undersized for spill, poor clustering, missing partitioning, serverless candidates
BigQuery — slot waste, missing partition pruning, materialized view candidates
Redshift — WLM queueing, dist key skew, vacuum needs, compile overhead
Databricks — shuffle-heavy queries, broadcast candidates, missing liquid clustering
MongoDB — pipeline ordering, shard imbalance, cross-partition queries
Cosmos DB — RU hotspots, low index hit rate, large document scans
Synapse — data movement, columnstore inefficiency, DWU overprovisioning

Each match includes severity, estimated savings, an evidence snapshot, and a remediation snippet. The remediation is concrete — not "consider adding an index" but the actual CREATE INDEX statement targeted at the columns the plan touched.

4. Use the AI agent for hard cases

For queries where pattern detection isn't enough, the deep analysis button hands the query off to the AI agent. The agent has tools to:

Pull the most recent explain plan (fetch_query_plan)
Read schema metadata for the touched tables (fetch_schema_metadata)
Look at billing context for the warehouse/cluster
Check existing recommendations to avoid duplication
Save its findings back as a structured recommendation

The agent runs in a budgeted, audited loop — each tenant has a daily analysis budget, and every run is logged in analysis_audit_logs. The output is the same shape as a detector match: title, evidence, remediation, savings estimate, confidence score.

In practice, deep analysis pays for itself on queries with non-obvious join order issues, missed materialization opportunities, and schema redesign suggestions that pattern detectors can't see.

5. Apply the fix

There are three apply paths:

Manual — copy the remediation, open a PR, deploy.
Auto-apply with a policy — for safe categories (warehouse auto-suspend tweaks, dropping unused indexes), define a policy in Settings → Auto-Apply and let the worker run it.
Ticket sync — push the recommendation as a JIRA or ServiceNow ticket from the recommendation panel. Bi-directional status sync keeps QueryWise current.

For DDL changes (indexes, partitioning), we recommend ticket sync — the cost of getting these wrong is high enough that we don't auto-apply them.

6. Confirm savings, don't assume them

After the fix lands, the before/after impact panel on the recommendation shows the actual cost change in the days following. We compute it from billing data, not estimated savings. If the actual is below the estimate, the recommendation is marked overstated and our savings estimator learns the bias for the next round.

The bar to clear: did this query's cost drop and stay dropped for the next two weeks? If yes, ship it. If no, reopen and try again.

Patterns we keep seeing

A few that come up almost every engagement:

Mixed workloads on one warehouse. Your dbt models, BI dashboards, and analyst notebooks all sharing one warehouse mean every query waits behind every other query. Split before you resize.
SELECT * from a wide table feeding a downstream process. Almost always cheaper to project the columns you need.
Partition column not in the predicate. Looks like a 200 GB query; should be a 2 GB query.
Unused indexes still maintained on writes. Free savings — the writes are paying for nothing.
Joining on functions. WHERE LOWER(email) = ? defeats the index. Materialize or use a generated column.

Where to next

Right-Sizing and Auto-Suspend — once your queries are tight, right-size the compute.
AI / LLM Cost Optimization — the same workflow applies to AI workloads.