
Right-Sizing and Auto-Suspend


Right-sizing is the headline FinOps win. It's also where most teams over-correct and break a workload, usually within a week of making the change.

The QueryWise approach is simple: never resize without seeing the workload, and never auto-suspend without understanding what's running between bursts.

The signals we look at

Three numbers drive every right-sizing decision:

  • Utilization — how busy is the compute when it's running?
  • Concurrency — how many concurrent queries does it actually handle?
  • Idle time — how much of the billed wall-clock is doing nothing?

If you only look at utilization, you'll resize a warehouse that's queueing — sized correctly but starved by concurrency. If you only look at idle time, you'll set aggressive auto-suspend on a warehouse whose hot cache is the reason queries are fast.

QueryWise pulls these signals separately. On the Usage page you see them per resource over time; the recommendation engine combines them into right-sizing suggestions.
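The three signals can be sketched from per-interval load data. This is a minimal illustration, not the QueryWise implementation; `LoadSample` and its fields are hypothetical stand-ins for whatever your metering source exposes:

```python
from dataclasses import dataclass

@dataclass
class LoadSample:
    """One metering interval for a compute resource (hypothetical shape)."""
    billed_seconds: float   # wall-clock seconds billed in this interval
    busy_seconds: float     # seconds with at least one query executing
    peak_concurrency: int   # max concurrent queries observed

def signals(samples: list[LoadSample]) -> dict:
    """Derive utilization, concurrency, and idle time from raw samples."""
    billed = sum(s.billed_seconds for s in samples)
    busy = sum(s.busy_seconds for s in samples)
    return {
        "utilization": busy / billed if billed else 0.0,
        "peak_concurrency": max((s.peak_concurrency for s in samples), default=0),
        "idle_fraction": 1 - busy / billed if billed else 0.0,
    }
```

Keeping the signals separate, rather than collapsing them into one score, is what lets the examples above (queueing vs over-sized, idle vs cache-warm) stay distinguishable.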

Compute right-sizing rules

The detector library has dedicated right-sizing rules for each compute primitive:

  • Snowflake warehouses — uses WAREHOUSE_LOAD_HISTORY and WAREHOUSE_METERING_HISTORY to detect over-sized + under-loaded warehouses, queueing-but-not-busy warehouses (concurrency, not size, is the bottleneck), and serverless candidates.
  • Databricks clusters — flags clusters where Photon could replace a larger node count, autoscale ranges that never hit the upper bound, and serverless SQL warehouse candidates.
  • Redshift clusters — detects RA3 over-provisioning by hour, concurrency scaling underuse, and node-count miscalibration.
  • Synapse pools — DWU overprovisioning across the daily curve, resource class mismatch.
  • Cosmos DB — RU overprovisioning, autoscale ranges that don't fit usage, partition hotspots that look like global RU pressure.
  • RDS / Cloud SQL / Azure SQL — instance-class right-sizing using CloudWatch / Cloud Monitoring / Azure Monitor.

Each recommendation includes a savings estimate priced against the vendor's tier catalog, plus the risk of the change ("low — no concurrency pressure observed in the last 30 days" vs "medium — concurrency peaked at 80% of current cluster on 2 days").
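The tier-catalog pricing for a one-size downsize can be sketched like this. The credit rates follow the familiar Snowflake doubling pattern, but treat the table and function names as illustrative assumptions, not the engine's actual catalog:

```python
# Illustrative Snowflake-style credit rates per warehouse size
# (each size doubles the rate of the one below it).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
SIZES = list(CREDITS_PER_HOUR)

def downsize_estimate(size: str, billed_hours: float, credit_price: float) -> float:
    """Savings if the warehouse drops one size, assuming billed hours stay flat."""
    i = SIZES.index(size)
    if i == 0:
        return 0.0  # already at the smallest size
    delta = CREDITS_PER_HOUR[size] - CREDITS_PER_HOUR[SIZES[i - 1]]
    return delta * billed_hours * credit_price
```

The "billed hours stay flat" assumption is exactly what the risk label qualifies: if concurrency pressure would force longer queues on the smaller size, the realized savings shrink.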

Auto-suspend without blowback

For Snowflake specifically, auto-suspend is the change everyone wants to make and the one that breaks the most workloads. A few rules from the field:

  • Don't set auto-suspend uniformly. A warehouse that serves a Looker dashboard at 9am every weekday is not the same workload as a dbt model that runs every 15 minutes.
  • Watch the warm-cache effect. If a warehouse's first query after resume is consistently 4× slower than subsequent queries, aggressive auto-suspend will tax every burst.
  • Use multi-cluster instead of low auto-suspend for spiky concurrency. You pay for the burst, not for keeping a too-large warehouse warm.

QueryWise's auto-suspend detector reads the load history first and only flags candidates where the math actually works out — not just "warehouse spent 42% of the day idle."
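"Where the math actually works out" can be sketched as idle credits saved minus the re-warming cost paid after each resume. All inputs and the function name are hypothetical; the point is that the cold-cache penalty sits on the cost side of the ledger:

```python
def auto_suspend_net_savings(
    idle_hours_per_day: float,
    credits_per_hour: float,
    resumes_per_day: float,
    cold_start_penalty_hours: float,
) -> float:
    """Credits/day saved by suspending during idle gaps, minus the extra
    compute burned re-warming the cache after each resume."""
    saved = idle_hours_per_day * credits_per_hour
    penalty = resumes_per_day * cold_start_penalty_hours * credits_per_hour
    return saved - penalty
```

A warehouse that is 42% idle but resumes cold dozens of times a day can easily come out negative here, which is why idle time alone is not a trigger.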

Rolling out changes

The temptation is to apply right-sizing in one sweep. Don't.

Our recommended sequence:

  1. Showback first. Add the right-sizing recommendation to a JIRA/ServiceNow ticket via the integration. Let the workload owner see it.
  2. Test in non-prod. For warehouse/cluster changes, run a workload replay against the smaller size for one full business cycle.
  3. Auto-apply only the safe set. In Settings → Auto-Apply, you can scope auto-apply to specific severities and categories. We default to "auto-suspend threshold changes below 60s, low-risk only."
  4. Confirm with before/after. Every applied recommendation gets a before/after impact panel. If the actual savings come in below 50% of estimated, the recommendation is marked overstated and the engine recalibrates.
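The step-4 check reduces to a single comparison against the 50% threshold. A minimal sketch (the function name and labels are illustrative, not the engine's API):

```python
def grade_recommendation(estimated: float, actual: float) -> str:
    """Mark a recommendation overstated when realized savings fall
    below half the estimate; otherwise confirm it."""
    if estimated <= 0:
        return "invalid"
    return "overstated" if actual < 0.5 * estimated else "confirmed"
```

Feeding "overstated" verdicts back into the estimator is what keeps the savings numbers honest over time.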

What workload-aware means

The phrase "workload-aware right-sizing" is doing real work in QueryWise. It means the recommendation knows:

  • What query patterns ran on this compute
  • Which patterns drove the cost
  • Whether sizing or concurrency is the constraint
  • How the warehouse behaves through the daily curve, not just on average

That's the difference between "this warehouse is over-sized" (often wrong) and "this warehouse hits 30% utilization for 6 hours and 90% utilization for the 30-minute window before standup — split it" (usually right).
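The split-candidate pattern in that last example (long low-utilization baseline, short high-utilization burst) can be sketched as a check over the hourly curve. The thresholds and function name are assumptions for illustration:

```python
def suggest_split(hourly_util: list[float],
                  low: float = 0.4, high: float = 0.85) -> bool:
    """Flag a warehouse with many low-utilization hours plus a short
    high-utilization burst window: a split candidate."""
    low_hours = sum(1 for u in hourly_util if u < low)
    high_hours = sum(1 for u in hourly_util if u > high)
    return low_hours >= 6 and 0 < high_hours <= 2
```

Averaging the same curve would report ~38% utilization and suggest a downsize, which is exactly the "often wrong" call the workload-aware version avoids.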

What we don't auto-apply

Right-sizing is high-leverage and high-risk. We deliberately leave these manual:

  • Going up in size on the basis of queueing alone
  • Cluster topology changes (Redshift node count, Databricks node count)
  • Anything that requires a redeploy or restart

These get the ticket-sync workflow instead, with the right-size suggestion attached.

Where to next

Want help applying this in your environment?

QueryWise design partners get hands-on onboarding from our team.