
Right-Sizing and Auto-Suspend


Right-sizing is the headline FinOps win. It's also where most teams over-correct and break a workload, usually within a week of making the change.

The QueryWise approach is simple: never resize without seeing the workload, and never auto-suspend without understanding what's running between bursts.

The signals we look at

Three numbers drive every right-sizing decision:

  • Utilization — how busy is the compute when it's running?
  • Concurrency — how many concurrent queries does it actually handle?
  • Idle time — how much of the billed wall-clock is doing nothing?

If you only look at utilization, you'll resize a warehouse that's queueing — sized correctly but starved by concurrency. If you only look at idle time, you'll set aggressive auto-suspend on a warehouse whose hot cache is the reason queries are fast.

QueryWise pulls these signals separately. On the Usage page you see them per resource over time; the recommendation engine combines them into right-sizing suggestions.
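The three signals can be sketched from per-interval load data. This is a minimal illustration, not the QueryWise implementation; `LoadSample` and its fields are hypothetical stand-ins for whatever your metering source exposes:

```python
from dataclasses import dataclass

@dataclass
class LoadSample:
    """One metering interval for a compute resource (hypothetical shape)."""
    billed_seconds: float   # wall-clock seconds billed in this interval
    busy_seconds: float     # seconds with at least one query executing
    peak_concurrency: int   # max concurrent queries observed

def signals(samples: list[LoadSample]) -> dict:
    """Derive utilization, concurrency, and idle time from raw samples."""
    billed = sum(s.billed_seconds for s in samples)
    busy = sum(s.busy_seconds for s in samples)
    return {
        "utilization": busy / billed if billed else 0.0,
        "peak_concurrency": max((s.peak_concurrency for s in samples), default=0),
        "idle_fraction": 1 - busy / billed if billed else 0.0,
    }
```

Keeping the signals separate, rather than collapsing them into one score, is what lets the examples above (queueing vs over-sized, idle vs cache-warm) stay distinguishable.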

Compute right-sizing rules

The detector library has dedicated right-sizing rules for each compute primitive:

  • Snowflake warehouses — uses WAREHOUSE_LOAD_HISTORY and WAREHOUSE_METERING_HISTORY to detect over-sized + under-loaded warehouses, queueing-but-not-busy warehouses (concurrency, not size, is the bottleneck), and serverless candidates.
  • Databricks clusters — flags clusters where Photon could replace a larger node count, autoscale ranges that never hit the upper bound, and serverless SQL warehouse candidates.
  • Redshift clusters — detects RA3 over-provisioning by hour, concurrency scaling underuse, and node-count miscalibration.
  • Synapse pools — DWU overprovisioning across the daily curve, resource class mismatch.
  • Cosmos DB — RU overprovisioning, autoscale ranges that don't fit usage, partition hotspots that look like global RU pressure.
  • RDS / Cloud SQL / Azure SQL — instance-class right-sizing using CloudWatch / Cloud Monitoring / Azure Monitor.

Each recommendation includes a savings estimate priced against the vendor's tier catalog, plus the risk of the change ("low — no concurrency pressure observed in the last 30 days" vs "medium — concurrency peaked at 80% of current cluster on 2 days").
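The tier-catalog pricing for a one-size downsize can be sketched like this. The credit rates follow the familiar Snowflake doubling pattern, but treat the table and function names as illustrative assumptions, not the engine's actual catalog:

```python
# Illustrative Snowflake-style credit rates per warehouse size
# (each size doubles the rate of the one below it).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
SIZES = list(CREDITS_PER_HOUR)

def downsize_estimate(size: str, billed_hours: float, credit_price: float) -> float:
    """Savings if the warehouse drops one size, assuming billed hours stay flat."""
    i = SIZES.index(size)
    if i == 0:
        return 0.0  # already at the smallest size
    delta = CREDITS_PER_HOUR[size] - CREDITS_PER_HOUR[SIZES[i - 1]]
    return delta * billed_hours * credit_price
```

The "billed hours stay flat" assumption is exactly what the risk label qualifies: if concurrency pressure would force longer queues on the smaller size, the realized savings shrink.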

Auto-suspend without blowback

For Snowflake specifically, auto-suspend is the change everyone wants to make and the one that breaks the most workloads. A few rules from the field:

  • Don't set auto-suspend uniformly. A warehouse that serves a Looker dashboard at 9am every weekday is not the same workload as a dbt model that runs every 15 minutes.
  • Watch the warm-cache effect. If a warehouse's first query after resume is consistently 4× slower than subsequent queries, aggressive auto-suspend will tax every burst.
  • Use multi-cluster instead of low auto-suspend for spiky concurrency. You pay for the burst, not for keeping a too-large warehouse warm.

QueryWise's auto-suspend detector reads the load history first and only flags candidates where the math actually works out — not just "warehouse spent 42% of the day idle."
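"Where the math actually works out" can be sketched as idle credits saved minus the re-warming cost paid after each resume. All inputs and the function name are hypothetical; the point is that the cold-cache penalty sits on the cost side of the ledger:

```python
def auto_suspend_net_savings(
    idle_hours_per_day: float,
    credits_per_hour: float,
    resumes_per_day: float,
    cold_start_penalty_hours: float,
) -> float:
    """Credits/day saved by suspending during idle gaps, minus the extra
    compute burned re-warming the cache after each resume."""
    saved = idle_hours_per_day * credits_per_hour
    penalty = resumes_per_day * cold_start_penalty_hours * credits_per_hour
    return saved - penalty
```

A warehouse that is 42% idle but resumes cold dozens of times a day can easily come out negative here, which is why idle time alone is not a trigger.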

Rolling out changes

The temptation is to apply right-sizing in one sweep. Don't.

Our recommended sequence:

  1. Showback first. Add the right-sizing recommendation to a JIRA/ServiceNow ticket via the integration. Let the workload owner see it.
  2. Test in non-prod. For warehouse/cluster changes, run a workload replay against the smaller size for one full business cycle.
  3. Auto-apply only the safe set. In Settings → Auto-Apply, you can scope auto-apply to specific severities and categories. We default to "auto-suspend threshold changes below 60s, low-risk only."
  4. Confirm with before/after. Every applied recommendation gets a before/after impact panel. If the actual savings come in below 50% of estimated, the recommendation is marked overstated and the engine recalibrates.
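The step-4 check reduces to a single comparison against the 50% threshold. A minimal sketch (the function name and labels are illustrative, not the engine's API):

```python
def grade_recommendation(estimated: float, actual: float) -> str:
    """Mark a recommendation overstated when realized savings fall
    below half the estimate; otherwise confirm it."""
    if estimated <= 0:
        return "invalid"
    return "overstated" if actual < 0.5 * estimated else "confirmed"
```

Feeding "overstated" verdicts back into the estimator is what keeps the savings numbers honest over time.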

What workload-aware means

The phrase "workload-aware right-sizing" is doing real work in QueryWise. It means the recommendation knows:

  • What query patterns ran on this compute
  • Which patterns drove the cost
  • Whether sizing or concurrency is the constraint
  • How the warehouse behaves through the daily curve, not just on average

That's the difference between "this warehouse is over-sized" (often wrong) and "this warehouse hits 30% utilization for 6 hours and 90% utilization for the 30-minute window before standup — split it" (usually right).
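The split-candidate pattern in that last example (long low-utilization baseline, short high-utilization burst) can be sketched as a check over the hourly curve. The thresholds and function name are assumptions for illustration:

```python
def suggest_split(hourly_util: list[float],
                  low: float = 0.4, high: float = 0.85) -> bool:
    """Flag a warehouse with many low-utilization hours plus a short
    high-utilization burst window: a split candidate."""
    low_hours = sum(1 for u in hourly_util if u < low)
    high_hours = sum(1 for u in hourly_util if u > high)
    return low_hours >= 6 and 0 < high_hours <= 2
```

Averaging the same curve would report ~38% utilization and suggest a downsize, which is exactly the "often wrong" call the workload-aware version avoids.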

What we don't auto-apply

Right-sizing is high-leverage and high-risk. We deliberately leave these manual:

  • Going up in size on the basis of queueing alone
  • Cluster topology changes (Redshift node count, Databricks node count)
  • Anything that requires a redeploy or restart

These get the ticket-sync workflow instead, with the right-size suggestion attached.

Where to next

Want help applying this in your environment?

QueryWise design partners get hands-on onboarding from our team.