
Pipeline Cost Control


The conversation about cloud database cost almost always focuses on the warehouses. Yet the pipelines feeding them are often just as expensive, far harder to see, and where the slow leaks live.

QueryWise's Pipeline Cost Intelligence is designed for one job: surface the cost of every pipeline service, by name, in the same view as your warehouse spend.

Where pipeline cost hides

The hard part is structural — pipeline services don't show up as "pipelines" on a bill:

  • Snowflake — Snowpipe is its own line item; Tasks roll up to a warehouse; Dynamic Tables show as serverless compute; Snowpark workloads look like queries.
  • Databricks — Jobs Compute, DLT, and All-Purpose Compute have different rates and different governance. The bill says "DBU."
  • AWS — Glue, EMR, Step Functions, MWAA, AppFlow all bill differently. None of them say "ETL."
  • GCP — Dataflow, Dataproc, Workflows, Cloud Composer.
  • Azure — Data Factory pipelines, Synapse pipelines, Stream Analytics.

QueryWise classifies each line into a service type at billing-sync time so you don't have to read SKU codes.
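QueryWise's classifier itself is internal, but the idea can be sketched as substring rules over billing SKUs. Everything below — the SKU patterns and the service-type names — is illustrative, not the product's actual mapping:

```python
# Hypothetical sketch: classify billing lines into pipeline service types
# by matching substrings in the SKU. Patterns and names are illustrative.
RULES = [
    ("SNOWPIPE", "snowpipe"),
    ("SERVERLESS_TASK", "snowflake_task"),
    ("JOBS_COMPUTE", "databricks_jobs"),
    ("DLT", "databricks_dlt"),
    ("AWSGLUE", "aws_glue"),
    ("ELASTICMAPREDUCE", "aws_emr"),
    ("DATAFLOW", "gcp_dataflow"),
]

def classify(sku: str) -> str:
    """Return a pipeline service type for a billing-line SKU, or 'other'."""
    upper = sku.upper()
    for pattern, service in RULES:
        if pattern in upper:
            return service
    return "other"
```

The point of doing this at billing-sync time is that every downstream view can group by the derived service type instead of raw SKU codes.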

The pipeline view

In Costs → Pipelines you see total pipeline spend split by:

  • Service type — Snowpipe, DLT, Glue, etc.
  • Pipe / job / task name — the resource the workload owner cares about
  • Trigger / schedule — what's actually launching the work
  • Run count and DPU/credit — for unit-cost analysis

This is enough to answer the questions FinOps gets asked:

  • "Which pipeline costs the most this month?"
  • "Which pipeline is up the most month-over-month?"
  • "How much are we paying per pipeline run?"

The 20 pipeline-specific detectors

The recommendation engine has 20 pipeline-aware detectors, layered on top of the general anti-pattern library. A few examples:

  • Idle Snowpipes — pipes whose last load was >30 days ago but still bill latency overhead.
  • Over-provisioned Glue DPUs — jobs that consistently use a fraction of allocated DPUs.
  • DLT pipeline cold-start tax — triggered pipelines run so frequently that cluster startup dominates the bill, and continuous mode would be cheaper.
  • Dataflow streaming with low input rate — streaming for batch work.
  • Data Factory copy activity not pushing down — moving data through ADF when it could be a SQL-side copy.
  • Snowflake Tasks chained on the same warehouse — sequential tasks blocking each other.

Each detector emits a recommendation with a remediation snippet. For Snowpipe, "drop this pipe" is the snippet. For Glue, it's the new DPU configuration.
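To make the idle-Snowpipe detector concrete, here is a minimal sketch. The pipe names and metadata shape are hypothetical (in practice the last-load timestamp would come from something like Snowflake's PIPE_USAGE_HISTORY), and the pause/drop statement mirrors the "drop this pipe" remediation described above:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical pipe metadata; field names are illustrative.
pipes = [
    {"name": "RAW.ORDERS_PIPE", "last_load": datetime(2024, 1, 3, tzinfo=timezone.utc)},
    {"name": "RAW.EVENTS_PIPE", "last_load": datetime(2024, 6, 20, tzinfo=timezone.utc)},
]

def idle_pipes(pipes, now, threshold_days=30):
    """Flag pipes whose last load is older than the threshold."""
    cutoff = now - timedelta(days=threshold_days)
    return [p["name"] for p in pipes if p["last_load"] < cutoff]

now = datetime(2024, 6, 25, tzinfo=timezone.utc)
for name in idle_pipes(pipes, now):
    # Remediation: pause first (reversible), then drop once confirmed unused.
    print(f"ALTER PIPE {name} SET PIPE_EXECUTION_PAUSED = TRUE; -- or DROP PIPE {name};")
```

Pausing before dropping keeps the change reversible while the owner confirms the pipe really is dead.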

Pipeline-level unit economics

Pipeline cost is most useful when you can express it as cost-per-run, cost-per-GB, or cost-per-event. QueryWise computes these on the Allocation → Unit Economics tab if you've tagged your pipelines (or if the vendor metadata exposes it natively, which Databricks Jobs and Snowpipe do).

The pattern: tag your pipelines once with pipeline_id and customer_id (or whatever business dimension matters), and the unit economics tab shows you cost per unit over time. From there it's easy to find the pipelines whose unit cost is climbing — those are the ones to fix first.
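The unit-cost computation itself is just a join of tagged cost against reported volume. A sketch with hypothetical pipeline IDs and GB volumes:

```python
# Hypothetical tagged cost per (pipeline_id, month) and the GB volume
# reported for the same window.
cost_rows = {("p1", "2024-05"): 500.0, ("p1", "2024-06"): 650.0}
gb_rows   = {("p1", "2024-05"): 1000.0, ("p1", "2024-06"): 1000.0}

# Cost per GB, per pipeline-month.
unit_cost = {key: cost_rows[key] / gb_rows[key] for key in cost_rows}
```

Here p1's cost per GB rose from $0.50 to $0.65 on flat volume — the climbing-unit-cost signal the tab is built to surface.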

Where the savings come from

Across our consulting engagements, pipeline savings tend to fall in three buckets:

  1. Schedule changes — running too often. A Snowflake Task on a 5-minute schedule that could run every 30 minutes. A DLT continuous pipeline that could be triggered instead. A 24/7 streaming Dataflow job that could be a batch run.
  2. Compute changes — over-provisioned DPUs, oversized Databricks job clusters, Synapse pool overprovisioning during pipeline windows.
  3. Workload moves — pushing a heavy transform from Glue into the warehouse (or vice versa), depending on where the data already lives.

Of these, schedule changes are the easiest to apply and the most reversible. Start there.
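The arithmetic behind a schedule-change proposal is a one-liner, which is part of why these land so easily. The per-run cost below is an assumed example figure, not a real rate:

```python
# Back-of-envelope savings from moving a 5-minute schedule to 30 minutes.
# cost_per_run is an assumed illustrative figure.
runs_per_day_before = 24 * 60 // 5    # 288 runs/day
runs_per_day_after  = 24 * 60 // 30   # 48 runs/day
cost_per_run = 0.40                   # assumed, in dollars

monthly_savings = (runs_per_day_before - runs_per_day_after) * cost_per_run * 30
print(f"${monthly_savings:,.2f}/month")  # prints "$2,880.00/month"
```

A number like this in the ticket body makes it far easier for the pipeline owner to say yes.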

What governance looks like

The temptation with pipelines is to fix them without telling anyone. That breaks workloads.

The pattern that works:

  • Tag every pipeline with its owner.
  • Define a budget per pipeline group (e.g. "all DLT" or "all Snowpipes for tenant X").
  • Push high-severity pipeline recommendations as JIRA/ServiceNow tickets to the owner, not the FinOps team.
  • Auto-apply only the most reversible categories — schedule reductions on idle pipelines, DPU reductions on over-provisioned Glue jobs.
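The routing rule in the third bullet — high-severity recommendations go to the owner, not FinOps — can be sketched as a simple filter. The ticket call is stubbed; in practice it would hit the JIRA or ServiceNow API, and all names here are hypothetical:

```python
# Hypothetical recommendations, each tagged with the pipeline's owner.
recommendations = [
    {"pipeline": "orders_ingest", "severity": "high", "owner": "data-platform"},
    {"pipeline": "events_dlt",    "severity": "low",  "owner": "analytics"},
]

def route(recs):
    """Return (owner, pipeline) pairs to file tickets for (stubbed)."""
    tickets = []
    for rec in recs:
        if rec["severity"] == "high":
            # File against the pipeline's owner, not the FinOps team.
            tickets.append((rec["owner"], rec["pipeline"]))
    return tickets
```

Low-severity items stay in the dashboard; only the high-severity ones become someone's work queue.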

Where to next

Want help applying this in your environment?

QueryWise design partners get hands-on onboarding from our team.