BigQuery Cost Problems That Aren’t About Bad Queries

If you spend enough time tuning BigQuery costs in a mature FinTech organization, you start to notice a pattern. When the monthly invoice spikes — or when critical pipelines start missing SLAs — the natural reflex is to look for the “bad” query: the accidental CROSS JOIN, the missing WHERE clause, or the analyst who ran SELECT * on a five-year transaction ledger.
But in highly scaled financial data environments running on BigQuery Editions (Enterprise or Enterprise Plus), the most dangerous root causes usually aren’t poorly written queries. They are perfectly optimized, highly efficient queries that represent a fundamental mismatch between how humans want to consume data and how BigQuery allocates compute (slots).
Here are three common BigQuery resource anomalies I see in FinTech that have nothing to do with bad SQL, and how to track them down.
1. The “Fake” Materialized View (The CURRENT_DATE() Trap)
FinTech organizations live and die by rolling time-windows: “Trailing 30-Day Fraud Features,” “Rolling 90-Day Account Balances,” or “Trailing 365-Day Transaction Volumes.”
Naturally, a data engineer will attempt to build an incremental Materialized View (MV) to serve this data efficiently. But they quickly hit a hard BigQuery limitation: **Materialized Views do not support non-deterministic functions like `CURRENT_DATE()` or `CURRENT_TIMESTAMP()`.**
To get around this, the engineer abandons the MV and sets up a Scheduled Query that runs a CREATE OR REPLACE TABLE script instead. Because Scheduled Queries are just standard SQL executed by a trigger, they fully support CURRENT_DATE().
To keep the fraud models or operational dashboards feeling “live,” they schedule this query to run every 10 minutes:
CREATE OR REPLACE TABLE `fintech-prod.fraud_features.rolling_90d_txns` AS
SELECT
  account_id,
  merchant_category,
  COUNT(transaction_id) AS txn_count,
  SUM(transaction_amount) AS total_spend
FROM `fintech-prod.core_ledger.authorized_transactions`
-- Non-deterministic filter: fine in a scheduled query, forbidden in an MV
WHERE transaction_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY 1, 2;
The Cost Problem:
Even if the authorized_transactions table is perfectly partitioned by transaction_date, aggregating 90 days of global transaction history requires substantial slot capacity to execute quickly.
When you run this heavy aggregation 144 times a day, it creates a constant background drain on your compute reservation. If your slot capacity is strictly capped, this drain starves critical regulatory reporting pipelines, causing them to queue and miss SLAs. To “fix” the pipeline delays, the platform team enables Autoscaling. Now, instead of missing SLAs, this single “fake” materialization triggers expensive pay-as-you-go autoscaling slots 144 times a day, generating massive overage charges at the end of the month. The query is perfectly written; the architecture is the financial leak.
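There is a workaround worth knowing here: keep the materialized view fully deterministic by rolling the ledger up per calendar day, then apply the CURRENT_DATE() window in a thin logical view on top. The sketch below reuses the table names from the example above; the daily_txn_rollup and rolling_90d_txns_v names are my own invention, and whether the MV refresh stays cheap depends on your partitioning and volumes.

-- Deterministic daily rollup: no CURRENT_DATE(), so it is MV-legal
-- and eligible for incremental refresh
CREATE MATERIALIZED VIEW `fintech-prod.fraud_features.daily_txn_rollup` AS
SELECT
  account_id,
  merchant_category,
  transaction_date,
  COUNT(transaction_id) AS txn_count,
  SUM(transaction_amount) AS total_spend
FROM `fintech-prod.core_ledger.authorized_transactions`
GROUP BY 1, 2, 3;

-- Thin view applies the rolling window at read time, against
-- pre-aggregated daily rows instead of the raw ledger
CREATE OR REPLACE VIEW `fintech-prod.fraud_features.rolling_90d_txns_v` AS
SELECT
  account_id,
  merchant_category,
  SUM(txn_count) AS txn_count,
  SUM(total_spend) AS total_spend
FROM `fintech-prod.fraud_features.daily_txn_rollup`
WHERE transaction_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY 1, 2;

Nothing needs to run every 10 minutes anymore: the MV refreshes incrementally as new transactions land, and readers pay only to re-aggregate at most 90 daily rows per account and category.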
2. The Market-Open Fleet Attack
Let’s say your FinTech has purchased a dedicated baseline commitment of 2,000 slots. Most of the time, this is plenty of compute for your asynchronous ETL workloads.
But user behavior is deeply tied to market hours. At 9:00 AM EST, 400 risk analysts, portfolio managers, and dispute operators all log in, open their BI tools (Tableau, Looker, Superset), and load their daily dashboards.
Suddenly, hundreds of complex, dashboard-generated queries hit BigQuery at exactly the same second. Your 2,000-slot baseline is instantly saturated. If Autoscaling is heavily capped, dashboards start hanging and failing with resourceExhausted errors. To prevent executive complaints about slow dashboards, the FinOps team increases the maximum autoscaling limit to 4,000 slots. Now, every single morning at 9:00 AM, you are paying a massive premium for slot-hours just to weather the 15-minute storm.
You can identify these “fleet attacks” by tracking slot consumption and concurrency by the minute for a specific BI service account:
SELECT
  TIMESTAMP_TRUNC(creation_time, MINUTE) AS query_minute,
  -- Queries submitted within the same minute: a proxy for concurrency
  COUNT(1) AS concurrent_queries,
  SUM(total_slot_ms) / (1000 * 60 * 60) AS total_slot_hours_consumed
FROM `region-us.INFORMATION_SCHEMA.JOBS`
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND user_email = 'looker-service-account@fintech-prod.iam.gserviceaccount.com'
  -- Filter to your specific dashboard reservation
  AND reservation_id = 'bi-reporting-prod'
GROUP BY 1
ORDER BY 3 DESC
LIMIT 20;
If your top results all cluster around market open, your cost problem is concurrency. Caching at the BI semantic layer, pre-warming dashboards via API at 8:45 AM, or building highly aggregated summary tables for the morning rush are your best defenses here.
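To make that last defense concrete, here is a minimal sketch of a pre-aggregated serving table built by a single scheduled query shortly before market open. The bi_serving dataset, the column list, and the 30-day window are assumptions on my part; shape them to whatever your dashboards actually render:

-- Scheduled once per weekday, shortly before the 9:00 AM rush
CREATE OR REPLACE TABLE `fintech-prod.bi_serving.morning_dashboard_summary` AS
SELECT
  merchant_category,
  transaction_date,
  COUNT(transaction_id) AS txn_count,
  SUM(transaction_amount) AS total_spend
FROM `fintech-prod.core_ledger.authorized_transactions`
WHERE transaction_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1, 2;

Point the morning dashboards at this table, and 400 analysts scan a few thousand pre-aggregated rows at 9:00 AM instead of launching 400 parallel scans of the raw ledger.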
3. The Wallboard Phantom Menace
Walk onto the floor of any major trading desk or fraud operations center, and you will see giant TVs on the wall displaying real-time operational metrics.
To keep these metrics fresh, a well-meaning operations manager will set the BI tool to auto-refresh every 3 minutes.
If those dashboards are querying a live, unaggregated view of the transaction ledger, you are essentially launching a full-scale analytical query on a continuous loop, 24/7. That loop quietly erodes the effective capacity the organization paid for. Worse, no one turns these TVs off. They will happily burn thousands of autoscaling slot-hours over a holiday weekend while the trading floor is entirely empty.
Because these queries are generated by machines on a tight loop, their exact structure rarely changes. You can hunt down these repetitive resource hogs using the query hash:
SELECT
  query_info.query_hashes.normalized_literals AS query_hash,
  ANY_VALUE(query) AS sample_query,
  COUNT(1) AS execution_count,
  SUM(total_slot_ms) / (1000 * 60 * 60) AS total_slot_hours_consumed
FROM `region-us.INFORMATION_SCHEMA.JOBS`
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND statement_type = 'SELECT'
  -- Focus on queries burning at least one full slot-minute per run
  AND total_slot_ms > 60000
  AND reservation_id = 'bi-reporting-prod'
GROUP BY 1
HAVING execution_count > 500
ORDER BY 4 DESC
LIMIT 10;
If you see a SELECT statement hitting your core ledger 10,000 times a month, you've likely found an orphaned wallboard or an insanely aggressive auto-refresh schedule.
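To put a dollar figure on each offender, you can extend the same hash query with a pay-as-you-go rate. Treat the rate below as a placeholder rather than pricing advice: substitute whatever your region and contract actually charge per slot-hour, and remember this is a rough upper bound that prices every slot-hour as if it were autoscaled overage:

-- Placeholder rate: substitute your region's Editions pay-as-you-go price
DECLARE slot_hour_rate FLOAT64 DEFAULT 0.06;

SELECT
  query_info.query_hashes.normalized_literals AS query_hash,
  COUNT(1) AS execution_count,
  SUM(total_slot_ms) / (1000 * 60 * 60) AS slot_hours,
  ROUND(SUM(total_slot_ms) / (1000 * 60 * 60) * slot_hour_rate, 2) AS est_cost_usd
FROM `region-us.INFORMATION_SCHEMA.JOBS`
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND statement_type = 'SELECT'
  AND reservation_id = 'bi-reporting-prod'
GROUP BY 1
ORDER BY est_cost_usd DESC
LIMIT 10;

Even a rough ranking like this changes the conversation: “this one wallboard costs us roughly $X a month” lands much harder than “compute is up.”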
The Takeaway
When investigating BigQuery capacity issues and overages in a mature FinTech environment, don’t just look at the EXPLAIN plan of a single expensive execution. Look at the timestamps, the execution counts, and the concurrency limits.
The queries draining your slots are rarely typos or cartesian joins. Often, they are structurally perfect architectural workarounds built to satisfy human behavior. Fixing them usually doesn’t require a SQL rewrite; it requires a conversation with the operations team about caching strategies, refresh cadences, and whether we really need a rolling 365-day fraud feature set updated every 10 minutes.
How QueryWise Solves This
Hunting down these architectural anomalies manually using INFORMATION_SCHEMA queries is tedious, and it often only happens after you’ve already received a massive bill or suffered through a severity-1 SLA breach.
This is exactly why we built QueryWise.
Instead of forcing your FinOps or Data Engineering teams to play detective, QueryWise automates the discovery of these silent slot-killers. By connecting your cloud database environments (including BigQuery, Snowflake, Databricks, and more) via read-only access, QueryWise correlates query metadata, execution plans, and actual billing data to give you per-query cost attribution.
Here is how QueryWise handles the FinTech scenarios above:
- Identifies the “Fake” MVs: QueryWise automatically flags highly repetitive, heavy scheduled queries (like our 10-minute CREATE OR REPLACE TABLE) and ranks them by their actual financial impact or slot consumption.
- Spots BI Tool Fleet Attacks: The platform correlates workload patterns to show you exactly which Looker or Tableau dashboard is saturating your underlying BigQuery Editions reservation during the 9:00 AM market open.
- Flags the Phantom Wallboards: QueryWise’s multi-vendor anti-pattern detection engine automatically identifies “zombie” queries — detecting when identical read-only queries are executing thousands of times a day without human interaction, instantly flagging aggressive auto-refreshes.
Instead of just telling you “compute costs are up 30%,” QueryWise shows you the exact SQL statement driving the spike, estimates the monthly savings of fixing it, and scores the effort and risk of the remediation.
If your BigQuery compute reservation feels like a black box, or you are tired of paying for autoscaling overages caused by dashboard refreshes, book a live demo or apply for the Design Partner Program to see exactly which queries are burning your cloud budget.