Weaver
Architecture · 12 min read

Business intelligence on a single data backbone

Every classical BI tool inherits the wrongness of its source warehouse. The lakehouse closed two of the three hops between operational state and the chart on your screen. A single data backbone closes the third.

By · Founder, K3 Labs

TL;DR

  • Every BI tool inherits the wrongness of its source warehouse. A chart in Tableau is an echo of state in the operational database that has been extracted, modeled, then cached and drawn. Each hop adds latency, schema drift, and a place where the truth can diverge from production.
  • The lakehouse (Armbrust et al., CIDR 2021) closed two of those hops. It collapsed the warehouse and the data lake into one substrate so analytical queries no longer need a separate copy. But the BI tool still sits on top — semantic layer, extract, cache, dashboard.
  • A single data backbone closes the third. When the operational store and the analytical store are the same store, BI is not a tool that reads a copy — it is a view of live state. No extract refresh. No semantic-layer drift. No “the dashboard says one thing, the app says another.”
  • The win is not query speed. The win is time-to-first-answer for a brand-new question. On a classical stack a new question is a ticket that lives with the data engineer for a day. On a single data backbone it is a query against live state that the framework writes in seconds.
  • What this does not replace: heavy machine-learning training, time-series forecasting at petabyte scale, and the genuinely separate-tenancy compute that you do not want running next to OLTP. The boundary is operational vs. truly batch — not BI vs. analytics.

Every BI stack is a three-hop pipeline

Pick any board-ready chart. Trace it backward. There are three hops between the row in the operational database and the pixel on your screen.

Hop one is extraction. The operational state — the row in the transactional database that records every invoice, every contact, every project — is copied somewhere else. Historically this was the data warehouse (Inmon, 1992; Kimball, 1996). Modern stacks copy into Snowflake, BigQuery, or Databricks SQL. The copy is governed by a refresh schedule: hourly for an extract-based Tableau setup, every few minutes for an Airbyte CDC pipeline. The schedule is the floor on analytical freshness.

Hop two is modeling. The warehouse copy is reshaped into the dimensional schema the BI tool understands — star or snowflake schemas around fact tables. This is where Looker’s LookML lives, where the dbt models live, where the Tableau data sources live. The semantic layer is a hand-written translation between “how the operational system stores things” and “how the BI tool wants to query things,” and it drifts every time someone changes either side without re-translating.
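The drift described above can be made concrete with a small check. This is a minimal sketch, not how LookML or dbt validate models: the `SEMANTIC_LAYER` mapping, the `invoices` table, and its column names are all hypothetical, and SQLite stands in for the operational database.

```python
import sqlite3

# Hypothetical semantic-layer mapping: BI field name -> "table.column" in the
# operational schema. Real tools (LookML, dbt) express this differently.
SEMANTIC_LAYER = {
    "net_revenue": "invoices.amount_net",
    "region": "invoices.region",
    "invoice_date": "invoices.issued_on",
}


def operational_columns(conn: sqlite3.Connection, table: str) -> set:
    """Return 'table.column' names as the operational schema defines them today."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {f"{table}.{row[1]}" for row in rows}  # row[1] is the column name


def drifted_fields(conn: sqlite3.Connection, mapping: dict) -> list:
    """List BI fields whose mapped operational column no longer exists."""
    tables = {target.split(".")[0] for target in mapping.values()}
    live = set()
    for table in tables:
        live |= operational_columns(conn, table)
    return sorted(field for field, target in mapping.items() if target not in live)


conn = sqlite3.connect(":memory:")
# The operational side after someone renamed amount_net to amount_excl_tax
# without updating the mapping above.
conn.execute(
    "CREATE TABLE invoices ("
    "id INTEGER PRIMARY KEY, amount_excl_tax REAL, region TEXT, issued_on TEXT)"
)

print(drifted_fields(conn, SEMANTIC_LAYER))  # ['net_revenue']
```

A check like this only catches renamed or dropped columns. It cannot catch the harder case where a column keeps its name but changes meaning.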

Hop three is rendering. The BI tool issues a query against the modeled warehouse, caches the result for performance, and draws the chart. Caching means the chart you are looking at can be older than the freshness floor from hop one.
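As a rough illustration of how the hops stack up, the worst-case age of the data behind a chart can be modeled as a sum of per-hop delays. The `Pipeline` class and the example durations below are assumptions chosen for illustration, not measurements of any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical three-hop BI pipeline; all durations are in seconds."""

    extract_interval_s: float  # hop one: how often the warehouse copy refreshes
    model_build_s: float       # hop two: time to rebuild the modeled tables
    cache_ttl_s: float         # hop three: how long the BI tool reuses a cached result


def worst_case_staleness_s(p: Pipeline) -> float:
    """Upper bound on the age of the data behind a chart.

    A row written just after an extract starts waits a full interval for the
    next extract, then for the model rebuild, and the query result may then be
    served from cache until the TTL expires.
    """
    return p.extract_interval_s + p.model_build_s + p.cache_ttl_s


# Assumed values: hourly extract, 5-minute model rebuild, 15-minute cache TTL.
hourly = Pipeline(extract_interval_s=3600, model_build_s=300, cache_ttl_s=900)
print(worst_case_staleness_s(hourly) / 60)  # 80.0 (minutes)
```

Under these assumed numbers the chart can be 80 minutes old even though the extract schedule says "hourly".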

Every hop introduces three failure modes: latency (the chart is stale), schema drift (the chart says something different from the app), and trust loss (the engineer who maintained the modeling layer left, and now the column called net_revenue means something nobody can confidently explain).

The classical BI stack tries to manage these failures with tooling on top of the pipeline: data observability, semantic catalogs, lineage. None of them remove the underlying issue, which is that you have three independent copies of business truth that have to agree.

The lakehouse closed two of those hops

Armbrust et al.’s 2021 CIDR paper named the lakehouse and described why it works: a single open-format store that holds both raw operational data and the analytical aggregates a BI tool wants. Delta Lake, Iceberg, and Hudi are the production implementations. Snowflake’s 2016 architecture paper (Dageville et al., SIGMOD 2016) had already shown that the same warehouse engine could serve both ad-hoc analytical queries and high-throughput OLAP workloads on a shared, decoupled-storage substrate.

The lakehouse step collapsed hop one (extract from operational DB to warehouse) and hop two (model the warehouse copy for the BI tool) into a single substrate. You still copy data into the lakehouse — but the copy is the substrate the BI tool reads from, not a separate intermediate. The semantic layer lives next to the data, not on top of it.

What it did not collapse is hop three. The BI tool still sits on top. Looker still has its own LookML. Tableau still maintains its own extracts. The lakehouse is closer to the truth than the classical warehouse stack — but BI is still the activity of reading a copy of the truth, asynchronously.

A single data backbone closes the third

A single data backbone — the architecture pattern Weaver ships — takes one more step. The operational store and the analytical store are the same store. The same engine that records the invoice in the ERP is the engine the BI question runs against. There is no copy. There is no extract. There is no semantic layer between the two, because there is no two.

The backbone is engineered for both workloads: ACID transactions for the operational write path, columnar layout and vectorized execution for the analytical read path. Sub-second responses against live state are the design point, not an unusual case.

What that gives you, concretely:

  • No staleness window. The chart you are looking at reflects committed state as of the moment the query runs, including invoices written sixty seconds ago.
  • No schema drift between operational and analytical. The column the app writes to is the column the chart reads from. There is no translation layer that can disagree with production.
  • No separate BI ownership. BI is not a tool a separate team operates. It is a view, in the same product surface as the operational app, owned by the same team that owns the operational data model.
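The "same store" property can be sketched in a few lines. SQLite is used here purely as a stand-in: it is not Weaver's engine and has no columnar layout or vectorized execution. The table and function names are hypothetical. The only point illustrated is that an analytical query against the table the app writes to sees a new row on the very next read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)


def record_invoice(region: str, amount: float) -> None:
    """Operational write path: one atomic transaction per invoice."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO invoices (region, amount) VALUES (?, ?)", (region, amount)
        )


def revenue_by_region() -> dict:
    """Analytical read path: aggregates the same table the app writes to."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM invoices GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


record_invoice("EMEA", 1200.0)
record_invoice("APAC", 800.0)
before = revenue_by_region()

record_invoice("EMEA", 300.0)  # a brand-new invoice...
after = revenue_by_region()    # ...is visible to the very next analytical query

print(before)  # {'APAC': 800.0, 'EMEA': 1200.0}
print(after)   # {'APAC': 800.0, 'EMEA': 1500.0}
```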

What it buys you, in numbers

The first measurable delta is latency. Figure 1 shows the end-to-end p50 latency for the same canonical question — “revenue by region this month, including invoices written in the last sixty minutes” — on four stack shapes.

Figure 1

End-to-end query latency for the same canonical BI question

p50 latency, log scale. Same query: revenue by region this month, including invoices written in the last 60 minutes.

  • Snowflake → Tableau (hourly extract): 1 hr. Freshness bounded by extract schedule; query is fast once data lands.
  • Databricks → Looker (live SQL): 4.5 s. Live but semantic-layer + cache hop adds latency to every query.
  • Databricks SQL (direct, no BI tool): 1.8 s. Warehouse responds; no BI front-end semantic-layer overhead.
  • Single Data Backbone (Weaver, native): 350 ms. Operational state is the analytical store; no warehouse, no extract.

Numbers are illustrative midpoints synthesized from public vendor documentation for each stack at typical mid-market configurations. The point is the order-of-magnitude contrast between “live operational state” and “state copied through two layers before the BI tool sees it” — not a vendor benchmark.

The classical Tableau-on-Snowflake stack is not slow at the query layer. It is bounded by its extract refresh schedule, which is what shows up as “1 hour” on the chart. The query itself, once the extract has refreshed, returns in a couple of seconds.

The Databricks-with-Looker stack is fast at the query layer but eats latency at the semantic-layer hop and at the Looker explore evaluation. The same query against Databricks’s own SQL UI, without the BI tool in the path, is more than twice as fast.

The single data backbone row is faster again: roughly five times faster than direct Databricks SQL and more than ten times faster than the Looker path. Same query, against live state, with no extract, no modeling layer, and no BI-tool semantic layer.
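The ratios between the Figure 1 rows can be checked directly. The values below are the illustrative midpoints from the figure, not benchmark results, and the dictionary keys are shortened labels introduced here for convenience.

```python
# p50 latencies from Figure 1, in seconds (illustrative midpoints, not a benchmark).
LATENCY_S = {
    "tableau_extract": 3600.0,   # Snowflake -> Tableau, bounded by the hourly extract
    "looker_live": 4.5,          # Databricks -> Looker (live SQL)
    "databricks_direct": 1.8,    # Databricks SQL, no BI tool
    "backbone": 0.35,            # Single Data Backbone
}


def speedup(slower: str, faster: str) -> float:
    """How many times faster the second stack answers than the first."""
    return LATENCY_S[slower] / LATENCY_S[faster]


print(round(speedup("looker_live", "databricks_direct"), 1))  # 2.5
print(round(speedup("databricks_direct", "backbone"), 1))     # 5.1
print(round(speedup("looker_live", "backbone"), 1))           # 12.9
```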

The bigger win is time-to-insight

Query latency is the smaller story. The bigger one is what happens when a stakeholder asks a question that has no pre-built dashboard. On a classical stack that is a ticket for the data team. On a single data backbone it is a sentence the framework can translate.

Figure 2

Time to first usable answer for a brand-new ad-hoc question

Wall-clock time, broken into queue, work, and review. Same question, same business, four different stack shapes.

Classic stack: Snowflake + Tableau, data-eng-mediated

1.5 d

  • queue: ticket sits with data eng 1.0 d
  • work: write SQL + validate 4.0 h
  • work: refresh Tableau extract 1.0 h
  • review: stakeholder back-and-forth 8.0 h

Lakehouse + Looker (semantic layer pre-built)

9.0 h

  • queue: ticket sits in BI backlog 4.0 h
  • work: extend LookML model 2.0 h
  • work: write Looker explore 1.0 h
  • review: stakeholder back-and-forth 2.0 h

Self-serve BI on warehouse (no semantic layer)

8.0 h

  • queue: none, but stakeholder writes the SQL 0 min
  • work: stakeholder writes & debugs SQL 6.0 h
  • review: data-eng correctness review 2.0 h

Single Data Backbone (Weaver native)

10 min

  • work: ask the question in the app 3 min
  • work: framework returns live answer 1 min
  • review: framework cites sources inline 6 min

Numbers synthesized from internal mid-market benchmarks and Locally Optimistic 2024 survey responses on time-to-first-answer.

The classical stack’s wall-clock is dominated by queue time, not query time. The data engineer has thirty tickets ahead of yours; even when they pick up the work, the SQL is fast but the back-and-forth review with the stakeholder takes a full working day. Self-serve BI on a warehouse removes the queue but moves the SQL burden onto the stakeholder, who is not a data engineer and gets it wrong the first time.
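The Figure 2 totals, and the share of each that is pure queueing, can be reproduced from the per-step numbers. One assumption is made here: "1.0 d" of queue time is read as 24 wall-clock hours, which is the reading that matches the 1.5 d total shown in the figure. The dictionary keys are shortened labels.

```python
# Figure 2 components in hours: (queue, work, review).
# Assumption: "1.0 d" of queue time means 24 wall-clock hours.
STACKS = {
    "classic": (24.0, 4.0 + 1.0, 8.0),            # Snowflake + Tableau
    "lakehouse_looker": (4.0, 2.0 + 1.0, 2.0),    # Lakehouse + Looker
    "self_serve": (0.0, 6.0, 2.0),                # Self-serve BI on warehouse
    "backbone": (0.0, (3 + 1) / 60, 6 / 60),      # Single Data Backbone (minutes -> hours)
}


def summarize(name: str) -> tuple:
    """Return (total hours, fraction of the total spent waiting in a queue)."""
    queue, work, review = STACKS[name]
    total = queue + work + review
    return total, queue / total


for name in STACKS:
    total, queue_share = summarize(name)
    print(f"{name}: {total:.2f} h total, {queue_share:.0%} queue")
# classic: 37.00 h total, 65% queue
# lakehouse_looker: 9.00 h total, 44% queue
# self_serve: 8.00 h total, 0% queue
# backbone: 0.17 h total, 0% queue
```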

The single-data-backbone row is short for a different reason: there is no SQL written by hand. The framework knows the schema (because the schema is the operational schema), knows the access controls, and can answer in the same UI surface where the stakeholder is already working. The stakeholder asks the question; the framework returns the answer with cited sources for verification.

What this does not replace

A single data backbone is not a replacement for every analytical workload. Three categories stay outside its perimeter:

  • Heavy ML training. Petabyte-scale gradient descent on multi-hundred-GPU clusters genuinely belongs on dedicated infrastructure. The backbone is the source of training data, not the place training runs.
  • Long-horizon time-series forecasting. Multi-year cohort analysis with custom statistical modeling has different access patterns from operational BI. It can be mirrored to a warehouse for that workload without losing the operational guarantees on the primary backbone.
  • Truly separate-tenancy compute. If finance needs to crunch a year of data without contending with operational read traffic, that should run on its own compute. The backbone publishes a snapshot, and the snapshot is the input.
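The snapshot hand-off in the last item can be sketched as follows. SQLite's backup API is used only as a stand-in for whatever snapshot mechanism the backbone exposes; `publish_snapshot` and the `invoices` table are hypothetical names.

```python
import sqlite3

primary = sqlite3.connect(":memory:")
primary.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)
with primary:
    primary.executemany(
        "INSERT INTO invoices (amount) VALUES (?)", [(100.0,), (250.0,)]
    )


def publish_snapshot(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the current committed state into a separate database for batch work."""
    snapshot = sqlite3.connect(":memory:")
    source.backup(snapshot)  # sqlite3.Connection.backup, available since Python 3.7
    return snapshot


def total_amount(conn: sqlite3.Connection) -> float:
    return conn.execute("SELECT SUM(amount) FROM invoices").fetchone()[0]


snapshot = publish_snapshot(primary)

# Operational writes continue on the primary after the snapshot is taken.
with primary:
    primary.execute("INSERT INTO invoices (amount) VALUES (?)", (999.0,))

print(total_amount(snapshot))  # 350.0  (batch job sees a fixed input)
print(total_amount(primary))   # 1349.0 (live state keeps moving)
```

The batch job reads only the snapshot connection, so a long-running scan never contends with the operational write path.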

The boundary is operational vs. truly batch — not BI vs. analytics. Most of what gets called “business intelligence” in a mid-market company is operational: revenue, pipeline, AR aging, project profitability. That collapses onto the backbone. The genuinely batch workloads stay where they belong.

How Weaver does it

Weaver’s Single Data Backbone is the operational store for every native app — ERP, CRM, expense management, projects, growth engine — and simultaneously the analytical store those apps query. There is no separate warehouse. There is no extract schedule.

The first cross-product proof of this thesis is ARC Gaming & Technologies — a route operator running CRM, asset lifecycle for 2,000+ machines, accounting, expense, and payroll on Weaver, with the same backbone serving operational writes and the BI questions the operations team asks against that state.

The architecture peers with Databricks + Salesforce at the data-platform layer. The contribution is that the apps come built in.
