What Uber Actually Built for FP&A.
A close reading of Uber's machine-learning forecasting platform — the planning loop, the scenario graph, the optimizer, and the operating model around the ML.
I was an FP&A leader and a very young CFO. I was fascinated — genuinely obsessed — with what world-class planning actually looked like. I came to believe it was a combination of people, process, and technology, in that order. Get the people right first. Then the process. Then the tools.
Then Uber came along and completely blew me away.
What they built is intellectually perfect. But they also had: all the data; a business set up with end-to-end observability into every key value driver; an engineering culture; the machine learning capability to build this level of infrastructure; immense funding and the willingness to spend it; and a history of having built everything the right way from the start.
It's clear to me that almost no business could replicate this. And yet I'm in awe. Like — wow.
This post is a deconstruction of what they built: how good it is, and how practically unreplicable it is. The framing and the planning-loop / scenario-graph mechanics come from a single Uber Engineering article — Transforming Financial Forecasting with Data Science and Machine Learning at Uber. Everything else is sourced inline.
The scale of the problem
Uber operates in 850+ markets.¹ $200B+ annualized gross-bookings run rate. 75 million active riders, 3 million active drivers, roughly 15 million trips a day. Each of those cities has its own budget, targets, and operations team. That is the volume of decisions that justifies what they built.
The supporting infrastructure runs at comparable scale. Michelangelo, Uber's machine learning platform, manages 5,000+ production models serving roughly 10 million predictions per second at peak. The query layer behind it is around 15 Presto clusters across 5,000+ nodes, with ~7,000 weekly active users running ~500,000 queries a day reading roughly 50 petabytes. The Finance Computation Platform — the immutable, GAAP-grade ledger underneath all of this — produces more than 40 billion journal entries per year off ~4 billion ride-sharing trips, with the financial accounting services platform processing on the order of 1.5 billion daily journal entries at an average 2,500 queries per second.
The point isn't size for its own sake. It is that the planning graph sits on infrastructure that would not exist if Uber hadn't already built it for the marketplace itself. Most companies don't have a marketplace. Most companies don't generate 15 million daily transactions with trip-level observability. The data that makes Uber's forecasting trustworthy isn't FP&A data — it's product data, logistics data, pricing data. FP&A gets to ride on it.
The three-phase planning loop
Uber replaced the annual budget with a continuous loop of three phases running on different cadences against the same shared model:
Strategic planning sets budget allocation and target expectations across the company and global markets. It runs against an annual baseline with a mid-year rebase and refreshes monthly inside that envelope.
Operations is where city teams execute against their assigned budgets on a much shorter loop — "as little as one week and as long as one month."
Insights monitors business performance, re-evaluates targets, and feeds back into the next strategic refresh. The pulse cadence is bi-weekly across global markets.
There is no budget season. Strategy, operations, and insights run concurrently against the same model the whole year. This is the single most important architectural choice — and it is entirely decoupled from machine learning. You could run this loop on a spreadsheet and it would still be better than what most companies do. The continuous cycle isn't a technology innovation. It's an operating model innovation. The technology just makes it scalable.
The platform, three layers
The supporting platform is layered cleanly.
UI layer. A collaborative interface that lets multiple planners work on scenarios concurrently — explicitly designed to eliminate the spreadsheet-sharing pattern. One editor, shared view, override capture, approval workflows.
Computation layer. Two services. Scenario Management composes models, evaluates the scenario graph, and stores entities in Cassandra. The Optimization platform allocates budgets using mathematical optimization. Michelangelo, Uber's general-purpose machine learning platform, supports both.
Data & machine learning layer. A data pipeline, a finance data warehouse, and a metrics store, all feeding model training. Forecasting candidates race against each other on sliding and expanding windows before anything goes to production.
The data infrastructure underneath it all
If you read about Uber's planning platform and wonder whether you could build something like it, this is the section that answers that question. The honest answer is: almost certainly not. And the reason isn't the planning software. It's what's underneath it, and most of that isn't FP&A infrastructure at all: it is marketplace infrastructure that FP&A gets to ride on.
Ingest is Apache Kafka, in one of the largest deployments anywhere — trillions of messages and multiple petabytes daily, with custom routing, throttling, and a zero-data-loss tier specifically for financial events. Stream processing is Apache Flink. Real-time OLAP is Apache Pinot, with sub-minute freshness used for surge pricing and live operational dashboards. The batch warehouse is a 100+ petabyte Hadoop/HDFS data lake queried by Presto. Serving for online machine learning predictions is Cassandra, kept hot from precomputed batch jobs and streaming aggregates.
Underneath the finance side specifically is the Finance Computation Platform — an event-driven, append-only ledger that consumes business events from Kafka and produces immutable, idempotent, GAAP-grade journal entries per trip. In 2024, Uber finished migrating 250 billion historical transaction records into a dedicated Ledger Store with cryptographic sealing for audit. None of this was built for finance. It is the substrate that makes the forecasts trustworthy at trip-level granularity — which is what lets the planning graph attach a P&L node to a single ride.
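To make the idempotency point concrete, here is a toy sketch in Python of an append-only ledger whose posting operation is idempotent, so a replayed business event never double-books. This has nothing to do with Uber's actual schema or services; the account names, event shape, and keying scheme are invented for illustration.

```python
import hashlib, json

class AppendOnlyLedger:
    """Toy append-only journal: entries are immutable once written, and posting
    is idempotent, so replaying the same business event never double-books."""

    def __init__(self):
        self.entries = []   # append-only; never updated or deleted
        self._seen = set()  # idempotency keys of already-posted events

    def post(self, event):
        # Derive a deterministic idempotency key from the event payload.
        key = hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()
        if key in self._seen:   # replayed event (e.g., a redelivered message): no-op
            return
        self._seen.add(key)
        self.entries.append({
            "key": key,
            "debit":  ("rider_receivable", event["fare"]),
            "credit": ("trip_revenue", event["fare"]),
        })

ledger = AppendOnlyLedger()
trip_completed = {"trip_id": "t-123", "event": "trip_completed", "fare": 11.50}
ledger.post(trip_completed)
ledger.post(trip_completed)   # duplicate delivery is ignored
print(len(ledger.entries))    # 1
```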
Most companies can't build this because they never had a reason to build the marketplace layer first. Uber didn't build their data infrastructure to support FP&A. They built it to run a global real-time marketplace. FP&A inherited the asset.
The stack in brief:
- UI layer: a collaborative interface where multiple planners work on scenarios concurrently, explicitly designed to eliminate the spreadsheet-sharing pattern.
- Computation layer: composes models, evaluates the scenario graph, and allocates budget across the geographical tree using mathematical optimization.
- Data & ML layer: a data pipeline, finance data warehouse, and metrics store, all feeding model training; forecasting candidates raced against each other on sliding and expanding windows.
- Data spine (not FP&A's; FP&A rides on it): the infrastructure that makes trip-level forecasting trustworthy. None of this was built for finance; finance gets to attach a P&L node to a single ride because the marketplace already did.
Scenarios are directed acyclic graphs
A scenario in this system is not a copy of a spreadsheet. It is a directed graph of business metrics connected by computations — explicitly described as "2-colorable and acyclic" — evaluated by topological sort. A node is computed from its parents using a function: x3 = A(x1, x2); x4, x5 = B(x2, x3). Scenario computation is idempotent: run it a thousand times with the same inputs and you get the same outputs every time.
The graph contains three kinds of nodes: machine learning models trained on historical data, mathematical formulas implemented with existing math libraries, and simple arithmetic. The example given is literally trips × fare = gross_bookings.
Three planners can run three scenarios in parallel without stepping on each other. The comparison between scenarios is structural, not visual. Set the inputs at the top of the graph and the outputs fall out the bottom. Re-running a scenario with different inputs is a re-evaluation of the graph, not a copy-paste with Search and Replace.
The design discipline is worth naming: nothing forces every node to be "intelligent." Most FP&A metrics are not forecastable by machine learning in any useful sense — they are definitional (revenue = price × units) or they depend on human judgment (next quarter's hiring plan). Machine learning belongs at the nodes where it genuinely adds lift and nowhere else. Uber's graph puts machine learning in one or two places and lets the rest of the graph be boring. That restraint is harder than it looks.
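To make the mechanics concrete, here is a minimal sketch in Python of a scenario as a DAG: input nodes, computed nodes defined by their parents and a function, evaluated in topological order. The node names, functions, and numbers are illustrative, not Uber's; the "ML node" is just a hard-coded stand-in for a fitted response curve.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# A scenario: input values plus a graph of computed nodes.
# Each computed node declares its parents and the function that combines them.
# Functions can be anything: an ML model's predict(), a formula, or plain arithmetic.
scenario_inputs = {"acquisition_spending": 200.0, "fare": 12.0}

graph = {
    # signups: stand-in for an ML node (a fitted response curve in production)
    "signups": (["acquisition_spending"], lambda spend: round(0.175 * spend)),
    # trips: simple formula node
    "trips": (["signups"], lambda signups: signups * 4),
    # gross_bookings: plain arithmetic, trips x fare
    "gross_bookings": (["trips", "fare"], lambda trips, fare: trips * fare),
}

def evaluate(inputs, graph):
    """Evaluate every computed node in topological order. Idempotent:
    the same inputs always produce the same outputs."""
    deps = {node: set(parents) for node, (parents, _) in graph.items()}
    values = dict(inputs)
    for node in TopologicalSorter(deps).static_order():
        if node in values:          # an input, already known
            continue
        parents, fn = graph[node]
        values[node] = fn(*(values[p] for p in parents))
    return values

print(evaluate(scenario_inputs, graph))
# {'acquisition_spending': 200.0, 'fare': 12.0, 'signups': 35, 'trips': 140, 'gross_bookings': 1680.0}
```

Re-running a scenario with different inputs is just another call to `evaluate` with a different input dict; nothing downstream is copied or pasted.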
The human override, preserved and auditable
The example in the source article is the most important part of the whole piece.
A São Paulo scenario starts with two inputs: $200 of acquisition_spending and $100 of engagement_spending. The model predicts that the $200 of acquisition spend produces 35 new rider signups. The downstream graph evaluates and produces a net_inflow of $1,274.
The local São Paulo team knows something the model doesn't — a marketing partnership, a competitive event, a seasonal pattern that hasn't shown up in training data. They override the signup prediction to 50. The system recomputes the downstream nodes. net_inflow moves to $1,350. The override is captured as part of the scenario, not as a comment in a cell.
That is the architecture. The model produces the baseline. The operator explains the delta. The delta is auditable. Most FP&A teams get this wrong by either ignoring the model and going all-judgment, or trusting the model and discarding the operator's local knowledge entirely. Uber's graph captures both and treats the override as first-class data. This is what machine learning-driven, human-in-the-loop design actually looks like in practice — and it's a pattern I've come to believe in deeply: the model sets the prior, the human corrects for what the model can't know, and the system learns from both. Hyndman & Athanasopoulos formalize the same discipline as part of any production forecasting system in Forecasting: Principles and Practice.
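Here is a small, self-contained sketch of the override-as-data pattern: the human value replaces the model's value for one node, downstream nodes recompute from it, and both values are recorded so the delta stays auditable. The response functions and numbers are toy values that merely echo the shape of the São Paulo example; this is not Uber's code.

```python
def evaluate(inputs, graph, overrides=None):
    """Evaluate a scenario graph, applying human overrides as first-class data.
    Returns (values, audit), where audit records model value vs. override per node."""
    overrides = overrides or {}
    values, audit = dict(inputs), {}
    for node, (parents, fn) in graph.items():  # graph is listed in dependency order here
        model_value = fn(*(values[p] for p in parents))
        values[node] = overrides.get(node, model_value)
        if node in overrides:
            audit[node] = {"model": model_value, "override": overrides[node]}
    return values, audit

# Illustrative response functions and coefficients; not the article's actual model.
graph = {
    "signups":    (["acquisition_spending"], lambda spend: round(0.175 * spend)),
    "net_inflow": (["signups", "engagement_spending"],
                   lambda signups, eng: 20.0 * signups + 5.0 * eng),
}
inputs = {"acquisition_spending": 200.0, "engagement_spending": 100.0}

baseline, _ = evaluate(inputs, graph)                       # model-only run
adjusted, audit = evaluate(inputs, graph, {"signups": 50})  # local team overrides the signup node
print(baseline["net_inflow"], "->", adjusted["net_inflow"])
print(audit)   # {'signups': {'model': 35, 'override': 50}}
```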
Hierarchical objectives that differ by region
The geographical tree is global → region → city, but the objective function can differ at any node in the tree. Three live examples from the article:
Latin America — newer market, investing heavily in growth. Objective: maximize signups and trips.
North America — mature market, some cities near saturation. Objective: maximize net_inflow, subject to a minimum trip floor.
India — enormous untapped potential. Objective: maximize trips, subject to a unit-economics constraint.
The optimizer accepts objectives like minimize spending, maximize drivers or riders, maximize first trips, or maximize gross bookings, with constraints such as a maximum budget by channel, a minimum first-trips floor, or a minimum month-to-month gross-bookings growth rate. Different subtrees carry different objectives simultaneously.
Forcing a single corporate objective across a multi-region business destroys value. Every company does this. Most don't have the infrastructure to do anything else. Uber's structured graph makes it almost free to model segment-specific objectives that a flat spreadsheet structurally cannot.
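As a sketch of what per-subtree objectives could look like as data, the following is my own guess at a structure, not Uber's schema; the regions and constraint names come from the article's examples, and the numbers are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Objective:
    """What a subtree of the geographical tree optimizes, and under what constraints."""
    maximize: str                              # metric name in the scenario graph
    constraints: dict = field(default_factory=dict)

# Different subtrees carry different objectives simultaneously.
objectives = {
    "latam":         Objective(maximize="signups_and_trips"),
    "north_america": Objective(maximize="net_inflow",
                               constraints={"trips_min": 1_000_000}),
    "india":         Objective(maximize="trips",
                               constraints={"cost_per_trip_max": 1.20}),
}

def objective_for(path):
    """Walk up the tree (city -> region -> global) and use the nearest objective."""
    for node in reversed(path):
        if node in objectives:
            return objectives[node]
    return Objective(maximize="gross_bookings")   # global default

print(objective_for(["global", "latam", "sao_paulo"]).maximize)   # signups_and_trips
print(objective_for(["global", "north_america", "toronto"]))
```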
The optimizer itself
The optimization platform uses two approaches in parallel.
Convex optimization. Easy to construct and implement, theoretically bounded by the assumptions required to keep the problem convex.
Gradient-descent optimization. Lets the team express richer non-convex problems. The trade-off is iteration count — gradient descent generates "significantly more iterations."
Operationally, the optimizer "turns the optimization problem into a big while loop, which could include thousands or even millions of iterations. In each iteration, the algorithm provides a set of initial metrics for each city." That is the per-cell allocation that justifies the platform cost — every city, every channel, every period.
The mechanics are more specific than the article suggests. The deep-learning response surface that maps incentive spend to marketplace outcomes is jagged and non-convex, so Uber smooths it with tensor B-spline regression over an adaptive sparse grid before handing it to the solver. In plain terms: the model's raw predictions form a bumpy, irregular surface across thousands of city-channel-period combinations. B-spline regression fits a smooth mathematical curve through that surface, using only the points that matter most (the "sparse grid") rather than every possible combination, which would be computationally prohibitive. The result is a clean, well-behaved approximation the optimizer can actually work with.

B-splines give exact analytical gradients and let the team enforce monotonicity (more spend should not reduce volume) and convexity (diminishing returns) as hard constraints, so the optimizer cannot exploit spurious wiggles in the machine learning surface. The optimization itself runs ADMM — Alternating Direction Method of Multipliers, the workhorse decomposition algorithm formalized in Boyd et al.'s 2011 Foundations and Trends monograph — distributed over Ray, which is what makes the per-cell allocation across 600+ cities tractable in a single solve. Causal lift estimates from Uber's open-source CausalML package feed in as a penalty term, so the optimizer respects what A/B and switchback experiments have already proven about budget elasticity rather than re-deriving it from observational data.

In concrete terms: billions of dollars of incentive spend, allocated across 600+ cities and dozens of channels, in a single automated run — instead of hundreds of separate analyst decisions made in spreadsheets.
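To ground the shape of the problem, here is a deliberately tiny stand-in in Python: each city gets an invented concave spend-to-trips curve (diminishing returns), and a single solver call splits a fixed budget across cities to maximize total predicted trips. This is a toy convex allocation via SciPy, not Uber's ADMM-on-Ray solver; the cities, curves, and budget are made up.

```python
import numpy as np
from scipy.optimize import minimize

# Invented concave response curves: predicted trips as a function of spend per city.
# Concavity encodes diminishing returns; monotonicity encodes "more spend never hurts".
cities = {
    "sao_paulo": lambda s: 900 * np.log1p(s / 50.0),
    "toronto":   lambda s: 400 * np.log1p(s / 30.0),
    "mumbai":    lambda s: 1500 * np.log1p(s / 80.0),
}
total_budget = 1_000.0   # toy units

def neg_total_trips(spend_vector):
    # Minimize the negative, i.e. maximize total predicted trips.
    return -sum(curve(s) for s, curve in zip(spend_vector, cities.values()))

result = minimize(
    neg_total_trips,
    x0=np.full(len(cities), total_budget / len(cities)),        # start from an even split
    method="SLSQP",
    bounds=[(0.0, total_budget)] * len(cities),                  # no negative spend
    constraints=[{"type": "eq",
                  "fun": lambda s: s.sum() - total_budget}],     # spend exactly the budget
)

for city, spend in zip(cities, result.x):
    print(f"{city:10s} {spend:8.1f}")
print("total predicted trips:", -result.fun)
```

The real problem differs in kind, not just size: non-convex ML surfaces smoothed by splines, causal penalties, and a decomposition algorithm distributed across a cluster. But the skeleton is the same: response curves in, one solve out, an allocation per cell.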
Model retraining is monthly and automated
Every model is re-trained every month against fresh data. The article frames this as a workflow commitment, not a technology one: model refreshing, retraining, and backtesting are automated end-to-end, treating each model as a living artifact that gets re-fit on a schedule rather than a yearly artifact produced once for the annual planning cycle.
Underneath the cadence is Michelangelo's feature store, Palette, which holds the same feature definitions used at training and at serving time. Online/offline parity is structural — both paths read from the same governed feature definitions — which is what makes monthly automated retraining safe to ship without a human re-checking whether the production features still match the training features. The same uniformity carries into uMetric, Uber's metric platform, where every business KPI is defined once in a version-controlled YAML spec and consumed identically by dashboards, retraining jobs, and the planning graph.
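A sketch of why shared definitions matter, with hypothetical names rather than Palette's or uMetric's actual interfaces: training and serving both resolve features through the same registry, so a monthly retrain cannot silently drift from what production serves. Everything below is a toy in-memory stand-in.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class MetricsStore:
    """Toy metrics store standing in for a governed feature/metric platform."""
    rows: dict  # (metric, city, day) -> value

    def window_sum(self, metric, city, as_of, days):
        return sum(self.rows.get((metric, city, as_of - timedelta(d)), 0.0)
                   for d in range(days))

# One governed definition per feature, consumed identically by training and serving.
FEATURES = {
    "trips_trailing_7d": lambda store, city, as_of: store.window_sum("trips", city, as_of, 7),
    "spend_trailing_7d": lambda store, city, as_of: store.window_sum("spend", city, as_of, 7),
}

def feature_vector(store, city, as_of):
    """The same function backs the offline training job and the online serving path,
    so online/offline parity is structural rather than something a human re-checks."""
    return {name: fn(store, city, as_of) for name, fn in FEATURES.items()}

store = MetricsStore({("trips", "sao_paulo", date(2019, 2, d)): 100.0 + d for d in range(1, 8)})
print(feature_vector(store, "sao_paulo", date(2019, 2, 7)))
```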
The article is also honest about where the models fail. New markets with insufficient history are flagged as a known failure mode. So are one-off events — the example given is the Philadelphia Eagles' Super Bowl run distorting the projected trip number for February 2018. Models that incorporate seasonality handle calendar effects; the override mechanic handles everything else.
What the machine learning actually does
To be specific about where machine learning is applied: the stack is deliberately model-agnostic, with deep-learning models flagged as an active experiment for individual-user-level metrics. Machine learning sits at the demand and acquisition-response nodes; the rest of the graph is formulas and arithmetic.
The forecasting stack runs a deliberate hierarchy from classical statistical methods — ARIMA, the Theta method of Assimakopoulos & Nikolopoulos, Holt-Winters — to gradient-boosted trees (XGBoost), to deep learning (LSTMs, and the hybrid ES-RNN that won the 2018 M4 Forecasting Competition), to Bayesian time-series via Orbit, the company's open-source structural time-series package built on Stan and Pyro. Orbit is the workhorse for marketing-mix and budget-allocation forecasts. Its Bayesian Time-Varying Coefficients let the elasticity of each marketing channel drift over time rather than being pinned to a single regression coefficient — the difference between modeling marketing as a constant force and modeling it as one whose return shifts with the market.
Backtesting across all candidates runs through Omphalos, an internal framework that races models against each other on sliding and expanding windows so the platform picks per-metric rather than per-religion.
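A minimal sketch of the racing idea: expanding-window backtests over candidate models, each scored on held-out horizons, the best picked per metric. The two candidates here (a seasonal naive and a trailing mean) are placeholders for the real hierarchy of ARIMA, Theta, XGBoost, and deep learning; this is not Omphalos, and the series is synthetic.

```python
import numpy as np

# Toy monthly series with trend + seasonality, standing in for a city-level metric.
rng = np.random.default_rng(0)
y = 100 + 2 * np.arange(48) + 10 * np.sin(np.arange(48) * 2 * np.pi / 12) + rng.normal(0, 3, 48)

# Candidate "models": each takes a history array and returns an h-step forecast.
candidates = {
    "seasonal_naive": lambda hist, h: np.array([hist[-12 + (i % 12)] for i in range(h)]),
    "trailing_mean":  lambda hist, h: np.full(h, hist[-6:].mean()),
}

def expanding_window_backtest(y, model, horizon=3, initial=24):
    """Refit on an ever-growing history, score each held-out horizon, return mean abs error."""
    errors = []
    for cut in range(initial, len(y) - horizon):
        forecast = model(y[:cut], horizon)
        errors.append(np.mean(np.abs(forecast - y[cut:cut + horizon])))
    return float(np.mean(errors))

scores = {name: expanding_window_backtest(y, m) for name, m in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)   # pick per metric, not per religion
```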
Does it actually work
External benchmarks are the most credible signal. Slawek Smyl, a forecasting engineer at Uber, won the 2018 M4 Forecasting Competition with the hybrid ES-RNN architecture — beating both pure classical and pure deep-learning approaches by a clear margin. The methodology is published in the International Journal of Forecasting (Smyl 2020), and the M4 result write-up by Makridakis, Spiliotis & Assimakopoulos (2020) ranks the hybrid above all 60 other entries on a 100,000-series benchmark. The methods that placed those models first are the same ones running in production at Uber.
Public-company guidance is a noisier proxy but worth noting. In full-year 2025, Uber's gross bookings exceeded $200B, above the guided range. Adjusted EBITDA — the operational metric the platform actually forecasts — landed inside the guided range.
What this stack costs
Uber doesn't publish a line item for FP&A infrastructure, so the numbers below are back-of-envelope estimates pulled from disclosed comp benchmarks, the company's Form 10-K filings, and cloud-agreement disclosures — directionally useful, not audit-grade.
What it costs to run Uber's FP&A platform
| Component | Low | High | Basis |
|---|---|---|---|
| Labor (200–300 FTE) | $120M | $250M | Financial DS + finance engineering + Strategic Finance, at Bay Area total comp $250–600K depending on level. |
| Platform R&D allocation | $50M | $150M | Sliver of $3.4B FY25 R&D pointed at Michelangelo, Palette, Finance Computation Platform, uMetric, and the data lake — shared with other tenants. |
| Cloud & compute share | $20M | $50M | FP&A's portion of the seven-year OCI + GCP commitments, plus on-prem residual during migration. |
| Total annual run-rate | $190M | $450M | Order of magnitude. Allocates against tens of billions in marketing & incentive spend. |
A 1% efficiency gain on the budget this platform allocates pays for the platform several times over. That is the only reason a finance org gets to spend nine figures a year on its own infrastructure.
The stack as an operating model
Pull the pieces together and the design is coherent. Continuous planning against a shared scenario graph. Machine learning at the nodes that benefit. Operator overrides preserved and auditable. Region-specific objectives handled by an optimizer that solves the whole tree simultaneously. Monthly automated retraining keeping the models current.
The transformation isn't the machine learning. The transformation is the operating model around it.
I came up in FP&A believing that world-class planning was a combination of people, process, and technology — in that order. I still believe that. But Uber did something I hadn't fully accounted for: they collapsed the gap between all three. Their people are world-class. Their process runs continuously rather than annually. And their technology encodes the process so precisely that the framework itself is the operating model — not a support layer for it.
What they built is intellectually perfect. It is also practically unreplicable for almost any business on earth. You need the data infrastructure, the engineering culture, the machine learning platform, the funding, the willingness to spend, and — critically — a decade of building everything the right way from the start. Most companies have one of those things. A handful have two or three. Uber has all of them.
That is what makes it worth understanding. Not to copy it. But because seeing the ceiling tells you something about what's actually possible. And the ceiling, it turns out, is higher than I thought.
For whether any of this is worth copying — and the comparison against the more conventional options of hiring more analysts or implementing Anaplan / Workday Adaptive Planning — see the upcoming piece on people vs. Anaplan vs. Uber.
Sources
Primary source. Uber Engineering Blog. Transforming Financial Forecasting with Data Science and Machine Learning at Uber.
Uber engineering — supporting platforms.
- Uber Engineering Blog. Meet Michelangelo: Uber's Machine Learning Platform.
- Uber Engineering Blog. Orbit: A Bayesian Time Series Forecasting Package.
- Uber Open Source. CausalML.
Methods cited (peer-reviewed).
- Smyl, S. (2020). A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. International Journal of Forecasting, 36(1), 75–85.
- Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2020). The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1), 54–74.
- Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 3(1), 1–122.
- Moritz, P., et al. (2018). Ray: A Distributed Framework for Emerging AI Applications. Proceedings of OSDI '18.
- Hyndman, R. J., & Athanasopoulos, G. Forecasting: Principles and Practice (3rd ed.).
Public-company disclosure. Uber Technologies, Inc. Form 10-K filings (2021–2024) and Q4 2025 earnings release.
¹ 600+ ridesharing cities and 250+ Uber Eats markets.