Most machine learning research in crypto starts with the same data: minute-level OHLCV bars pulled from a centralized exchange API. Binance, Coinbase, Kraken. The data arrives pre-aggregated, sampled, rate-limited. You see what the exchange decides to show you.
The XRP Ledger is different. Its decentralized exchange is built directly into the protocol — an on-chain order book where every trade, every order placement, every cancellation is recorded as a transaction in the ledger itself. Nothing is filtered, sampled, or intermediated. You get the complete microstructure: the full order book at every ledger close, every fill, every payment, at roughly four-second resolution.
This matters for ML because the quality of your predictions is bounded by the quality of your data. When you're working with sampled minute bars, you've already lost information before your model sees it. When you're working with ledger-native data, the information loss is minimal. Every participant's action is observable.
The XRP/RLUSD pair — XRP against Ripple's native USD stablecoin — provides a clean USD-denominated market directly on the DEX. No wrapping, no bridging, no centralized intermediary. Just two assets settling on the ledger in seconds.
I built XRPulse to take advantage of this data. Not because the XRPL DEX is the largest venue by volume, but because it is the most transparent. The entire market microstructure is auditable, on-chain, in real-time. For the kind of ML system I wanted to build — one grounded in interpretable signals from observable market behavior — it's the ideal data source.
System Overview
XRPulse is structured as a six-stage pipeline: from raw ledger data to calibrated forecasts displayed on a live dashboard. The system runs in two modes — an offline training pipeline orchestrated by Dagster, and a real-time streaming inference engine that produces predictions continuously as new ledger data arrives.
The stages break down as follows:
Data Collection subscribes to the XRP Ledger via WebSocket and persists raw market data into PostgreSQL for historical analysis and Redis for real-time access. This dual-write architecture supports both the batch training pipeline and the streaming inference path.
Feature Engineering transforms raw market data into the structured inputs the model consumes. This runs as a Dagster pipeline for offline training and as a mirrored computation in the streaming worker for real-time inference. A core design principle: features must be computed identically in both paths. Training/serving skew — where offline feature computation diverges from real-time computation — is one of the most common and insidious failure modes in production ML systems. We solved this architecturally, not with ad-hoc reconciliation.
Model Training produces the gradient boosted models and calibration artifacts used for inference. More on this below.
Real-Time Inference takes live features, runs them through the trained model, applies uncertainty quantification and safety checks, and emits a calibrated forecast.
Dashboard renders the forecast, confidence intervals, and market data on the XRPulse web interface in real-time via WebSocket.
The key technology choices — Python, PostgreSQL, Redis, Dagster, CatBoost, Next.js — were selected for pragmatic reasons. PostgreSQL and Redis provide a reliable dual-store for batch and streaming access. Dagster gives us reproducible, observable training pipelines with built-in lineage tracking. CatBoost handles the core prediction task. Next.js powers a responsive dashboard. Nothing exotic, just the right tools for each job.
Data Foundation
Understanding how XRPL transactions map to market data is foundational. The ledger provides three primary data channels relevant to forecasting.
OfferCreate transactions represent limit orders on the DEX order book. When these orders match against existing orders, they produce fills — executed trades with a price, a volume, and a direction. From fills, we derive OHLC price candles and trade counts per ledger close.
Payment transactions capture cross-currency settlements that execute through the DEX's autobridging mechanism. These contribute to volume measurement and can reveal liquidity flows that OfferCreate transactions alone miss.
Order book state is captured at each ledger close: the current bids and asks, their depths, the spread between them. This is the real-time liquidity landscape — how much capital is willing to buy or sell at each price level.
All of this is aggregated per ledger close into a structured market state record at approximately four-second resolution. To put that in perspective: most crypto ML research works with one-minute bars, which means 15 ledger closes of microstructure detail compressed into a single candle. We work with the raw resolution.
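As a sketch of that aggregation step, here is how time-ordered fills might be rolled up into per-ledger OHLC records. The field names (`ledger_index`, `price`, `amount`) are illustrative, not XRPulse's actual schema:

```python
from collections import defaultdict

def fills_to_ohlc(fills):
    """Group time-ordered fills by ledger index; compute OHLC, volume, trade count."""
    by_ledger = defaultdict(list)
    for f in fills:
        by_ledger[f["ledger_index"]].append(f)
    records = {}
    for idx, group in sorted(by_ledger.items()):
        prices = [f["price"] for f in group]
        records[idx] = {
            "open": prices[0],    # first fill in the ledger
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],  # last fill in the ledger
            "volume": sum(f["amount"] for f in group),
            "trades": len(group),
        }
    return records
```

Because each ledger close is one record, this produces a candle every ~4 seconds rather than every minute.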
The prediction target is expressed in log-return space — we predict the change in log price, not the price itself. This is standard practice in quantitative finance for a simple reason: raw prices are non-stationary (they trend), but returns are approximately stationary. Stationary targets are far easier for models to learn from. A model that predicts "XRP will be $2.47" is fragile; a model that predicts "the log-return over the next interval will be positive" generalizes across price regimes.
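The transformation itself is trivial, which is part of its appeal. A minimal sketch:

```python
import math

def log_returns(prices):
    """Log-return between consecutive prices: r_t = ln(p_t / p_{t-1})."""
    return [math.log(p / q) for q, p in zip(prices, prices[1:])]
```

Log returns are additive across intervals and invariant to the price level: a move from $1.00 to $1.10 produces the same target as a move from $10.00 to $11.00, which is exactly why the model generalizes across price regimes.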
The ML Approach
Why Gradient Boosted Trees
The choice of model architecture was driven by the evidence. Grinsztajn et al. (2022) showed that tree-based models consistently outperform deep learning on structured tabular data — and our feature set is exactly that: structured tabular features derived from market microstructure. CatBoost specifically (Prokhorenkova et al., 2018) handles categorical features natively, is robust to overfitting with built-in regularization, and supports incremental retraining — a property that matters when your data distribution shifts over time.
The allure of deep learning architectures (transformers, LSTMs) is real, but on tabular data with a moderate feature set they typically underperform ensembles of decision trees while requiring orders of magnitude more compute.
Feature Philosophy
Features fall into broad categories familiar to anyone working in market microstructure: order book structure, momentum and trend indicators, volatility measures, and temporal patterns. We follow the taxonomy established by Kumbure et al. (2022) in their survey of ML techniques for financial forecasting — not because it's prescriptive, but because it provides a useful organizing framework.
The feature engineering philosophy is: start broad, prune aggressively. We use SHAP values (Lundberg & Lee, 2017) to understand which features drive predictions and systematically eliminate those that contribute noise rather than signal. The final feature set is an order of magnitude smaller than where we started. Every surviving feature has earned its place through measured contribution to out-of-sample performance.
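The ranking-and-pruning step is conceptually simple once mean |SHAP| importances have been computed. This sketch assumes those importances already exist as a dict (the names and threshold are made up); the real loop also re-measures out-of-sample performance after each cut before accepting it:

```python
def prune_features(mean_abs_shap, keep_fraction=0.1):
    """Keep the top fraction of features by mean |SHAP| contribution."""
    ranked = sorted(mean_abs_shap, key=mean_abs_shap.get, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]
```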
Ensemble Architecture
The core model uses CatBoost. Multiple model outputs are combined into a final prediction, but the ensemble architecture is where much of the system's value lies and is therefore proprietary. What I can say is that the system predicts both magnitude (how much the price will move) and direction (which way) as complementary signals. Magnitude without direction is useless for trading. Direction without magnitude tells you nothing about sizing. The pipeline produces both.
Uncertainty Quantification
This is where the system diverges from most crypto ML projects. A point prediction — "the model thinks XRP will go up" — is nearly worthless without a measure of how confident that prediction is. If the model says "up" but the confidence interval spans the entire historical range, the prediction carries no actionable information.
We use conformal prediction, specifically building on the framework introduced by Angelopoulos et al. at NeurIPS 2023 in "Conformal PID Control for Time Series." The core idea is elegant: instead of assuming a particular distribution for prediction errors (Gaussian, Student-t, etc.), conformal prediction constructs confidence intervals directly from observed errors using only the assumption of exchangeability. The result is distribution-free coverage guarantees. If you target 95% coverage, the interval will contain the true value approximately 95% of the time — regardless of the underlying distribution of returns.
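The simplest version of this idea is split conformal prediction with a symmetric absolute-residual score. The sketch below is deliberately minimal; it assumes a held-out calibration set and omits the time-series machinery (weighting, online updates) that the full framework adds on top:

```python
import math

def conformal_interval(cal_preds, cal_truth, new_pred, alpha=0.05):
    """Distribution-free prediction interval from calibration residuals."""
    scores = sorted(abs(p - y) for p, y in zip(cal_preds, cal_truth))
    n = len(scores)
    # Conformal quantile index: ceil((n + 1) * (1 - alpha)), clipped to the max score.
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    q = scores[k]
    return new_pred - q, new_pred + q
```

No distributional assumption appears anywhere: the interval width is read directly off the empirical residuals, which is what makes the coverage guarantee hold for heavy-tailed returns.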
This matters in financial markets where returns are famously non-Gaussian. Heavy tails, volatility clustering, and regime changes make parametric assumptions dangerous. Conformal prediction sidesteps the problem entirely.
The 95% confidence interval visible on the XRPulse dashboard is powered by this approach. When the interval is narrow, the model is confident. When it widens, conditions are uncertain. That visual signal — the breathing of the confidence band — is arguably more useful than the point prediction itself.
Online Adaptation
Financial markets are non-stationary. The volatility regime of January is not the volatility regime of February. A model trained on last month's data will degrade this month — not because it was bad, but because the world changed. Any system that doesn't account for this is building on sand.
XRPulse addresses non-stationarity through several layers of online adaptation.
Confidence interval recalibration. The conformal prediction framework includes a feedback loop: after each prediction, the system observes the true outcome, computes the prediction error, and adjusts the confidence interval width. This is the PID control concept from Angelopoulos et al. — a controller that treats coverage as a control signal and adjusts the interval to maintain the target level. When the model is temporarily miscalibrated (too wide or too narrow), the controller corrects it within a few observations.
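The proportional term of that controller can be sketched in a few lines: widen the interval when it misses, shrink it when it covers, so the long-run miss rate tracks the target. The learning rate here is illustrative, and the real controller adds integral and derivative terms on top:

```python
def update_width(width, missed, alpha=0.05, lr=0.01):
    """One online update of the interval half-width after observing an outcome."""
    err = (1.0 if missed else 0.0) - alpha  # miss indicator minus target miss rate
    return max(0.0, width + lr * err)       # widen on misses, shrink slowly on hits
```

Because a miss pushes the width up by `lr * (1 - alpha)` and a hit pulls it down by only `lr * alpha`, the width equilibrates where misses occur at roughly the target rate.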
Regime awareness. Market conditions shift between calm and volatile periods. The system detects these transitions and adapts its uncertainty estimates accordingly — faster adjustment during regime shifts, slower during stable periods. This prevents both sluggish responses to genuine changes and overreaction to noise during calm markets.
Incremental retraining. CatBoost supports warm-starting from a previously trained model, allowing the system to incorporate new data without full retraining from scratch. This is critical for a system that needs to adapt on a daily or sub-daily cadence. A full retrain takes hours; an incremental update takes minutes.
Bias correction. The system maintains a running estimate of its own systematic prediction error and corrects for it. If the model consistently overshoots in a particular regime, the correction term pulls predictions back toward observed reality. This is a simple idea — exponentially weighted tracking of residuals — but it closes the loop between prediction and observation in a way that static models cannot.
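The whole mechanism fits in a small class. A minimal sketch, with an illustrative decay constant:

```python
class BiasCorrector:
    """Exponentially weighted running estimate of systematic prediction error."""

    def __init__(self, decay=0.95):
        self.decay = decay
        self.bias = 0.0

    def observe(self, prediction, outcome):
        """Fold the latest residual into the running bias estimate."""
        residual = prediction - outcome
        self.bias = self.decay * self.bias + (1 - self.decay) * residual

    def correct(self, prediction):
        """Subtract the tracked bias from a raw model prediction."""
        return prediction - self.bias
```

A static model's errors in a new regime persist indefinitely; this corrector forgets old residuals geometrically, so a persistent overshoot is absorbed within a handful of observations.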
The philosophical principle behind all of this: the system should never be fully confident in its own parameters. Every weight, every threshold, every calibration constant is provisional. The market is the ground truth, and the system's job is to listen.
When the System Says "I Don't Know"
Most ML systems in crypto predict confidently all the time. XRPulse doesn't. The ability to say "I don't know" is, in my view, the most valuable property a prediction system can have.
The reasoning is simple: a wrong prediction with high confidence is worse than no prediction at all. If the system signals "strong buy" during conditions it has never seen before, and a trader acts on that signal, the outcome depends entirely on luck. A system that flags its own uncertainty gives the human operator the one thing they actually need: an honest assessment of whether the model's output is trustworthy.
Out-of-distribution detection. The system continuously monitors whether current market conditions resemble the data it was trained on. When they don't — when a feature pattern falls outside the historical envelope — the system flags the prediction as potentially unreliable. This draws on the e-value framework from sequential testing theory (Vovk & Wang, 2021; Ramdas et al., 2023), which provides formal, anytime-valid measures of evidence for distribution shift.
The consensus principle. No single detector can trigger an alert. Multiple independent monitoring signals must agree before the system declares current conditions anomalous. This reduces false positives: one noisy sensor shouldn't shut down the pipeline.
Prediction abstention. When the uncertainty check fails — when the system determines that it genuinely doesn't have enough information to produce a reliable forecast — it abstains. The dashboard shows no prediction rather than a misleading one.
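The three mechanisms compose into a simple gate. In this sketch the detectors (a historical-range envelope check) are illustrative stand-ins for the e-value-based monitors in the real system; the consensus and abstention logic is the part being shown:

```python
def in_envelope(value, lo, hi):
    """Does a feature value fall inside its historical envelope?"""
    return lo <= value <= hi

def should_abstain(anomaly_flags, quorum=2):
    """Abstain only when at least `quorum` independent detectors agree."""
    return sum(anomaly_flags) >= quorum

# Hypothetical checks on a live record: one detector is noisy, one is calm.
flags = [
    not in_envelope(0.9, 0.0, 1.0),    # feature inside its historical range
    not in_envelope(50.0, 0.0, 10.0),  # volume far outside its envelope
]
publish = not should_abstain(flags)    # one vote is not enough to suppress output
```

One tripped detector leaves the prediction published; only agreement across independent monitors suppresses it, which is what keeps a single noisy sensor from shutting down the pipeline.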
These guardrails were not designed in advance on a whiteboard. They were built iteratively, in response to real failures.
On February 7, 2026, XRP experienced a flash crash on the XRPL DEX: the price dropped from approximately $1.42 to $0.105 in a matter of seconds — a 93% collapse — before recovering almost immediately. This is publicly verifiable on-chain data; anyone can look up the ledger transactions. A volume spike of multiple orders of magnitude accompanied the dislocation.
The system detected this as anomalous and abstained from predicting during the event. But the detection wasn't instantaneous in the first implementation — the guardrails were hardened specifically because of what this event revealed about edge cases in the monitoring pipeline. Honesty requires saying: the current safety mechanisms exist because earlier versions were insufficient. Every guardrail in the system is a scar from a past failure.
What We've Learned
Building this system has reinforced a few convictions and taught me some new ones.
Training/serving skew is the silent killer. The most dangerous bugs in production ML are not model bugs — they're data bugs. If feature computation diverges even slightly between your training pipeline and your inference path, your model sees data at inference time that it was never trained on. The predictions will be wrong in ways that are invisible to standard monitoring. We solved this at the architecture level, ensuring that features are computed from the same logic in both paths.
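The architectural fix is simply that there is one feature function, imported by both callers. The names below are illustrative, not XRPulse's actual feature module, but the shape is the point:

```python
def compute_features(record):
    """Derive model inputs from one market-state record. Single source of truth."""
    mid = (record["best_bid"] + record["best_ask"]) / 2
    return {
        "mid_price": mid,
        "spread_bps": (record["best_ask"] - record["best_bid"]) / mid * 1e4,
        "imbalance": record["bid_depth"] / (record["bid_depth"] + record["ask_depth"]),
    }

def featurize_batch(records):
    """Training path: applied over historical rows."""
    return [compute_features(r) for r in records]

def featurize_stream(record):
    """Inference path: applied to each live record as it arrives."""
    return compute_features(record)
```

Both paths call the same code, so they cannot drift apart; any change to the feature logic propagates to training and serving simultaneously.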
Uncertainty is more valuable than accuracy. A well-calibrated confidence interval is more useful for decision-making than a more "accurate" point prediction with unknown reliability. If a trader knows the prediction is uncertain, they can reduce position size, tighten stops, or sit out entirely. If they don't know, they're flying blind. Most ML benchmarks optimize for accuracy; real-world systems should optimize for calibration.
Data quality at the source matters more than model sophistication. The most expensive lesson in machine learning is learning that your model is faithfully reproducing patterns in garbage data. Ledger-native data quality checks — validating that prices, volumes, and spreads make physical sense — prevent downstream model pathologies that no amount of regularization can fix.
The system is never done. Markets evolve. Models degrade. New failure modes emerge that no test suite anticipated. Building for adaptation — making every parameter provisional, every threshold adjustable, every component replaceable — is more important than building for peak performance on today's data. The February flash crash broke assumptions that were reasonable the day before. The next surprise will too.
The honest summary: I don't have a crystal ball, and neither does the model. What I have is a system that is transparent about its limitations, that corrects itself when it's wrong, and that refuses to predict when it can't. That's not a finished product. It's a foundation for iterating in public.
What's Next
The system continues to evolve. Each market event reveals new edge cases, each model iteration improves calibration, and each architectural decision compounds. I'll continue writing about what works, what fails, and what I'm learning.
If you want to see the system in action — the confidence intervals breathing, the predictions updating in real-time — the XRPulse dashboard is live.
References
- Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2018). CatBoost: unbiased boosting with categorical features. NeurIPS.
- Grinsztajn, L., Oyallon, E., & Varoquaux, G. (2022). Why do tree-based models still outperform deep learning on typical tabular data? NeurIPS.
- Lundberg, S. M. & Lee, S. I. (2017). A unified approach to interpreting model predictions. NeurIPS.
- Angelopoulos, A. N., Candès, E. J., & Tibshirani, R. J. (2023). Conformal PID control for time series prediction. NeurIPS.
- Vovk, V. & Wang, R. (2021). E-values: calibration, combination and applications. Annals of Statistics.
- Ramdas, A., Grünwald, P., Vovk, V., & Shafer, G. (2023). Game-theoretic statistics and safe anytime-valid inference. Statistical Science.
- Kumbure, M. M., Lohrmann, C., Luukka, P., & Porras, J. (2022). Machine learning techniques and data for stock market forecasting: a literature review. Expert Systems with Applications.
- Fama, E. F. (1970). Efficient capital markets: a review of theory and empirical work. Journal of Finance.