From Ledger to Forecast: The Architecture for an Algorithmic Trading System

Albert Simmons

Building XRPulse reframed how I think about software around one question: how does a system transform information into signal?

On the surface, the answer seems obvious: data, features, models. Each component is necessary, but their interaction determines whether signal is preserved or obscured by noise.

What matters is not just the inputs, but how a system is structured to transform them. Architecture determines what is preserved, what is discarded, and what ultimately becomes signal. Here is what XRPulse preserves, what it discards, and what it shapes into signal.

This technical overview describes the architecture rather than trading performance or execution mechanics.

XRPulse is a predictive algorithmic trading system on the XRP Ledger's decentralized exchange (DEX). It forecasts next-bar log-returns from microstructure signals and converts forecasts into trades only when the expected edge is strong enough after costs. It is not a market-making strategy, not arbitrage, and not high-frequency trading: the system samples on information bars (event-driven, not clock-driven) and abstains far more often than it acts.

The System at a Glance

XRPulse ingests raw ledger events, structures them into bars that carry information rather than time, generates features that define the hypothesis space, trains models on sliding windows rather than fixed splits, and serves forecasts with uncertainty bounds.

Two pipelines run this system:

  • Offline (batch): trains models on sliding windows and evaluates them. A deployment gate decides whether a trained model reaches production.
  • Online (streaming): serves forecasts in real time and monitors model health, calibration, and trade economics. Resolved forecasts and execution telemetry inform the next experiment.

The split reflects different operating constraints. Training needs a fixed dataset and the ground truth; serving needs low latency and streaming state. The two paths stay aligned through a shared feature and prediction pipeline: the same transformations used in offline evaluation are reused online, with parity tests guarding against train/serve skew.

XRPulse system architecture: a shared pipeline (Market Data → Market State → Information Bars → Feature Engineering) forks into Offline (reproducible experiments, per release: Model Training → Evaluation) and Online (streaming forecasts, live bar updates: Real-time Inference → Monitoring). The Deployment Gate crosses a model from offline to online. Monitoring feeds back: realized performance informs the next experiment. A legend distinguishes data flow from feedback.

Market Data

which channels to subscribe to.

The XRP Ledger has a native DEX built into the protocol. The DEX has two liquidity mechanisms: a central limit order book (CLOB) and Automated Market Maker (AMM) pools. XRPulse treats the CLOB as the primary quoting surface, while separately tracking AMM and auto-bridge activity because they affect venue state and realized execution costs. Every trade, every order placement, every cancellation is recorded on-chain. No venue gatekeeper samples the data before you see it; no aggregator rate-limits or rewrites it. Tick-level events are the building blocks for per-ledger microstructure state.

XRPulse extracts those events through two XRP Ledger WebSocket methods, scoped to the XRP/RLUSD market before any model-facing state is built:

  • subscribe with a books filter: a persistent stream of every transaction that touches the XRP/RLUSD book.
  • book_offers polled per validated ledger with taker_pays and taker_gets set to the XRP/RLUSD pair (XRP by currency only; RLUSD by currency code and issuer), capturing order book depth on both bid and ask sides.
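
The two requests above can be sketched as plain JSON payloads. This is a minimal sketch rather than XRPulse's client code: the issuer address is a placeholder, and the request id, the limit value, and the use of the both flag are illustrative assumptions. The 40-character hex currency code follows the XRPL convention for currency codes longer than three characters.

```python
import json

def hex_currency(code: str) -> str:
    """Encode a currency code longer than 3 characters as 160-bit hex."""
    return code.encode("ascii").hex().upper().ljust(40, "0")

XRP = {"currency": "XRP"}  # XRP is identified by currency only
RLUSD = {
    "currency": hex_currency("RLUSD"),
    "issuer": "rRLUSD_ISSUER_PLACEHOLDER",  # hypothetical: substitute the real issuer
}

# Persistent stream of transactions that touch the XRP/RLUSD book.
subscribe_request = {
    "id": "xrp-rlusd-stream",  # illustrative request id
    "command": "subscribe",
    "books": [{"taker_pays": XRP, "taker_gets": RLUSD, "both": True}],
}

def book_offers_request(taker_pays: dict, taker_gets: dict, limit: int = 50) -> dict:
    """One side of the book as of the latest validated ledger."""
    return {
        "command": "book_offers",
        "taker_pays": taker_pays,
        "taker_gets": taker_gets,
        "ledger_index": "validated",
        "limit": limit,  # illustrative depth
    }

# One request per side: swapping the pair returns the opposite side of the book.
depth_requests = [
    book_offers_request(XRP, RLUSD),
    book_offers_request(RLUSD, XRP),
]

payloads = [json.dumps(r) for r in [subscribe_request, *depth_requests]]
```

Polling both directions once per validated ledger is what yields depth on both sides; the stream alone only reports transactions.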

Two kinds of ledger data matter for this market state: transaction events and venue state. The transaction stream is filtered by type:

  • OfferCreate: limit orders on the DEX. Filled offers are price-eligible and volume-eligible.
  • Payment: cross-currency settlements routed through the book or bridge paths. Stored for volume, flow, and liquidity-impact features, but excluded from price-discovery targets because the effective rate can reflect path-sweeping across intermediate currencies.
  • OfferCancel: withdrawal events preserved alongside fills as part of the event stream.
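
The filter above amounts to a small routing table. The sketch below is one way to express it; the Eligibility type and the filled flag are hypothetical names, and in practice whether an offer filled would have to be derived from transaction metadata, which is not shown here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Eligibility:
    price: bool   # may contribute to price-discovery targets
    volume: bool  # may contribute to volume and flow features
    event: bool   # preserved in the event stream

def classify(tx_type: str, filled: bool = False) -> Eligibility:
    """Map a transaction type to what it is allowed to influence."""
    if tx_type == "OfferCreate":
        # Only filled offers carry price and volume information.
        return Eligibility(price=filled, volume=filled, event=True)
    if tx_type == "Payment":
        # Kept for volume and flow, excluded from price targets.
        return Eligibility(price=False, volume=True, event=True)
    if tx_type == "OfferCancel":
        # A withdrawal: recorded as an event, no price or volume.
        return Eligibility(price=False, volume=False, event=True)
    # Any other transaction type is outside this market state.
    return Eligibility(price=False, volume=False, event=False)
```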

Separately, XRPulse tracks AMM state and AMM-routed fills because the pool can affect liquidity, venue state, and realized execution costs even when the CLOB remains the primary quoting surface.

Only validated ledgers are stored: confirmed by network consensus and immutable. A closed ledger is merely proposed; a validated ledger is final.

The XRP Ledger validates roughly every 3–5 seconds, but not every validated ledger contains useful XRP/RLUSD information. The raw store can observe ledgers continuously; the model consumes event-driven bars built from active market state. The observation cadence is market-driven, not clock-driven.

Market Data extraction: XRP Ledger (XRP/RLUSD) feeds transaction events, order book depth, and AMM venue state into the market-data layer.

Market State

what to aggregate per validated ledger.

Raw events do not go directly into a model. They are aggregated into a canonical market-state record keyed by validated ledger. Each row reduces the model-relevant dimensions into a consistent representation: prices, depth, spread, trade flow, cancellation activity, venue mix, and selected AMM/RLUSD-flow context.

This layer does two jobs.

It makes the aggregation deterministic and idempotent: the same event stream always produces the same state, no matter how many times it is replayed. Every training run is therefore a reproducible experiment.

It gives every downstream component a stable contract: ask for state at ledger T and get the same answer whether training offline or inferring online.

The canonical state abstraction enforces these guarantees at the architecture level (Sculley et al., 2015).
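
A minimal sketch of what such a contract could look like, assuming hypothetical event fields (hash, ledger_index, tx_index, type, price, size) and a deliberately tiny state record. The point is the shape of the guarantee: events are deduplicated and ordered before aggregation, so replays and reordering cannot change the result.

```python
from collections import defaultdict

def build_market_state(events):
    """Aggregate raw events into one record per validated ledger."""
    # Replayed events collapse by transaction hash.
    unique = {e["hash"]: e for e in events}
    # A fixed ordering makes the aggregation independent of arrival order.
    ordered = sorted(unique.values(), key=lambda e: (e["ledger_index"], e["tx_index"]))
    state = defaultdict(
        lambda: {"trades": 0, "volume": 0.0, "cancels": 0, "last_price": None}
    )
    for e in ordered:
        row = state[e["ledger_index"]]
        if e["type"] == "fill":
            row["trades"] += 1
            row["volume"] += e["size"]
            row["last_price"] = e["price"]
        elif e["type"] == "cancel":
            row["cancels"] += 1
    return dict(state)

events = [
    {"hash": "A", "ledger_index": 100, "tx_index": 0, "type": "fill", "price": 0.50, "size": 10.0},
    {"hash": "B", "ledger_index": 100, "tx_index": 1, "type": "cancel"},
    {"hash": "C", "ledger_index": 101, "tx_index": 0, "type": "fill", "price": 0.51, "size": 5.0},
]
# The same stream, reversed and delivered twice, yields the same state.
replayed = list(reversed(events)) + events
same_state = build_market_state(events) == build_market_state(replayed)
```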

Market State: OfferCreate and cancel events, Payment flow, order book depth, and AMM venue state aggregate per validated ledger into a canonical market state record.

Information Bars

what closes a bar.

The prediction target is expressed in log-return space. Raw prices are non-stationary; returns are approximately stationary. A stationary target generalizes better across price regimes.
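
As a small illustration of why the target lives in log-return space, the sketch below computes log-returns between consecutive closes. The prices are made up. The same proportional move produces the same value at any price level, and an up-move followed by the reverse down-move sums to zero.

```python
import math

def log_return(p_prev: float, p_next: float) -> float:
    """Log-return between two consecutive closing prices."""
    return math.log(p_next / p_prev)

# A 10% move is the same target at very different price levels.
low_regime = log_return(0.50, 0.55)
high_regime = log_return(2.00, 2.20)

# Log-returns are additive: a round trip nets out to zero.
prices = [0.50, 0.55, 0.50]
round_trip = sum(log_return(a, b) for a, b in zip(prices, prices[1:]))
```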

Log-returns over what interval? That is the question financial data structures answer.

Time bars are the standard among practitioners and academics: one-minute, one-hour, or daily intervals. A fixed clock produces the same number of observations whether the market is trending, volatile, or silent. Quiet periods oversample noise; volatile periods undersample information. Standard evaluation protocols assume observations are roughly exchangeable and information-dense. Clock-based sampling distorts both.

XRPulse uses information-driven sampling. Bars close when the market produces enough new information to warrant a new observation, not when a clock ticks. The principle is simple: the model should learn from market activity rather than from arbitrary time intervals. The concept follows López de Prado's broader argument in Advances in Financial Machine Learning (Wiley, 2018): sampling should be driven by information, not by the clock.
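
This overview does not specify the rule that closes a bar, so the sketch below uses a volume bar, one of the simplest information-driven schemes in López de Prado's book, purely as an illustration. The threshold and the trade tuples are assumptions.

```python
def volume_bars(trades, threshold):
    """Return bars that close each time cumulative traded volume reaches `threshold`.

    `trades` is an iterable of (price, size) tuples in event order.
    A trailing partial bar is left open and not returned.
    """
    bars = []
    volume, open_price = 0.0, None
    for price, size in trades:
        if open_price is None:
            open_price = price
        volume += size
        if volume >= threshold:
            bars.append({"open": open_price, "close": price, "volume": volume})
            volume, open_price = 0.0, None
    return bars

trades = [(0.50, 40), (0.51, 70), (0.52, 10), (0.53, 30), (0.54, 60), (0.55, 5)]
bars = volume_bars(trades, threshold=100)
```

Six trades produce two bars here, and the number of bars per hour would rise and fall with activity rather than with the clock.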

The candles visible on the XRPulse dashboard are a display convention for readers. They are not what the model consumes.

Time-based bars (fixed intervals) versus information-driven bars (variable intervals triggered by market activity)

Feature Engineering

which features survive selection.

Features define the hypothesis space.

The feature set spans categories including microstructure, momentum, volatility, statistical, temporal, complexity, spectral, venue-mix, and stablecoin-flow signals. Each captures a distinct aspect of market behavior. Together they give the model a multi-dimensional view of the current regime.

Two principles harden the feature pipeline.

Shared generators. Features are computed from shared SQL/Python generators, and the streaming path is tested against the batch path. This reduces train/serve skew, a silent killer of production machine learning (ML).
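
A parity test of this kind can be sketched as follows. The rolling mean stands in for a real feature, and the names mean_feature, batch_features, and StreamingFeature are hypothetical. What matters is that both paths call one definition and that the test compares their outputs exactly.

```python
from collections import deque

def mean_feature(window):
    """Shared generator: the single definition both paths call."""
    return sum(window) / len(window)

def batch_features(values, n):
    """Batch path: compute the feature over every full window of length n."""
    return [mean_feature(values[i - n + 1 : i + 1]) for i in range(n - 1, len(values))]

class StreamingFeature:
    """Streaming path: keep only the state needed for the next value."""

    def __init__(self, n):
        self.buf = deque(maxlen=n)

    def update(self, value):
        self.buf.append(value)
        if len(self.buf) < self.buf.maxlen:
            return None  # not enough history yet
        return mean_feature(self.buf)

values = [0.51, 0.52, 0.50, 0.49, 0.53, 0.55]
stream = StreamingFeature(n=3)
streamed = [out for out in (stream.update(v) for v in values) if out is not None]
parity_ok = streamed == batch_features(values, n=3)
```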

Aggressive selection. The pipeline starts with a broad candidate pool and prunes hard. Selection runs clustered feature importance (López de Prado, 2018): grouped permutation importance computed across temporal folds, converted into e-value evidence per cluster, then gated with e-BH-style false-discovery control and an effective-tests correction for correlated features (Wang & Ramdas, 2022; Li & Ji, 2005). A feature also has to show a distribution stable enough to be modeled. The final feature set is much smaller than the candidate pool.
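
The false-discovery gate can be illustrated with the base e-BH procedure from Wang & Ramdas (2022). This is a sketch of the textbook rule only: how XRPulse turns importances into e-values and how it applies the Li & Ji effective-tests correction are not shown, and the e-values below are invented.

```python
def e_bh(e_values, alpha):
    """Return the indices rejected by the e-BH procedure at FDR level `alpha`.

    With K hypotheses and e-values sorted in decreasing order, find the
    largest k such that k * e_[k] / K >= 1 / alpha, then reject the k
    hypotheses with the largest e-values.
    """
    k_total = len(e_values)
    order = sorted(range(k_total), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for rank, idx in enumerate(order, start=1):
        if rank * e_values[idx] / k_total >= 1.0 / alpha:
            k_star = rank
    return sorted(order[:k_star])

# Four feature clusters; only the first two carry enough evidence.
selected = e_bh([50.0, 30.0, 1.2, 0.5], alpha=0.1)
```

Here the clusters at positions 0 and 1 survive; a cluster whose e-value hovers around 1 never does.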

When capital is at risk, features should not survive because they sound plausible. The model itself is a tree ensemble and is not interpretable in any strict sense; what is auditable is the selection process. A feature reaches production only if it repeatedly shows statistically meaningful contribution across temporal out-of-sample tests.

Feature Engineering pipeline: candidate categories including microstructure, momentum, volatility, statistical, temporal, complexity, spectral, venue mix, and stablecoin flow funnel into clustered feature importance, e-values per cluster, e-BH FDR plus Li & Ji correction, producing selected features.

Model Training

how to train so generalization is proven, not assumed.

A common failure mode in production machine learning is leakage: the model sees information during training that it will not have at inference time. The architecture prevents this through prequential sliding-window evaluation. Each window trains on a fixed historical segment, enforces an embargo period sized to cover both the maximum feature lag and the forecast horizon, and evaluates on unseen data. No expanding windows. No fixed splits. The evaluation protocol mirrors the deployment conditions the model will face.

Prequential sliding windows: stacked windows step forward across the dataset by a fixed stride. Each window is a Train segment, an Embargo gap, and a Test segment. The embargo separates training from testing data so no information leaks across the boundary.
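
The window layout can be sketched as an index generator. The sizes are illustrative, and taking the embargo as the sum of the maximum feature lag and the forecast horizon is an assumption about how "covers both" is implemented.

```python
def sliding_windows(n_obs, train_size, test_size, stride, max_feature_lag, horizon):
    """Yield (train_range, test_range) pairs separated by an embargo gap."""
    # Assumed sizing: long enough for lagged features and the label horizon.
    embargo = max_feature_lag + horizon
    start = 0
    while start + train_size + embargo + test_size <= n_obs:
        train_end = start + train_size
        test_start = train_end + embargo
        yield range(start, train_end), range(test_start, test_start + test_size)
        start += stride  # fixed-size window slides forward; it never expands

windows = list(
    sliding_windows(n_obs=100, train_size=50, test_size=10, stride=10,
                    max_feature_lag=4, horizon=1)
)
```

Every window has the same training length, which is what distinguishes this from an expanding-window scheme.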

The model is a CatBoost gradient-boosted regressor over the selected feature set. The current architecture estimates both a central forecast and uncertainty around that forecast, so downstream layers can distinguish a small expected move from a noisy regime where abstention is safer.

The choice of gradient boosting over deep learning was evidence-driven: tree-based models still tend to outperform deep architectures on typical medium-sized tabular data (Grinsztajn et al., 2022). CatBoost specifically uses ordered boosting to produce unbiased gradient estimates (Prokhorenkova et al., 2018), which avoids the prediction shift that standard boosting accumulates from reusing the same examples for both gradient computation and leaf fitting.

Each training run is an isolated, reproducible experiment. Dagster orchestrates the pipeline as a directed acyclic graph where every asset declares its dependencies explicitly. Experiment IDs are content addressable: the same versioned code, data snapshot, feature definitions, and configuration produce the same hash, and model IDs extend that identity with hyperparameters. Randomness is controlled through fixed seeds. Any change, however small, yields a distinct model ID.
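
Content-addressable IDs of this kind can be sketched with a hash over a canonical serialization. The field names, the SHA-256 choice, and the 16-character truncation are assumptions for illustration, not the actual XRPulse scheme.

```python
import hashlib
import json

def content_id(payload: dict) -> str:
    """Hash a canonical JSON form so key order cannot change the ID."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

def experiment_id(code_version, data_snapshot, feature_defs, config) -> str:
    return content_id({
        "code": code_version,
        "data": data_snapshot,
        "features": feature_defs,
        "config": config,
    })

def model_id(exp_id: str, hyperparameters: dict) -> str:
    """A model ID extends the experiment identity with hyperparameters."""
    return content_id({"experiment": exp_id, "hyperparameters": hyperparameters})

exp = experiment_id("abc123", "snapshot-01", ["spread", "imbalance"], {"seed": 7})
model_a = model_id(exp, {"depth": 6, "learning_rate": 0.05})
model_b = model_id(exp, {"depth": 6, "learning_rate": 0.06})
```

The two model IDs differ because one hyperparameter changed, while rerunning with identical inputs reproduces the same ID.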

PostgreSQL serves as the system of record for experiment lineage, model artifacts, and feature state, giving every prediction end-to-end traceability back to the exact code and data that produced it. This level of rigor is necessary because production machine learning systems accumulate hidden dependencies and failure modes across data, code, and configuration boundaries (Sculley et al., 2015).

I accept two trade-offs. First, gradient boosting gives up the representational power of deep sequence models. For the current feature set and data scale, the evidence favors trees. That trade would reverse with richer sequential structure or orders of magnitude more data. Second, tree-based model outputs are bounded by the leaf values seen during training. If the market enters a regime the model has never seen, predictions are bounded by the training distribution, not by the new conditions (Grinsztajn et al., 2022). This is why the uncertainty, monitoring, and abstention layers downstream are not optional: they help separate "small expected move" from "wide uncertainty" and detect when the model is operating outside its training distribution.

ML training pipeline DAG: Data Source to Experiment Setup to Preliminary Model to Feature Selection to Prequential Evaluation to Model Bundle.

Model Evaluation

whether a trained model reaches production.

A trained model is not a deployable model.

The evaluation baseline is a naive forecast that assumes zero next-bar log-return: a martingale null. This is a modeling choice, not a claim about market efficiency. Any model that cannot beat the null has no business in production. Forecast Value Added (FVA) measures the improvement. Positive FVA means the model adds value; zero or negative means it would be better to predict nothing.
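
The exact FVA formula is not given here, so the sketch below assumes one common form: the relative reduction in mean absolute error against the zero-return baseline. The numbers are invented.

```python
def forecast_value_added(forecasts, realized):
    """Relative MAE improvement over a naive forecast of zero log-return.

    Assumed definition: FVA = 1 - MAE_model / MAE_naive.
    """
    n = len(realized)
    mae_model = sum(abs(f - y) for f, y in zip(forecasts, realized)) / n
    mae_naive = sum(abs(0.0 - y) for y in realized) / n  # martingale null
    return 1.0 - mae_model / mae_naive

realized = [0.02, -0.01, 0.03]
fva_model = forecast_value_added([0.01, -0.01, 0.02], realized)
fva_null = forecast_value_added([0.0, 0.0, 0.0], realized)
```

Under this definition the null scores exactly zero against itself, and any model that is worse than predicting nothing goes negative.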

Statistical significance is enforced through a Nadeau-Bengio corrected t-test (Nadeau & Bengio, 2003), which corrects the variance underestimation that arises when training folds (or sliding windows) share data. The null hypothesis is that mean FVA is zero or worse. A model whose corrected p-value does not clear the threshold does not ship.
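
The correction itself is compact. The sketch below computes the Nadeau-Bengio corrected t-statistic over per-window FVA values; the values and window sizes are made up, and the p-value (a one-sided Student-t tail with J - 1 degrees of freedom) is left out to keep the example dependency-free.

```python
import math
from statistics import mean, variance

def corrected_t_statistic(diffs, n_train, n_test):
    """Nadeau-Bengio corrected t-statistic for J overlapping evaluations.

    The usual 1/J variance factor is inflated to (1/J + n_test/n_train)
    to account for training sets that share data.
    """
    j = len(diffs)
    corrected_var = (1.0 / j + n_test / n_train) * variance(diffs)
    return mean(diffs) / math.sqrt(corrected_var)

def naive_t_statistic(diffs):
    """Standard t-statistic, which treats the windows as independent."""
    return mean(diffs) / math.sqrt(variance(diffs) / len(diffs))

fva_per_window = [0.02, 0.01, 0.03, 0.00, 0.02]
t_corrected = corrected_t_statistic(fva_per_window, n_train=50, n_test=10)
t_naive = naive_t_statistic(fva_per_window)
```

The corrected statistic is always smaller in magnitude than the naive one, which is the point: overlapping windows carry less independent evidence than their count suggests.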

The statistical gate is necessary but not sufficient. A model can be statistically valid and economically worthless. I define edge as the per-prediction expected log-return net of the cost of acting; the spread sets the floor of that cost, while slippage, partial fills, path selection, and fees raise it further. A model whose expected edge is not positive cannot generate returns even when its directional predictions are correct.
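
As a rough illustration of the edge definition, the sketch below treats every cost as additive in log-return units and uses the quoted relative spread as the floor. Both are simplifying assumptions, and the parameter names are hypothetical.

```python
def expected_edge(forecast, spread, slippage=0.0, fees=0.0):
    """Expected log-return of acting on a forecast, net of assumed costs."""
    cost = spread + slippage + fees  # spread is the floor; the rest only add
    return abs(forecast) - cost

# A correct but small directional call can still have negative edge.
small_move = expected_edge(forecast=0.0010, spread=0.0015)
large_move = expected_edge(forecast=0.0030, spread=0.0015, slippage=0.0005, fees=0.0002)
```

The first forecast can be right about direction and still lose money; only the second clears its costs.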

The current evaluation also records calibration and actionability diagnostics: forecast behavior, uncertainty behavior, activation/abstention buckets, and spread-conditioned performance. Those diagnostics are not a license to trade; they are a way to find where the model is honest, where it is underconfident, and where it should stay silent.

If the gate fails, the model does not ship. The candidate goes back for iteration.

Model Evaluation: candidate model evaluated against FVA statistical gate and economic-edge hurdle before the deployment gate releases it to production.

Real-time Inference

whether expected edge after costs is positive.

The prediction pipeline is shared between offline evaluation and online inference: same layers, same configuration, same code path. If a model was evaluated through this pipeline offline, it runs through the same pipeline online. The shared-code principle applied to features extends to inference, which removes the main source of skew between evaluation and serving.

The pipeline produces a forecast with calibrated uncertainty from the selected feature set. The displayed value is the central forecast.

Not every forecast becomes actionable. Cost, risk, and safety checks determine whether a forecast is strong enough to consider; otherwise, the system abstains. The dashboard shows forecasts, not trade instructions.

The confidence intervals visible on the dashboard are adaptive conformal prediction intervals, updated online from realized prediction errors following the conformal PID (proportional-integral-derivative) approach of Angelopoulos et al. (2023). They are a reader-facing calibration display, not a trading input.
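
The online update behind such intervals can be sketched with quantile tracking, the simplest component of the conformal PID method. This sketch omits the integrator and forecasting terms of the full approach, and the learning rate, target miscoverage, and simulated errors are assumptions.

```python
import random

def track_quantile(scores, alpha, eta, q0=0.0):
    """Track the interval half-width q online from realized absolute errors.

    Update rule: q <- q + eta * (err - alpha), where err is 1 when the
    realized error falls outside the current interval and 0 otherwise.
    """
    q, radii, misses = q0, [], 0
    for s in scores:
        radii.append(q)          # half-width published before the outcome
        err = 1 if s > q else 0  # did the interval miss?
        misses += err
        q += eta * (err - alpha)  # widen after a miss, shrink slowly otherwise
    return radii, 1.0 - misses / len(scores)

rng = random.Random(0)
abs_errors = [abs(rng.gauss(0.0, 1.0)) for _ in range(5000)]  # simulated residuals
radii, coverage = track_quantile(abs_errors, alpha=0.1, eta=0.05)
```

Long-run coverage settles near the 90% target without any distributional assumption on the errors, because each miss widens the interval and each hit narrows it by a smaller step.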

The ability to say "I don't know" is one of the most valuable properties a production prediction system can have.

Real-time inference pipeline: features produce a forecast, cost and risk checks determine actionability, and confidence intervals appear as dashboard calibration.

Monitoring & Alerting

what counts as realized performance.

Markets evolve. Models degrade. A production machine learning system is only as trustworthy as its ability to detect its own failures.

Each prediction is resolved against the realized log-return when the next information bar closes. Error, coverage, actionability, and cost-aware diagnostics are computed per prediction and stored. This is the ground truth loop: every forecast the system emits is eventually measured.

Drift detection runs continuously via sequential e-value hypothesis testing (Vovk & Wang, 2021; Ramdas et al., 2023). E-values provide anytime-valid evidence for distribution shift and performance degradation under continuous monitoring. When evidence accumulates beyond a threshold, the system surfaces an alert. Alerts are designed for human review, not automatic intervention: the system detects degradation and makes it visible.

The alerting model is event-based: specific conditions trigger specific alerts for human review. Performance degradation, coverage breaches (measured against the adaptive conformal method's target coverage), distributional shift, and prediction bias each produce a distinct e-value signal with WARNING or CRITICAL severity. The operator investigates the root cause and decides whether to retrain, adjust, or hold. In trading systems, automated intervention can be more dangerous than the degradation it addresses. The human stays in the loop by design.
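
One of these detectors, the coverage breach, can be sketched as a betting-style test martingale. The fixed bet size and the thresholds of 20 and 100 (roughly 5% and 1% false-alarm levels by Ville's inequality) are assumptions; the actual XRPulse detectors and thresholds are not specified here.

```python
def monitor_miscoverage(misses, target_rate, bet=2.0, warning=20.0, critical=100.0):
    """Sequential e-value monitor for interval miscoverage.

    `misses` is a stream of 0/1 flags (1 = realized value fell outside the
    interval). Under the null that the miss rate is at most `target_rate`,
    the wealth process is a nonnegative supermartingale, so it rarely grows.
    """
    if not 0.0 <= bet <= 1.0 / target_rate:
        raise ValueError("bet must keep every wealth factor nonnegative")
    wealth, severity = 1.0, "OK"
    for miss in misses:
        wealth *= 1.0 + bet * (miss - target_rate)
        if wealth >= critical:
            return "CRITICAL"
        if wealth >= warning:
            severity = "WARNING"  # latched for operator review
    return severity

healthy = ([1] + [0] * 9) * 20  # misses at the 10% target rate
degrading = [1, 0] * 4          # misses half the time, briefly
broken = [1, 0] * 10            # misses half the time, sustained

statuses = [monitor_miscoverage(s, target_rate=0.1) for s in (healthy, degrading, broken)]
```

Because the evidence is valid at every step, the operator can look at the monitor as often as needed without inflating the false-alarm rate.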

Monitoring & Alerting: a 4-row by 2-column matrix of sequential e-value detectors over the prediction and ground-truth stream. Detector rows: Performance degradation, Coverage breach, Distributional shift, Prediction bias. Severity columns: WARNING, CRITICAL. Alerts route to Operator Review with no automated intervention.

Tradeoffs & Limitations

Every architectural choice is a trade.

Selective action vs frequency. The economic gate means the system abstains often. High-confidence opportunities are rare by construction, so the baseline operating state is inaction. There is no stream of small wins to compound while waiting for a confident call. Long quiet stretches are the operating norm, not a failure state.

Transparency vs edge. The dashboard exposes live forecasts and calibration metrics, but not the full decision policy. The asymmetry is deliberate: readers can inspect the prediction system without turning the strategy into a blueprint.

Market and ledger data only. XRPulse uses price, volume, order book, AMM-state, and RLUSD-flow data from the XRP Ledger. It does not incorporate news, sentiment, social media, ETF flows, SEC filings, or off-chain alternative data. If the signal is not in ledger-observable microstructure or flow, the model will not find it.

Single-asset concentration. The entire system operates on the XRP/RLUSD pair. There is no diversification across assets, venues, or strategies. Concentration risk is real.

CLOB primary, AMM-aware. XRPulse models the CLOB as the primary quoting surface, but AMM state and AMM-routed fills are now tracked because XRPL execution can bridge across liquidity mechanisms. That improves cost discovery and venue-state awareness; it does not make the system a generalized cross-venue router.

External infrastructure. The system depends on XRPL network access and third-party/public node infrastructure for ledger data and submission paths. Availability, latency, fee dynamics, and reliability are outside the model's control.

Limited manipulation detection. The system uses account exclusions and data-quality checks, but it does not fully solve layering, spoofing, wash trading, or adversarial liquidity. If the observable market state is manipulated in a way the filters do not catch, the model can consume that state as if it were genuine.

Fat tails. Tree-based models cannot anticipate events outside the range of their training data. Black swan dislocations will produce predictions bounded by what the model has seen, not by what the market is doing.

Overfitting. Prequential evaluation and the deployment gate reduce this risk but do not eliminate it. Complex models on noisy financial data can always find patterns that do not generalize.

Technical risk. Algorithmic systems can fail in ways that produce missed, stale, or incorrect outputs. Container restarts, network interruptions, schema drift, node lag, and infrastructure outages are operational realities.

No automated retraining. Model retraining is triggered manually. The system monitors for drift but does not yet close the loop by automatically retraining and redeploying when performance degrades. A human is still in the loop.

What comes next is more of what is here: better features, tighter calibration, more robust monitoring.

References

Market State

Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J.-F., & Dennison, D. Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems 28 (NeurIPS), 2015.

Information Bars

López de Prado, M. Advances in Financial Machine Learning. Wiley, 2018.

Feature Engineering

Li, J. & Ji, L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity, 95(3), 221–227, 2005.

López de Prado, M. Advances in Financial Machine Learning. Wiley, 2018.

Wang, R. & Ramdas, A. False discovery rate control with e-values. Journal of the Royal Statistical Society: Series B, 84(3), 822–852, 2022.

Model Training

Grinsztajn, L., Oyallon, E., & Varoquaux, G. Why do tree-based models still outperform deep learning on typical tabular data? Advances in Neural Information Processing Systems 35 (NeurIPS), 2022.

Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems 31 (NeurIPS), 2018.

Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J.-F., & Dennison, D. Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems 28 (NeurIPS), 2015.

Model Evaluation

Nadeau, C. & Bengio, Y. Inference for the Generalization Error. Machine Learning, 52, 239–281, 2003.

Real-time Inference

Angelopoulos, A. N., Candès, E. J., & Tibshirani, R. J. Conformal PID Control for Time Series. Advances in Neural Information Processing Systems 36 (NeurIPS), 2023.

Monitoring

Ramdas, A., Grünwald, P., Vovk, V., & Shafer, G. Game-theoretic statistics and safe anytime-valid inference. Statistical Science, 38(4), 576–601, 2023.

Vovk, V. & Wang, R. E-values: calibration, combination and applications. Annals of Statistics, 49(3), 1736–1754, 2021.

About the author

Albert builds and operates XRPulse, an algorithmic trading system on the XRP Ledger. He writes about architecture, research, and market behavior.