Building XRPulse reframed how I think about software: how does a system transform information into signal?
On the surface, the answer seems obvious: data, features, models. Each component is necessary, but their interaction determines whether signal is preserved or obscured by noise.
What matters is not just the inputs, but how the system is structured to transform them. Architecture determines what is preserved, what is discarded, and what ultimately becomes signal. This essay walks through each of those choices in XRPulse.
This essay describes the architecture. Realized outcomes and risk characteristics, along with the execution logic, remain private.
XRPulse is a predictive algorithmic trading system on the XRP Ledger DEX. It forecasts future log-returns from microstructure signals and acts only when the probability that edge will clear costs is high enough. It is not a market-making strategy, not arbitrage, and not high-frequency trading: the system samples on information bars (event-driven, not clock-driven), and abstains far more often than it acts.
The System at a Glance
XRPulse ingests raw ledger events, structures them into bars that carry information rather than time, generates features that define the hypothesis space, trains models on sliding windows rather than fixed splits, and serves forecasts with uncertainty bounds.
Two pipelines run this system:
- Offline (batch): trains models on sliding windows and evaluates them. A deployment gate decides if a trained model reaches production.
- Online (streaming): serves forecasts in real time and monitors both model drift and per-trade economics. Realized performance feeds the next experiment.
The split reflects different operating constraints. Training needs a fixed dataset and the ground truth; serving needs low latency and streaming state. A single feature-generation library is the shared artifact, written once and run in both pipelines. The same code path produces training and serving features, removing train/serve skew.
Market Data
which channels to subscribe to.
The XRP Ledger has a native decentralized exchange (DEX) built into the protocol. The DEX has two execution venues: a central limit order book (CLOB) and Automated Market Maker (AMM) pools. XRPulse focuses on the CLOB. Every trade, every order placement, every cancellation is recorded on-chain. No venue gatekeeper samples the data before you see it; no aggregator rate-limits or rewrites it. Tick-level events are the building blocks for per-ledger microstructure state.
XRPulse extracts those events through two XRP Ledger WebSocket methods, both restricted to the XRP/RLUSD pair before any data is stored:
- subscribe with a books filter: a persistent stream of every transaction that touches the XRP/RLUSD book.
- book_offers, polled per validated ledger with taker_pays and taker_gets set to the XRP/RLUSD pair (XRP by currency only; RLUSD by currency code and issuer), capturing order book depth on both bid and ask sides.
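As a sketch of what those two requests look like on the wire, here are the payloads as Python dicts. The issuer address is a placeholder, not the real RLUSD issuer, and the hex currency code follows the XRPL convention of encoding codes longer than three characters as 160-bit hex:

```python
import json

XRP = {"currency": "XRP"}
# Placeholder issuer -- not the real RLUSD issuer address.
RLUSD = {
    "currency": "524C555344000000000000000000000000000000",  # "RLUSD" as 160-bit hex
    "issuer": "rRLUSDIssuerPlaceholder",
}

# Persistent stream: every transaction touching the XRP/RLUSD book, both directions.
subscribe_request = {
    "command": "subscribe",
    "books": [{"taker_pays": RLUSD, "taker_gets": XRP, "both": True}],
}

# Depth snapshot: polled once per validated ledger, one call per side of the book.
book_offers_request = {
    "command": "book_offers",
    "taker_pays": RLUSD,
    "taker_gets": XRP,
    "ledger_index": "validated",
}

print(json.dumps(subscribe_request, indent=2))
```

Sending these over an XRP Ledger WebSocket connection yields the event stream and the per-ledger depth snapshots the pipeline consumes.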
By protocol, only three transaction types touch an order book; XRPulse stores each with a type-specific filter:
- OfferCreate: limit orders on the DEX. Only those that fill (produce balance changes) are stored.
- Payment: cross-currency settlements routed through the order book. Stored for volume contribution, not for price discovery: the effective rate reflects path-sweeping across intermediate currencies, not the XRP/RLUSD market price.
- OfferCancel: withdrawal events preserved alongside fills as part of the event stream.
Only validated ledgers are stored: confirmed by network consensus and immutable. A closed ledger is merely proposed; a validated ledger is final.
The XRP Ledger validates roughly every 3–5 seconds, but not every validated ledger contains XRP/RLUSD activity. Only those with filled offers produce data. The observation cadence is market-driven, not clock-driven.
Market State
what to aggregate per validated ledger.
Raw events do not go directly into a model. They are aggregated into a canonical record per validated ledger: the market state. One row per validated ledger, every observable dimension reduced to a consistent representation.
This layer does two jobs.
It makes ingestion idempotent: replaying the same event stream always produces the same state. That determinism is what makes every training run a reproducible experiment.
It gives every downstream component a stable contract: ask for state at time T and get the same answer whether training offline or inferring online.
The canonical state abstraction enforces these guarantees at the architecture level (Sculley et al., 2015).
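A minimal sketch of what such a canonical per-ledger record might look like; the field names are illustrative, not the production schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # immutable: state at ledger N can never be mutated after the fact
class MarketState:
    ledger_index: int    # primary key: one row per validated ledger
    close_time: int      # ledger close time
    best_bid: float
    best_ask: float
    bid_depth: float     # aggregated resting depth on the bid side
    ask_depth: float
    trade_volume: float  # filled-offer volume in this ledger
    trade_count: int

    @property
    def mid(self) -> float:
        return (self.best_bid + self.best_ask) / 2.0

    @property
    def spread(self) -> float:
        return self.best_ask - self.best_bid


state = MarketState(ledger_index=1, close_time=0, best_bid=0.99, best_ask=1.01,
                    bid_depth=10.0, ask_depth=12.0, trade_volume=5.0, trade_count=3)
```

The frozen dataclass enforces the contract mechanically: asking for state at a given ledger index returns the same immutable answer in both pipelines.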
Information Bars
what closes a bar.
The prediction target is expressed in a log-return space. Raw prices are non-stationary; returns are approximately stationary. Stationary targets generalize across price regimes.
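The transformation itself is one line, and its useful property, additivity over adjacent intervals, is easy to check:

```python
import math


def log_return(p_prev: float, p_curr: float) -> float:
    """Log-return between two prices: approximately stationary, additive over intervals."""
    return math.log(p_curr / p_prev)


# Additivity: the return over two steps equals the sum of the per-step returns.
r1 = log_return(100.0, 101.0)
r2 = log_return(101.0, 99.5)
total = log_return(100.0, 99.5)
```

Additivity is what lets per-bar returns be aggregated across horizons without compounding corrections.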
Log-returns over what interval? That is the question financial data structures answer.
Time bars are the standard among practitioners and academics: one-minute, one-hour, or daily intervals. A fixed clock produces the same number of observations whether the market is trending, volatile, or silent. Quiet periods oversample noise; volatile periods undersample information. Standard evaluation protocols assume observations are roughly exchangeable and information-dense. Clock-based sampling distorts both.
XRPulse uses information-driven sampling. Bars close when the market produces enough new information to warrant a new observation, not when a clock ticks. The concept was introduced by López de Prado in Advances in Financial Machine Learning (Wiley, 2018): sampling should be driven by market activity, not by arbitrary time intervals.
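XRPulse's actual bar definition stays private, but the general idea is easy to illustrate with the simplest scheme in López de Prado's taxonomy: a volume bar, which closes whenever cumulative traded volume crosses a threshold rather than when a clock ticks.

```python
def volume_bars(trades, volume_threshold):
    """Group (price, volume) trades into bars carrying ~volume_threshold of volume each.

    trades: iterable of (price, volume) tuples in arrival order.
    Yields one OHLCV dict per completed bar.
    """
    prices, cum = [], 0.0
    for price, vol in trades:
        prices.append(price)
        cum += vol
        if cum >= volume_threshold:  # enough new information: close the bar
            yield {"open": prices[0], "high": max(prices), "low": min(prices),
                   "close": prices[-1], "volume": cum}
            prices, cum = [], 0.0


trades = [(1.00, 3), (1.02, 4), (0.99, 5), (1.01, 2), (1.03, 9)]
bars = list(volume_bars(trades, volume_threshold=10))
```

A burst of activity closes bars quickly; a quiet stretch produces none. Observation density tracks market activity by construction.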
The candles visible on the XRPulse dashboard are a display convention for readers. They are not what the model consumes.
Feature Engineering
which features survive selection.
Features define the hypothesis space.
The feature set spans categories including microstructure, momentum, volatility, statistical, temporal, complexity, and spectral. Each captures a distinct aspect of market behavior. Together they give the model a multi-dimensional view of the current regime.
Two principles harden the feature pipeline.
Shared generators. Features are computed from a single source of truth: the same code path produces training features (batch) and serving features (streaming). This removes train/serve skew, a silent killer of production machine learning (ML).
Aggressive selection. The pipeline starts with a broad candidate pool and prunes hard. Selection runs Clustered Feature Importance (López de Prado, 2018): grouped permutation importance computed across temporal folds, constructed as e-values (anytime-valid evidence statistics) per cluster, then gated by the e-value Benjamini-Hochberg (e-BH) procedure for false discovery rate (FDR) control, with a Li & Ji effective-tests correction that keeps FDR valid under correlated features (Wang & Ramdas, 2022; Li & Ji, 2005). A feature also has to show a distribution stable enough to be modeled. The final feature set is an order of magnitude smaller than the candidate pool.
When capital is at risk, every feature in production needs a documented rationale. The model itself is a tree ensemble and is not interpretable in any strict sense; what is auditable is the feature set. Each surviving feature has a recorded reason for inclusion and a measurable marginal contribution to out-of-sample performance.
Model Training
how to train so generalization is proven, not assumed.
A common failure mode in production machine learning is leakage: the model sees information during training that it will not have at inference time. The architecture prevents this through prequential sliding-window evaluation. Each window trains on a fixed historical segment, skips an embargo gap sized to cover both the longest feature lag and the forecast horizon, and evaluates on data the model has never seen. No expanding windows. No fixed splits. The evaluation protocol mirrors the deployment conditions the model will face.
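The window generator is simple to sketch; the train, embargo, and test lengths here are illustrative, not the production configuration:

```python
def sliding_windows(n, train, embargo, test, step):
    """Yield (train_idx, test_idx) ranges for prequential sliding-window evaluation.

    Each window trains on [start, start + train), skips an embargo gap sized to
    cover the longest feature lag plus the forecast horizon, then evaluates on
    the following test segment. Fixed-size windows slide forward; never expanding.
    """
    start = 0
    while start + train + embargo + test <= n:
        train_idx = range(start, start + train)
        test_idx = range(start + train + embargo, start + train + embargo + test)
        yield train_idx, test_idx
        start += step


windows = list(sliding_windows(n=100, train=50, embargo=5, test=10, step=10))
```

The embargo is what prevents leakage: no test observation overlaps any lookback or label horizon reaching out of the training segment.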
The model is a CatBoost gradient-boosted regressor over the selected feature set. The choice of gradient boosting over deep learning was evidence-driven: tree-based models consistently outperform deep architectures on structured tabular data (Grinsztajn et al., 2022). CatBoost specifically uses ordered boosting to produce unbiased gradient estimates (Prokhorenkova et al., 2018), which avoids the prediction shift that standard boosting accumulates from reusing the same examples for both gradient computation and leaf fitting.
Each training run is an isolated, reproducible experiment. Dagster orchestrates the pipeline as a directed acyclic graph where every asset declares its dependencies explicitly. Experiment IDs are content-addressable: the same data range and feature configuration always produce the same experiment hash. Model IDs are derived from experiment IDs plus hyperparameters. Reproducibility is enforced by pinned dependencies and content-addressable IDs, up to numerical tolerance (Sculley et al., 2015).
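Content-addressable IDs can be sketched as a stable hash of the canonicalized configuration; the config keys below are illustrative:

```python
import hashlib
import json


def content_id(config: dict) -> str:
    """Deterministic ID: the same configuration always hashes to the same value.

    json.dumps with sort_keys canonicalizes the dict so key order is irrelevant.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]


experiment = {"data_range": ["2024-01-01", "2024-06-30"], "features": ["f1", "f2"]}
exp_id = content_id(experiment)

# Model ID derives from the experiment ID plus hyperparameters.
model_id = content_id({"experiment": exp_id, "hyperparams": {"depth": 6, "lr": 0.05}})

# Reordering keys does not change the ID.
same = content_id({"features": ["f1", "f2"], "data_range": ["2024-01-01", "2024-06-30"]})
```

Two runs with identical inputs collide on the same ID by design, which is exactly what makes an experiment re-runnable and auditable.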
I accept two trade-offs. First, gradient boosting gives up the representational power of deep sequence models. For the current feature set and data scale, the evidence favors trees. That trade would reverse with richer sequential structure or orders of magnitude more data. Second, tree-based model outputs are bounded by the leaf values seen during training. If the market enters a regime the model has never seen, predictions are bounded by the training distribution, not by the new conditions. This is why the monitoring and abstention layers downstream are not optional: they detect when the model is operating outside its training distribution (Grinsztajn et al., 2022).
Model Evaluation
whether a trained model reaches production.
A trained model is not a deployable model.
The evaluation baseline is a naive forecast that assumes zero change in log returns: a martingale null. This is a modeling choice, not a claim about market efficiency. Any model that cannot beat the null has no business in production. Forecast Value Added (FVA) measures the improvement. Positive FVA means the model adds value; zero or negative means it would be better to predict nothing.
Statistical significance is enforced through a Nadeau-Bengio corrected t-test (Nadeau & Bengio, 2003), which corrects the variance underestimation that arises when training folds (or sliding windows) share data. The null hypothesis is that mean FVA is zero or worse. A model whose corrected p-value does not clear the threshold does not ship.
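The corrected statistic itself is compact; a sketch, with window counts and segment sizes chosen for illustration:

```python
import math


def nadeau_bengio_t(fva_per_window, n_train, n_test):
    """Corrected t-statistic for mean FVA across J overlapping windows.

    Replaces the naive variance of the mean, s2 / J, with (1/J + n_test/n_train) * s2,
    compensating for the variance underestimation caused by windows sharing training
    data (Nadeau & Bengio, 2003). Tested against H0: mean FVA <= 0.
    """
    J = len(fva_per_window)
    mean = sum(fva_per_window) / J
    s2 = sum((x - mean) ** 2 for x in fva_per_window) / (J - 1)  # sample variance
    corrected_var = (1.0 / J + n_test / n_train) * s2
    return mean / math.sqrt(corrected_var)


t = nadeau_bengio_t([0.02, 0.01, 0.03, 0.00, 0.02], n_train=50, n_test=10)
```

The correction term n_test/n_train never vanishes as windows are added, so piling on more overlapping windows cannot manufacture significance.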
The statistical gate is necessary but not sufficient. A model can be statistically valid and economically worthless. We define edge as the per-prediction expected log-return net of the realized cost of acting; the spread sets the floor of that cost, while slippage, partial fills, and fees raise it further. A model whose expected edge is not positive cannot generate returns even when its directional predictions are correct.
If the gate fails, the model does not ship. The candidate goes back for iteration.
Real-time Inference
whether expected edge after costs is positive.
The prediction pipeline is shared between offline evaluation and online inference: same layers, same configuration, same code path. If a model was evaluated through this pipeline offline, it runs through the same pipeline online. The same shared-code principle applied to features extends to inference: skew between evaluation and serving is removed.
The pipeline produces point estimates from the selected feature set. Whether a prediction becomes actionable depends on one gate: the estimated probability that expected edge after costs is positive. Below that threshold, the prediction is non-actionable. The dashboard shows all forecasts; the gate determines which are worth acting on.
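As a sketch of the gate's shape, with the threshold and the cost model illustrative rather than the production logic, which stays private:

```python
def expected_edge(pred_log_return: float, spread: float, slippage: float, fees: float) -> float:
    """Per-prediction expected log-return net of the cost of acting.

    The half-spread sets the floor of the cost of crossing; slippage and fees
    raise it. The cost is the same for either direction, so the gross edge is
    the absolute predicted move.
    """
    cost = spread / 2.0 + slippage + fees
    return abs(pred_log_return) - cost


def is_actionable(p_edge_positive: float, threshold: float) -> bool:
    """Gate: act only when P(edge after costs > 0) clears the threshold."""
    return p_edge_positive >= threshold


edge = expected_edge(pred_log_return=0.004, spread=0.002, slippage=0.0005, fees=0.0005)
```

A forecast that fails the gate is still displayed; it simply never becomes an order.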
The confidence intervals visible on the dashboard are adaptive conformal prediction intervals, updated online from realized prediction errors following the conformal PID (proportional-integral-derivative) approach of Angelopoulos et al. (2023). They are a display feature for readers, not a trading input.
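The simplest update in that family, roughly the proportional term of the conformal PID controller, adjusts the interval half-width online: widen after a miss, narrow slightly after a cover. A sketch, with the learning rate and target illustrative:

```python
def update_width(q: float, miss: bool, alpha: float, lr: float) -> float:
    """One online quantile-tracking step for an interval half-width q.

    miss: True if the realized value fell outside the current interval.
    The update equilibrates at a long-run miss rate of ~alpha, i.e. coverage
    ~1 - alpha, regardless of the error distribution.
    """
    if miss:
        q += lr * (1.0 - alpha)  # missed: widen
    else:
        q -= lr * alpha          # covered: narrow slightly
    return max(q, 0.0)


# Replay a stream of absolute prediction errors against a 90% coverage target.
q, alpha, lr = 0.01, 0.1, 0.001
for abs_error in (0.005, 0.02, 0.004, 0.012, 0.003):
    q = update_width(q, miss=(abs_error > q), alpha=alpha, lr=lr)
```

The asymmetry of the two step sizes (lr·(1−alpha) up versus lr·alpha down) is what pins the equilibrium miss rate at alpha.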
A wrong prediction with high confidence is worse than no prediction. The ability to say "I don't know" is one of the most valuable properties a production prediction system can have.
Monitoring
what counts as realized performance.
Markets evolve. Models degrade. A production ML system is only as trustworthy as its ability to detect its own failures.
Each prediction is resolved against the realized log-return when the next information bar closes. Error, coverage, and edge are computed per prediction and stored. This is the ground truth loop: every forecast the system emits is eventually measured.
Drift detection runs continuously via sequential e-value hypothesis testing (Vovk & Wang, 2021; Ramdas et al., 2023). E-values provide anytime-valid evidence for distribution shift without the multiple-testing problems of classical hypothesis tests. When evidence accumulates beyond a threshold, the system surfaces an alert. Alerts are designed for human review, not automatic intervention: the system detects degradation and makes it visible.
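The mechanics of a sequential e-value alarm can be sketched in a few lines; the toy e-values and threshold below are illustrative:

```python
def e_process_alarm(e_values, alpha=0.01):
    """Accumulate a running product of per-observation e-values; alarm at 1/alpha.

    Under the null (no drift) each e-value has expectation <= 1, so by Ville's
    inequality P(the product ever reaches 1/alpha) <= alpha: the alarm is
    anytime-valid, with no penalty for checking after every observation.
    Returns the index at which evidence first crossed, or None.
    """
    wealth = 1.0
    for t, e in enumerate(e_values):
        wealth *= e
        if wealth >= 1.0 / alpha:
            return t
    return None


# Toy stream: evidence of drift starts accumulating at index 3.
t_alarm = e_process_alarm([1.0, 0.9, 1.1, 5.0, 6.0, 7.0], alpha=0.01)
```

Because validity holds at every stopping time, the monitor can watch continuously without the alpha-spending bookkeeping classical sequential tests require.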
The alerting model is event-based: specific conditions trigger specific alerts for human review. Performance degradation, coverage breaches (measured against the adaptive conformal method's target coverage), distributional shift, and prediction bias each produce a distinct e-value signal with WARNING or CRITICAL severity. The operator investigates the root cause and decides whether to retrain, adjust, or hold. In trading systems, automated intervention can be more dangerous than the degradation it addresses. The human stays in the loop by design.
Tradeoffs & Limitations
Every architectural choice is a trade.
Selective action vs frequency. The economic gate means the system abstains often. Most of the time, it does nothing. That is the design, not a bug. But it also means there is no stream of small wins to compound while waiting for a confident call. Long quiet stretches are the operating norm, not a failure state.
Transparency vs edge. The public dashboard exposes live forecasts and calibration metrics. Everything that constitutes the edge stays private. The asymmetry is deliberate.
Every system also has boundaries it cannot see past.
Market data only. XRPulse uses price, volume, and order book data from the XRP Ledger. It does not incorporate fundamental data, sentiment, social media, ETF flows, SEC filings, or any alternative data source. If the signal is not in the microstructure, the model will not find it.
Single-asset concentration. The entire system operates on the XRP/RLUSD pair. There is no diversification across assets, venues, or strategies. Concentration risk is real.
CLOB only, no AMM. XRPulse models the central limit order book on the XRP Ledger. AMM pools use a different liquidity mechanism and are excluded from the model; price moves originating in those pools register only indirectly, through CLOB arbitrage that closes the gap.
Public infrastructure. The system connects to public Ripple servers for ledger data. Infrastructure availability, latency, and reliability are outside the system's control.
No manipulation detection. The system uses a simple account exclusion list but has no built-in economic skepticism for layering, spoofing, or wash trading. If the order book is manipulated, the model consumes the manipulated state as if it were genuine.
Fat tails. Tree-based models cannot anticipate events outside the range of their training data. Black swan dislocations will produce predictions bounded by what the model has seen, not by what the market is doing.
Overfitting. Prequential evaluation and the deployment gate reduce this risk but do not eliminate it. Complex models on noisy financial data can always find patterns that do not generalize.
Technical risk. Algorithmic systems can fail in ways that produce missed or incorrect outputs. Container restarts, network interruptions, and infrastructure outages are operational realities.
No automated retraining. Model retraining is triggered manually. The system monitors for drift but does not yet close the loop by automatically retraining and redeploying when performance degrades. A human is still in the loop.
What comes next is more of what is here: better features, tighter calibration, more robust monitoring. No crystal ball. No permanent edge. Just the humility to keep iterating.
References
Market State
Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J.-F., & Dennison, D. Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems 28 (NeurIPS), 2015.
Information Bars
López de Prado, M. Advances in Financial Machine Learning. Wiley, 2018.
Feature Engineering
Li, J. & Ji, L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity, 95(3), 221–227, 2005.
López de Prado, M. Advances in Financial Machine Learning. Wiley, 2018.
Wang, R. & Ramdas, A. False discovery rate control with e-values. Journal of the Royal Statistical Society: Series B, 84(3), 822–852, 2022.
Model Training
Grinsztajn, L., Oyallon, E., & Varoquaux, G. Why do tree-based models still outperform deep learning on typical tabular data? Advances in Neural Information Processing Systems 35 (NeurIPS), 2022.
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems 31 (NeurIPS), 2018.
Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J.-F., & Dennison, D. Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems 28 (NeurIPS), 2015.
Model Evaluation
Nadeau, C. & Bengio, Y. Inference for the Generalization Error. Machine Learning, 52, 239–281, 2003.
Real-time Inference
Angelopoulos, A. N., Barber, R. F., & Bates, S. Conformal PID Control for Time Series. Advances in Neural Information Processing Systems 36 (NeurIPS), 2023.
Monitoring
Ramdas, A., Grünwald, P., Vovk, V., & Shafer, G. Game-theoretic statistics and safe anytime-valid inference. Statistical Science, 38(4), 576–601, 2023.
Vovk, V. & Wang, R. E-values: calibration, combination and applications. Annals of Statistics, 49(3), 1736–1754, 2021.