Brantner Solutions Ltd. Co. — Research Report

Adaptive Drift Forecasting: A Control-Theoretic Modeling Study

Author: adaptive-drift-forecasting
Date: 2026-03-13



1. Problem Context

The forecasting target considered here is log realized volatility. Volatility is an instructive test case because it is persistent, clustered, and highly nonstationary. The process frequently alternates between relatively calm periods and abrupt transitions in which the conditional scale changes quickly. A model that appears adequate in one interval can become misaligned soon afterward, not because the forecast equation is intrinsically poor, but because the underlying drift level has shifted.

This creates a specific statistical difficulty. Standard time-series models represent persistence well, but they are less effective when the underlying level of the process moves over time in a way that is neither constant nor fully captured by periodic refitting. In practice, the forecaster then faces repeated forecast lag: after a transition, the model reacts slowly, produces directional bias, and accumulates error until the parameter estimates catch up.

In volatility forecasting, that lag matters. A drift misspecification can affect both point forecasts and derived event probabilities. If the model underestimates the current volatility level during a transition, the point forecast is biased downward and threshold-based probability assessments become miscalibrated. For this reason, drift modeling is not a secondary detail. It is a central part of the forecasting problem.

ARIMA-style models typically deal with drift in one of two ways. They either include a fixed intercept or allow adaptation only through rolling or repeated refitting. Both approaches are reasonable under mild nonstationarity, but both effectively assume that drift evolves slowly relative to the retraining schedule. When the process moves more quickly, the assumption becomes restrictive.

The central question of the analysis is therefore the following: can the drift term be updated online in a disciplined way, using forecast error itself as feedback, rather than relying only on periodic re-estimation?

Model evolution

Baseline ARIMA
      |
      v
Adaptive Drift Adjustment
      |
      v
Regime-Aware Drift
      |
      v
Composite Drift Model

2. Baseline Modeling Framework

The starting point is a conventional one-step-ahead forecasting framework for log realized volatility. The baseline model family includes naive persistence, EWMA smoothing, static AR(1), static ARIMA, rolling AR(1), and rolling ARIMA. These specifications provide a standard comparison set: some are deliberately simple and some allow limited parameter adaptation through rolling estimation.

The adaptive formulations preserve the same basic predictive structure,

\[\hat{y}_t = \mu_t + \phi(y_{t-1} - \mu_t),\]

where \(\phi\) governs short-run persistence and \(\mu_t\) is the drift or local equilibrium level. In the static case, \(\mu_t\) is fixed. In the adaptive case, the equation is unchanged except for the rule used to update \(\mu_t\) over time. That design choice is important because it isolates the effect of drift adaptation from broader changes in model class.
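As a concrete reference, the forecast equation can be written directly in R. The parameter values in the usage line are illustrative, not the fitted estimates from the study.

```r
# One-step-ahead AR(1) forecast around an explicit drift (local level) term:
# the forecast is the drift plus a persistence-weighted deviation from it.
forecast_ar1_drift <- function(y_prev, mu, phi) {
  mu + phi * (y_prev - mu)
}

# With phi = 0.9 and mu = -4, a previous observation of -3 is pulled
# only partway back toward the local level.
forecast_ar1_drift(y_prev = -3, mu = -4, phi = 0.9)  # -3.1
```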

Evaluation is strictly walk-forward. Each forecast is constructed using information available up to the forecast origin. No future observations are used in fitting or updating. This is a stronger design than a pooled in-sample comparison because it exposes regime transitions, forecast lag, and accumulation of error in real time.
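A minimal walk-forward loop might look as follows. The window length and the moment-based AR(1) fit are illustrative simplifications, not the estimation scheme used in the study; the point is that each forecast is built strictly from data before its origin.

```r
# Walk-forward evaluation sketch: each forecast uses only observations
# available strictly before the forecast origin.
walk_forward_ar1 <- function(y, fit_window = 250L) {
  n <- length(y)
  preds <- rep(NA_real_, n)
  for (t in seq.int(fit_window + 1L, n)) {
    train <- y[(t - fit_window):(t - 1L)]
    mu  <- mean(train)                             # static drift over the window
    phi <- cor(train[-1L], train[-length(train)])  # crude lag-1 persistence estimate
    preds[t] <- mu + phi * (y[t - 1L] - mu)        # one-step-ahead forecast
  }
  preds
}
```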

The primary forecast metrics are root mean squared error, mean absolute error, and correlation between the forecast and the realized target. For the probability experiments, the main diagnostics are expected calibration error, Brier score, and log loss. Together, these metrics cover both level accuracy and probabilistic reliability.
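These diagnostics are standard and can be sketched in a few lines of base R. The equal-width ten-bin convention for expected calibration error is one common choice; the study's exact binning is not specified.

```r
rmse     <- function(y, yhat) sqrt(mean((y - yhat)^2))
mae      <- function(y, yhat) mean(abs(y - yhat))
brier    <- function(p, outcome) mean((p - outcome)^2)
log_loss <- function(p, outcome, eps = 1e-12) {
  p <- pmin(pmax(p, eps), 1 - eps)  # guard against log(0)
  -mean(outcome * log(p) + (1 - outcome) * log(1 - p))
}

# Expected calibration error with equal-width bins: weighted mean gap
# between average predicted probability and empirical frequency per bin.
ece <- function(p, outcome, n_bins = 10L) {
  bin <- cut(p, breaks = seq(0, 1, length.out = n_bins + 1L), include.lowest = TRUE)
  gaps <- vapply(split(seq_along(p), bin), function(idx) {
    if (length(idx) == 0L) return(0)
    length(idx) / length(p) * abs(mean(p[idx]) - mean(outcome[idx]))
  }, numeric(1))
  sum(gaps)
}
```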

baseline_top = point_phase_1 |>
  dplyr::arrange(rmse) |>
  dplyr::slice_head(n = 8)

knitr::kable(
  baseline_top,
  digits = 4,
  caption = "Leading models from the initial walk-forward comparison"
)
Leading models from the initial walk-forward comparison

model                 n     rmse    mae     correlation  mape
pid_adaptive          5829  0.2567  0.1676  0.9106       0.1035
attention_pid_leaky   5829  0.2569  0.1678  0.9105       0.1032
attention_pid         5829  0.2569  0.1678  0.9105       0.1032
kf_pid                5829  0.2572  0.1670  0.9104       0.1020
sliding_mode          5829  0.2573  0.1669  0.9104       0.1011
l1_adaptive_bw        5829  0.2588  0.1677  0.9098       0.1002
arima_rolling         5829  0.2588  0.1685  0.9090       0.1062
ar1_rolling           5829  0.2589  0.1687  0.9090       0.1065

3. Limitations of Static Drift

The initial comparison indicates that fixed or slowly updated drift creates at least three problems.

First, forecasts tend to lag when the latent volatility level changes rapidly. The model inherits persistence from the recent past but adapts too slowly to the new regime. In practice this appears as underreaction near volatility upshifts and overreaction after conditions normalize.

Second, static drift encourages systematic forecast bias during transition periods. The model may still have reasonable average accuracy over a long interval, but that aggregate score can conceal localized episodes in which the sign of the forecast error is persistently one-sided.

Third, the effect on probabilities is often more severe than the effect on point error. A forecast with tolerable RMSE can still produce event probabilities that are poorly calibrated if the conditional distribution has shifted. This matters whenever the forecast is used for threshold-based decisions rather than only for ranking or smoothing.

The first phase of results shows that adaptive variants are competitive with or better than the stronger rolling baselines, but the margins are not large enough to justify complexity without a clear mechanism. That is precisely why a more explicit motivation is needed.

knitr::kable(
  phase_1_rank_summary |>
    dplyr::slice_head(n = 8),
  digits = 4,
  caption = "Cross-run rank summary from the initial controller comparison"
)
Cross-run rank summary from the initial controller comparison

model                 mean_rmse_rank  median_rmse_rank  mean_rmse
pid_adaptive          1.0000          1                 0.2575
attention_pid_leaky   2.7143          2                 0.2577
attention_pid         2.8571          3                 0.2577
sliding_mode          4.8571          5                 0.2581
kf_pid                5.0000          4                 0.2581
arima_rolling         5.7143          6                 0.2591
ar1_rolling           6.1429          7                 0.2591
l1_adaptive_bw        8.5714          8                 0.2602

4. Control-Theoretic Motivation

Control theory becomes relevant once forecast drift is viewed as an online adjustment problem rather than a purely static estimation problem.

The analogy is straightforward. In a feedback control system, one defines a target, observes the deviation between the target and the realized state, and updates a control input to reduce the deviation. In the forecasting setting, the realized value is observed after the forecast is issued, and the forecast error can be treated as the relevant deviation signal. The drift term then plays the role of a control input: it shifts the forecast center in response to persistent error.

This perspective is attractive for nonstationary forecasting because it focuses directly on the quantity that reveals model mismatch. If the process is drifting upward, forecast errors will tend to be positive; if it is drifting downward, forecast errors will tend to be negative. An adaptive drift rule can therefore respond to the empirical direction and persistence of error without requiring a full model re-estimation at every step.

The objective is not literally to convert the forecasting problem into a physical control problem. The objective is to borrow a disciplined feedback logic: use observed error to adjust the local operating point of the forecast equation in a stable and interpretable way.

5. PID Control Background

Proportional-integral-derivative control is one of the most common feedback mechanisms in engineering.

The proportional component reacts to the current error. If the system is above target, the proportional term pushes in the opposite direction immediately. If it is below target, it pushes back upward. Proportional correction is therefore fast, but by itself it can leave a residual steady-state error.

The integral component reacts to accumulated error over a recent window. If small errors persist in the same direction for many periods, the integral term grows and forces a stronger correction. In engineering systems, this is the component that removes sustained bias.

The derivative component reacts to the change in error. It anticipates turning points by looking at the slope of the error signal rather than only its current level. In practice, derivative correction can reduce overshoot or improve responsiveness around transitions, though it is also more sensitive to noise.

Together, the three terms offer a simple decomposition of feedback:

  • proportional for immediate response
  • integral for persistent bias
  • derivative for acceleration or turning-point behavior

These properties make PID control a natural candidate for drift adjustment in a forecasting system whose main failure mode is delayed response to shifts in the local level.

6. PID-Based Drift Adjustment

The adaptive drift formulation keeps the AR(1)-style forecast equation but allows the drift parameter to evolve over time. Let \(e_t = y_t - \hat{y}_t\) denote the forecast error and let \(z_t\) denote a normalized version of that error. The drift update is then constructed from three components:

  • a proportional term based on the current normalized error
  • an integral term based on the recent average of normalized errors
  • a derivative term based on the change in normalized error

Conceptually, the update has the form

\[ \Delta \mu_t = K_p z_t + K_i \bar{z}_t + K_d (z_t - z_{t-1}), \]

with clipping or other stabilization to keep the adjustment within a feasible range. The new drift becomes

\[ \mu_{t+1} = \mu_t + \Delta \mu_t. \]

This rule has an intuitive interpretation in the forecasting setting. If the current forecast is persistently too low, recent errors are positive and the proportional and integral terms push the drift upward. If the error is changing rapidly, the derivative term sharpens or dampens the update depending on whether the misspecification is accelerating or reversing.

The appeal of the construction is that the model remains interpretable. The persistence parameter still controls short-run autoregressive dynamics, while the drift update controls how quickly the local level is allowed to move in response to forecast error.
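A compact sketch of the update rule follows. The gains, history window, and clip bound are placeholder values; the report does not list the tuned settings.

```r
# PID-style drift updater. Returns a closure that carries the recent
# normalized-error history needed for the integral and derivative terms.
make_pid_drift <- function(kp = 0.05, ki = 0.02, kd = 0.01,
                           window = 20L, clip = 0.1) {
  z_hist <- numeric(0)
  function(mu, z) {
    z_prev <- if (length(z_hist) > 0L) z_hist[length(z_hist)] else 0
    z_hist <<- tail(c(z_hist, z), window)
    delta <- kp * z +                  # proportional: current error
             ki * mean(z_hist) +       # integral: persistent bias
             kd * (z - z_prev)         # derivative: turning-point behavior
    mu + max(min(delta, clip), -clip)  # clipping keeps each drift move bounded
  }
}
```

Feeding the closure a run of persistently positive normalized errors walks the drift upward, while a zero error leaves it unchanged, matching the qualitative behavior described above.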

7. Experimental Development

Phase 1: Initial adaptive drift models

Motivation

The first phase asked whether online drift adjustment could improve on static or rolling drift treatment. The underlying hypothesis was that forecast error in nonstationary volatility series contains directional information about latent drift movement.

Model modification

Several adaptive controllers were tested on the same AR(1) backbone: PID, sliding mode, L1 adaptive, attention-modulated PID, and Kalman-tuned PID variants. The purpose was not to search for the most complicated mechanism, but to determine whether structured feedback could outperform periodic refitting.

Observed results

The leading initial model is pid_adaptive, with RMSE 0.256692 and correlation 0.910587. Attention-PID and KF-PID are close, but they do not provide a decisive gain.

Interpretation

The main result of the first phase is that adaptive drift is useful, but the simplest effective controller captures most of the benefit. There is no strong empirical case that more elaborate gain adaptation dominates a well-tuned PID update. That finding sets the direction for the later phases: continue with adaptive drift, but avoid unnecessary complexity.

Phase 2: Adaptive calibration under drift

Motivation

Even a reasonable point forecast may generate poor event probabilities if the distribution shifts. The second phase therefore asked whether the same feedback logic could stabilize calibration maps over time.

Model modification

Point forecasts were translated into probabilities for a volatility event, and several calibration methods were compared: static Platt scaling, isotonic regression, static BBQ, adaptive EMA updates, and PID-style BBQ bin offsets.
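The EMA bin update can be sketched as follows. The bin count, learning rate, and initialization at bin centers are assumptions, since the report does not give the exact construction of the bbq_ema method.

```r
# EMA-updated binned calibrator (sketch of the bbq_ema idea): each
# probability bin tracks an exponentially weighted event frequency.
make_ema_calibrator <- function(n_bins = 10L, alpha = 0.02) {
  freq <- seq(0.5 / n_bins, 1 - 0.5 / n_bins, length.out = n_bins)  # bin centers
  bin_of <- function(p) pmin(pmax(ceiling(p * n_bins), 1L), n_bins)
  list(
    calibrate = function(p) freq[bin_of(p)],
    update    = function(p, outcome) {
      b <- bin_of(p)
      freq[b] <<- (1 - alpha) * freq[b] + alpha * outcome  # gradual local adaptation
    }
  )
}
```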

Observed results

knitr::kable(
  phase_2_pid_table,
  digits = 6,
  caption = "Phase 2 probability metrics for the PID-based forecast family"
)
Phase 2 probability metrics for the PID-based forecast family

base_model    calibration_method  n     ece       brier     log_loss
pid_adaptive  bbq_ema             4830  0.008068  0.095310  0.321895
pid_adaptive  isotonic_static     4830  0.017670  0.093625  0.312843
pid_adaptive  bbq_static          4830  0.032111  0.095807  0.322513
pid_adaptive  uncalibrated        4830  0.049495  0.097253  0.321153
pid_adaptive  platt_static        4830  0.052600  0.097547  0.322198
pid_adaptive  bbq_pid             4830  0.056015  0.100059  0.357729

For the PID-based forecast family, the uncalibrated specification has ECE 0.049495. The best retained adaptive calibration method is bbq_ema, with ECE 0.008068.

Interpretation

Adaptive calibration clearly helped, but the simplest feedback rule was again the strongest. EMA-based bin updating dominated the PID-maintained bin correction in this phase. The implication is that the calibration problem benefited from gradual local frequency adaptation rather than from a more aggressively controlled offset rule.

Phase 3: Regime-aware drift adjustment and composite forecast modification

Motivation

The calibration phase suggested that regime information and calibration diagnostics contain useful information beyond raw point error. The next step was to test whether those signals could improve the point forecast itself.

Model modification

The third phase introduced regime-aware and composite forecast adjustments. These included regime-based drift adjustments, PID-linked forecast adjustments, and responsiveness terms applied to the base forecasts.
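The report does not specify how the regime signal is constructed, so the following is one plausible sketch under that assumption: a short-window to long-window ratio of absolute forecast errors, with values above one flagging a likely transition.

```r
# Hypothetical regime-intensity signal: ratio of short-run to long-run
# mean absolute forecast error. Values above 1 suggest a transition.
regime_intensity <- function(abs_err, short = 10L, long = 60L) {
  s <- stats::filter(abs_err, rep(1 / short, short), sides = 1)  # short rolling mean
  l <- stats::filter(abs_err, rep(1 / long, long), sides = 1)    # long rolling mean
  as.numeric(s) / pmax(as.numeric(l), 1e-8)
}
```

A signal like this could then scale the drift update's responsiveness, which is one way to realize the regime-aware adjustments described above.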

Observed results

knitr::kable(
  phase_3_forecast_top,
  digits = 6,
  caption = "Phase 3 forecast metrics recomputed from retained prediction paths"
)
Phase 3 forecast metrics recomputed from retained prediction paths

model                     n     rmse      mae       correlation
pid_adaptive_regime_adj   4810  0.255108  0.164416  0.920423
ar1_rolling_regime_adj    4810  0.258070  0.166691  0.917735
pid_adaptive_resp_adj     4810  0.260042  0.168523  0.916014
pid_adaptive              4810  0.260647  0.169934  0.915462
sliding_mode              4810  0.261211  0.169061  0.915326
ar1_rolling_resp_adj      4810  0.261701  0.168781  0.914684
arima_rolling             4810  0.262069  0.169819  0.914383
ar1_rolling               4810  0.262085  0.169962  0.914375

knitr::kable(
  phase_3_prob_pid,
  digits = 6,
  caption = "Phase 3 probability metrics for the PID-based forecast family"
)
Phase 3 probability metrics for the PID-based forecast family

base_model    method_key         n     ece       brier     log_loss
pid_adaptive  bbq_ema            4810  0.008573  0.094952  0.321097
pid_adaptive  composite_smooth   4810  0.013544  0.093021  0.311447
pid_adaptive  composite_ema_pid  4810  0.014572  0.095115  0.321712
pid_adaptive  composite_hard     4810  0.016955  0.093271  0.311942
pid_adaptive  isotonic_static    4810  0.018088  0.093398  0.312339
pid_adaptive  bbq_static         4810  0.032596  0.095557  0.321930
pid_adaptive  uncalibrated       4810  0.049932  0.097047  0.320672
pid_adaptive  platt_static       4810  0.052797  0.097325  0.321676
pid_adaptive  bbq_pid            4810  0.055314  0.099533  0.356394

The strongest point forecast in this phase is pid_adaptive_regime_adj, with RMSE 0.255108. On the probability side, the best PID-family composite calibration result is less compelling than the best result from the previous calibration phase.

Interpretation

The evidence suggests that regime-aware modification helped point forecasting more than it helped probability calibration. That is an important distinction. A model component can be informative as a point-forecast correction even if it is not the best probability transformation. The third phase therefore broadens the understanding of drift: some information about regime intensity is most useful when fed back into the forecast level rather than only into the probability map.

Phase 4: Integrated composite formulation

Motivation

The final phase aimed to align the retained forecast and probability variants on a common evaluation set. The purpose was to determine which ideas remained useful once comparisons were made on the same timestamp panel.

Model modification

The integrated run kept a smaller subset of adaptive and composite variants and evaluated them jointly. This reduced ambiguity caused by comparing different methods on different effective samples.
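The alignment step can be sketched with a simple intersection over timestamps. The assumed data layout (one data frame per variant with a `date` column) is illustrative; the study's actual storage format is not described.

```r
# Align retained variants on a common timestamp panel: keep only dates
# at which every variant has a prediction, so metrics are comparable.
align_panel <- function(preds) {  # preds: named list of data.frames with a `date` column
  common <- Reduce(intersect, lapply(preds, function(d) d$date))
  lapply(preds, function(d) d[d$date %in% common, , drop = FALSE])
}
```

This removes the ambiguity of comparing methods on different effective samples, which is the stated purpose of the integrated run.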

Observed results

knitr::kable(
  phase_4_forecast_top,
  digits = 6,
  caption = "Phase 4 forecast metrics recomputed from the integrated result panel"
)
Phase 4 forecast metrics recomputed from the integrated result panel

model                          n     rmse      mae       correlation
pid_adaptive_bbq_ema           3810  0.255168  0.168754  0.909737
pid_adaptive_bbq_pid           3810  0.255168  0.168754  0.909737
pid_adaptive_bbq_static        3810  0.255168  0.168754  0.909737
pid_adaptive_composite_hard    3810  0.255168  0.168754  0.909737
pid_adaptive_composite_smooth  3810  0.255168  0.168754  0.909737
pid_adaptive_uncalibrated      3810  0.255168  0.168754  0.909737
ar1_rolling_bbq_ema            3810  0.255976  0.167825  0.909358
ar1_rolling_bbq_pid            3810  0.255976  0.167825  0.909358

knitr::kable(
  phase_4_prob_pid,
  digits = 6,
  caption = "Phase 4 probability metrics for the PID-based forecast family"
)
Phase 4 probability metrics for the PID-based forecast family

base_model    model_name                     n     ece       brier     log_loss
pid_adaptive  pid_adaptive_bbq_ema           3810  0.006669  0.062270  0.224060
pid_adaptive  pid_adaptive_composite_smooth  3810  0.007467  0.061802  0.218755
pid_adaptive  pid_adaptive_bbq_static        3810  0.008861  0.061846  0.217692
pid_adaptive  pid_adaptive_composite_hard    3810  0.009142  0.061565  0.215946
pid_adaptive  pid_adaptive_bbq_pid           3810  0.033974  0.064610  0.246986
pid_adaptive  pid_adaptive_uncalibrated      3810  0.035596  0.065127  0.227101

The best retained point forecast is pid_adaptive, with RMSE 0.255168; its calibration variants share the same underlying point path, which is why the top rows of the forecast table tie exactly. The best retained PID-family probability model is pid_adaptive_bbq_ema, with ECE 0.006669.

Interpretation

The integrated phase favors a cleaner decomposition of the problem. A simple adaptive drift model remains strongest for point forecasting, while adaptive calibration remains strongest for event-probability refinement. This is a more nuanced conclusion than simply naming a universal winner.

8. Model Comparison

knitr::kable(
  comparison_table,
  digits = 6,
  caption = "Concise comparison of major model variants"
)
Concise comparison of major model variants

model                    rmse       correlation  interpretation
pid_adaptive             0.2566919  0.9105869    Best initial adaptive drift model among the controller variants retained in the first comparison.
pid_adaptive_regime_adj  0.2551085  0.9204226    Strongest point forecast in the regime-aware adjustment phase.
pid_adaptive             0.255168   0.909737     Best retained point forecast after aligning the final integrated evaluation set.
pid_adaptive_bbq_ema     NA         NA           Best retained PID-family probability model in the integrated phase by expected calibration error.

The table highlights a consistent pattern. The main gain in point forecasting comes from adaptive drift, especially when the model is allowed to adjust during regime transitions. The main gain in probability forecasting comes from adaptive calibration, especially when the calibration map is updated smoothly rather than controlled too aggressively.

9. Interpretation

Several substantive conclusions follow from the experiments.

First, control-theoretic ideas appear useful because they address the right failure mode. The key empirical problem is not lack of autoregressive structure. It is delayed response to a changing local level. Feedback-based drift updates target exactly that issue.

Second, the best-performing adaptive methods are not the most elaborate ones. The initial controller comparison suggests that moving from static drift to adaptive drift matters more than moving from a simple adaptive controller to a highly parameterized one. This is consistent with the idea that the dominant benefit comes from correcting the direction and persistence of drift, not from modeling every local nuance.

Third, the experiments separate point forecasting from calibration in an instructive way. The strongest probability improvements came from adaptive calibration rules, while the strongest point-forecast improvements came from adaptive or regime-aware drift adjustments. These are related problems, but not identical ones.

Fourth, some intermediate gains weaken when the evaluation design is tightened. The regime-adjusted variant is strongest in the third phase, but the aligned integrated comparison retains a simpler adaptive drift forecast as the best point model. This suggests that some apparent gains are phase-specific and that model retention should depend on stable comparative evidence rather than on a single favorable run.

Remaining limitations should also be noted. The analysis is centered on one asset and one forecasting target. The adaptive rules are intentionally simple. They improve responsiveness, but they do not explicitly model deeper structural drivers of regime change. In addition, the strongest point model and the strongest probability model are not identical, which means practical deployment would require clarity about the forecasting objective.

10. Conclusions

The analysis supports a clear conclusion: control-theoretic drift adjustment is a useful and interpretable way to handle nonstationary forecasting problems when the main failure mode is lag in the local level.

Three findings are especially important.

  1. Static or periodically refit drift specifications react too slowly during transitions in the volatility environment.
  2. Online drift adjustment improves point forecasting, but the strongest gains come from disciplined simplicity rather than from maximal controller complexity.
  3. Adaptive calibration delivers a large improvement in probability quality and complements, rather than replaces, adaptive point forecasting.

The broader lesson is methodological. Treating forecast error as a feedback signal provides a practical way to update drift without reconstructing the entire model at every step. In this setting, that idea was strong enough to survive several rounds of refinement and to produce both better point forecasts and materially better calibrated event probabilities.

Several research directions remain open. A natural next step is to extend the adaptive drift framework to richer state-space or multivariate volatility settings. Another is to formalize regime detection rather than using it only indirectly through forecast and calibration signals. A third is to examine whether the same feedback logic improves downstream decision rules, not only forecast accuracy and calibration diagnostics.

Minimal supporting visual

forecast_plot_data = forecast_plot_data |>
  dplyr::mutate(
    hover_text = paste0(
      "Date: ", date,
      "<br>Series: ", model,
      "<br>Value: ", sprintf("%.4f", dplyr::if_else(model == "actual", actual, forecast))
    )
  )

actual_trace = phase_3_predictions |>
  dplyr::select(date, actual) |>
  dplyr::distinct() |>
  dplyr::mutate(date = as.Date(date)) |>
  dplyr::filter(date >= max(date) - 260)

plotly::plot_ly() |>
  plotly::add_lines(
    data = actual_trace,
    x = ~date,
    y = ~actual,
    name = "Actual",
    line = list(color = "#222222", width = 2)
  ) |>
plotly::add_lines(
    data = forecast_plot_data |> dplyr::filter(model == "ar1_rolling"),
    x = ~date,
    y = ~forecast,
    text = ~hover_text,        # use the prepared hover_text so tooltips
    hoverinfo = "text",        # show date, series, and value
    name = "AR(1) rolling",
    line = list(color = "#1f77b4", width = 1.2)
  ) |>
  plotly::add_lines(
    data = forecast_plot_data |> dplyr::filter(model == "pid_adaptive"),
    x = ~date,
    y = ~forecast,
    text = ~hover_text,
    hoverinfo = "text",
    name = "PID adaptive",
    line = list(color = "#ff7f0e", width = 1.2)
  ) |>
  plotly::add_lines(
    data = forecast_plot_data |> dplyr::filter(model == "pid_adaptive_regime_adj"),
    x = ~date,
    y = ~forecast,
    text = ~hover_text,
    hoverinfo = "text",
    name = "PID adaptive regime-adjusted",
    line = list(color = "#2ca02c", width = 1.4)
  ) |>
  plotly::layout(
    title = list(text = "Representative forecast comparison over a recent evaluation window"),
    xaxis = list(title = ""),
    yaxis = list(title = "Log realized volatility"),
    legend = list(orientation = "h", x = 0, y = -0.2)
  )