Because real-world environments are non-stationary, the data distribution can keep changing as data streams in. This phenomenon is called concept drift (Lu et al. 2018), and the basic assumption is that concept drift happens unexpectedly and is unpredictable for streaming data.
To handle concept drift, previous studies usually leverage a two-step approach.
1. Detect the concept drift.
2. Adapt the model to the new data distribution:
   - Retrain the model.
   - Fine-tune the model.
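The two-step approach can be sketched as a small loop. This is only an illustration under stated assumptions: the detector (a difference of window means against a threshold), the toy "model" (the mean of its training window), and all function names are hypothetical and are not taken from the paper or any library.

```python
from statistics import mean


def drift_detected(reference, recent, threshold):
    """Hypothetical detector: flag drift when the window means differ by more than threshold."""
    return abs(mean(recent) - mean(reference)) > threshold


def fit_mean_model(window):
    """Toy 'model' that simply predicts the mean of its training window."""
    return mean(window)


def detect_then_adapt(stream, window=5, threshold=1.0):
    """Process the stream in non-overlapping batches: detect drift, then retrain."""
    reference = list(stream[:window])
    model = fit_mean_model(reference)
    retrained_at = []
    for start in range(window, len(stream) - window + 1, window):
        recent = list(stream[start:start + window])
        if drift_detected(reference, recent, threshold):  # step 1: detect
            model = fit_mean_model(recent)                # step 2: adapt by retraining
            reference = recent
            retrained_at.append(start)
    return model, retrained_at


# Synthetic stream whose level jumps from about 0 to about 3 at index 10.
stream = [0.0, 0.1, -0.1, 0.0, 0.1] * 2 + [3.0, 3.1, 2.9, 3.0, 3.1] * 2
model, retrained_at = detect_then_adapt(stream)
print(model, retrained_at)
```

In this sketch the model is retrained only after a drifted batch has already arrived, so the adapted model always trails the data it was adapted on.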
Assumption
The latest data contains more useful information than the previous data.
Existing methods handle concept drift on the latest arrived data Data(t) at timestamp t and adapt the forecasting model accordingly. However, the concept drift continues, and the model adapted on Data(t) is then used on unseen future streaming data (e.g., Data(t+1)). Such adaptation therefore has a one-step delay relative to the drift of the upcoming streaming data: a new concept drift has occurred between timestamps t and t+1.
In this paper, we focus on predictable concept drift by forecasting the future data distribution.
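Consider a time series $X = \{x^{(1)}, x^{(2)}, \cdots, x^{(T)}\}$, where each element $x^{(t)} \in \mathbb{R}^m$ is an $m$-dimensional vector:
$$x^{(t)} = [x_1^{(t)}, x_2^{(t)}, \cdots, x_m^{(t)}]$$
Let $y = \{y^{(1)}, y^{(2)}, \cdots, y^{(T)}\}$ be the target sequence corresponding to $X$.
Algorithms are designed to build the model on historical data $\{x^{(i)}, y^{(i)}\}_{i=1}^{t}$, predict the future data $x^{(t)}$, and forecast $y$ on unseen streaming data $D_{test}^{(t)} = \{x^{(t)}, y^{(t)}\}_{t=1}^{T}$.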
Assume $(x^{(t)}, y^{(t)}) \sim p_t(x, y)$, where $p_t(x, y)$ is the data distribution at timestamp $t$. Generally, $p_t(x, y)$ is non-stationary and keeps changing with time $t$, which is called concept drift. Formally, the concept drift between two timestamps $t$ and $t+1$ can be defined as
$$\exists x: p_t(x, y) \neq p_{t+1}(x, y)$$
Adapting models to accommodate the evolving data distribution.
Given $task^{(t)} \in Task_{test}$, the forecasting model is trained on $D_{resam}^{(t)}(\Theta)$ and forecasts on $D_{test}^{(t)}$.
$p_{resam}^{(t)}(x, y; \Theta)$ is more similar to $p_{test}^{(t)}(x, y)$ than $p_{train}^{(t)}(x, y)$ is. As a result, the performance of model $f^{(t)}$ on $D_{resam}^{(t)}(\Theta)$ is closer to its performance on $D_{test}^{(t)}$ than its performance on $D_{train}^{(t)}$ is.
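One way to picture how a parameter Θ can shape the training distribution is weighted resampling of the training set. The sketch below is an assumption made for illustration, not the method as defined in the source: it treats Θ as one hypothetical weight per historical period and draws samples with probability proportional to the weight of the period they belong to. The helper name `resample` and the toy records are invented for this example.

```python
import random


def resample(train, period_weights, size, seed=0):
    """Draw `size` records from `train` with replacement.

    Each record is (period, x, y); its sampling probability is proportional
    to the weight assigned to its period (a hypothetical stand-in for Theta).
    """
    rng = random.Random(seed)
    weights = [period_weights[period] for period, _x, _y in train]
    return rng.choices(train, weights=weights, k=size)


# Toy training set: two records per period.
train = [
    ("2009", 0.1, 1.0),
    ("2009", 0.2, 1.1),
    ("2010", 0.9, 2.0),
    ("2010", 1.0, 2.1),
]

# Hypothetical weights that favour the more recent period.
theta = {"2009": 0.2, "2010": 0.8}
resampled = resample(train, theta, size=100)
print(sum(1 for period, _x, _y in resampled if period == "2010"), "of 100 from 2010")
```

Raising the weight of a period makes the resampled set look more like that period, which is the sense in which a resampled distribution can be pushed closer to the expected test distribution.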
Example
To handle the concept drift in data, we retrain a new model each month (the rolling time interval is 1 month) based on two years of historical data.
Each opportunity to retrain a new model to adapt to concept drift is called a task. For example, the task $task^{(2011/01)}$ contains $D_{train}^{(2011/01)}$, covering 2009/01 to 2010/12, and $D_{test}^{(2011/01)}$, covering 2011/01.
Tasks whose $D_{test}^{(t)}$ ranges from 2011 to 2015 are used for training, and DDG-DA is evaluated on $Task_{test}$, which ranges from 2016 to 2020.
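The rolling setup in this example can be enumerated with a short sketch. It assumes monthly tasks whose test months run from 2011/01 to 2020/12, each with the 24 preceding months as training data. The `Task` class and the helper names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str         # task label, e.g. "2011/01"
    train_start: str  # first month of D_train (inclusive)
    train_end: str    # last month of D_train (inclusive)
    test_month: str   # the single month covered by D_test


def month_index(year, month):
    """Map (year, month) to a running month count so months can be shifted."""
    return year * 12 + (month - 1)


def fmt(index):
    """Inverse of month_index, formatted as YYYY/MM."""
    year, month0 = divmod(index, 12)
    return f"{year:04d}/{month0 + 1:02d}"


def rolling_tasks(first_test, last_test, train_months=24):
    """One task per test month; D_train is the `train_months` months before it."""
    tasks = []
    for idx in range(month_index(*first_test), month_index(*last_test) + 1):
        tasks.append(Task(fmt(idx), fmt(idx - train_months), fmt(idx - 1), fmt(idx)))
    return tasks


tasks = rolling_tasks((2011, 1), (2020, 12))
early = [t for t in tasks if t.test_month < "2016/01"]   # test months in 2011 to 2015
late = [t for t in tasks if t.test_month >= "2016/01"]   # test months in 2016 to 2020
print(tasks[0])
print(len(early), len(late))
```

Zero-padded `YYYY/MM` strings compare in chronological order, which is why plain string comparison is enough to split the tasks by year.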
where $D_{KL}$ is the KL divergence, $\|$ separates the two distributions whose divergence is measured, and $\mathbb{E}_{x \sim p_{test}^{(t)}(x)}$ is the expectation over the test data distribution $p_{test}^{(t)}(x)$.
A normal distribution is a reasonable assumption for unknown variables and is often used in maximum likelihood estimation, so we assume $p_{test}^{(t)}(y \mid x)$ and $p_{resam}^{(t)}(y \mid x; \Theta)$ are normal distributions:
$$p_{test}^{(t)}(y \mid x) = \mathcal{N}(y_{test}^{(t)}(x), \sigma)$$
$$p_{resam}^{(t)}(y \mid x; \Theta) = \mathcal{N}(y_{resam}^{(t)}(x; \Theta), \sigma)$$
Tips
$y_{resam}^{(t)}(x; \Theta)$ is the expectation of $y$ under the predicted distribution $p_{resam}^{(t)}(y \mid x; \Theta)$.
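Under the normal-distribution assumption, the KL divergence between two normal distributions that share the same spread has a standard closed form, $(\mu_1 - \mu_2)^2 / (2\sigma^2)$. The sketch below checks this numerically. It assumes $\sigma$ denotes the standard deviation (the notes do not say whether it is the standard deviation or the variance), and the function names and sample values are hypothetical.

```python
import math


def normal_pdf(y, mean, sigma):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kl_numeric(mean_p, mean_q, sigma, lo=-20.0, hi=20.0, steps=40000):
    """Midpoint-rule approximation of the integral of p(y) * log(p(y) / q(y))."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * width
        p = normal_pdf(y, mean_p, sigma)
        q = normal_pdf(y, mean_q, sigma)
        if p > 0.0 and q > 0.0:  # skip points where a density underflows to zero
            total += p * math.log(p / q) * width
    return total


def kl_closed_form(mean_p, mean_q, sigma):
    """KL divergence between two normals with equal standard deviation sigma."""
    return (mean_p - mean_q) ** 2 / (2.0 * sigma ** 2)


# Hypothetical values standing in for y_test(x), y_resam(x; Theta), and sigma.
y_test, y_resam, sigma = 0.3, 1.1, 0.8
print(kl_numeric(y_test, y_resam, sigma), kl_closed_form(y_test, y_resam, sigma))
```

With a shared $\sigma$, the divergence depends only on the squared gap between the two means, so minimizing it amounts to minimizing a squared error between $y_{test}^{(t)}(x)$ and $y_{resam}^{(t)}(x; \Theta)$.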
According to the definition of KL divergence, we have