
Statsmodels Ols With Rolling Window Problem

I would like to do a regression with a rolling window, but I got only one parameter back after the regression: rolling_beta = sm.OLS(X2, X1, window_type='rolling', window=30).fit(

Solution 1:

I think the problem is that the parameters window_type='rolling' and window=30 simply do not do anything. First I'll show you why, and at the end I'll provide a setup I've got lying around for linear regressions on rolling windows.


1. The problem with your function:

Since you haven't provided any sample data, here's a function that returns a dataframe of a desired size filled with random numbers:

# Function to build synthetic data
import numpy as np
import pandas as pd
import statsmodels.api as sm
from collections import OrderedDict

def sample(rSeed, periodLength, colNames):
    """Return a DataFrame of standard-normal columns on a daily date index."""
    np.random.seed(rSeed)
    date = pd.to_datetime("2018-12-01")  # start date, matching the output below
    cols = OrderedDict()

    # One column of random draws per requested name
    for col in colNames:
        cols[col] = np.random.normal(loc=0.0, scale=1.0, size=periodLength)
    dates = date + pd.to_timedelta(np.arange(periodLength), 'D')

    df = pd.DataFrame(cols, index=dates)
    return df

Output, with df = sample(rSeed = 123, colNames = ['X1', 'X2'], periodLength = 50):

                  X1        X2
2018-12-01 -1.085631 -1.294085
2018-12-02  0.997345 -1.038788
2018-12-03  0.282978  1.743712
2018-12-04 -1.506295 -0.798063
2018-12-05 -0.578600  0.029683
...
2019-01-17  0.412912 -1.363472
2019-01-18  0.978736  0.379401
2019-01-19  2.238143 -0.379176

Now, try:

rolling_beta = sm.OLS(df['X2'], df['X1'], window_type='rolling', window=30).fit()
rolling_beta.params

Output:

X1   -0.075784
dtype: float64

This matches the structure of your output: you expect one estimate per sample window, but you get a single estimate. I looked for other examples using the same arguments online and in the statsmodels docs, but could not find any that actually worked. What I did find were a few discussions saying that this functionality was deprecated a while ago. So I tested the same call with some bogus input for the parameters:

rolling_beta = sm.OLS(df['X2'], df['X1'], window_type='amazing', window=3000000).fit()
rolling_beta.params

Output:

X1   -0.075784
dtype: float64

As you can see, the estimates are identical and the bogus input raises no error, so the two arguments are evidently ignored. I suggest that you take a look at the function below. This is something I've put together to perform rolling regression estimates.


2. A function for regressions on rolling windows of a pandas dataframe

df = sample(rSeed = 123, colNames = ['X1', 'X2', 'X3'], periodLength = 50)

def RegressionRoll(df, subset, dependent, independent, const, win, parameters):
    """
    RegressionRoll takes a dataframe, makes a subset of the data if you like,
    and runs a series of regressions with a specified window length, and
    returns a dataframe with BETA or R^2 for each window split of the data.

    Parameters:
    ===========

    df: pandas dataframe
    subset: integer - number of most recent rows to use (0 uses all rows)
    dependent: string that specifies name of dependent variable
    independent: LIST of strings that specifies names of independent variables
    const: boolean - whether or not to include a constant term
    win: integer - window length of each model
    parameters: string that specifies which model parameters to return:
                'beta' or 'R2'

    Example:
    ========
        RegressionRoll(df=df, subset = 50, dependent = 'X1', independent = ['X2'],
                   const = True, parameters = 'beta', win = 30)

    """
    # Data subset
    if subset != 0:
        df = df.tail(subset)

    # Loop info
    end = df.shape[0]
    # Note: stop=end is exclusive, so the window ending on the last row is not fitted
    rng = np.arange(start = win, stop = end, step = 1)

    # Subset and store dataframes
    frames = {}
    n = 1
    for i in rng:
        df_temp = df.iloc[:i].tail(win)
        newname = 'df' + str(n)
        frames.update({newname: df_temp})
        n += 1

    # Analysis on subsets
    df_results = pd.DataFrame()
    for frame in frames:
        # print(frames[frame])

        # Rolling data frames
        dfr = frames[frame]
        y = dependent
        x = independent

        if const:
            x = sm.add_constant(dfr[x])
            model = sm.OLS(dfr[y], x).fit()
        else:
            model = sm.OLS(dfr[y], dfr[x]).fit()

        if parameters == 'beta':
            theParams = model.params[0:]
            coefs = theParams.to_frame()
            df_temp = pd.DataFrame(coefs.T)

            indx = dfr.tail(1).index[-1]
            df_temp['Date'] = indx
            df_temp = df_temp.set_index(['Date'])

        if parameters == 'R2':
            theParams = model.rsquared
            df_temp = pd.DataFrame([theParams])
            indx = dfr.tail(1).index[-1]
            df_temp['Date'] = indx
            df_temp = df_temp.set_index(['Date'])
            df_temp.columns = [', '.join(independent)]
        df_results = pd.concat([df_results, df_temp], axis = 0)

    return(df_results)


df_rolling = RegressionRoll(df=df, subset = 50, dependent = 'X1', independent = ['X2'],
                            const = True, parameters = 'beta', win = 30)

Output: A dataframe with beta estimates for OLS of X1 on X2 (with a constant) for each 30-period window of the data.

               const        X2
Date
2018-12-30  0.044042  0.032680
2018-12-31  0.074839 -0.023294
2019-01-01 -0.063200  0.077215
...
2019-01-16 -0.075938 -0.215108
2019-01-17 -0.143226 -0.215524
2019-01-18 -0.129202 -0.170304
