Extract Business Days In Time Series Using Python/pandas
I am working with high frequency data in Time Series and I would like to get all the business days from my data. My data observations are separated by seconds, so there are 86400 s
Solution 1:
Unfortunately this is a little slow, but should at least give the answer you are looking for.
import pandas as pd

# Create an index of just the date portion of your index (this is the slow step)
ts_days = pd.to_datetime(ts.index.date)
# Create a range of business days over that period
bdays = pd.bdate_range(start=ts.index[0].date(), end=ts.index[-1].date())
# Filter the series to just those days contained in the business day range
ts = ts[ts_days.isin(bdays)]
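To see what this filter does, here is a minimal sketch on made-up data. The sample series (one observation every 6 hours from Friday 2021-01-01 to Monday 2021-01-04) and the names `idx`, `filtered`, and `weekday_only` are assumptions for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 16 observations, 6 hours apart, Friday through Monday.
idx = pd.date_range("2021-01-01", periods=16, freq=pd.Timedelta(hours=6))
ts = pd.Series(np.arange(16), index=idx)

# Same three steps as in Solution 1.
ts_days = pd.to_datetime(ts.index.date)
bdays = pd.bdate_range(start=ts.index[0].date(), end=ts.index[-1].date())
filtered = ts[ts_days.isin(bdays)]

# Sanity check against a plain weekday mask (Monday=0 ... Friday=4).
weekday_only = ts[ts.index.dayofweek < 5]

print(len(ts), len(filtered))
print(filtered.equals(weekday_only))
```

Note that `bdate_range` with its default settings treats Monday to Friday as business days and does not know about public holidays, so the result here coincides with the weekday mask.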
Solution 2:
Modern pandas stores timestamps as numpy.datetime64 with a nanosecond time unit (one can check this by inspecting ts.index.values). It is much faster to convert both the original index and the one generated by bdate_range to a daily time unit ([D]) and to check inclusion on these two arrays:
import numpy as np
import pandas

def _get_days_array(index):
    "Convert the index to a datetime64[D] array"
    return index.values.astype('<M8[D]')

def retain_business_days(ts):
    "Retain only the business days"
    tsdays = _get_days_array(ts.index)
    bdays = _get_days_array(pandas.bdate_range(tsdays[0], tsdays[-1]))
    # np.isin is the current spelling of the older np.in1d
    mask = np.isin(tsdays, bdays)
    return ts[mask]
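A usage sketch on second-resolution data, closer to what the question describes. The sample series (one observation per second from Friday 2021-01-01 through Monday 2021-01-04) is an assumption for illustration. The helper is repeated in condensed form as `days_array` so the snippet runs on its own, and `np.isin` is used for the membership test.

```python
import numpy as np
import pandas as pd

def days_array(index):
    # Truncate nanosecond timestamps to whole days (datetime64[D]).
    return index.values.astype("datetime64[D]")

def retain_business_days(ts):
    tsdays = days_array(ts.index)
    bdays = days_array(pd.bdate_range(tsdays[0], tsdays[-1]))
    return ts[np.isin(tsdays, bdays)]

# Hypothetical sample: 4 days x 86400 seconds, starting on a Friday.
n = 4 * 86400
idx = pd.date_range("2021-01-01", periods=n, freq=pd.Timedelta(seconds=1))
ts = pd.Series(np.arange(n), index=idx)

business = retain_business_days(ts)

# Only Friday and Monday should remain.
print(len(business))
print(sorted(set(business.index.dayofweek)))
```

The day-level arrays are plain NumPy arrays, so the membership test avoids building Python date objects for every observation, which is the slow step in Solution 1.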