Python Pandas Dataframe Resample Daily Data To Week By Mon-sun Weekly Definition?
Solution 1:
In case anyone else was not aware, it turns out that the weekly Anchored Offsets are based on the end date. So, just resampling 'W' (which is the same as 'W-SUN') is by default a Monday to Sunday sample. The date listed is the end date. See this old bug report wherein neither the documentation nor the API got updated.
Given that you specified label='left' in the resample parameters, you must have realized that fact. It's also why using 'W-MON' does not have the desired effect. What is confusing is that the left bound, which becomes the label, is not actually in the interval: a bin labelled with a Sunday covers the following Monday through Sunday.
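As a small check of the labelling described above, here is a minimal sketch on a toy series (the series and the variable names are made up for illustration, not taken from the question). It counts days per bin, so a full Monday to Sunday week shows up as 7.

```python
import pandas as pd

# Fifteen consecutive days starting on Sunday 2014-12-28, one unit per day.
days = pd.date_range('2014-12-28', periods=15, freq='D')
s = pd.Series(1, index=days)

# 'W' is shorthand for 'W-SUN': each bin ends on a Sunday and is labelled by it.
right = s.resample('W').sum()

# label='left' keeps the same bins but labels each one with the previous Sunday,
# which is the open (excluded) edge of the bin.
left = s.resample('W', label='left').sum()

print(right)
print(left)
```

The first bin holds only the leading Sunday, and the two bins after it hold seven days each, which is consistent with the Monday to Sunday grouping.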
So, to display the start date for the period instead of the end date, you may add a day to the index. That would mean you would do:
df_resampled.index = df_resampled.index + pd.DateOffset(days=1)
For completeness, here is your original data with another day (a Sunday) added on the beginning to show the grouping really is Monday to Sunday:
import pandas as pd

dates = pd.date_range('20141228', periods=15, name='Day')
df = pd.DataFrame({
    'Sum1': [10000, 1667, 1229, 1360, 9232, 8866, 4083, 3671, 10085, 10005, 8730, 10056, 10176, 3792, 3518],
    'Sum2': [10000, 91, 75, 75, 254, 239, 108, 99, 259, 395, 355, 332, 386, 96, 111],
    'Sum3': [10000, 365.95, 398.97, 285.12, 992.17, 1116.57, 512.11, 504.47, 1190.96, 1753.6, 1646.25, 1344.05, 1582.67, 560.95, 736.44],
    'Sum4': [10000, 5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3],
}, index=dates)
print(df)

# Originally written as df.resample('W', how='sum', label='left') on older pandas.
df_resampled = df.resample('W', label='left').sum()
# Shift each label from the preceding Sunday to the Monday that starts the week.
df_resampled.index = df_resampled.index + pd.DateOffset(days=1)
print(df_resampled)
This outputs:
             Sum1   Sum2      Sum3   Sum4
Day
2014-12-28  10000  10000  10000.00  10000
2014-12-29   1667     91    365.95      5
2014-12-30   1229     75    398.97      5
2014-12-31   1360     75    285.12      1
2015-01-01   9232    254    992.17      5
2015-01-02   8866    239   1116.57      8
2015-01-03   4083    108    512.11      8
2015-01-04   3671     99    504.47      2
2015-01-05  10085    259   1190.96     10
2015-01-06  10005    395   1753.60     12
2015-01-07   8730    355   1646.25     16
2015-01-08  10056    332   1344.05     16
2015-01-09  10176    386   1582.67      6
2015-01-10   3792     96    560.95      6
2015-01-11   3518    111    736.44      3

             Sum1   Sum2      Sum3   Sum4
Day
2014-12-22  10000  10000  10000.00  10000
2014-12-29  30108    941   4175.36     34
2015-01-05  56362   1934   8814.92     69
I believe that is what you wanted for Question 1.
Update
There is now a loffset argument to resample() that allows you to shift the label offset. So, instead of modifying the index, you simply add the loffset argument like so:
df.resample('W', how='sum', label='left', loffset=pd.DateOffset(days=1))
Also of note: how='sum' is now deprecated in favor of calling .sum() on the Resampler object that .resample() returns. So, the fully updated call would be:
df_resampled = df.resample('W', label='left', loffset=pd.DateOffset(days=1)).sum()
Update 1.1.0
The handy loffset argument is deprecated as of version 1.1.0. The documentation indicates the shifting should be done after the resample. In this particular case, I believe that means this is the correct code (untested):
from pandas.tseries.frequencies import to_offset
df_resampled = df.resample('W', label='left').sum()
df_resampled.index = df_resampled.index + to_offset(pd.DateOffset(days=1))
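For anyone who wants to check the shift-after-resample approach, here is a minimal sketch using only the Sum4 column from the data above. It swaps in pd.Timedelta(days=1) for the to_offset call, which is my substitution rather than anything the documentation prescribes. The name weekly is just illustrative.

```python
import pandas as pd

# Sum4 values from the example data, starting on Sunday 2014-12-28.
days = pd.date_range('2014-12-28', periods=15, freq='D', name='Day')
df = pd.DataFrame(
    {'Sum4': [10000, 5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3]},
    index=days,
)

# Resample first (bins are Monday to Sunday, labelled by the preceding Sunday) ...
weekly = df.resample('W', label='left').sum()

# ... then move every label forward one day so it names the Monday of its week.
weekly.index = weekly.index + pd.Timedelta(days=1)

print(weekly)
```

The labels should all fall on Mondays, with the lone leading Sunday ending up in the week labelled 2014-12-22.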
Solution 2:
This might help.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(1, 1000, (100, 4)),
                  columns='Sum1 Sum2 Sum3 Sum4'.split(),
                  index=pd.date_range('2014-12-29', periods=100, freq='D'))

def func(group):
    return pd.Series({
        'Sum1': group.Sum1.sum(),
        'Sum2': group.Sum2.sum(),
        'Sum3': group.Sum3.sum(),
        'Sum4': group.Sum4.sum(),
        'Day': group.index[1],
        'Period': '{0} - {1}'.format(group.index[0].date(), group.index[-1].date()),
    })

df.groupby(lambda idx: idx.week).apply(func)

Out[386]:
           Day                   Period  Sum1  Sum2  Sum3  Sum4
1   2014-12-30  2014-12-29 - 2015-01-04  3559  3692  3648  4086
2   2015-01-06  2015-01-05 - 2015-01-11  2990  3658  3348  3304
3   2015-01-13  2015-01-12 - 2015-01-18  3168  3720  3518  3273
4   2015-01-20  2015-01-19 - 2015-01-25  2275  4968  4095  2366
5   2015-01-27  2015-01-26 - 2015-02-01  4146  2167  3888  4576
..         ...                      ...   ...   ...   ...   ...
11  2015-03-10  2015-03-09 - 2015-03-15  4035  3518  2588  2714
12  2015-03-17  2015-03-16 - 2015-03-22  3399  3901  3430  2143
13  2015-03-24  2015-03-23 - 2015-03-29  3227  3308  3185  3814
14  2015-03-31  2015-03-30 - 2015-04-05  4278  3369  3623  4167
15  2015-04-07  2015-04-06 - 2015-04-07  1466   632  1136  1392

[15 rows x 6 columns]
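One caveat with grouping on idx.week is that it is the ISO week number, so data spanning more than one year would merge rows that share a week number. As a possible alternative (a sketch, not part of the original answer), grouping on weekly periods keeps each Monday to Sunday week distinct and shows both ends of the week in the label. The seeded generator and the name weekly are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Random demo data in the same shape as above; the seed is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 1000, (100, 4)),
                  columns=['Sum1', 'Sum2', 'Sum3', 'Sum4'],
                  index=pd.date_range('2014-12-29', periods=100, freq='D'))

# to_period('W') maps each day to its week ending on Sunday ('W-SUN'),
# so every group covers Monday through Sunday of one specific year.
weekly = df.groupby(df.index.to_period('W')).sum()

print(weekly.head())
```

Each index entry is a period whose start is the Monday and whose end is the Sunday of that week, so no separate Period column is needed.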