Groupby And Fill Missing Months In Multiple Columns Data Frame In Python
For a data frame with the columns city, district, id, date and price, how can I group by id and fill in the missing months while keeping the price of the missing months as NaN? The expected date range is from 2015/1/1 to 2019/8/1.
Solution 1:
EDIT:
In real data, each combination of the columns city, district, id and date must be unique, so aggregate any duplicates first:
df = df.groupby(['city','district','id', 'date'], as_index=False)['price'].sum()
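The aggregation matters because reindex raises an error when the index contains duplicate labels. As a minimal sketch on made-up rows (the sample values below are hypothetical, not the asker's data), two rows that share the same city, district, id and date collapse into one row whose price is their sum:

```python
import pandas as pd

# Hypothetical sample rows; the first two share the same key columns
sample = pd.DataFrame({
    'city': ['hz', 'hz', 'hz'],
    'district': ['sn', 'sn', 'sn'],
    'id': [20101, 20101, 20101],
    'date': ['2019-03-01', '2019-03-01', '2019-04-01'],
    'price': [1.0, 1.5, 2.0],
})

# One row per (city, district, id, date); duplicate prices are summed
deduped = sample.groupby(['city', 'district', 'id', 'date'], as_index=False)['price'].sum()
print(deduped)
```

Whether sum is the right aggregation depends on the data; mean or first may fit better if the duplicates are repeated observations rather than partial amounts.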
If you need to group by the id column:
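import pandas as pd

rng = pd.date_range('2015-01-01', '2019-08-01', freq='MS')
df['date'] = pd.to_datetime(df['date'])

# Reindex each id's rows to the full monthly range; missing months become NaN rows
df1 = (df.set_index('date')
         .groupby('id')
         .apply(lambda x: x.reindex(rng))
         .rename_axis(('id', 'date'))
         # drop the duplicated id column if the applied frame still carries it
         .drop(columns='id', errors='ignore')
         .reset_index()
      )
print(df1)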
        id       date city district  price
0    20101 2015-01-01  NaN      NaN    NaN
1    20101 2015-02-01  NaN      NaN    NaN
2    20101 2015-03-01  NaN      NaN    NaN
3    20101 2015-04-01  NaN      NaN    NaN
4    20101 2015-05-01  NaN      NaN    NaN
..     ...        ...  ...      ...    ...
163  20103 2019-04-01  NaN      NaN    NaN
164  20103 2019-05-01  NaN      NaN    NaN
165  20103 2019-06-01  NaN      NaN    NaN
166  20103 2019-07-01  NaN      NaN    NaN
167  20103 2019-08-01  NaN      NaN    NaN

[168 rows x 5 columns]
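An alternative to the groupby and apply chain, shown here as a sketch rather than as the answer's own method, is to build the full grid of (id, month) pairs with pd.MultiIndex.from_product and reindex once. The frame df_demo and its values are hypothetical stand-ins for the asker's data:

```python
import pandas as pd

# Hypothetical sample data: two ids with a few observed months each
df_demo = pd.DataFrame({
    'city': ['hz', 'hz', 'xz'],
    'district': ['sn', 'sn', 'pd'],
    'id': [20101, 20101, 20103],
    'date': ['2019-03-01', '2019-05-01', '2019-04-01'],
    'price': [2.2, 2.5, 3.1],
})
df_demo['date'] = pd.to_datetime(df_demo['date'])

rng = pd.date_range('2015-01-01', '2019-08-01', freq='MS')

# Every (id, month) pair that should exist in the result
full_index = pd.MultiIndex.from_product(
    [df_demo['id'].unique(), rng], names=['id', 'date']
)

# Reindexing onto the full grid inserts NaN rows for the missing months
filled = df_demo.set_index(['id', 'date']).reindex(full_index).reset_index()
print(filled)
```

The range covers 56 month starts, so two ids give 112 rows, and only the three observed months carry a price. This variant requires the (id, date) pairs to be unique.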
If you also need to group by more columns (here fill_value=0 sets the price of the missing months to 0 instead of NaN):
rng = pd.date_range('2015-01-01', '2019-08-01', freq='MS')
df['date'] = pd.to_datetime(df['date'])

# Reindex the price series of each (city, district, id) group to the full monthly range
df2 = (df.set_index('date')
         .groupby(['city', 'district', 'id'])['price']
         .apply(lambda x: x.reindex(rng, fill_value=0))
         .rename_axis(('city', 'district', 'id', 'date'))
         .reset_index()
      )
print(df2)
    city district     id       date  price
0     hz       sn  20101 2015-01-01    0.0
1     hz       sn  20101 2015-02-01    0.0
2     hz       sn  20101 2015-03-01    0.0
3     hz       sn  20101 2015-04-01    0.0
4     hz       sn  20101 2015-05-01    0.0
..   ...      ...    ...        ...    ...
219   xz       pd  20103 2019-04-01    0.0
220   xz       pd  20103 2019-05-01    0.0
221   xz       pd  20103 2019-06-01    0.0
222   xz       pd  20103 2019-07-01    0.0
223   xz       pd  20103 2019-08-01    0.0

[224 rows x 5 columns]
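Since the question asks for the price of the missing months to stay NaN, the fill_value argument can simply be left out. A small sketch on a hypothetical price series shows the difference between the two calls:

```python
import pandas as pd

rng = pd.date_range('2019-01-01', '2019-04-01', freq='MS')

# Hypothetical prices observed only in February and April
s = pd.Series([2.2, 2.5], index=pd.to_datetime(['2019-02-01', '2019-04-01']))

with_nan = s.reindex(rng)                 # missing months stay NaN
with_zero = s.reindex(rng, fill_value=0)  # missing months become 0

print(with_nan)
print(with_zero)
```

Keeping NaN preserves the distinction between "no observation" and "a price of zero", which matters for later aggregations such as mean.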
Solution 2:
Using reindex with freq='MS' (month start) and pd.concat with GroupBy:
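dates = pd.date_range('2015-01-01', '2019-08-01', freq='MS')
df['date'] = pd.to_datetime(df['date'])  # reindex needs real datetimes, not strings
# Reindex each id's rows to the full monthly range, then stack the groups
new = pd.concat([
    d.set_index('date').reindex(dates).reset_index().rename(columns={'index': 'date'})
    for _, d in df.groupby('id')
], ignore_index=True)
# Note: this fills every column, price included, and runs across group boundaries
new = new.ffill().bfill()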
Output
          date city district       id  price
0   2015-01-01   hz       sn  20101.0    2.2
1   2015-02-01   hz       sn  20101.0    2.2
2   2015-03-01   hz       sn  20101.0    2.2
3   2015-04-01   hz       sn  20101.0    2.2
4   2015-05-01   hz       sn  20101.0    2.2
..         ...  ...      ...      ...    ...
163 2019-04-01   xz       pd  20103.0    3.1
164 2019-05-01   xz       pd  20103.0    3.1
165 2019-06-01   xz       pd  20103.0    3.1
166 2019-07-01   xz       pd  20103.0    3.1
167 2019-08-01   xz       pd  20103.0    3.1

[168 rows x 5 columns]
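The ffill().bfill() call in this solution also fills price, and because it runs on the concatenated frame, values from one id can carry over into the leading rows of the next. If price should stay NaN, as the question asks, one possible variant is to fill only the descriptive columns inside each group. This is a sketch under assumptions: the helper expand and the frame df_demo are hypothetical names and data, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: two ids with a few observed months each
df_demo = pd.DataFrame({
    'city': ['hz', 'hz', 'xz'],
    'district': ['sn', 'sn', 'pd'],
    'id': [20101, 20101, 20103],
    'date': pd.to_datetime(['2019-03-01', '2019-05-01', '2019-04-01']),
    'price': [2.2, 2.5, 3.1],
})

dates = pd.date_range('2015-01-01', '2019-08-01', freq='MS')


def expand(group):
    """Reindex one id's rows to the full monthly range (hypothetical helper)."""
    out = group.set_index('date').reindex(dates).rename_axis('date').reset_index()
    # Fill only the descriptive columns, within this group; price stays NaN
    cols = ['city', 'district', 'id']
    out[cols] = out[cols].ffill().bfill()
    return out


new_demo = pd.concat([expand(g) for _, g in df_demo.groupby('id')], ignore_index=True)
# id became float while it held NaN; restore the integer dtype
new_demo['id'] = new_demo['id'].astype(int)
print(new_demo)
```

Filling per group keeps each id's city and district from leaking into a neighbouring id, and casting id back to int avoids the 20101.0 style values seen in the output above.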