Summing The Number Of Occurrences Per Day Pandas
I have a data set like so in a pandas dataframe: score timestamp 2013-06-29 00:52:28+00:00 -0.420070 2013-
Solution 1:
If your timestamp index is a DatetimeIndex:
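import io
import pandas as pd

# Columns are separated by two or more spaces, because the timestamp itself contains a single space.
content = '''\
timestamp                  score
2013-06-29 00:52:28+00:00  -0.420070
2013-06-29 00:51:53+00:00  -0.445720
2013-06-28 16:40:43+00:00  0.508161
2013-06-28 15:10:30+00:00  0.921474
2013-06-28 15:10:17+00:00  0.876710
'''
# On Python 3, a str needs io.StringIO rather than io.BytesIO; a regex separator needs engine='python'.
df = pd.read_table(io.StringIO(content), sep=r'\s{2,}', engine='python', parse_dates=[0], index_col=[0])
print(df)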
so df looks like this:
                        score
timestamp
2013-06-29 00:52:28 -0.420070
2013-06-29 00:51:53 -0.445720
2013-06-28 16:40:43  0.508161
2013-06-28 15:10:30  0.921474
2013-06-28 15:10:17  0.876710

print(type(df.index))
# <class 'pandas.tseries.index.DatetimeIndex'>
# (newer pandas versions report pandas.core.indexes.datetimes.DatetimeIndex)
You can use:
print(df.groupby(df.index.date).count())
which yields
            score
2013-06-28      3
2013-06-29      2
Note the importance of the parse_dates parameter. Without it, the index would just be a pandas.core.index.Index object, in which case you could not use df.index.date.
So the answer depends on the type(df.index), which you have not shown...
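If the index turns out to be a plain Index of strings, one way to recover is to convert it with pd.to_datetime before grouping. Below is a minimal sketch, assuming the timestamps are stored as strings; the names raw and daily_counts are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical frame whose index holds timestamp strings, not datetimes.
raw = pd.DataFrame(
    {"score": [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710]},
    index=[
        "2013-06-29 00:52:28+00:00",
        "2013-06-29 00:51:53+00:00",
        "2013-06-28 16:40:43+00:00",
        "2013-06-28 15:10:30+00:00",
        "2013-06-28 15:10:17+00:00",
    ],
)

# Convert the string index to a (UTC) DatetimeIndex so that .date is available.
raw.index = pd.to_datetime(raw.index, utc=True)

# Group by calendar date and count the rows that fall on each day.
daily_counts = raw.groupby(raw.index.date).count()
print(daily_counts)
```

The grouping key here is an array of datetime.date objects, so the result is indexed by plain dates rather than by timestamps.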
Solution 2:
Otherwise, use the resample function.
In [419]: df
Out[419]:
timestamp
2013-06-29 00:52:28   -0.420070
2013-06-29 00:51:53   -0.445720
2013-06-28 16:40:43    0.508161
2013-06-28 15:10:30    0.921474
2013-06-28 15:10:17    0.876710
Name: score, dtype: float64

In [420]: df.resample('D', how={'score':'count'})
Out[420]:
2013-06-28    3
2013-06-29    2
dtype: int64
UPDATE: with pandas 0.18+, as @jbochi pointed out, resample with how is now deprecated. Use instead:
df.resample('D').apply({'score':'count'})
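On recent pandas versions the count can also be requested directly on the resampler. The sketch below assumes the data is a Series with a DatetimeIndex, as in the session above; the names score and daily are illustrative only.

```python
import pandas as pd

# Rebuild the example data as a Series indexed by timestamps.
score = pd.Series(
    [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710],
    index=pd.to_datetime(
        [
            "2013-06-29 00:52:28",
            "2013-06-29 00:51:53",
            "2013-06-28 16:40:43",
            "2013-06-28 15:10:30",
            "2013-06-28 15:10:17",
        ]
    ),
    name="score",
)

# Sort chronologically, then count the observations in each daily bin.
daily = score.sort_index().resample("D").count()
print(daily)
```

One difference from the groupby approach: resample produces a bin for every day between the first and last timestamp, so days without any rows show up with a count of 0 instead of being left out.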
Solution 3:
In [145]: df
Out[145]:
timestamp
2013-06-29 00:52:28   -0.420070
2013-06-29 00:51:53   -0.445720
2013-06-28 16:40:43    0.508161
2013-06-28 15:10:30    0.921474
2013-06-28 15:10:17    0.876710
Name: score, dtype: float64

In [160]: df.groupby(lambda x: x.date()).count()
Out[160]:
2013-06-28    3
2013-06-29    2
dtype: int64
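A variation on the same idea, if the result should keep a DatetimeIndex rather than plain date objects, is to group on the index truncated to midnight. This is only a sketch under the assumption that the data is a Series with a DatetimeIndex; the names score and per_day are hypothetical.

```python
import pandas as pd

score = pd.Series(
    [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710],
    index=pd.to_datetime(
        [
            "2013-06-29 00:52:28",
            "2013-06-29 00:51:53",
            "2013-06-28 16:40:43",
            "2013-06-28 15:10:30",
            "2013-06-28 15:10:17",
        ]
    ),
    name="score",
)

# normalize() sets the time of day to 00:00:00, so rows from the same day share a key.
per_day = score.groupby(score.index.normalize()).count()
print(per_day)
```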