
Pandas: Split String, And Count Values?

I've got a pandas dataset with a column that's a comma-separated string, e.g. 1,2,3,10: data = [ { 'id': 1, 'score': 9, 'topics': '11,22,30' }, { 'id': 2, 'score': 7, 'topics':

Solution 1:

Unnest the split topic lists into one row per topic, then groupby and agg.

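import numpy as np
import pandas as pd

# Split each comma-separated string into a list of topic strings.
df.topics = df.topics.str.split(',')
# Repeat id and score once per topic so they line up with the flattened topics.
n = df.topics.str.len()
New_df = pd.DataFrame({'topics': np.concatenate(df.topics.values), 'id': df.id.repeat(n), 'score': df.score.repeat(n)})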

New_df.groupby('topics').score.agg(['count','mean'])

Out[1256]: 
        count  mean
topics             
1           2   5.0
11          2   8.0
12          1   6.0
18          2   5.5
22          1   9.0
30          4   6.5
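
On pandas 0.25 and later, the same unnest step can probably be written more briefly with DataFrame.explode. The following is a minimal sketch rather than part of the original answer; it assumes the four-row sample data from the question, and the names exploded and result are only illustrative.

```python
import pandas as pd

# Sample data assumed from the question.
data = [
    {'id': 1, 'score': 9, 'topics': '11,22,30'},
    {'id': 2, 'score': 7, 'topics': '11,18,30'},
    {'id': 3, 'score': 6, 'topics': '1,12,30'},
    {'id': 4, 'score': 4, 'topics': '1,18,30'},
]
df = pd.DataFrame(data)

# Split each string into a list, then give every list element its own row.
# id and score are repeated automatically for each exploded topic.
exploded = df.assign(topics=df['topics'].str.split(',')).explode('topics')

# Count the rows and average the score for each topic.
result = exploded.groupby('topics')['score'].agg(['count', 'mean'])
print(result)
```

Using assign here leaves the original df.topics column untouched, which can be convenient if the raw strings are needed later.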

Solution 2:

In [111]: def mean1(x): return np.array(x).astype(int).mean()

In [112]: df.topics.str.split(',', expand=False).agg([mean1, len])
Out[112]:
       mean1  len
0  21.000000    3
1  19.666667    3
2  14.333333    3
3  16.333333    3

Solution 3:

This is one way: build a frame of the split topics indexed by id and score, stack it into one row per topic, then groupby & agg.

import pandas as pd

data = [
  { 'id': 1, 'score': 9, 'topics': '11,22,30' },
  { 'id': 2, 'score': 7, 'topics': '11,18,30' },
  { 'id': 3, 'score': 6, 'topics': '1,12,30' },
  { 'id': 4, 'score': 4, 'topics': '1,18,30' }
]
df = pd.DataFrame(data)
df.topics = df.topics.str.split(',')
df2 = pd.DataFrame(df.topics.tolist(), index=[df.id, df.score])\
                   .stack()\
                   .reset_index(name='topics')\
                   .drop(columns='level_2')  # positional axis argument is not accepted by pandas 2.x

df2.groupby('topics').score.agg(['count', 'mean']).reset_index()
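
One thing to watch: after str.split the topics are still strings, so groupby sorts them as text. With the sample data the order happens to match numeric order, but a topic such as '2' would sort after '18'. The sketch below illustrates this on a hypothetical long-format frame shaped like df2 (the extra topic '2' and the names as_text and as_int are assumptions added for illustration) and casts topics to int before grouping.

```python
import pandas as pd

# Hypothetical frame shaped like df2; topic '2' is added only to expose
# the difference between text ordering and numeric ordering.
df2 = pd.DataFrame({
    'score':  [9, 7, 6, 4, 5],
    'topics': ['11', '18', '1', '30', '2'],
})

# Grouping on string topics: group keys are sorted lexicographically.
as_text = df2.groupby('topics')['score'].agg(['count', 'mean']).reset_index()

# Casting to int first: group keys are sorted numerically.
as_int = (df2.assign(topics=df2['topics'].astype(int))
             .groupby('topics')['score']
             .agg(['count', 'mean'])
             .reset_index())

print(as_text['topics'].tolist())
print(as_int['topics'].tolist())
```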
