Pandas: Create A New Data Frame Using Multiple Groupby Results

My data is a Data Frame with retail items and their sales performance. Columns include: 2016 unit sales, 2015 unit sales, item description, etc. When I try to do a groupby for bran

Solution 1:

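You can do it this way (note the double brackets: selecting several columns from a groupby takes a list of column names):

Data.groupby(by="Major Brand")[["2016 Units", "2015 Units"]].sum()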

Demo:

In [29]: Data.groupby(by="Major Brand")[["2016 Units", "2015 Units"]].sum()
Out[29]:
             2016 Units  2015 Units
Major Brand
1218238217212231922734176172

Data:

In [30]: Data
Out[30]:
    MajorBrand2016Units2015UnitsX017583xxx118295xxx238547xxx33140xxx414343xxx543565xxx633871xxx745690xxx83977xxx911817xxx1035938xxx1148517xxx1226413xxx1323233xxx1427676xxx
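
As a rough, self-contained sketch of the approach above: the brand names and numbers below are invented stand-ins (the original data is not reproduced here), and `totals` and `new_df` are hypothetical names. It shows that the grouped sums are themselves a DataFrame, and that `reset_index()` gives a flat table with the brand as an ordinary column.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; brands and numbers are invented.
data = pd.DataFrame({
    "Major Brand": ["A", "B", "A", "C", "B"],
    "2016 Units": [10, 20, 30, 40, 50],
    "2015 Units": [1, 2, 3, 4, 5],
})

# A list of column names (double brackets) selects several columns at once.
totals = data.groupby(by="Major Brand")[["2016 Units", "2015 Units"]].sum()

# reset_index() moves the group labels out of the index into a regular column,
# leaving a flat, standalone DataFrame.
new_df = totals.reset_index()
print(new_df)
```

If the brand should stay as the index instead, `totals` can be used directly.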

Solution 2:

I get the following error: TypeError: unorderable types: int() < str()

Could it be that your dtypes are not correct, e.g. str instead of int? You could try creating your DataFrame with something like the following:

In [18]: import numpy as np; import pandas as pd

In [19]: col1 = ['adidas','nike','yourturn','zara','nike','nike','bla','bla','zalando','amazon']

In [20]: data = {'Major Brand':col1, '2016 Units':range(len(col1)), '2015 Units':range(len(col1),len(col1)*2)}

In [21]: x = pd.DataFrame(data).astype({'2016 Units': np.int64, '2015 Units': np.int64})

In [22]: x.groupby(by="Major Brand").sum()
Out[22]: 
             2015 Units  2016 Units
Major Brand                        
adidas               10           0
amazon               19           9
bla                  33          13
nike                 40          10
yourturn             12           2
zalando              18           8
zara                 13           3

In [23]: x.groupby(by="Major Brand")[["2016 Units", "2015 Units"]].sum()
Out[23]: 
             2016 Units  2015 Units
Major Brand                        
adidas                0          10
amazon                9          19
bla                  13          33
nike                 10          40
yourturn              2          12
zalando               8          18
zara                  3          13

In [24]: x.dtypes
Out[24]: 
2015 Units      int64
2016 Units      int64
Major Brand    object
dtype: object
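
If the unit columns turn out to hold strings rather than integers, one possible fix is to convert them before grouping. The sketch below is only an illustration under assumed data: the frame `df`, its values, and the "n/a" entry are invented, and `unit_cols` is a hypothetical helper name. It uses `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into NaN.

```python
import pandas as pd

# Hypothetical frame where the unit counts were read in as strings.
df = pd.DataFrame({
    "Major Brand": ["nike", "zara", "nike"],
    "2016 Units": ["5", "7", "n/a"],
    "2015 Units": ["1", "2", "3"],
})

unit_cols = ["2016 Units", "2015 Units"]
print(df.dtypes)  # the unit columns are not numeric yet

# Convert each unit column; entries that cannot be parsed become NaN.
df[unit_cols] = df[unit_cols].apply(pd.to_numeric, errors="coerce")
print(df.dtypes)  # the unit columns are now numeric

# sum() skips NaN by default, so the bad entry does not break the aggregation.
sums = df.groupby("Major Brand")[unit_cols].sum()
print(sums)
```

Whether silently coercing bad entries to NaN is acceptable depends on the data; inspecting the rows that fail to parse first may be the safer choice.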

In [25]: x.groupby(by="Major Brand").agg(['count','sum','mean','median'])
Out[25]: 
            2015 Units                       2016 Units                     
                 count sum       mean median      count sum      mean median
Major Brand                                                                 
adidas               1  10  10.000000   10.0          1   0  0.000000    0.0
amazon               1  19  19.000000   19.0          1   9  9.000000    9.0
bla                  2  33  16.500000   16.5          2  13  6.500000    6.5
nike                 3  40  13.333333   14.0          3  10  3.333333    4.0
yourturn             1  12  12.000000   12.0          1   2  2.000000    2.0
zalando              1  18  18.000000   18.0          1   8  8.000000    8.0
zara                 1  13  13.000000   13.0          1   3  3.000000    3.0
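
The `agg` call with a list of function names produces two-level column labels. If a flat result is preferred, named aggregation is one alternative; the sketch below rebuilds the same example data, and the output column names (`units_2016_sum` and so on) and the variable `summary` are made up for illustration.

```python
import pandas as pd

# Same example data as in the answer above.
col1 = ['adidas', 'nike', 'yourturn', 'zara', 'nike', 'nike',
        'bla', 'bla', 'zalando', 'amazon']
x = pd.DataFrame({
    'Major Brand': col1,
    '2016 Units': range(len(col1)),
    '2015 Units': range(len(col1), len(col1) * 2),
})

# Named aggregation: new_column_name=(source_column, function).
# The result has flat, single-level column names.
summary = x.groupby('Major Brand').agg(
    units_2016_sum=('2016 Units', 'sum'),
    units_2015_sum=('2015 Units', 'sum'),
    units_2015_mean=('2015 Units', 'mean'),
).reset_index()
print(summary)
```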
