
Apply Expanding Function On Dataframe

I have a function that I wish to apply to subsets of a pandas DataFrame, so that the function is calculated on all rows (up to the current row) from the same group - i.e. using a gro

Solution 1:

A possible solution is to make the expanding step part of the function itself, and use GroupBy.apply:

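def foo1(_df):
    # Running maximum of x1 times the latest x2 within one group.
    return _df['x1'].expanding().max() * _df['x2'].expanding().apply(lambda x: x[-1], raw=True)

# Drop the group level so the result aligns with df's original index.
df['foo_result'] = df.groupby('group').apply(foo1).reset_index(level=0, drop=True)
print(df)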
  group  time   x1  x2  foo_result
0     A     1   10   1        10.0
3     B     1  100   2       200.0
1     A     2   40   2        80.0
4     B     2  200   0         0.0
2     A     3   30   1        40.0
5     B     3  300   3       900.0

This is not a direct solution to the problem of applying a dataframe function to an expanding dataframe, but it achieves the same functionality.

Solution 2:

Applying a dataframe function on an expanding window is apparently not possible (at least not in pandas version 0.23.0), as one can see by putting a print statement inside the function.

Running df.groupby('group').expanding().apply(lambda x: bool(print(x)) , raw=False) on the given DataFrame (where the bool around the print is just to get a valid return value) returns:

0    1.0
dtype: float64
0    1.0
1    2.0
dtype: float64
0    1.0
1    2.0
2    3.0
dtype: float64
0    10.0
dtype: float64
0    10.0
1    40.0
dtype: float64
0    10.0
1    40.0
2    30.0
dtype: float64

(and so on - and also returns a dataframe with '0.0' in each cell, of course).

This shows that the expanding window works on a column-by-column basis (we see that first the expanding time series is printed, then x1, and so on), and does not really work on a dataframe - so a dataframe function can't be applied to it.
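
The same observation can be made without reading printed output, by recording the type of each window the callback receives. This is a minimal sketch: the data and the name `probe` are assumptions for illustration, and the behaviour is described only for the pandas version it is run on.

```python
import pandas as pd

# Sample data: an assumption, not taken from the original question.
df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "B", "B", "B"],
        "time": [1, 2, 3, 1, 2, 3],
        "x1": [10, 40, 30, 100, 200, 300],
    }
)

seen_types = []  # type name of every window passed to the callback


def probe(window):
    # Record what the callback receives, then return a dummy float.
    seen_types.append(type(window).__name__)
    return 0.0


out = df.groupby("group").expanding().apply(probe, raw=False)

print(set(seen_types))
```

If only `Series` shows up in `seen_types`, the callback never sees more than one column at a time, which matches the column-by-column behaviour described above.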

So, to get the desired functionality, one would have to put the expanding step inside the dataframe function, as in the accepted answer.
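
When a function needs several columns at once and cannot be split into per-column expanding operations, one fallback is to build the expanding sub-frames by hand. The sketch below is not part of either answer: `frame_func` and `expanding_frame_apply` are hypothetical names, the data is made up, and the loop is quadratic in group size, so it is only reasonable for small groups.

```python
import pandas as pd

# Sample data: an assumption for illustration.
df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "B", "B", "B"],
        "x1": [10, 40, 30, 100, 200, 300],
        "x2": [1, 2, 1, 2, 0, 3],
    }
)


def frame_func(sub):
    # Hypothetical DataFrame-level function: it sees all columns of the
    # rows so far and returns one scalar.
    return sub["x1"].max() * sub["x2"].iloc[-1]


def expanding_frame_apply(group, func):
    # Call func on rows 0..i of the group, for every row i.
    values = [func(group.iloc[: i + 1]) for i in range(len(group))]
    return pd.Series(values, index=group.index, dtype="float64")


# Process each group separately and stitch the pieces back together;
# assignment aligns on the original index.
pieces = [expanding_frame_apply(g, frame_func) for _, g in df.groupby("group")]
df["foo_result"] = pd.concat(pieces)

print(df)
```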
