
Edit A Dataframe 'inplace' In A Function, Or Return The Edited Dataframe?

I am currently working on a function to update a dataframe. There are two ways I can do this. Example 1: Edit in place. Create the dataframe mydf = pd.DataFrame({'name':['jim','jo

Solution 1:

I would always choose to return the DataFrame. If you plan to assign the output to another variable (df1 = my_func(df)), either call the function with df.copy() or call .copy() right at the top of the function, so that you never accidentally modify the input.
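As a minimal sketch of the copy-then-return pattern described above: the helper name add_one_year and the sample data are hypothetical, chosen only for illustration.

```python
import pandas as pd

def add_one_year(df):
    # Hypothetical helper: copy first so the caller's frame is never modified.
    out = df.copy()
    out['age'] = out['age'] + 1
    return out

# Illustrative data, not the question's DataFrame.
people = pd.DataFrame({'name': ['ann', 'bob'], 'age': [12, 46]})
older = add_one_year(people)

print(people['age'].tolist())  # [12, 46]  (input untouched)
print(older['age'].tolist())   # [13, 47]
```

Because the copy happens inside the function, callers do not have to remember to pass df.copy() themselves.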

DataFrames are mutable, so, like lists, they can be modified inside a function without being returned. However, this leads to a lot of confusion as soon as the function uses a pandas operation that returns a new object instead of modifying the original.

import pandas as pd

mydf = pd.DataFrame({'name': ['jim', 'jim'],
                     'age': [12, 46]})

def modify(df):
    df.loc[df.name.eq('jim'), 'age'] = 1000

print(mydf)
#   name  age
# 0  jim   12
# 1  jim   46

modify(mydf)
print(mydf)
#   name   age
# 0  jim  1000
# 1  jim  1000

Okay great, that changed. But what about if we continue with:

def modify2(df):
    df.drop_duplicates(inplace=True)
    df['age'] = df['age'] + 1

    df = pd.concat([df]*4)
    df['age'] = df['age'] + 17

modify2(mydf)
print(mydf)
#   name   age
# 0  jim  1001

So that's not great. The function only modified mydf up to the point where an operation returned a new object: df = pd.concat([df]*4) rebinds the local name df to a new DataFrame, so everything after that line changes the new object and not the caller's. This is very problematic, because an in-place function only works if every single operation inside it operates in place.
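For comparison, here is a sketch of how modify2 could be written to return its result instead. The name modify2_returning is hypothetical, and ignore_index=True is an addition of mine to get a clean 0..3 index; the original function did not use it.

```python
import pandas as pd

def modify2_returning(df):
    # Hypothetical rewrite of modify2: never touch the input, return the result.
    out = df.drop_duplicates().copy()
    out['age'] = out['age'] + 1
    # concat returns a new object; that is harmless here because we return it.
    out = pd.concat([out] * 4, ignore_index=True)
    out['age'] = out['age'] + 17
    return out

# State of mydf after modify() has run in the example above.
mydf = pd.DataFrame({'name': ['jim', 'jim'], 'age': [1000, 1000]})
result = modify2_returning(mydf)

print(len(mydf), len(result))  # 2 4
print(result['age'].tolist())  # [1018, 1018, 1018, 1018]
```

Every step is visible in the returned frame, whether or not it ran in place, and mydf keeps both of its original rows.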

Solution 2:

For conditional column edits like these we usually use np.where, which is vectorized and will speed up the whole process:

import numpy as np

df['name'] = np.where(df.name.str[0] == 'j', df.name + 'smith', df.name)
df['age'] = np.where(df.age > 40, df.age * 2, df.age)
df
Out[90]: 
        name  age
0   jimsmith   12
1  johnsmith   92
2       mary   88
3    michael   32
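The two ideas can be combined. The sketch below wraps the np.where edits in a function that returns a new frame. The function name edit is hypothetical, and the input values are an assumption inferred from the output shown above, since the question's DataFrame is cut off.

```python
import numpy as np
import pandas as pd

def edit(df):
    # Hypothetical wrapper: apply the np.where edits to a copy and return it.
    out = df.copy()
    out['name'] = np.where(out['name'].str[0] == 'j',
                           out['name'] + 'smith', out['name'])
    out['age'] = np.where(out['age'] > 40, out['age'] * 2, out['age'])
    return out

# Assumed input, reconstructed from the output above.
df = pd.DataFrame({'name': ['jim', 'john', 'mary', 'michael'],
                   'age': [12, 46, 44, 32]})
edited = edit(df)

print(edited['name'].tolist())  # ['jimsmith', 'johnsmith', 'mary', 'michael']
print(edited['age'].tolist())   # [12, 92, 88, 32]
print(df['age'].tolist())       # [12, 46, 44, 32]  (input untouched)
```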
