Edit A Dataframe 'inplace' In A Function, Or Return The Edited Dataframe?
Solution 1:
I would always choose to return the DataFrame. If you plan to assign the output to another variable (df1 = my_func(df)), call the function with df.copy(), or ensure you call .copy() right at the top of your function, so you never accidentally modify your input.
DataFrames are mutable, so like lists they can be modified within functions without returning them. However, this can lead to a lot of confusion when you use a pandas function that returns a new object instead of modifying the original.
import pandas as pd

mydf = pd.DataFrame({'name': ['jim', 'jim'],
                     'age': [12, 46]})

def modify(df):
    # .loc assignment writes into the object the caller passed in.
    df.loc[df.name.eq('jim'), 'age'] = 1000

print(mydf)
#   name  age
# 0  jim   12
# 1  jim   46

modify(mydf)
print(mydf)
#   name   age
# 0  jim  1000
# 1  jim  1000
Okay great, that changed. But what about if we continue with:
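def modify2(df):
    # These two steps change the caller's DataFrame in place.
    df.drop_duplicates(inplace=True)
    df['age'] = df['age'] + 1
    # pd.concat returns a new object; from here on, df is a local name only.
    df = pd.concat([df]*4)
    df['age'] = df['age'] + 17

modify2(mydf)
print(mydf)
#   name   age
# 0  jim  1001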
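So that's not great. Basically the function only succeeded in modifying the caller's df up until some part of it (here pd.concat) returned a new object instead of a reference to the original. From that point on the local name df pointed at the new object, so the later changes were discarded when the function returned. This is very problematic: every operation in the function has to work inplace, otherwise some of the edits are silently lost.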
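For comparison, here is a minimal sketch of the "copy at the top, then return" approach recommended above. The function name modify_returning and the sample data are hypothetical, chosen only for illustration; the steps mirror modify2.

```python
import pandas as pd

def modify_returning(df):
    # Hypothetical helper: copy first so nothing below can touch the caller's object.
    out = df.copy()
    out = out.drop_duplicates()
    out = out.assign(age=out['age'] + 1)
    # pd.concat returns a new object; that is fine because we return the result.
    out = pd.concat([out] * 4)
    out = out.assign(age=out['age'] + 17)
    return out

original = pd.DataFrame({'name': ['jim', 'jim'], 'age': [1000, 1000]})
result = modify_returning(original)

print(original)  # still the two input rows, unchanged
print(result)    # four rows, each with every step applied
```

Because the caller rebinds explicitly (result = modify_returning(original)), it no longer matters which pandas operations work in place and which return new objects.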
Solution 2:
We usually use np.where, which will speed up the whole process:
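import numpy as np
import pandas as pd

# Input frame is not shown in the answer; reconstructed here from the output below (assumption).
df = pd.DataFrame({'name': ['jim', 'john', 'mary', 'michael'],
                   'age': [12, 46, 44, 32]})

# Append 'smith' to names starting with 'j'; double ages over 40.
df['name'] = np.where(df.name.str[0] == 'j', df.name + 'smith', df.name)
df['age'] = np.where(df.age > 40, df.age * 2, df.age)
print(df)
#         name  age
# 0   jimsmith   12
# 1  johnsmith   92
# 2       mary   88
# 3    michael   32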
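The np.where edits can also be wrapped in a function that returns a new DataFrame rather than assigning into its argument. This is only a sketch: the function name with_conditional_edits and the two-row sample frame are hypothetical, and DataFrame.assign is swapped in for the direct column assignment used above.

```python
import numpy as np
import pandas as pd

def with_conditional_edits(df):
    # Hypothetical helper: assign() builds and returns a new DataFrame,
    # so the caller's frame is left as it was.
    return df.assign(
        name=np.where(df['name'].str[0] == 'j', df['name'] + 'smith', df['name']),
        age=np.where(df['age'] > 40, df['age'] * 2, df['age']),
    )

people = pd.DataFrame({'name': ['jim', 'mary'], 'age': [12, 44]})
edited = with_conditional_edits(people)

print(people)  # input is unchanged
print(edited)  # conditional edits applied to a new frame
```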