Compare Pandas Dataframes By Multiple Columns
What is the best way to figure out how two dataframes differ based on a combination of multiple columns? So if I have the following: df1: A B C 0 1 2 3 1 3 4 2 df2: A B C 0 1
Solution 1:
We can use .all and pass axis=1 to perform row-wise comparisons. We can then use the resulting boolean index to show the rows that differ by negating it with ~:
In [43]:
df[~(df==df1).all(axis=1)]
Out[43]:
   A  B  C
1  3  4  2
Breaking this down:
In [44]:
df==df1
Out[44]:
      A      B     C
0  True   True  True
1  True  False  True
In [45]:
(df==df1).all(axis=1)
Out[45]:
0     True
1    False
dtype: bool
We can then invert the above using ~ and pass it as a boolean index to df, which keeps only the rows where at least one column differs.
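The steps above can be put together in a small runnable sketch. The second frame is cut off in the question, so the value 5 in column B is an assumption chosen to reproduce the output shown, and the helper name rows_that_differ and its columns parameter are hypothetical. Note that == between two dataframes only works when both have the same index and column labels.

```python
import pandas as pd

# `df` matches the first frame in the question. The second frame is cut off
# in the question, so the value 5 in column B is an assumption.
df = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
df1 = pd.DataFrame({"A": [1, 3], "B": [2, 5], "C": [3, 2]})


def rows_that_differ(left, right, columns=None):
    """Return the rows of `left` that differ from `right` in any of `columns`.

    Hypothetical helper: assumes both frames share the same index and
    contain every column listed in `columns`.
    """
    if columns is None:
        columns = list(left.columns)
    # Element-wise comparison restricted to the chosen columns.
    same = left[columns] == right[columns]
    # A row matches only if every compared column matches.
    row_matches = same.all(axis=1)
    # Negate the mask to keep the rows with at least one mismatch.
    return left[~row_matches]


all_cols = rows_that_differ(df, df1)
only_a_c = rows_that_differ(df, df1, columns=["A", "C"])
print(all_cols)
print(only_a_c)
```

Passing a list of column names limits the comparison to that combination of columns. With the assumed data, comparing only A and C finds no differing rows, while comparing all columns returns row 1.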