Compare Pandas Dataframes By Multiple Columns

What is the best way to figure out how two dataframes differ based on a combination of multiple columns? So if I have the following: df1: A B C 0 1 2 3 1 3 4 2 df2: A B C 0 1

Solution 1:

We can use .all with axis=1 to compare the two dataframes row by row. This gives a boolean index, and negating it with ~ shows the rows that differ:

In [43]:

df[~(df==df1).all(axis=1)]
Out[43]:
   A  B  C
1  3  4  2

Breaking this down:

In [44]:

df==df1
Out[44]:
      A      B     C
0  True   True  True
1  True  False  True

In [45]:

(df==df1).all(axis=1)
Out[45]:
0     True
1    False
dtype: bool

We can then invert the above using ~ and pass it as a boolean index to df, which selects the rows that differ.
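For reference, here is a self-contained sketch of the steps above. The contents of df1 are not shown in full in the post, so the values below are an assumption, chosen so that only column B in row 1 differs, which matches the output above. The column subset `cols` is also a hypothetical choice, added to show how the same mask can be limited to a combination of columns.

```python
import pandas as pd

# Assumed data: df matches the output shown above; df1's values are
# hypothetical, picked so that only column B in row 1 differs.
df = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
df1 = pd.DataFrame({"A": [1, 3], "B": [2, 5], "C": [3, 2]})

# Element-wise comparison; both frames must have the same index and columns.
equal = df == df1

# True for rows where every column matches.
rows_match = equal.all(axis=1)

# Invert the mask to keep only the rows that differ.
diff_all = df[~rows_match]

# Same idea, restricted to a chosen subset of columns (hypothetical choice).
cols = ["A", "C"]
diff_subset = df[~(df[cols] == df1[cols]).all(axis=1)]

print(diff_all)
print(diff_subset)
```

Note that == between two dataframes only works when they are identically labeled. If the indexes or columns differ, pandas raises a ValueError, so the frames may need to be aligned first.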
