
How To Subtract Rows Of One Pandas Data Frame From Another?

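The operation that I want to do is similar to a merge. For example, with an inner merge we get a data frame that contains the rows that are present in the first AND the second data frame. What I need is a data frame that contains the rows that are present in the first data frame and NOT in the second.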

Solution 1:

Consider the following:

  1. df_one is first DataFrame
  2. df_two is second DataFrame

Present in First DataFrame and Not in Second DataFrame

Answer, by index:

df = df_one[~df_one.index.isin(df_two.index)]

index can be replaced by whichever column you want to base the exclusion on. In the example above, I've used the index as the reference between both DataFrames.

You can also build a more complex filter by combining several boolean pandas.Series masks.
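As a minimal sketch of the idea above, assuming two small made-up frames (the data, the index labels and the column name `id` are illustrative and not from the question):

```python
import pandas as pd

# Hypothetical example data; 'id', 'val' and the index labels are illustrative only.
df_one = pd.DataFrame({"id": [1, 2, 3, 4], "val": ["a", "b", "c", "d"]},
                      index=["w", "x", "y", "z"])
df_two = pd.DataFrame({"id": [2, 4], "val": ["b", "d"]}, index=["x", "z"])

# Rows of df_one whose index label does not appear in df_two's index.
by_index = df_one[~df_one.index.isin(df_two.index)]

# Same idea, using a column as the reference instead of the index.
by_column = df_one[~df_one["id"].isin(df_two["id"])]

# The mask is an ordinary boolean Series, so it can be combined with other conditions.
mask = ~df_one["id"].isin(df_two["id"]) & (df_one["val"] != "a")
combined = df_one[mask]
```

Note that `isin` only compares one index or one column at a time, so this works directly only when a single key identifies a row.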

Solution 2:

How about something like the following?

print(df1)

     Team  Year  foo
0   Hawks  2001    5
1   Hawks  2004    4
2    Nets  1987    3
3    Nets  1988    6
4    Nets  2001    8
5    Nets  2000   10
6    Heat  2004    6
7  Pacers  2003   12

print(df2)

     Team  Year  foo
0  Pacers  2003   12
1    Heat  2004    6
2    Nets  1988    6

As long as there is a non-key column with the same name in both frames, you can let the added suffixes do the work (if there is no such column, you could create one to use temporarily: df1['common'] = 1 and df2['common'] = 1):

new = df1.merge(df2, on=['Team', 'Year'], how='left')
# foo_y comes from df2, so it is NaN where df2 had no matching Team/Year
print(new[new.foo_y.isnull()])

    Team  Year  foo_x  foo_y
0  Hawks  2001      5    NaN
1  Hawks  2004      4    NaN
2   Nets  1987      3    NaN
4   Nets  2001      8    NaN
5   Nets  2000     10    NaN
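The temporary-column workaround mentioned above (df1['common'] = 1) might look like the following sketch. The frames here are made up and deliberately share only the key columns; `assign` is used so the originals are not modified, which is a choice of this sketch rather than something the answer prescribes.

```python
import pandas as pd

# Hypothetical frames that have only the key columns in common.
df1 = pd.DataFrame({"Team": ["Hawks", "Nets", "Heat"], "Year": [2001, 1988, 2004]})
df2 = pd.DataFrame({"Team": ["Nets", "Heat"], "Year": [1988, 2004]})

# Add a temporary marker to copies of both frames so the merge
# produces the suffixed columns common_x and common_y.
left = df1.assign(common=1)
right = df2.assign(common=1)

merged = left.merge(right, on=["Team", "Year"], how="left")

# common_y is NaN exactly where df2 had no row with that Team/Year.
only_in_df1 = merged.loc[merged["common_y"].isnull(), ["Team", "Year"]]
print(only_in_df1)
```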

Or you can use isin but you would have to create a single key:

df1['key'] = df1['Team'] + df1['Year'].astype(str)
df2['key'] = df2['Team'] + df2['Year'].astype(str)  # build df2's key from df2's own Team column
print(df1[~df1.key.isin(df2.key)])

    Team  Year  foo        key
0  Hawks  2001    5  Hawks2001
1  Hawks  2004    4  Hawks2004
2   Nets  1987    3   Nets1987
4   Nets  2001    8   Nets2001
5   Nets  2000   10   Nets2000
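A variation on the single-key idea, not part of the original answer: instead of concatenating strings (which could in principle make two different Team/Year pairs collide), the key columns can be turned into a MultiIndex and compared with `isin`. This is a sketch using the example data from this answer:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets"],
    "Year": [2003, 2004, 1988],
    "foo": [12, 6, 6],
})

keys = ["Team", "Year"]

# set_index returns a new frame, so df1 and df2 are left unchanged;
# only the resulting (Team, Year) MultiIndex objects are used.
idx1 = df1.set_index(keys).index
idx2 = df2.set_index(keys).index

# Boolean array: True where df1's (Team, Year) pair also occurs in df2.
in_df2 = idx1.isin(idx2)
result = df1[~in_df2]
print(result)
```

This keeps the original columns and index of df1 and needs no helper column.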

Solution 3:

The merge approach above can give wrong results if the non-key column contains NaN.

print(df1)

      Team  Year  foo
0    Hawks  2001    5
1    Hawks  2004    4
2     Nets  1987    3
3     Nets  1988    6
4     Nets  2001    8
5     Nets  2000   10
6     Heat  2004    6
7   Pacers  2003   12
8  Problem  2112  NaN

print(df2)

      Team  Year  foo
0   Pacers  2003   12
1     Heat  2004    6
2     Nets  1988    6
3  Problem  2112  NaN

new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.foo_y.isnull()])

      Team  Year  foo_x  foo_y
0    Hawks  2001      5    NaN
1    Hawks  2004      4    NaN
2     Nets  1987      3    NaN
4     Nets  2001      8    NaN
5     Nets  2000     10    NaN
6  Problem  2112    NaN    NaN

The problem team in 2112 has no value for foo in either table. So, the left join here will falsely return that row, which matches in both DataFrames, as not being present in the right DataFrame.

Solution:

What I do is add a marker column to the right DataFrame (df2) and set a value for all of its rows. Then, after the left join, you can check whether that column is NaN to find the records that exist only in the left DataFrame (df1).

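df1['in_df1'] = 'yes'  # same kind of marker on df1; it appears as in_df1 in the output below
df2['in_df2'] = 'yes'
print(df2)

      Team  Year  foo in_df2
0   Pacers  2003   12    yes
1     Heat  2004    6    yes
2     Nets  1988    6    yes
3  Problem  2112  NaN    yes

new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.in_df2.isnull()])

    Team  Year  foo_x  foo_y in_df1 in_df2
0  Hawks  2001      5    NaN    yes    NaN
1  Hawks  2004      4    NaN    yes    NaN
2   Nets  1987      3    NaN    yes    NaN
4   Nets  2001      8    NaN    yes    NaN
5   Nets  2000     10    NaN    yes    NaN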

NB. The problem row is now correctly filtered out, because it has a value for in_df2:

Problem  2112    NaN    NaN    yes    yes
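To make the difference concrete, here is a small self-contained sketch that runs both filters on a shortened version of the data (three rows instead of nine, chosen only for brevity). The marker is added with `assign` on a copy, which is a choice of this sketch:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Hawks", "Nets", "Problem"],
    "Year": [2001, 1988, 2112],
    "foo": [5, 6, np.nan],
})
df2 = pd.DataFrame({
    "Team": ["Nets", "Problem"],
    "Year": [1988, 2112],
    "foo": [6, np.nan],
})

# Marker column that is never NaN for a row that really came from df2.
right = df2.assign(in_df2="yes")
merged = df1.merge(right, on=["Team", "Year"], how="left")

# Filtering on foo_y also catches the Problem row, whose foo is NaN in df2 itself.
naive = merged[merged["foo_y"].isnull()]

# Filtering on the marker keeps only rows with no match in df2.
robust = merged[merged["in_df2"].isnull()]

print(naive["Team"].tolist())
print(robust["Team"].tolist())
```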

Solution 4:

I suggest using the 'indicator' parameter of merge. Also, if 'on' is None, the join keys default to the intersection of the columns in both DataFrames.

new = df1.merge(df2, how='left', indicator=True)  # adds a new column '_merge'
new = new[(new['_merge'] == 'left_only')].copy()  # rows only in df1 and not df2
new = new.drop(columns='_merge').copy()

    Team  Year  foo
0  Hawks  2001    5
1  Hawks  2004    4
2   Nets  1987    3
4   Nets  2001    8
5   Nets  2000   10

Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

indicator : boolean or string, default False

If True, adds a column to output DataFrame called “_merge” with information on the source of each row.
Information column is Categorical-type and takes on a value of
“left_only” for observations whose merge key only appears in ‘left’ DataFrame,
“right_only” for observations whose merge key only appears in ‘right’ DataFrame,
and “both” if the observation’s merge key is found in both.
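The indicator approach can be wrapped in a small helper. The sketch below is one possible way to do it; the function name `rows_only_in_left` is hypothetical, and reducing the right frame to its de-duplicated key columns before merging is an addition of this sketch (it avoids suffixed columns and repeated left rows), not something the answer above does.

```python
import pandas as pd


def rows_only_in_left(left, right, on=None):
    """Hypothetical helper: rows of `left` whose keys have no match in `right`."""
    if on is None:
        # Mirror merge's default: use the columns common to both frames.
        on = [c for c in left.columns if c in right.columns]
    elif isinstance(on, str):
        on = [on]
    # Only the key columns of `right` matter for the anti-join.
    right_keys = right[list(on)].drop_duplicates()
    merged = left.merge(right_keys, how="left", on=list(on), indicator=True)
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets"],
    "Year": [2003, 2004, 1988],
    "foo": [12, 6, 6],
})

all_columns = rows_only_in_left(df1, df2)                      # keys: Team, Year, foo
team_year = rows_only_in_left(df1, df2, on=["Team", "Year"])   # keys: Team, Year only
print(team_year)
```

Passing `on` explicitly is worth considering when the frames share columns that should not take part in the comparison.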
