How To Subtract Rows Of One Pandas Data Frame From Another?
Solution 1:
Consider the following:
- df_one is the first DataFrame
- df_two is the second DataFrame
To get the rows present in the first DataFrame and not in the second, filter by index:
df = df_one[~df_one.index.isin(df_two.index)]
Replace index with whichever column you want to base the exclusion on. In the example above, the index is used as the reference between the two DataFrames.
You can also build a more complex condition as a boolean pandas.Series and use it as the mask in the same way.
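A minimal sketch of both variants, exclusion by index and exclusion by a column. The sample frames and the `id` column are made up for illustration and do not come from the question:

```python
import pandas as pd

# Hypothetical sample data; the 'id' column is an assumption for illustration.
df_one = pd.DataFrame(
    {"id": [1, 2, 3, 4], "val": ["a", "b", "c", "d"]},
    index=[10, 11, 12, 13],
)
df_two = pd.DataFrame({"id": [2, 4], "val": ["b", "d"]}, index=[11, 13])

# Exclusion by index: keep rows of df_one whose index label is absent from df_two.
by_index = df_one[~df_one.index.isin(df_two.index)]

# Exclusion by column: same idea, using the 'id' column as the reference.
by_column = df_one[~df_one["id"].isin(df_two["id"])]

print(by_index)
print(by_column)
```

Both masks are plain boolean arrays, so they can be combined with `&` and `|` if the exclusion rule needs more than one condition.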
Solution 2:
How about something like the following?
print(df1)
     Team  Year  foo
0   Hawks  2001    5
1   Hawks  2004    4
2    Nets  1987    3
3    Nets  1988    6
4    Nets  2001    8
5    Nets  2000   10
6    Heat  2004    6
7  Pacers  2003   12
print(df2)
     Team  Year  foo
0  Pacers  2003   12
1    Heat  2004    6
2    Nets  1988    6
As long as there is a commonly named non-key column, you can let the added suffixes do the work (if there is no common non-key column, you could create one to use temporarily: df1['common'] = 1 and df2['common'] = 1):
new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.foo_y.isnull()])
    Team  Year  foo_x  foo_y
0  Hawks  2001      5    NaN
1  Hawks  2004      4    NaN
2   Nets  1987      3    NaN
4   Nets  2001      8    NaN
5   Nets  2000     10    NaN
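For the case where the two frames share only the key columns, the temporary-column idea could look roughly like this. The small frames are hypothetical, and dropping the helper column afterwards is an added cleanup step, not part of the answer:

```python
import pandas as pd

# Hypothetical frames that share only the key columns (no 'foo').
df1 = pd.DataFrame({"Team": ["Hawks", "Nets", "Heat"], "Year": [2001, 1988, 2004]})
df2 = pd.DataFrame({"Team": ["Heat", "Nets"], "Year": [2004, 1988]})

# Temporary non-key column so the merge produces common_x / common_y.
df1["common"] = 1
df2["common"] = 1

new = df1.merge(df2, on=["Team", "Year"], how="left")

# Rows with no partner in df2 have NaN in common_y.
only_in_df1 = new[new["common_y"].isnull()].drop(columns=["common_x", "common_y"])

# Remove the helper column from the inputs again.
df1 = df1.drop(columns="common")
df2 = df2.drop(columns="common")

print(only_in_df1)
```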
Or you can use isin, but you would have to create a single key:
df1['key'] = df1['Team'] + df1['Year'].astype(str)
df2['key'] = df2['Team'] + df2['Year'].astype(str)  # build df2's key from df2's own Team column
print(df1[~df1.key.isin(df2.key)])
    Team  Year  foo        key
0  Hawks  2001    5  Hawks2001
1  Hawks  2004    4  Hawks2004
2   Nets  1987    3   Nets1987
4   Nets  2001    8   Nets2001
5   Nets  2000   10   Nets2000
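Concatenating columns into a string key can collide in principle (for example "Team1" + "12" and "Team11" + "2" give the same string). One possible variant, which is not from the answer above, compares the key columns as tuples through a MultiIndex, so no helper column is needed:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets"],
    "Year": [2003, 2004, 1988],
    "foo": [12, 6, 6],
})

keys = ["Team", "Year"]

# Each row's key becomes a (Team, Year) tuple inside a MultiIndex.
left_keys = pd.MultiIndex.from_frame(df1[keys])
right_keys = pd.MultiIndex.from_frame(df2[keys])

# Boolean mask: True where df1's key tuple does not occur in df2.
result = df1[~left_keys.isin(right_keys)]
print(result)
```

Because the filter is a mask on df1 itself, the original index labels and column names are preserved.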
Solution 3:
The merge approach can give wrong results if the non-key column you test (foo here) has cells with NaN.
print(df1)
      Team  Year  foo
0    Hawks  2001    5
1    Hawks  2004    4
2     Nets  1987    3
3     Nets  1988    6
4     Nets  2001    8
5     Nets  2000   10
6     Heat  2004    6
7   Pacers  2003   12
8  Problem  2112  NaN
print(df2)
      Team  Year  foo
0   Pacers  2003   12
1     Heat  2004    6
2     Nets  1988    6
3  Problem  2112  NaN
new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.foo_y.isnull()])
      Team  Year  foo_x  foo_y
0    Hawks  2001      5    NaN
1    Hawks  2004      4    NaN
2     Nets  1987      3    NaN
4     Nets  2001      8    NaN
5     Nets  2000     10    NaN
8  Problem  2112    NaN    NaN
The problem team in 2112 has no value for foo in either table. So, the left join here will falsely return that row, which matches in both DataFrames, as not being present in the right DataFrame.
Solution:
What I do is add a marker column to the right DataFrame (df2 here) and set it to the same value for all rows. Then, after the left join, the rows where that column is NaN are the records found only in the left DataFrame (df1).
df1['in_df1'] = 'yes'  # marker on df1 as well; it appears as in_df1 in the output below
df2['in_df2'] = 'yes'
print(df2)
      Team  Year  foo in_df2
0   Pacers  2003   12    yes
1     Heat  2004    6    yes
2     Nets  1988    6    yes
3  Problem  2112  NaN    yes
new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.in_df2.isnull()])
    Team  Year  foo_x  foo_y in_df1 in_df2
0  Hawks  2001      5    NaN    yes    NaN
1  Hawks  2004      4    NaN    yes    NaN
2   Nets  1987      3    NaN    yes    NaN
4   Nets  2001      8    NaN    yes    NaN
5   Nets  2000     10    NaN    yes    NaN
NB. The problem row is now correctly filtered out, because it has a value for in_df2.
Problem  2112    NaN    NaN    yes    yes
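The pitfall and the marker-column fix can be reproduced end to end in a small sketch. The three-row frames are cut-down versions of the data above, and `assign` is used to add the marker without mutating df2, which is a stylistic choice here rather than the answer's approach:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Hawks", "Nets", "Problem"],
    "Year": [2001, 1988, 2112],
    "foo": [5, 6, np.nan],
})
df2 = pd.DataFrame({
    "Team": ["Nets", "Problem"],
    "Year": [1988, 2112],
    "foo": [6, np.nan],
})

# Naive check: test a data column for NaN after the left join.
naive = df1.merge(df2, on=["Team", "Year"], how="left")
naive_only = naive[naive["foo_y"].isnull()]

# Marker check: in_df2 is NaN only when the join found no partner row.
marked = df1.merge(df2.assign(in_df2="yes"), on=["Team", "Year"], how="left")
marked_only = marked[marked["in_df2"].isnull()]

print(naive_only["Team"].tolist())   # includes the Problem row by mistake
print(marked_only["Team"].tolist())  # only the row that is really missing from df2
```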
Solution 4:
I suggest using the indicator parameter of merge. Also, if on is None, the merge defaults to the intersection of the columns in both DataFrames.
new = df1.merge(df2, how='left', indicator=True)  # adds a new column '_merge'
new = new[(new['_merge'] == 'left_only')].copy()  # rows only in df1 and not df2
new = new.drop(columns='_merge').copy()
    Team  Year  foo
0  Hawks  2001    5
1  Hawks  2004    4
2   Nets  1987    3
4   Nets  2001    8
5   Nets  2000   10
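If this pattern is needed in several places, it could be wrapped in a small helper. `subtract_rows` is a hypothetical name, not a pandas function, and merging against only the de-duplicated key columns of the right frame is an added design choice so that no `_x`/`_y` columns appear and no rows are multiplied:

```python
import pandas as pd

def subtract_rows(left, right, on=None):
    """Return rows of `left` whose key has no match in `right` (hypothetical helper)."""
    if on is None:
        # Same default as merge: all columns the two frames have in common.
        keys = [c for c in left.columns if c in right.columns]
    else:
        keys = [on] if isinstance(on, str) else list(on)
    # Only the key columns of `right` matter; duplicates would multiply rows.
    right_keys = right[keys].drop_duplicates()
    merged = left.merge(right_keys, how="left", on=keys, indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")

df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets"],
    "Year": [2003, 2004, 1988],
    "foo": [12, 6, 6],
})

result = subtract_rows(df1, df2, on=["Team", "Year"])
print(result)
```

Note that merge builds a fresh 0..n-1 index for its result, so the labels in `result` only line up with df1's because df1 uses the default index.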
Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
indicator : boolean or string, default False
If True, adds a column to output DataFrame called “_merge” with information on the source of each row.
Information column is Categorical-type and takes on a value of
“left_only” for observations whose merge key only appears in ‘left’ DataFrame,
“right_only” for observations whose merge key only appears in ‘right’ DataFrame,
and “both” if the observation’s merge key is found in both.
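To see all three indicator values at once, a tiny outer merge is enough. The frames and the column names `key`, `x`, and `y` are invented for this illustration:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.DataFrame({"key": ["b", "c"], "y": [3, 4]})

# how='outer' keeps keys from both sides, so every category can occur.
merged = left.merge(right, on="key", how="outer", indicator=True)

# Map each key to the source reported in the '_merge' column.
source_by_key = dict(zip(merged["key"], merged["_merge"].astype(str)))
print(source_by_key)
```

Here "a" exists only in `left`, "c" only in `right`, and "b" in both, which is what the `_merge` column reports.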