Python Pandas: Merge Or Filter Dataframe By Another. Is There A Better Way?
One situation I sometimes encounter is that I have two dataframes (df1, df2) and want to create a new dataframe (df3) based on the intersection of multiple columns between df1 and df2.
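For comparison with the answers below, here is a minimal sketch of the plain inner merge that the title alludes to. The sample frames are hypothetical (the question's own data is not shown), and the key columns Campaign and Group are taken from the answers. Restricting df2 to its key columns and dropping duplicate key pairs before merging is an assumption about the desired result: it keeps df1's rows as they are, without duplicating them or pulling in df2's Metric column.

```python
import pandas as pd

# Hypothetical sample data; the question's real frames are not shown.
df1 = pd.DataFrame({
    "Campaign": ["Campaign 1", "Campaign 3", "Campaign 3", "Campaign 4"],
    "Group": ["Group 1", "Group 1", "Group 2", "Group 1"],
    "Metric": [100, 292, 373, 50],
})
df2 = pd.DataFrame({
    "Campaign": ["Campaign 3", "Campaign 3", "Campaign 5"],
    "Group": ["Group 1", "Group 2", "Group 1"],
    "Metric": [10, 20, 30],
})

keys = ["Campaign", "Group"]

# Use only df2's key columns, de-duplicated, so the inner merge acts as a
# filter on df1: no repeated df1 rows and no extra Metric column from df2.
df3 = df1.merge(df2[keys].drop_duplicates(), on=keys, how="inner")
print(df3)
```

If df2 can contain the same key pair several times, the drop_duplicates() call is what stops the merge from repeating the matching df1 rows.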
Solution 1:
Assuming that your df1 and df2 have exactly the same columns, you can first set the join-key columns as the index, then use df1.reindex(df2.index) followed by .dropna() to produce the intersection.
df3 = df1.set_index(['Campaign', 'Group'])
df4 = df2.set_index(['Campaign', 'Group'])
# reindex to df2's keys (keys missing from df1 become all-NaN rows), then dropna removes them, leaving the intersection
df3.reindex(df4.index).dropna(how='all').reset_index()
     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373
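The snippet above relies on the question's df1 and df2. A self-contained version of the same reindex/dropna idea, using hypothetical sample data, might look like this sketch:

```python
import pandas as pd

# Hypothetical frames with identical columns, as the answer assumes.
df1 = pd.DataFrame({
    "Campaign": ["Campaign 1", "Campaign 3", "Campaign 3", "Campaign 4"],
    "Group": ["Group 1", "Group 1", "Group 2", "Group 1"],
    "Metric": [100, 292, 373, 50],
})
df2 = pd.DataFrame({
    "Campaign": ["Campaign 3", "Campaign 3", "Campaign 5"],
    "Group": ["Group 1", "Group 2", "Group 1"],
    "Metric": [10, 20, 30],
})

keys = ["Campaign", "Group"]
left = df1.set_index(keys)
right = df2.set_index(keys)

# Align df1 to df2's keys: ('Campaign 5', 'Group 1') is not in df1, so its
# row is all NaN and dropna(how="all") removes it.
df3 = left.reindex(right.index).dropna(how="all").reset_index()
print(df3)
```

Two side effects are worth knowing: the temporary NaN row turns the integer Metric column into float64, and a df1 row whose non-key values are all NaN would be dropped as well. The result also follows df2's key order rather than df1's.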
Edit: Use .isin when the key is not unique, because reindex raises a ValueError on an axis with duplicate labels.
import pandas as pd

# create some duplicated keys and values
df3 = pd.concat([df3, df3])
df4 = pd.concat([df4, df4])
# isin only tests membership, so duplicate keys are fine
df3[df3.index.isin(df4.index)].reset_index()
     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373
2  Campaign 3  Group 1     292
3  Campaign 3  Group 2     373
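To see why the edit switches to isin, the following sketch (again with hypothetical data) duplicates every row and tries both approaches. It builds the duplicates with pd.concat because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical sample data.
df1 = pd.DataFrame({
    "Campaign": ["Campaign 1", "Campaign 3", "Campaign 3", "Campaign 4"],
    "Group": ["Group 1", "Group 1", "Group 2", "Group 1"],
    "Metric": [100, 292, 373, 50],
})
df2 = pd.DataFrame({
    "Campaign": ["Campaign 3", "Campaign 3", "Campaign 5"],
    "Group": ["Group 1", "Group 2", "Group 1"],
    "Metric": [10, 20, 30],
})

keys = ["Campaign", "Group"]

# Duplicate every row, so neither MultiIndex is unique any more.
left = pd.concat([df1, df1]).set_index(keys)
right = pd.concat([df2, df2]).set_index(keys)

reindex_failed = False
try:
    left.reindex(right.index)
except ValueError as err:
    # pandas refuses to reindex from an axis with duplicate labels.
    reindex_failed = True
    print("reindex failed:", err)

# isin is a plain membership test, so duplicates on either side are fine.
df3 = left[left.index.isin(right.index)].reset_index()
print(df3)
```

Unlike the reindex version, this is just a boolean mask, so it keeps df1's row order and the integer dtype of Metric.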
Solution 2:
Alternatively, you can use groupby and filter as follows:
# Compute the set of values you're interested in.
# In your example, this will be {('Campaign 3', 'Group 1'), ('Campaign 3', 'Group 2')}
interesting_groups = set(df2[['Campaign', 'Group']].apply(tuple, axis=1))
# Filter df1, keeping only values in that set
result = df1.groupby(['Campaign', 'Group']).filter(
    lambda x: x.name in interesting_groups
)
See the pandas GroupBy.filter documentation for another example.
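A runnable version of the groupby/filter approach, on the same kind of hypothetical data, could be sketched as follows. With more than one grouping key, the .name of each group passed to the function is a tuple of key values, which is what makes the membership test against a set of tuples work.

```python
import pandas as pd

# Hypothetical sample data.
df1 = pd.DataFrame({
    "Campaign": ["Campaign 1", "Campaign 3", "Campaign 3", "Campaign 4"],
    "Group": ["Group 1", "Group 1", "Group 2", "Group 1"],
    "Metric": [100, 292, 373, 50],
})
df2 = pd.DataFrame({
    "Campaign": ["Campaign 3", "Campaign 3", "Campaign 5"],
    "Group": ["Group 1", "Group 2", "Group 1"],
    "Metric": [10, 20, 30],
})

keys = ["Campaign", "Group"]

# Set of (Campaign, Group) tuples that occur in df2.
interesting_groups = set(df2[keys].apply(tuple, axis=1))

# filter calls the lambda once per group and keeps the rows of every group
# for which it returns True; g.name is that group's key tuple.
result = df1.groupby(keys).filter(lambda g: g.name in interesting_groups)
print(result)
```

Note that filter keeps df1's original index labels instead of renumbering them, and because the lambda runs once per group, this is likely to be slower than the isin or merge approaches when there are many groups.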