
Python Pandas: Merge Or Filter Dataframe By Another. Is There A Better Way?

One situation I sometimes encounter: I have two dataframes (df1, df2) and I want to create a new dataframe (df3) based on the intersection of multiple columns between df1 and df2.

Solution 1:

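Assuming that df1 and df2 have exactly the same columns, you can first set the join-key columns as the index, then use df1.reindex(df2.index) followed by .dropna() to produce the intersection. Keys that exist only in df2 come back from reindex as all-NaN rows, and dropna(how='all') removes them.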

df3 = df1.set_index(['Campaign', 'Group'])
df4 = df2.set_index(['Campaign', 'Group'])
# reindex first, then dropna; together they produce the intersection
df3.reindex(df4.index).dropna(how='all').reset_index()

     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373
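For anyone who wants to run this end to end, here is a minimal self-contained sketch of the reindex approach. The sample data is hypothetical (the original df1 and df2 are not shown in the post), and the names `keys`, `left`, `right`, and `intersection` are introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real df1 and df2 from the question are unknown.
df1 = pd.DataFrame({
    "Campaign": ["Campaign 1", "Campaign 2", "Campaign 3", "Campaign 3"],
    "Group": ["Group 1", "Group 1", "Group 1", "Group 2"],
    "Metric": [100, 150, 292, 373],
})
df2 = pd.DataFrame({
    "Campaign": ["Campaign 3", "Campaign 3", "Campaign 4"],
    "Group": ["Group 1", "Group 2", "Group 1"],
    "Metric": [0, 0, 0],
})

keys = ["Campaign", "Group"]
left = df1.set_index(keys)    # keys must be unique here for reindex to work
right = df2.set_index(keys)

# Keys present only in df2 ("Campaign 4") become all-NaN rows after reindex;
# dropna(how="all") then removes them, leaving the intersection.
intersection = left.reindex(right.index).dropna(how="all").reset_index()
print(intersection)
```

Two things to keep in mind with this approach: introducing NaN upcasts an integer Metric column to float, and dropna(how='all') would also drop a genuinely matching row whose values all happen to be NaN in df1.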

Edit:

Use .isin when the keys are not unique, since reindex raises an error on an index with duplicate labels.

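import pandas as pd

# create some duplicated keys and values
# (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
df3 = pd.concat([df3, df3])
df4 = pd.concat([df4, df4])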

# isin
df3[df3.index.isin(df4.index)].reset_index()

     Campaign    Group  Metric
0  Campaign 3  Group 1     292
1  Campaign 3  Group 2     373
2  Campaign 3  Group 1     292
3  Campaign 3  Group 2     373
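The .isin variant can be tried in isolation with the sketch below. The data is again hypothetical and deliberately contains repeated (Campaign, Group) pairs; `keys`, `left`, `right_index`, `mask`, and `filtered` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Hypothetical data with repeated (Campaign, Group) keys on both sides.
df1 = pd.DataFrame({
    "Campaign": ["Campaign 1", "Campaign 3", "Campaign 3", "Campaign 3", "Campaign 3"],
    "Group": ["Group 1", "Group 1", "Group 2", "Group 1", "Group 2"],
    "Metric": [100, 292, 373, 292, 373],
})
df2 = pd.DataFrame({
    "Campaign": ["Campaign 3", "Campaign 3", "Campaign 3"],
    "Group": ["Group 1", "Group 2", "Group 1"],
})

keys = ["Campaign", "Group"]
left = df1.set_index(keys)
right_index = pd.MultiIndex.from_frame(df2[keys])

# isin tests membership row by row, so duplicate keys are not a problem.
mask = left.index.isin(right_index)
filtered = left[mask].reset_index()
print(filtered)
```

Because this is a boolean mask over df1's rows, every matching row of df1 is kept exactly once, no matter how many times its key appears in df2, and the Metric dtype is left unchanged.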

Solution 2:

Alternatively, you can use groupby and filter as follows:

# Compute the set of values you're interested in.
# In your example, this will be {('Campaign 3', 'Group 1'), ('Campaign 3', 'Group 2')}
interesting_groups = set(df2[['Campaign', 'Group']].apply(tuple, axis=1))
# Filter df1, keeping only values in that set
result = df1.groupby(['Campaign', 'Group']).filter(
    lambda x: x.name in interesting_groups
)

See the filter docs for another example.
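A runnable version of the groupby/filter approach might look like the sketch below. The sample frames are hypothetical, and building the key set with `itertuples` is a swapped-in alternative to `apply(tuple, axis=1)` that should give the same set of tuples.

```python
import pandas as pd

# Hypothetical sample data; the real df1 and df2 from the question are unknown.
df1 = pd.DataFrame({
    "Campaign": ["Campaign 1", "Campaign 2", "Campaign 3", "Campaign 3"],
    "Group": ["Group 1", "Group 1", "Group 1", "Group 2"],
    "Metric": [100, 150, 292, 373],
})
df2 = pd.DataFrame({
    "Campaign": ["Campaign 3", "Campaign 3", "Campaign 4"],
    "Group": ["Group 1", "Group 2", "Group 1"],
})

keys = ["Campaign", "Group"]

# One plain tuple per row of df2's key columns.
interesting_groups = set(df2[keys].itertuples(index=False, name=None))

# Inside filter, each group's .name is its (Campaign, Group) key tuple;
# groups whose key is not in the set are dropped as a whole.
result = df1.groupby(keys).filter(lambda g: g.name in interesting_groups)
print(result)
```

Unlike the index-based solution, this keeps df1's original row index and column order, and it works whether or not the keys are unique. The lambda is called once per group, so it may be slower on frames with many distinct keys.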
