
Conditionally Fill Column With Value From Another Dataframe Based On Row Match In Pandas

I find myself lost trying to solve this problem (automating tax paperwork). I have two dataframes: one with the quarterly historical records of EUR/USD exchange rates, and another

Solution 1:

You can expand your rates dataframe to cover every calendar date and forward-fill the gaps, add a "Currency" column to it, and then join the two dataframes on both the date and currency columns.

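import pandas as pd

# One row per calendar day, so weekends and holidays are present too
idx = pd.DataFrame(pd.date_range('2017-07-05', '2017-07-12'), columns=['Date'])
rates = pd.merge(idx, rates, how="left", on="Date")
rates['Currency'] = 'USD'
# Carry the last known rate forward into the days that had none
rates['Rate'] = rates['Rate'].ffill()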

        Date    Rate Currency
0 2017-07-05  1.1329      USD
1 2017-07-06  1.1385      USD
2 2017-07-07  1.1412      USD
3 2017-07-08  1.1412      USD
4 2017-07-09  1.1412      USD
5 2017-07-10  1.1387      USD
6 2017-07-11  1.1405      USD
7 2017-07-12  1.1449      USD

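Then do a left join and set the rate to 1 for the EUR rows:

import numpy as np

# Both frames have a Rate column, so the merge yields Rate_x (sales) and Rate_y (rates)
result = pd.merge(sales, rates, how="left", on=["Currency", "Date"])
result['Rate'] = np.where(result['Currency'] == 'EUR', 1, result['Rate_y'])
result = result.drop(['Rate_x', 'Rate_y'], axis=1)

which gives: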

        Date        From Currency  Amount    Rate
0 2017-07-06      PayPal      USD     100  1.1385
1 2017-07-06  Fastspring      USD     200  1.1385
2 2017-07-09  Fastspring      USD     100  1.1412
3 2017-07-10          EU      EUR     100  1.0000
4 2017-07-10      PayPal      USD     200  1.1387
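For anyone who wants to run this end to end, here is a minimal sketch. The input frames are reconstructed from the printed outputs above, so they are an assumption rather than the asker's real data. It also assumes `sales` has no `Rate` column yet, which means the merge produces a single `Rate` column instead of `Rate_x`/`Rate_y`. The name `daily` is only illustrative.

```python
import numpy as np
import pandas as pd

# Inputs reconstructed from the printed outputs (assumed, not the original data).
rates = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-05", "2017-07-06", "2017-07-07",
                            "2017-07-10", "2017-07-11", "2017-07-12"]),
    "Rate": [1.1329, 1.1385, 1.1412, 1.1387, 1.1405, 1.1449],
})
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-06", "2017-07-06", "2017-07-09",
                            "2017-07-10", "2017-07-10"]),
    "From": ["PayPal", "Fastspring", "Fastspring", "EU", "PayPal"],
    "Currency": ["USD", "USD", "USD", "EUR", "USD"],
    "Amount": [100, 200, 100, 100, 200],
})

# Expand rates to every calendar day and carry the last known rate forward.
idx = pd.DataFrame({"Date": pd.date_range(rates["Date"].min(), rates["Date"].max())})
daily = idx.merge(rates, how="left", on="Date")
daily["Rate"] = daily["Rate"].ffill()
daily["Currency"] = "USD"

# Left join on both keys; EUR rows find no match and come back as NaN.
result = sales.merge(daily, how="left", on=["Currency", "Date"])
# EUR amounts need no conversion, so their rate is fixed at 1.
result["Rate"] = np.where(result["Currency"] == "EUR", 1.0, result["Rate"])
print(result)
```

Joining on `Currency` as well as `Date` is what keeps the USD rate off the EUR rows; the `np.where` step then only has to fill in the 1.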

Solution 2:

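Here it is broken down into steps, using pd.merge_asof:

# merge_asof needs both frames sorted by Date; 'backward' picks the last rate on or before each sale
sales = pd.merge_asof(sales, rates, on='Date', direction='backward', allow_exact_matches=True)
# EU sales keep their original rate (Rate_x) instead of the matched USD rate
sales.loc[sales.From == 'EU', 'Rate_y'] = sales.Rate_x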

sales
Out[748]:
        Date        From Currency  Amount  Rate_x  Rate_y
0 2017-07-06      PayPal      USD     100       1  1.1385
1 2017-07-06  Fastspring      USD     200       1  1.1385
2 2017-07-09  Fastspring      USD     100       1  1.1412
3 2017-07-10          EU      EUR     100       1  1.0000
4 2017-07-10      PayPal      USD     200       1  1.1387

Then

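# drop(columns=...) replaces the positional axis argument, which pandas 2.0 no longer accepts
sales.drop(columns='Rate_x').rename(columns={'Rate_y': 'Rate'})
Out[749]:
        Date        From Currency  Amount    Rate
0 2017-07-06      PayPal      USD     100  1.1385
1 2017-07-06  Fastspring      USD     200  1.1385
2 2017-07-09  Fastspring      USD     100  1.1412
3 2017-07-10          EU      EUR     100  1.0000
4 2017-07-10      PayPal      USD     200  1.1387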
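A self-contained sketch of the same as-of join, in case it helps to see it run. The frames are reconstructed from the outputs shown in this thread (an assumption), and `sales` is assumed to have no `Rate` column, so there are no `_x`/`_y` suffixes to clean up. Both frames are sorted explicitly because `pd.merge_asof` raises an error when the `on` key is not sorted.

```python
import pandas as pd

# Inputs reconstructed from the printed outputs (assumed, not the original data).
rates = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-05", "2017-07-06", "2017-07-07",
                            "2017-07-10", "2017-07-11", "2017-07-12"]),
    "Rate": [1.1329, 1.1385, 1.1412, 1.1387, 1.1405, 1.1449],
})
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-06", "2017-07-06", "2017-07-09",
                            "2017-07-10", "2017-07-10"]),
    "From": ["PayPal", "Fastspring", "Fastspring", "EU", "PayPal"],
    "Currency": ["USD", "USD", "USD", "EUR", "USD"],
    "Amount": [100, 200, 100, 100, 200],
})

# merge_asof requires sorted keys; a stable sort keeps same-day rows in order.
sales_sorted = sales.sort_values("Date", kind="stable")
rates_sorted = rates.sort_values("Date", kind="stable")

# For each sale, take the most recent rate dated on or before the sale date.
merged = pd.merge_asof(sales_sorted, rates_sorted, on="Date", direction="backward")

# EUR amounts need no conversion, so their rate is fixed at 1.
merged.loc[merged["Currency"] == "EUR", "Rate"] = 1.0
print(merged)
```

With this approach the rates frame never needs to be expanded to daily rows, since the weekend sale on 2017-07-09 picks up the 2017-07-07 rate directly.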

Solution 3:

Here is how I would do it without merge.
1. Fill rates with the missing dates and ffill, as in the other answers, but keep Date as the index.
2. Map this dataframe onto sales, using loc to skip the EUR rows.

idx = pd.date_range(rates['Date'].min(), rates['Date'].max())
rates = rates.set_index('Date').reindex(idx).ffill()
sales.loc[sales['Currency'] !='EUR','Rate'] = sales.loc[sales['Currency'] !='EUR','Date'].map(rates['Rate'])

        Date        From Currency  Amount    Rate
0 2017-07-06      PayPal      USD     100  1.1385
1 2017-07-06  Fastspring      USD     200  1.1385
2 2017-07-09  Fastspring      USD     100  1.1412
3 2017-07-10          EU      EUR     100  1.0000
4 2017-07-10      PayPal      USD     200  1.1387

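Or you can even do it without changing the rates dataframe:

# method='ffill' fills from the original rate dates; calling .ffill() after reindex would fill from the previous sale date instead
mapper = rates.set_index('Date').reindex(sales['Date'].unique(), method='ffill')['Rate']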

sales.loc[sales['Currency'] != 'EUR','Rate'] = sales.loc[sales['Currency'] != 'EUR','Date'].map(mapper)
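One subtlety when reindexing straight to the sale dates is worth checking: forward-filling after the reindex and forward-filling during it are not the same thing. The sketch below uses rates reconstructed from the tables in this thread (an assumption) and the illustrative names `by_date`, `fill_after` and `fill_during` to show the difference for the weekend date 2017-07-09.

```python
import pandas as pd

# Rates reconstructed from the printed outputs (assumed, not the original data).
rates = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-05", "2017-07-06", "2017-07-07",
                            "2017-07-10", "2017-07-11", "2017-07-12"]),
    "Rate": [1.1329, 1.1385, 1.1412, 1.1387, 1.1405, 1.1449],
})
sale_dates = pd.to_datetime(["2017-07-06", "2017-07-09", "2017-07-10"])
weekend = pd.Timestamp("2017-07-09")

by_date = rates.set_index("Date")["Rate"]

# Fill after reindexing: 07-07 has already been dropped, so 07-09
# inherits from the previous sale date (07-06).
fill_after = by_date.reindex(sale_dates).ffill()

# Fill during reindexing: 07-09 takes the last rate dated on or
# before it in the original index (07-07).
fill_during = by_date.reindex(sale_dates, method="ffill")

print(fill_after[weekend])   # 1.1385
print(fill_during[weekend])  # 1.1412
```

`reindex(..., method="ffill")` needs the original index to be sorted, which a dated rate history normally is.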

Time testing:

wen: 0.011892538983374834
gayatri: 0.13312408898491412
vaishali: 0.009498710976913571
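The post does not say how these timings were taken. A rough way to run a similar comparison yourself is sketched below with `timeit`. The function names `via_asof` and `via_map`, the reconstructed sample data, and the repeat count are all assumptions, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Inputs reconstructed from the printed outputs (assumed, not the original data).
rates = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-05", "2017-07-06", "2017-07-07",
                            "2017-07-10", "2017-07-11", "2017-07-12"]),
    "Rate": [1.1329, 1.1385, 1.1412, 1.1387, 1.1405, 1.1449],
})
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-06", "2017-07-06", "2017-07-09",
                            "2017-07-10", "2017-07-10"]),
    "From": ["PayPal", "Fastspring", "Fastspring", "EU", "PayPal"],
    "Currency": ["USD", "USD", "USD", "EUR", "USD"],
    "Amount": [100, 200, 100, 100, 200],
})


def via_asof(sales, rates):
    """As-of join: last rate on or before each sale date (inputs already sorted)."""
    out = pd.merge_asof(sales, rates, on="Date", direction="backward")
    out.loc[out["Currency"] == "EUR", "Rate"] = 1.0
    return out


def via_map(sales, rates):
    """Map sale dates through a forward-filled lookup Series."""
    mapper = rates.set_index("Date")["Rate"].reindex(
        sales["Date"].unique(), method="ffill")
    out = sales.copy()
    # Keep the mapped rate for non-EUR rows, use 1 for EUR rows.
    out["Rate"] = out["Date"].map(mapper).where(out["Currency"] != "EUR", 1.0)
    return out


# Total seconds for 100 calls of each approach.
t_asof = timeit.timeit(lambda: via_asof(sales, rates), number=100)
t_map = timeit.timeit(lambda: via_map(sales, rates), number=100)
print(f"merge_asof: {t_asof:.4f}s, map: {t_map:.4f}s")
```

On five rows the result mostly reflects fixed call overhead, so a larger frame would be needed before drawing conclusions about which approach scales better.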
