Join/Merge Dataframe Based On Interval
I have two tables: one is a table of planned leak patch, and the other is a table of realization of leak patched. Like follows: The second table has more columns (omitted here). T
Solution 1:
There are six cases which range can intersect or not. You have documented four cases, other two cases are one where REAL is completely before PLAN. This condition can be tested by stating where REAL.REAL_END < PLAN.INTERVAL_START, and the other condition is where REAL is complete after PLAN. This condition can be tested by stating when REAL.REAL_START > PLAN.INTERVAL_END.
So, you can simply your SQL query to check this two conditions if neither is true then you have intersecting ranges. In pandas, you can use the following, very similiar to what you are are doing in SQL, do a cross join and filter the results:
import pandas as pd
import numpy as np
plan = pd.DataFrame({'PIPE_ID':['P_01','P_01'],'INTERVAL_START':[150,175],'INTERVAL_END':[151.5,177.5]})
real = pd.DataFrame({'PIPE_ID':['P_01','P_01'],'REAL_START':[148,174.5],'REAL_END':[150.5,178]})
df_merged = plan.merge(real, on='PIPE_ID')
df_out = df_merged[~((df_merged['REAL_END'] < df_merged['INTERVAL_START']) | (df_merged['REAL_START']>df_merged['INTERVAL_END']))]
print(df_out)
Output:
INTERVAL_END INTERVAL_START PIPE_ID REAL_END REAL_START
0 151.5 150 P_01 150.5 148.0
3 177.5 175 P_01 178.0 174.5
Post a Comment for "Join/Merge Dataframe Based On Interval"