Skip to content Skip to sidebar Skip to footer

Join/Merge Dataframe Based On Interval

I have two tables: one is a table of planned leak patch, and the other is a table of realization of leak patched. Like follows: The second table has more columns (omitted here). T

Solution 1:

There are six cases which range can intersect or not. You have documented four cases, other two cases are one where REAL is completely before PLAN. This condition can be tested by stating where REAL.REAL_END < PLAN.INTERVAL_START, and the other condition is where REAL is complete after PLAN. This condition can be tested by stating when REAL.REAL_START > PLAN.INTERVAL_END.

enter image description here enter image description here

So, you can simply your SQL query to check this two conditions if neither is true then you have intersecting ranges. In pandas, you can use the following, very similiar to what you are are doing in SQL, do a cross join and filter the results:

import pandas as pd
import numpy as np

plan = pd.DataFrame({'PIPE_ID':['P_01','P_01'],'INTERVAL_START':[150,175],'INTERVAL_END':[151.5,177.5]})

real  = pd.DataFrame({'PIPE_ID':['P_01','P_01'],'REAL_START':[148,174.5],'REAL_END':[150.5,178]})

df_merged = plan.merge(real, on='PIPE_ID')

df_out = df_merged[~((df_merged['REAL_END'] < df_merged['INTERVAL_START']) | (df_merged['REAL_START']>df_merged['INTERVAL_END']))] 

print(df_out)

Output:

   INTERVAL_END  INTERVAL_START PIPE_ID  REAL_END  REAL_START
0         151.5             150    P_01     150.5       148.0
3         177.5             175    P_01     178.0       174.5

Post a Comment for "Join/Merge Dataframe Based On Interval"