
I Want To Filter Data For Excel Files Using Pandas

I am trying to filter data from Excel files in pandas, based on a column's string value. I have tried the following to achieve what I want; the latest code is shown below as

Solution 1:

[Updated] - This is kind of weird, but it respects the rules you want to apply

(which are a little weird as well, so it makes sense)

1. Create the Dataframe

In [1]:
import pandas as pd
 
data = [
        [475, 'SHAWBURY', 'DAK', 'DISPLAY', '2008-07-24 00:00:00', 188],
        [476, 'SHAWBURY', 'SPIT', 'DISPLAY', '2008-07-24 00:00:00', 188],
        [477, 'COTTESMORE', 'SPIT', 'DISPLAY', None, 757],                
        [478, 'COTTESMORE', 'DAK', 'DISPLAY', None, 757],               
        [484, 'SUNDERLAND', 'SPIT', 'DISPLAY', None, 333],           
        [487, 'EAST FORTUNE', 'SPIT', 'DISPLAY', None, 406],             
        [489, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-25 00:00:00', 138],
        [490, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-25 00:00:00', 138],
        [504, 'WIGTON', 'DHS', 'DISPLAY', '2008-07-26 00:00:00', 144],
        [506, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-26 00:00:00', 138],
        [507, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-26 00:00:00', 138],
        [508, 'SUNDERLAND', 'HS', 'DISPLAY', None, 333],                
        [509, 'SUNDERLAND', 'DAK', 'DISPLAY', None, 333]
       ]
df = pd.DataFrame(data, columns=['Index', 'Venue', 'A/C', 'DISPLAY', 'Date', 'BID']).set_index('Index')
df

Out [1]:

       Venue         A/C   DISPLAY  Date                 BID
Index
475    SHAWBURY      DAK   DISPLAY  2008-07-24 00:00:00  188
476    SHAWBURY      SPIT  DISPLAY  2008-07-24 00:00:00  188
477    COTTESMORE    SPIT  DISPLAY  None                 757
478    COTTESMORE    DAK   DISPLAY  None                 757
484    SUNDERLAND    SPIT  DISPLAY  None                 333
487    EAST FORTUNE  SPIT  DISPLAY  None                 406
489    WINDERMERE    HS    DISPLAY  2008-07-25 00:00:00  138
490    WINDERMERE    DAK   DISPLAY  2008-07-25 00:00:00  138
504    WIGTON        DHS   DISPLAY  2008-07-26 00:00:00  144
506    WINDERMERE    HS    DISPLAY  2008-07-26 00:00:00  138
507    WINDERMERE    DAK   DISPLAY  2008-07-26 00:00:00  138
508    SUNDERLAND    HS    DISPLAY  None                 333
509    SUNDERLAND    DAK   DISPLAY  None                 333
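
Since the question is about filtering on a string column, a minimal sketch of plain boolean-mask filtering may be useful as a baseline. It assumes a small hand-built frame (`sample`, a hypothetical subset of the rows above); with a real workbook the frame would typically come from `pd.read_excel` instead.

```python
import pandas as pd

# Hypothetical subset of the rows above, with the same column names
sample = pd.DataFrame(
    {
        "Venue": ["SHAWBURY", "SHAWBURY", "COTTESMORE", "WIGTON"],
        "A/C": ["DAK", "SPIT", "SPIT", "DHS"],
        "BID": [188, 188, 757, 144],
    },
    index=pd.Index([475, 476, 477, 504], name="Index"),
)

# Exact match on a single string value
dak_rows = sample[sample["A/C"] == "DAK"]

# Match any of several string values
dak_or_spit = sample[sample["A/C"].isin(["DAK", "SPIT"])]

# Case-sensitive substring match; na=False treats missing values as non-matches
shaw_rows = sample[sample["Venue"].str.contains("SHAW", na=False)]
```

Each expression builds a boolean Series aligned to the frame's index, and indexing with it keeps only the rows where the mask is true.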

2. Manipulate your dataframe

In [2] :
## Keep BID where we have at least 2 rows
test = df.groupby(by=['BID', 'Venue', 'DISPLAY']).count()
test = test[test['A/C']>1]
bids = test.reset_index().BID.tolist()

# Here if there is already `DHS` and `DS` in the column `A/C`, I want to keep them
df.loc[df['A/C']=='DHS', 'Aircraft'] = 'DHS'
df.loc[df['A/C']=='DS', 'Aircraft'] = 'DS'

# I keep 1 row for each bid that has at least 2 rows, and their Aircraft values are updated
for bid in bids:
    df.loc[(df['BID']==bid) & (df['A/C']=='DAK'), 'Aircraft']= 'DHS' 
    df.loc[(df['BID']==bid) & (df['A/C']=='SPIT'), 'Aircraft'] = 'DS' 
    

df = df[df['Aircraft'].notnull()].drop(columns=['A/C'])
df

Out [2]:

        Venue       DISPLAY  Date                 BID  Aircraft
Index
475     SHAWBURY    DISPLAY  2008-07-24 00:00:00  188  DHS
476     SHAWBURY    DISPLAY  2008-07-24 00:00:00  188  DS
477     COTTESMORE  DISPLAY  None                 757  DS
478     COTTESMORE  DISPLAY  None                 757  DHS
484     SUNDERLAND  DISPLAY  None                 333  DS
490     WINDERMERE  DISPLAY  2008-07-25 00:00:00  138  DHS
504     WIGTON      DISPLAY  2008-07-26 00:00:00  144  DHS
507     WINDERMERE  DISPLAY  2008-07-26 00:00:00  138  DHS
509     SUNDERLAND  DISPLAY  None                 333  DHS
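
The same rules can probably be expressed without the explicit loop over BIDs. The sketch below is one possible vectorized variant, not the method used above: `filter_aircraft` and `demo` are hypothetical names, and it checks the group size per row rather than per BID, which gives the same result only when each BID belongs to a single Venue (as in the sample data).

```python
import pandas as pd

def filter_aircraft(frame):
    """Hypothetical helper: apply the DHS/DS rules without looping over BIDs."""
    # Number of rows in each (BID, Venue, DISPLAY) group, aligned to the original rows
    group_size = frame.groupby(["BID", "Venue", "DISPLAY"])["A/C"].transform("size")

    # Existing DHS and DS values are always kept as they are
    always = frame["A/C"].map({"DHS": "DHS", "DS": "DS"})
    # DAK and SPIT are renamed, but only inside groups with at least 2 rows
    renamed = frame["A/C"].map({"DAK": "DHS", "SPIT": "DS"}).where(group_size > 1)

    out = frame.assign(Aircraft=always.combine_first(renamed))
    # Rows that matched neither rule have no Aircraft value and are dropped
    return out[out["Aircraft"].notna()].drop(columns=["A/C"])

# Small demo frame reusing a few rows from the data above
demo = pd.DataFrame(
    [
        [484, "SUNDERLAND", "SPIT", "DISPLAY", 333],
        [487, "EAST FORTUNE", "SPIT", "DISPLAY", 406],
        [504, "WIGTON", "DHS", "DISPLAY", 144],
        [508, "SUNDERLAND", "HS", "DISPLAY", 333],
        [509, "SUNDERLAND", "DAK", "DISPLAY", 333],
    ],
    columns=["Index", "Venue", "A/C", "DISPLAY", "BID"],
).set_index("Index")

result = filter_aircraft(demo)
```

Here `transform("size")` broadcasts each group's row count back onto its rows, so the "at least 2 rows" condition becomes an ordinary boolean mask.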
