I Want To Filter Data For Excel Files Using Pandas
I am trying to filter data from Excel files in Pandas, based on a column value, i.e. a string value. I have tried the following to achieve what I want; the latest code is shown below as ...
Solution 1:
[Updated] - This is kind of weird, but it respects the rules you want to apply
(which are a little weird as well, so it makes sense).
1. Create the Dataframe
In [1]:
import pandas as pd
data = [
[475, 'SHAWBURY', 'DAK', 'DISPLAY', '2008-07-24 00:00:00', 188],
[476, 'SHAWBURY', 'SPIT', 'DISPLAY', '2008-07-24 00:00:00', 188],
[477, 'COTTESMORE', 'SPIT', 'DISPLAY', None, 757],
[478, 'COTTESMORE', 'DAK', 'DISPLAY', None, 757],
[484, 'SUNDERLAND', 'SPIT', 'DISPLAY', None, 333],
[487, 'EAST FORTUNE', 'SPIT', 'DISPLAY', None, 406],
[489, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-25 00:00:00', 138],
[490, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-25 00:00:00', 138],
[504, 'WIGTON', 'DHS', 'DISPLAY', '2008-07-26 00:00:00', 144],
[506, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-26 00:00:00', 138],
[507, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-26 00:00:00', 138],
[508, 'SUNDERLAND', 'HS', 'DISPLAY', None, 333],
[509, 'SUNDERLAND', 'DAK', 'DISPLAY', None, 333]
]
df = pd.DataFrame(data, columns=['Index', 'Venue', 'A/C', 'DISPLAY', 'Date', 'BID']).set_index('Index')
df
Out [1]:
Index  Venue         A/C   DISPLAY  Date                 BID
475    SHAWBURY      DAK   DISPLAY  2008-07-24 00:00:00  188
476    SHAWBURY      SPIT  DISPLAY  2008-07-24 00:00:00  188
477    COTTESMORE    SPIT  DISPLAY  None                 757
478    COTTESMORE    DAK   DISPLAY  None                 757
484    SUNDERLAND    SPIT  DISPLAY  None                 333
487    EAST FORTUNE  SPIT  DISPLAY  None                 406
489    WINDERMERE    HS    DISPLAY  2008-07-25 00:00:00  138
490    WINDERMERE    DAK   DISPLAY  2008-07-25 00:00:00  138
504    WIGTON        DHS   DISPLAY  2008-07-26 00:00:00  144
506    WINDERMERE    HS    DISPLAY  2008-07-26 00:00:00  138
507    WINDERMERE    DAK   DISPLAY  2008-07-26 00:00:00  138
508    SUNDERLAND    HS    DISPLAY  None                 333
509    SUNDERLAND    DAK   DISPLAY  None                 333
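Since the question is about filtering on a string column, here is a minimal sketch of the three most common ways to do that in pandas. It uses a small in-memory frame with a few of the values from above as a stand-in for the Excel sheet. The commented-out `pd.read_excel` call and its file name `displays.xlsx` are assumptions for illustration, not something taken from the original post.

```python
import pandas as pd

# Stand-in for a sheet loaded from Excel; the file name is hypothetical.
# sample = pd.read_excel("displays.xlsx")
sample = pd.DataFrame({
    'Venue': ['SHAWBURY', 'SHAWBURY', 'WINDERMERE', 'WIGTON'],
    'A/C': ['DAK', 'SPIT', 'HS', 'DHS'],
    'BID': [188, 188, 138, 144],
})

# Exact match on a single string value.
dak_only = sample[sample['A/C'] == 'DAK']

# Match any of several string values.
dak_or_spit = sample[sample['A/C'].isin(['DAK', 'SPIT'])]

# Substring match; na=False treats missing cells as non-matches.
contains_hs = sample[sample['A/C'].str.contains('HS', na=False)]
```

Each expression builds a boolean mask and indexes the frame with it, so masks can be combined with `&` and `|` when several conditions are needed.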
2. Manipulate your dataframe
In [2] :
## Keep BID where we have at least 2 rows
test = df.groupby(by=['BID', 'Venue', 'DISPLAY']).count()
test = test[test['A/C']>1]
bids = test.reset_index().BID.tolist()
# Here if there is already `DHS` and `DS` in the column `A/C`, I want to keep them
df.loc[df['A/C']=='DHS', 'Aircraft'] = 'DHS'
df.loc[df['A/C']=='DS', 'Aircraft'] = 'DS'
# For each BID that has at least 2 rows, DAK becomes DHS and SPIT becomes DS
for bid in bids:
    df.loc[(df['BID']==bid) & (df['A/C']=='DAK'), 'Aircraft'] = 'DHS'
    df.loc[(df['BID']==bid) & (df['A/C']=='SPIT'), 'Aircraft'] = 'DS'
# Keep only the rows that received an Aircraft value, then drop the old A/C column
df = df[df['Aircraft'].notnull()].drop(columns=['A/C'])
df
Out [2]:
Index  Venue       DISPLAY  Date                 BID  Aircraft
475    SHAWBURY    DISPLAY  2008-07-24 00:00:00  188  DHS
476    SHAWBURY    DISPLAY  2008-07-24 00:00:00  188  DS
477    COTTESMORE  DISPLAY  None                 757  DS
478    COTTESMORE  DISPLAY  None                 757  DHS
484    SUNDERLAND  DISPLAY  None                 333  DS
490    WINDERMERE  DISPLAY  2008-07-25 00:00:00  138  DHS
504    WIGTON      DISPLAY  2008-07-26 00:00:00  144  DHS
507    WINDERMERE  DISPLAY  2008-07-26 00:00:00  138  DHS
509    SUNDERLAND  DISPLAY  None                 333  DHS
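The same rules can probably be written without the `for` loop. The following is only a sketch of one alternative, not the method used above. It assumes each BID belongs to a single Venue/DISPLAY group (true for this data) and that `A/C` has no missing values. The names `group_size`, `kept`, `renamed` and `result` are introduced here for illustration.

```python
import pandas as pd

data = [
    [475, 'SHAWBURY', 'DAK', 'DISPLAY', '2008-07-24 00:00:00', 188],
    [476, 'SHAWBURY', 'SPIT', 'DISPLAY', '2008-07-24 00:00:00', 188],
    [477, 'COTTESMORE', 'SPIT', 'DISPLAY', None, 757],
    [478, 'COTTESMORE', 'DAK', 'DISPLAY', None, 757],
    [484, 'SUNDERLAND', 'SPIT', 'DISPLAY', None, 333],
    [487, 'EAST FORTUNE', 'SPIT', 'DISPLAY', None, 406],
    [489, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-25 00:00:00', 138],
    [490, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-25 00:00:00', 138],
    [504, 'WIGTON', 'DHS', 'DISPLAY', '2008-07-26 00:00:00', 144],
    [506, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-26 00:00:00', 138],
    [507, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-26 00:00:00', 138],
    [508, 'SUNDERLAND', 'HS', 'DISPLAY', None, 333],
    [509, 'SUNDERLAND', 'DAK', 'DISPLAY', None, 333],
]
df = pd.DataFrame(
    data, columns=['Index', 'Venue', 'A/C', 'DISPLAY', 'Date', 'BID']
).set_index('Index')

# Number of rows in each (BID, Venue, DISPLAY) group, repeated on every row.
group_size = df.groupby(['BID', 'Venue', 'DISPLAY'])['A/C'].transform('size')

# Existing DHS / DS values are kept as they are; everything else becomes NaN.
kept = df['A/C'].where(df['A/C'].isin(['DHS', 'DS']))

# DAK -> DHS and SPIT -> DS, but only in groups with at least 2 rows.
renamed = df['A/C'].map({'DAK': 'DHS', 'SPIT': 'DS'}).where(group_size > 1)

# Prefer the kept value, fall back to the renamed one, drop rows with neither.
result = (
    df.assign(Aircraft=kept.combine_first(renamed))
      .dropna(subset=['Aircraft'])
      .drop(columns=['A/C'])
)
```

`transform('size')` puts the group size on every row, so the "at least 2 rows" rule becomes a plain boolean mask and no list of bids has to be built first.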