How To Delete Row Based On Row Above? Python Pandas
I have a dataset which looks like this:
df = pd.DataFrame({'a': [1, 1, 1, 2, 3, 3, 4], 'b': [1, np.nan, np.nan, 2, 3, np.nan, 4]})
I'm looking to delete all rows which have np.nan in
Solution 1:
You want to find all the rows that have a np.nan in the previous row (the row above). Use shift for that:
df.shift().isnull()
       a      b
0   True   True
1  False  False
2  False   True
3  False   True
4  False  False
5  False  False
6  False   True
Then you want to figure out if anything in that row was nan, so you want to reduce this to a single boolean mask.
df.shift().isnull().any(axis=1)
0     True
1    False
2     True
3     True
4    False
5    False
6     True
dtype: bool
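Then drop the rows flagged by the mask. Since df.drop expects index labels rather than a boolean mask, select the rows with the inverted mask:
df[~df.shift().isnull().any(axis=1)]
   a   b
1  1 NaN
4  3   3
5  3 NaN
Row 0 is dropped as well, because shift() fills the first row with NaN.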
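The steps of this solution can be put together as a minimal, self-contained sketch. The variable names (row_above_has_nan, kept, kept_via_drop) are illustrative and not from the original answer; the drop() variant assumes you prefer passing index labels over boolean indexing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 3, 3, 4],
                   'b': [1, np.nan, np.nan, 2, 3, np.nan, 4]})

# True where the row above contains a NaN. Row 0 is also True,
# because shift() pads the top of the frame with NaN.
row_above_has_nan = df.shift().isnull().any(axis=1)

# Boolean indexing with the inverted mask keeps the remaining rows.
kept = df[~row_above_has_nan]

# Same result with drop(), which needs index labels rather than booleans.
kept_via_drop = df.drop(df.index[row_above_has_nan.to_numpy()])

print(kept.index.tolist())  # [1, 4, 5]
```

If the first row should survive, the null check has to happen before the shift, so that the padding value can be chosen explicitly.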
Solution 2:
Yes, you can create a mask that removes the unwanted rows by combining df.notnull and df.shift:
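# True for rows that contain no NaN
notnull = df.notnull().all(axis=1)
# shift() moves the mask down one row, so each row sees the row above it;
# fill_value=True keeps the first row and keeps the mask boolean
df = df[notnull.shift(fill_value=True)]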
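As a rough, runnable sketch of this mask approach: the helper name drop_rows_below_nan is hypothetical, and the fill_value argument of shift is assumed to be available (pandas 0.24 or later). It treats the first row as having a complete row above it, so that row is kept.

```python
import numpy as np
import pandas as pd


def drop_rows_below_nan(frame: pd.DataFrame) -> pd.DataFrame:
    """Return frame without the rows whose preceding row contains a NaN."""
    # True for rows that have no NaN in any column.
    complete = frame.notnull().all(axis=1)
    # Shift down by one so each row is aligned with the row above it.
    # The first row has nothing above it, so it is filled with True (kept).
    row_above_complete = complete.shift(1, fill_value=True)
    return frame[row_above_complete]


df = pd.DataFrame({'a': [1, 1, 1, 2, 3, 3, 4],
                   'b': [1, np.nan, np.nan, 2, 3, np.nan, 4]})

result = drop_rows_below_nan(df)
print(result.index.tolist())  # [0, 1, 4, 5]
```

Returning a new frame instead of reassigning df leaves the original data available for comparison.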
Solution 3:
Test whether the rows are null with notnull:
In [11]: df.notnull()
Out[11]:
      a      b
0  True   True
1  True  False
2  True  False
3  True   True
4  True   True
5  True  False
6  True   True
In [12]: df.notnull().all(1)
Out[12]:
0     True
1    False
2    False
3     True
4     True
5    False
6     True
dtype: bool
In [13]: df[df.notnull().all(1)]
Out[13]:
   a  b
0  1  1
3  2  2
4  3  3
6  4  4
You can shift down to get whether the above row was NaN:
In [14]: df.notnull().all(1).shift().astype(bool)
Out[14]:
0     True
1     True
2    False
3    False
4     True
5     True
6    False
dtype: bool
In [15]: df[df.notnull().all(1).shift().astype(bool)]
Out[15]:
   a   b
0  1   1
1  1 NaN
4  3   3
5  3 NaN
Note: You can shift upwards with shift(-1).
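To illustrate the upward shift, here is a small sketch that aligns each row with the row below it instead of the row above. It uses shift(-1, fill_value=True) rather than the astype(bool) conversion shown earlier, which is a substitution on my part and assumes pandas 0.24 or later; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 3, 3, 4],
                   'b': [1, np.nan, np.nan, 2, 3, np.nan, 4]})

# True for rows that have no NaN in any column.
complete = df.notnull().all(axis=1)

# Shift up by one so each row is aligned with the row below it.
# The last row has nothing below it, so it is filled with True (kept).
row_below_complete = complete.shift(-1, fill_value=True)

# Keeps only the rows whose following row has no NaN.
kept = df[row_below_complete]
print(kept.index.tolist())  # [2, 3, 5, 6]
```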