How To Sort A Pandas Dataframe By A Column That Has Both Numbers And Strings?
I have a dataframe that looks like this:
col0 col1 col2 col4
1 '1ZE7999' 865545 20 20
2 'R022428' 865584 297 0
3
Solution 1:
pd.to_numeric + sort_values + loc
df.loc[pd.to_numeric(df.col0, errors='coerce').sort_values().index]
col0 col1 col2 col4
3 34 865665 296 0
4 56 865700 297 0
5 100 865628 292 5
1 '1ZE7999' 865545 20 20
2 'R022428' 865584 297 0
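For anyone who wants to run this end to end, here is a minimal sketch. The dataframe is an assumed reconstruction of the question's data, pieced together from the outputs shown in this answer (including the literal quote characters in the string values), so treat the values as illustrative rather than the asker's exact input.

```python
import pandas as pd

# Assumed reconstruction of the question's data, inferred from the outputs
# shown in this answer; the real input may differ.
df = pd.DataFrame(
    {
        "col0": ["'1ZE7999'", "'R022428'", 34, 56, 100],
        "col1": [865545, 865584, 865665, 865700, 865628],
        "col2": [20, 297, 296, 297, 292],
        "col4": [20, 0, 0, 0, 5],
    },
    index=[1, 2, 3, 4, 5],
)

# Values that cannot be parsed as numbers become NaN.
numeric_key = pd.to_numeric(df["col0"], errors="coerce")

# Sort the numeric key, then use its index labels to reorder the original rows.
result = df.loc[numeric_key.sort_values().index]
print(result)
```

The rows whose col0 could not be parsed end up after the numeric rows; they are not sorted among themselves.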
Details
pd.to_numeric with errors='coerce' converts values that cannot be parsed as numbers to NaN:
i = pd.to_numeric(df.col0, errors='coerce')
i
1 NaN
2 NaN
3 34.0
4 56.0
5 100.0
Name: col0, dtype: float64
sort_values sorts the column and, by default, places the NaNs last.
j = i.sort_values()
j
3 34.0
4 56.0
5 100.0
1 NaN
2 NaN
Name: col0, dtype: float64
Observe the index. All you need to do is use the index to reindex the dataframe. Either loc or reindex will do it.
df.loc[j.index]
col0 col1 col2 col4
3 34 865665 296 0
4 56 865700 297 0
5 100 865628 292 5
1 '1ZE7999' 865545 20 20
2 'R022428' 865584 297 0
df.reindex(index=j.index)
col0 col1 col2 col4
3 34 865665 296 0
4 56 865700 297 0
5 100 865628 292 5
1 '1ZE7999' 865545 20 20
2 'R022428' 865584 297 0
If you need to reset the index, that's easily done.
df.loc[j.index].reset_index(drop=True)
col0 col1 col2 col4
0 34 865665 296 0
1 56 865700 297 0
2 100 865628 292 5
3 '1ZE7999' 865545 20 20
4 'R022428' 865584 297 0
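As a side note, on pandas 1.1 or later the same ordering can likely be had in a single call through the key argument of sort_values, which is not what the answer above uses. The sketch below assumes the same reconstructed dataframe as a stand-in for the question's data; ignore_index=True (pandas 1.0+) takes the place of reset_index(drop=True).

```python
import pandas as pd

# Assumed stand-in for the question's data, inferred from the outputs above.
df = pd.DataFrame(
    {
        "col0": ["'1ZE7999'", "'R022428'", 34, 56, 100],
        "col1": [865545, 865584, 865665, 865700, 865628],
        "col2": [20, 297, 296, 297, 292],
        "col4": [20, 0, 0, 0, 5],
    },
    index=[1, 2, 3, 4, 5],
)

# key receives the whole column as a Series and must return a Series of the
# same length; the rows are ordered by the returned values, NaNs last.
by_key = df.sort_values(
    "col0", key=lambda s: pd.to_numeric(s, errors="coerce")
)

# Same sort, but with a fresh 0..n-1 index instead of the original labels.
by_key_reset = df.sort_values(
    "col0", key=lambda s: pd.to_numeric(s, errors="coerce"), ignore_index=True
)
print(by_key_reset)
```

The original col0 values are kept in the result; the coerced numbers are used only for ordering.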
Solution 2:
By using natsort
from natsort import natsorted
df.set_index('col0').reindex(natsorted(df.col0.tolist(), key=lambda y: y.lower())).reset_index()
Out[736]:
col0 col1 col2 col4
0 34 865665 296 0
1 56 865700 297 0
2 100 865628 292 5
3 '1ZE7999' 865545 20 20
4 'R022428' 865584 297 0
Solution 3:
Use index_humansorted from natsort
import natsort
df = df.iloc[natsort.index_humansorted(df['col0'])]
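If installing natsort is not an option, the idea behind natural sorting can be roughly approximated with the standard library. The sketch below is not natsort's actual implementation and ignores locale handling; natural_key and the sample values are hypothetical names and data chosen to mirror the outputs above. It splits each value into digit and non-digit chunks so that digit runs compare as integers, and returns positions in the spirit of index_humansorted.

```python
import re

def natural_key(value):
    """Hypothetical helper: split text into digit and non-digit chunks.

    Digit runs are compared as integers and always sort before text chunks,
    so "100" comes after "56" instead of before it.
    """
    parts = re.split(r"(\d+)", str(value).lower())
    return [
        (0, int(p)) if p.isdecimal() else (1, p)
        for p in parts
        if p != ""
    ]

# Assumed sample values, including the literal quote characters seen above.
values = ["'1ZE7999'", "'R022428'", 34, 56, 100]

# Positions that would put the values in natural order.
order = sorted(range(len(values)), key=lambda k: natural_key(values[k]))
print(order)
print([values[k] for k in order])

# With a dataframe, these positions could be passed to df.iloc[order].
```

The quoted strings land after the plain numbers here only because they start with a quote character; without the quotes, a value such as 1ZE7999 would sort ahead of 34, since its leading digit run is 1.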