Naturally Sort A List Moving Alphanumeric Values To The End
Solution 1:
natsort has a function natsort_key that converts an item into a tuple, and the sorting is then done on that tuple. So you can use it as:
import natsort as ns
sorted(c, key=lambda x: (not x.isdigit(), *ns.natsort_key(x)))
This produces:
>>> sorted(c, key=lambda x: (not x.isdigit(), *ns.natsort_key(x)))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '2Y', '3Y', '4Y', '5Y', '9Y']
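If natsort is not available, the same idea can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: natural_key below is a hypothetical stand-in for natsort_key, not natsort's actual algorithm, and it only splits strings into plain text and integer chunks so that '10' sorts after '2'.

```python
import re

# Hypothetical stand-in for natsort.natsort_key (NOT natsort's real algorithm):
# split a string into alternating text / digit chunks and convert the digit
# chunks to int so that '10' compares greater than '2'.
_DIGITS = re.compile(r"(\d+)")

def natural_key(s):
    parts = _DIGITS.split(s)  # e.g. '10Y' -> ['', '10', 'Y']
    # With one capturing group, digit chunks always sit at odd indices,
    # so every key holds the same type at the same tuple position.
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

c = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
     '5', '5Y', '6', '7', '8', '9', '9Y']

# False sorts before True, so pure-digit strings come first;
# ties are broken by the natural key, unpacked into the outer tuple.
result = sorted(c, key=lambda x: (not x.isdigit(), *natural_key(x)))
print(result)
```

For this list the result matches the natsort output shown above.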
You can also use it without iterable unpacking: the key is then a 2-tuple whose second element is the tuple returned by natsort_key, so when two keys tie on the first item, Python breaks the tie by comparing the natsort_key results:
sorted(c, key=lambda x: (not x.isdigit(), ns.natsort_key(x)))
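Both forms work because Python compares tuples lexicographically, and a nested tuple in the deciding position is itself compared element by element. A small check with hand-written keys; the inner tuples only imitate a natural-sort key and are illustrative assumptions, not real natsort_key output:

```python
# Keys for '10Y', '2Y' and '7' written by hand; the inner tuples imitate a
# natural-sort key of the form (text chunk, integer chunk, text chunk).
inner_10y = ('', 10, 'Y')
inner_2y = ('', 2, 'Y')
inner_7 = ('', 7, '')

flat = [(True, *inner_10y), (True, *inner_2y), (False, *inner_7)]
nested = [(True, inner_10y), (True, inner_2y), (False, inner_7)]

# Both forms order the keys identically: the boolean decides first,
# then the natural-sort tuple breaks the tie.
order_flat = sorted(range(3), key=lambda i: flat[i])
order_nested = sorted(range(3), key=lambda i: nested[i])
print(order_flat, order_nested)  # [2, 1, 0] [2, 1, 0]
```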
Solution 2:
You can actually perform this using natsorted and the correct choice of key.
>>> ns.natsorted(c, key=lambda x: (not x.isdigit(), x))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '2Y', '3Y', '4Y', '5Y', '9Y']
The key returns a tuple with the original input as the second element. Strings that are digits get placed at the front, all others at the back, then the subsets are sorted individually.
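The claim that the tuple key amounts to "partition, then sort each group" can be checked directly. A minimal stdlib sketch, assuming a hypothetical natural_key in place of natsort's own key logic:

```python
import re

_DIGITS = re.compile(r"(\d+)")

def natural_key(s):
    # Hypothetical stand-in for natsort's natural key (not its real algorithm).
    parts = _DIGITS.split(s)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

c = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
     '5', '5Y', '6', '7', '8', '9', '9Y']

# Sorting on (flag, natural key) in a single pass ...
one_pass = sorted(c, key=lambda x: (not x.isdigit(), natural_key(x)))

# ... gives the same order as splitting into two groups and sorting each one.
digits = sorted((x for x in c if x.isdigit()), key=natural_key)
others = sorted((x for x in c if not x.isdigit()), key=natural_key)
combined = digits + others

print(combined == one_pass)  # True
```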
As a side note, Willem Van Onsem's solution uses natsort_key, which has been deprecated as of natsort version 3.0.4 (if you turn on DeprecationWarning in your interpreter you will see the warning, and the function is now undocumented). It is also inefficient; the preferred approach is natsort_keygen, which returns a natural sorting key function. natsort_key calls natsort_keygen under the hood, so for every input it creates a new key function and then calls it only once.
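The difference between building the key function once and rebuilding it per input can be sketched with the standard library. Both functions below are hypothetical illustrations of the two patterns, not natsort's code: make_natural_key plays the role of natsort_keygen, and natural_key_per_call mimics what the deprecated natsort_key does.

```python
import re

def make_natural_key():
    """Hypothetical factory in the spirit of natsort_keygen: do the setup
    (here, compiling a regex) once and return a reusable key function."""
    digits = re.compile(r"(\d+)")

    def key(s):
        parts = digits.split(s)
        return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

    return key

def natural_key_per_call(s):
    """Hypothetical wrapper mimicking the deprecated pattern: build a fresh
    key function for every single input, then call it once."""
    return make_natural_key()(s)

c = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
     '5', '5Y', '6', '7', '8', '9', '9Y']

natural_key = make_natural_key()  # build once, reuse for every element
fast = sorted(c, key=natural_key)
slow = sorted(c, key=natural_key_per_call)
print(fast == slow)  # True
```

Both give the same order; only the per-element overhead differs. In this toy version the overhead is small, because the re module caches compiled patterns, so it illustrates the structure rather than the magnitude of natsort's slowdown.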
Below I repeat the tests from another answer, adding my natsorted solution as well as timings of the other solutions using natsort_keygen instead of natsort_key.
In [13]: %timeit sorted(d, key=lambda x: (not x.isdigit(), ns.natsort_key(x)))
1 loop, best of 3: 33.3 s per loop
In [14]: natsort_key = ns.natsort_keygen()
In [15]: %timeit sorted(d, key=lambda x: (not x.isdigit(), natsort_key(x)))
1 loop, best of 3: 11.2 s per loop
In [16]: %timeit sorted(ns.natsorted(d), key=str.isdigit, reverse=True)
1 loop, best of 3: 9.77 s per loop
In [17]: %timeit ns.natsorted(d, key=lambda x: (not x.isdigit(), x))
1 loop, best of 3: 23.8 s per loop
Solution 3:
I'm grateful to Willem Van Onsem for posting his answer. However, I should note that the original approach is considerably faster (roughly 3.5 times in the timings below). Taking PM 2Ring's suggestions into account, here are some benchmarks of the two methods:
Setup
import natsort as ns

c = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
     '5', '5Y', '6', '7', '8', '9', '9Y']
d = c * (1000000 // len(c) + 1)  # approximately 1M elements (1,000,008)
Willem Van Onsem's solution
%timeit sorted(d, key=lambda x: (not x.isdigit(), ns.natsort_key(x)))
1 loop, best of 3: 2.78 s per loop
Original (with PM 2Ring's enhancement)
%timeit sorted(ns.natsorted(d), key=str.isdigit, reverse=True)
1 loop, best of 3: 796 ms per loop
The original is likely fast because Timsort is highly optimised for nearly sorted input: after natsorted, the list is already in natural order, so the second sort on the cheap str.isdigit key has little work left to do.
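The two-pass version also relies on sort stability: Python's sort keeps elements with equal keys in their original order, even with reverse=True, so the second pass only moves the digit strings ahead of the rest without disturbing the natural order. A minimal stdlib sketch checking that both approaches agree; natural_key is a hypothetical stand-in for natsort, and the small d is an assumption standing in for the 1M-element benchmark list:

```python
import re

_DIGITS = re.compile(r"(\d+)")

def natural_key(s):
    # Hypothetical stand-in for natsort's natural key (not its real algorithm).
    parts = _DIGITS.split(s)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

c = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
     '5', '5Y', '6', '7', '8', '9', '9Y']
d = c * 3  # small repeated list instead of ~1M elements

# One pass: composite key (digit flag, natural key).
one_pass = sorted(d, key=lambda x: (not x.isdigit(), natural_key(x)))

# Two passes: natural order first, then a stable sort on a cheap boolean key.
# reverse=True puts True (digit strings) first and still preserves stability.
natural = sorted(d, key=natural_key)
two_pass = sorted(natural, key=str.isdigit, reverse=True)

print(one_pass == two_pass)  # True
```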
Sanity Check
x = sorted(d, key=lambda x: (not x.isdigit(), ns.natsort_key(x)))
y = sorted(ns.natsorted(d), key=str.isdigit, reverse=True)
all(i == j for i, j in zip(x, y))
True