Solution 1:
Sorry, I didn't notice that you're not merely updating the field but actually want to replace a number at the end. Even so, it's much better to properly convert the number to Roman numerals than to map every possible occurrence (what would happen with your code if there is a number larger than 25?). Here's one way to do it:
ROMAN_MAP = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'), (90, 'XC'),
             (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]

def romanize(data):
    if not data or not isinstance(data, str):  # we know how to work with strings only
        return data
    data = data.rstrip()  # remove potential extra whitespace at the end
    space_pos = data.rfind(" ")  # find the last space before the number
    if space_pos != -1:
        try:
            number = int(data[space_pos + 1:])  # get the number at the end
            roman_number = ""
            for i, r in ROMAN_MAP:  # loop-reduce substitution based on the ROMAN_MAP
                while number >= i:
                    roman_number += r
                    number -= i
            return data[:space_pos + 1] + roman_number  # put everything back together
        except (TypeError, ValueError):
            pass  # couldn't extract a number
    return data
So now if we create your data frame as:
import pandas as pd

HSP_OLD = pd.DataFrame({"tryl": ["SAF/HSP: Secondary diagnosis E code 1",
                                 None,  # non-string values are returned unchanged
                                 "SAF/HSP: Secondary diagnosis E code 11",
                                 "Something else without a number at the end"]})
We can now easily apply our function over the whole column with:
HSP_OLD['tryl'] = HSP_OLD['tryl'].apply(romanize)
Which results in:
                                         tryl
0       SAF/HSP: Secondary diagnosis E code I
1                                        None
2      SAF/HSP: Secondary diagnosis E code XI
3  Something else without a number at the end
Of course, you can adapt the romanize() function to your needs to search for any number within your string and turn it into Roman numerals. This is just an example of how to quickly find the number at the end of the string.
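As a rough sketch of that adaptation, assuming you want every standalone run of digits converted rather than only the trailing one, re.sub accepts a function as the replacement. The names to_roman, _replace_match and romanize_all are made up for this illustration, and leaving values outside 1 to 3999 untouched is an assumption of the sketch, not something the original code does.

```python
import re

ROMAN_MAP = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'), (90, 'XC'),
             (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]


def to_roman(number):
    # hypothetical helper: the same loop-reduce substitution used in romanize()
    roman_number = ""
    for i, r in ROMAN_MAP:
        while number >= i:
            roman_number += r
            number -= i
    return roman_number


def _replace_match(match):
    # called by re.sub for every match; returns the replacement text
    number = int(match.group())
    if not 1 <= number <= 3999:  # assumption: leave out-of-range values as they are
        return match.group()
    return to_roman(number)


def romanize_all(data):
    if not isinstance(data, str):  # pass None and other non-strings through unchanged
        return data
    # \b keeps digits glued to letters (e.g. "v2") from being converted
    return re.sub(r"\b\d+\b", _replace_match, data)
```

It can be applied the same way as before, e.g. HSP_OLD['tryl'].apply(romanize_all).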
Solution 2:
You need to keep the order of the items, and start searching with the longest substring.
You may use an OrderedDict here. To initialize it, use a list of tuples. You can reverse it right away when initializing, or do it later.
import collections
import pandas as pd
# My test data
HSP_OLD = pd.DataFrame({'tryl':['1. Text', '11. New Text', '25. More here']})
d_hsp_lst=[("1","I"),("2","II"),("3","III"),("4","IV"),("5","V"),("6","VI"),("7","VII"),("8","VIII"), ("9","IX"),("10","X"),("11","XI"),("12","XII"),("13","XIII"),("14","XIV"),("15","XV"), ("16","XVI"),("17","XVII"),("18","XVIII"),("19","XIX"),("20","XX"),("21","XXI"), ("22","XXII"),("23","XXIII"),("24","XXIV"),("25","XXV")]
d_hsp = collections.OrderedDict(d_hsp_lst) # Creating the OrderedDict
d_hsp = collections.OrderedDict(reversed(d_hsp.items()))  # Here, reversing
>>> HSP_OLD['tryl'] = HSP_OLD['tryl'].replace(d_hsp, regex=True)
0 I. Text
1 XI. New Text
2 XXV. More here
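To see why the order matters, here is a small sketch outside pandas that uses only re. It assumes the replacements are applied one after another in the given order. The helper replace_in_order and the shortened pairs list are hypothetical and only there for illustration.

```python
import re

# a shortened version of the mapping, shortest keys first
pairs = [("1", "I"), ("2", "II"), ("11", "XI"), ("25", "XXV")]


def replace_in_order(text, ordered_pairs):
    # apply each (pattern, replacement) pair in turn, in the order given
    for pattern, replacement in ordered_pairs:
        text = re.sub(pattern, replacement, text)
    return text


# longest keys first, so "11" is tried before "1"
longest_first_pairs = sorted(pairs, key=lambda p: len(p[0]), reverse=True)

shortest_first = replace_in_order("11. New Text", pairs)
longest_first = replace_in_order("11. New Text", longest_first_pairs)

print(shortest_first)  # each "1" was replaced separately
print(longest_first)   # "11" was replaced as a whole
```

With the short keys first, "11" turns into "II" because each "1" is replaced on its own. With the longest keys first it becomes "XI" as intended.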
