Extract Specific Letters From Text Using Regex And Compare With Dictionary
Solution 1:
We can use .find to get the code word, if it exists, and then use the dictionary to map the code word to its code number. We can use the dictionary .get method to return the null code for missing or unknown code words. This version returns None if it encounters bad data: a name that doesn't contain '-', or a name that doesn't have either 8 or 5 letters before the '-'.
env_code = {
    'ICS': '1',
    'IGW': '2',
    'RTL': '3',
    'TDZ': '4',
}
null_code = '9'

def get_env_code(name):
    idx = name.find('-')
    if idx == 8:
        # code may be valid
        code = name[idx-3:idx]
    elif idx == 5:
        # code is missing
        code = ''
    else:
        # Bad name
        return None
    return env_code.get(code, null_code)

# test
data = [
    'AABBBICS-CCCDDD001',
    'AABBBIGW-CCCDDD001',
    'AABBBRTL-CCCDDD001',
    'AABBBTDZ-CCCDDD001',
    'USNYCRTL-LANDCE001',
    'AABBBXYZ-CCCDDD001',
    'AABBB-CCCDDD001',
    'BADDATA',
]
for s in data:
    print(s, get_env_code(s))
output
AABBBICS-CCCDDD001 1
AABBBIGW-CCCDDD001 2
AABBBRTL-CCCDDD001 3
AABBBTDZ-CCCDDD001 4
USNYCRTL-LANDCE001 3
AABBBXYZ-CCCDDD001 9
AABBB-CCCDDD001 9
BADDATA None
Here's a simpler version that returns the null code instead of None for bad data.
def get_env_code(name):
    idx = name.find('-')
    code = name[idx-3:idx] if idx == 8 else ''
    return env_code.get(code, null_code)
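Both versions rest on two small behaviours: str.find returns -1 when there is no match, and dict.get falls back to a default for unknown keys. The sketch below only illustrates those two calls on the sample names from the test data; the variable names are chosen for illustration. It also shows why the idx == 8 check matters: a -1 index would still produce a slice rather than an error.

```python
env_code = {'ICS': '1', 'IGW': '2', 'RTL': '3', 'TDZ': '4'}
null_code = '9'

# str.find returns the index of the first match, or -1 when there is none.
dash_found = 'AABBBICS-CCCDDD001'.find('-')
dash_missing = 'BADDATA'.find('-')

# Without the idx == 8 guard, an index of -1 still slices without error:
# name[-4:-1] silently picks characters near the end of the string.
name = 'BADDATA'
unguarded = name[dash_missing - 3:dash_missing]

# dict.get returns the default for unknown or empty code words.
known = env_code.get('RTL', null_code)
unknown = env_code.get('XYZ', null_code)
empty = env_code.get('', null_code)

print(dash_found, dash_missing, repr(unguarded), known, unknown, empty)
```

Because the unguarded slice is not a key in the dictionary, it would happen to map to the null code here, but relying on that is fragile; the explicit index check makes the intent clear.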
Solution 2:
If you're just checking whether a member of ENVIRONMENTCODE is found within each test string, then a regex is not necessary. You can just use the Python keyword in, e.g.
ENVIRONMENTCODE = {
    'ICS': '1',
    'IGW': '2',
    'RTL': '3',
    'TDZ': '4'
}
NULLCODE = {
    'NULL': '9'
}

def environment_code(test_string, code_dict):
    if '-' not in test_string:
        return 'no dash'
    for code, value in code_dict.items():
        if code in test_string:
            return value
    return NULLCODE['NULL']

to_test = ['AABBBICS-CCCDDD001',
           'AABBBIGW-CCCDDD001',
           'AABBBRTL-CCCDDD001',
           'AABBBTDZ-CCCDDD001']
for test_str in to_test:
    print(environment_code(test_str, ENVIRONMENTCODE))
The problem with your original code was that you were trying to do test_string in code_dict, which only checks for exact matches between the string under test and the keys within the dictionary.
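To make the difference concrete, here is a small sketch comparing the two membership tests. It also shows a caveat of the substring approach: in matches anywhere in the string, including after the dash. The name 'AABBB-ICSDDD001' is a made-up example used only to show that case, and limiting the search to the part before the dash is one possible workaround, not something the original code does.

```python
ENVIRONMENTCODE = {'ICS': '1', 'IGW': '2', 'RTL': '3', 'TDZ': '4'}

test_string = 'AABBBRTL-CCCDDD001'

# Dictionary membership compares the whole string against the keys.
whole_string_is_key = test_string in ENVIRONMENTCODE

# Substring membership asks whether each key occurs inside the string.
keys_in_string = [code for code in ENVIRONMENTCODE if code in test_string]

# Caveat: a substring test matches anywhere, even after the dash.
tricky = 'AABBB-ICSDDD001'  # hypothetical name, for illustration only
matches_anywhere = [code for code in ENVIRONMENTCODE if code in tricky]

# One way to avoid that: search only the part before the first dash.
prefix = tricky.split('-', 1)[0]
matches_in_prefix = [code for code in ENVIRONMENTCODE if code in prefix]

print(whole_string_is_key, keys_in_string, matches_anywhere, matches_in_prefix)
```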
Solution 3:
My proposal:
def environmentcode(s):
    if "-" not in s:        #(**)
        return None         #(**)
    h, t = s.split("-")
    code = h.strip()[5:]
    return ENVIRONMENTCODE.get(code, 9)

data = "AABBBICS-CCCDDD001 AABBBIGW-CCCDDD001 AABBBRTL-CCCDDD001 AABBBTDZ-CCCDDD001 USNYCRTL-LANDCE001 AABBB-CCCDDD001 something"
for s in data.split():
    print(s, "-->", environmentcode(s))
Output:
AABBBICS-CCCDDD001 --> 1
AABBBIGW-CCCDDD001 --> 2
AABBBRTL-CCCDDD001 --> 3
AABBBTDZ-CCCDDD001 --> 4
USNYCRTL-LANDCE001 --> 3
AABBB-CCCDDD001 --> 9
something --> None
#---------------------------------------------------------
# Filtering text with regex. In this case, (**) not needed.
text = """AABBBICS-CCCDDD001 Alice was beginning to get very tired of sitting by her sister on the bank... AABBBIGW-CCCDDD001 AABBBRTL-CCCDDD001 AABBBTDZ-CCCDDD001 USNYCRTL-LANDCE001 AABBB-CCCDDD001 AABBBXYZ-CCCDDD001 something"""
import re
data = re.findall(r"\b[A-Z]{5,8}-[A-Z]{6}001\b", text)
for s in data:
    print(s, "-->", environmentcode(s))
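As a variation on the regex filter, the pattern itself can capture the three-letter code word, so no slicing is needed afterwards. This is only a sketch: the pattern assumes names always have five leading capitals, an optional three-letter code, a dash, six capitals and the suffix 001, as in the sample data. CODE_PATTERN and extract_codes are hypothetical names, and the dictionary is repeated here with string values so the block runs on its own.

```python
import re

ENVIRONMENTCODE = {'ICS': '1', 'IGW': '2', 'RTL': '3', 'TDZ': '4'}
NULL_CODE = '9'

# Group 1 captures the optional three-letter code word before the dash.
CODE_PATTERN = re.compile(r"\b[A-Z]{5}([A-Z]{3})?-[A-Z]{6}001\b")

def extract_codes(text):
    """Return (name, code number) pairs for every matching name in text."""
    pairs = []
    for match in CODE_PATTERN.finditer(text):
        # group(1) is None when the optional code word is absent.
        code_word = match.group(1) or ''
        pairs.append((match.group(0), ENVIRONMENTCODE.get(code_word, NULL_CODE)))
    return pairs

sample = ("AABBBICS-CCCDDD001 Alice was beginning to get very tired "
          "AABBB-CCCDDD001 AABBBXYZ-CCCDDD001 something")
pairs = extract_codes(sample)
for name, number in pairs:
    print(name, "-->", number)
```

Using finditer rather than findall keeps both the full name and the captured group available from the same match object.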