Parse A Custom Log File In Python
Solution 1:
You don't need to be that precise with your regex:
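import re
# Hypothetical path; replace it with the path to your own log file.
name = "app.log"
# Groups: date, time, type (inside the brackets), rest of the line.
log_pattern = re.compile(r"([0-9\-]*)T([0-9\-:.+]*)\s*\[([^]]*)\](.*)")
with open(name, "r") as f:
    for line in f:
        match = log_pattern.match(line)
        if not match:
            continue  # skip lines that don't look like log entries
        grps = match.groups()
        print("Log line:")
        print(f" date:{grps[0]},\n time:{grps[1]},\n type:{grps[2]},\n text:{grps[3]}")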
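To try the pattern without a log file, it can be wrapped in a small function and run against sample strings. The function name `parse_log_line` and the sample line are hypothetical. The sample only assumes the `<date>T<time> [<type>] <text>` shape that the pattern expects, since the actual log format isn't shown here.

```python
import re

# Same pattern as above: date, time, type (inside brackets), rest of the line.
LOG_PATTERN = re.compile(r"([0-9\-]*)T([0-9\-:.+]*)\s*\[([^]]*)\](.*)")


def parse_log_line(line):
    """Return a dict of fields for a matching line, or None otherwise.

    parse_log_line is a hypothetical helper for illustration.
    """
    match = LOG_PATTERN.match(line)
    if not match:
        return None
    date, time_part, line_type, text = match.groups()
    # strip() drops the space after "]" and any trailing newline.
    return {"date": date, "time": time_part, "type": line_type, "text": text.strip()}


# Hypothetical sample line in the assumed format.
sample = "2021-06-01T12:34:56.789+00:00 [INFO] Service started\n"
print(parse_log_line(sample))
# {'date': '2021-06-01', 'time': '12:34:56.789+00:00', 'type': 'INFO', 'text': 'Service started'}
print(parse_log_line("not a log line"))
# None
```

Returning `None` for non-matching lines keeps the caller's loop simple, just like the `continue` in the loop above.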
You could be even less precise than that. For example, r"(.*)T([^\s]*)\s*\[([^]]*)\](.*)" works too. A handy tool for testing regular expressions is regex101.
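A quick way to compare the two patterns is to run both on the same lines. On a typical line they agree. However, the leading `(.*)` in the looser pattern is greedy, so it can latch onto a later capital "T" in the message when that "T" is followed by a bracketed token. Both sample lines are hypothetical.

```python
import re

STRICT = re.compile(r"([0-9\-]*)T([0-9\-:.+]*)\s*\[([^]]*)\](.*)")
LOOSE = re.compile(r"(.*)T([^\s]*)\s*\[([^]]*)\](.*)")

# Hypothetical sample lines.
typical = "2021-06-01T12:34:56.789+00:00 [INFO] Service started"
tricky = "2021-06-01T12:00:00 [INFO] Started Task [42] done"

# Both patterns split a typical line the same way.
print(STRICT.match(typical).groups() == LOOSE.match(typical).groups())  # True

# The greedy (.*) backtracks only as far as the last workable "T",
# which here is the "T" in "Task", so the date group swallows too much.
print(STRICT.match(tricky).groups())
# ('2021-06-01', '12:00:00', 'INFO', ' Started Task [42] done')
print(LOOSE.match(tricky).groups())
# ('2021-06-01T12:00:00 [INFO] Started ', 'ask', '42', ' done')
```

So a looser pattern is fine as long as you know your messages can't trip it up. Otherwise the character classes in the stricter version are what keep the date group anchored.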
Solution 2:
When parsing, a good rule of thumb is to stop trying to do everything in one shot (even though it is fun). For example, writing one big regex to parse everything:
re.findall("...", TEXT)
Or extracting a value from a piece of text in a single (sometimes chained) line of code:
LINE.split("...")[...].split("...")[...]
Instead, decompose the logic into a sequence of easy steps (typically with assignment to intermediate variables), where each step prepares the way for another easy step. In your case, those steps might be:
time, rest = line.split(' [', 1)
line_type, msg = rest.split('] ', 1)
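Combined, the two steps can be wrapped in a function and tried on a sample line. The function name `split_log_line` and the sample line are hypothetical. The sketch assumes every line has the form `<timestamp> [<type>] <message>`. It uses `timestamp` rather than `time` so the variable doesn't shadow the standard-library `time` module.

```python
def split_log_line(line):
    """Split one log line into (timestamp, type, message).

    split_log_line is a hypothetical helper; it assumes the line has the
    form "<timestamp> [<type>] <message>" and raises ValueError otherwise
    (tuple unpacking fails when a separator is missing).
    """
    line = line.rstrip("\n")
    # Step 1: everything before the first " [" is the timestamp.
    timestamp, rest = line.split(" [", 1)
    # Step 2: everything before the first "] " is the line type.
    line_type, msg = rest.split("] ", 1)
    return timestamp, line_type, msg


sample = "2021-06-01T12:34:56.789+00:00 [INFO] Service started\n"
print(split_log_line(sample))
# ('2021-06-01T12:34:56.789+00:00', 'INFO', 'Service started')
```

Because each step is a separate assignment, you can print or inspect `rest` when a line doesn't split the way you expect.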
And in the real world of messy data, you sometimes need to add error-handling or sanity-checking logic in between the small steps.
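As an illustration of such checks, the sketch below uses `str.partition` to test each separator, since it never raises. It also uses `datetime.fromisoformat` to confirm that the timestamp actually parses. Lines that fail a check return `None` instead of crashing the loop. The helper name `parse_checked` is hypothetical, and so is the assumption that timestamps are ISO 8601 strings like the sample's.

```python
from datetime import datetime


def parse_checked(line):
    """Return (timestamp, type, message), or None if a sanity check fails.

    parse_checked is hypothetical; it assumes ISO 8601 timestamps.
    """
    line = line.rstrip("\n")

    # Step 1: split off the timestamp; sep is "" if " [" is absent.
    timestamp_text, sep, rest = line.partition(" [")
    if not sep:
        return None

    # Sanity check: the timestamp must parse as ISO 8601 (Python 3.7+).
    try:
        timestamp = datetime.fromisoformat(timestamp_text)
    except ValueError:
        return None

    # Step 2: split the type from the message; reject a missing or empty type.
    line_type, sep, msg = rest.partition("] ")
    if not sep or not line_type:
        return None

    return timestamp, line_type, msg


lines = [
    "2021-06-01T12:34:56.789+00:00 [INFO] Service started\n",
    "2021-13-01T00:00:00 [WARN] bad month\n",  # month 13 fails the timestamp check
    "no brackets here\n",  # no " [" separator
]
for raw in lines:
    print(parse_checked(raw))
```

Whether a bad line should be skipped, logged, or raise an error depends on your data. Returning `None` just leaves that decision to the caller.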