
Parse A Custom Log File In Python

I have a log file that contains newline characters. Sample file: 2019-02-12T00:01:03.428+01:00 [Error] ErrorCode {My error: 'A'} - - - 00000000-0000-0000-6936-008007000000 2019-02-12T00:

Solution 1:

You don't need to be that precise with your regex:

import re

log_pattern = re.compile(r"([0-9\-]*)T([0-9\-:.+]*)\s*\[([^]]*)\](.*)")

# name is the path to the log file
with open(name, "r") as f:
    for line in f:
        match = log_pattern.match(line)
        if not match:
            continue  # skip lines that do not start with a timestamp
        grps = match.groups()
        print("Log line:")
        print(f"  date:{grps[0]},\n  time:{grps[1]},\n  type:{grps[2]},\n  text:{grps[3]}")

You could be even less precise than that: for example, r"(.*)T([^\s]*)\s*\[([^]]*)\](.*)" works too. regex101 is a handy tool for testing regular expressions.
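To see what the looser pattern trades away, here is a small sketch that runs both patterns on the sample line from the question and on a made-up line. The `tricky` line and the `fields` helper are assumptions added for illustration and are not part of the original answer. Because the leading `(.*)` is greedy, the looser pattern may latch onto a later `T` inside the message, so the two patterns can disagree on such input.

```python
import re

# Patterns quoted from the answer above.
strict = re.compile(r"([0-9\-]*)T([0-9\-:.+]*)\s*\[([^]]*)\](.*)")
loose = re.compile(r"(.*)T([^\s]*)\s*\[([^]]*)\](.*)")

# Sample line from the question.
sample = ("2019-02-12T00:01:03.428+01:00 [Error] ErrorCode {My error: 'A'} "
          "- - - 00000000-0000-0000-6936-008007000000")

# Hypothetical line (not from the question) whose message contains "T1 [x]".
tricky = "2019-02-12T00:01:03.428+01:00 [Error] retry T1 [x]"


def fields(pattern, line):
    """Return the four captured groups, or None if the line does not match."""
    match = pattern.match(line)
    return match.groups() if match else None


# Compare what each pattern captures on each line.
same_on_sample = fields(strict, sample) == fields(loose, sample)
same_on_tricky = fields(strict, tricky) == fields(loose, tricky)
print(same_on_sample, same_on_tricky)
```

If your messages never contain a capital `T` followed by a bracketed token, the looser pattern is probably fine; otherwise the stricter one is the safer choice.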

Solution 2:

A good piece of advice when parsing is to stop trying to do things in one shot (even though it is fun). For example, writing a big regex to parse everything:

re.findall("...", TEXT)

Or extracting a value from a piece of text in a single (sometimes chained) line of code:

LINE.split("...")[...].split("...")[...]

Instead, decompose the logic into a sequence of easy steps (typically with assignment to intermediate variables), where each step prepares the way for another easy step. In your case, those steps might be:

time, rest = line.split(' [', 1)
line_type, msg = rest.split('] ', 1)

And in the real world of messy data, you sometimes need to add error-handling or sanity-checking logic in between the small steps.
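As a rough sketch of that idea, the two splits above can be wrapped in a function with a check between each step. The names `LogEntry` and `parse_line` are hypothetical, and using `datetime.fromisoformat` (Python 3.7+) as the timestamp sanity check is an assumption about what "valid" means for this log format, not something the original answer prescribes.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):  # hypothetical container, not from the original answer
    time: datetime
    line_type: str
    msg: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line step by step; return None if any step fails."""
    line = line.rstrip("\n")

    # Step 1: split the timestamp from the rest, after checking the separator exists.
    if " [" not in line:
        return None
    time_text, rest = line.split(" [", 1)

    # Step 2: sanity-check the timestamp before going any further.
    try:
        time = datetime.fromisoformat(time_text)
    except ValueError:
        return None

    # Step 3: split the bracketed type from the message.
    if "] " not in rest:
        return None
    line_type, msg = rest.split("] ", 1)

    return LogEntry(time, line_type, msg)


# Usage with the sample line from the question.
entry = parse_line(
    "2019-02-12T00:01:03.428+01:00 [Error] ErrorCode {My error: 'A'} "
    "- - - 00000000-0000-0000-6936-008007000000"
)
print(entry)
```

Returning None on a failed step keeps the caller simple; raising an exception with the offending line would be a reasonable alternative if malformed lines should not be skipped silently.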
