
Parsing Message Parameters Received By A Gsm Modem In Python

I'm trying to parse messages that I receive from a GSM modem in Python. I have a lot of messages to parse, and new ones arrive every couple of hours or so. Here's a

Solution 1:

Using a regexp for this is not a very robust solution because it will not handle variations in behaviour between different phones. In your example the format of the response is

+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"

but other phones will give responses like

+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"

Notice the difference in the fourth parameter: "" versus nothing. According to 3GPP TS 27.005, the syntax for the response in text mode is

+CMGL: <index>,<stat>,<oa/da>,[<alpha>],[<scts>][,<tooa/toda>,<length>]<CR><LF><data><CR><LF>

and <alpha> is indeed optional. Yes, it is probably possible to write a regexp that takes this into account, but then you sort of wander into two problems land.
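To make the variation concrete, here is a small sketch. The pattern name STRICT is hypothetical and only for illustration: it insists on a quoted fourth field, so it matches the first header above but not the second.

```python
import re

# Hypothetical strict pattern: assumes <alpha> is always a quoted string.
STRICT = re.compile(r'\+CMGL: (\d+),"([^"]*)","([^"]*)","([^"]*)","([^"]*)"')

with_alpha = '+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"'
without_alpha = '+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"'

print(STRICT.match(with_alpha) is not None)     # True
print(STRICT.match(without_alpha) is not None)  # False: <alpha> is absent
```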


What I recommend is to switch to proper parsing of the response, that is: start at the very first character and advance in chunks depending on the expected parameter format (and presence). See this answer for a quick and dirty way to just extract the phone number. It is not as robust as the algorithm I describe below (for instance, comma + 2 is assuming too much).

The strictly correct algorithm for parsing responses is:

Match the prefix at the start of the line (e.g. +CMGL:). Then parse the rest, distinguishing the following tokens:

  • white-space ' ' or '\t'
  • comma ','
  • double-quote '"'
  • carriage-return '\r'
  • line-feed '\n'
  • any-non-white-space-non-comma-non-double-quote-non-cr-non-lf-character

For each parameter, start by ignoring any leading white space. Then act on the next character:

  • Comma: the parameter was not present; advance to parsing the next parameter.
  • Carriage return: the next character should be a line feed, and the end of the line is reached.
  • Any other non-white-space, non-comma, non-double-quote, non-CR, non-LF character: this is the start of a numerical parameter. Collect all such characters that follow for this parameter.
  • Double quote: advance to the next double-quote character, which is the end of the string. This is safe and correct because even if the string contains double-quote characters, they are escaped, but not as \".

After a numerical or string parameter, the only legal characters are zero or more white-space characters followed by either a comma or a carriage return.
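The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a complete AT parser. The function name parse_response is hypothetical, values are returned as strings, and absent parameters become None. As an assumption beyond the description, the end of the input string is accepted in place of the final carriage return and line feed.

```python
def parse_response(line, prefix):
    """Split one response line into parameters; absent ones become None."""
    if not line.startswith(prefix):
        raise ValueError("unexpected prefix")
    pos, n = len(prefix), len(line)
    params = []
    while True:
        # Ignore leading white space.
        while pos < n and line[pos] in " \t":
            pos += 1
        if pos < n and line[pos] == '"':
            # String parameter: runs up to the next double quote.
            end = line.find('"', pos + 1)
            if end == -1:
                raise ValueError("unterminated string")
            value = line[pos + 1:end]
            pos = end + 1
        else:
            # Numerical parameter: collect until a delimiter character.
            start = pos
            while pos < n and line[pos] not in ' \t,"\r\n':
                pos += 1
            # Nothing collected means the parameter was not present.
            value = line[start:pos] or None
        params.append(value)
        # Only white space may precede the comma or the line ending.
        while pos < n and line[pos] in " \t":
            pos += 1
        if pos < n and line[pos] == ",":
            pos += 1
            continue
        if pos == n or line.startswith("\r\n", pos):
            return params
        raise ValueError("unexpected character at offset %d" % pos)


header = '+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"\r\n'
print(parse_response(header, "+CMGL:"))
```

With this approach an empty quoted <alpha> comes back as an empty string while a missing one comes back as None, so both phone behaviours shown earlier are handled by the same code path.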

The above might seem a bit overwhelming at first, but it is really not that complicated when you start dealing with it.

Solution 2:

Since I don't have your actual data, I built a sample from the question:

'\xef\xbb\xbfAT+CMGL="ALL"\n\n+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"\nhere\'s message one \n\n+CMGL: 2,"REC READ","+918884100421","","13/04/05,09:40:38+22"\nhere\'s message two\n\n+CMGL: 3,"REC READ","+918884100421","","13/04/05,09:41:04+22"\nhere\'s message three\n\n+CMGL: 4,"REC READ","+918884100421","","13/04/05,10:04:18+22"\nhere\'s message four\n\n+CMGL: 5,"REC READ","+918884100421","","13/04/05,10:04:32+22"\nhere\'s message five\n'

This is the data from your question joined with ''.join(). I then used your regex pattern, only replacing \r\n with \n because my sample uses \n, and I get the expected result. I don't know why findall doesn't work for you.

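import re

def parse(x):
    res = []
    # Raw string, so the backslashes reach the regex engine unchanged;
    # single-quoted, so the double quotes in the pattern are literal.
    match = re.finditer(r'\+CMGL: (\d+),"(.+)","(.+)",(.*),"(.+)"\n(.+)\n', x)
    for each in match:
        # Group 6 is the message text on the line after the header.
        res.append(each.group(6))
    return res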

The result I get is ["here's message one ", "here's message two", "here's message three", "here's message four", "here's message five"]. finditer returns an iterator of match objects; findall, which returns a list of tuples of the groups, also works OK.

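import re

def parse(x):
    res = []
    # findall returns one tuple of groups per match.
    match = re.findall(r'\+CMGL: (\d+),"(.+)","(.+)",(.*),"(.+)"\n(.+)\n', x)
    for each in match:
        # Index 5 is the sixth group: the message text.
        res.append(each[5])
    return res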

Solution 3:

If the message is always on the line right after the +CMGL header, you can use this pattern:

(?:[\n\r]+|^)\+CMGL.*?[\n\r]+(.*?)(?=[\n\r]+|$)

Group 1 contains the message text.
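As a rough illustration, the pattern can be applied with re.findall, which returns group 1 for every match when the pattern has a single capturing group. The names BODY and sample are hypothetical, and the sample data is adapted from the headers shown earlier.

```python
import re

# Pattern from this answer; group 1 is the line following each +CMGL header.
BODY = re.compile(r'(?:[\n\r]+|^)\+CMGL.*?[\n\r]+(.*?)(?=[\n\r]+|$)')

sample = (
    '+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"\r\n'
    "here's message one\r\n"
    '+CMGL: 2,"REC READ","+918884100421","","13/04/05,09:40:38+22"\r\n'
    "here's message two\r\n"
)

messages = BODY.findall(sample)
print(messages)  # ["here's message one", "here's message two"]
```

Note that this captures only the first line after each header, so a message body that itself contains line breaks would be cut short.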
