Parsing Message Parameters Received By A GSM Modem In Python
Solution 1:
Using a regexp for this is not a very robust solution, because it will not handle variations in the behaviour of different phones. In your example the format of the response is

+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"

but other phones will give responses like

+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"

Notice the difference in the fourth parameter: "" versus nothing.
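A minimal Python sketch of the problem, using a hypothetical strict pattern (not the one from the question) that hard-codes <alpha> as a quoted string. It matches the first response but not the second:

```python
import re

# Hypothetical strict pattern: every parameter, including <alpha>, is quoted.
STRICT = re.compile(r'\+CMGL: (\d+),"([^"]*)","([^"]*)","([^"]*)","([^"]*)"')

first = '+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"'
second = '+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"'

print(STRICT.match(first) is not None)   # True
print(STRICT.match(second) is not None)  # False: <alpha> is absent, not ""
```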
Checking 3GPP TS 27.005, the syntax for the response in text mode is

+CMGL: <index>,<stat>,<oa/da>,[<alpha>],[<scts>][,<tooa/toda>,<length>]<CR><LF><data><CR><LF>

and <alpha> is indeed optional. Yes, it is probably possible to write a regexp that takes this into account, but then you sort of wander into "now you have two problems" land.
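To make that concrete, here is a sketch of such a regexp (again a hypothetical pattern) in which <alpha> is either a quoted string or absent. It does tell the two cases apart (group 4 is '' versus None), but it still assumes <scts> is present and leaves <tooa/toda>,<length> uncaptured, which is how the complexity starts to creep in:

```python
import re

# Hypothetical pattern: <alpha> may be a quoted string or be left out entirely.
# <scts> is still assumed present; <tooa/toda>,<length> are not captured.
HEADER = re.compile(r'\+CMGL: (\d+),"([^"]*)","([^"]*)",(?:"([^"]*)")?,"([^"]*)"')

for line in ('+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"',
             '+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"'):
    # Group 4 is '' for an empty quoted <alpha> and None when it is omitted.
    print(repr(HEADER.match(line).group(4)))
```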
What I recommend is that you switch to proper parsing of the response, that is: start at the very first character and advance in chunks depending on the expected format (and presence) of each parameter. See this answer for a quick and dirty way to just extract the phone number. It is not as robust as the algorithm I describe below (for instance, its comma + 2 offset assumes too much).
The correct algorithm for parsing responses is:
Match the prefix at the start of the line (e.g. +CMGL:). Then parse the rest of the line, distinguishing the following tokens:
- white-space (' ' or '\t')
- comma ','
- double-quote '"'
- carriage-return '\r'
- line-feed '\n'
- any-non-white-space-non-comma-non-double-quote-non-cr-non-lf-character
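As a sketch, the token kinds above map onto a small classification function; token_kind and the kind names are made up for illustration:

```python
def token_kind(ch: str) -> str:
    """Return which of the token kinds listed above a single character is."""
    if ch in (" ", "\t"):
        return "white-space"
    if ch == ",":
        return "comma"
    if ch == '"':
        return "double-quote"
    if ch == "\r":
        return "carriage-return"
    if ch == "\n":
        return "line-feed"
    return "other"  # part of a numerical parameter

print([token_kind(c) for c in ' 1,"\r\n'])
# ['white-space', 'other', 'comma', 'double-quote', 'carriage-return', 'line-feed']
```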
For each parameter, start by ignoring any leading white space.
If you get a comma, the parameter was not present; advance to parsing the next parameter.
If you get a carriage return, the next character should be a line feed, and the end of the line has been reached.
If you get a non-white-space-non... character, this is the start of a numerical parameter. Collect all the non-white-space-non... characters that follow for this parameter. After it, the only legal characters are zero or more white-space characters followed by either a comma or a carriage return.
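The numerical branch in isolation, as a sketch: parse_numeric is a hypothetical helper that is handed the index of the first non-white-space-non... character and returns the collected text together with the index of the separator, rejecting anything else:

```python
from typing import Tuple

def parse_numeric(line: str, pos: int) -> Tuple[str, int]:
    """Collect a numerical parameter that starts at line[pos].

    Returns the parameter text and the index of the comma or carriage
    return that follows it.
    """
    start = pos
    # Collect every character that is not white space, ',', '"', CR or LF.
    while pos < len(line) and line[pos] not in ' \t,"\r\n':
        pos += 1
    if pos == start:
        raise ValueError(f"no numerical parameter at index {start}")
    value = line[start:pos]
    # Afterwards only optional white space, then ',' or CR, is legal.
    while pos < len(line) and line[pos] in " \t":
        pos += 1
    if pos >= len(line) or line[pos] not in ",\r":
        raise ValueError(f"unexpected character at index {pos}")
    return value, pos

print(parse_numeric("145 ,5\r\n", 0))  # ('145', 4)
```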
If you get a double-quote character, advance to the next double-quote character; that is the end of the string (this is safe and correct because even if the string should contain double-quote characters, they are escaped, just not as \"). After it, the only legal characters are zero or more white-space characters followed by either a comma or a carriage return.
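Likewise for the string branch, with parse_string again a hypothetical helper. Because the closing quote is simply the next double quote, the comma inside the timestamp needs no special handling:

```python
from typing import Tuple

def parse_string(line: str, pos: int) -> Tuple[str, int]:
    """Read a quoted string parameter whose opening quote is at line[pos].

    Returns the text between the quotes and the index of the comma or
    carriage return that follows the closing quote.
    """
    # The next double quote ends the string (embedded quotes are escaped).
    end = line.find('"', pos + 1)
    if end == -1:
        raise ValueError("unterminated string")
    value = line[pos + 1:end]
    # Afterwards only optional white space, then ',' or CR, is legal.
    pos = end + 1
    while pos < len(line) and line[pos] in " \t":
        pos += 1
    if pos >= len(line) or line[pos] not in ",\r":
        raise ValueError(f"unexpected character at index {pos}")
    return value, pos

print(parse_string('"13/04/05,08:24:36+22"\r\n', 0))  # ('13/04/05,08:24:36+22', 22)
```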
The above might seem a bit overwhelming at first, but it is really not that complicated once you start working with it.
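Putting the steps together, a minimal sketch in Python might look like the function below. parse_header is a hypothetical name; it handles only the header line (the <data> line that follows is read separately), assumes the line still ends in CR LF, and returns parameters as text without converting numbers. Note that it tells an empty string ("" gives '') apart from an omitted parameter (None), which is exactly the difference between the two phones above.

```python
from typing import List, Optional

def parse_header(line: str, prefix: str = "+CMGL:") -> List[Optional[str]]:
    """Split one response header line into its parameters.

    Present parameters come back as text (strings without their quotes,
    numbers unconverted); parameters that were left out come back as None.
    """
    if not line.startswith(prefix):
        raise ValueError(f"line does not start with {prefix!r}")
    n = len(line)
    pos = len(prefix)
    params: List[Optional[str]] = []
    after_comma = False  # True once at least one ',' has been consumed

    def skip_ws(p: int) -> int:
        while p < n and line[p] in " \t":
            p += 1
        return p

    def expect_separator(p: int) -> int:
        # After a parameter only white space, then ',' or CR, is legal.
        p = skip_ws(p)
        if p >= n or line[p] not in ",\r":
            raise ValueError(f"unexpected character at index {p}")
        return p

    while True:
        pos = skip_ws(pos)
        ch = line[pos] if pos < n else ""
        if ch == '"':
            # String: the next double quote closes it.
            end = line.find('"', pos + 1)
            if end == -1:
                raise ValueError("unterminated string")
            params.append(line[pos + 1:end])
            pos = expect_separator(end + 1)
        elif ch in (",", "\r"):
            # Separator straight away: the parameter is not present. A bare
            # CR only counts as an omitted parameter if a comma preceded it.
            if ch == "," or after_comma:
                params.append(None)
        elif ch in ("", "\n"):
            raise ValueError("line must end with CR LF")
        else:
            # Numerical: collect until white space, ',', '"', CR or LF.
            start = pos
            while pos < n and line[pos] not in ' \t,"\r\n':
                pos += 1
            params.append(line[start:pos])
            pos = expect_separator(pos)
        # pos is now at ',' (more parameters follow) or CR (end of line).
        if line[pos] == "\r":
            if line[pos + 1:pos + 2] != "\n":
                raise ValueError("CR not followed by LF")
            return params
        pos += 1
        after_comma = True

print(parse_header('+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"\r\n'))
# ['1', 'REC READ', '+918884100421', '', '13/04/05,08:24:36+22']
print(parse_header('+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"\r\n'))
# ['1', 'REC READ', '+31612123738', None, '08/12/22,11:37:52+04']
```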
Solution 2:
Since I don't have your data, I made a sample:

```python
'\xef\xbb\xbfAT+CMGL="ALL"\n\n+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"\nhere\'s message one \n\n+CMGL: 2,"REC READ","+918884100421","","13/04/05,09:40:38+22"\nhere\'s message two\n\n+CMGL: 3,"REC READ","+918884100421","","13/04/05,09:41:04+22"\nhere\'s message three\n\n+CMGL: 4,"REC READ","+918884100421","","13/04/05,10:04:18+22"\nhere\'s message four\n\n+CMGL: 5,"REC READ","+918884100421","","13/04/05,10:04:32+22"\nhere\'s message five\n'
```

This comes from your question, joined with ''.join(). I then used your regex pattern, only replacing \r\n with \n because my sample uses \n line endings, and got the expected result. I don't know why findall doesn't work for you.
```python
import re

def parse(x):
    res = []
    # Group 6 captures the message text on the line after each +CMGL header.
    match = re.finditer(r'\+CMGL: (\d+),"(.+)","(.+)",(.*),"(.+)"\n(.+)\n', x)
    for each in match:
        res.append(each.group(6))
    return res
```
The result I get is ["here's message one ", "here's message two", "here's message three", "here's message four", "here's message five"]. finditer returns an iterator of match objects; findall works as well, returning a list of tuples with one entry per group:
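```python
import re

def parse(x):
    res = []
    # findall returns one 6-tuple per match; index 5 holds group 6 (the message).
    match = re.findall(r'\+CMGL: (\d+),"(.+)","(.+)",(.*),"(.+)"\n(.+)\n', x)
    for each in match:
        res.append(each[5])
    return res
```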
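Data read straight from the modem has \r\n line endings rather than \n. Instead of replacing them first, the pattern can accept both; the variant below is a hypothetical adaptation, not the pattern from the question. Note the lazy (.+?) for the message: a greedy (.+) would keep the trailing \r, because . matches \r.

```python
import re

# Hypothetical variant: "\r?\n" accepts both CR LF (modem) and bare LF (sample).
CMGL = re.compile(r'\+CMGL: (\d+),"(.+)","(.+)",(.*),"(.+)"\r?\n(.+?)\r?\n')

def parse(x):
    # Group 6 is the message text on the line after each header.
    return [m.group(6) for m in CMGL.finditer(x)]

lf = '+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"\nhere\'s message one\n'
crlf = lf.replace("\n", "\r\n")
print(parse(lf) == parse(crlf) == ["here's message one"])  # True
```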
Solution 3:
If the message always starts on a new line, you can use:

```
(?:[\n\r]+|^)\+CMGL.*?[\n\r]+(.*?)(?=[\n\r]+|$)
```

Group 1 contains the message you need.
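A short usage sketch in Python with re.findall, which returns the group 1 text of every match. The sample is made up in the same shape as the question's data, with the CR LF line endings a modem sends:

```python
import re

# The pattern above, unchanged; group 1 is the line after each +CMGL header.
PATTERN = r'(?:[\n\r]+|^)\+CMGL.*?[\n\r]+(.*?)(?=[\n\r]+|$)'

sample = ('AT+CMGL="ALL"\r\n\r\n'
          '+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"\r\n'
          "here's message one\r\n\r\n"
          '+CMGL: 2,"REC READ","+918884100421","","13/04/05,09:40:38+22"\r\n'
          "here's message two\r\n")

print(re.findall(PATTERN, sample))
```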