Parsing Message Parameters Received By A Gsm Modem In Python
Solution 1:
Using a regexp for this is not a very robust solution because it will not handle variations in behaviour between different phones. In your example the format of the response is
+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"
but other phones will give responses like
+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"
Notice the difference in the fourth parameter: "" versus nothing.
Checking 3GPP TS 27.005, the syntax for the response in text mode is
+CMGL: <index>,<stat>,<oa/da>,[<alpha>],[<scts>][,<tooa/toda>,<length>]<CR><LF><data><CR><LF>
and <alpha> is indeed optional. Yes, it is probably possible to write a regexp that takes this into account, but then you sort of wander into "now you have two problems" land.
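To make the difference concrete, here is a small sketch. The pattern STRICT is a hypothetical one written for this illustration (it is not the pattern from the question); it insists on every parameter after the index being quoted, which is enough to show how such a regexp silently misses the second response format.

```python
import re

# Hypothetical pattern: every parameter after the index must be a quoted string.
STRICT = re.compile(r'\+CMGL: (\d+),"([^"]*)","([^"]*)","([^"]*)","([^"]*)"')

with_alpha = '+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"'
without_alpha = '+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"'

print(STRICT.match(with_alpha) is not None)     # True
print(STRICT.match(without_alpha) is not None)  # False: <alpha> is absent, not ""
```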
What I recommend is to switch to proper parsing of the response, that is: start at the very first character and advance in chunks depending on the expected parameter format (and presence). See this answer for a quick and dirty way to just extract the phone number. It is not as robust as the algorithm I describe below (for instance, comma + 2 is assuming too much).
The absolutely correct algorithm for parsing responses is:
Match the prefix at the start of the line (e.g. +CMGL:). Then start parsing, differentiating between the following tokens:
- white space, ' ' or '\t'
- comma, ','
- double quote, '"'
- carriage return, '\r'
- line feed, '\n'
- any other character (non-white-space, non-comma, non-double-quote, non-CR, non-LF)
For each parameter, start by ignoring any leading white space.
If getting a comma, the parameter was not present; advance to parsing the next parameter.
If getting a carriage return, the next character should be a line feed and the end of the line is reached.
If getting any other character (non-white-space, non-comma, ...), this is the start of a numerical parameter. Collect all following characters of that kind for this parameter. After this the only legal characters should be zero or more white space followed by either a comma or a carriage return.
If getting a double quote character, advance to the next double quote character; that is the end of the string (this is safe and correct because even if the string should contain double quote characters, they are escaped, but not as \"). After this the only legal characters should be zero or more white space followed by either a comma or a carriage return.
The above might seem a bit overwhelming at first, but it is really not that complicated when you start dealing with it.
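As a rough illustration of the steps above, here is a minimal sketch in Python. The function name parse_params and the choice to represent an absent parameter as None are my own assumptions, numeric parameters are left as strings, and only the header line is handled (not the <data> line that follows it).

```python
def parse_params(line, prefix="+CMGL:"):
    """Split one response header line into parameters; None marks an absent one."""
    if not line.startswith(prefix):
        raise ValueError("unexpected prefix")
    i, n = len(prefix), len(line)
    params = []
    while True:
        # Ignore leading white space.
        while i < n and line[i] in " \t":
            i += 1
        if i < n and line[i] == '"':
            # String parameter: it ends at the next double quote.
            end = line.find('"', i + 1)
            if end == -1:
                raise ValueError("unterminated string")
            params.append(line[i + 1:end])
            i = end + 1
        else:
            # Numeric parameter, or an absent one if nothing is collected.
            start = i
            while i < n and line[i] not in ' \t,"\r\n':
                i += 1
            params.append(line[start:i] if i > start else None)
        # Only white space may come before the comma or the end of the line.
        while i < n and line[i] in " \t":
            i += 1
        if i < n and line[i] == ",":
            i += 1
            continue
        if i >= n or line.startswith("\r\n", i):
            return params
        raise ValueError(f"unexpected character {line[i]!r} at offset {i}")


print(parse_params('+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"\r\n'))
# ['1', 'REC READ', '+918884100421', '', '13/04/05,08:24:36+22']
print(parse_params('+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"\r\n'))
# ['1', 'REC READ', '+31612123738', None, '08/12/22,11:37:52+04']
```

Both response variants give five parameters, with the phone number always at index 2, so the caller does not need to care whether <alpha> was sent as "" or left out.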
Solution 2:
Since I don't have your data, I just made a sample:
'\xef\xbb\xbfAT+CMGL="ALL"\n\n+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"\nhere\'s message one \n\n+CMGL: 2,"REC READ","+918884100421","","13/04/05,09:40:38+22"\nhere\'s message two\n\n+CMGL: 3,"REC READ","+918884100421","","13/04/05,09:41:04+22"\nhere\'s message three\n\n+CMGL: 4,"REC READ","+918884100421","","13/04/05,10:04:18+22"\nhere\'s message four\n\n+CMGL: 5,"REC READ","+918884100421","","13/04/05,10:04:32+22"\nhere\'s message five\n'
This comes from your question, using ''.join(). I then use your regex pattern, only replacing \r\n with \n because my sample uses \n, and I get the result. I don't know why findall doesn't work for you.
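    import re

    def parse(x):
        res = []
        # One raw string; the double quotes are literal characters in the pattern.
        match = re.finditer(r'\+CMGL: (\d+),"(.+)","(.+)",(.*),"(.+)"\n(.+)\n', x)
        for each in match:
            # Group 6 is the message text on the line after the header.
            res.append(each.group(6))
        return res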
The result I get is ["here's message one ", "here's message two", "here's message three", "here's message four", "here's message five"]. finditer returns an iterator; findall also works OK:
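    import re

    def parse(x):
        res = []
        # findall returns one tuple of groups per match.
        match = re.findall(r'\+CMGL: (\d+),"(.+)","(.+)",(.*),"(.+)"\n(.+)\n', x)
        for each in match:
            # Index 5 is the sixth group, the message text.
            res.append(each[5])
        return res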
Solution 3:
If the message is always on a new line:
(?:[\n\r]+|^)\+CMGL.*?[\n\r]+(.*?)(?=[\n\r]+|$)
Group 1 contains the message you want.
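For illustration, this is how the pattern might be used with re.findall. The sample string is a shortened, made-up variant of the modem output shown earlier, and the variable names are my own. Note that only the first line of each message is captured, so this sketch assumes single-line messages.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r'(?:[\n\r]+|^)\+CMGL.*?[\n\r]+(.*?)(?=[\n\r]+|$)')

# Shortened, made-up sample in the same shape as the modem output above.
sample = (
    'AT+CMGL="ALL"\n\n'
    '+CMGL: 1,"REC READ","+918884100421","","13/04/05,08:24:36+22"\n'
    "here's message one\n\n"
    '+CMGL: 2,"REC READ","+918884100421","","13/04/05,09:40:38+22"\n'
    "here's message two\n"
)

# With a single capturing group, findall returns the group 1 strings.
messages = PATTERN.findall(sample)
print(messages)  # ["here's message one", "here's message two"]
```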