Pyparsing: SetResultsName For Multiple Elements Get Combined

January 05, 2023

Here is the text I'm parsing:

x ~ normal(mu, 1)
y ~ normal(mu2, 1)

The parser matches those lines using:

model_definition = Group(identifier.setResultsName('random_variable_name')

Solution 1:

Not documented that I know of, but I found something in pyparsing.py:

I changed .setResultsName('model_definition') to .setResultsName('model_definition*'), and all of the matches were listed correctly!

Edit: it is documented, but it is a flag you pass to setResultsName:

setResultsName( string, listAllMatches=False ) - name to be given to tokens matching the element; if multiple tokens within a repetition group (such as ZeroOrMore or delimitedList) the default is to return only the last matching token - if listAllMatches is set to True, then a list of matching tokens is returned.
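To make the difference concrete, here is a minimal sketch comparing the default behavior with listAllMatches=True. The grammar (a run of alphabetic words) and the result name "w" are made up for illustration and are not part of the original question.

```python
from pyparsing import OneOrMore, Word, alphas

word = Word(alphas)

# Default: inside a repetition, the name keeps only the last matching token.
last_only = OneOrMore(word.setResultsName("w"))
r1 = last_only.parseString("a b c")

# listAllMatches=True: the name accumulates every matching token.
all_matches = OneOrMore(word.setResultsName("w", listAllMatches=True))
r2 = all_matches.parseString("a b c")

print(r1["w"])           # a single token (the last one matched)
print(r2["w"].asList())  # a list of every token matched under that name
```

The token list itself is the same in both cases; only what is stored under the name changes.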

Solution 2:

Here is enough of your code to get things to work:

from pyparsing import *

# fake in the bare minimum to parse the given test strings
identifier = Word(alphas, alphanums)
integer = Word(nums)
function_call = identifier + '(' + Optional(delimitedList(identifier | integer)) + ')'
expression = function_call

model_definition = Group(identifier.setResultsName('random_variable_name') + '~' + expression)

sample = """
x ~ normal(mu, 1)
y ~ normal(mu2, 1)
"""

The trailing '*' is there in setResultsName for those cases where you use the short form of setResultsName: expr("name*") vs expr.setResultsName("name", listAllMatches=True). If you prefer calling setResultsName, then I would not use the '*' notation, but would pass the listAllMatches argument.
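As a quick check that the two spellings behave the same, here is a small sketch. The number grammar and the name "values" are hypothetical, chosen only to keep the example short.

```python
from pyparsing import OneOrMore, Word, nums

number = Word(nums)

# Short form: call the expression with a name ending in '*'.
short_form = OneOrMore(number("values*"))

# Long form: call setResultsName and pass the keyword argument explicitly.
long_form = OneOrMore(number.setResultsName("values", listAllMatches=True))

short_values = short_form.parseString("1 2 3")["values"].asList()
long_values = long_form.parseString("1 2 3")["values"].asList()

print(short_values == long_values)
```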

If you are getting names that step on each other, you may need to add a level of Grouping. Here is your solution using listAllMatches=True, by virtue of the trailing '*' notation:

model_definition1 = model_definition('model_definition*')
print(OneOrMore(model_definition1).parseString(sample).dump())

It returns this parse result:

[['x', '~', 'normal', '(', 'mu', '1', ')'], ['y', '~', 'normal', '(', 'mu2', '1', ')']]
- model_definition: [['x', '~', 'normal', '(', 'mu', '1', ')'], ['y', '~', 'normal', '(', 'mu2', '1', ')']]
  [0]:
    ['x', '~', 'normal', '(', 'mu', '1', ')']
    - random_variable_name: x
  [1]:
    ['y', '~', 'normal', '(', 'mu2', '1', ')']
    - random_variable_name: y

Here is a variation that does not use listAllMatches, but adds another level of Group:

model_definition2 = model_definition('model_definition')
print(OneOrMore(Group(model_definition2)).parseString(sample).dump())

gives:

[[['x', '~', 'normal', '(', 'mu', '1', ')']], [['y', '~', 'normal', '(', 'mu2', '1', ')']]]
[0]:
  [['x', '~', 'normal', '(', 'mu', '1', ')']]
  - model_definition: ['x', '~', 'normal', '(', 'mu', '1', ')']
    - random_variable_name: x
[1]:
  [['y', '~', 'normal', '(', 'mu2', '1', ')']]
  - model_definition: ['y', '~', 'normal', '(', 'mu2', '1', ')']
    - random_variable_name: y

In both cases, I see the full content being returned, so I don't quite understand what you mean by "if you return multiple, it fails to split out each child."
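If the goal is to pull each random_variable_name back out in code rather than read it from dump(), something like the following sketch should work for both variations. It reuses the stand-in grammar from this answer; the variable names names1 and names2 are just for illustration.

```python
from pyparsing import (Group, OneOrMore, Optional, Word,
                       alphanums, alphas, delimitedList, nums)

# Same stand-in grammar as above.
identifier = Word(alphas, alphanums)
integer = Word(nums)
function_call = identifier + "(" + Optional(delimitedList(identifier | integer)) + ")"
model_definition = Group(identifier.setResultsName("random_variable_name") + "~" + function_call)

sample = """
x ~ normal(mu, 1)
y ~ normal(mu2, 1)
"""

# Variation 1: listAllMatches via the trailing '*'; the name holds every group.
result1 = OneOrMore(model_definition("model_definition*")).parseString(sample)
names1 = [md["random_variable_name"] for md in result1["model_definition"]]

# Variation 2: extra Group level; each outer group has its own model_definition.
result2 = OneOrMore(Group(model_definition("model_definition"))).parseString(sample)
names2 = [entry["model_definition"]["random_variable_name"] for entry in result2]

print(names1)
print(names2)
```

Variation 1 keeps the result flatter, while variation 2 gives each match its own namespace, which helps when several names would otherwise step on each other.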
