
Pyparsing: Nested Markdown Emphasis

I'm noodling around with some simple Markdown text to play with and learn Pyparsing and grammars in general. I've run into a problem almost immediately that I'm having trouble solving.

Solution 1:

With those additional rules, I don't think you need to worry about the recursion at all. Just handle the opening and closing emphasis expressions as they are found, whether or not they match up:

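from pyparsing import FollowedBy, LineEnd, LineStart, Suppress, White

# An opening '*' starts a line or follows whitespace; any preceding
# whitespace is kept in the output and '<em>' is appended after it.
openEmphasis = (LineStart() | White()) + Suppress('*')
openEmphasis.setParseAction(lambda toks: ''.join(toks.asList() + ['<em>']))

# A closing '*' is followed by whitespace or the end of a line
# (FollowedBy checks this without consuming anything).
closeEmphasis = '*' + FollowedBy(White() | LineEnd())
closeEmphasis.setParseAction(lambda toks: '</em>')

# Turn off pyparsing's default whitespace skipping: both rules above
# need to see the whitespace around each '*'.
emphasis = (openEmphasis | closeEmphasis).leaveWhitespace()

test = """
*foo *bar* bar*
"""
print(test)
print(emphasis.transformString(test))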

Prints:

*foo *bar* bar*

<em>foo <em>bar</em> bar</em>

You are not the first to trip over this kind of application. When I presented at PyCon'06, an eager attendee dove right in to parse out some Markdown, with an input string something like "****a** b**** c**". We worked on it together for a bit, but the disambiguation rules were just too context-sensitive for a basic pyparsing parser to handle.

Solution 2:

Think about what you're asking for. When does a second * close emphasis, and when does it open a nested emphasis? You have written no rules to distinguish the two. Since it's always 100% ambiguous, the only possible outcomes you can get are:

  • No emphasis can ever be closed, or
  • No emphasis can ever be nested.
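For instance, the simplest context-free rule, letting every * toggle emphasis on or off, lands squarely in the second outcome. The sketch below is only an illustration; the function name `toggle_emphasis` is hypothetical and not from the question.

```python
def toggle_emphasis(text):
    """Naive rule: every '*' flips emphasis on or off.

    A '*' never looks at its surroundings, so emphasis can never nest.
    """
    out = []
    inside = False
    for ch in text:
        if ch == "*":
            out.append("</em>" if inside else "<em>")
            inside = not inside
        else:
            out.append(ch)
    return "".join(out)


print(toggle_emphasis("*foo *bar* bar*"))
# prints: <em>foo </em>bar<em> bar</em>
```

The second * closes the first emphasis instead of opening a nested one, because nothing in the rule could tell the two cases apart.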

I doubt you're asking how to switch from the second to the first.

So then what are you asking for?

You need to implement some kind of rule to disambiguate these two possibilities.

In fact, if you read the docs you linked to, they have a complicated set of rules that define exactly when a * can open emphasis and when it can't, and likewise for closing; given those rules, if it's still ambiguous, it closes emphasis. You have to implement that.
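As a rough idea of what "implement that" could look like, here is a minimal plain-Python sketch (no pyparsing). It is loosely modeled on the left-flanking/right-flanking idea from the CommonMark specification, which may or may not be the docs the question linked, and it simplifies heavily. It handles only single * delimiters, treats only ASCII punctuation as punctuation, and ignores **, _, backslash escapes and code spans. All function names are hypothetical. The tie-break follows the rule above: a * that could either open or close is treated as a closer whenever there is an open emphasis for it to close.

```python
import string


def _is_space(ch):
    # Start/end of text ("") counts as whitespace, as in CommonMark.
    return ch == "" or ch.isspace()


def _is_punct(ch):
    # Simplification: ASCII punctuation only ("" must not match).
    return ch != "" and ch in string.punctuation


def can_open(prev, nxt):
    """Simplified left-flanking test for a single '*'."""
    if _is_space(nxt):
        return False
    return not _is_punct(nxt) or _is_space(prev) or _is_punct(prev)


def can_close(prev, nxt):
    """Simplified right-flanking test for a single '*'."""
    if _is_space(prev):
        return False
    return not _is_punct(prev) or _is_space(nxt) or _is_punct(nxt)


def render_emphasis(text):
    """Turn single '*' delimiters into <em>/</em> using a stack of openers.

    Ambiguous '*' (could open or close) closes if an opener is pending.
    Openers that are never closed stay as literal '*'.
    """
    out = []        # one output piece per input character
    openers = []    # indexes into `out` of '*' that are still open
    for i, ch in enumerate(text):
        if ch != "*":
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if can_close(prev, nxt) and openers:
            out[openers.pop()] = "<em>"   # pair with the most recent opener
            out.append("</em>")
        elif can_open(prev, nxt):
            openers.append(len(out))
            out.append("*")               # provisional until a closer is seen
        else:
            out.append("*")               # neither opener nor closer
    return "".join(out)
```

On the question's example, `render_emphasis("*foo *bar* bar*")` returns `<em>foo <em>bar</em> bar</em>`, while a * surrounded by spaces, as in `a * b`, stays literal, and an unclosed opener in `*foo` is left as a plain *.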
