
How To Use Beautifulsoup To Get The Same Result Obtained By Regex?

I'm trying to extract all the values (which are links) of the attribute data-src-mp3 in content1, which is generated from the URL. The link is contained in ...

mp3s = [tag.attrs['data-src-mp3'] for tag in soup.select('.cB.cB-def.dictionary.biling [data-src-mp3]')]

or

mp3s = list(map(lambda tag: tag.attrs['data-src-mp3'],
                soup.select('.cB.cB-def.dictionary.biling [data-src-mp3]')))

[data-src-mp3] selects only elements that have the data-src-mp3 attribute (with any value).
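
To make the selector's behaviour concrete, here is a minimal sketch run against made-up markup. The HTML snippet, the file paths, and the "other" class are assumptions for illustration only; the real page's structure is not shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page content.
html = """
<div class="cB cB-def dictionary biling">
  <span data-src-mp3="/media/a.mp3"></span>
  <span data-src-mp3=""></span>
  <span class="audio"></span>
</div>
<div class="other"><span data-src-mp3="/media/c.mp3"></span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendants of the container that carry the attribute, whatever its value.
tags = soup.select(".cB.cB-def.dictionary.biling [data-src-mp3]")
mp3s = [tag.attrs["data-src-mp3"] for tag in tags]

print(mp3s)  # ['/media/a.mp3', '']
```

The span with an empty attribute value is still matched, the span without the attribute is skipped, and the span outside the container is ignored. If empty values are unwanted, they can be filtered out afterwards.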

With a small change, the attribute name 'data-src-mp3' is defined in a single place:

mp3_tag = 'data-src-mp3'
mp3s = list(map(lambda tag: tag.attrs[mp3_tag],
                soup.select('.cB.cB-def.dictionary.biling [{}]'.format(mp3_tag))))

This solution might look more intimidating at first, but it is much better than relying on the wrong tool, such as regex, to parse HTML.
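
As a rough illustration of why a regex can be fragile here, the sketch below compares both approaches on made-up markup. The HTML and the regex pattern are assumptions, not the pattern from the original question: one attribute is single-quoted and one element is commented out.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: one single-quoted attribute, one commented-out element.
html = """
<div class="cB cB-def dictionary biling">
  <span data-src-mp3="/media/a.mp3"></span>
  <span data-src-mp3='/media/b.mp3'></span>
  <!-- <span data-src-mp3="/media/old.mp3"></span> -->
</div>
"""

# A naive pattern (an assumption for this sketch) that expects double quotes.
regex_mp3s = re.findall(r'data-src-mp3="([^"]*)"', html)

# The parser normalises quoting and treats the comment as a comment, not a tag.
soup = BeautifulSoup(html, "html.parser")
soup_mp3s = [tag.attrs["data-src-mp3"]
             for tag in soup.select(".cB.cB-def.dictionary.biling [data-src-mp3]")]

print(regex_mp3s)  # ['/media/a.mp3', '/media/old.mp3']
print(soup_mp3s)   # ['/media/a.mp3', '/media/b.mp3']
```

The regex misses the single-quoted link and picks up the one inside the comment, while the selector returns only the links on real elements. A more elaborate pattern could cover these two cases, but each new quirk in the markup would need another adjustment.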
