Which Microdata parser should I use in Python [closed]

Question

I'm looking for a good-quality HTML Microdata parser in Python. It doesn't have to be blazing fast, but I'd like it to support as much of the spec as possible, including itemref.

Here's what I've found so far:

https://github.com/edsu/microdata
https://github.com/RDFLib/pymicrodata
https://pypi.python.org/pypi/pelican-microdata/0.1

Have you used any of these libraries? What were the pros and cons?

I'm also curious about parsing poorly formatted HTML documents. Have you found a Microdata parser that handles messy input, or do you run the input through something like BeautifulSoup first?

Jason R · Accepted Answer

What format do you want the Microdata parsed to?

https://github.com/RDFLib/pymicrodata will parse to RDF.

If you want JSON instead you should use https://github.com/edsu/microdata, which has recently gotten some attention and should be more conformant to the spec.

https://pypi.python.org/pypi/pelican-microdata/0.1 looks like a way to generate Microdata for a particular static site generator, so I don't think it will help with parsing.

I don't know how tolerant either of the above parsers is of poorly formatted HTML. If you know of some poorly formatted markup in the wild that uses Microdata, I'd be interested in seeing how well the Ruby parsers handle these cases.
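To make the JSON option and the messy-input question a bit more concrete, here is a rough sketch of the kind of output a Microdata-to-JSON parser produces. It is not the code of either library above and it is not spec-conformant: it uses only the standard library's `html.parser`, ignores `itemref`, `itemid` and `<time datetime>`, does not resolve relative URLs, and strips whitespace from text values. The names `MicrodataSketch`, `extract_items` and `SAMPLE` are made up for this illustration. The only tolerance for bad markup is that an end tag implicitly closes any unclosed elements inside it.

```python
import json
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not kept on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Elements whose property value comes from an attribute rather than text.
VALUE_ATTR = {"meta": "content", "a": "href", "area": "href", "link": "href",
              "img": "src", "audio": "src", "video": "src", "source": "src",
              "embed": "src", "iframe": "src", "track": "src",
              "object": "data", "data": "value", "meter": "value"}


class MicrodataSketch(HTMLParser):
    """Illustrative subset of Microdata extraction (no itemref support)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []   # top-level items found so far
        self.stack = []   # one frame per currently open element

    def _owner(self):
        # The nearest open ancestor that carries itemscope, if any.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    @staticmethod
    def _add(owner, names, value):
        for name in names:
            owner["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        owner = self._owner()
        names = (attrs.get("itemprop") or "").split()
        frame = {"tag": tag, "item": None, "owner": owner,
                 "names": names, "text": None}
        if "itemscope" in attrs:
            item = {}
            if attrs.get("itemtype"):
                item["type"] = attrs["itemtype"].split()
            item["properties"] = {}
            frame["item"] = item
            if names and owner is not None:
                self._add(owner, names, item)   # nested item
            else:
                self.items.append(item)         # top-level item
        elif names and owner is not None:
            if tag in VALUE_ATTR:
                self._add(owner, names, attrs.get(VALUE_ATTR[tag]) or "")
            else:
                frame["text"] = []              # collect text until closed
        if tag not in VOID_TAGS:
            self.stack.append(frame)

    def handle_data(self, data):
        for frame in self.stack:
            if frame["text"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for depth in range(len(self.stack) - 1, -1, -1):
            if self.stack[depth]["tag"] == tag:
                self.unwind(depth)
                return

    def unwind(self, depth):
        # Pop frames down to `depth`, finishing any pending text values.
        while len(self.stack) > depth:
            frame = self.stack.pop()
            if frame["text"] is not None:
                value = "".join(frame["text"]).strip()
                self._add(frame["owner"], frame["names"], value)


def extract_items(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    parser.unwind(0)   # finish elements that were never closed
    return {"items": parser.items}


# Made-up sample; note the unclosed <span> inside the address.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <a itemprop="url" href="http://example.com/jane">home page</a>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="addressLocality">Seattle
  </div>
</div>
"""

print(json.dumps(extract_items(SAMPLE), indent=2))
```

The stack-based recovery here is much cruder than real HTML5 tree construction, so for markup found in the wild a parser built on a proper HTML5 parser, or a cleanup pass before parsing, is likely the safer route.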

Tags: python, beautifulsoup, microdata

Shawn Simister
