GEDCOM is a standard for exchanging genealogical data.
I've found parsers written in other languages, but none so far written in Python. The closest I've come is the file libgedcom.py from the GRAMPS project, but it is so full of references to GRAMPS modules that it is not usable for me.
I just want a simple standalone GEDCOM parser library written in Python. Does this exist?
A few years ago I wrote a simplistic GEDCOM to XML translator in Python as part of a larger project. I found that dealing with the GEDCOM data in an XML format was much easier (especially when the next step involved XSLT).
I don't have the code online at the moment, so I've pasted the module into this message. This works for me; no guarantees. Hope this helps though.
import codecs, re, sys
from xml.sax.saxutils import escape

fn = sys.argv[1]

# Input is read as cp437; the XML is written as UTF-8 next to the input file.
ged = codecs.open(fn, encoding="cp437")
xml = codecs.open(fn+".xml", "w", "utf8")
xml.write("""<?xml version="1.0"?>\n""")
xml.write("<gedcom>")
sub = []  # stack of currently open tags, one per nesting level
for s in ged:
    s = s.strip()
    # A GEDCOM line: level, optional @ID@, tag, optional data.
    m = re.match(r"(\d+) (@(\w+)@ )?(\w+)( (.*))?", s)
    if m is None:
        print("Error: unmatched line: %s" % s)
        continue  # skip the line instead of crashing on m.group() below
    level = int(m.group(1))
    id = m.group(3)
    tag = m.group(4)
    data = m.group(6)
    # Close every element that is at this level or deeper.
    while len(sub) > level:
        xml.write("</%s>\n" % (sub[-1]))
        sub.pop()
    if level != len(sub):
        print("Error: unexpected level: %s" % s)
    sub += [tag]
    if id is not None:
        xml.write("<%s id=\"%s\">" % (tag, id))
    else:
        xml.write("<%s>" % (tag))
    if data is not None:
        # Data of the form @ID@ is a pointer to another record.
        m = re.match(r"@(\w+)@", data)
        if m:
            xml.write(m.group(1))
        elif tag == "NAME":
            # Names look like "Forename /Surname/".
            m = re.match(r"(.*?)/(.*?)/$", data)
            if m:
                xml.write("<forename>%s</forename><surname>%s</surname>" % (escape(m.group(1).strip()), escape(m.group(2))))
            else:
                xml.write(escape(data))
        elif tag == "DATE":
            # Optional day, optional month, then a year of three or more digits.
            m = re.match(r"(((\d+)?\s+)?(\w+)?\s+)?(\d{3,})", data)
            if m:
                if m.group(3) is not None:
                    xml.write("<day>%s</day><month>%s</month><year>%s</year>" % (m.group(3), m.group(4), m.group(5)))
                elif m.group(4) is not None:
                    xml.write("<month>%s</month><year>%s</year>" % (m.group(4), m.group(5)))
                else:
                    xml.write("<year>%s</year>" % m.group(5))
            else:
                xml.write(escape(data))
        else:
            xml.write(escape(data))
# Close whatever is still open at end of file.
while len(sub) > 0:
    xml.write("</%s>" % sub[-1])
    sub.pop()
xml.write("</gedcom>\n")
ged.close()
xml.close()
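If you would rather stay in Python than go through XML, the same level-stack idea can build nested dictionaries in memory. This is only a rough sketch, not part of the module above: the names `LINE_RE` and `parse_gedcom_lines` and the dictionary keys (`id`, `tag`, `data`, `children`) are made up for illustration, and it reuses the same line regex, so it inherits the same limitations (for example, it does no special handling of NAME or DATE values).

```python
import re

# Same line pattern as the script above: level, optional @ID@, tag, optional data.
LINE_RE = re.compile(r"(\d+) (@(\w+)@ )?(\w+)( (.*))?")


def parse_gedcom_lines(lines):
    """Build a list of nested record dicts from an iterable of GEDCOM lines.

    Hypothetical helper: the record layout is an assumption for this sketch.
    """
    roots = []   # level-0 records
    stack = []   # stack[i] is the currently open record at level i
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError("unmatched line: %r" % line)
        level = int(m.group(1))
        if level > len(stack):
            raise ValueError("unexpected level: %r" % line)
        record = {"id": m.group(3), "tag": m.group(4),
                  "data": m.group(6), "children": []}
        del stack[level:]  # close every record at this level or deeper
        if stack:
            stack[-1]["children"].append(record)
        else:
            roots.append(record)
        stack.append(record)
    return roots


sample = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 12 MAY 1900",
    "0 TRLR",
]
records = parse_gedcom_lines(sample)
print(records[0]["children"][0]["data"])
```

Unlike the script, this sketch raises on a malformed line or a skipped level instead of printing an error and carrying on; which behaviour you want probably depends on how clean your files are.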
I've taken the code from mwhite's answer, extended it a bit (OK, more than just a bit) and posted it on GitHub: http://github.com/dijxtra/simplepyged. I'm open to suggestions about what else to add :-)
I know this thread is pretty old, but I found it in my searches, along with this project: https://github.com/madprime/python-gedcom/
The source is squeaky clean and very functional.
A general-purpose GEDCOM parser in Python is linked from http://ilab.cs.byu.edu/cs460/2006w/assignments/program1.html
You could use the SWIG tool to include C libraries through the native language interface. You'll have to make calls against the C API from within Python, but the rest of your code can be Python only.
It may sound a bit daunting, but once you get things set up, using the two together won't be bad. There may be some quirks depending on how the C library was written, but you'd have to deal with some no matter which option you used.