
Is there a GEDCOM parser written in Python? [closed]

GEDCOM is a standard for exchanging genealogical data.

I've found parsers written in

  • C
  • Perl
  • Ruby
  • and even Factor

but none so far written in Python. The closest I've come is the file libgedcom.py from the GRAMPS project, but that is so full of references to GRAMPS modules as to not be usable for me.

I just want a simple standalone GEDCOM parser library written in Python. Does this exist?

BioGeek asked Dec 17 '09 05:12


5 Answers

A few years ago I wrote a simplistic GEDCOM to XML translator in Python as part of a larger project. I found that dealing with the GEDCOM data in an XML format was much easier (especially when the next step involved XSLT).

I don't have the code online at the moment, so I've pasted the module into this message. This works for me; no guarantees. Hope this helps though.

import re, sys
from xml.sax.saxutils import escape

fn = sys.argv[1]

# Written for Python 3. The input is read as cp437; change the encoding to match your file.
ged = open(fn, encoding="cp437")
xml = open(fn + ".xml", "w", encoding="utf8")
xml.write("""<?xml version="1.0"?>\n""")
xml.write("<gedcom>")
sub = []  # stack of tags that are currently open
for s in ged:
    s = s.strip()
    # level, optional @ID@, tag, optional data
    m = re.match(r"(\d+) (@(\w+)@ )?(\w+)( (.*))?", s)
    if m is None:
        print("Error: unmatched line:", s)
        continue  # nothing to extract from this line
    level = int(m.group(1))
    xref = m.group(3)
    tag = m.group(4)
    data = m.group(6)
    # close elements until we are back at this line's level
    while len(sub) > level:
        xml.write("</%s>\n" % (sub[-1]))
        sub.pop()
    if level != len(sub):
        print("Error: unexpected level:", s)
    sub += [tag]
    if xref is not None:
        xml.write("<%s id=\"%s\">" % (tag, xref))
    else:
        xml.write("<%s>" % (tag))
    if data is not None:
        # a value of the form @ID@ is a pointer to another record
        m = re.match(r"@(\w+)@", data)
        if m:
            xml.write(m.group(1))
        elif tag == "NAME":
            # the surname is delimited by slashes: Forename /Surname/
            m = re.match(r"(.*?)/(.*?)/$", data)
            if m:
                xml.write("<forename>%s</forename><surname>%s</surname>" % (escape(m.group(1).strip()), escape(m.group(2))))
            else:
                xml.write(escape(data))
        elif tag == "DATE":
            # optional day, optional month, then a year of at least three digits
            m = re.match(r"(((\d+)?\s+)?(\w+)?\s+)?(\d{3,})", data)
            if m:
                if m.group(3) is not None:
                    xml.write("<day>%s</day><month>%s</month><year>%s</year>" % (m.group(3), m.group(4), m.group(5)))
                elif m.group(4) is not None:
                    xml.write("<month>%s</month><year>%s</year>" % (m.group(4), m.group(5)))
                else:
                    xml.write("<year>%s</year>" % m.group(5))
            else:
                xml.write(escape(data))
        else:
            xml.write(escape(data))
# close whatever is still open at the end of the file
while len(sub) > 0:
    xml.write("</%s>" % sub[-1])
    sub.pop()
xml.write("</gedcom>\n")
ged.close()
xml.close()
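
Once the translator has written the .xml file, the result can be read back with the standard library. What follows is only a minimal sketch, assuming the element layout the script above produces (INDI records with an id attribute, NAME/forename, NAME/surname and BIRT/DATE/year children). The sample data and the helper name list_individuals are made up for illustration and are not part of the original module.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like the translator's output for one person.
SAMPLE_XML = """<?xml version="1.0"?>
<gedcom><INDI id="I1"><NAME><forename>John</forename><surname>Smith</surname></NAME>
<BIRT><DATE><day>1</day><month>JAN</month><year>1900</year></DATE>
</BIRT>
</INDI>
<TRLR></TRLR></gedcom>
"""


def list_individuals(xml_text):
    """Return one dict per top-level INDI element (illustrative helper)."""
    root = ET.fromstring(xml_text)
    people = []
    for indi in root.findall("INDI"):
        people.append({
            "id": indi.get("id"),
            # findtext returns None when the path is missing
            "forename": indi.findtext("NAME/forename"),
            "surname": indi.findtext("NAME/surname"),
            "birth_year": indi.findtext("BIRT/DATE/year"),
        })
    return people


if __name__ == "__main__":
    for person in list_individuals(SAMPLE_XML):
        print(person)
```

For a real file you would pass the contents of the generated .xml file instead of SAMPLE_XML. Records without a NAME or BIRT entry simply yield None for those fields.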
Greg Hewgill answered Nov 14 '22 01:11


I've taken the code from mwhite's answer, extended it a bit (OK, more than just a bit) and posted it on GitHub: http://github.com/dijxtra/simplepyged. I welcome suggestions about what else to add :-)

dijxtra answered Nov 14 '22 01:11


I know this thread is pretty old, but I found it in my searches, along with this project: https://github.com/madprime/python-gedcom/

The source is squeaky clean and very functional.

iLoveTux answered Nov 14 '22 01:11


A general-purpose GEDCOM parser in Python is linked from http://ilab.cs.byu.edu/cs460/2006w/assignments/program1.html

mwhite answered Nov 14 '22 03:11


You could use the SWIG tool to wrap one of the C libraries through Python's native language interface. You'll have to make calls against the C API from within Python, but the rest of your code can be Python only.

It may sound a bit daunting, but once you get things set up, using the two together won't be bad. There may be some quirks depending on how the C library was written, but you'd have to deal with some quirks no matter which option you used.

Dana the Sane answered Nov 14 '22 03:11