I am a hobbyist Xojo user. I want to import a GEDCOM file into my program, specifically into an SQLite database with the following tables:
 - ID: Integer
 - Gender: Varchar // M, F or U
 - Surname: Varchar
 - Givenname: Varchar

 - ID: Integer
 - Husband: Integer
 - Wife: Integer

 - ID: Integer
 - PersonID: Integer
 - FamilyID: Integer
 - Order: Integer

 - ID: Integer
 - PersonID: Integer
 - EventType: Varchar // e.g. BIRT, DEAT, BURI, CHR
 - Date: Varchar
 - Description: Varchar
 - Order: Integer

 - ID: Integer
 - RelationshipID: Integer
 - EventType: Varchar // e.g. MARR, DIV, DIVF
 - Date: Varchar
 - Description: Integer
 - Order: Integer
I wrote a working GEDCOM line parser. It splits a single GEDCOM line into:
 - Level As Integer
 - Reference As String // optional
 - Tag As String
 - Value As String // optional
I load the GEDCOM file via TextInputStream (working fine). Now I need to parse every line.
0 @I1@ INDI
1 NAME George /Clooney/
2 GIVN George
2 SURN Clooney
1 BIRT
2 DATE 6 MAY 1961
2 PLAC Lexington, Fayette County, Kentucky, USA
As you can see, the level numbers describe a tree structure. So I thought the best and simplest way would be to parse the file into separate objects (PersonObj, RelationshipObj, EventObj, etc.) inside a JSONItem, because there it is easy to get the children of a node. Later on, I can simply read the nodes and child nodes to create the database entries. But I don't know how to write such an algorithm.
Can anyone help me, please?
To parse the GEDCOM lines quickly, try these ideas:
Read the entire file into a String and split the lines up:
dim f as FolderItem = ...
dim fileContent as String = TextInputStream.Open(f).ReadAll
fileContent = fileContent.DefineEncoding (Encodings.WindowsLatin1)
dim lines() as String = ReplaceLineEndings(fileContent,EndOfLine).Split(EndOfLine)
Parse every line using RegEx to extract its 3 columns:
dim re as new RegEx
re.SearchPattern = "^(\d+) ([^ ]+)(.*)$"
for each line as String in lines
  dim rm as RegExMatch = re.Search (line)
  if rm = nil then
    // nothing found in this line. Is this correct?
    break // in Xojo, Break pauses in the debugger here; it does not exit the loop
    continue // -> onward with next line
  end
  dim level as Integer = rm.SubExpressionString(1).Val
  dim code as String = rm.SubExpressionString(2)
  dim value as String = rm.SubExpressionString(3).Trim
  // ... process the level, code and value
next
The RegEx search pattern means that it looks for the start of the line ("^"), then for one or more digits ("\d+"), a blank, one or more non-blank chars ("[^ ]+"), and finally any remaining chars (".*") before the end of the string ("$"). The parentheses around each of these groups are for extracting their results with SubExpressionString() afterwards.
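One detail to watch with this pattern: on a record line such as "0 @I1@ INDI", the second group captures the reference "@I1@" and the third group captures "INDI". If the reference should land in its own field, as in the question's Level/Reference/Tag/Value split, a four-group variant might look like the sketch below. The pattern and the names re4, reference and tag are assumptions for illustration and have not been tested against a full GEDCOM file.

```xojo
// Sketch only: this four-group pattern is an assumption, not part of the code above.
// Group 2 is written as ((?:...)?) so it always takes part in the match
// and is simply empty when a line has no @...@ reference.
dim re4 as new RegEx
re4.SearchPattern = "^(\d+) ((?:@[^@ ]+@ )?)([^ ]+) ?(.*)$"

dim rm as RegExMatch = re4.Search("0 @I1@ INDI")
if rm <> nil then
  dim level as Integer = rm.SubExpressionString(1).Val      // 0
  dim reference as String = rm.SubExpressionString(2).Trim  // "@I1@"
  dim tag as String = rm.SubExpressionString(3)             // "INDI"
  dim value as String = rm.SubExpressionString(4).Trim      // empty on this line
end if
```

For a line without a reference, such as "2 DATE 6 MAY 1961", group 2 comes back empty, the tag is "DATE" and the value is "6 MAY 1961".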
The check for rm = nil hits whenever the line does not contain at least a number, a blank and at least one more character. If the Gedcom file is malformed or has blank lines, this may be the case.
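As for the tree structure asked about in the question, one option that avoids building a JSONItem first is to remember the tag that is currently open at each level. Every line then knows its full context, for example "INDI/BIRT/DATE", and can be mapped straight to a database field. The following is a minimal sketch using the sample lines from the question; the variable names, the NthField workaround for record lines and the chosen Case branches are assumptions for illustration.

```xojo
// Sketch only: sample data and variable names are assumptions for illustration.
dim lines() as String = Array("0 @I1@ INDI", "1 NAME George /Clooney/", _
  "2 GIVN George", "2 SURN Clooney", "1 BIRT", "2 DATE 6 MAY 1961")

dim re as new RegEx
re.SearchPattern = "^(\d+) ([^ ]+)(.*)$"

dim path() as String // path(n) = tag currently open at level n
dim givenName, surname, birthDate as String

for each line as String in lines
  dim rm as RegExMatch = re.Search(line)
  if rm = nil then continue

  dim level as Integer = rm.SubExpressionString(1).Val
  dim code as String = rm.SubExpressionString(2)
  dim value as String = rm.SubExpressionString(3).Trim

  // Record lines look like "0 @I1@ INDI": the tag follows the reference
  if code.Left(1) = "@" then code = NthField(value, " ", 1)

  redim path(level) // cuts off deeper levels, keeps the parents
  path(level) = code

  select case Join(path, "/")
  case "INDI"
    // a new person starts here
  case "INDI/NAME/GIVN"
    givenName = value
  case "INDI/NAME/SURN"
    surname = value
  case "INDI/BIRT/DATE"
    birthDate = value
  end select
next

System.DebugLog(givenName + " " + surname + ", born " + birthDate)
```

In a real import, the "INDI" branch (and a matching "FAM" branch) would be the place to write the previous record to the database and start a new one. The sketch relies on levels never increasing by more than one from one line to the next.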
Hope this helps.