Currently, I'm designing some format conversion tools in the area of glycobiology. The conversion goes from a text file to an XML file that is standard in the field. Most of the time, the data we get has the information of interest in a plain text file like the one below (the actual file has all of this on one line). Reading and splitting this text to get the information is straightforward (if not exactly intuitive), but writing the XML is where the problem is.
[][b-D-GlcpNAc]
{[(4+1)][b-D-GlcpNAc]
{[(4+1)][b-D-Manp]
{[(3+1)][a-D-Manp]
{[(2+1)][a-D-Manp]{}
}
[(6+1)][a-D-Manp]
{[(3+1)][a-D-Manp]{}
[(6+1)][a-D-Manp]{}
}
}
}
How to interpret this:
You can probably work out how the linkages map by comparing the text with the XML. But if you guys would prefer a more detailed explanation, just ask.
The XML should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<GlydeII>
  <molecule subtype="glycan" id="From_GlycoCT_Translation">
    <residue subtype="base_type" partid="1" ref="http://www.monosaccharideDB.org/GLYDE-II.jsp?G=b-dglc-HEX-1:5" />
    <residue subtype="substituent" partid="2" ref="http://www.monosaccharideDB.org/GLYDE-II.jsp?G=n-acetyl" />
    <residue subtype="base_type" partid="3" ref="http://www.monosaccharideDB.org/GLYDE-II.jsp?G=b-dglc-HEX-1:5" />
    <residue subtype="substituent" partid="4" ref="http://www.monosaccharideDB.org/GLYDE-II.jsp?G=n-acetyl" />
    <residue subtype="base_type" partid="5" ref="http://www.monosaccharideDB.org/GLYDE-II.jsp?G=b-dman-HEX-1:5" />
    <residue subtype="base_type" partid="6" ref="http://www.monosaccharideDB.org/GLYDE-II.jsp?G=a-dman-HEX-1:5" />
    <residue subtype="base_type" partid="7" ref="http://www.monosaccharideDB.org/GLYDE-II.jsp?G=a-dman-HEX-1:5" />
    <residue subtype="base_type" partid="8" ref="http://www.monosaccharideDB.org/GLYDE-II.jsp?G=a-dman-HEX-1:5" />
    <residue subtype="base_type" partid="9" ref="http://www.monosaccharideDB.org/GLYDE-II.jsp?G=a-dman-HEX-1:5" />
    <residue subtype="base_type" partid="10" ref="http://www.monosaccharideDB.org/GLYDE-II.jsp?G=a-dman-HEX-1:5" />
    <residue_link from="2" to="1">
      <atom_link from="N1H" to="C2" to_replace="O2" bond_order="1" />
    </residue_link>
    <residue_link from="3" to="1">
      <atom_link from="C1" to="O4" from_replace="O1" bond_order="1" />
    </residue_link>
    <residue_link from="4" to="3">
      <atom_link from="N1H" to="C2" to_replace="O2" bond_order="1" />
    </residue_link>
    <residue_link from="5" to="3">
      <atom_link from="C1" to="O4" from_replace="O1" bond_order="1" />
    </residue_link>
    <residue_link from="6" to="5">
      <atom_link from="C1" to="O3" from_replace="O1" bond_order="1" />
    </residue_link>
    <residue_link from="7" to="6">
      <atom_link from="C1" to="O2" from_replace="O1" bond_order="1" />
    </residue_link>
    <residue_link from="8" to="5">
      <atom_link from="C1" to="O6" from_replace="O1" bond_order="1" />
    </residue_link>
    <residue_link from="9" to="8">
      <atom_link from="C1" to="O3" from_replace="O1" bond_order="1" />
    </residue_link>
    <residue_link from="10" to="8">
      <atom_link from="C1" to="O6" from_replace="O1" bond_order="1" />
    </residue_link>
  </molecule>
</GlydeII>
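Comparing the text with the XML, each sugar name expands to one or two `<residue>` entries, with partids counted in order of appearance and an N-acetyl substituent taking the partid right after its sugar. The Python sketch below (standard-library `xml.etree.ElementTree`) encodes that reading. The `RESIDUES` table and `add_residues` helper are hypothetical and cover only the three names in this example; a real converter would need a proper name-to-monosaccharideDB mapping.

```python
import xml.etree.ElementTree as ET

REF = "http://www.monosaccharideDB.org/GLYDE-II.jsp?G="

# Hypothetical lookup covering only the three names in this example:
# name -> (base type, substituent or None), as read off the XML above.
RESIDUES = {
    "b-D-GlcpNAc": ("b-dglc-HEX-1:5", "n-acetyl"),
    "b-D-Manp": ("b-dman-HEX-1:5", None),
    "a-D-Manp": ("a-dman-HEX-1:5", None),
}

def add_residues(molecule, names):
    """Append <residue> elements in order; return the last partid used."""
    partid = 0
    for name in names:
        base, substituent = RESIDUES[name]
        # The base type comes first, then its substituent (if any) gets the next partid.
        for subtype, value in (("base_type", base), ("substituent", substituent)):
            if value is None:
                continue
            partid += 1
            ET.SubElement(molecule, "residue",
                          {"subtype": subtype, "partid": str(partid), "ref": REF + value})
    return partid

molecule = ET.Element("molecule", {"subtype": "glycan", "id": "From_GlycoCT_Translation"})
add_residues(molecule, ["b-D-GlcpNAc", "b-D-GlcpNAc", "b-D-Manp"] + ["a-D-Manp"] * 5)
print(ET.tostring(molecule, encoding="unicode"))
```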
So far I've been able to extract all the residue fields and write them to XML without trouble. But I'm having trouble even writing pseudocode for the residue_link fields. Even just help and ideas on how to go about adding the linkage information to the XML would be appreciated.
Okay! Cool problem, it hurts my brain in a good way.
First, for my own sanity, I indented your raw data so the nesting is visible:
[][b-D-GlcpNAc] {
    [(4+1)][b-D-GlcpNAc] {
        [(4+1)][b-D-Manp] {
            [(3+1)][a-D-Manp] {
                [(2+1)][a-D-Manp] { }
            }
            [(6+1)][a-D-Manp] {
                [(3+1)][a-D-Manp] { }
                [(6+1)][a-D-Manp] { }
            }
        }
    }
I think the key is figuring out which braces pair up, so that you can tell programmatically what nesting level you're on.
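Finding the pairs is the classic stack problem: push the position of each `{` and pop it when a `}` arrives. A minimal Python sketch (`match_braces` is a hypothetical name), run on the question's sample joined into one line:

```python
def match_braces(line):
    """Pair every '{' with its '}' using a stack of open-brace positions.

    Returns (pairs, unmatched): pairs maps each '{' index to its matching '}' index,
    unmatched lists the indices of '{' that were never closed.
    """
    stack = []
    pairs = {}
    for i, ch in enumerate(line):
        if ch == "{":
            stack.append(i)
        elif ch == "}":
            if not stack:
                raise ValueError(f"unmatched '}}' at index {i}")
            pairs[stack.pop()] = i  # the innermost open brace closes first
    return pairs, stack

SAMPLE = ("[][b-D-GlcpNAc]{[(4+1)][b-D-GlcpNAc]{[(4+1)][b-D-Manp]{[(3+1)][a-D-Manp]"
          "{[(2+1)][a-D-Manp]{}}[(6+1)][a-D-Manp]{[(3+1)][a-D-Manp]{}[(6+1)][a-D-Manp]{}}}}")
pairs, unmatched = match_braces(SAMPLE)
print(len(pairs), "pairs, unclosed '{' at", unmatched)
```

On the sample as pasted, this finds 7 pairs and reports the very first `{` (index 15, right after the root sugar) as never closed. So either a final `}` got lost in copying, or whatever parses the real files shouldn't insist on balanced braces.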
Pseudocode:
hierarchy = 0
for i, char in enumerate(line):
    if char == "{":
        hierarchy += 1
    elif char == "}":
        hierarchy -= 1
    elif char == "[" and (i == 0 or line[i - 1] != "]"):
        storeSugar(hierarchy)   # first "[" of a sugar only; the second one follows a "]"
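Here is that loop as runnable Python, assuming the whole record is a single string (as in the question). A sugar such as `[(4+1)][b-D-Manp]` contains two `[` characters, so this version uses a regular expression to read each sugar as one token instead of checking characters one by one. `residues_with_depth` is a hypothetical name standing in for `storeSugar`.

```python
import re

# A sugar "[linkage][name]" is matched as one token (the root has an empty "[]" linkage);
# braces are matched individually.
TOKEN = re.compile(r"\[[^\]]*\]\[([^\]]+)\]|[{}]")

def residues_with_depth(line):
    """Return (hierarchy, name) for every sugar, in order of appearance."""
    hierarchy = 0
    found = []
    for m in TOKEN.finditer(line):
        token = m.group(0)
        if token == "{":
            hierarchy += 1   # entering a child block
        elif token == "}":
            hierarchy -= 1   # leaving a child block
        else:
            found.append((hierarchy, m.group(1)))
    return found

SAMPLE = ("[][b-D-GlcpNAc]{[(4+1)][b-D-GlcpNAc]{[(4+1)][b-D-Manp]{[(3+1)][a-D-Manp]"
          "{[(2+1)][a-D-Manp]{}}[(6+1)][a-D-Manp]{[(3+1)][a-D-Manp]{}[(6+1)][a-D-Manp]{}}}}")
for depth, name in residues_with_depth(SAMPLE):
    print("    " * depth + name)
```

Depth alone doesn't name the parent, though: a sugar's parent is the most recent sugar one level up.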
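You'd also want to keep track of which sugar is the current "parent". Each sugar links to the sugar whose { } block encloses it, so a stack works: push the most recent sugar when a { opens, pop at the matching }, and the top of the stack is always the parent.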
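Putting the parent tracking together with the XML output, here is a minimal Python sketch using only the standard library (`re`, `xml.etree.ElementTree`). The helper names are hypothetical, and three rules are assumptions read off the single example rather than taken from the GLYDE-II specification: partids count residues in order of appearance, a name ending in `NAc` gets an `n-acetyl` substituent with the next partid, and a `(p+c)` linkage becomes `atom_link from="C{c}" to="O{p}" from_replace="O{c}"`.

```python
import re
import xml.etree.ElementTree as ET

# One token is either a brace or a whole sugar "[(parent_pos+child_pos)][name]";
# the root sugar has an empty linkage "[]".
TOKEN = re.compile(r"\[(?:\((\d+)\+(\d+)\))?\]\[([^\]]+)\]|([{}])")

def residue_links(line):
    """Return (from_partid, to_partid, atom_link attrs) tuples.

    Assumed rules (from the one example): partids in order of appearance,
    "...NAc" adds an n-acetyl substituent, "(p+c)" links C{c} to O{p}.
    """
    links = []
    stack = []      # partid of each sugar whose { ... } block is currently open
    last = None     # partid of the most recent base-type sugar
    next_id = 1
    for m in TOKEN.finditer(line):
        parent_pos, child_pos, name, brace = m.groups()
        if brace == "{":
            stack.append(last)       # the sugar just read becomes the parent
        elif brace == "}":
            if stack:                # tolerate an unmatched "}"
                stack.pop()
        else:
            base_id = next_id
            next_id += 1
            if stack and parent_pos:
                links.append((base_id, stack[-1], {
                    "from": f"C{child_pos}", "to": f"O{parent_pos}",
                    "from_replace": f"O{child_pos}", "bond_order": "1"}))
            if name.endswith("NAc"):
                links.append((next_id, base_id, {
                    "from": "N1H", "to": "C2", "to_replace": "O2", "bond_order": "1"}))
                next_id += 1
            last = base_id
    return links

def add_links(molecule, links):
    """Append one <residue_link> with a nested <atom_link> per link."""
    for child, parent, attrs in links:
        link = ET.SubElement(molecule, "residue_link", {"from": str(child), "to": str(parent)})
        ET.SubElement(link, "atom_link", attrs)

SAMPLE = ("[][b-D-GlcpNAc]{[(4+1)][b-D-GlcpNAc]{[(4+1)][b-D-Manp]{[(3+1)][a-D-Manp]"
          "{[(2+1)][a-D-Manp]{}}[(6+1)][a-D-Manp]{[(3+1)][a-D-Manp]{}[(6+1)][a-D-Manp]{}}}}")
molecule = ET.Element("molecule", {"subtype": "glycan", "id": "From_GlycoCT_Translation"})
add_links(molecule, residue_links(SAMPLE))
print(ET.tostring(molecule, encoding="unicode"))
```

On the sample, this produces the same nine `residue_link` elements, in the same order, as the expected XML in the question. A missing final `}` does no harm, since the stack only has to be right at the moment each sugar is read.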