I have some files that have a tree-like structure. For example:
A
    Result
      a11
      a12
    Lolim
      a21
      a22
    Uplim
      a31
      a32
B
    Result
      b11
      b12
    Lolim
      b21
      b22
I am interested in parsing these files in order to obtain a DataFrame that looks like this:
Name Result Lolim Uplim
A a12 a22 a32
B b12 b22 NA
My idea was to somehow split the file into two parts, A and B, and then split each part into its subcategories: Result, Lolim and Uplim for A, and Result and Lolim for B. Finally, each subcategory would be split into its two values. That would give me a nested list from which I could build a DataFrame, but I don't know how to obtain this nested list.
Or is there another way to do this? Can you recommend any modules or functions that could be useful?
import collections
import pandas as pd

with open("data_tree.dat", "r") as data:
    dct = collections.OrderedDict()
    key = ""
    sub_key = ""
    for line in data:
        if not line.strip():  # skip blank lines, otherwise they would become an empty key
            continue
        if " " not in line:  # no space at all: top-level name (A, B, ...)
            key = line.strip()
            dct[key] = collections.OrderedDict()
        elif " " * 4 in line and " " * 6 not in line:  # 4 spaces: subcategory
            sub_key = line.strip()
            dct[key][sub_key] = ""
        elif " " * 6 in line:  # 6 spaces: value
            item = line.strip()
            dct[key][sub_key] = item  # overwrite, keeps the last element only

df = pd.DataFrame.from_dict(dct).transpose()  # one row per name
df = df.reindex(columns=["Result", "Lolim", "Uplim"])  # fixed column order; missing columns become NaN
df.columns.names = ["Name"]  # label the column axis; printed in the top-left of the header
df = df.fillna("NA")  # in case you want NA and not NaN
print(df)
Output:
Name Result Lolim Uplim
A a12 a22 a32
B b12 b22 NA
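This assumes that data_tree.dat is laid out exactly like the example above (names unindented, subcategories indented by 4 spaces, values by 6 spaces) and that it is in the current working directory, which is usually, but not always, the folder containing the .py file.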
Or as a function:
import collections
import pandas as pd


def dat_to_df(path_to_file):
    with open(path_to_file, "r") as data:
        dct = collections.OrderedDict()
        key = ""
        sub_key = ""
        for line in data:
            if not line.strip():  # skip blank lines
                continue
            if " " not in line:  # top-level name
                key = line.strip()
                dct[key] = collections.OrderedDict()
            elif " " * 4 in line and " " * 6 not in line:  # subcategory
                sub_key = line.strip()
                dct[key][sub_key] = ""
            elif " " * 6 in line:  # value; only the last one is kept
                item = line.strip()
                dct[key][sub_key] = item
    df = pd.DataFrame.from_dict(dct).transpose()
    df = df.reindex(columns=["Result", "Lolim", "Uplim"])
    df.columns.names = ["Name"]
    return df.fillna("NA")


dataframe = dat_to_df("data_tree.dat")
print(dataframe)
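The nested structure described in the question can also be built directly, without substring checks, by counting each line's leading spaces. The sketch below is a swapped-in alternative, not the method above: it keeps every value of a subcategory in a list and only picks the last one when building the rows. It assumes the same layout (names at column 0, subcategories at 4 spaces, values at 6); `parse_tree` and `sample` are hypothetical names, and plain dicts are used because they preserve insertion order on Python 3.7+.

```python
import io


def parse_tree(lines):
    """Group indented lines into {name: {subcategory: [values]}}.

    Assumed layout: names at column 0, subcategories indented by
    4 spaces, values indented deeper (6 spaces in the example).
    """
    tree = {}
    name = sub = None
    for raw in lines:
        text = raw.rstrip("\n")
        if not text.strip():
            continue  # skip blank lines
        depth = len(text) - len(text.lstrip(" "))  # number of leading spaces
        value = text.strip()
        if depth == 0:
            name = value
            tree[name] = {}
        elif depth == 4:
            sub = value
            tree[name][sub] = []
        else:  # any other indentation is treated as a value
            tree[name][sub].append(value)
    return tree


sample = """A
    Result
      a11
      a12
    Lolim
      a21
      a22
    Uplim
      a31
      a32
B
    Result
      b11
      b12
    Lolim
      b21
      b22
"""

tree = parse_tree(io.StringIO(sample))
# One flat row per name, keeping only the last value of each subcategory
rows = [{"Name": name, **{sub: vals[-1] for sub, vals in subs.items()}}
        for name, subs in tree.items()]
print(rows)
```

Passing `rows` to `pd.DataFrame(rows).set_index("Name")` gives one row per name; keys missing for a name (Uplim for B) become NaN, which `fillna("NA")` turns into NA. Since `parse_tree` accepts any iterable of lines, an open file object works as well as the `io.StringIO` used here.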