Scenario: I retrieved 3 lists from an N-Triples file, and now I am trying to combine them into a single, organized list.
Original format:
+--------+---------+--------+
| 100021 | hasdata | y      |
+--------+---------+--------+
| 100021 | name    | USER1  |
+--------+---------+--------+
| 100021 | extra1  | typer  |
+--------+---------+--------+
| 100021 | extra2  | reader |
+--------+---------+--------+
| 50003  | hasdata | y      |
+--------+---------+--------+
| 50003  | name    | USER2  |
+--------+---------+--------+
| 50003  | extra1  | reader |
+--------+---------+--------+
| 50003  | extra2  | writer |
+--------+---------+--------+
| 50003  | extra3  | coder  |
+--------+---------+--------+
| 30007  | hasdata | n      |
+--------+---------+--------+
| 30007  | name    | 0001   |
+--------+---------+--------+
| 30007  | extra1  | Null   |
+--------+---------+--------+
While looping over the N-Triples file, I produced 3 lists (each one is a column of the table above). I am now trying to match them up into something like this:
+--------+---------+-------+--------+--------+--------+
|        | hasdata | name  | extra1 | extra2 | extra3 |
+--------+---------+-------+--------+--------+--------+
| 100021 | y       | USER1 | typer  | reader |        |
+--------+---------+-------+--------+--------+--------+
| 50003  | y       | USER2 | reader | writer | coder  |
+--------+---------+-------+--------+--------+--------+
| 30007  | n       | 0001  | Null   |        |        |
+--------+---------+-------+--------+--------+--------+
So far, I used the function:
def listOfTuples(l1, l2, l3):
    return list(map(lambda x, y, z: (x, y, z), l1, l2, l3))
But this only gave me a direct merge of the corresponding items.
Question: I know it is possible to loop through the lists, find the matching items, and build an array/dataframe manually. Is there a function or package that can do this automatically and in a less convoluted way?
Note: I already have a way to produce the dataframe by looping manually. I just want to know whether there is a more efficient way.
If I understand you correctly, you have a list whose elements are three-item tuples, and you want to put them in another tuple. You can use zip to achieve this.
result = list(zip(list1, zip(*[(l1,l2,l3) for i in list1])))
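For reference, here is a minimal sketch of what zip does with three parallel lists, followed by a plain-dictionary grouping by the first column. The list names and the shortened sample data are assumptions for illustration, not taken from the asker's code.

```python
from collections import defaultdict

# Hypothetical sample data: a shortened version of the three columns
subjects = ["100021", "100021", "50003", "50003", "50003", "30007"]
predicates = ["hasdata", "name", "hasdata", "name", "extra1", "hasdata"]
objects = ["y", "USER1", "y", "USER2", "reader", "n"]

# zip pairs the lists element-wise, giving the same tuples as the
# map/lambda helper in the question
triples = list(zip(subjects, predicates, objects))

# Group by subject: one {predicate: object} dict per subject
rows = defaultdict(dict)
for subject, predicate, obj in triples:
    rows[subject][predicate] = obj
```

On its own, zip only lines the three lists up row by row; the grouping loop is what turns those rows into one record per ID.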
You say you want a dataframe, so I'll operate on the assumption that pandas operations are acceptable.
I'm also assuming that the table borders are only your formatting and not part of the actual data file (in the future, decorators of that sort are unnecessary and even detrimental for this type of question).
Using your given data, I create a df (with pd.read_csv or some such) and then pivot it:
      col1     col2    col3
0   100021  hasdata       y
1   100021     name   USER1
2   100021   extra1   typer
3   100021   extra2  reader
4    50003  hasdata       y
5    50003     name   USER2
6    50003   extra1  reader
7    50003   extra2  writer
8    50003   extra3   coder
9    30007  hasdata       n
10   30007     name    0001
11   30007   extra1    Null
df.pivot(index='col1',columns='col2',values='col3')
col2    extra1  extra2 extra3 hasdata   name
col1
30007     Null     NaN    NaN       n   0001
50003   reader  writer  coder       y  USER2
100021   typer  reader    NaN       y  USER1
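Since the question starts from three lists rather than a file, the same pivot can be sketched directly from them. The list names below are hypothetical stand-ins for whatever the parsing loop produced, and the two reindex calls are optional steps that restore the row and column order of the desired table.

```python
import pandas as pd

# Hypothetical stand-ins for the three lists built while parsing the file
subjects = [100021] * 4 + [50003] * 5 + [30007] * 3
predicates = ["hasdata", "name", "extra1", "extra2",
              "hasdata", "name", "extra1", "extra2", "extra3",
              "hasdata", "name", "extra1"]
objects = ["y", "USER1", "typer", "reader",
           "y", "USER2", "reader", "writer", "coder",
           "n", "0001", "Null"]

# One row per triple, using the same column names as above
df = pd.DataFrame({"col1": subjects, "col2": predicates, "col3": objects})

# Long to wide: one row per ID, one column per attribute
wide = df.pivot(index="col1", columns="col2", values="col3")

# Optional: put the columns in the order of the desired table
wide = wide.reindex(columns=["hasdata", "name", "extra1", "extra2", "extra3"])

# Optional: keep the IDs in first-seen order instead of sorted order
wide = wide.reindex(df["col1"].unique())
```

Note that pivot raises a ValueError if the same ID carries the same attribute more than once; in that case pivot_table with aggfunc="first" is one possible workaround.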