
List of list of tuples to pandas dataframe

I have this array (the result of a similarity calculation). It is a list of lists of tuples, like this:

example = [[(a,b), (c,d)], [(a1,b1), (c1,d2)] …]

The list example contains 121044 lists of 30 tuples each.

I want a pandas DataFrame containing just the second value of each tuple (i.e. b, d, b1, d2), without spending too much time computing it.

Do you have any ideas?

blabla asked Oct 16 '22 18:10



2 Answers

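Use a nested list comprehension that keeps only the second element of each tuple:

import pandas as pd

# Sample data in the question's shape, with strings standing in for the values
example = [[('a', 'b'), ('c', 'd')], [('a1', 'b1'), ('c1', 'd2')]]

df = pd.DataFrame([[y[1] for y in x] for x in example])
print(df)
    0   1
0   b   d
1  b1  d2

To name the columns, pass columns with one name per tuple position:

df = pd.DataFrame([[y[1] for y in x] for x in example], columns=['col1', 'col2'])
print(df)
  col1 col2
0    b    d
1   b1   d2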
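
With the 30-tuple lists from the question, the same comprehension applies unchanged; only the column names have to match the row length. Here is a minimal sketch, assuming small made-up data (3 rows instead of 121044) and hypothetical column names sim_0 to sim_29 that are not part of the question:

```python
import pandas as pd

# Made-up stand-in for the real data: 3 lists of 30 (id, score) tuples.
n_rows, n_tuples = 3, 30
example = [[(f"id{r}_{c}", r * 100 + c) for c in range(n_tuples)]
           for r in range(n_rows)]

# Keep only the second value of every tuple.
second_values = [[t[1] for t in row] for row in example]

# Hypothetical column names, one per tuple position.
columns = [f"sim_{i}" for i in range(n_tuples)]
df = pd.DataFrame(second_values, columns=columns)

print(df.shape)  # (3, 30)
```

Omitting columns altogether gives the default integer labels 0 to 29, which may be enough if the positions carry no particular meaning.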
jezrael answered Oct 21 '22 01:10



For numeric data, you can convert the nested list to a NumPy array and index it directly: [..., 1] selects the second element of every tuple. This should be more efficient than a list comprehension, as pandas uses NumPy internally to store data in contiguous memory blocks.

import numpy as np
import pandas as pd

example = [[(1,2), (3,4)], [(5,6), (7,8)]]

df = pd.DataFrame(np.array(example)[..., 1],
                  columns=['col1', 'col2'])

print(df)

   col1  col2
0     2     4
1     6     8
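
Whether the NumPy route is actually faster can depend on the data and the machine, since building the array still has to walk every tuple. A rough way to check is to time both approaches on data shaped like yours. The sketch below assumes made-up numeric data (1000 lists of 30 tuples) and hypothetical helper names with_comprehension and with_numpy:

```python
import timeit

import numpy as np
import pandas as pd

# Made-up numeric data: 1000 lists of 30 (index, score) tuples.
rng = np.random.default_rng(0)
example = [[(j, float(s)) for j, s in enumerate(row)]
           for row in rng.random((1000, 30))]

def with_comprehension():
    # Nested list comprehension, as in the other answer.
    return pd.DataFrame([[t[1] for t in row] for row in example])

def with_numpy():
    # Convert to a (1000, 30, 2) array, then take index 1 on the last axis.
    return pd.DataFrame(np.array(example)[..., 1])

# Both approaches should yield the same values.
assert np.allclose(with_comprehension().to_numpy(), with_numpy().to_numpy())

# Timings vary between runs and machines, so no expected numbers are given.
print("comprehension:", timeit.timeit(with_comprehension, number=5))
print("numpy:        ", timeit.timeit(with_numpy, number=5))
```

Note that np.array only produces a regular numeric array when every tuple has the same length and holds numbers; with mixed types such as strings and floats, the values would be coerced to a common type.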
jpp answered Oct 21 '22 00:10
