quickest way to convert list of tuples to a series

Consider a list of tuples lst

lst = [('a', 10), ('b', 20)]

question
What is the quickest way to convert this to the series

i
a    10
b    20
Name: c, dtype: int64

attempts

pd.DataFrame(lst, columns=list('ic')).set_index('i').c

This is inefficient: it builds an intermediate DataFrame just to pull out one column.

piRSquared asked Nov 28 '16 18:11


3 Answers

There are two possible downsides to @Divakar's np.asarray(lst). First, it converts everything to strings, so the values have to be converted back to numbers afterwards. Second, speed: making arrays is relatively expensive.

An alternative is to use the zip(*) idiom to 'transpose' the list:

In [65]: lst = [('a', 10), ('b', 20), ('j',1000)]
In [66]: zlst = list(zip(*lst))
In [67]: zlst
Out[67]: [('a', 'b', 'j'), (10, 20, 1000)]
In [68]: out = pd.Series(zlst[1], index = zlst[0])
In [69]: out
Out[69]: 
a      10
b      20
j    1000
dtype: int32

Note that my dtype is int, not object. Compare the values of the Series built from the array:

In [79]: out.values
Out[79]: array(['10', '20', '1000'], dtype=object)

So in the array case, Pandas doesn't convert the values back to integer; it leaves them as strings.
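The difference can be checked side by side. This is a minimal sketch, assuming the same three-tuple sample list; the names from_array and from_zip are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

lst = [('a', 10), ('b', 20), ('j', 1000)]

# NumPy needs one common dtype for the mixed tuples, so it picks a string dtype.
arr = np.asarray(lst)
print(arr.dtype.kind)  # 'U' (unicode string)

# Array route: the values arrive in pandas as strings.
from_array = pd.Series(arr[:, 1], index=arr[:, 0])

# zip route: the values stay Python ints, so pandas infers an integer dtype.
labels, values = zip(*lst)
from_zip = pd.Series(values, index=labels)

print(type(from_array.iloc[0]), from_zip.dtype)
```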

==============

My guess about timings is off - I don't have any feel for pandas Series creation times. Also the sample is too small to do meaningful timings:

In [71]: %%timeit
    ...: out=pd.Series(dict(lst))
1000 loops, best of 3: 305 µs per loop
In [72]: %%timeit
    ...: arr=np.array(lst)
    ...: out = pd.Series(arr[:,1], index=arr[:,0])
10000 loops, best of 3: 198 µs per loop
In [73]: %%timeit
    ...: zlst = list(zip(*lst))
    ...: out = pd.Series(zlst[1], index=zlst[0])
    ...: 
1000 loops, best of 3: 275 µs per loop

Or, forcing the integer interpretation:

In [85]: %%timeit
    ...: arr=np.array(lst)
    ...: out = pd.Series(arr[:,1], index=arr[:,0], dtype=int)
    ...: 
    ...: 
1000 loops, best of 3: 253 µs per loop
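Since the three-element sample is too small for meaningful timings, the comparison can be rerun on a larger list. The following is only a sketch under stated assumptions: the list size, the generated keys, and the helper names via_dict, via_array and via_zip are made up for illustration, and the array variant converts the values with NumPy's astype(int) rather than passing dtype=int to pd.Series. Actual numbers depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger sample: 10,000 (label, value) tuples.
n = 10_000
big = [(f"k{i}", i) for i in range(n)]

def via_dict(lst):
    return pd.Series(dict(lst))

def via_array(lst):
    arr = np.array(lst)                      # everything becomes strings
    return pd.Series(arr[:, 1].astype(int), index=arr[:, 0])

def via_zip(lst):
    labels, values = zip(*lst)               # "transpose" the list of tuples
    return pd.Series(values, index=labels)

timings = {}
for fn in (via_dict, via_array, via_zip):
    # Best of 3 repeats, 5 calls each, reported as seconds per call.
    best = min(timeit.repeat(lambda: fn(big), number=5, repeat=3))
    timings[fn.__name__] = best / 5

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```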
hpaulj answered Oct 21 '22 13:10


The simplest way is to pass your list of tuples as a dictionary:

>>> pd.Series(dict(lst))
a   10
b   20
dtype: int64
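The output in the question also has a Series name (c) and an index name (i). A minimal sketch of adding both to the dictionary approach follows; it also illustrates one caveat worth keeping in mind, namely that a dict keeps only the last value for a repeated key. The dupes list is a made-up example, not data from the question.

```python
import pandas as pd

lst = [('a', 10), ('b', 20)]

# name= sets the Series name; rename_axis sets the index name.
named = pd.Series(dict(lst), name='c').rename_axis('i')
print(named)

# Caveat: duplicate labels collapse in a dict (last value wins),
# while the zip(*) idiom keeps every row.
dupes = [('a', 1), ('a', 2)]                 # hypothetical input with a repeated label
from_dict = pd.Series(dict(dupes))           # one row
labels, values = zip(*dupes)
from_zip = pd.Series(values, index=labels)   # two rows
print(len(from_dict), len(from_zip))
```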
Andy answered Oct 21 '22 12:10


One approach with NumPy, assuming every tuple in the list has the same length -

arr = np.asarray(lst)
out = pd.Series(arr[:,1], index = arr[:,0])

Sample run -

In [147]: lst = [('a', 10), ('b', 20), ('j',1000)]

In [148]: arr = np.asarray(lst)

In [149]: pd.Series(arr[:,1], index = arr[:,0])
Out[149]: 
a      10
b      20
j    1000
dtype: object
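Because the array route leaves the values as strings, one possible follow-up step is to convert them back with pd.to_numeric. This is a sketch of that extra step, not part of the approach above; the names as_strings and as_numbers are illustrative.

```python
import numpy as np
import pandas as pd

lst = [('a', 10), ('b', 20), ('j', 1000)]

arr = np.asarray(lst)
as_strings = pd.Series(arr[:, 1], index=arr[:, 0])   # values are strings here

# Parse the numeric strings back into an integer dtype.
as_numbers = pd.to_numeric(as_strings)
print(as_numbers)
```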
Divakar answered Oct 21 '22 11:10