 

How do I turn a dataframe into a series of lists?


I have had to do this several times and I'm always frustrated. I have a dataframe:

    import pandas as pd

    df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], ['a', 'b'], ['A', 'B', 'C', 'D'])
    print(df)

       A  B  C  D
    a  1  2  3  4
    b  5  6  7  8

I want to turn df into:

    pd.Series([[1, 2, 3, 4], [5, 6, 7, 8]], ['a', 'b'])

    a    [1, 2, 3, 4]
    b    [5, 6, 7, 8]
    dtype: object

I've tried

    df.apply(list, axis=1)

which just gives me back the same df.

What is a convenient/effective way to do this?

Brian asked Aug 02 '16 06:08





1 Answer

If you need a faster solution, first convert the DataFrame to a NumPy array with values, then convert that array to a list of lists with tolist(), and finally build a new Series that reuses the index from df:

    print(pd.Series(df.values.tolist(), index=df.index))

    a    [1, 2, 3, 4]
    b    [5, 6, 7, 8]
    dtype: object
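For anyone who wants to paste and run this, here is a minimal self-contained sketch of the same idea. The helper name rows_as_lists is hypothetical (it is not part of pandas), and the sketch uses to_numpy(), which newer pandas versions recommend in place of the values attribute used above; treat it as an illustration, not the only way to do this.

```python
import pandas as pd

# Same small frame as in the question.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                  index=['a', 'b'], columns=['A', 'B', 'C', 'D'])


def rows_as_lists(frame):
    """Hypothetical helper: return a Series holding each row of `frame` as a list.

    The frame is converted to a NumPy array, the array to a list of
    row lists, and the original index is reattached.
    """
    return pd.Series(frame.to_numpy().tolist(), index=frame.index)


result = rows_as_lists(df)
print(result)
```

Note that on recent pandas versions, df.apply(list, axis=1) from the question may also return a Series of lists rather than a DataFrame, so the behaviour described in the question can depend on the installed version.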

Timings with small DataFrame:

    In [76]: %timeit (pd.Series(df.values.tolist(), index=df.index))
    1000 loops, best of 3: 295 µs per loop

    In [77]: %timeit pd.Series(df.T.to_dict('list'))
    1000 loops, best of 3: 685 µs per loop

    In [78]: %timeit df.T.apply(tuple).apply(list)
    1000 loops, best of 3: 958 µs per loop

and with large:

    from string import ascii_letters
    import numpy as np

    letters = list(ascii_letters)
    df = pd.DataFrame(np.random.choice(range(10), (52 ** 2, 52)),
                      pd.MultiIndex.from_product([letters, letters]),
                      letters)

    In [71]: %timeit (pd.Series(df.values.tolist(), index=df.index))
    100 loops, best of 3: 2.06 ms per loop

    In [72]: %timeit pd.Series(df.T.to_dict('list'))
    1 loop, best of 3: 203 ms per loop

    In [73]: %timeit df.T.apply(tuple).apply(list)
    1 loop, best of 3: 506 ms per loop
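The %timeit lines above only work inside IPython. As a rough sketch of how the comparison could be repeated in a plain Python script, the standard timeit module can be used instead. The names big, candidates and timings are made up for this example, and np.random.randint stands in for the np.random.choice call above. Absolute numbers will differ by machine and by pandas version, so none are quoted here.

```python
import timeit
from string import ascii_letters

import numpy as np
import pandas as pd

letters = list(ascii_letters)

# Large frame shaped like the one above: 52**2 rows, 52 columns, MultiIndex rows.
big = pd.DataFrame(np.random.randint(0, 10, size=(52 ** 2, 52)),
                   index=pd.MultiIndex.from_product([letters, letters]),
                   columns=letters)

# Two of the approaches compared above, wrapped as zero-argument callables.
candidates = {
    "values.tolist": lambda: pd.Series(big.values.tolist(), index=big.index),
    "T.to_dict('list')": lambda: pd.Series(big.T.to_dict('list')),
}

# Best of 3 single runs per approach, in seconds.
timings = {name: min(timeit.repeat(fn, number=1, repeat=3))
           for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```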
jezrael answered Oct 20 '22 06:10
