I have had to do this several times and I'm always frustrated. I have a dataframe:
    import pandas as pd

    df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], ['a', 'b'], ['A', 'B', 'C', 'D'])
    print(df)

       A  B  C  D
    a  1  2  3  4
    b  5  6  7  8
I want to turn df into:
    pd.Series([[1, 2, 3, 4], [5, 6, 7, 8]], ['a', 'b'])

    a    [1, 2, 3, 4]
    b    [5, 6, 7, 8]
    dtype: object
I've tried

    df.apply(list, axis=1)

which just gets me back the same df.
What is a convenient/effective way to do this?
With the tolist method we can easily convert a pandas DataFrame into a list of lists, one per row or one per column. To do this, first create a list of tuples and then build a DataFrame object, 'new_val', from it.
Alternatively, use the DataFrame.itertuples() function to iterate over each row of the DataFrame, build a list from the data of each row, and append it to the end of an outer list.
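As a rough sketch of the itertuples() approach applied to the DataFrame from the question (the names `rows` and `result` are only illustrative, and `index=False` is passed so the row label is not included in each list):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], ['a', 'b'], ['A', 'B', 'C', 'D'])

# Collect each row as a plain list; index=False leaves out the row label.
rows = []
for row in df.itertuples(index=False):
    rows.append(list(row))

# Wrap the lists in a Series that reuses the original row labels.
result = pd.Series(rows, index=df.index)
print(result)
```

This loops in Python, so it is likely to be slower than converting the underlying array in one call, but it makes each step explicit.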
Use the tolist() method to convert a DataFrame column to a list. A column in a pandas DataFrame is a pandas Series, so if we need to convert a column to a list, we can call tolist() on that Series.
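For instance, a minimal sketch using the DataFrame from the question (the variable name `col_a` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], ['a', 'b'], ['A', 'B', 'C', 'D'])

# Selecting a single column gives a Series.
column = df['A']

# tolist() turns that Series into a plain Python list.
col_a = column.tolist()
print(col_a)  # [1, 5]
```

Note that this works column by column; the question asks for one list per row, which is what the answer below addresses.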
If you need a faster solution, you can first convert the DataFrame to a NumPy array with values, then convert it to a list, and finally create a new Series with the index from df:
    print(pd.Series(df.values.tolist(), index=df.index))

    a    [1, 2, 3, 4]
    b    [5, 6, 7, 8]
    dtype: object
Timings with small DataFrame:
    In [76]: %timeit (pd.Series(df.values.tolist(), index=df.index))
    1000 loops, best of 3: 295 µs per loop

    In [77]: %timeit pd.Series(df.T.to_dict('list'))
    1000 loops, best of 3: 685 µs per loop

    In [78]: %timeit df.T.apply(tuple).apply(list)
    1000 loops, best of 3: 958 µs per loop
And with a large DataFrame:

    import numpy as np
    from string import ascii_letters

    letters = list(ascii_letters)
    df = pd.DataFrame(np.random.choice(range(10), (52 ** 2, 52)),
                      pd.MultiIndex.from_product([letters, letters]),
                      letters)

    In [71]: %timeit (pd.Series(df.values.tolist(), index=df.index))
    100 loops, best of 3: 2.06 ms per loop

    In [72]: %timeit pd.Series(df.T.to_dict('list'))
    1 loop, best of 3: 203 ms per loop

    In [73]: %timeit df.T.apply(tuple).apply(list)
    1 loop, best of 3: 506 ms per loop
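Before comparing speed, it can help to confirm that the approaches give the same result. The sketch below checks the first two on the small DataFrame from the question; the variable names are illustrative, and the apply-based variant is left out because its return type may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], ['a', 'b'], ['A', 'B', 'C', 'D'])

# Approach 1: NumPy array -> nested list -> Series with the original index.
via_values = pd.Series(df.values.tolist(), index=df.index)

# Approach 2: transpose so rows become columns, then map each label to a list.
via_to_dict = pd.Series(df.T.to_dict(orient='list'))

# Both should hold the same lists under the same labels.
same_data = via_values.tolist() == via_to_dict.tolist()
same_index = list(via_values.index) == list(via_to_dict.index)
print(same_data and same_index)
```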