I have a DataFrame as given below:
df =
index  data1  data2
0         20    120
1         30    456
2         40     34
How can I combine the two columns of the above df into a single list, such that the elements of the first row come first, then those of the second row, and so on?
My expected output:
my_list = [20, 120, 30, 456, 40, 34]
My code:
list1 = df['data1'].tolist()
list2 = df['data2'].tolist()
my_list = list1+list2
This did not work: it gives [20, 30, 40, 120, 456, 34], i.e. all of data1 followed by all of data2.
The underlying NumPy array is laid out row by row, like array([[row1], [row2], ..., [rowN]]), so we can ravel it (ravel flattens in row-major order by default), which should be very fast.
df[['data1', 'data2']].to_numpy().ravel().tolist()
#[20, 120, 30, 456, 40, 34]
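To illustrate why ravel gives the interleaved order, here is a small sketch (assuming the question's df is rebuilt by hand from the table above) that flattens the same array in both orders. The default order="C" reads row by row, while order="F" reads column by column and reproduces what list1 + list2 returned in the question.

```python
import pandas as pd

# Rebuild the question's DataFrame (values copied from the table above)
df = pd.DataFrame({"data1": [20, 30, 40], "data2": [120, 456, 34]})

arr = df[["data1", "data2"]].to_numpy()  # shape (3, 2): one row per DataFrame row

# order="C" (the default) reads row by row, interleaving the two columns
row_major = arr.ravel(order="C").tolist()

# order="F" reads column by column, giving the same result as list1 + list2
col_major = arr.ravel(order="F").tolist()

print(row_major)  # [20, 120, 30, 456, 40, 34]
print(col_major)  # [20, 30, 40, 120, 456, 34]
```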
Out of curiosity, here are all the proposed methods, plus one more using itertools.chain, with timings for building the output from 2 columns as the length of the DataFrame grows.
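import perfplot  # third-party package: pip install perfplot
import pandas as pd
import numpy as np
from itertools import chain

perfplot.show(
    # Random n x 2 integer DataFrame; its columns are labelled 0 and 1
    setup=lambda n: pd.DataFrame(np.random.randint(1, 10, (n, 2))),
    kernels=[
        # Flatten the row-major NumPy array
        lambda df: df[[0, 1]].to_numpy().ravel().tolist(),
        # Interleave the two columns with zip
        lambda df: [x for i in zip(df[0], df[1]) for x in i],
        # Chain the rows of the NumPy array together
        lambda df: [*chain.from_iterable(df[[0, 1]].to_numpy())],
        # Stack the columns into a Series ordered row by row
        lambda df: df[[0, 1]].stack().tolist(),  # proposed by @anky_91
    ],
    labels=['ravel', 'zip', 'chain', 'stack'],
    n_range=[2 ** k for k in range(20)],  # DataFrame lengths 1 .. 2**19
    equality_check=np.allclose,  # check that all kernels return the same values
    xlabel="len(df)",
)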
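Since the timing script needs the third-party perfplot package, here is a lighter sketch (assuming only pandas and NumPy are installed) that just checks the four kernels agree on the question's data, using the real column names instead of 0 and 1. The dictionary keys are illustrative labels, not part of any API.

```python
from itertools import chain

import pandas as pd

# The question's DataFrame, rebuilt by hand from the table above
df = pd.DataFrame({"data1": [20, 30, 40], "data2": [120, 456, 34]})
cols = ["data1", "data2"]

results = {
    "ravel": df[cols].to_numpy().ravel().tolist(),
    "zip": [x for i in zip(df["data1"], df["data2"]) for x in i],
    # chain over a NumPy array yields NumPy scalars; they still compare equal to ints
    "chain": [*chain.from_iterable(df[cols].to_numpy())],
    "stack": df[cols].stack().tolist(),
}

expected = [20, 120, 30, 456, 40, 34]
for name, values in results.items():
    # Every method should produce the same row-by-row order
    assert values == expected, name
```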
That doesn't work because list1 + list2 concatenates the two lists end to end instead of pairing elements that share the same index. Use a list comprehension over zip instead:
print([x for i in zip(df['data1'], df['data2']) for x in i])
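For readers less familiar with nested comprehensions, the one-liner above can be unrolled into explicit loops. This is a sketch assuming the question's df; zip pairs the two columns row by row, and the inner loop emits each element of a pair in order.

```python
import pandas as pd

# The question's DataFrame, rebuilt by hand from the table above
df = pd.DataFrame({"data1": [20, 30, 40], "data2": [120, 456, 34]})

# Equivalent to: [x for i in zip(df['data1'], df['data2']) for x in i]
my_list = []
for pair in zip(df["data1"], df["data2"]):  # (20, 120), (30, 456), (40, 34)
    for x in pair:  # first data1, then data2, for the current row
        my_list.append(x)

# my_list == [20, 120, 30, 456, 40, 34]
```

The outer loop in the comprehension is written first, exactly as in the unrolled version.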