
How to concatenate pandas column with list values into one list?

I have a dataframe in which one column holds a list in each row. I want to concatenate these lists into a single list. I am using

ids = df.loc[0:index, 'User IDs'].values.tolist()

However, this results in ['[1,2,3,4......]'], where each element is a string rather than a list. Somehow every value in my list column has type str. I have tried converting with list() and literal_eval(), but neither works. list() splits the string into single characters, e.g. it turns '[12,13,14...]' into ['[', '1', '2', ',', '1', '3', ...].

How can I concatenate a pandas column of list values into one list? Any help is appreciated; I have been stuck on this for several hours.

asked Mar 20 '17 at 17:03 by SarwatFatimaM




2 Answers

Consider the dataframe df:

df = pd.DataFrame(dict(col1=[[1, 2, 3]] * 2))
print(df)

        col1
0  [1, 2, 3]
1  [1, 2, 3]

Simplest pandas answer

df.col1.sum()

[1, 2, 3, 1, 2, 3]

numpy.concatenate

np.concatenate(df.col1)

array([1, 2, 3, 1, 2, 3])

chain

from itertools import chain

list(chain(*df.col1))

[1, 2, 3, 1, 2, 3]
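
A variation on the chain approach, offered as a minimal sketch rather than part of the original answer: chain.from_iterable accepts the column itself, so the rows do not have to be unpacked into separate arguments with *. The dataframe is the same example df used above; the name flat is only illustrative.

```python
from itertools import chain

import pandas as pd

# Same example dataframe as above: one column, a list in each row.
df = pd.DataFrame(dict(col1=[[1, 2, 3]] * 2))

# from_iterable iterates over the Series row by row instead of
# unpacking every row into a positional argument.
flat = list(chain.from_iterable(df["col1"]))
print(flat)  # [1, 2, 3, 1, 2, 3]
```

For long columns this is likely a better fit than sum(), which builds a new intermediate list at every step.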

Response to comments:
I think the values in your column are strings. Convert them first:

from ast import literal_eval

df.col1 = df.col1.apply(literal_eval)

If your column instead holds string values that look like lists:

df = pd.DataFrame(dict(col1=['[1, 2, 3]'] * 2))
print(df)  # will look the same

        col1
0  [1, 2, 3]
1  [1, 2, 3]

However, pd.Series.sum does not behave the same way here: it concatenates the strings.

df.col1.sum()

'[1, 2, 3][1, 2, 3]'

We need to evaluate the strings as Python literals and then sum:

df.col1.apply(literal_eval).sum()

[1, 2, 3, 1, 2, 3]
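
When it is unclear whether the column holds real lists or their string form, one possible approach is to check the element types first and parse only the strings. The sketch below assumes every string is a valid Python list literal; to_list is a hypothetical helper name and the sample data is made up for illustration.

```python
from ast import literal_eval
from itertools import chain

import pandas as pd

# Made-up sample: the lists are stored as strings, as in the question.
df = pd.DataFrame(dict(col1=["[1, 2, 3]", "[4, 5]"]))

# Inspect what the column really contains before converting.
print(df["col1"].map(type).unique())


def to_list(value):
    """Hypothetical helper: parse strings, pass real lists through unchanged."""
    return literal_eval(value) if isinstance(value, str) else value


parsed = df["col1"].apply(to_list)
flat = list(chain.from_iterable(parsed))
print(flat)  # [1, 2, 3, 4, 5]
```

literal_eval raises an exception on strings that are not valid literals, so malformed rows surface immediately instead of being silently skipped.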
answered Oct 13 '22 at 01:10 by piRSquared



If you want to flatten the lists, this is a Pythonic way to do it:

import pandas as pd

df = pd.DataFrame({'A': [[1,2,3], [4,5,6]]})

a = df['A'].tolist()
a = [i for j in a for i in j]
print(a)  # [1, 2, 3, 4, 5, 6]
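
As an alternative sketch that is not part of the original answer, newer pandas versions (0.25 and later) provide Series.explode, which puts each list element on its own row. This assumes the column holds real lists, not strings; the dataframe matches the example above.

```python
import pandas as pd

# Same example dataframe as above.
df = pd.DataFrame({'A': [[1, 2, 3], [4, 5, 6]]})

# explode() yields one row per list element; tolist() collects them.
a = df['A'].explode().tolist()
print(a)  # [1, 2, 3, 4, 5, 6]
```

Note that explode turns an empty list into a NaN row, so the list comprehension may be the safer choice if some rows can be empty.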
answered Oct 13 '22 at 00:10 by zipa
