 

How to set the value of a pandas column as list

Tags:

python

pandas

I want to set the value of a pandas column to a list of strings. My attempts failed because pandas treats the list as an iterable of values to assign one per row, and I get: ValueError: Must have equal len keys and value when setting with an iterable.

Here is an MWE:

>> import pandas as pd
>> df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
>> df
   col1  col2
0     1     4
1     2     5
2     3     6

>> df['new_col'] = None
>> df.loc[df.col1 == 1, 'new_col'] = ['a', 'b']
ValueError: Must have equal len keys and value when setting with an iterable

I also tried setting the dtype to list with df.new_col = df.new_col.astype(list), but that didn't work either.

I am wondering what would be the correct approach here.


EDIT

The answer provided in "Python pandas insert list into a cell" (using at) didn't work for me either.

Unni asked Sep 28 '18 09:09




2 Answers

This is not easy. One possible solution is to create a helper Series:

df.loc[df.col1 == 1, 'new_col'] = pd.Series([['a', 'b']] * len(df))
print (df)
   col1  col2 new_col
0     1     4  [a, b]
1     2     5     NaN
2     3     6     NaN
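
One caveat with the helper Series above: `[['a', 'b']] * len(df)` repeats a reference to a single list object, so the rows would share one list. A possible variation, sketched here on the assumption that each selected row should get its own list, builds the helper on the index of the selected rows only. The names `mask` and `helper` are chosen for illustration and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df['new_col'] = None  # object dtype column to receive the lists

# Hypothetical filter selecting two rows, to show that the lists are independent
mask = df['col1'] >= 2

# One fresh list per selected row, labelled with the selected rows' index
# so that pandas aligns the helper with the target rows
helper = pd.Series(
    [['a', 'b'] for _ in range(int(mask.sum()))],
    index=df.index[mask],
)

df.loc[mask, 'new_col'] = helper
print(df)
```

Because the helper carries the index of the selected rows, the same pattern should also apply when the frame does not use the default 0..n-1 index.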

Another solution, if you also need to set the missing values to an empty list, is to use a list comprehension:

#df['new_col'] = [['a', 'b'] if x == 1 else np.nan for x in df['col1']]

df['new_col'] = [['a', 'b'] if x == 1 else [] for x in df['col1']]
print (df)
   col1  col2 new_col
0     1     4  [a, b]
1     2     5      []
2     3     6      []

But then you lose the vectorised functionality which goes with using NumPy arrays held in contiguous memory blocks.
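
When only a few rows need a list, another option may be to set the cells one at a time with the scalar accessor `at`. This is a minimal sketch, assuming the column already has object dtype (which `df['new_col'] = None` produces); the name `mask` is introduced for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df['new_col'] = None  # object dtype, so a cell can hold any Python object

mask = df['col1'] == 1  # hypothetical name for the row filter

for label in df.index[mask]:
    # `at` addresses exactly one cell, so the list is stored as a single value
    # instead of being spread across rows
    df.at[label, 'new_col'] = ['a', 'b']

print(df)
```

The loop runs in Python, so this is only reasonable for a small number of matching rows.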

jezrael answered Sep 28 '22 09:09


Don't do this.

Pandas was never designed to hold lists in series / columns. You can concoct expensive workarounds, but these are not recommended.

The main reason holding lists in a series is not recommended is that you lose the vectorised functionality which comes with using NumPy arrays held in contiguous memory blocks. Your series will be of object dtype, which represents a sequence of pointers, much like a list. You will lose benefits in terms of memory and performance, as well as access to optimized Pandas methods.
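
The difference can be seen directly. The following is a small illustrative sketch, not a benchmark; the frame and the names `doubled` and `lengths` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3]})
df['new_col'] = [['a', 'b'], [], []]

print(df['col1'].dtype)     # a NumPy integer dtype
print(df['new_col'].dtype)  # object: each cell is a pointer to a Python list

# Numeric column: one vectorised operation over the underlying NumPy array
doubled = df['col1'] * 2

# List column: no vectorised equivalent, so a Python function runs per element
lengths = df['new_col'].map(len)

print(doubled.tolist())
print(lengths.tolist())
```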

See also "What are the advantages of NumPy over regular Python lists?" The arguments in favour of Pandas are the same as for NumPy.

That said, even though this goes against the purpose and design of Pandas, many people face the same problem and have asked similar questions:

  • Python pandas insert list into a cell
  • pandas: how to store a list in a dataframe?
  • Answer on this question

jpp answered Sep 28 '22 08:09