Set Pandas column values to an array

I have a dataframe like this one:

   col1   col2   col3
0   2       5      4
1   4       3      5
2   6       2      7 

Now I have an array, for example a = [5,5,5], and I want to insert it into col3, but only in specific rows (say 0 and 2), to obtain something like this:

   col1   col2   col3
0   2       5    [5,5,5]
1   4       3      5
2   6       2    [5,5,5]

The problem is that when I try to do:

 zip_df.at[[0,2],'col3'] = a 

I receive the following error:

ValueError: Must have equal len keys and value when setting with an ndarray

How can I solve this problem?

Marco Miglionico asked Dec 01 '18 at 23:12

1 Answer

What you're attempting is not recommended.¹ Pandas is not designed to hold lists in a series. Having said this, you can define a series explicitly and assign it via update or loc. Note that at is used to get or set a single value only, not multiple values as in your case.

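import pandas as pd

df = pd.DataFrame({'col1': [2, 4, 6], 'col2': [5, 3, 2], 'col3': [4, 5, 7]})

a = [5, 5, 5]
indices = [0, 2]

# a NumPy-backed integer column cannot hold lists, so cast col3 to object first
df['col3'] = df['col3'].astype(object)

# build a Series of lists indexed by the target rows; loc aligns on that index
df.loc[indices, 'col3'] = pd.Series([a]*len(indices), index=indices)

# alternative (relies on chained assignment, which no longer
# modifies df once pandas Copy-on-Write is enabled):
# df['col3'].update(pd.Series([a]*len(indices), index=indices))

print(df)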

   col1  col2       col3
0     2     5  [5, 5, 5]
1     4     3          5
2     6     2  [5, 5, 5]
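Because .at sets exactly one cell per call, the attempt in the question can also be made to work by looping over the row labels instead of passing them as a list. A minimal sketch, assuming the example frame from the question is rebuilt as df (the asker called it zip_df) and col3 is cast to object dtype first; this is not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'col1': [2, 4, 6], 'col2': [5, 3, 2], 'col3': [4, 5, 7]})
a = [5, 5, 5]
indices = [0, 2]

# an object-dtype column is needed before a single cell can hold a list
df['col3'] = df['col3'].astype(object)

# .at takes one row label and one column label per call,
# so set each target cell separately
for i in indices:
    df.at[i, 'col3'] = list(a)  # copy, so the rows do not share one list object

print(df)
```

Copying a with list(a) is a design choice: each row gets its own list, so mutating one cell later does not silently change the other.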

1 For more information (source):

Don't do this. Pandas was never designed to hold lists in series / columns. You can concoct expensive workarounds, but these are not recommended.

The main reason holding lists in a series is not recommended is that you lose the vectorised functionality that comes with NumPy arrays held in contiguous memory blocks. Your series will be of object dtype, which represents a sequence of pointers, much like a list. You will lose benefits in terms of memory and performance, as well as access to optimized Pandas methods.
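A short illustration of the object-dtype point, using the question's example frame (a sketch added for illustration, not from the quoted source): once a list is stored, col3 is no longer a numeric column, and arithmetic is dispatched to each element's own Python operator, so * 2 repeats the list instead of doubling the numbers.

```python
import pandas as pd

df = pd.DataFrame({'col1': [2, 4, 6], 'col2': [5, 3, 2], 'col3': [4, 5, 7]})
print(df['col3'].dtype)  # integer dtype backed by a contiguous NumPy array

df['col3'] = df['col3'].astype(object)
df.at[0, 'col3'] = [5, 5, 5]
print(df['col3'].dtype)  # object: an array of pointers to Python objects

# arithmetic now runs element by element through Python's own operators,
# so "* 2" repeats the list but doubles the integers
doubled = df['col3'] * 2
print(doubled.tolist())  # [[5, 5, 5, 5, 5, 5], 10, 14]
```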

See also "What are the advantages of NumPy over regular Python lists?" The arguments in favour of Pandas are the same as for NumPy.

jpp answered Sep 16 '22 at 22:09