Given a DataFrame, I want to update a subset of its columns with a Series whose length equals the number of columns being updated:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0, 5, (6, 2)), columns=['col1', 'col2'])
>>> df
   col1  col2
0     1     0
1     2     4
2     4     4
3     4     0
4     0     0
5     3     1
>>> df.loc[:,['col1','col2']] = pd.Series([0,1])
...
ValueError: shape mismatch: value array of shape (6,) could not be broadcast to indexing result of shape (2,6)
This fails. However, the same assignment works if I convert the Series to a list:
>>> df.loc[:,['col1','col2']] = list(pd.Series([0,1]))
>>> df
   col1  col2
0     0     1
1     0     1
2     0     1
3     0     1
4     0     1
5     0     1
Could you please help me understand why updating with a Series fails? Do I have to reshape it in some particular way?
The Series.update() method modifies a Series in place using the values from the passed Series. Values are matched by index label, and NaN values in the passed Series leave the corresponding original values unchanged.
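A minimal sketch of Series.update(), using small made-up values; the labels 0 and 2 in the passed Series are arbitrary choices that show the label matching.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# update() works in place and matches values by index label:
# labels 0 and 2 receive new values, label 1 is absent and keeps its old value
s.update(pd.Series([10, 30], index=[0, 2]))

# s now holds 10, 2, 30
print(s.tolist())
```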
You can replace the values of all or selected columns of a pandas DataFrame, optionally based on a condition, with the DataFrame.loc[] property. loc[] accesses a group of rows and columns by label(s) or a boolean array, and it can be used both to read and to modify the values of the DataFrame.
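A minimal sketch of a conditional replacement with loc. The question's sample values are hard-coded here instead of np.random so the result is reproducible, and the threshold 2 is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 4, 4, 0, 3], 'col2': [0, 4, 4, 0, 0, 1]})

# Boolean row mask combined with a list of column labels:
# only the cells in rows where col1 > 2 (rows 2, 3 and 5) are selected
mask = df['col1'] > 2

# A scalar on the right-hand side is broadcast to every selected cell
df.loc[mask, ['col1', 'col2']] = 0
print(df)
```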
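When the right-hand side is a pandas object, pandas treats the assignment more "rigorously": instead of matching by shape, it aligns the object on its labels. A Series is aligned on the row index of the selection rather than spread across the selected columns, which is why the error reports a value array of shape (6,). Only when you convert it to a list (or, equivalently, to a NumPy array with pd.Series([0, 1]).values) does pandas skip alignment and broadcast [0, 1] across every row, which is the behavior you expected.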
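The following sketch reproduces the alignment step by hand with Series.reindex to show where the length-6 array comes from. It illustrates the behavior rather than reproducing pandas' internal code path, and the question's sample values are hard-coded in place of np.random.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 4, 4, 0, 3], 'col2': [0, 4, 4, 0, 0, 1]})
s = pd.Series([0, 1])

# Aligning the Series on the DataFrame's row labels: its index is [0, 1],
# so rows 2..5 have no matching label and become NaN. This length-6 array
# is the "(6,)" that cannot be broadcast into the two selected columns.
aligned = s.reindex(df.index)
print(aligned.to_numpy())

# A NumPy array (like a list) carries no labels, so pandas broadcasts it by
# shape: [0, 1] is treated as one row and repeated for every row of df
df.loc[:, ['col1', 'col2']] = s.to_numpy()
print(df)
```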
This stricter rule also requires the labels to line up: even a right-hand side of the correct shape will not assign as intended unless its row index (and, for a DataFrame, its column labels) match the target.
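>>> # Right shape, but the column labels are 0 and 1, not 'col1' and 'col2',
>>> # so label alignment does not put 0 and 1 into the target columns
>>> df.loc[:, ['col1', 'col2']] = pd.DataFrame([[0, 1] for _ in range(6)])
>>> df
>>> # Matching column labels; the default row index 0..5 already matches df
>>> df.loc[:, ['col1', 'col2']] = pd.DataFrame([[0, 1] for _ in range(6)], columns=['col1', 'col2'])
>>> df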
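A sketch assuming a DataFrame whose row index is not the default 0..n-1 (the labels 'a', 'b', 'c' are hypothetical). Building the right-hand side from the target's own df.index and column list makes every label line up, so the assignment does not rely on a default RangeIndex happening to match.

```python
import pandas as pd

# Hypothetical frame with a non-default row index, to show why labels matter
df = pd.DataFrame({'col1': [1, 2, 4], 'col2': [0, 4, 4]}, index=['a', 'b', 'c'])
cols = ['col1', 'col2']
values = [0, 1]

# Reuse the target's row index and column labels so alignment matches
# every cell instead of finding no corresponding labels
rhs = pd.DataFrame([values] * len(df), index=df.index, columns=cols)
df.loc[:, cols] = rhs
print(df)
```

If only the values matter, converting to a list or NumPy array, as in the question, sidesteps alignment entirely.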