I am using Python 2.7 and pandas 0.11.0.
I am trying to fill a column of a DataFrame using DataFrame.apply(func). The function func() is supposed to return a 3-element NumPy array.
import pandas as pd
import numpy as np
df= pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
print(df)
A B C
0 0.910142 0.788300 0.114164
1 -0.603282 -0.625895 2.843130
2 1.823752 -0.091736 -0.107781
3 0.447743 -0.163605 0.514052
The function used for testing purposes:
def test(row):
    # some complex calc here
    # based on the values from different columns
    return np.array((1,2,3))
df['D'] = df.apply(test, axis=1)
[...]
ValueError: Wrong number of items passed 1, indices imply 3
The funny thing is that when I create the DataFrame from scratch, it works fine and returns the expected result:
dic = {'A': {0: 0.9, 1: -0.6, 2: 1.8, 3: 0.4},
'C': {0: 0.1, 1: 2.8, 2: -0.1, 3: 0.5},
'B': {0: 0.7, 1: -0.6, 2: -0.1, 3: -0.1},
'D': {0:np.array((1,2,3)),
1:np.array((1,2,3)),
2:np.array((1,2,3)),
3:np.array((1,2,3))}}
df= pd.DataFrame(dic)
print(df)
A B C D
0 0.9 0.7 0.1 [1, 2, 3]
1 -0.6 -0.6 2.8 [1, 2, 3]
2 1.8 -0.1 -0.1 [1, 2, 3]
3 0.4 -0.1 0.5 [1, 2, 3]
Thanks in advance
If you return multiple values from the function passed to apply, and the DataFrame you call apply on has the same number of items along the axis (in this case, columns) as the number of values you returned, pandas will create a DataFrame from the return values with the same labels as the original DataFrame. You can see this if you just do:
>>> def test(row):
...     return [1, 2, 3]
>>> df= pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
>>> df.apply(test, axis=1)
A B C
0 1 2 3
1 1 2 3
2 1 2 3
3 1 2 3
That is why you get the error: you cannot assign a whole DataFrame to a single DataFrame column.
If you return any other number of values, apply returns a plain Series, which can be assigned to a column:
>>> def test(row):
...     return [1, 2]
>>> df= pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
>>> df.apply(test, axis=1)
0 [1, 2]
1 [1, 2]
2 [1, 2]
3 [1, 2]
>>> df['D'] = df.apply(test, axis=1)
>>> df
A B C D
0 0.333535 0.209745 -0.972413 [1, 2]
1 0.469590 0.107491 -1.248670 [1, 2]
2 0.234444 0.093290 -0.853348 [1, 2]
3 1.021356 0.092704 -0.406727 [1, 2]
I'm not sure why pandas does this, or why it does so only when the return value is a list or an ndarray. It does not happen if you return a tuple:
>>> def test(row):
...     return (1, 2, 3)
>>> df= pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
>>> df['D'] = df.apply(test, axis=1)
>>> df
A B C D
0 0.121136 0.541198 -0.281972 (1, 2, 3)
1 0.569091 0.944344 0.861057 (1, 2, 3)
2 -1.742484 -0.077317 0.181656 (1, 2, 3)
3 -1.541244 0.174428 0.660123 (1, 2, 3)