Append columns in pandas: TypeError: assign() takes 1 positional argument but 2 were given

Tags:

python

pandas

I want to append a new column to "trainData"; both data frames have 712 rows. When I try to append a new column "Age" with the .assign method, it throws the error below.

What is the right way to append columns to a DataFrame?

import pandas as pd

df = pd.read_csv("data/train.csv")
# Drop the columns
df = df.drop(['Ticket','Cabin'], axis=1)
# Drop the rows that contain NA values
df = df.dropna()
print("Age ====", df["Age"])
titanic_dummies = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked'])

trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3","Sex_female","Sex_male","Embarked_C","Embarked_Q","Embarked_S"]]
print("My train data",trainData)
trainData = trainData.assign(df["Age"])

Below is the exception:

  File "<ipython-input-79-3f3ce0263545>", line 1, in <module>
    runfile('C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network/decisiontree.py', wdir='C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network')

  File "C:\RafiWork\Softwares\MiniConda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 688, in runfile
    execfile(filename, namespace)

  File "C:\RafiWork\Softwares\MiniConda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network/decisiontree.py", line 30, in <module>
    trainData = trainData.assign(df["Age"])

TypeError: assign() takes 1 positional argument but 2 were given
Syed Rafi asked Mar 08 '23 05:03


1 Answer

I think you need to define the column name:

trainData = trainData.assign(Age=df["Age"])
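A minimal sketch of why the original call fails, using small hypothetical frames in place of trainData and df: DataFrame.assign accepts only keyword arguments, where each keyword becomes the new column name, so a Series passed positionally raises the TypeError from the question.

```python
import pandas as pd

# Hypothetical tiny frames standing in for trainData and df
train = pd.DataFrame({"Sex_male": [1, 0, 1]}, index=[1, 3, 6])
source = pd.DataFrame({"Age": [22.0, 38.0, 26.0]}, index=[1, 3, 6])

# Passing the Series positionally fails: assign() only takes keyword arguments
try:
    train.assign(source["Age"])
except TypeError as exc:
    error_message = str(exc)

# The keyword name becomes the new column name
with_age = train.assign(Age=source["Age"])

# For names that are not valid Python identifiers, unpack a dict instead
with_label = train.assign(**{"Age (years)": source["Age"]})

print(with_age)
```

Note that assign returns a new DataFrame and leaves the original untouched, which is why the result is reassigned to trainData.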

Thanks to piRSquared for the comment: if the indices are not the same, use:

trainData = trainData.assign(Age=df["Age"].values)

but then the data are not aligned by index: values are assigned by position, so the array must have the same length as trainData.
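A short sketch of the difference, assuming hypothetical frames whose indexes only partly overlap: passing the Series aligns on index labels (unmatched rows get NaN), while passing .values fills rows strictly by position and requires matching length.

```python
import pandas as pd

# Hypothetical frames whose indexes only partly overlap
train = pd.DataFrame({"Sex_male": [1, 0, 1]}, index=[0, 1, 2])
ages = pd.Series([38.0, 35.0, 54.0], index=[1, 2, 3])

# Aligned by label: row 0 has no match (NaN), label 3 is dropped
aligned = train.assign(Age=ages)

# Positional: the i-th value goes to the i-th row, labels are ignored
positional = train.assign(Age=ages.values)

# A positional array with the wrong length is rejected
try:
    train.assign(Age=ages.values[:2])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(aligned)
print(positional)
```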

Sample:

import pandas as pd
import seaborn as sns
# sample df (similar to your data)
df = sns.load_dataset("titanic")
# capitalize column names
df.columns = df.columns.str.capitalize()
print(df.head())
   Survived  Pclass     Sex   Age  Sibsp  Parch     Fare Embarked  Class  \
0         0       3    male  22.0      1      0   7.2500        S  Third   
1         1       1  female  38.0      1      0  71.2833        C  First   
2         1       3  female  26.0      0      0   7.9250        S  Third   
3         1       1  female  35.0      1      0  53.1000        S  First   
4         0       3    male  35.0      0      0   8.0500        S  Third   

     Who  Adult_male Deck  Embark_town Alive  Alone  
0    man        True  NaN  Southampton    no  False  
1  woman       False    C    Cherbourg   yes  False  
2  woman       False  NaN  Southampton   yes   True  
3  woman       False    C  Southampton   yes  False  
4    man        True  NaN  Southampton    no   True 

df = df.dropna() 
#print("Age ====", df["Age"])
titanic_dummies = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked'])

trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3","Sex_female","Sex_male","Embarked_C","Embarked_Q","Embarked_S"]]
#print("My train data",trainData.head())

trainData = trainData.assign(Age=df["Age"])
print("My train data",trainData.head())

My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C  \
1          1         0         0           1         0           1   
3          1         0         0           1         0           0   
6          1         0         0           0         1           0   
10         0         0         1           1         0           0   
11         1         0         0           1         0           0   

    Embarked_Q  Embarked_S   Age  
1            0           0  38.0  
3            0           1  35.0  
6            0           1  54.0  
10           0           1   4.0  
11           0           1  58.0  
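The row labels 1, 3, 6, 10, 11 above are the original labels kept by dropna(); because trainData and df["Age"] both come from the same filtered frame, their indexes match and assign lines them up correctly. A small sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 1 and 3 have a missing Age
df = pd.DataFrame({"Age": [22.0, np.nan, 38.0, np.nan], "Pclass": [3, 1, 1, 2]})

# dropna removes rows 1 and 3 but keeps the original labels 0 and 2
cleaned = df.dropna()

dummies = pd.get_dummies(cleaned, columns=["Pclass"])
train = dummies[["Pclass_1", "Pclass_3"]]

# Same index as cleaned, so alignment pairs every row with its own Age
result = train.assign(Age=cleaned["Age"])
print(result)
```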

Another solution with join:

trainData = trainData.join(df["Age"])
print("My train data",trainData.head())

My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C  \
1          1         0         0           1         0           1   
3          1         0         0           1         0           0   
6          1         0         0           0         1           0   
10         0         0         1           1         0           0   
11         1         0         0           1         0           0   

    Embarked_Q  Embarked_S   Age  
1            0           0  38.0  
3            0           1  35.0  
6            0           1  54.0  
10           0           1   4.0  
11           0           1  58.0  
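A minimal sketch of join with hypothetical frames: join matches rows on the index (a left join by default), and the Series name becomes the column name, so an unnamed Series is rejected.

```python
import pandas as pd

# Hypothetical frames sharing the same index labels
train = pd.DataFrame({"Sex_male": [1, 0, 1]}, index=[1, 3, 6])
ages = pd.Series([38.0, 35.0, 54.0], index=[1, 3, 6], name="Age")

# join matches on the index; the Series name becomes the column name
joined = train.join(ages)

# An unnamed Series cannot be joined
try:
    train.join(pd.Series([1.0, 2.0, 3.0], index=[1, 3, 6]))
except ValueError as exc:
    join_error = str(exc)

print(joined)
```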

After checking the data, it seems the Age column can be added directly to the subset, because get_dummies keeps the columns it does not encode:

trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3",
                              "Sex_female","Sex_male",
                              "Embarked_C","Embarked_Q","Embarked_S",
                              "Age"]]

print("My train data",trainData.head())

My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C \
1          1         0         0           1         0           1   
3          1         0         0           1         0           0   
6          1         0         0           0         1           0   
10         0         0         1           1         0           0   
11         1         0         0           1         0           0   

    Embarked_Q  Embarked_S   Age  
1            0           0  38.0  
3            0           1  35.0  
6            0           1  54.0  
10           0           1   4.0  
11           0           1  58.0  
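A small sketch with hypothetical data showing that pd.get_dummies(..., columns=[...]) encodes only the listed columns and passes the others (such as Age) through unchanged, so they can be selected together with the dummy columns:

```python
import pandas as pd

# Hypothetical data with one numeric and two categorical columns
df = pd.DataFrame({
    "Pclass": [1, 3, 2],
    "Sex": ["female", "male", "female"],
    "Age": [38.0, 22.0, 26.0],
})

# Only Pclass and Sex are encoded; Age is kept as-is
dummies = pd.get_dummies(df, columns=["Pclass", "Sex"])

# Age can be selected alongside the dummy columns in one step
train = dummies[["Pclass_1", "Pclass_2", "Pclass_3", "Sex_female", "Sex_male", "Age"]]
print(train)
```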
jezrael answered Apr 27 '23 18:04