I want to append a new column to "trainData". Both data frames have 712 rows. When I try to append a new column "Age" with the .assign method, it throws the error below.
What is the right way to append a column to a DataFrame?
import pandas as pd

df = pd.read_csv("data/train.csv")
# Drop the Ticket and Cabin columns
df = df.drop(['Ticket', 'Cabin'], axis=1)
# Drop the rows that contain missing values
df = df.dropna()
print("Age ====", df["Age"])
titanic_dummies = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked'])
trainData = titanic_dummies[["Pclass_1", "Pclass_2", "Pclass_3",
                             "Sex_female", "Sex_male",
                             "Embarked_C", "Embarked_Q", "Embarked_S"]]
print("My train data", trainData)
trainData = trainData.assign(df["Age"])  # raises the TypeError below
Below is the exception:
File "<ipython-input-79-3f3ce0263545>", line 1, in <module>
runfile('C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network/decisiontree.py', wdir='C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network')
File "C:\RafiWork\Softwares\MiniConda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 688, in runfile
execfile(filename, namespace)
File "C:\RafiWork\Softwares\MiniConda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network/decisiontree.py", line 30, in <module>
trainData = trainData.assign(df["Age"])
TypeError: assign() takes 1 positional argument but 2 were given
I think you need to pass the column name as a keyword argument:
trainData = trainData.assign(Age=df["Age"])
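A minimal sketch of why the keyword form works, using a small made-up frame (`train`, `ages` are hypothetical stand-ins for trainData and df["Age"]): DataFrame.assign accepts new columns only as keyword arguments, where the keyword becomes the column name, so passing a Series positionally raises the TypeError shown above.

```python
import pandas as pd

# Hypothetical toy data standing in for trainData and df["Age"]
train = pd.DataFrame({"Sex_female": [1, 0, 1], "Sex_male": [0, 1, 0]})
ages = pd.Series([38.0, 35.0, 54.0], name="Age")

# Positional argument: assign() only takes **kwargs, so this raises TypeError
try:
    train.assign(ages)
except TypeError as exc:
    print("TypeError:", exc)

# Keyword argument: the keyword "Age" becomes the new column's name
train = train.assign(Age=ages)
print(train)
```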
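Thanks to piRSquared for the comment: if the indices aren't the same, use:
trainData = trainData.assign(Age=df["Age"].values)
but then the data are not aligned by index; the values are assigned by position, so both frames must have the same length and row order.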
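A small sketch of the difference, assuming two hypothetical frames whose index labels only partly overlap: assigning a Series matches rows by index label (unmatched labels become NaN), while assigning `.values` (a plain NumPy array) ignores labels and fills rows by position.

```python
import pandas as pd

# Hypothetical frames whose indices do not match
left = pd.DataFrame({"Pclass_1": [1, 0, 1]}, index=[0, 1, 2])
age = pd.Series([22.0, 38.0, 26.0], index=[1, 2, 3], name="Age")

# Series: matched by index label; label 0 has no partner -> NaN,
# and label 3 (present only in `age`) is dropped
by_label = left.assign(Age=age)

# .values: plain array, assigned by position (lengths must be equal)
by_position = left.assign(Age=age.values)

print(by_label)
print(by_position)
```

If the array length differs from the number of rows, pandas raises a ValueError instead of silently aligning.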
Sample:
import pandas as pd
import seaborn as sns

# sample df (similar to your data)
df = sns.load_dataset("titanic")
# capitalize column names
df.columns = df.columns.str.capitalize()
print(df.head())
   Survived  Pclass     Sex   Age  Sibsp  Parch     Fare  Embarked  Class  \
0         0       3    male  22.0      1      0   7.2500         S  Third
1         1       1  female  38.0      1      0  71.2833         C  First
2         1       3  female  26.0      0      0   7.9250         S  Third
3         1       1  female  35.0      1      0  53.1000         S  First
4         0       3    male  35.0      0      0   8.0500         S  Third

     Who  Adult_male  Deck  Embark_town  Alive  Alone
0    man        True   NaN  Southampton     no  False
1  woman       False     C    Cherbourg    yes  False
2  woman       False   NaN  Southampton    yes   True
3  woman       False     C  Southampton    yes  False
4    man        True   NaN  Southampton     no   True
df = df.dropna()
# print("Age ====", df["Age"])
titanic_dummies = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked'])
trainData = titanic_dummies[["Pclass_1", "Pclass_2", "Pclass_3",
                             "Sex_female", "Sex_male",
                             "Embarked_C", "Embarked_Q", "Embarked_S"]]
# print("My train data", trainData.head())
trainData = trainData.assign(Age=df["Age"])
print("My train data", trainData.head())
My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C  \
 1         1         0         0           1         0           1
 3         1         0         0           1         0           0
 6         1         0         0           0         1           0
10         0         0         1           1         0           0
11         1         0         0           1         0           0

    Embarked_Q  Embarked_S   Age
 1           0           0  38.0
 3           0           1  35.0
 6           0           1  54.0
10           0           1   4.0
11           0           1  58.0
Another solution, with join:
trainData = trainData.join(df["Age"])
print("My train data",trainData.head())
My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C  \
 1         1         0         0           1         0           1
 3         1         0         0           1         0           0
 6         1         0         0           0         1           0
10         0         0         1           1         0           0
11         1         0         0           1         0           0

    Embarked_Q  Embarked_S   Age
 1           0           0  38.0
 3           0           1  35.0
 6           0           1  54.0
10           0           1   4.0
11           0           1  58.0
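A minimal sketch of what join does here, on hypothetical toy frames that share index labels (as trainData and df do after dropna()): DataFrame.join is a left join on the index by default, and a joined Series contributes a column named after the Series, so the result matches the assign version.

```python
import pandas as pd

# Hypothetical frames sharing index labels, as after df.dropna()
train = pd.DataFrame({"Sex_male": [0, 0, 1]}, index=[1, 3, 6])
df = pd.DataFrame({"Age": [38.0, 35.0, 54.0]}, index=[1, 3, 6])

# Left join on the index; the Series name "Age" becomes the column name
joined = train.join(df["Age"])
print(joined)
```

Because join takes the column name from the Series, an unnamed Series raises a ValueError; assign, by contrast, takes the name from its keyword.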
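After checking the data, it seems you can simply add the Age column to the subset, because get_dummies only encodes the listed columns and keeps the others (including Age) in titanic_dummies: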
trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3",
"Sex_female","Sex_male",
"Embarked_C","Embarked_Q","Embarked_S",
"Age"]]
print("My train data",trainData.head())
My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C  \
 1         1         0         0           1         0           1
 3         1         0         0           1         0           0
 6         1         0         0           0         1           0
10         0         0         1           1         0           0
11         1         0         0           1         0           0

    Embarked_Q  Embarked_S   Age
 1           0           0  38.0
 3           0           1  35.0
 6           0           1  54.0
10           0           1   4.0
11           0           1  58.0
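A small sketch of why the subset approach works, using a hypothetical mini version of the Titanic data: with `columns=`, pd.get_dummies one-hot encodes only the listed columns and passes every other column (here Age) through unchanged, so Age can be selected together with the dummy columns in one step.

```python
import pandas as pd

# Hypothetical mini version of the Titanic data (no class-2 passengers)
df = pd.DataFrame({
    "Pclass": [1, 3, 1],
    "Sex": ["female", "male", "female"],
    "Age": [38.0, 22.0, 35.0],
})

# Only Pclass and Sex are encoded; Age is kept as-is
dummies = pd.get_dummies(df, columns=["Pclass", "Sex"])
print(dummies.columns.tolist())

# Select the dummy columns plus Age in a single indexing step
train = dummies[["Pclass_1", "Pclass_3", "Sex_female", "Sex_male", "Age"]]
print(train)
```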