I have fed the following CSV file into iPython Notebook:
public = pd.read_csv("categories.csv")
public
I've also imported pandas as pd, numpy as np and matplotlib.pyplot as plt. The following data types are present (this is a summary; there are about 100 columns):
In [36]: public.dtypes
Out[37]:
parks          object
playgrounds    object
sports         object
roading        object
resident        int64
children        int64
I want to change 'parks', 'playgrounds', 'sports' and 'roading' to categories, leaving the remainder as int64. These columns hold Likert scale responses, though each column has a different set of responses (e.g. one has "strongly agree", "agree" etc., another has "very important", "important" etc.).
I was able to create a separate dataframe - public1 - and change one of the columns to a category type using the following code:
public1 = {'parks': public.parks}
public1 = public1['parks'].astype('category')
However, when I tried to change several columns at once using this code, I was unsuccessful:
public1 = {'parks': public.parks, 'playgrounds': public.parks}
public1 = public1['parks', 'playgrounds'].astype('category')
In any case, I don't want to create a separate dataframe with just the category columns. I would like them changed in the original dataframe.
I tried numerous ways to achieve this, then tried the code here: Pandas: change data type of columns...
public[['parks', 'playgrounds', 'sports', 'roading']] = public[['parks', 'playgrounds', 'sports', 'roading']].astype('category')
and got the following error:
NotImplementedError: > 1 ndim Categorical are not supported at this time
Is there a way to change 'parks', 'playgrounds', 'sports' and 'roading' to categories (so the Likert scale responses can then be analysed), leaving 'resident' and 'children' (and the 94 other columns that are strings, ints and floats) untouched please? Or is there a better way to do this? If anyone has any suggestions and/or feedback I would be most grateful... I am slowly going bald ripping my hair out!
Many thanks in advance.
edited to add - I am using Python 2.7.
The astype() method casts a pandas object to a specified dtype. It can also convert any suitable existing column to the categorical type. DataFrame.astype() comes in handy when we want to cast a particular column from one data type to another.
You can change a column's type in a pandas DataFrame using the df.astype() method. Once you have created a DataFrame, you may need to change a column's type, for example to convert it to a numeric format that can be used for modeling and classification.
Pandas uses different names for data types than Python does; for example, object for textual data. A column in a DataFrame can only have one data type. The data type of a single column can be checked using dtype. Make conscious decisions about how to manage missing data.
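As a rough sketch of the astype() usage described above, several columns can be cast in one call by passing a column-to-dtype mapping. The sample data below is invented for illustration, and the example assumes a reasonably recent pandas release in which astype() accepts a dict and the 'category' dtype.

```python
import pandas as pd

# Hypothetical sample data standing in for categories.csv
public = pd.DataFrame({
    "parks": ["agree", "strongly agree", "agree"],
    "playgrounds": ["important", "very important", "important"],
    "resident": [1, 0, 1],
})

# Only the columns named in the mapping are cast; the rest keep their dtype
public = public.astype({"parks": "category", "playgrounds": "category"})
print(public.dtypes)
```

Columns left out of the mapping, such as 'resident' here, are returned unchanged.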
Sometimes, you just have to use a for-loop:
for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')
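The loop can be sketched end to end as follows. The sample frame and the ordering of the Likert levels are assumptions made for illustration. The final CategoricalDtype step is an optional extra that is not part of the answer above; it records the order of the responses so they can be sorted and compared.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample data; the real frame comes from categories.csv
public = pd.DataFrame({
    "parks": ["agree", "strongly agree", "disagree"],
    "roading": ["important", "very important", "not important"],
    "resident": [1, 0, 1],
    "children": [2, 0, 3],
})

# Convert each Likert column in place, one column at a time
for col in ["parks", "roading"]:
    public[col] = public[col].astype("category")

# Optional: give one column an explicit, ordered scale (assumed level order)
agree_scale = CategoricalDtype(
    categories=["disagree", "agree", "strongly agree"], ordered=True
)
public["parks"] = public["parks"].astype(agree_scale)

print(public.dtypes)
```

Because each Likert column has its own set of responses, an ordered scale would need to be defined per column (or per group of columns sharing the same responses).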