
Python Pandas - Changing some column types to categories

I have loaded the following CSV file into an IPython Notebook:

public = pd.read_csv("categories.csv")
public

I've also imported pandas as pd, numpy as np and matplotlib.pyplot as plt. The following data types are present (this is a summary; there are about 100 columns):

In [36]: public.dtypes
Out[37]:
parks          object
playgrounds    object
sports         object
roading        object
resident       int64
children       int64

I want to change 'parks', 'playgrounds', 'sports' and 'roading' to categories, leaving the remainder as int64. These columns contain Likert scale responses, though each column uses a different scale (e.g. one has "strongly agree", "agree" etc., another has "very important", "important" etc.).

I was able to create a separate dataframe - public1 - and change one of the columns to a category type using the following code:

public1 = {'parks': public.parks}
public1 = public1['parks'].astype('category')

However, when I tried to change several columns at once using this code, I was unsuccessful:

public1 = {'parks': public.parks,
           'playgrounds': public.parks}
public1 = public1['parks', 'playgrounds'].astype('category')

In any case, I don't want to create a separate dataframe with just the category columns. I would like them changed in the original dataframe.

I tried numerous ways to achieve this, then tried the code here: Pandas: change data type of columns...

public[['parks', 'playgrounds', 'sports', 'roading']] = public[['parks', 'playgrounds', 'sports', 'roading']].astype('category') 

and got the following error:

 NotImplementedError: > 1 ndim Categorical are not supported at this time 

Is there a way to change 'parks', 'playgrounds', 'sports' and 'roading' to categories (so the Likert scale responses can then be analysed), leaving 'resident' and 'children' (and the 94 other columns, which are strings, ints and floats) untouched? Or is there a better way to do this? If anyone has any suggestions or feedback I would be most grateful. I am slowly going bald ripping my hair out!

Many thanks in advance.

Edited to add: I am using Python 2.7.

gincard asked Mar 07 '15 02:03





1 Answer

Sometimes, you just have to use a for-loop:

for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')
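As a rough, self-contained sketch of the loop above: the sample data below is made up to stand in for categories.csv, and the names raw, likert_cols, converted and agree_scale are illustrative only. It also shows two variations that assume a more recent pandas than the one in the question: passing a column-to-dtype mapping to astype, and using pd.CategoricalDtype to give a Likert column an explicit order. The order of levels used here is an assumption, not something taken from the question's data.

```python
import pandas as pd

# Hypothetical sample data standing in for categories.csv
raw = pd.DataFrame({
    "parks": ["agree", "strongly agree", "agree"],
    "playgrounds": ["important", "very important", "important"],
    "sports": ["agree", "disagree", "agree"],
    "roading": ["important", "important", "very important"],
    "resident": [1, 0, 1],
    "children": [2, 0, 3],
})

likert_cols = ["parks", "playgrounds", "sports", "roading"]

# Loop form: convert one column at a time, in place on a copy
public = raw.copy()
for col in likert_cols:
    public[col] = public[col].astype("category")

# Variation: astype accepts a {column: dtype} mapping and returns a new
# frame; columns not named in the mapping keep their dtype
converted = raw.astype({col: "category" for col in likert_cols})

# Variation: an ordered categorical for one Likert scale
# (the level order here is assumed for illustration)
agree_scale = pd.CategoricalDtype(
    categories=["disagree", "agree", "strongly agree"], ordered=True
)
public["parks"] = public["parks"].astype(agree_scale)

print(public.dtypes)
```

With an ordered dtype, comparisons and min/max on that column follow the declared order of the levels rather than alphabetical order, which may be useful when analysing the responses.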
unutbu answered Sep 28 '22 06:09
