How to read a column of csv as dtype list using pandas?

Tags: python, pandas, csv

I have a CSV file with three columns, where each row of Col3 holds a list of values, as in the following sample:

    Col1,Col2,Col3
    1,a1,"['Proj1', 'Proj2']"
    2,a2,"['Proj3', 'Proj2']"
    3,a3,"['Proj4', 'Proj1']"
    4,a4,"['Proj3', 'Proj4']"
    5,a5,"['Proj5', 'Proj2']"

Whenever I read this CSV, Col3 is read as a str object rather than a list. I tried to change the dtype of that column to list but got an AttributeError, as shown below:

    df = pd.read_csv("inputfile.csv")
    df.Col3.dtype = list

    AttributeError                            Traceback (most recent call last)
    <ipython-input-19-6f9ec76b1b30> in <module>()
    ----> 1 df.Col3.dtype = list

    C:\Python27\lib\site-packages\pandas\core\generic.pyc in __setattr__(self, name, value)
       1953                     object.__setattr__(self, name, value)
       1954             except (AttributeError, TypeError):
    -> 1955                 object.__setattr__(self, name, value)
       1956
       1957     #----------------------------------------------------------------------

    AttributeError: can't set attribute

I would really appreciate guidance on how to read this column as a list.

nachiappanpl asked Sep 23 '15 14:09



1 Answer

You could use literal_eval from the standard-library ast module:

    from ast import literal_eval

    df.Col3 = df.Col3.apply(literal_eval)
    print(df.Col3[0][0])
    # Proj1
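As a self-contained sketch of the same idea, the snippet below reads a two-row excerpt of the sample data from an in-memory string (a stand-in for inputfile.csv, so `csv_text` and the variable names are assumptions made for illustration) and checks the element type before and after conversion:

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical in-memory stand-in for the CSV file from the question.
csv_text = '''Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
'''

df = pd.read_csv(io.StringIO(csv_text))
type_before = type(df["Col3"][0])  # each cell is still a plain string

# Parse every cell as a Python literal, turning "['Proj1', 'Proj2']" into a list.
df["Col3"] = df["Col3"].apply(literal_eval)
type_after = type(df["Col3"][0])
first_project = df["Col3"][0][0]

print(type_before, type_after, first_project)
```

literal_eval only evaluates Python literals, so it does not execute arbitrary code the way eval would, but it raises an exception on cells that are not valid literals.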

You can also do it when you create the dataframe from the csv, using converters:

    df = pd.read_csv("in.csv", converters={"Col3": literal_eval})
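A minimal sketch of the converters approach, again using an in-memory string in place of in.csv (the `csv_text` data and the `projects` name are illustrative assumptions). The column still reports dtype object afterwards, since pandas stores Python lists as generic objects rather than under a dedicated list dtype:

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical in-memory stand-in for in.csv.
csv_text = '''Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
'''

# The converter is applied to each raw cell of Col3 while the file is parsed.
df = pd.read_csv(io.StringIO(csv_text), converters={"Col3": literal_eval})

projects = df["Col3"].tolist()
print(projects)
print(df["Col3"].dtype)
```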

If you are sure the format is the same for all strings, stripping and splitting will be a lot faster:

    df = pd.read_csv("in.csv", converters={"Col3": lambda x: x.strip("[]").split(", ")})

But you will end up with each string still wrapped in its quotes, e.g. "'Proj1'" rather than "Proj1".
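If the leftover quotes are a problem, one possible variation is to strip them from each item after splitting. The helper below, `parse_list`, is a hypothetical name introduced here and not part of the original answer; it assumes that no item contains a comma or an embedded quote character:

```python
import io

import pandas as pd


def parse_list(cell):
    """Split a cell like "['Proj1', 'Proj2']" into ['Proj1', 'Proj2'].

    Assumes items never contain commas or embedded quotes.
    """
    inner = cell.strip("[]")
    if not inner:
        return []  # an empty list literal "[]" yields an empty list
    # Remove surrounding spaces and single/double quotes from every item.
    return [item.strip("'\" ") for item in inner.split(",")]


# Hypothetical in-memory stand-in for in.csv.
csv_text = '''Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
'''

df = pd.read_csv(io.StringIO(csv_text), converters={"Col3": parse_list})
print(df["Col3"].tolist())
```

When the data might break those assumptions, literal_eval is the safer choice.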

Padraic Cunningham answered Sep 21 '22 21:09