I have a CSV file with 3 columns, where each row of Col3 holds a list of values, as you can see from the following table structure:
Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
3,a3,"['Proj4', 'Proj1']"
4,a4,"['Proj3', 'Proj4']"
5,a5,"['Proj5', 'Proj2']"
Whenever I try to read this CSV, Col3 is read as a str object and not as a list. I tried to change the dtype of that column to list but got an AttributeError, as shown below:
df = pd.read_csv("inputfile.csv")
df.Col3.dtype = list

AttributeError                            Traceback (most recent call last)
<ipython-input-19-6f9ec76b1b30> in <module>()
----> 1 df.Col3.dtype = list

C:\Python27\lib\site-packages\pandas\core\generic.pyc in __setattr__(self, name, value)
   1953                 object.__setattr__(self, name, value)
   1954             except (AttributeError, TypeError):
-> 1955                 object.__setattr__(self, name, value)
   1956
   1957     #----------------------------------------------------------------------
AttributeError: can't set attribute
It would be really great if you could guide me on how to go about it.
Reading only specific columns can be done with the pandas.read_csv() function. Pass the CSV file as the first argument and the list of columns you want in the usecols keyword argument. It returns a DataFrame containing only those columns.
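As a minimal sketch of the usecols approach, the snippet below reads an in-memory copy of two rows from the question's file. The `csv_text` and `subset` names are assumptions made for illustration; with a real file you would pass its path instead of the `StringIO` object.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file from the question (illustrative only).
csv_text = '''Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
'''

# usecols restricts parsing to the named columns; Col2 is skipped.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Col1", "Col3"])
print(list(subset.columns))
```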
To check the data types in a pandas DataFrame, use the dtypes attribute (a single column has a dtype attribute). dtypes returns a Series whose index holds the DataFrame's column names and whose values are the corresponding data types.
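A short sketch of that check on the question's data, using an in-memory copy of the file (the `csv_text` and `first` names are assumptions for illustration). Col3 typically reports a generic object dtype, or a string dtype in newer pandas versions, and each cell is a plain Python string rather than a list.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file from the question (illustrative only).
csv_text = '''Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
'''

df = pd.read_csv(io.StringIO(csv_text))

# One dtype per column, indexed by column name.
print(df.dtypes)

# The list-looking cell is really just text.
first = df["Col3"].iloc[0]
print(type(first).__name__)
```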
Using tolist(), you can convert a pandas DataFrame column to a list: df['Courses'] returns the column as a Series, and calling values.tolist() on it converts the column values to a list.
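For illustration, here is the tolist() conversion on a small made-up DataFrame. Only the `Courses` column name comes from the text; its values are hypothetical. Note that this turns a whole column into one Python list, which is a different thing from parsing a list stored inside each cell.

```python
import pandas as pd

# Hypothetical data; only the "Courses" column name comes from the text.
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Pandas"]})

# Column -> Series -> underlying values -> plain Python list.
courses = df["Courses"].values.tolist()

# Series.tolist() gives the same result without going through .values.
courses_direct = df["Courses"].tolist()
print(courses)
```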
You could use the ast lib:
from ast import literal_eval
df.Col3 = df.Col3.apply(literal_eval)
print(df.Col3[0][0])
Proj1
You can also do it when you create the DataFrame from the CSV, using converters:
df = pd.read_csv("in.csv",converters={"Col3": literal_eval})
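To try the converters approach without a file on disk, a self-contained sketch might look like this. The in-memory `csv_text` is an assumption standing in for the real file; `literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for this kind of data.

```python
import io
from ast import literal_eval

import pandas as pd

# In-memory stand-in for the CSV file from the question (illustrative only).
csv_text = '''Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
'''

# The converter is applied to every raw string in Col3 while parsing.
df = pd.read_csv(io.StringIO(csv_text), converters={"Col3": literal_eval})

first = df["Col3"].iloc[0]
print(type(first).__name__, first[0])
```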
If you are sure the format is the same for all strings, stripping and splitting will be a lot faster:
df = pd.read_csv("in.csv",converters={"Col3": lambda x: x.strip("[]").split(", ")})
But you will end up with the strings still wrapped in their quotes, i.e. each element keeps the surrounding ' characters from the file.
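If the leftover quotes are a problem, one possible variation of the strip-and-split converter removes them from each item as well. This is a sketch, not part of the original answer: `parse_list` is a hypothetical helper, and it assumes no value contains ", " or quote characters of its own.

```python
import io

import pandas as pd


def parse_list(cell):
    """Split a "['a', 'b']" style string into ['a', 'b'] without ast."""
    inner = cell.strip("[]")
    if not inner:
        return []  # an empty "[]" cell becomes an empty list
    # Remove the surrounding quotes (and stray spaces) from every item.
    return [item.strip("'\" ") for item in inner.split(", ")]


# In-memory stand-in for the CSV file from the question (illustrative only).
csv_text = '''Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
'''

df = pd.read_csv(io.StringIO(csv_text), converters={"Col3": parse_list})
first = df["Col3"].iloc[0]
print(first)
```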