I'm importing a dataframe from a csv file, but cannot access some of its columns by name. What's going on?
In more concrete terms:
> import pandas
> jobNames = pandas.read_csv("job_names.csv")
> print(jobNames)
job_id job_name num_judgements
0 933985 Foo 180
1 933130 Moo 175
2 933123 Goo 150
3 933094 Flue 120
4 933088 Tru 120
When I try to access the second column, I get an error:
> jobNames.job_name
AttributeError: 'DataFrame' object has no attribute 'job_name'
Strangely, I can access the job_id column thus:
> print(jobNames.job_id)
0 933985
1 933130
2 933123
3 933094
4 933088
Name: job_id, dtype: int64
Edit (to put the accepted answer in context):
It turns out that the first row of the csv file (with the column names) looks like this:
job_id, job_name, num_judgements
Note the spaces after each comma! Those spaces are retained in the column names:
> jobNames.columns[1]
' job_name'
Names with a leading space are not valid Python identifiers, so those columns aren't available as dataframe attributes. I can still access them dict-style:
> jobNames[' job_name']
You can also use the loc and iloc indexers to access columns in a pandas DataFrame (they are indexers used with square brackets, not functions). To retrieve a column by name, for example a column named Grades, use loc with the exact column label: df.loc[:, 'Grades']. To retrieve a column by position instead, use iloc: df.iloc[:, 1] returns the second column regardless of its name.
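As a rough illustration of label-based versus position-based access, here is a minimal sketch. The inline csv_text is a hypothetical stand-in for job_names.csv, assumed to have the same spaces after each comma as the file in the question; the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for job_names.csv (note the space after each comma).
csv_text = "job_id, job_name, num_judgements\n933985, Foo, 180\n933130, Moo, 175\n"
job_names = pd.read_csv(io.StringIO(csv_text))

# Label-based: the label must match exactly, leading space included.
by_label = job_names.loc[:, " job_name"]

# Position-based: column 1 is the second column, whatever it is called.
by_position = job_names.iloc[:, 1]

print(by_label.equals(by_position))
```

Position-based access sidesteps the naming problem, but it breaks silently if the column order in the file ever changes, so fixing the names is usually the safer route.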
Note that the $ operator belongs to R, not pandas. In R, you access a specific column of a data frame by name with df$name, where df is the data frame and name is the column you are interested in; this returns the column as a vector. The closest pandas equivalents are df['name'] and, when the name is a valid Python identifier, df.name.
How to fix the KeyError? Correct the spelling of the key, including any leading or trailing whitespace. If you are not sure of the exact spelling, print the list of all column names (for example with df.columns.tolist()) and cross-check.
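One way to do that cross-check, and to repair the names after loading, might look like the sketch below. The inline csv_text is again a hypothetical stand-in for the real file, and stripping the names with columns.str.strip() is just one possible cleanup, not necessarily what the original answer had in mind.

```python
import io

import pandas as pd

# Hypothetical stand-in for job_names.csv (note the space after each comma).
csv_text = "job_id, job_name, num_judgements\n933985, Foo, 180\n"
job_names = pd.read_csv(io.StringIO(csv_text))

# A plain list prints each name in quotes, so stray spaces become visible.
print(job_names.columns.tolist())
# ['job_id', ' job_name', ' num_judgements']

# Strip leading/trailing whitespace from every column name.
job_names.columns = job_names.columns.str.strip()
cleaned = job_names.columns.tolist()

# Attribute access now works, because 'job_name' is a valid identifier.
print(job_names.job_name)
```

This only cleans the column names; string values in the cells (such as ' Foo') keep their leading space.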
When using pandas.read_csv, pass in the skipinitialspace=True flag to remove whitespace after CSV delimiters.
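A minimal sketch of that fix, assuming a file shaped like the one in the question (the inline csv_text is a hypothetical stand-in for job_names.csv):

```python
import io

import pandas as pd

# Hypothetical stand-in for job_names.csv (note the space after each comma).
csv_text = "job_id, job_name, num_judgements\n933985, Foo, 180\n933130, Moo, 175\n"

# skipinitialspace=True drops the whitespace that follows each delimiter,
# in the header row and in the data rows alike.
job_names = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(job_names.columns.tolist())
print(job_names.job_name)
```

With a real file, the call would be pandas.read_csv("job_names.csv", skipinitialspace=True). Because the spaces are removed at parse time, both the column names and the string values come out clean.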