Can't access dataframe columns

Tags: python, pandas, dataframe, csv, removing-whitespace

I'm importing a dataframe from a CSV file, but cannot access some of its columns by name. What's going on?

In more concrete terms:

> import pandas

> jobNames = pandas.read_csv("job_names.csv")
> print(jobNames)

   job_id   job_name   num_judgements
0  933985        Foo              180
1  933130        Moo              175
2  933123        Goo              150
3  933094       Flue              120
4  933088        Tru              120

When I try to access the second column, I get an error:

> jobNames.job_name

AttributeError: 'DataFrame' object has no attribute 'job_name'

Strangely, I can access the job_id column thus:

> print(jobNames.job_id)

0    933985
1    933130
2    933123
3    933094
4    933088
Name: job_id, dtype: int64

Edit (to put the accepted answer in context):

It turns out that the first row of the csv file (with the column names) looks like this:

job_id, job_name, num_judgements

Note the spaces after each comma! Those spaces are retained in the column names:

> jobNames.columns[1]

' job_name'

Names with a leading space aren't valid Python identifiers, so those columns aren't available as dataframe attributes. I can still access them dict-style:

> jobNames[' job_name']
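
To make the behaviour above easy to reproduce, here is a minimal sketch. It assumes a small in-memory stand-in for job_names.csv (the `csv_text` string is made up for illustration and only mirrors the header and first row shown earlier), and the variable names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for job_names.csv: note the space after each comma.
csv_text = "job_id, job_name, num_judgements\n933985, Foo, 180\n"

df = pd.read_csv(io.StringIO(csv_text))

# The header keeps its leading space, so the stored name is ' job_name'.
column_names = list(df.columns)

# Attribute access looks for a column called 'job_name', which does not exist.
has_plain_attr = hasattr(df, "job_name")

# Bracket access works as long as the exact name, space included, is used.
first_value = df[" job_name"].iloc[0]

print(column_names)
print(has_plain_attr)
```

The first column has no preceding comma, so its name carries no extra space, which is why `jobNames.job_id` worked while `jobNames.job_name` did not.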


asked Aug 11 '16 10:08

drevicko

1 Answer

When calling pandas.read_csv, pass skipinitialspace=True to strip the whitespace that follows each CSV delimiter.
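
A short sketch of the difference, assuming an in-memory string in place of the real job_names.csv (the `csv_text` contents and variable names are illustrative, not taken from the asker's file):

```python
import io

import pandas as pd

# Hypothetical stand-in for job_names.csv, with a space after each comma.
csv_text = (
    "job_id, job_name, num_judgements\n"
    "933985, Foo, 180\n"
    "933130, Moo, 175\n"
)

# Default parsing keeps the space that follows each delimiter in the header.
raw = pd.read_csv(io.StringIO(csv_text))

# skipinitialspace=True drops that whitespace in the header and the data rows.
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

raw_columns = list(raw.columns)
clean_columns = list(clean.columns)

# With clean names, attribute-style access works for every column.
first_job_name = clean.job_name[0]

print(raw_columns)
print(clean_columns)
```

With the flag set, `clean.job_name` and `clean.num_judgements` are both reachable as attributes, and the string values no longer carry a leading space either.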


answered Oct 05 '22 18:10

Maxim Egorushkin
