Pandas column access w/column names containing spaces

Tags:

string

pandas

If I import or create a pandas column whose name contains no spaces, I can access it as an attribute:

from pandas import DataFrame
df1 = DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                 'data1': range(7)})
df1.data1

which would return that series for me. If, however, that column has a space in its name, it isn't accessible via that method:

from pandas import DataFrame
df2 = DataFrame({'key': ['a', 'b', 'd'],
                 'data 2': range(3)})
df2.data 2      # <--- SyntaxError: not the droid I'm looking for.

I know I can access it using .xs():

df2.xs('data 2', axis=1) 

There's got to be another way. I've googled it like mad and can't think of any other way to google it. I've read all 96 entries here on SO that contain "column" and "string" and "pandas" and could find no previous answer. Is this the only way, or is there something better?

asked Dec 07 '12 04:12 by Brad Fair


People also ask

How do you reference column names with spaces in pandas?

In DataFrame.query() and DataFrame.eval() expressions, you can refer to column names that contain spaces or operators by surrounding them in backticks. This also lets you escape names that start with a digit or that are Python keywords.
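A minimal sketch of backtick quoting, assuming pandas 0.25 or later (where it was introduced for query() and eval()); the frame reuses the question's 'data 2' column purely as an example:

```python
import pandas as pd

# Example frame: one column name contains a space
df = pd.DataFrame({"key": ["a", "b", "d"], "data 2": range(3)})

# Backticks let the expression name the column despite the space
result = df.query("`data 2` > 0")     # keeps the rows where data 2 is 1 or 2
doubled = df.eval("`data 2` * 2")     # element-wise arithmetic on that column

print(result)
print(doubled.tolist())               # [0, 2, 4]
```

This only applies inside query()/eval() expression strings; ordinary Python code still needs bracket access.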

How do I remove spaces from a DataFrame column name?

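To strip leading and trailing whitespace from column names, you can use df.columns.str.strip() (or str.lstrip / str.rstrip for one side only). These do not touch spaces inside a name; for those, use df.columns.str.replace(' ', '_').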
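A short sketch showing the difference between stripping and replacing, using hypothetical column names that have both stray outer spaces and an internal space:

```python
import pandas as pd

# Hypothetical column names with leading/trailing and internal spaces
df = pd.DataFrame(columns=[" key ", "data 2 "])

# str.strip() only removes whitespace at the ends of each name
stripped = df.columns.str.strip()
# Chaining str.replace() also turns the internal space into an underscore
underscored = df.columns.str.strip().str.replace(" ", "_", regex=False)

print(list(stripped))     # ['key', 'data 2']
print(list(underscored))  # ['key', 'data_2']
```

Passing regex=False makes the replacement literal regardless of the pandas version's default.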

How do I get a list of pandas column names?

You can get the column names of a pandas DataFrame from df.columns (or df.columns.values for a NumPy array). Pass either to Python's list() function to get a plain list, which you can then print().
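A minimal sketch, using an example frame modeled on the question's df2:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "d"], "data 2": range(3)})

names = list(df.columns)   # plain Python list of column labels
arr = df.columns.values    # NumPy array holding the same labels

print(names)               # ['key', 'data 2']
```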


2 Answers

Old post, but may be interesting: an idea (which is destructive, but does the job if you want it quick and dirty) is to rename columns using underscores:

df1.columns = [c.replace(' ', '_') for c in df1.columns] 
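If overwriting the original names is a concern, a non-destructive variant of the same idea is DataFrame.rename() with a callable, which returns a renamed copy by default. A small sketch with a hypothetical 'data 1' column:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["b", "b", "a"], "data 1": range(3)})

# rename() returns a new frame; df1 keeps its original column names
renamed = df1.rename(columns=lambda c: c.replace(" ", "_"))

print(renamed.data_1.tolist())  # [0, 1, 2]  (attribute access now works)
print(list(df1.columns))        # ['key', 'data 1']
```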
answered Sep 21 '22 03:09 by AkiRoss


I think the default way is to use the bracket method instead of the dot notation.

import pandas as pd
df1 = pd.DataFrame({
    'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
    'dat a1': range(7)
})
df1['dat a1']

The other methods, like exposing it as an attribute, are more for convenience.
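A small sketch comparing bracket access with the equivalent label-based .loc selection; the point about attribute access (it needs the label to be a valid Python identifier that does not clash with an existing DataFrame attribute) is checked here only via str.isidentifier():

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["b", "b", "a"], "dat a1": range(3)})

via_brackets = df1["dat a1"]     # works for any column label
via_loc = df1.loc[:, "dat a1"]   # label-based selection of the same column

print(via_brackets.equals(via_loc))  # True
# A name with a space is not a Python identifier, so df1.<name> cannot be written
print("dat a1".isidentifier())       # False
```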

answered Sep 20 '22 03:09 by Rutger Kassies