
How to Reference a Pandas Column that has a dot in the name

Tags: python, pandas

I'm working in Python Pandas with a dataframe whose column names have all been prefixed with "Content.". I can access a given column with df['Content.xyz']. However, when I try to run a query on it, e.g. df.query("Content.xyz not in @mylist"), it raises an error saying that Content is not a member of the dataframe.

How can I run a query, or a similar operation, on a column whose name contains a period?

Also, some of the series names have spaces in them. I'm assuming the solution for a column name with a period would be similar to a solution for a name containing a space.

Michael James asked Dec 04 '19 00:12




2 Answers

From the .query() docs:

New in version 0.25.0.

You can refer to column names that contain spaces by surrounding them in backticks.

For example, if one of your columns is called a a and you want to sum it with b, your query should be `a a` + b.

That answers the second part of your question: wrap the column name in backticks to escape the spaces in it.
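As a minimal sketch of the documented `a a` + b case (the frame and its values are made up for illustration, and pandas 0.25 or later is assumed):

```python
import pandas as pd

# Hypothetical frame: one column name contains a space.
df = pd.DataFrame({"a a": [1, 2, 3], "b": [10, 20, 30]})

# Backticks make the query parser treat "a a" as a single column name.
result = df.query("`a a` + b > 15")
print(result)
```

Only the rows where the sum exceeds 15 are kept.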

Unfortunately, at the time of writing this works only for spaces, not for dots or other special characters. Support for those is tracked in an open issue (https://github.com/pandas-dev/pandas/issues/27017) and may arrive in a future release.
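Since this answer was written, later pandas releases appear to have extended backtick quoting beyond spaces. The sketch below assumes pandas 1.0 or later; the sample data and the value column are invented for illustration, and on older versions the query is expected to fail.

```python
import pandas as pd

# Hypothetical data mirroring the question's column naming.
df = pd.DataFrame({"Content.xyz": ["a", "b", "c"], "value": [1, 2, 3]})
mylist = ["a", "c"]

# Assumption: pandas >= 1.0, where backticks can also quote names containing dots.
result = df.query("`Content.xyz` not in @mylist")
print(result)
```

If upgrading is not an option, plain bracket indexing with a boolean mask avoids the query parser entirely.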

jorijnsmit answered Oct 10 '22 05:10


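You cannot use the df.Content.xyz notation to access the column, because Python reads it as the attribute xyz of df.Content, and there is no column named Content. You can only reference the column with brackets, as df['Content.xyz']: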

import pandas as pd
df = pd.DataFrame([1, 2], columns=['Content.xyz'])
print(df['Content.xyz'])

0    1
1    2
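
Bracket access also covers the filtering asked about in the question, without going through query() at all. The following is a sketch under assumed data: the second column name, the sample values, and the underscore renaming scheme are illustrative choices, not part of the original question.

```python
import pandas as pd

# Hypothetical frame with a dot and a space in the column names.
df = pd.DataFrame({"Content.xyz": ["a", "b", "c"], "Content.some name": [1, 2, 3]})
mylist = ["a", "c"]

# Boolean mask via bracket access; same effect as "Content.xyz not in mylist".
masked = df[~df["Content.xyz"].isin(mylist)]

# Alternative: rename columns to identifier-safe names, then query as usual.
renamed = df.rename(columns=lambda name: name.replace(".", "_").replace(" ", "_"))
queried = renamed.query("Content_xyz not in @mylist")

print(masked)
print(queried)
```

The mask keeps the original column names; the rename trades them for names that work with both query() and attribute access.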
Brandon answered Oct 10 '22 06:10
