I'm working in Python pandas with a dataframe whose column names were all prefixed with "Content.". I can access a given column with df['Content.xyz']. However, when I try to run a query on it, e.g. df.query("Content.xyz not in @mylist"), it throws an error saying that Content is not a member of the dataframe.
How can I perform a query or other similar operations when the column name contains a period?
Also, some of the series names have spaces in them. I'm assuming the solution for a column name with a period would be similar to a solution for a name containing a space.
You can get the column names of a pandas DataFrame with DataFrame.columns.values, and convert them to a list with tolist(). Each column in a pandas DataFrame has a label (name) that identifies the values it holds.
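As a minimal sketch of listing the column names, assuming a small made-up dataframe whose labels contain a dot and a space (the data and the variable name column_names are illustrative only):

```python
import pandas as pd

# Hypothetical dataframe with a dotted and a spaced column name.
df = pd.DataFrame({"Content.xyz": [1, 2], "Content.a b": [3, 4]})

# columns.values is a NumPy array of labels; tolist() turns it into a plain list.
column_names = df.columns.values.tolist()
print(column_names)  # ['Content.xyz', 'Content.a b']
```

Printing the labels this way is a quick check of exactly which characters (dots, spaces) ended up in each name.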
Dot notation is a strict subset of bracket notation. Brackets are also the canonical way to "select subsets of data" from objects in Python: strings, tuples, lists, dictionaries, and NumPy arrays all use brackets to select subsets of data. (Source: medium.com, "Selecting Subsets of Data in Pandas: Part 1".)
The pandas str.isalpha() method checks whether all characters in each string of a Series are alphabetic (a-z/A-Z). Whitespace or any other non-alphabetic character in the string makes it return False, and an element that is not a string at all (for example an actual number) yields NaN.
If you want to pass column names containing spaces to a pandas method such as assign, you can supply them as an unpacked dictionary. While the accepted answer works for column specification when using dictionaries or []-selection, it does not generalise to other situations where one needs to refer to columns by keyword, such as the assign method.
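A minimal sketch of the dictionary-unpacking approach, assuming a toy dataframe; the new column names "a a plus b" and "Content.total" are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a a": [1, 2], "b": [10, 20]})

# Keyword arguments cannot contain spaces or dots, so the new column
# names are given as dictionary keys and unpacked with **.
result = df.assign(**{
    "a a plus b": df["a a"] + df["b"],        # precomputed Series
    "Content.total": lambda d: d["b"] * 2,    # callable evaluated on the frame
})
print(result)
```

Because assign simply receives keyword arguments, any string works as a key here, even one that is not a valid Python identifier.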
To access a PySpark/Spark DataFrame column whose name contains a dot from withColumn() and select(), you just need to enclose the column name in backticks (`).
As of 2021 (pandas v1.3), quoting the column name in backticks works for dots as well. You cannot use the df.Content.xyz notation to access the column.
Having a column name with a dot leads to confusion, because in PySpark/Spark dot notation is used to refer to a nested column of a struct type. So, if possible, replace the dots in all column names with underscores before processing.
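The same renaming idea can be sketched in pandas rather than PySpark. This is only an illustration under assumed data: the dataframe, the list mylist, and the choice to replace spaces as well as dots are all assumptions, not part of the original advice.

```python
import pandas as pd

df = pd.DataFrame({"Content.xyz": [1, 2, 3], "Content.a b": [4, 5, 6]})

# Replace dots (and, as an extra assumption, spaces) with underscores
# so every column name becomes a valid Python identifier.
renamed = df.rename(columns=lambda name: name.replace(".", "_").replace(" ", "_"))

mylist = [2]
# With identifier-like names, query needs no quoting at all.
filtered = renamed.query("Content_xyz not in @mylist")
print(filtered)
```

The trade-off is that the column labels no longer match the source data, so this fits best when the names are only used internally.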
From the .query() docs:
New in version 0.25.0.
You can refer to column names that contain spaces by surrounding them in backticks.
For example, if one of your columns is called a a and you want to sum it with b, your query should be `a a` + b.
So that answers the second part of your question: you can put backticks around the column name to escape whitespace in it.
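A short sketch of the backtick syntax for a column name with a space, assuming pandas 0.25 or later and a toy dataframe that mirrors the a a and b columns from the docs example:

```python
import pandas as pd

df = pd.DataFrame({"a a": [1, 2, 3], "b": [10, 20, 30]})

# Backticks let the expression parser treat "a a" as a single column name.
total = df.eval("`a a` + b")      # element-wise sum of the two columns
big = df.query("`a a` > 1")       # rows where the spaced column exceeds 1
print(total.tolist())
print(big)
```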
Unfortunately, in pandas 0.25 this only works for spaces and not for dots or other special characters. Extending it was tracked in an issue (https://github.com/pandas-dev/pandas/issues/27017); in newer releases, backticks also work for column names that contain dots.
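For the dotted column from the question, here is a hedged sketch of two options. The first assumes a newer pandas release (1.0 or later, to my knowledge) in which backticks accept dots; the second is a plain boolean mask that does not depend on query at all. The data and mylist are made up.

```python
import pandas as pd

df = pd.DataFrame({"Content.xyz": [1, 2, 3, 4]})
mylist = [2, 4]

# Option 1 (assumes a pandas version where backticks accept dots):
via_query = df.query("`Content.xyz` not in @mylist")

# Option 2 (works regardless of query support): boolean indexing with isin.
via_mask = df[~df["Content.xyz"].isin(mylist)]

print(via_query)
print(via_mask)
```

Both selections keep the rows whose value is not in mylist, so the mask form is a reasonable fallback when the installed pandas rejects the backtick query.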
You cannot use the df.Content.xyz notation to access the column. You can only reference the column using df['Content.xyz']:
import pandas as pd
df = pd.DataFrame([1, 2], columns=['Content.xyz'])
print(df['Content.xyz'])
0    1
1    2