
DataFrame object has no attribute 'col'

Tags:

apache-spark

In Spark: The Definitive Guide it says:

If you need to refer to a specific DataFrame’s column, you can use the col method on the specific DataFrame.

For example (in Python/PySpark):

df.col("count")

However, when I run this code on a DataFrame that contains a column named count, I get the error 'DataFrame' object has no attribute 'col'. If I try column instead, I get a similar error.

Is the book wrong, or how should I go about doing this?

I'm on Spark 2.3.1. The dataframe was created with the following:

df = spark.read.format("json").load("/Users/me/Documents/Books/Spark-The-Definitive-Guide/data/flight-data/json/2015-summary.json")
Stephen asked Aug 12 '18 22:08



1 Answer

The book you're referring to describes the Scala / Java API, where DataFrame has a col method. The PySpark DataFrame has no such method. In PySpark, use [] indexing instead:

df["count"]
Aaron Makubuya answered Jan 02 '23 12:01