In Spark: The Definitive Guide it says:
If you need to refer to a specific DataFrame’s column, you can use the col method on the specific DataFrame.
For example (in Python / PySpark):
df.col("count")
However, when I run that code on a DataFrame containing a column named count, I get the error 'DataFrame' object has no attribute 'col'. If I try column instead, I get a similar error.
Is the book wrong, or how should I go about doing this?
I'm on Spark 2.3.1. The dataframe was created with the following:
df = spark.read.format("json").load("/Users/me/Documents/Books/Spark-The-Definitive-Guide/data/flight-data/json/2015-summary.json")
The book you're referring to describes the Scala / Java API, where Dataset has a col method. The PySpark DataFrame class has no col (or column) method. In PySpark, use bracket indexing instead:
df["count"]