In Spark: The Definitive Guide it says:
If you need to refer to a specific DataFrame’s column, you can use the col method on the specific DataFrame.
For example (in Python / PySpark):
df.col("count")
However, when I run that code on a DataFrame containing a column named count, I get the error 'DataFrame' object has no attribute 'col'. If I try column instead, I get a similar error.
Is the book wrong, or how should I go about doing this?
I'm on Spark 2.3.1. The dataframe was created with the following:
df = spark.read.format("json").load("/Users/me/Documents/Books/Spark-The-Definitive-Guide/data/flight-data/json/2015-summary.json")
The book you're referring to describes the Scala / Java API, where Dataset has a col method. The PySpark DataFrame class has no col (or column) method. In PySpark, use bracket indexing instead:
df["count"]