I have Array[org.apache.spark.sql.Row]
returned by sqc.sql(sqlcmd).collect()
:
Array([10479,6,10], [8975,149,640], ...)
I can get the individual values:
scala> pixels(0)(0)
res34: Any = 10479
but they are Any
, not Int
.
How do I extract them as Int
?
The most obvious solution did not work:
scala> pixels(0).getInt(0)
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Int
PS. I can do pixels(0)(0).toString.toInt
or pixels(0).getString(0).toInt
, but they feel wrong...
Selecting rows using the filter() function The first option you have when it comes to filtering DataFrame rows is pyspark. sql. DataFrame. filter() function that performs filtering based on the specified conditions.
After creating the Dataframe, for retrieving all the data from the dataframe we have used the collect() action by writing df. collect(), this will return the Array of row type, in the below output shows the schema of the dataframe and the actual created Dataframe.
Using getInt
should work. Here is a contrived example as a proof of concept
import org.apache.spark.sql._
sc.parallelize(Array(1,2,3)).map(Row(_)).collect()(0).getInt(0)
This return 1
However,
sc.parallelize(Array("1","2","3")).map(Row(_)).collect()(0).getInt(0)
fails. So, it looks like it is coming in as a string and you will have to convert to an int manually.
sc.parallelize(Array("1","2","3")).map(Row(_)).collect()(0).getString(0).toInt
The documentation states that getInt
:
Returns the value of column i as an int. This function will throw an exception if the value is at i is not an integer, or if it is null.
So, it will not try to cast for you it seems
The Row
class (also see https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.sql.package) has methods getInt(i: Int)
, getDouble(i: Int)
etc.
Also note that a SchemaRDD
is an RDD[Row]
plus a schema
that tells you which column has which data type. If you do .collect()
you will only get an Array[Row]
which does not have that information. So unless you know for sure what your data looks like, get the schema from the SchemaRDD
, then collect the rows and then access each field using the correct type information.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With