E.g.:

sqlContext = SQLContext(sc)
sample = sqlContext.sql("select Name, age, city from user")
sample.show()
The above statement prints the entire table on the terminal, but I want to access each row in that table using for or while to perform further calculations.
iterrows() is a pandas method, not a PySpark one: it iterates over the rows of a pandas DataFrame, yielding an (index, row) pair for each row, where the row is a Series indexed by column name. To use it with a PySpark DataFrame you first convert the data to pandas with toPandas().
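A minimal sketch of that approach, assuming the sample DataFrame from the question and that the data is small enough to fit in driver memory, since toPandas() collects everything onto the driver:

# Convert the PySpark DataFrame to a pandas DataFrame, then iterate row by row
pandas_df = sample.toPandas()
for index, row in pandas_df.iterrows():
    # Each row is a pandas Series indexed by column name
    print(index, row["Name"], row["age"], row["city"])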
collect() is a function (an action) on an RDD or DataFrame that is used to retrieve the data from it. It is useful for retrieving all the rows from every partition and bringing them over to the driver node/program.
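For example, collecting the sample DataFrame from the question returns a plain Python list of Row objects on the driver (again, only advisable when the data fits in driver memory):

# collect() brings every row from all partitions back to the driver
for row in sample.collect():
    # Row fields can be accessed by column name
    print(row["Name"], row["age"], row["city"])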
rlike() is similar to like() but with regex (regular expression) support. It can be used in Spark SQL query expressions as well, and is similar to the regexp_like() function in SQL.
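A small illustration of filtering with rlike(); the "^New" pattern on the city column is just an assumed example, not something from the question:

from pyspark.sql.functions import col

# Keep only rows whose city matches the regex, e.g. "New York" or "New Delhi"
matches = sample.filter(col("city").rlike("^New"))
matches.show()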
You simply cannot. DataFrames, like other distributed data structures, are not iterable and can only be accessed using dedicated higher-order functions and/or SQL methods.
You can of course collect:
for row in df.rdd.collect():
    do_something(row)
or convert with toLocalIterator:
for row in df.rdd.toLocalIterator():
    do_something(row)
and iterate locally as shown above, but it defeats the whole purpose of using Spark.
To "loop" and take advantage of Spark's parallel computation framework, you could define a custom function and use map.
def customFunction(row):
    return (row.name, row.age, row.city)

sample2 = sample.rdd.map(customFunction)
or
sample2 = sample.rdd.map(lambda x: (x.name, x.age, x.city))
The custom function would then be applied to every row of the dataframe. Note that sample2 will be an RDD, not a dataframe.
Map may be needed if you are going to perform more complex computations. If you just need to add a simple derived column, you can use withColumn, which returns a dataframe.
sample3 = sample.withColumn('age2', sample.age + 2)