How to collect Spark SQL output to a file?

Name: Convert any SQL Query to Spark Dataframe
Uploaded: 2022-09-12 05:00:50

Question

Below is my Spark SQL script, which loads a file and runs SQL on top of it. I want to collect the output of the SQL query and write it to a file, but I am not sure how. Can anyone help?

// import classes for sql
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// createSchemaRDD is used to implicitly convert an RDD to a SchemaRDD.
import sqlContext.createSchemaRDD


//hdfs paths
val warehouse="hdfs://quickstart.cloudera/user/hive/warehouse/"
val customers_path=warehouse+"people/people.txt"
customers_path

//create rdd file called file
val file=sc.textFile(customers_path)

val schemaString="name age"

import org.apache.spark.sql._



val schema =
  StructType(
    // schemaString is space-separated ("name age"), so split on a space to get two fields
    schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

val rowRDD=file.map(_.split(",")).map(p => Row(p(0),p(1).trim))

val peopleSchemRDD=sqlContext.applySchema(rowRDD, schema)

// Register the SchemaRDD as a table.
peopleSchemRDD.registerTempTable("people")

// SQL statements can be run by using the sql methods provided by sqlContext.
sqlContext.sql("select count(*) from people").collect().foreach(println)
System.exit(0)

Daniel Darabos · Accepted Answer

If you just want to count the number of lines in a big file on HDFS and write it to another file:

import java.nio.file.{ Files, Paths }
val path = "hdfs://quickstart.cloudera/user/hive/warehouse/people/people.txt"
val rdd = sc.textFile(path)
val linesCount = rdd.count
Files.write(Paths.get("line_count.txt"), linesCount.toString.getBytes)
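
For the more general case of writing the rows returned by a SQL query, a minimal sketch follows. It assumes a spark-shell session on the same Spark 1.x API as the question, where `sqlContext` exists and the `people` table has already been registered. The output names `query_output.csv` and `people_query_output` are hypothetical, and `warehouse` is the variable defined in the question.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Assumption: `sqlContext` and the "people" temp table come from the question's script.
val result = sqlContext.sql("select * from people")

// Option 1: small result. Collect the rows to the driver and write one local file.
// Each row is rendered as a comma-separated line.
val lines = result.collect().map(_.mkString(","))
Files.write(
  Paths.get("query_output.csv"), // hypothetical local file name
  lines.mkString("\n").getBytes(StandardCharsets.UTF_8))

// Option 2: large result. Let the executors write part files in parallel.
// saveAsTextFile creates a directory of part-* files and fails if it already exists.
result.map(_.mkString(",")).saveAsTextFile(warehouse + "people_query_output") // hypothetical path
```

Option 1 is only safe when the result fits in driver memory, since `collect()` brings every row to the driver. Option 2 avoids that, at the cost of producing several part files instead of a single file.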

Tags:

scala

apache-spark

apache-spark-sql

sri hari kali charan Tummala