I am trying to parse out the Location from a Hive partitioned table in Spark using this query:
val dsc_table = spark.sql("DESCRIBE FORMATTED data_db.part_table")
I was not able to find any query or any other way in Spark to specifically select the Location column from this output.
The df.inputFiles method in the DataFrame API returns the file paths backing the table. It is a best-effort snapshot of the files that compose this DataFrame.
spark.read.table("DB.TableName").inputFiles
Array[String] = Array(hdfs://test/warehouse/tablename)
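Note that on a partitioned table, inputFiles typically lists the leaf data files under each partition directory rather than the table root. A minimal sketch of trimming those paths back to the root, assuming Hive-style key=value partition directories (the paths below are illustrative, not from the original answer):

// e.g. hdfs://nn:8020/location/part_table/dt=2020-01-01/part-00000-...
val files = spark.read.table("data_db.part_table").inputFiles
// drop everything from the first key=value partition directory onward
val tableRoot = files.headOption
  .map(_.split("/").takeWhile(!_.contains("=")).mkString("/"))
// Option[String] = Some(hdfs://nn:8020/location/part_table)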
You can also use the .toDF method on the desc formatted table output and then filter the resulting DataFrame.
DataFrame API:
scala> :paste
spark.sql("desc formatted data_db.part_table")
.toDF //convert to dataframe will have 3 columns col_name,data_type,comment
.filter('col_name === "Location") //filter on colname
.collect()(0)(1)
.toString
Result:
String = hdfs://nn:8020/location/part_table
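If the Location row could ever be missing, a slightly safer variant of the same lookup uses headOption instead of indexing (my own variant, not part of the original answer):

spark.sql("desc formatted data_db.part_table")
  .filter('col_name === "Location") // keep only the Location row
  .collect()
  .headOption                       // None instead of an exception if absent
  .map(_.getString(1))              // the path sits in the data_type column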
(or)
RDD API:
scala> :paste
spark.sql("desc formatted data_db.part_table")
.collect()
.filter(r => r(0).equals("Location")) //filter on r(0) value
.map(r => r(1)) //get only the location
.mkString //convert as string
.split("8020")(1) //change the split based on your namenode port..etc
Result:
String = /location/part_table
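If you would rather avoid string surgery on the port number, the location can also be read from the catalog metadata. This is a sketch against spark.sessionState.catalog, an internal API whose shape can vary between Spark versions, so treat it as an assumption to verify on your build:

import org.apache.spark.sql.catalyst.TableIdentifier

val tableMeta = spark.sessionState.catalog
  .getTableMetadata(TableIdentifier("part_table", Some("data_db")))
// CatalogTable exposes the storage location as a java.net.URI
val location = tableMeta.location.toString
// e.g. hdfs://nn:8020/location/part_table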