I have an XML document that looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Position>
<Search>
<Location>
<Region>OH</Region>
<Country>us</Country>
<Longitude>-816071</Longitude>
<Latitude>415051</Latitude>
</Location>
</Search>
</Position>
I read it into a dataframe:
df = sqlContext.read.format('com.databricks.spark.xml').options(rowTag='Position').load('1.xml')
I can see 1 column:
df.columns
['Search']
print(df.select("Search"))
DataFrame[Search: struct<Location:struct<Country:string,Latitude:bigint,Longitude:bigint,Region:string>>]
How do I access the nested columns, e.g. Location.Region?
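You can address nested struct fields with a dot-separated path. To expand every field under Location into its own column, add the .* wildcard: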
df.select("Search.Location.*").show()
Output:
+-------+--------+---------+------+
|Country|Latitude|Longitude|Region|
+-------+--------+---------+------+
|     us|  415051|  -816071|    OH|
+-------+--------+---------+------+