The first column in dfElements2 is an array. I need to select only its first element (30002, 30005, 30158, ...) instead of the full array, while also selecting the latitude and longitude.
The data frame should look like this:
+----------+----------+----------+
|short_name|       lat|       lng|
+----------+----------+----------+
|     30002|37.9796566|-1.1317041|
|     30005|37.9868856|-1.1371011|
|     30158| 37.941845|-1.0681918|
|     30006|37.9971704|-1.0993366|
+----------+----------+----------+
Is there a way to modify the expression results.address_components.short_name so that it accesses individual elements of the array? This is my current code and its output:
var DFResults2=DF_Google1.select(explode(DF_Google1 ("results"))).toDF("results")
var dfElements2=DFResults2.select("results.address_components.short_name","results.geometry.location.lat","results.geometry.location.lng")
var dfElements3=dfElements2.select(explode(dfElements2("short_name"))).toDF("CP")
dfElements2.show()
dfElements2.printSchema()
+--------------------+----------+----------+
| short_name| lat| lng|
+--------------------+----------+----------+
|[30002, Murcia, M...|37.9796566|-1.1317041|
|[30005, Murcia, M...|37.9868856|-1.1371011|
|[30158, Murcia, M...| 37.941845|-1.0681918|
|[30006, Murcia, M...|37.9971704|-1.0993366|
|[30100, Murcia, M...|38.0256612|-1.1640968|
|[30009, Murcia, M...|37.9887492|-1.1496969|
|[30008, Murcia, M...|37.9928939|-1.1317041|
|[30007, Murcia, M...|38.0077579|-1.0993366|
|[Murcia, MU, Regi...|37.9922399|-1.1306544|
|[30004, Murcia, M...|37.9822582|-1.1365014|
|[30003, Murcia, M...|37.9850434|-1.1221111|
|[Murcia, MU, Regi...|37.9922399|-1.1306544|
|[30152, Murcia, M...|37.9569734|-1.1496969|
|[30012, Murcia, M...|37.9651726|-1.1233101|
|[30011, Murcia, M...|37.9759009|-1.1089244|
|[30001, Murcia, M...|37.9856424|-1.1287061|
|[30010, Murcia, M...| 37.970285|-1.1424989|
+--------------------+----------+----------+
root
|-- short_name: array (nullable = true)
| |-- element: string (containsNull = true)
|-- lat: double (nullable = true)
|-- lng: double (nullable = true)
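Try this (df here is the data frame with the short_name, lat and lng columns, i.e. dfElements2 in the question):
df.selectExpr("short_name[0]", "lat", "lng")
Here the index short_name[0] is written as a SQL expression string rather than as a Column object, which is why it goes through selectExpr. If you prefer .select, wrap each string in expr (from org.apache.spark.sql.functions):
df.select(expr("short_name[0]"), expr("lat"), expr("lng"))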
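As a minimal sketch of the above, the snippet below builds a small stand-in for dfElements2 and applies both forms. The two sample rows, the local SparkSession settings, and the names viaSelectExpr and viaExpr are assumptions made for illustration; only the column names and types come from the schema printed in the question. It is written in spark-shell style, so it can be pasted with :paste.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// In spark-shell, getOrCreate() returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("first-element-sketch")
  .getOrCreate()
import spark.implicits._

// Stand-in for dfElements2 (assumed sample data; same column names and
// types as the schema in the question: array<string>, double, double).
val dfElements2 = Seq(
  (Seq("30002", "Murcia"), 37.9796566, -1.1317041),
  (Seq("30005", "Murcia"), 37.9868856, -1.1371011)
).toDF("short_name", "lat", "lng")

// SQL-string form; the alias keeps the column name "short_name"
// instead of the generated name "short_name[0]".
val viaSelectExpr = dfElements2.selectExpr("short_name[0] AS short_name", "lat", "lng")

// Equivalent form with select and expr.
val viaExpr = dfElements2.select(expr("short_name[0]").as("short_name"), $"lat", $"lng")

viaSelectExpr.show()
viaExpr.printSchema()
```

The alias is optional; without it the first column is simply named after the expression.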
You can also use the apply method on Column or, alternatively, getItem (col comes from org.apache.spark.sql.functions):
df.select(col("results.address_components.short_name")(0))
or
df.select(col("results.address_components.short_name").getItem(0))
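To show how this could replace the dfElements2 step in the question, here is a hedged sketch that takes the first short_name together with lat and lng directly from the exploded results column. The case classes, the sample rows, and the names dfResults2 and dfFirst are assumptions that only mimic the nested shape implied by the column paths in the question; they are not the real Google response schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical case classes that mirror only the fields used in the question.
case class Component(short_name: String)
case class Location(lat: Double, lng: Double)
case class Geometry(location: Location)
case class Result(address_components: Seq[Component], geometry: Geometry)

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("get-item-sketch")
  .getOrCreate()
import spark.implicits._

// Stand-in for DFResults2: one struct column named "results" per row.
val dfResults2 = Seq(
  Tuple1(Result(Seq(Component("30002"), Component("Murcia")),
                Geometry(Location(37.9796566, -1.1317041)))),
  Tuple1(Result(Seq(Component("30005"), Component("Murcia")),
                Geometry(Location(37.9868856, -1.1371011))))
).toDF("results")

// results.address_components.short_name resolves to array<string>;
// getItem(0) (or the equivalent apply call, (0)) picks its first element.
val dfFirst = dfResults2.select(
  col("results.address_components.short_name").getItem(0).as("short_name"),
  col("results.geometry.location.lat"),
  col("results.geometry.location.lng")
)

dfFirst.show()
```

Note that in the output shown in the question, some arrays start with "Murcia" rather than a postal code, so the first element is not always the value wanted; those rows may need to be filtered or handled separately.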