I have a column of type array<string> in Spark tables. I am using SQL to query these Spark tables. I want to convert the array<string> into a string.
When I used the syntax below:
select cast(rate_plan_code as string) as new_rate_plan from
customer_activity_searches group by rate_plan_code
The rate_plan_code column has the following values:
["AAA","RACK","SMOBIX","SMOBPX"]
["LPCT","RACK"]
["LFTIN","RACK","SMOBIX","SMOBPX"]
["LTGD","RACK"]
["RACK","LEARLI","NHDP","LADV","LADV2"]
The following values are populated in the new_rate_plan column:
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@e4273d9f
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@c1ade2ff
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@4f378397
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d1c81377
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@552f3317
Cast seems to work when I am converting decimal to int or int to double, but not in this case. I am curious why the cast is not working here.
Greatly appreciate your help.
Using StringBuffer (Java): create an empty StringBuffer object. Traverse the elements of the String array using a loop. In the loop, append each element of the array to the StringBuffer object using the append() method. Finally, convert the StringBuffer object to a string using the toString() method.
To convert an array to a string, PySpark SQL provides the built-in function concat_ws(), which takes a delimiter of your choice as the first argument and an array column (type Column) as the second argument. To use concat_ws(), import it from pyspark.sql.functions.
JavaScript Array toString(): the toString() method returns a string with array values separated by commas. The toString() method does not change the original array.
To convert an array of numbers to an array of strings, call the map() method on the array, and on each iteration, convert the number to a string. The map method will return a new array containing only strings.
In Spark 2.1+ to do the concatenation of the values in a single Array column you can use the following:
1. concat_ws standard function
2. map operator

Use concat_ws function.
concat_ws(sep: String, exprs: Column*): Column
Concatenates multiple input string columns together into a single string column, using the given separator.
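// Sample data inferred from the output below; spark-shell pre-imports spark.implicits._
import org.apache.spark.sql.functions.concat_ws
val words = Seq(Array("hello", "world")).toDF("words")
// The array column here is named "words", so that is the column to concatenate
val solution = words.withColumn("codes", concat_ws(" ", $"words"))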
scala> solution.show
+--------------+-----------+
|         words|      codes|
+--------------+-----------+
|[hello, world]|hello world|
+--------------+-----------+
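Since the question queries the table through SQL, the same function can also be called from a SQL statement. The following is a minimal sketch, not a definitive implementation: the temporary view stands in for the customer_activity_searches table, the two sample rows are copied from the question, and the comma separator, the object name, and the local SparkSession settings are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

object ConcatWsSqlSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("concat-ws-sql-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the customer_activity_searches table.
    Seq(
      Seq("AAA", "RACK", "SMOBIX", "SMOBPX"),
      Seq("LPCT", "RACK")
    ).toDF("rate_plan_code")
      .createOrReplaceTempView("customer_activity_searches")

    // concat_ws joins the array elements with the separator instead of casting the array.
    val result = spark.sql(
      """SELECT concat_ws(',', rate_plan_code) AS new_rate_plan
        |FROM customer_activity_searches
        |GROUP BY rate_plan_code""".stripMargin)

    result.printSchema() // new_rate_plan should be reported as a string column
    result.show(false)

    spark.stop()
  }
}
```

The separator is the first argument, so swapping ',' for ' ' or ', ' changes only how the elements are joined.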
Use map operator to have full control of what and how should be transformed.
map[U](func: (T) ⇒ U): Dataset[U]
Returns a new Dataset that contains the result of applying func to each element.
scala> codes.show(false)
+---+---------------------------+
|id |rate_plan_code             |
+---+---------------------------+
|0  |[AAA, RACK, SMOBIX, SMOBPX]|
+---+---------------------------+
val codesAsSingleString = codes.as[(Long, Array[String])]
.map { case (id, codes) => (id, codes.mkString(", ")) }
.toDF("id", "codes")
scala> codesAsSingleString.show(false)
+---+-------------------------+
|id |codes                    |
+---+-------------------------+
|0  |AAA, RACK, SMOBIX, SMOBPX|
+---+-------------------------+
scala> codesAsSingleString.printSchema
root
|-- id: long (nullable = false)
|-- codes: string (nullable = true)
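One thing the map operator makes easy is deciding what happens to rows whose array is missing. The sketch below is only an illustration under stated assumptions: the sample rows, the object name, and the choice to model a missing array as Option (so that it becomes a null string) are all hypothetical and not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

object MapWithMissingArraysSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-missing-arrays-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: the second row has no rate plan codes at all.
    val codes = Seq(
      (0L, Option(Seq("AAA", "RACK", "SMOBIX", "SMOBPX"))),
      (1L, Option.empty[Seq[String]])
    ).toDF("id", "rate_plan_code")

    // Join the elements only when the array is present; None is stored as null.
    val codesAsSingleString = codes.as[(Long, Option[Seq[String]])]
      .map { case (id, maybeCodes) => (id, maybeCodes.map(_.mkString(", "))) }
      .toDF("id", "codes")

    codesAsSingleString.show(false)

    spark.stop()
  }
}
```

Replacing the inner map with a getOrElse("") would instead turn missing arrays into empty strings; which behavior is appropriate depends on how downstream queries treat nulls.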