How to convert column of arrays of strings to strings?

I have a column of type array<string> in Spark tables. I am using SQL to query these tables, and I want to convert the array<string> into a string.

When I use the syntax below:

select cast(rate_plan_code  as string) as new_rate_plan  from
customer_activity_searches group by rate_plan_code

where the rate_plan_code column has the following values:

["AAA","RACK","SMOBIX","SMOBPX"] 
["LPCT","RACK"]
["LFTIN","RACK","SMOBIX","SMOBPX"]
["LTGD","RACK"] 
["RACK","LEARLI","NHDP","LADV","LADV2"]

the following values are populated in the new_rate_plan column:

org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@e4273d9f
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@c1ade2ff
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@4f378397
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d1c81377
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@552f3317

Cast seems to work when I convert decimal to int or int to double, but not in this case. I am curious why the cast is not working here. I would greatly appreciate your help.

asked Aug 12 '16 18:08 by tjt



1 Answer

In Spark 2.1+, you can concatenate the values of a single array column in one of the following ways:

  1. concat_ws standard function
  2. map operator
  3. a user-defined function (UDF)

concat_ws Standard Function

Use the concat_ws function. It accepts array<string> columns as well as plain string columns.

concat_ws(sep: String, exprs: Column*): Column

Concatenates multiple input string columns together into a single string column, using the given separator.

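// spark-shell already has spark.implicits._ in scope (needed for toDF and $"...").
import org.apache.spark.sql.functions.concat_ws

// Sample input matching the output below: a single array<string> column named "words".
val words = Seq(Array("hello", "world")).toDF("words")
val solution = words.withColumn("codes", concat_ws(" ", $"words"))

scala> solution.show
+--------------+-----------+
|         words|      codes|
+--------------+-----------+
|[hello, world]|hello world|
+--------------+-----------+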
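
Since the question queries the table with SQL, the same function can be called from a SQL statement. The following is a minimal sketch rather than a definitive implementation: the object name ConcatWsSql, the helper newRatePlans, and the two sample rows registered as a temporary view are assumptions made for illustration, and the local SparkSession stands in for whatever session the real tables live in.

```scala
import org.apache.spark.sql.SparkSession

object ConcatWsSql {
  // Hypothetical helper: registers sample data and runs the SQL query.
  def newRatePlans(spark: SparkSession): Set[String] = {
    import spark.implicits._

    // Assumed stand-in for the customer_activity_searches table from the question.
    Seq(
      Seq("AAA", "RACK", "SMOBIX", "SMOBPX"),
      Seq("LPCT", "RACK")
    ).toDF("rate_plan_code").createOrReplaceTempView("customer_activity_searches")

    // concat_ws joins the array elements with the separator,
    // where cast(... as string) printed the internal object name.
    val result = spark.sql(
      """SELECT concat_ws(',', rate_plan_code) AS new_rate_plan
        |FROM customer_activity_searches
        |GROUP BY rate_plan_code""".stripMargin)

    result.as[String].collect().toSet
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("concat-ws-sql")
      .getOrCreate()
    newRatePlans(spark).foreach(println)
    spark.stop()
  }
}
```

The separator is the first argument, so replacing ',' with ', ' or '|' changes only how the codes are joined.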

map Operator

Use the map operator for full control over what is transformed and how.

map[U](func: (T) ⇒ U): Dataset[U]

Returns a new Dataset that contains the result of applying func to each element.

scala> codes.show(false)
+---+---------------------------+
|id |rate_plan_code             |
+---+---------------------------+
|0  |[AAA, RACK, SMOBIX, SMOBPX]|
+---+---------------------------+

val codesAsSingleString = codes.as[(Long, Array[String])]
  .map { case (id, codes) => (id, codes.mkString(", ")) }
  .toDF("id", "codes")

scala> codesAsSingleString.show(false)
+---+-------------------------+
|id |codes                    |
+---+-------------------------+
|0  |AAA, RACK, SMOBIX, SMOBPX|
+---+-------------------------+

scala> codesAsSingleString.printSchema
root
 |-- id: long (nullable = false)
 |-- codes: string (nullable = true)
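
The third option in the list, a user-defined function, could look roughly like the sketch below. The object name JoinCodesUdf, the helper joined, the UDF name joinCodes, and the sample rows are all hypothetical and chosen only for illustration. The sketch assumes the array column reaches the Scala function as a Seq[String], and that a null array should stay null.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object JoinCodesUdf {
  // Hypothetical helper: builds sample data, applies the UDF, returns (id, codes) pairs.
  def joined(spark: SparkSession): Map[Long, String] = {
    import spark.implicits._

    // Assumed sample data mirroring the question's column.
    val codes = Seq(
      (0L, Seq("AAA", "RACK", "SMOBIX", "SMOBPX")),
      (1L, Seq("LPCT", "RACK"))
    ).toDF("id", "rate_plan_code")

    // Array columns are handed to Scala UDFs as Seq, not Array.
    // A null array arrives as null, so it is passed through unchanged.
    val joinCodes = udf((xs: Seq[String]) => if (xs == null) null else xs.mkString(", "))

    codes
      .withColumn("codes", joinCodes($"rate_plan_code"))
      .select($"id", $"codes")
      .as[(Long, String)]
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-codes-udf")
      .getOrCreate()
    joined(spark).foreach(println)
    spark.stop()
  }
}
```

A UDF is opaque to Spark's optimizer, so the built-in concat_ws is usually the better choice when a plain separator-joined string is all that is needed.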

answered Sep 23 '22 21:09 by Jacek Laskowski