How to take only 2 elements from an array-type column in Spark Scala?
I get the data with val df = spark.sqlContext.sql("select col1, col2 from test_tbl").
I have data like the following:

col1 | col2
---- | ----
a | [test1,test2,test3,test4,.....]
b | [a1,a2,a3,a4,a5,.....]
I want to get data like the following:

col1 | col2
---- | ----
a | test1,test2
b | a1,a2
When I do df.withColumn("test", col("col2").take(5)) it does not work. It gives this error:

value take is not a member of org.apache.spark.sql.ColumnName

How can I get the data in the form shown above?
Inside withColumn you can call a UDF, say getPartialstring, which uses slice (or take) internally, as in the example snippet below (untested).
import sqlContext.implicits._
import org.apache.spark.sql.functions._

val getPartialstring = udf((array: Seq[String], fromIndex: Int, toIndex: Int) =>
  array.slice(fromIndex, toIndex).mkString(","))
Your caller will look like:
df.withColumn("test",getPartialstring(col("col2"))
col("col2").take(5)
is failing because column doesn't have a method take(..)
that's why your error message says
error: value take is not a member of org.apache.spark.sql.ColumnName
You can use the UDF approach to tackle this.
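As a side note, if you are on Spark 2.4 or later, a minimal sketch without a UDF could use the built-in slice and concat_ws functions instead (untested, same df as above):

import org.apache.spark.sql.functions.{col, slice, concat_ws}

// slice is 1-based: take 2 elements starting at position 1, then join them with commas
df.withColumn("test", concat_ws(",", slice(col("col2"), 1, 2)))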