How to split comma separated string and get n values in Spark Scala dataframe?

Tags:

dataframe

scala

apache-spark

apache-spark-sql

spark-dataframe

How can I take only the first 2 elements from an ArrayType column in Spark Scala? I load the data with val df = spark.sqlContext.sql("select col1, col2 from test_tbl").

I have data like the following:

col1  | col2                              
---   | ---
a     | [test1,test2,test3,test4,.....]   
b     | [a1,a2,a3,a4,a5,.....]

I want to get data like the following:

col1| col2
----|----
a   | test1,test2
b   | a1,a2

When I try df.withColumn("test", col("col2").take(5)), it does not work. It gives this error:

value take is not a member of org.apache.spark.sql.ColumnName

How can I get the data in the format shown above?

asked Jul 13 '17 17:07

Narendra Mohan Prasad

1 Answer

Inside withColumn you can call a UDF such as getPartialstring, which applies the slice (or take) method to the array and joins the result into a string. The example snippet below is untested.

  import spark.implicits._
  import org.apache.spark.sql.functions._

  // Joins the elements from fromIndex (inclusive) to toIndex (exclusive) with commas
  val getPartialstring = udf((array: Seq[String], fromIndex: Int, toIndex: Int) =>
    array.slice(fromIndex, toIndex).mkString(","))
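
To see what the body of that UDF does to a single row, here is a small plain-Scala sketch that needs no Spark at all. The helper name partialString and the sample rows are made up for illustration; only slice, take and mkString come from the Scala standard library.

```scala
// Plain Scala, no Spark required: the same logic the UDF body applies to each row.
// `partialString` is a hypothetical helper name used only for this illustration.
def partialString(array: Seq[String], fromIndex: Int, toIndex: Int): String =
  array.slice(fromIndex, toIndex).mkString(",")

// Sample rows modelled on the question's data (values are assumptions).
val row1 = Seq("test1", "test2", "test3", "test4")
val row2 = Seq("a1", "a2", "a3", "a4", "a5")

println(partialString(row1, 0, 2))      // test1,test2
println(partialString(row2, 0, 2))      // a1,a2

// take(n) gives the same elements as slice(0, n).
println(row1.take(2).mkString(","))     // test1,test2

// slice does not throw when the array is shorter than toIndex.
println(partialString(Seq("only"), 0, 2)) // only
```

Note that slice uses a 0-based start index and an exclusive end index, so (0, 2) selects the first two elements.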

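Your call will then look like this. Because the UDF takes three arguments, the two indices have to be passed as literal columns with lit:

 df.withColumn("test", getPartialstring(col("col2"), lit(0), lit(2)))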
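
Putting the pieces together, the UDF approach might look like the following end-to-end sketch. The local SparkSession, the app name and the in-memory sample data stand in for test_tbl and are assumptions, not part of the original question; the indices are passed with lit(0) and lit(2) to get the first two elements.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, udf}

// Assumption: a local session for experimenting; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("udf-slice-sketch")
  .getOrCreate()
import spark.implicits._

// Assumption: in-memory sample data standing in for test_tbl.
val df = Seq(
  ("a", Seq("test1", "test2", "test3", "test4")),
  ("b", Seq("a1", "a2", "a3", "a4", "a5"))
).toDF("col1", "col2")

// UDF from the answer: slice the array, then join the elements with commas.
val getPartialstring = udf((array: Seq[String], fromIndex: Int, toIndex: Int) =>
  array.slice(fromIndex, toIndex).mkString(","))

// The Int parameters must be passed as columns, hence lit(...).
val withTest = df.withColumn("test", getPartialstring(col("col2"), lit(0), lit(2)))

withTest.show(false)
```

This sketch does not handle rows where col2 is null; the UDF would need a null check for such data.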

col("col2").take(5) fails because col("col2") is a Column expression, not a Scala collection, and Column has no take(..) method. That is why your error message says

error: value take is not a member of org.apache.spark.sql.ColumnName

You can use the UDF approach to tackle this.
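
As an alternative to a UDF, the same result can probably be reached with built-in column functions. The sketch below assumes Spark 2.4 or later for functions.slice (which takes a 1-based start position and a length) and uses concat_ws to join the array elements. The session setup and sample data are again assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat_ws, slice}

// Assumption: a local session for experimenting; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("builtin-slice-sketch")
  .getOrCreate()
import spark.implicits._

// Assumption: in-memory sample data standing in for test_tbl.
val df = Seq(
  ("a", Seq("test1", "test2", "test3", "test4")),
  ("b", Seq("a1", "a2", "a3", "a4", "a5"))
).toDF("col1", "col2")

// Spark 2.4+: slice(column, start, length) with a 1-based start position.
val viaSlice = df.withColumn("test", concat_ws(",", slice(col("col2"), 1, 2)))

// Older versions: pick elements by 0-based index; concat_ws skips nulls.
val viaIndex = df.withColumn("test", concat_ws(",", col("col2")(0), col("col2")(1)))

viaSlice.show(false)
viaIndex.show(false)
```

Built-in functions stay visible to the Catalyst optimizer, whereas a UDF is a black box to it, so this form is usually preferable when the Spark version allows it.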

answered Sep 28 '22 11:09

Ram Ghadiyaram
