
Dropping columns by data type in Scala Spark

df1.printSchema() prints out the column names and their data types.

df1.drop($"colName") will drop columns by their name.

Is there a way to adapt this command to drop columns by their data type instead?

asked Jan 29 '17 by Leothorn

People also ask

How do you drop a column in Scala Spark?

The Spark DataFrame provides the drop() method to drop a column or field from a DataFrame or Dataset. The drop() method can also be used to remove multiple columns from a DataFrame or Dataset. A Dataset is a distributed collection of data.
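For instance, assuming a DataFrame df with placeholder column names colA and colB, a minimal sketch looks like this:

// drop a single column by name
val withoutA = df.drop("colA")

// drop several columns at once; drop() accepts a varargs list of names
val trimmed = df.drop("colA", "colB")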

How do I select multiple columns in a Spark DataFrame?

You can select single or multiple columns of a Spark DataFrame by passing the column names you want to select to the select() function. Since a DataFrame is immutable, this creates a new DataFrame with the selected columns. The show() function is used to display the DataFrame's contents.
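A minimal sketch, again assuming placeholder column names colA and colB:

// select() returns a new DataFrame containing only the listed columns
val subset = df.select("colA", "colB")

// show() displays the contents of the resulting DataFrame
subset.show()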


1 Answer

If you are looking to drop specific columns from a DataFrame based on their types, the snippet below should help. In this example, I have a DataFrame with two columns, of type String and Int respectively. I drop the String column (all fields of type String would be dropped) based on its data type.

// assumes a spark-shell session where sc and sqlContext are already defined
import sqlContext.implicits._

// sample DataFrame: c1 is a String column, c2 is an Int column
val df = sc.parallelize(('a' to 'l').map(_.toString) zip (1 to 10)).toDF("c1", "c2")

// collect the names of all string-typed fields, then drop them one by one
val newDf = df.schema.fields
    .collect({ case x if x.dataType.typeName == "string" => x.name })
    .foldLeft(df)({ case (dframe, field) => dframe.drop(field) })

The schema of the resulting newDf is org.apache.spark.sql.DataFrame = [c2: int]
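An alternative sketch (assuming Spark 2.x, where drop() accepts multiple column names): df.dtypes returns (columnName, typeName) pairs, so the string-typed columns can be collected and dropped in a single call.

// dtypes pairs each column name with the string form of its DataType, e.g. "StringType"
val stringCols = df.dtypes.collect { case (name, "StringType") => name }
val dropped = df.drop(stringCols: _*)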

answered Sep 26 '22 by rogue-one