Remove a column from a Spark DataFrame

Tags:

dataframe

scala

apache-spark

apache-spark-sql

I have a Spark dataframe with a very large number of columns. I want to remove two columns from it to get a new dataframe.

Had there been fewer columns, I could have used the select method in the API like this:

    pcomments = pcomments.select(
        pcomments.col("post_id"),
        pcomments.col("comment_id"),
        pcomments.col("comment_message"),
        pcomments.col("user_name"),
        pcomments.col("comment_createdtime"));

But since picking columns from a long list is a tedious task, is there a workaround?

asked Jan 20 '17 12:01

2 Answers

Use the drop method to remove the columns. If you also need to rename a column, use withColumnRenamed.

Example:

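    val initialDf = ...
    // drop returns a new DataFrame, so the calls can be chained
    val dfAfterDrop = initialDf.drop("column1").drop("column2")
    // optional: rename one of the remaining columns
    val dfAfterColRename = dfAfterDrop.withColumnRenamed("oldColumnName", "newColumnName")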

answered Sep 18 '22 09:09

SanthoshPrasad
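For anyone who wants to try the chained drop and rename end to end, here is a minimal sketch. The sample rows, the `debug_flag` column, the object name, and the local `SparkSession` are all made up for illustration; the original `initialDf` is elided in the answer, so this is only a stand-in.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DropAndRenameSketch {
  // Local session for illustration only (assumption); a real job would reuse its own session.
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("drop-and-rename-sketch")
    .getOrCreate()

  import spark.implicits._

  // Hypothetical sample data standing in for the elided initialDf.
  val initialDf: DataFrame = Seq(
    (1, 10, "first comment", "alice", "internal"),
    (2, 11, "second comment", "bob", "internal")
  ).toDF("post_id", "comment_id", "comment_message", "user_name", "debug_flag")

  // Each drop returns a new DataFrame; initialDf itself is left unchanged.
  val dfAfterDrop: DataFrame = initialDf.drop("debug_flag").drop("user_name")

  // Rename one of the remaining columns.
  val dfAfterColRename: DataFrame =
    dfAfterDrop.withColumnRenamed("comment_message", "message")
}
```

Calling `DropAndRenameSketch.dfAfterColRename.printSchema()` should list only the three remaining columns, with `comment_message` renamed to `message`.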

Try this:

    val initialDf = ...
    val dfAfterDropCols = initialDf.drop("column1", "column2")

answered Sep 20 '22 09:09

Manoj Kumar Dhakad
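If the names to remove are already held in a collection, the same varargs `drop` can be fed with `: _*`. The sketch below also shows the select-based equivalent the question was trying to avoid writing by hand. The column names, sample row, and object name are hypothetical, and as far as I know the multi-name `drop` overload needs Spark 2.0 or later.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DropManyColumnsSketch {
  // Local session for illustration only (assumption).
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("drop-many-columns-sketch")
    .getOrCreate()

  import spark.implicits._

  // Hypothetical one-row DataFrame with four columns.
  val initialDf: DataFrame = Seq((1, "a", true, 0.5)).toDF("id", "name", "flag", "score")

  // Names to remove; "no_such_column" is included on purpose,
  // because drop ignores names that are not in the schema.
  val colsToDrop: Seq[String] = Seq("flag", "score", "no_such_column")

  // Expand the Seq into the varargs parameter of drop.
  val dropped: DataFrame = initialDf.drop(colsToDrop: _*)

  // Equivalent result via select: keep every column not in colsToDrop.
  val kept: Array[String] = initialDf.columns.filterNot(c => colsToDrop.contains(c))
  val selected: DataFrame = initialDf.select(kept.map(c => col(c)): _*)
}
```

Both `dropped` and `selected` end up with just `id` and `name`. The `drop` form is shorter; the `select` form is handy when you also want to reorder the remaining columns.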
