
Scala Spark DataFrame : dataFrame.select multiple columns given a Sequence of column names

val columnName=Seq("col1","col2",....."coln"); 

Is there a way to use dataframe.select to get a DataFrame containing only the specified columns? I know I can call dataframe.select("col1","col2"...), but columnName is generated at runtime. I could call dataframe.select() repeatedly for each column name in a loop. Would that have any performance overhead? Is there a simpler way to accomplish this?

Himaprasoon asked Mar 21 '16 12:03



1 Answer

import org.apache.spark.sql.functions.col  // needed for the Column-based variant

val columnNames = Seq("col1","col2",....."coln")

// using the string column names:
val result = dataframe.select(columnNames.head, columnNames.tail: _*)

// or, equivalently, using Column objects:
val result = dataframe.select(columnNames.map(c => col(c)): _*)
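For readers who want to try this end to end, here is a minimal, self-contained sketch of the Column-based variant. It assumes Spark SQL is on the classpath and runs a local session; the object name `SelectColumnsExample`, the helper `selectColumns`, and the sample data are hypothetical and used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectColumnsExample {
  // Hypothetical helper: keep only the columns named in `names`,
  // in the order given. Each name is turned into a Column and the
  // resulting Seq is expanded into varargs with `: _*`.
  def selectColumns(df: DataFrame, names: Seq[String]): DataFrame =
    df.select(names.map(c => col(c)): _*)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-columns-example")
      .getOrCreate()
    import spark.implicits._

    // Sample data with three columns (made up for this sketch).
    val dataframe = Seq((1, "a", true), (2, "b", false))
      .toDF("col1", "col2", "col3")

    // Column names that would normally be computed at runtime.
    val columnNames = Seq("col1", "col3")

    val result = selectColumns(dataframe, columnNames)
    result.printSchema()
    result.show()

    spark.stop()
  }
}
```

One design note: the `columnNames.head, columnNames.tail: _*` form throws a `NoSuchElementException` when the sequence is empty, because `head` is undefined on an empty `Seq`. The Column-based form passes an empty argument list instead, so it may be the safer choice when the list of names can be empty.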
Tzach Zohar answered Sep 28 '22 09:09