 

Spark SQL - Select all AND computed columns?

This is a total noob question, sorry for that. In Spark, I can use select as:

df.select("*");                                           // to select everything
df.select(df.col("colname"));                             // to select one column
df.select(df.col("colname"), df.col("othercol"));         // to select several columns
df.select(df.col("colname"), df.col("colname").plus(1));  // to select a column and a computed column

But how can I select all the columns PLUS a computed one? Obviously select("*", df.col("colname").plus(1)) doesn't work (compilation error), since that overload of select() takes only String column names, not Column expressions. How can this be done in Java? Thank you!

asked Jul 19 '16 by lte__

People also ask

How do I select all columns in Spark?

In general, we use "*" to select all the columns from a DataFrame. Another way is to take the names returned by df.columns(), map each one to a Column, and pass those to select().
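A minimal Java sketch of both approaches (the DataFrame variable df is assumed to exist already):

```java
import java.util.Arrays;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;

// Option 1: the "*" wildcard selects every column
Dataset<Row> all = df.select("*");

// Option 2: build Column objects from the column names and pass them to select()
Column[] cols = Arrays.stream(df.columns())
        .map(functions::col)
        .toArray(Column[]::new);
Dataset<Row> allAgain = df.select(cols);
```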

How do I select multiple columns in Spark?

You can select single or multiple columns of a Spark DataFrame by passing the column names you want to the select() function. Since a DataFrame is immutable, this creates a new DataFrame with the selected columns. The show() function is used to display the DataFrame contents.
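Both the name-based and Column-based forms of select() are sketched below (the DataFrame df and its columns "name" and "age" are assumptions for illustration):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// select by name (varargs overload) -- returns a new DataFrame
Dataset<Row> byName = df.select("name", "age");

// select by Column objects, which also allows expressions
Dataset<Row> byCol = df.select(df.col("name"), df.col("age").plus(1));

byCol.show();  // displays the contents of the new DataFrame
```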

How do I select columns in Spark Dataset?

To select a column from a Dataset, use the apply method in Scala and col in Java. Note that the Column type can also be manipulated through its various functions.

How do you select all columns in PySpark?

In PySpark, the select() function is used to select a single column, multiple columns, columns by index, all columns, or nested columns from a DataFrame. select() is a transformation, so it returns a new DataFrame with the selected columns.


1 Answer

Just do:

df.select(df.col("*"), df.col("colName").plus(1));
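This works because df.col("*") turns the wildcard into a Column, so everything goes through the Column-based overload of select(). As a sketch (the column name "colName" comes from the question; the alias "colNamePlusOne" is an assumption), the same result can also be produced with withColumn, which keeps all existing columns and appends the computed one:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// wildcard as a Column, plus a computed column with an explicit alias
Dataset<Row> result = df.select(
        df.col("*"),
        df.col("colName").plus(1).alias("colNamePlusOne"));

// equivalent: withColumn appends the computed column to all existing ones
Dataset<Row> result2 = df.withColumn("colNamePlusOne", df.col("colName").plus(1));
```

withColumn is often the more idiomatic choice when you only want to add one derived column, since it avoids spelling out the wildcard at all.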
answered Oct 30 '22 by Yuan JI