This is a total noob question, sorry for that. In Spark, I can use select as:
df.select("*"); //to select everything
df.select(df.col("colname")[, df.col("colname")]); //to select one or more columns
df.select(df.col("colname"), df.col("colname").plus(1)); //to select a column and a calculated column
But. How can I select all the columns PLUS a calculated one? Obviously
select("*", df.col("colname").plus(1))
doesn't work (compilation error). How can this be done in Java?
Thank you!
Just do:
df.select(df.col("*"), df.col("colName").plus(1));
Unlike the string literal "*", df.col("*") returns a Column, so it can be mixed freely with other Column expressions inside select(). The mixed call select("*", someColumn) fails to compile because no select overload accepts both a String and a Column.
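A minimal sketch of the full pattern, assuming Spark SQL is on the classpath; the input file "people.json", the column name "age", and the alias "agePlusOne" are hypothetical placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SelectAllPlusComputed {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("select-star-plus-column")
                .master("local[*]")   // local mode, for illustration only
                .getOrCreate();

        Dataset<Row> df = spark.read().json("people.json"); // hypothetical input

        // df.col("*") is a Column, so it can be combined with other
        // Column expressions in a single select() call.
        Dataset<Row> withExtra = df.select(
                df.col("*"),
                df.col("age").plus(1).alias("agePlusOne"));

        withExtra.show();

        // An equivalent alternative using SQL expression strings:
        df.selectExpr("*", "age + 1 as agePlusOne").show();

        spark.stop();
    }
}
```

The selectExpr variant is often handier when you want to name the computed column inline with an `as` alias.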