
How to use both dataset.select and selectExpr in Apache Spark

I want to produce the data below using a Spark (2.2) Dataset:

Name    Age Age+5

A       10  15

B       5   10

C       25  30

I tried the following:

dataset.select(
        dataset.col("Name"),
        dataset.col("Age"),
        dataset.col( dataset.selectExpr("Age"+5).toString() )
       );

This throws an exception saying the Age column cannot be found.

Sarvesh Belose asked Nov 22 '17 06:11




1 Answer

selectExpr is declared as:

public Dataset<Row> selectExpr(String... exprs)

It takes a varargs String as its parameter, and each string is parsed as a SQL expression. So you can just use:

dataset.selectExpr( "Name", "Age", "Age+5" )
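For completeness, here is a minimal sketch of the same result written three ways, in case you want to stay with select and Column objects rather than expression strings. The class name SelectExprSketch, the helper methods, and the local SparkSession settings are assumptions made for illustration, and the sample rows are the ones from the question. The alias is there only so that the column header reads Age+5 as in the desired output.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import java.util.Arrays;
import java.util.List;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.expr;

// Hypothetical class name; requires the spark-sql dependency on the classpath.
public class SelectExprSketch {

    // Builds the three sample rows from the question (Name, Age).
    static Dataset<Row> sample(SparkSession spark) {
        StructType schema = new StructType()
                .add("Name", DataTypes.StringType)
                .add("Age", DataTypes.IntegerType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("A", 10),
                RowFactory.create("B", 5),
                RowFactory.create("C", 25));
        return spark.createDataFrame(rows, schema);
    }

    // Option 1: every column given as a SQL expression string.
    static Dataset<Row> viaSelectExpr(Dataset<Row> dataset) {
        return dataset.selectExpr("Name", "Age", "Age + 5 AS `Age+5`");
    }

    // Option 2: select with Column arithmetic instead of a string.
    static Dataset<Row> viaColumn(Dataset<Row> dataset) {
        return dataset.select(
                col("Name"),
                col("Age"),
                col("Age").plus(5).alias("Age+5"));
    }

    // Option 3: select, with expr() turning one SQL string into a Column.
    static Dataset<Row> viaExpr(Dataset<Row> dataset) {
        return dataset.select(
                col("Name"),
                col("Age"),
                expr("Age + 5").alias("Age+5"));
    }

    public static void main(String[] args) {
        // Local single-threaded session, assumed here only for the demo.
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("select-expr-sketch")
                .getOrCreate();
        Dataset<Row> dataset = sample(spark);

        viaSelectExpr(dataset).show();
        viaColumn(dataset).show();
        viaExpr(dataset).show();

        spark.stop();
    }
}
```

The original attempt fails because dataset.col(...) expects the name of an existing column, while selectExpr returns a whole Dataset, so its toString() is not a column name. Mixing the two styles works once the expression is turned into a Column, as in options 2 and 3.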
philantrovert answered Nov 05 '22 04:11
