
How to create DataFrame with nulls using toDF?

How do you create a DataFrame containing nulls from a sequence using .toDF?

This works:

val df = Seq((1,"a"),(2,"b")).toDF("number","letter")

but I'd like to do something along the lines of:

val df = Seq((1, NULL),(2,"b")).toDF("number","letter")
user2682459 asked Dec 23 '22 16:12


1 Answer

In addition to Ramesh's answer, it's worth noting that since toDF uses reflection to infer the schema, the provided sequence must have the correct element type. If Scala's type inference isn't enough, you need to specify the type explicitly.

For example, if you want the 2nd column to be a nullable integer, then neither of the following works:

Seq((1, null)) has inferred type Seq[(Int, Null)]
Seq((1, null), (2, 2)) has inferred type Seq[(Int, Any)]
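
If you want to check these inferred types yourself, here is a minimal sketch that needs no Spark at all. It assumes Scala 2 with scala-reflect on the classpath (which any Spark build already has); staticType is a hypothetical helper name, not part of any library.

```scala
import scala.reflect.runtime.universe._

object InferredTypes {
  // Hypothetical helper: returns the static type the compiler inferred for its argument.
  def staticType[T: TypeTag](value: T): Type = typeOf[T]

  def main(args: Array[String]): Unit = {
    // The only second-column value is null, so its type is inferred as Null.
    println(staticType(Seq((1, null))))

    // The least upper bound of Null and Int is Any, so the column widens to Any.
    println(staticType(Seq((1, null), (2, 2))))
  }
}
```

In both cases the second column's static type is not an integer type, so the schema derived from it cannot be a nullable integer column.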

In this case you need to explicitly specify the type for the 2nd column. There are at least two ways to do it. You can explicitly specify the type parameter of the sequence:

Seq[(Int, Integer)]((1, null)).toDF

or create a case class for the row:

case class MyRow(x: Int, y: Integer)
Seq(MyRow(1, null)).toDF

Note that I used Integer instead of Int, as the latter, being a primitive type, cannot hold nulls.
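
Putting it together, here is a self-contained sketch of both approaches, plus two variations that are not from the answer above: an Option[Int] column, and the String case from the question using a type ascription on null. It assumes Spark 2.x or later with spark-sql on the classpath and a local master; the object name, method names, and column names are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Illustrative row type; kept at top level so Spark can derive an encoder for it.
case class MyRow(x: Int, y: Integer)

object NullableColumns {
  // Explicit type parameter: the second column is a boxed java.lang.Integer.
  def viaTypeParam(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(Int, Integer)]((1, null), (2, Integer.valueOf(2))).toDF("number", "value")
  }

  // Case class row: column names come from the field names x and y.
  def viaCaseClass(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(MyRow(1, null), MyRow(2, Integer.valueOf(2))).toDF()
  }

  // Variation (assumption, not from the answer): Option[Int], where None becomes null.
  def viaOption(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(Int, Option[Int])]((1, None), (2, Some(2))).toDF("number", "value")
  }

  // The question's case: String is a reference type, so ascribing null as String is enough.
  def withNullString(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1, null: String), (2, "b")).toDF("number", "letter")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nullable-columns")
      .master("local[*]")
      .getOrCreate()

    viaTypeParam(spark).printSchema()
    viaCaseClass(spark).show()
    viaOption(spark).show()
    withNullString(spark).show()

    spark.stop()
  }
}
```

In the resulting schemas, the Int column is marked non-nullable while the Integer, Option[Int] and String columns are nullable, which you can confirm with printSchema.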

Alex Vayda answered Jan 08 '23 13:01