
printSchema() in Apache Spark [duplicate]

Dataset<Tweet> ds = sc.read().json("/path").as(Encoders.bean(Tweet.class));



Tweet class :-
long id;
String user;
String text;
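
Encoders.bean expects a JavaBean, so the abbreviated listing above presumably stands for something like the following sketch. Only the three fields come from the question; the getters, setters, no-argument constructor, and Serializable marker are assumptions based on ordinary JavaBean conventions.

```java
import java.io.Serializable;

// Sketch of the Tweet bean. Only the field names and types come from the
// question; the accessors below are assumed JavaBean boilerplate.
public class Tweet implements Serializable {
    private long id;
    private String user;
    private String text;

    // Public no-argument constructor, as is conventional for a JavaBean.
    public Tweet() {}

    public long getId() { return id; }
    public void setId(long id) { this.id = id; }

    public String getUser() { return user; }
    public void setUser(String user) { this.user = user; }

    public String getText() { return text; }
    public void setText(String text) { this.text = text; }
}
```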


ds.printSchema();

Output:-

root
  |-- id: string (nullable = true)
  |-- text: string (nullable = true)  
  |-- user: string (nullable = true)

In the JSON file, all values are of string type.

I am reading the input and encoding it as Tweet.class. The type declared for id in the class is long, but when the schema is printed, id is reported as string.

Does printSchema() report the schema according to how the file was read, or according to the encoder we apply (here Tweet.class)?

rushikesh jachak asked Apr 30 '18 09:04



1 Answer

I don't know the exact reason why your code is not working, but if you want to change the field type, you can define a custom schema.

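import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Explicit schema: declare id as a long instead of relying on inference
StructType schema = new StructType()
        .add("id", DataTypes.LongType, true)
        .add("text", DataTypes.StringType, true)
        .add("user", DataTypes.StringType, true);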

You can apply the schema to your DataFrame as follows:

Dataset<Tweet> ds = sc.read().schema(schema).json("/path").as(Encoders.bean(Tweet.class));

ds.printSchema();
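
One caveat worth checking: the question says every value in the JSON file is a string, and as far as I know Spark's JSON reader does not convert quoted strings into longs, so an id column read with a LongType schema may come back as null. A possible alternative, sketched below under that assumption, is to read the file with the inferred schema and cast id explicitly. The SparkSession setup and the class name TweetCastExample are made up for illustration, and "/path" is the placeholder from the question.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

// Hypothetical driver class; the name and session settings are assumptions.
public class TweetCastExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("tweet-cast-example")
                .master("local[*]")
                .getOrCreate();

        // The schema is inferred from the file, so id is read as a string here.
        Dataset<Row> raw = spark.read().json("/path");
        raw.printSchema();

        // Replace id with a long-typed version of the same column.
        Dataset<Row> typed = raw.withColumn("id", col("id").cast("long"));
        typed.printSchema();

        // typed.as(Encoders.bean(Tweet.class)) can then be applied as in the question.
        spark.stop();
    }
}
```

Depending on Spark's ANSI setting, id values that cannot be parsed as a number either become null or raise an error, so it may be worth checking the column after the cast.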
ROOT answered Oct 20 '22 23:10
