printSchema() in Apache Spark [duplicate]

Question

Dataset<Tweet> ds = sc.read().json("/path").as(Encoders.bean(Tweet.class));



Tweet class :-
long id;
String user;
String text;


ds.printSchema();

Output:-

root
  |-- id: string (nullable = true)
  |-- text: string (nullable = true)  
  |-- user: string (nullable = true)

In the JSON file, all values are of string type.

My question: I am reading the input and encoding it as Tweet.class. The type declared for id in the class is long, but when the schema is printed it shows up as string.

Does printSchema() report the schema according to how the file is read, or according to the encoding we apply (here Tweet.class)?

ROOT · Accepted Answer

I don't know the exact reason why your code is not working, but if you want to change the field type you can write a custom schema (Scala):

import org.apache.spark.sql.types._

// Explicit schema: id is read as long, text and user as string
val schema = StructType(List(
  StructField("id", LongType, nullable = true),
  StructField("text", StringType, nullable = true),
  StructField("user", StringType, nullable = true)
))

You can apply the schema when reading the data, as follows:

Dataset<Tweet> ds = sc.read().schema(schema).json("/path").as(Encoders.bean(Tweet.class));

ds.printSchema();
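
Since the question uses the Java API while the schema in this answer is written in Scala, here is a minimal Java sketch of the same idea. The class name TweetSchemaExample, the helper tweetSchema(), the variable spark (a SparkSession, which I assume is what sc refers to in the question) and the local master setting are my own assumptions; "/path" is the placeholder from the question. As far as I know, printSchema() reports the schema of the data as it was read (inferred from the JSON, or supplied through schema()), and as(...) by itself does not change it, which would explain why id shows up as string.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import java.io.Serializable;

public class TweetSchemaExample {

    // Bean mirroring the Tweet class from the question. Encoders.bean
    // expects a public no-arg constructor plus getters and setters.
    public static class Tweet implements Serializable {
        private long id;
        private String user;
        private String text;

        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getUser() { return user; }
        public void setUser(String user) { this.user = user; }
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
    }

    // Hypothetical helper: the explicit schema built with the Java API.
    public static StructType tweetSchema() {
        return DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("id", DataTypes.LongType, true),
            DataTypes.createStructField("text", DataTypes.StringType, true),
            DataTypes.createStructField("user", DataTypes.StringType, true)
        });
    }

    public static void main(String[] args) {
        // Assumption: a local session for illustration only.
        SparkSession spark = SparkSession.builder()
                .appName("tweet-schema")
                .master("local[*]")
                .getOrCreate();

        // Supply the schema up front instead of letting Spark infer it.
        Dataset<Tweet> ds = spark.read()
                .schema(tweetSchema())
                .json("/path")
                .as(Encoders.bean(Tweet.class));

        ds.printSchema(); // id should now be listed as long

        spark.stop();
    }
}
```

One caveat I have not verified against your data: if the id values in the file are quoted strings, reading them with LongType may give nulls in the default permissive mode. In that case, reading id as string and casting it before as(...), for example withColumn("id", col("id").cast("long")), may be the safer route.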

Tags:

apache-spark

apache-spark-dataset

spark-dataframe

rushikesh jachak
