
Getting java.lang.RuntimeException: Unsupported data type NullType when saving a DataFrame as a permanent Hive table

I am a newbie in Spark. I have created a DataFrame using a SQL query inside PySpark, and I want to save it as a permanent table so I can reuse it in future work. I used the code below

spark.sql("""
    select b.ENTITYID as ENTITYID,
           cm.BLDGID as BldgID,
           cm.LEASID as LeaseID,
           coalesce(l.SUITID, (select EmptyDefault from EmptyDefault)) as SuiteID,
           (select CurrDate from CurrDate) as TxnDate,
           cm.INCCAT as IncomeCat,
           '??' as SourceCode,
           (select CurrPeriod from CurrPeriod) as Period,
           coalesce(case when cm.DEPARTMENT = '@' then 'null' else cm.DEPARTMENT end, null) as Dept,
           'Lease' as ActualProjected,
           fnGetChargeInd(cm.EFFDATE, cm.FRQUENCY, cm.BEGMONTH, (select CurrPeriod from CurrPeriod)) * coalesce(cm.AMOUNT, 0) as ChargeAmt,
           0 as OpenAmt,
           null as Invoice,
           cm.CURRCODE as CurrencyCode,
           case when ('PERIOD.DATACLSD') is null then 'Open' else 'Closed' end as GLClosedStatus,
           'Unposted' as GLPostedStatus,
           'Unpaid' as PaidStatus,
           cm.FRQUENCY as Frequency,
           0 as RetroPD
    from CMRECC cm
    join BLDG b on cm.BLDGID = b.BLDGID
    join LEAS l on cm.BLDGID = l.BLDGID and cm.LEASID = l.LEASID
         and (l.VACATE is null or l.VACATE >= ('select CurrDate from CurrDate'))
         and (l.EXPIR >= ('select CurrDate from CurrDate') or l.EXPIR < ('select RunDate from RunDate'))
    left outer join PERIOD on b.ENTITYID = PERIOD.ENTITYID
         and ('select CurrPeriod from CurrPeriod') = PERIOD.PERIOD
    where ('select CurrDate from CurrDate') >= cm.EFFDATE
      and (select CurrDate from CurrDate) <= coalesce(
              cm.EFFDATE,
              cast(date_add((select min(cm2.EFFDATE) from CMRECC cm2
                             where cm2.BLDGID = cm.BLDGID and cm2.LEASID = cm.LEASID
                               and cm2.INCCAT = cm.INCCAT and 'cm2.EFFDATE' > 'cm.EFFDATE'), -1) as timestamp),
              case when l.EXPIR < (select RunDate from RunDate) then (select RunDate from RunDate) else l.EXPIR end)
""").write.saveAsTable('FactChargeTempTable')

to make the permanent table, but I am getting this error

Job aborted due to stage failure: Task 11 in stage 73.0 failed 1 times, most recent failure: Lost task 11.0 in stage 73.0 (TID 2464, localhost): java.lang.RuntimeException: Unsupported data type NullType.

I have no idea why this is happening or how I could solve it. Kindly guide me. Thank you, Kalyan

asked Oct 22 '16 by Kalyan

3 Answers

The error Unsupported data type NullType indicates that one of the columns of the table you are saving has type NullType — that is, every value in it is NULL, so Spark could not infer a concrete type. To work around this, do a NULL check on the columns of your table and ensure no column is entirely NULL (in your query, `null as Invoice` produces exactly such a column).

Note that if a column contains even a single non-NULL value among the NULLs, Spark is usually able to infer a concrete datatype (e.g. StringType, IntegerType) instead of NullType.

answered Nov 23 '22 by Denny Lee


I ran into this error when running a Spark SQL application. You can cast the NULL to a string first, like this:

lit(null).cast("string")
answered Nov 23 '22 by aof


@Denny Lee is correct. Someone opened a JIRA issue for this problem and got a similar response. One of the comments suggests the following workaround:

Michael: Yeah, Parquet doesn't have a concept of a null type. I'd probably suggest they cast null to a type, e.g. CAST(NULL AS INT), if they really want to do this, but really you should probably just omit the column.

answered Nov 23 '22 by Explorer