Over the past few months, I've been using Spark for data cleaning. In most cases, it's just filtering or simple aggregation.
Recently, I found that a large portion of these tasks can be done in Hive using HQL. However, map operations, and especially flatMap, seem difficult to express in Hive.
In a sense, SELECT-like operations are map operations, but what about flatMap?
Can someone give me some tips?
Thanks.
In a limited way, LATERAL VIEW (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView) can do a flatMap.
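As a rough sketch of what that looks like, here is a query that turns each input row into zero or more output rows, one per word. The table `docs` and its columns `id` and `line` are made up for illustration and are not from the question. It is "limited" in the sense that the one-to-many step has to be a table-generating function (UDTF) such as the built-in `explode`, or a custom UDTF you write yourself, rather than an arbitrary lambda as in Spark.

```sql
-- Hypothetical table: docs(id INT, line STRING)
-- Roughly what this would be in Spark (Scala):
--   docs.flatMap(d => d.line.split(" ").map(w => (d.id, w)))

SELECT d.id, w.word
FROM docs d
LATERAL VIEW explode(split(d.line, ' ')) w AS word;

-- split() returns an ARRAY<STRING>; explode() emits one row per array element,
-- and LATERAL VIEW joins each emitted row back to the source row it came from.
```

Rows for which `explode` produces nothing (for example, a NULL `line`) are dropped from the result, which matches flatMap returning an empty collection. If you need to keep those rows, `LATERAL VIEW OUTER` emits them with a NULL `word` instead.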