 

How to write a custom Spark data source based on FileFormat

I saw that the spark-avro data source is implemented on top of the FileFormat interface. Is there any documentation on how to write a custom Spark data source based on FileFormat? So far I can't find any (apart from the spark-avro source code).

Thank you!

Wei asked Aug 09 '17 14:08




1 Answer

Here is an example of a simple file-based Spark data source: https://hackernoon.com/extending-our-spark-sql-query-engine-5f4a088de986

Here are a couple of projects that implement the Data Sources API as well:

* https://github.com/databricks/spark-csv
* https://github.com/databricks/spark-avro
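For orientation, here is a minimal read-only sketch of what a FileFormat-based source can look like. It is not taken from any of the linked projects. It assumes Spark 2.x, where `FileFormat` lives in the internal package `org.apache.spark.sql.execution.datasources` and `PartitionedFile.filePath` is a `String`; this is not a stable public API, so signatures may differ in other versions. The class name `LineFileFormat`, the short name `lines`, and the single `line` column are made up for illustration.

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.net.URI
import java.nio.charset.StandardCharsets

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.{FileFormat, OutputWriterFactory, PartitionedFile}
import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical format: every line of every input file becomes one row
// with a single string column called "line".
class LineFileFormat extends FileFormat with DataSourceRegister {

  // Alias usable in spark.read.format(...) once the class is registered.
  override def shortName(): String = "lines"

  // The schema is fixed here, so the files are not inspected.
  override def inferSchema(
      sparkSession: SparkSession,
      options: Map[String, String],
      files: Seq[FileStatus]): Option[StructType] =
    Some(StructType(Seq(StructField("line", StringType))))

  // Writing is left out of this sketch.
  override def prepareWrite(
      sparkSession: SparkSession,
      job: Job,
      options: Map[String, String],
      dataSchema: StructType): OutputWriterFactory =
    throw new UnsupportedOperationException("LineFileFormat is read-only in this sketch")

  // Returns a function that runs on the executors, once per file.
  // isSplitable is not overridden (default false), so each file is read whole.
  override def buildReader(
      sparkSession: SparkSession,
      dataSchema: StructType,
      partitionSchema: StructType,
      requiredSchema: StructType,
      filters: Seq[Filter],
      options: Map[String, String],
      hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow] = {
    // requiredSchema is empty for queries such as count(*).
    val emitLine = requiredSchema.nonEmpty

    (file: PartitionedFile) => {
      val path = new Path(new URI(file.filePath))
      // Simplification: a fresh Configuration is created on the executor
      // instead of shipping hadoopConf, which is not serializable.
      val fs = path.getFileSystem(new Configuration())
      val reader = new BufferedReader(
        new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))

      new Iterator[InternalRow] {
        private var current: String = reader.readLine()

        override def hasNext: Boolean = {
          if (current == null) reader.close() // close once the file is exhausted
          current != null
        }

        override def next(): InternalRow = {
          val line = current
          current = reader.readLine()
          if (emitLine) InternalRow(UTF8String.fromString(line)) else InternalRow.empty
        }
      }
    }
  }
}
```

With the class on the classpath, it should be loadable by its fully qualified class name, e.g. `spark.read.format("LineFileFormat").load("/some/dir")` if it sits in the default package. To use the short name `lines` instead, list the class in a `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` file, which is how Spark discovers `DataSourceRegister` implementations.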

yobibytes answered Sep 27 '22 16:09