The Scala DataFrameReader has a function "option" which has the following signature:
def option(key: String, value: String): DataFrameReader
// Adds an input option for the underlying data source.
So what is an "input option" for the underlying data source? Can someone share an example of how to use this function?
The list of available options varies by the file format. They are documented in the DataFrameReader
API docs.
For example:
def json(paths: String*): DataFrame
Loads a JSON file (one object per line) and returns the result as a DataFrame.
This function goes through the input once to determine the input schema. If you know the schema in advance, use the version that specifies the schema to avoid the extra scan.
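To illustrate that point, here is a minimal sketch of supplying a schema up front so Spark can skip the inference scan. The file name "people.json" and the field names are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder()
  .appName("schema-example")
  .master("local[*]")
  .getOrCreate()

// Declaring the schema up front avoids the extra pass over the input
// that schema inference would otherwise require.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))

val df = spark.read.schema(schema).json("people.json")
```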
You can set the following JSON-specific options to deal with non-standard JSON files:
- `primitivesAsString` (default `false`): infers all primitive values as a string type
- `prefersDecimal` (default `false`): infers all floating-point values as a decimal type. If the values do not fit in decimal, then it infers them as doubles.
- `allowComments` (default `false`): ignores Java/C++ style comments in JSON records
- `allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names
- `allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes
- `allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers (e.g. 00012)
- `allowBackslashEscapingAnyCharacter` (default `false`): allows accepting quoting of all characters using the backslash quoting mechanism
- `mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records during parsing.
  - `PERMISSIVE`: sets other fields to `null` when it meets a corrupted record, and puts the malformed string into a new field configured by `columnNameOfCorruptRecord`. When a schema is set by the user, it sets `null` for extra fields.
  - `DROPMALFORMED`: ignores whole corrupted records.
  - `FAILFAST`: throws an exception when it meets corrupted records.
- `columnNameOfCorruptRecord` (default is the value specified in `spark.sql.columnNameOfCorruptRecord`): allows renaming the new field holding the malformed string created by `PERMISSIVE` mode. This overrides `spark.sql.columnNameOfCorruptRecord`.
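Putting it together, here is a sketch of passing several of the options above through `option()` when reading JSON. The file name "people.json" and the particular option choices are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("option-example")
  .master("local[*]")
  .getOrCreate()

val df = spark.read
  .option("allowComments", "true")      // tolerate Java/C++ style comments in records
  .option("allowSingleQuotes", "true")  // accept 'field': 'value' quoting
  .option("mode", "DROPMALFORMED")      // silently drop corrupt records
  .json("people.json")                  // hypothetical input file
```

Because `option` returns the `DataFrameReader` itself, calls chain fluently before the terminal `json(...)` (or `csv`, `parquet`, etc.) call actually loads the data.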