 

Available options in the spark.read.option()


When I read other people's Python code, like spark.read.option("mergeSchema", "true"), it seems that the coder already knows what parameters to use. But for a starter, is there a place to look up the available parameters? I looked them up in the Apache docs, and it shows the parameter as undocumented.

Thanks.

asked Sep 24 '18 by Tim.X

People also ask

What is option in Spark?

The core syntax for reading data in Apache Spark is spark.read.format(...).option(...).schema(...).load(...). format specifies the file format, such as CSV, JSON, or Parquet; the default is Parquet. option sets key-value configurations that parameterize how the data is read. schema is optional and lets you specify the schema explicitly if you do not want to infer it from the data source.
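
A minimal sketch of that core syntax in PySpark (the path and schema here are hypothetical, for illustration only):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-example").getOrCreate()

    df = (spark.read
          .format("json")                  # file format; the default is parquet
          .option("multiLine", "true")     # key-value option parameterizing the read
          .schema("name STRING, age INT")  # optional explicit schema (skips inference)
          .load("data/people.json"))       # load triggers the read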

What does Spark read load do?

By itself it does nothing special: load is just the final call on sqlContext.read that takes the path and kicks off the read, using whatever format and options you set on the reader. read allows data formats to be specified.

How to read multiple JSON files from different paths in spark?

Use spark.read.option("multiline", "true") for multiline JSON. With the spark.read.json() method you can also read multiple JSON files from different paths: just pass all the file names, with fully qualified paths, separated by commas. You can likewise read all the files in a directory by passing the directory as the path.
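
A short PySpark sketch of both variants (the file paths are hypothetical); note that in PySpark multiple paths are passed as a list:

    # Several explicit files, passed as a list of fully qualified paths
    df_multi = (spark.read
                .option("multiLine", "true")
                .json(["data/a.json", "data/b.json"]))

    # Every JSON file in a directory
    df_dir = spark.read.json("data/json_dir/")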

What file formats does spark support to read?

Note: Spark out of the box supports reading files in CSV, JSON, text, Parquet, and many more file formats into a Spark DataFrame.

How to read multiple CSV files in spark?

Using the spark.read.csv() method you can also read multiple CSV files: just pass all the file names, separated by commas, as the path. You can read all CSV files in a directory into a DataFrame by passing the directory as the path to csv(). The Spark CSV data source provides multiple options for working with CSV files.
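
For example, a minimal PySpark sketch (paths are hypothetical):

    # A list of CSV files read into one DataFrame
    df_multi = spark.read.csv(["data/jan.csv", "data/feb.csv"], header=True)

    # All CSV files in a directory
    df_dir = spark.read.csv("data/csv_dir/", header=True)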

What is set to any other character in spark?

This refers to common CSV read options: quote can be set to any character, and separator characters inside quoted values are then ignored; sep (the field separator) can likewise be set to any character; and multiline can be set to true to load files whose records span multiple lines. These options are generally used while reading files in Spark.
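
Read as CSV options, a sketch might look like this (the path and separator choice are assumptions for illustration):

    df = (spark.read
          .option("sep", ";")           # field separator, set to any character
          .option("quote", "\"")        # separators inside quoted values are ignored
          .option("multiLine", "true")  # allow records spanning multiple lines
          .csv("data/records.csv"))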


2 Answers

Annoyingly, the documentation for the option method is in the docs for the json method. The docs on that method list the options as follows (key -- value -- description); a short usage sketch follows the list:

  • primitivesAsString -- true/false (default false) -- infers all primitive values as a string type

  • prefersDecimal -- true/false (default false) -- infers all floating-point values as a decimal type. If the values do not fit in decimal, then it infers them as doubles.

  • allowComments -- true/false (default false) -- ignores Java/C++ style comments in JSON records

  • allowUnquotedFieldNames -- true/false (default false) -- allows unquoted JSON field names

  • allowSingleQuotes -- true/false (default true) -- allows single quotes in addition to double quotes

  • allowNumericLeadingZeros -- true/false (default false) -- allows leading zeros in numbers (e.g. 00012)

  • allowBackslashEscapingAnyCharacter -- true/false (default false) -- allows quoting of all characters using the backslash quoting mechanism

  • allowUnquotedControlChars -- true/false (default false) -- allows JSON strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed characters)

  • mode -- PERMISSIVE/DROPMALFORMED/FAILFAST (default PERMISSIVE) -- sets the mode for dealing with corrupt records during parsing:

    • PERMISSIVE : when it meets a corrupted record, it puts the malformed string into a field configured by columnNameOfCorruptRecord and sets the other fields to null. To keep corrupt records, a user can define a string-type field named columnNameOfCorruptRecord in a user-defined schema; if the schema does not have that field, corrupt records are dropped during parsing. When inferring a schema, it implicitly adds a columnNameOfCorruptRecord field to the output schema.
    • DROPMALFORMED : silently drops whole corrupted records.
    • FAILFAST : throws an exception when it encounters corrupted records.
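
To tie this back to the question, here is a minimal sketch applying a few of these options (the path is hypothetical):

    df = (spark.read
          .option("mode", "PERMISSIVE")                            # keep malformed rows
          .option("columnNameOfCorruptRecord", "_corrupt_record")  # where they land
          .option("allowComments", "true")                         # tolerate // comments
          .option("primitivesAsString", "true")                    # primitives as strings
          .json("data/events.json"))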
answered Nov 30 '22 by csjacobs24


For built-in formats, all options are enumerated in the official documentation. Each format has its own set of options, so you have to refer to the documentation for the format you use.

  • For reading, open the docs for DataFrameReader and expand the docs for the individual methods. For the JSON format, say, expand the json method (only one variant contains the full list of options):

    json options

  • For writing, open the docs for DataFrameWriter. For example, for Parquet:

    parquet options

Schema merging is a slight special case: besides the mergeSchema read option used in the question, it can also be enabled globally through a session property:

 spark.conf.set("spark.sql.parquet.mergeSchema", "true") 
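
For what it's worth, both forms work for Parquet; a sketch with a hypothetical path:

    # Per-read option (as in the question)
    df = spark.read.option("mergeSchema", "true").parquet("data/table/")

    # Session-wide default
    spark.conf.set("spark.sql.parquet.mergeSchema", "true")
    df = spark.read.parquet("data/table/")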
answered Nov 30 '22 by user10407081