My company is in the process of migrating from a local database to a data warehouse, as the load is becoming too much for our SQL Server. We looked at the cloud solutions available to us and decided on Snowflake. We need to process fairly heavy compressed JSON files, at times up to 300 MB per file. I've read through the documentation and created the stage, the file format, and a table defined as (json_data variant). I loaded my first JSON file using the SnowSQL CLI and that worked as well (the test file is 3.7 kb). But when trying to COPY INTO mytable I got this error:
Error parsing JSON: document is too large, max size 16777216 bytes
How can I avoid this error without having to split the files before they are uploaded to the stage? The data is sent by an app, so once we go live we would have to load it every hour.
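For context, my setup is roughly the following (my_stage and my_json_format are placeholders for the real object names, and the file path is a placeholder too):

create or replace table mytable (json_data variant);

create or replace file format my_json_format
  type = json;

create or replace stage my_stage;

-- from the SnowSQL CLI
put file:///path/to/data.json.gz @my_stage;

copy into mytable
  from @my_stage
  file_format = (format_name = 'my_json_format');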
I would assume the error "Error parsing JSON: document is too large, max size 16777216 bytes" is raised because Snowflake thinks your whole file is a single JSON record, which is almost certainly not what you intend.
In your JSON file format, set STRIP_OUTER_ARRAY = TRUE. This removes the outer array brackets and turns the single JSON document into individual JSON records, one row per array element, so no single record hits the 16 MB limit.
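For example, assuming your table is mytable with a single VARIANT column json_data and your stage is called my_stage (swap in your own names), the COPY would look roughly like this; you can also bake the option into a named file format instead of specifying it inline:

copy into mytable
  from @my_stage
  file_format = (type = json strip_outer_array = true);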
Try this and see if it works for you.
Update:
I have tested this and it works without any issues, even with a single file far larger than the 16777216-byte limit.
Steps
--On Worksheet
create or replace stage testdb.testschema.stage_java;
--On SnowSQL
put file://C:/Users/sonij/OneDrive/Desktop/bigjson.json @STAGE_JAVA AUTO_COMPRESS=false SOURCE_COMPRESSION=NONE OVERWRITE=TRUE;
--On Worksheet
create or replace file format bigjson_ff
type = json
COMPRESSION = none
STRIP_OUTER_ARRAY = true
ALLOW_DUPLICATE = TRUE;
select
parse_json($1)
from @stage_java/bigjson.json (file_format => 'bigjson_ff');
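The steps above only query the staged file. To actually load it, a COPY along these lines should work (test_json is a hypothetical target table, not part of the tested steps):

create or replace table test_json (json_data variant);

copy into test_json
  from @stage_java/bigjson.json
  file_format = (format_name = 'bigjson_ff');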
Splitting the files won't help here, I'm afraid. Although Snowflake recommends files of 10 to 100 MB compressed for loading, it can handle bigger files as well.
The problem is probably the size of a single JSON record (or of something Snowflake treats as a single JSON record).
You can try loading your file with ON_ERROR = CONTINUE to see if anything loads at all. If some rows load, you'll know which rows are problematic; if nothing loads, you'll know the whole file is being treated as a single row.
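A sketch of what that would look like (mytable and my_stage are placeholders for your own table and stage):

copy into mytable
  from @my_stage
  file_format = (type = json)
  on_error = continue;

-- the COPY output (rows_parsed, rows_loaded, errors_seen, first_error)
-- shows whether anything loaded; the rejected rows can then be inspected with:
select * from table(validate(mytable, job_id => '_last'));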