How can I load large JSON files from Snowflake Stage?

My company is in the process of migrating from a local DB to a data warehouse, as the load is too much for SQL Server at the moment. We looked at which cloud solutions we might use and decided on Snowflake. We need to process fairly heavy compressed JSON files, at times up to 300 MB each. I've read through the documentation and created the stage, the file format, and the table as (json_data variant). I uploaded my first JSON file using the SnowSQL CLI and that worked too; the test file is 3.7 KB. But when trying to COPY INTO mytable I got this error:

Error parsing JSON: document is too large, max size 16777216 bytes

How can I avoid this error without having to split the files before they are uploaded to the stage? The data is sent by an app, so once we go live we'd have to load it every hour.
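For reference, a minimal sketch of the setup described above (my_stage, my_json_ff, and the local path are illustrative names; mytable is the table named above):

-- Target table with a single VARIANT column for the raw JSON:
create or replace table mytable (json_data variant);

-- File format for the JSON files:
create or replace file format my_json_ff
    type = json;

-- Internal stage to upload the files to:
create or replace stage my_stage
    file_format = my_json_ff;

-- From SnowSQL (the local path is a placeholder):
-- put file:///tmp/data.json @my_stage;

copy into mytable
from @my_stage
file_format = (format_name = 'my_json_ff');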

asked Oct 31 '25 by Andrei Budaes

2 Answers

I would assume the error "Error parsing JSON: document is too large, max size 16777216 bytes" occurs because Snowflake treats your whole file as a single JSON record, which is probably not what you intend.

In your JSON file format, set STRIP_OUTER_ARRAY = TRUE. This turns a single outer JSON array into individual JSON records, so each element is loaded as its own row.

Please try this and see if it works for you.
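To illustrate with hypothetical data: by default, a file whose entire content is one outer array is parsed as a single document, and that whole document has to fit under the 16,777,216-byte VARIANT limit; stripping the outer array means only each element has to fit.

-- Hypothetical file contents: one outer array wrapping many records:
--   [ {"event": "a", "ts": 1}, {"event": "b", "ts": 2}, ... ]
-- Without STRIP_OUTER_ARRAY the whole array is one document; with it,
-- each element becomes a separate row. The format name is illustrative.
create or replace file format my_json_ff
    type = json
    strip_outer_array = true;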


Update:

I have tested this, and it works without any issues even with a single file far larger than the 16777216-byte limit.

Steps

--On Worksheet

create or replace stage testdb.testschema.stage_java;

--On SnowSQL

put file://C:/Users/sonij/OneDrive/Desktop/bigjson.json @STAGE_JAVA AUTO_COMPRESS=false SOURCE_COMPRESSION=NONE OVERWRITE=TRUE;

--On Worksheet

create or replace file format bigjson_ff
    type = json
    compression = none
    strip_outer_array = true
    allow_duplicate = true;

select
    parse_json($1)
from @stage_java/bigjson.json (file_format => 'bigjson_ff');
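
To land the data in the question's mytable rather than just querying the stage, a COPY INTO along the same lines should work (mytable is the table from the question; the stage and file format are the ones created above):

copy into mytable
from @stage_java/bigjson.json
file_format = (format_name = 'bigjson_ff');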
answered Nov 02 '25 by Soni


Splitting the files won't help here, I'm afraid: although Snowflake recommends files of 10 to 100 MB compressed for loading, it can handle bigger files as well.

The problem is more likely the size of a single JSON record (or of something Snowflake considers a single record).

You can try loading your file with ON_ERROR = 'CONTINUE' to see whether anything loads at all. If some rows load, you'll know which row is problematic; if nothing loads, you'll know which rows are being treated as a single row.
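A sketch of that diagnostic COPY, reusing the stage and file format from the answer above:

copy into mytable
from @stage_java/bigjson.json
file_format = (format_name = 'bigjson_ff')
on_error = 'CONTINUE';  -- load whatever parses, skip records that error

-- Then inspect the records rejected by that COPY:
select * from table(validate(mytable, job_id => '_last'));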

answered Nov 02 '25 by MMV


