I'm trying to run a query in Redshift that selects using json_extract_path_text. Unfortunately, some of the JSON entries in this database column are invalid.
What happens: When the query hits an invalid JSON value, it stops with a "JSON parsing error".
What I want: Ignore any rows with invalid JSON in that column, but return any rows where it can parse the JSON.
Why I can't make it do what I want: I don't think I understand error handling in Redshift/Postgres. It should be possible to simply skip any rows that generate errors, but I tried entering EXEC SQL WHENEVER SQLERROR CONTINUE (based on the Postgres docs) and got a "syntax error at or near SQLERROR".
Edit: it seems that Redshift only supports Python UDFs, so this answer will not work there. I'm going to leave it here for posterity (and in case someone finds this who isn't using Redshift).
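For Redshift itself, a rough equivalent would be a Python UDF that checks whether a string parses as JSON, so you can filter bad rows before extracting. This is an untested sketch (the function name, argument size, and usage query are my assumptions, not something from a live cluster):

```sql
-- Hypothetical helper: returns true only when the input parses as JSON.
CREATE OR REPLACE FUNCTION is_valid_json (j varchar(65535))
RETURNS boolean
IMMUTABLE
AS $$
    import json
    if j is None:
        return None
    try:
        json.loads(j)
        return True
    except ValueError:
        return False
$$ LANGUAGE plpythonu;

-- Then filter before extracting, so the parser never sees invalid JSON
-- ('some_key' is a placeholder for your actual path):
SELECT json_extract_path_text(my_text, 'some_key')
FROM my_table
WHERE is_valid_json(my_text);
```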
Potentially relevant: here is a plpgsql function which will try to decode JSON and return a default value if that fails:
CREATE OR REPLACE FUNCTION safe_json(i text, fallback json) RETURNS json AS $$
BEGIN
    RETURN i::json;
EXCEPTION
    WHEN others THEN
        RETURN fallback;
END;
$$ LANGUAGE plpgsql IMMUTABLE RETURNS NULL ON NULL INPUT;
Then you can use it like this to guarantee that you always have valid JSON:
SELECT
  …
FROM (
    SELECT safe_json(my_text, '{"error": "invalid JSON"}'::json) AS my_json
    FROM my_table
) AS x
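If you would rather drop the invalid rows entirely than substitute a placeholder object, one option (an untested sketch for Postgres; 'some_key' is a placeholder for your actual path) is to pass NULL as the fallback and filter on it:

```sql
-- Rows whose my_text fails to parse get NULL back from safe_json
-- and are excluded by the WHERE clause.
SELECT json_extract_path_text(my_text, 'some_key')
FROM my_table
WHERE safe_json(my_text, NULL) IS NOT NULL;
```

Note that because the function is declared RETURNS NULL ON NULL INPUT, rows where my_text itself is NULL are also filtered out, which is usually what you want here.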