Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Redshift/Postgres: how can I ignore rows that generate errors? (Invalid JSON in json_extract_path_text)

I'm trying to run a query in redshift where I'm selecting using json_extract_path_text. Unfortunately, some of the JSON entries in this database column are invalid.

What happens: When the query hits an invalid JSON value, it stops with a "JSON parsing error".

What I want: Ignore any rows with invalid JSON in that column, but return any rows where it can parse the JSON.

Why I can't make it do what I want: I don't think I understand error handling in Redshift/Postgres. It should be possible to simply skip any rows that generate errors, but I tried entering EXEC SQL WHENEVER SQLERROR CONTINUE (based on the Postgres docs) and got a "syntax error at or near SQLERROR".

like image 829
Kevin S Avatar asked Aug 14 '14 21:08

Kevin S


1 Answers

Edit: it seems like Redshift only supports Python UDFs so this answer will not work. I'm going to leave this answer here for posterity (and in the event someone finds this who isn't using Redshift).

Potentially relevant: here is a plpgsql function which will try to decode JSON and return a default value if that fails:

CREATE OR REPLACE FUNCTION safe_json(i text, fallback json) RETURNS json AS $$
BEGIN
    RETURN i::json;
EXCEPTION
    WHEN others THEN
        RETURN fallback;
END;
$$ LANGUAGE plpgsql IMMUTABLE RETURNS NULL ON NULL INPUT;

Then you can use it like this:

SELECT
    …
FROM (
    SELECT safe_json(my_text, '{"error": "invalid JSON"}'::json) AS my_json
    FROM my_table
) as x

To guarantee that you'll always have valid JSON

like image 53
David Wolever Avatar answered Sep 19 '22 01:09

David Wolever