I have several large CSV files, each with missing values in many places. When I import a CSV file into SQLite, I would like the missing values to be stored as NULL, because another application expects missing data to be indicated by NULL. My current method does not produce the desired result.
An example CSV file (test.csv) is:
12|gamma|17|delta
67||19|zeta
96|eta||theta
98|iota|29|
The first line is complete; each of the other lines has (or is meant to show!) a single missing item. When I import using:
.headers on
.mode column
.nullvalue NULL
CREATE TABLE t (
id1 INTEGER PRIMARY KEY,
a1 TEXT,
n1 INTEGER,
a2 TEXT
);
.import test.csv t
SELECT
id1, typeof(id1),
a1, typeof(a1),
n1, typeof(n1),
a2, typeof(a2)
FROM t;
the result is
id1   typeof(id1)  a1      typeof(a1)  n1  typeof(n1)  a2      typeof(a2)
----  -----------  ------  ----------  --  ----------  ------  ----------
12    integer      gamma   text        17  integer     delta   text
67    integer              text        19  integer     zeta    text
96    integer      eta     text            text        theta   text
98    integer      iota    text        29  integer             text
so the missing values have been stored as empty strings of type text rather than as NULL. I would appreciate some guidance on how to ensure that all missing values become NULL.
In CSV files, a missing (NULL) value is typically represented by two successive delimiters (e.g. ,,), meaning the field contains no data. Alternatively, a file can use a sentinel string (e.g. null, or any other unique string) to denote NULL.
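The sqlite3 shell imports every field as text, and there does not seem to be a way to make it treat empty fields as NULL. The column's type affinity then converts values such as 17 to integers, but an empty string cannot be converted, so it stays an empty text value.
However, you can update the table yourself after the import, setting empty strings to NULL, for example: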
UPDATE t SET a1=NULL WHERE a1='';
Repeat for each column.
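Repeating the UPDATE by hand gets tedious for wide tables. As a rough sketch, assuming Python's built-in sqlite3 module is available, the column names can be read from PRAGMA table_info and every non-key column passed through NULLIF in a single statement. The helper name empty_to_null and the in-memory demo table are illustrative only; they are not part of the original setup.

```python
import sqlite3


def empty_to_null(conn, table):
    """Set every empty-string value in `table` to NULL (hypothetical helper)."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    # Primary-key columns (pk != 0) are skipped so keys are never touched.
    columns = [
        row[1]
        for row in conn.execute(f'PRAGMA table_info("{table}")')
        if row[5] == 0
    ]
    # NULLIF(x, '') yields NULL when x is the empty string, otherwise x.
    assignments = ", ".join(f'"{c}" = NULLIF("{c}", \'\')' for c in columns)
    conn.execute(f'UPDATE "{table}" SET {assignments}')
    conn.commit()


# Demo on an in-memory copy of the table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id1 INTEGER PRIMARY KEY, a1 TEXT, n1 INTEGER, a2 TEXT)"
)
# Non-key fields are inserted as text, as they would be after an import.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(12, "gamma", "17", "delta"), (67, "", "19", "zeta"),
     (96, "eta", "", "theta"), (98, "iota", "29", "")],
)
empty_to_null(conn, "t")
rows = conn.execute("SELECT id1, a1, n1, a2 FROM t ORDER BY id1").fetchall()
print(rows)
```

For a database file on disk, the same helper could be called once per imported table after the .import step has finished.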
You can also create a trigger before importing, so that an empty string is replaced with NULL as each row is inserted (one trigger per column):
CREATE TRIGGER trig_a1 AFTER INSERT ON t WHEN new.a1='' BEGIN
UPDATE t SET a1=NULL WHERE rowid=new.rowid;
END;
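To check that the trigger approach behaves as intended, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. It creates one trigger per non-key column (the names trig_a1, trig_n1 and trig_a2 follow the pattern above and are otherwise arbitrary), inserts the four sample rows with their fields as text, and then inspects the stored types. It stands in for the shell's .import rather than calling it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id1 INTEGER PRIMARY KEY, a1 TEXT, n1 INTEGER, a2 TEXT)"
)

# One trigger per non-key column, created BEFORE any rows are inserted.
for col in ("a1", "n1", "a2"):
    conn.execute(
        f"""
        CREATE TRIGGER trig_{col} AFTER INSERT ON t
        WHEN new.{col} = ''
        BEGIN
            UPDATE t SET {col} = NULL WHERE rowid = new.rowid;
        END
        """
    )

# Stand-in for the import: non-key fields arrive as text, some of them empty.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(12, "gamma", "17", "delta"), (67, "", "19", "zeta"),
     (96, "eta", "", "theta"), (98, "iota", "29", "")],
)
conn.commit()

result = conn.execute(
    "SELECT id1, typeof(a1), typeof(n1), typeof(a2) FROM t ORDER BY id1"
).fetchall()
print(result)
```

Each empty field should be reported by typeof() as null, while the populated fields keep their text or integer type.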