Trying to convert a CSV file into a JSON
Here is two sample lines :
-21.3214077;55.4851413;Ruizia cordata
-21.3213078;55.4849803;Cossinia pinnata
I would like to get something like :
"occurrences": [
{
"position": [-21.3214077, 55.4851413],
"taxo": {
"espece": "Ruizia cordata"
},
...
}]
Here is my script :
echo '"occurences": [ '
cat se.csv | while read -r line
do
IFS=';' read -r -a array <<< $line;
echo -n -e '{ "position": [' ${array[0]}
echo -n -e ',' ${array[1]} ']'
echo -e ', "taxo": {"espece":"' ${array[2]} '"'
done
echo "]";
I get really strange results :
"occurences": [
""position": [ -21.3214077, 55.4851413 ], "taxo": {"espece":" Ruizia cordata
""position": [ -21.3213078, 55.4849803 ], "taxo": {"espece":" Cossinia pinnata
What is wrong with my code ?
The right tool for this job is jq
.
jq -Rsn '
{"occurrences":
[inputs
| . / "\n"
| (.[] | select(length > 0) | . / ";") as $input
| {"position": [$input[0], $input[1]], "taxo": {"espece": $input[2]}}]}
' <se.csv
emits, given your input:
{
"occurences": [
{
"position": [
"-21.3214077",
"55.4851413"
],
"taxo": {
"espece": "Ruizia cordata"
}
},
{
"position": [
"-21.3213078",
"55.4849803"
],
"taxo": {
"espece": "Cossinia pinnata"
}
}
]
}
By the way, a less-buggy version of your original script might look like:
#!/usr/bin/env bash
items=( )
while IFS=';' read -r lat long pos _; do
printf -v item '{ "position": [%s, %s], "taxo": {"espece": "%s"}}' "$lat" "$long" "$pos"
items+=( "$item" )
done <se.csv
IFS=','
printf '{"occurrences": [%s]}\n' "${items[*]}"
Note:
cat
to pipe into a loop (and good reasons not to); thus, we're using a redirection (<
) to open the file directly as the loop's stdin.read
can be passed a list of destination variables; there's thus no need to read into an array (or first to read into a string, and then to generate a heresting and to read from that into an array). The _
at the end ensures that extra columns are discarded (by putting them into the dummy variable named _
) rather than appended to pos
."${array[*]}"
generates a string by concatenating elements of array
with the character in IFS
; we can thus use this to ensure that commas are present in the output only when they're needed.printf
is used in preference to echo
, as advised in the APPLICATION USAGE section of the specification for echo
itself.Here's a python one-liner/script that'll do the trick:
cat my.csv | python -c 'import csv, json, sys; print(json.dumps([dict(r) for r in csv.DictReader(sys.stdin)]))'
The accepted answer uses jq
to parse the input. This works but jq
doesn't handle escapes i.e. input from a CSV produced from Excel or similar tools is quoted like this:
foo,"bar,baz",gaz
will result in the incorrect output, as jq will see 4 fields, not 3.
One option is to use tab-separated values instead of comma (as long as your input data doesn't contain tabs!), along with the accepted answer.
Another option is to combine your tools, and use the best tool for each part: a CSV parser for reading the input and turning it into JSON, and jq
for transforming the JSON into the target format.
The python-based csvkit will intelligently parse the CSV, and comes with a tool csvjson
which will do a much better job of turning the CSV into JSON. This can then be piped through jq to convert the flat JSON output by csvkit into the target form.
With the data provided by the OP, for the desired output, this as as simple as:
csvjson --no-header-row |
jq '.[] | {occurrences: [{ position: [.a, .b], taxo: {espece: .c}}]}'
Note that csvjson automatically detects ;
as the delimiter, and without a header row in the input, assigns the json keys as a
, b
, and c
.
The same also applies to writing to CSV files -- csvkit
can read a JSON array or new-line delimited JSON, and intelligently output a CSV via in2csv
.
Here is an article on the subject: https://infiniteundo.com/post/99336704013/convert-csv-to-json-with-jq
It also uses JQ, but a bit different approach using split()
and map()
.
jq --slurp --raw-input \
'split("\n") | .[1:] | map(split(";")) |
map({
"position": [.[0], .[1]],
"taxo": {
"espece": .[2]
}
})' \
input.csv > output.json
It doesn't handle separator escaping, though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With