
Remove escape sequence characters like newline, tab and carriage return from JSON file

I have a JSON file with 80+ fields. When I extract the message field from the JSON file below using jq, the output contains newline and tab characters. I want to remove these escape sequence characters. I tried doing so with sed, but it did not work.

Sample JSON file:

{
"HOSTNAME":"server1.example",
"level":"WARN",
"level_value":30000,
"logger_name":"server1.example.adapter",
"content":{"message":"ERROR LALALLA\nERROR INFO NANANAN\tSOME MORE ERROR INFO\nBABABABABABBA\n BABABABA\t ABABBABAA\n\n BABABABAB\n\n"}
}

Can anyone help me with this?

asked Oct 29 '16 16:10 by user3792699




1 Answer

A pure jq solution:

$ jq -r '.content.message | gsub("[\\n\\t]"; "")' file.json
ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB

If you want to keep the enclosing " characters, omit -r.
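Deleting the characters outright glues neighboring words together (for example, LALALLAERROR in the output above). If that is not wanted, one possible variation is to replace each run of newlines and tabs with a single space instead. The following is only a sketch: the temporary sample file and its shortened message are made up for illustration and are not part of the original question.

```shell
# Hypothetical sample file with a shortened version of the question's message field.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"content":{"message":"ERROR A\nERROR B\tMORE INFO\n\nEND"}}
EOF

# Replace each run of newlines/tabs with one space instead of deleting it.
spaced=$(jq -r '.content.message | gsub("[\\n\\t]+"; " ")' "$sample")
printf '%s\n' "$spaced"

rm -f "$sample"
```

The + quantifier collapses consecutive control characters (such as \n\n) into one space rather than emitting one space per character.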

Note: peak's helpful answer contains a generalized regular expression that matches all control characters in the ASCII and Latin-1 Unicode range by way of a Unicode category specifier, \p{Cc}. jq uses the Oniguruma regex engine.
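As a rough illustration of the \p{Cc} approach mentioned in the note, the sketch below removes every control character, including carriage returns, which the [\\n\\t] character class does not cover. The sample file and its contents are hypothetical, and the regex is assumed to behave as described with the Oniguruma engine that jq uses.

```shell
# Hypothetical sample file; the message contains \r, \n and \t escapes.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"content":{"message":"ERROR A\r\nERROR B\tMORE INFO\n"}}
EOF

# \p{Cc} is the Unicode "control character" category; the backslash is doubled
# because it sits inside a jq string literal.
cleaned=$(jq -r '.content.message | gsub("\\p{Cc}"; "")' "$sample")
printf '%s\n' "$cleaned"

rm -f "$sample"
```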


Other solutions use an additional utility, such as sed or tr.

Using sed to unconditionally remove the escape sequences \n and \t:

$ jq '.content.message' file.json | sed 's/\\[tn]//g'
"ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB"

Note that the enclosing " are still there, however. To remove them, add another substitution to the sed command:

$ jq '.content.message' file.json | sed 's/\\[tn]//g; s/"\(.*\)"/\1/'
ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB

A simpler option that also removes the enclosing " (note: output has no trailing \n):

$ jq -r '.content.message' file.json | tr -d '\n\t'
ERROR LALALLAERROR INFO NANANANSOME MORE ERROR INFOBABABABABABBA BABABABA ABABBABAA BABABABAB

Note how -r is used to make jq output the raw string, which expands the \n and \t escape sequences into actual newline and tab characters; tr then removes those characters.
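The difference between the sed and tr variants comes down to what each tool sees: without -r, jq prints two-character escape sequences (a backslash followed by n or t), whereas with -r it prints real control characters. The sketch below imitates both kinds of output with hard-coded strings, so it runs without jq; the variable names and the short sample text are illustrative assumptions only.

```shell
# JSON-encoded form, as jq prints it without -r: backslash + letter, two characters each.
encoded='"line1\nline2\tend"'
# Raw form, as jq -r prints it: actual newline and tab characters.
raw=$(printf 'line1\nline2\tend')

# sed deletes the literal backslash-n and backslash-t pairs.
from_encoded=$(printf '%s\n' "$encoded" | sed 's/\\[tn]//g')
# tr deletes the real newline and tab characters.
from_raw=$(printf '%s' "$raw" | tr -d '\n\t')

printf '%s\n' "$from_encoded"
printf '%s\n' "$from_raw"
```

Swapping the tools does not work: tr -d '\n\t' finds nothing to delete inside the encoded string (only the line's trailing newline), and the sed expression finds no backslashes in the raw one.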

answered Oct 04 '22 19:10 by mklement0