Replace newline \n with expression using sed (or awk, or tr)

Tags: grep, sed, awk, tr

I'm trying to clean up the syntax of a pseudo-JSON file. The file is too large to open in a text editor (20 GB), so I have to do all of this via the command line (running Arch Linux). The one thing I cannot figure out how to do is replace newline characters in sed (GNU sed v. 4.8).

Specifically I have data of the form:

{
    "id" : 1,
    "value" : 2
}
{
    "id" : 2,
    "value" : 4
}

And I need to put a comma after each closing curly bracket (but not the last one). So I want the output to look like:

{
    "id" : 1,
    "value" : 2
},
{
    "id" : 2,
    "value" : 4
}

Ideally I'd just do this in sed, but from what I've read, sed flattens the text first, so it's not clear how to replace newline characters. I'd like to run something like sed 's/}\n{/},\n{/g' test.json, but this doesn't work (nor does using \\n in place of \n).

I've also tried awk, but ran into a similar issue: I can't replace the combination of a hard return with brackets. And I can get tr to replace the hard returns, but not the combination of characters.

Any thoughts on how to solve this?

user3037237 asked Sep 16 '25 02:09


2 Answers

Yeah, by default sed works line by line. You cannot match across multiple lines unless you use commands that pull additional lines into the pattern space. Here's one way to do it, provided the input strictly follows the sample shown:

sed '/}$/{N; s/}\n{/},\n{/}' ip.txt
  • /}$/ matches } at the end of a line
    • {} groups the commands to be executed for that address
    • N appends the next line to the pattern space
    • s/}\n{/},\n{/ performs the required substitution
  • Use the -i option for in-place editing
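To see the command at work, here is a small sketch that recreates the question's sample and runs the sed command on it. It assumes GNU sed (as in the question); the temporary directory and the `result` variable are only there for illustration.

```shell
# Recreate the sample from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '{' '    "id" : 1,' '    "value" : 2' '}' \
              '{' '    "id" : 2,' '    "value" : 4' '}' > "$tmpdir/ip.txt"

# On a line ending with }, pull in the next line (N) and add a comma
# only if that next line is an opening brace.
result=$(sed '/}$/{N; s/}\n{/},\n{/}' "$tmpdir/ip.txt")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

With GNU sed the final } is printed unchanged, because N on the last line prints the pattern space and exits when there is no further input.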

This solution can fail for sequences like the one shown below, but I assume two lines ending with } will not occur in a row.

}
}
{
abc
}

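Use sed '/}$/{N; s/}\n{/},\n{/; P; D}' if the above sequence can occur. Here P prints the pattern space up to the first newline, and D deletes that part and restarts the cycle with the remainder, so the second } is checked against the line that follows it.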
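The difference between the two commands can be checked on the problematic sequence itself. This is a sketch assuming GNU sed; the file name `seq.txt` and the variables `plain` and `with_pd` are made up for the demonstration.

```shell
# The sequence with two consecutive lines ending in }.
tmpdir=$(mktemp -d)
printf '%s\n' '}' '}' '{' 'abc' '}' > "$tmpdir/seq.txt"

# First version: the two } lines are consumed as a pair, so the
# second } is never compared with the { that follows it.
plain=$(sed '/}$/{N; s/}\n{/},\n{/}' "$tmpdir/seq.txt")

# P; D version: only the first line of the pair is printed and removed,
# and the cycle restarts with the second line still in the pattern space.
with_pd=$(sed '/}$/{N; s/}\n{/},\n{/; P; D}' "$tmpdir/seq.txt")

printf 'without P; D:\n%s\n\nwith P; D:\n%s\n' "$plain" "$with_pd"

rm -r "$tmpdir"
```

Only the second command puts a comma after the second }.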

Sundeep answered Sep 18 '25 17:09


With the samples shown, please try the following awk program. Setting RS to the empty string puts awk in paragraph mode, where records are separated by blank lines, so the sample is read as a single record; gsub (global substitution) then replaces every }\n{ with },\n{.

awk -v RS= '{gsub(/}\n{/,"},\n{")} 1' Input_file
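One caveat for a 20 GB file: in paragraph mode, input with no blank lines becomes a single record, which awk has to hold in memory. As a possible alternative (a different technique from the one above, not part of the original answer), the sketch below processes one line at a time and keeps only the previous line. The variable names `prev` and `sep` are illustrative.

```shell
# Recreate the sample from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '{' '    "id" : 1,' '    "value" : 2' '}' \
              '{' '    "id" : 2,' '    "value" : 4' '}' > "$tmpdir/Input_file"

result=$(awk '
    # From the second line on, print the held line, adding a comma
    # when it ends with } and the current line starts with {.
    NR > 1 { sep = (prev ~ /[}]$/ && $0 ~ /^[{]/) ? "," : ""; print prev sep }
    { prev = $0 }               # hold the current line for the next cycle
    END { if (NR) print prev }  # flush the last held line unchanged
' "$tmpdir/Input_file")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

The braces are written as bracket expressions ([}] and [{]) so that no awk implementation tries to read them as interval operators.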
RavinderSingh13 answered Sep 18 '25 18:09