What is the correct way to parse a string using regular expressions in a Linux shell script? I wrote the following script to print my SO rep on the console using curl and sed (not solely because I'm rep-crazy - I'm trying to learn some shell scripting and regex before switching to Linux).
json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
echo $json | sed 's/.*"reputation":"\([0-9,]\{1,\}\)".*/\1/' | sed s/,//
But somehow I feel that sed is not the proper tool to use here. I heard that grep is all about regex and explored it a bit. But apparently it prints the whole line whenever a match is found - I am trying to extract a number from a single line of text. Here is a downsized version of the string that I'm working on (returned by curl).
{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":"\u003cspan title=\"1 silver badge\"\u003e\u003cspan class=\"badge2\"\u003e●\u003c/span\u003e\u003cspan class=\"badgecount\"\u003e1\u003c/span\u003e\u003c/span\u003e"}
I guess my questions are:
Is sed the right thing to use here?
Can grep be used for this instead?
A regular expression (regex) is a text pattern used for searching and replacing text. Regular expressions resemble the Unix wildcards used in globbing but are much more powerful, and can be used to search, replace, and validate text.
Regex is a powerful tool, and one of its main advantages is that it is available in almost every programming language. Whether you are writing a Bash script or a Python program, you can use a regex, even as a one-line search query.
If you're wondering what is meant by "regular expression", a brief explanation is in order. A regular expression is a sequence of characters that represents a pattern. For example, the [0-9] in the example above matches any single digit, whereas [A-Z] would match any capital letter.
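To make those two character classes concrete, here is a minimal sketch using grep -E. The helper names is_digit and is_capital are made up for illustration, and LC_ALL=C is set on the assumption that plain ASCII ranges are wanted (in some locales [A-Z] can match more than capital letters).

```shell
#!/bin/sh
# is_digit and is_capital are hypothetical helper names, used only for illustration.

is_digit() {
    # Succeeds when the argument is exactly one character in the class [0-9].
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-9]$'
}

is_capital() {
    # Succeeds when the argument is exactly one character in the class [A-Z].
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Z]$'
}

is_digit 7 && echo "7 matches [0-9]"
is_capital q || echo "q does not match [A-Z]"
```

The ^ and $ anchors are what restrict the match to a single character; without them grep would accept any line that merely contains a digit or a capital letter.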
The grep command will select the desired line(s) from many, but it will not directly manipulate the line. For that, you use sed in a pipeline:
someCommand | grep 'Amarghosh' | sed -e 's/foo/bar/g'
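Applied to the string from the question, that pipeline might look like the sketch below. The json variable holds a trimmed copy of the sample (the badgeHtml value is emptied here for brevity), and the sed expression is just one possible way to pull the number out.

```shell
#!/bin/sh
# Trimmed copy of the flair JSON from the question (badgeHtml emptied for brevity).
json='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'

# grep selects the line that mentions the user; sed then reduces that line
# to the captured reputation value and strips the thousands separators.
rep=$(printf '%s\n' "$json" |
    grep 'Amarghosh' |
    sed -e 's/.*"reputation":"\([0-9,]*\)".*/\1/' -e 's/,//g')

echo "$rep"
```

With single-line input like this the grep stage is not strictly needed; it earns its place once the input has many lines and only some of them are interesting.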
Alternatively, awk (or perl if available) can be used. It's a far more powerful text processing tool than sed in my opinion.
someCommand | awk '/Amarghosh/ { do something }'
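As a rough idea of what the "do something" part could be for this question, the sketch below uses awk's match() and substr() to cut the reputation out of the matching line. The offsets 14 and 15 are derived from the length of the literal prefix "reputation":" plus the closing quote, so they are tied to this exact key name.

```shell
#!/bin/sh
# Trimmed copy of the flair JSON from the question (badgeHtml emptied for brevity).
json='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'

rep=$(printf '%s\n' "$json" | awk '/Amarghosh/ {
    # match() sets RSTART and RLENGTH to the position and length of the match.
    if (match($0, /"reputation":"[0-9,]+"/)) {
        # Skip the 14-character prefix and drop the closing quote.
        value = substr($0, RSTART + 14, RLENGTH - 15)
        gsub(/,/, "", value)   # remove thousands separators
        print value
    }
}')

echo "$rep"
```

Selection and manipulation happen in a single process here, which is the main attraction of awk over a grep and sed pair.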
For simple text manipulations, just stick with the grep/sed combo. When you need more complicated processing, move on up to awk or perl.
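On the question's point that grep prints the whole line: many grep implementations (GNU and BSD at least) offer a -o option that prints only the matched text. It is not part of POSIX, so treat the following as a sketch that assumes such a grep is installed.

```shell
#!/bin/sh
# Trimmed copy of the flair JSON from the question (badgeHtml emptied for brevity).
json='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'

# First grep -o isolates the key/value pair, the second keeps only the
# digits-and-commas run, and tr deletes the commas.
# Note: -o is an extension, not a POSIX grep option.
rep=$(printf '%s\n' "$json" |
    grep -o '"reputation":"[0-9,]*"' |
    grep -o '[0-9][0-9,]*' |
    tr -d ',')

echo "$rep"
```

The second pattern starts with [0-9] rather than [0-9,]* alone so that it can never match an empty string.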
My first thought is to just use:
echo '{"displayName":"Amarghosh","reputation":"2,737","badgeHtml"' | sed -e 's/.*tion":"//' -e 's/".*//' -e 's/,//g'
which keeps the number of sed processes to one (you can give multiple commands with -e).
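If that one-liner is going to be reused, it could be wrapped in a small function that reads the JSON on stdin. The name extract_rep is invented for this sketch, and the three -e expressions are the same ones as above.

```shell
#!/bin/sh
# extract_rep is a hypothetical helper name; it reads flair JSON on stdin.
extract_rep() {
    # One sed process, three commands: drop everything up to the value,
    # drop everything after it, then remove the commas.
    sed -e 's/.*tion":"//' -e 's/".*//' -e 's/,//g'
}

sample='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml"'
rep=$(printf '%s\n' "$sample" | extract_rep)
echo "$rep"

# With the live data from the question, the helper would be used as:
#   curl -s http://stackoverflow.com/users/flair/165297.json | extract_rep
```

Bear in mind that the first expression relies on tion":" appearing only once in the input, which holds for the sample string but is an assumption about the real response.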