
Piping curl output into grep

Just a little disclaimer, I am not very familiar with programming so please excuse me if I'm using any terms incorrectly/in a confusing way.

I want to extract specific information from a webpage, and tried doing this by piping the output of curl into grep. This is in Cygwin, if that matters.

When just typing in

$ curl www.ncbi.nlm.nih.gov/gene/823951

The terminal prints the whole webpage in what I believe to be HTML. From here I thought I could just pipe this output into grep with whatever search term I want:

  $ curl www.ncbi.nlm.nih.gov/gene/823951 | grep "Gene Symbol"

But instead of printing the webpage at all, the terminal gives me:

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  142k    0  142k    0     0  41857      0 --:--:--  0:00:03 --:--:-- 42083

Can anyone explain why it does this/how I can search for specific lines of text in a webpage? I eventually want to compile information like gene names, types, and descriptions into a database, so I was hoping to export the results from the grep function into a text file after that.

Any help is extremely appreciated, thanks in advance!

David Xie asked Apr 06 '16 17:04



1 Answer

curl detects that it is not writing to a terminal, so it shows you the progress meter. You can suppress the progress meter with -s.
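The meter still reaches your screen because curl writes it to stderr, while the page goes to stdout, and a pipe only carries stdout. Here is a minimal sketch of that split that runs without a network connection. The fake_fetch function is a hypothetical stand-in for curl, not part of any tool:

```shell
# fake_fetch is a hypothetical stand-in for curl: the page content goes
# to stdout and a progress-style message goes to stderr.
fake_fetch() {
    echo "100  142k  (progress meter)" >&2
    echo '<dt class="noline"> Gene symbol </dt>'
}

# Only stdout travels through the pipe to grep; the stderr line bypasses
# the pipe and is printed straight to the terminal.
fake_fetch | grep "Gene symbol"

# Discarding stderr hides the progress line, which looks similar to
# what curl -s does for the real progress meter.
matched=$(fake_fetch 2>/dev/null | grep "Gene symbol")
echo "$matched"
```

So seeing only the meter does not mean the HTML was lost. It means grep received the page and printed nothing.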

The HTML data is indeed being sent to grep. However, that page does not contain the text "Gene Symbol". grep is case-sensitive (unless invoked with -i), and the text on the page is "Gene symbol", with a lowercase "s".

$ curl -s www.ncbi.nlm.nih.gov/gene/823951 | grep "Gene symbol"
    <dt class="noline"> Gene symbol </dt>
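If you would rather not worry about capitalization, -i makes the search case-insensitive. A small sketch, using the line from the output above as a local sample instead of fetching the page (the variable names are just for illustration):

```shell
# Sample line copied from the grep output shown above.
sample='    <dt class="noline"> Gene symbol </dt>'

# Case-sensitive search for "Gene Symbol" (capital S) counts no matches.
# grep exits non-zero when nothing matches, hence the "|| true".
strict_count=$(printf '%s\n' "$sample" | grep -c "Gene Symbol" || true)

# With -i, case is ignored and the same pattern matches the line.
loose_count=$(printf '%s\n' "$sample" | grep -ic "Gene Symbol" || true)

echo "case-sensitive matches:   $strict_count"
echo "case-insensitive matches: $loose_count"
```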

You probably also want the next line of HTML, which holds the value itself. The -A option makes grep print lines after each match; -A1 prints one:

$ curl -s www.ncbi.nlm.nih.gov/gene/823951 | grep -A1 "Gene symbol"
    <dt class="noline"> Gene symbol </dt>
    <dd class="noline">AT3G47960</dd>

See man curl and man grep for more information about these and other options.
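Since you eventually want to collect the values in a text file, here is a rough sketch of one way to do it. It works on a small local file that mimics the two lines above, so it runs without a network connection. The file names page.html and genes.txt are hypothetical, and the sed pattern assumes the value sits on a single <dd> line as in the output above:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)

# Hypothetical local copy of the relevant part of the page.
cat > "$workdir/page.html" <<'EOF'
<dl>
    <dt class="noline"> Gene symbol </dt>
    <dd class="noline">AT3G47960</dd>
</dl>
EOF

# grep -A1 prints the matching line plus the one after it.
# sed -n ... p prints only lines where the substitution succeeds, i.e. the
# <dd> line, with the tags and indentation stripped away.
symbol=$(grep -A1 "Gene symbol" "$workdir/page.html" \
    | sed -n 's/.*<dd[^>]*>\(.*\)<\/dd>.*/\1/p')

# Append the value to a text file; >> adds a line instead of overwriting.
echo "$symbol" >> "$workdir/genes.txt"
cat "$workdir/genes.txt"
```

Against the live page, the grep would read from curl -s instead of the local file. Keep in mind that line-based grep and sed are fragile on HTML: if the site changes its markup, the pattern will need adjusting.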

retrospectacus answered Oct 21 '22 18:10
