
Bash: Remove headers from HTTP response

If I have some text containing HTTP headers and a body, e.g.:

HTTP/1.1 200 OK
Cache-Control: public, max-age=38
Content-Type: text/html; charset=utf-8
Expires: Fri, 22 Nov 2013 06:15:01 GMT
Last-Modified: Fri, 22 Nov 2013 06:14:01 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
Date: Fri, 22 Nov 2013 06:14:22 GMT

<!DOCTYPE html>
<html>
<head>
    <title>My website</title>
</head>
<body>

Hello world!

</body>
</html>

and this text is being piped in from a command, how can I remove the headers to leave just the body?

(Within the headers, \r\n is used as the line break.  \r\n\r\n marks the end of the headers and the start of the body.)
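
To experiment with the approaches below, it can help to have a small reproducible input. The following is only a sketch: `sample_response` is a made-up helper name, and its output is a shortened version of the example above. Piping it through `od -c` makes the `\r \n` pairs and the blank `\r \n` separator line visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a small HTTP response whose header lines end in CRLF.
sample_response() {
    printf 'HTTP/1.1 200 OK\r\n'
    printf 'Content-Type: text/html; charset=utf-8\r\n'
    printf '\r\n'                                   # empty line: end of headers
    printf '<html>\nHello world!\n</html>\n'        # body, plain LF line breaks
}

# Dump the raw bytes so that \r and \n show up explicitly.
sample_response | od -c
```

Any of the commands in this question can then be tried as `sample_response | ...`.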

Here's what I've tried (... indicates any command such as cat or curl which will output some HTTP headers and body to stdout):

sed

My first idea was to do substitution with sed, to remove everything before the first occurrence of \r\n\r\n:

... | sed 's|^.*?\r\n\r\n||'

But this doesn't work, mainly because sed processes its input one line at a time, so a single pattern can't match across the \n line breaks.  (In addition, sed's regular expressions don't support the non-greedy ? quantifier.)

grep

I also thought of using grep with a positive lookbehind for \r\n\r\n:

... | grep -oP '(?<=\r\n\r\n).*'

But this doesn't work either (mainly because grep only operates on individual lines).

pcregrep has a multiline mode (-M), but pcregrep is often not available (it's not installed by default in Ubuntu 12.04, Mac OS X 10.7, etc.), and I'd like a solution which doesn't require any non-standard tools.

perl

I then thought of doing substitution with perl, using the /s modifier so that . matches line breaks:

... | perl -pe 's/^.*?\r\n\r\n//s'

I think this is closer to a working solution.  However, I think Perl's Input Record Separator ($/) is \n by default, and needs to be changed to \r\n, so that . can match \r\n.  The -0 option can be used to set $/ to a single character, but not multiple characters.  I've tried this, but I don't think it's correct:

... | perl -pe '$/ = "\r\n"; s/^.*?\r\n\r\n//s'

Also, I think ^ is matching "start of line", but needs to match "start of file".

Offset and substring

I had an idea of getting the offset of \r\n\r\n using:

BodyOffset=$(expr index "$MyHttpText" "\r\n\r\n")

and then extracting the body as a substring using:

HttpBody=${MyHttpText:BodyOffset}

Unfortunately, the Mac OS X version of expr doesn't support index.  Also, if possible, I'd like a solution which doesn't require the creation of variables.

Parameter substitution

One other idea I had was to use parameter substitution, where # means "Remove from $MyHttpText the shortest part of *\r\n\r\n that matches the front end of $MyHttpText":

HttpBody=${MyHttpText#*\r\n\r\n}

But I'm not sure how to use this in a piped sequence of commands, and again I'd prefer a solution which doesn't require variables.
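
One possible way to use this expansion in a pipeline is to wrap it in a function that reads stdin into a local variable. The sketch below assumes bash; `strip_headers` is a hypothetical name, and the separator is written as `$'\r\n\r\n'` because a bare `\r\n\r\n` in the pattern would not produce real CR and LF characters. Shell variables cannot hold NUL bytes, so this only suits text bodies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a full HTTP response on stdin, prints only the body.
strip_headers() {
    local text
    # Append a sentinel so command substitution keeps the body's trailing newlines.
    text=$(cat; printf x)
    text=${text%x}
    # Remove the shortest prefix that ends in CRLF CRLF (i.e. the headers).
    text=${text#*$'\r\n\r\n'}
    printf '%s' "$text"
}

printf 'HTTP/1.1 200 OK\r\nVary: *\r\n\r\n<html>\nHello world!\n</html>\n' | strip_headers
```

The variable stays local to the function, so the calling pipeline itself does not need one.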

TachyonVortex asked Nov 24 '13 19:11




2 Answers

sed can do this:

sed '1,/^$/d' data.txt

This command deletes everything from line 1 through the first empty line (^$). This works if your line breaks are \n. If they are \r\n, you can either convert the input with dos2unix (and back with unix2dos if needed), or add the \r character to the sed regex:

sed '1,/^\r$/d' data.txt

However, that command only works if your line breaks are \r\n. To handle both types of line break, you can use:

sed '1,/^\r\{0,1\}$/d' data.txt

Here we are looking for an empty line with either 0 or 1 \r characters.
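
As a quick check, the same expression can be run on piped input instead of a file. The sample response below is made up for the demonstration, and the sketch assumes GNU sed, which interprets `\r` in a regex as a carriage return; other sed implementations may not.

```shell
# Made-up sample: CRLF-terminated headers, an empty CRLF line, then the body.
printf 'HTTP/1.1 200 OK\r\nVary: *\r\n\r\n<html>\nHello world!\n</html>\n' |
    sed '1,/^\r\{0,1\}$/d'    # delete line 1 through the first (CR-only or empty) line
```

Only the lines after the separator are printed. Any \r characters inside the body itself are left untouched.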

pfnuesel answered Oct 13 '22 20:10


Your Perl one-liner does not (and cannot) remove the headers, because it reads only one line of input at a time. You need to unset the input record separator so that the whole input is read as a single record:

perl -0777 ...
TLP answered Oct 13 '22 21:10