Bash: Remove headers from HTTP response

If I have some text containing HTTP headers and a body, e.g.:

HTTP/1.1 200 OK
Cache-Control: public, max-age=38
Content-Type: text/html; charset=utf-8
Expires: Fri, 22 Nov 2013 06:15:01 GMT
Last-Modified: Fri, 22 Nov 2013 06:14:01 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
Date: Fri, 22 Nov 2013 06:14:22 GMT

<!DOCTYPE html>
<html>
<head>
    <title>My website</title>
</head>
<body>

Hello world!

</body>
</html>

and this text is being piped in from a command, how can I remove the headers to leave just the body?

(Within the headers, \r\n is used as the line break. \r\n\r\n marks the end of the headers and the start of the body.)

Here's what I've tried (... indicates any command such as cat or curl which will output some HTTP headers and body to stdout):

sed

My first idea was to do substitution with sed, to remove everything before the first occurrence of \r\n\r\n:

... | sed 's|^.*?\r\n\r\n||'

But this doesn't work, mainly because sed only operates on individual lines, so it can't operate on \r or \n. (In addition, it doesn't support the ? non-greedy operator.)

grep

I also thought of using grep with a positive lookbehind for \r\n\r\n:

... | grep -oP '(?<=\r\n\r\n).*'

But this doesn't work either (mainly because grep only operates on individual lines).

pcregrep has a multiline mode (-M), but pcregrep is often not available (it's not installed by default on Ubuntu 12.04, Mac OS X 10.7, etc.), and I'd like a solution which doesn't require any non-standard tools.

perl

I then thought of doing substitution with perl, using the /s modifier so that . matches line breaks:

... | perl -pe 's/^.*?\r\n\r\n//s'

I think this is closer to a working solution. However, I think Perl's Input Record Separator ($/) is \n by default, and needs to be changed to \r\n, so that . can match \r\n. The -0 option can be used to set $/ to a single character, but not multiple characters. I've tried this, but I don't think it's correct:

... | perl -pe '$/ = "\r\n"; s/^.*?\r\n\r\n//s'

Also, I think ^ is matching "start of line", but needs to match "start of file".

Offset and substring

I had an idea of getting the offset of \r\n\r\n using:

BodyOffset=$(expr index "$MyHttpText" "\r\n\r\n")

and then extracting the body as a substring using:

HttpBody=${MyHttpText:BodyOffset}

Unfortunately, the Mac OS X version of expr doesn't support index. Also, if possible, I'd like a solution which doesn't require the creation of variables.

Parameter substitution

One other idea I had was to use parameter substitution, where # means "Remove from $MyHttpText the shortest part of *\r\n\r\n that matches the front end of $MyHttpText":

HttpBody=${MyHttpText#*\r\n\r\n}

But I'm not sure how to use this in a piped sequence of commands, and again I'd prefer a solution which doesn't require variables.
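For what it's worth, the parameter substitution idea can be made to work in a pipeline if the variable is confined to a small function. The following is only a sketch under some assumptions: strip_headers_param is a hypothetical name, the whole response is assumed to fit in a shell variable, and the body is assumed to contain no NUL bytes (shell variables cannot hold them). The \r\n\r\n separator is built with printf because a bare \r inside ${...} is not interpreted as a carriage return.

```shell
# Hypothetical helper: reads a full HTTP response on stdin, prints only the body.
# The function body runs in a subshell, so no variables leak into the caller.
strip_headers_param() (
  cr=$(printf '\r')        # a literal carriage return
  nl='
'                          # a literal newline
  sep="$cr$nl$cr$nl"       # the blank line that ends the headers
  # Append a sentinel so command substitution does not strip trailing newlines.
  text=$(cat; printf x)
  text=${text%x}
  # Remove the shortest prefix that ends with the separator.
  body=${text#*"$sep"}
  printf '%s' "$body"
)
```

It would then be used as ... | strip_headers_param. If the input contains no \r\n\r\n at all, the text is printed unchanged.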

sed can do this:

sed '1,/^$/d' data.txt

This command deletes everything from line 1 through the first empty line (^$). It works if the line break is \n. If the line break is \r\n, you can either convert the input with dos2unix (and back again with unix2dos), or add the \r character to the sed regex:

sed '1,/^\r$/d' data.txt

However, that command only works if the line break is \r\n. To make it work with both types of line break, you can use:

sed '1,/^\r\{0,1\}$/d' data.txt

Here we are looking for an empty line with either 0 or 1 \r characters.
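The commands above read from data.txt, whereas the question pipes the response in. A minimal sketch of the same range delete reading from stdin follows; strip_headers_sed is a hypothetical name. It passes a literal carriage return to sed instead of the \r escape, on the assumption that not every sed implementation understands \r in a regex (GNU sed does).

```shell
# Hypothetical helper: delete from line 1 through the first blank line,
# where "blank" means empty or containing only a carriage return.
strip_headers_sed() (
  cr=$(printf '\r')                 # a literal carriage return
  # After shell expansion the sed script is: 1,/^<CR>\{0,1\}$/d
  sed "1,/^${cr}\{0,1\}\$/d"
)
```

It would be used as ... | strip_headers_sed. Only the first blank line ends the range, so blank lines inside the body are kept.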

Your Perl one-liner does not (and cannot) remove the headers, because it reads only one line of input at a time, so the regex never sees two line breaks together. You need to unset the input record separator so that the whole input is read as a single record:

perl -0777 ...

Tags: regex, grep, bash, sed, perl

Asked by TachyonVortex. 2 Answers, by pfnuesel and TLP.
