If I have some text containing HTTP headers and a body, e.g.:
HTTP/1.1 200 OK
Cache-Control: public, max-age=38
Content-Type: text/html; charset=utf-8
Expires: Fri, 22 Nov 2013 06:15:01 GMT
Last-Modified: Fri, 22 Nov 2013 06:14:01 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
Date: Fri, 22 Nov 2013 06:14:22 GMT
<!DOCTYPE html>
<html>
<head>
<title>My website</title>
</head>
<body>
Hello world!
</body>
</html>
and this text is being piped in from a command, how can I remove the headers to leave just the body?
(Within the headers, \r\n is used as the line break. \r\n\r\n marks the end of the headers and the start of the body.)
Here's what I've tried (... indicates any command, such as cat or curl, which will output some HTTP headers and body to stdout):
My first idea was to do substitution with sed, to remove everything before the first occurrence of \r\n\r\n:
... | sed 's|^.*?\r\n\r\n||'
But this doesn't work, mainly because sed only operates on individual lines, so it can't operate on \r or \n. (In addition, it doesn't support the ? non-greedy operator.)
I also thought of using grep with a positive lookbehind for \r\n\r\n:
... | grep -oP '(?<=\r\n\r\n).*'
But this doesn't work either (mainly because grep only operates on individual lines).
pcregrep has a multiline mode (-M), but pcregrep is often not available (it's not installed by default in Ubuntu 12.04, Mac OS X 10.7, etc.), and I'd like a solution which doesn't require any non-standard tools.
I then thought of doing substitution with perl, using the /s modifier so that . matches line breaks:
... | perl -pe 's/^.*?\r\n\r\n//s'
I think this is closer to a working solution. However, I think Perl's Input Record Separator ($/) is \n by default, and needs to be changed to \r\n, so that . can match \r\n. The -0 option can be used to set $/ to a single character, but not multiple characters. I've tried this, but I don't think it's correct:
... | perl -pe '$/ = "\r\n"; s/^.*?\r\n\r\n//s'
Also, I think ^ is matching "start of line", but needs to match "start of file".
I had an idea of getting the offset of \r\n\r\n using:
BodyOffset=$(expr index "$MyHttpText" "\r\n\r\n")
and then extracting the body as a substring using:
HttpBody=${MyHttpText:BodyOffset}
Unfortunately, the Mac OS X version of expr doesn't support index. Also, if possible, I'd like a solution which doesn't require the creation of variables.
One other idea I had was to use parameter substitution, where # means "Remove from $MyHttpText the shortest part of *\r\n\r\n that matches the front end of $MyHttpText":
HttpBody=${MyHttpText#*\r\n\r\n}
But I'm not sure how to use this in a piped sequence of commands, and again I'd prefer a solution which doesn't require variables.
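For reference, here is a minimal sketch of how the parameter-substitution idea might be wired into a pipeline. It does use variables, so it is shown only to illustrate the mechanics. The function name strip_headers_expansion is hypothetical. One assumption worth stating: inside the expansion pattern, an unquoted \r\n is treated as the literal letters r and n, so the sketch builds a real CRLF with printf first.

```shell
# Hypothetical helper: read all of stdin, then drop the shortest prefix
# that ends in CRLF CRLF. The ( ... ) body runs in a subshell, so the
# variables do not leak into the calling shell.
strip_headers_expansion() (
  # Build a literal CRLF. The trailing "_" keeps $(...) from stripping the newline.
  crlf=$(printf '\r\n_'); crlf=${crlf%_}
  # Slurp stdin, using the same sentinel trick to preserve trailing newlines.
  text=$(cat; printf '_'); text=${text%_}
  # Quoting "$crlf$crlf" makes it match literally instead of as a glob pattern.
  body=${text#*"$crlf$crlf"}
  printf '%s' "$body"
)

# Sample response written with explicit CRLF line endings in the headers.
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>\nHello world!\n</html>\n' \
  | strip_headers_expansion
```

Shell variables cannot hold NUL bytes, so this sketch is only suitable for text bodies. If no CRLF CRLF sequence is found, the input is passed through unchanged.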
Open the site in question, then click the HTTP Response Headers option. Select the X-Powered-By header and click Remove in the Actions pane to remove it from the response.
To remove a response header in Apache, use the Header directive along with the unset argument. The Header directive can be used in the server config (e.g. httpd.conf), a virtual host, or site-specific configuration.
It depends on where the headers are added. If they are added inside your app, you can use a Spring MVC Interceptor to remove them after your controller calls. If they are added outside your app, you might be able to use a Java EE filter configured in web.xml (the example is about security, but the approach will also work for your use case).
sed can do this:
sed '1,/^$/d' data.txt
This command deletes everything starting from line 1 and ending at the first occurrence of an empty line (^$). This works if you have \n as the newline character. If you have \r\n as the newline character, you can use dos2unix and unix2dos to convert back and forth, or you can add the \r character to the sed regex:
sed '1,/^\r$/d' data.txt
However, this last command only works if you have \r\n as the newline character. To make it work with both types of newline, you can use:
sed '1,/^\r\{0,1\}$/d' data.txt
Here we are looking for an empty line with either 0 or 1 \r characters.
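The same range deletion can be used directly in a pipeline. The sketch below wraps it in a function, strip_headers, which is a hypothetical name chosen for illustration. It also assumes that some sed implementations (BSD sed on Mac OS X, for example) may not understand \r inside a regex, so it passes a literal carriage return produced by printf instead.

```shell
# Hypothetical helper: delete from line 1 through the first line that is
# empty or contains only a carriage return.
strip_headers() {
  # A literal CR, so the regex does not rely on sed understanding \r.
  cr=$(printf '\r')
  sed "1,/^${cr}\{0,1\}\$/d"
}

# Sample response written with explicit CRLF line endings in the headers.
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>\nHello world!\n</html>\n' \
  | strip_headers
```

In real use the printf would be replaced by whatever command produces the response, and the function reads from stdin just as the plain sed command would.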
Your Perl one-liner does not (and cannot) remove the headers, because it reads only one line of input at a time. You need to unset the input record separator so that the whole input is read as a single record.
perl -0777 ...