I am fetching some info from a webpage in PHP using simple_html_dom and cURL. The problem is that the page's HTML is malformed, so the DOM object contains erroneous info.
How can I get the HTML file as a string in a PHP variable so that I can run a regular expression over it?
cURL doesn't work, as it is ignoring the bad part. simple_html_dom.php has the same issue. wget doesn't work since I don't have permission to run it on the server.
file_get_contents — Reads entire file into a string
file_get_contents(string $filename, bool $use_include_path = false, ?resource $context = null, int $offset = 0, ?int $length = null): string|false
From the manual:
This function is similar to file(), except that file_get_contents() returns the file in a string, starting at the specified offset up to length bytes. On failure, file_get_contents() will return false.
file_get_contents() is the preferred way to read the contents of a file into a string. It will use memory mapping techniques if supported by your OS to enhance performance.
It works with both local files and web pages. To grab the HTML, just pass a URL such as "http://whatever.com/page.html" as $filename (this requires the allow_url_fopen setting to be enabled).
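A minimal sketch of that approach, assuming allow_url_fopen is enabled. It reads the raw page into a string with file_get_contents() and then runs a regular expression over it instead of handing it to a DOM parser. fetch_html() and extract_title() are hypothetical helper names, and the timeout and User-Agent values are assumptions; the <title> pattern is only an example of the kind of regex you might run.

```php
<?php
// Sketch only: fetch_html() and extract_title() are hypothetical helpers.
// Assumes allow_url_fopen is enabled for http:// sources.

function fetch_html(string $source): string
{
    // These "http" options only affect http(s):// sources and are ignored
    // for local files. The timeout and User-Agent values are assumptions.
    $context = stream_context_create([
        'http' => [
            'timeout'    => 10,
            'user_agent' => 'Mozilla/5.0 (compatible; html-fetcher)',
        ],
    ]);

    // Returns the raw bytes, broken markup included; no parsing happens here.
    $html = file_get_contents($source, false, $context);
    if ($html === false) {
        throw new RuntimeException("Could not read $source");
    }
    return $html;
}

// Example regex pass over the raw string: grab the <title> text, if any.
function extract_title(string $html): ?string
{
    if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m)) {
        return trim($m[1]);
    }
    return null;
}

// Usage (needs network access):
// echo extract_title(fetch_html('http://whatever.com/page.html')) ?? '(no title)', PHP_EOL;
```

Because the regex works on the string directly, unclosed or misnested tags elsewhere in the page do not affect the match.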
If you use cURL, make sure you set the CURLOPT_RETURNTRANSFER option so that curl_exec() returns the page as a string instead of printing it directly, e.g.:
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
See http://www.php.net/manual/en/function.curl-setopt.php
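A fuller cURL version might look like the sketch below, assuming the PHP cURL extension is installed. fetch_with_curl() is a hypothetical helper name, and following redirects and the 10-second timeout are assumed choices rather than requirements. With CURLOPT_RETURNTRANSFER set, curl_exec() returns the response body as an unparsed string (or false on failure), which you can then feed to preg_match().

```php
<?php
// Sketch only: fetch_with_curl() is a hypothetical helper.
// Assumes the PHP cURL extension is installed.

function fetch_with_curl(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects (assumed choice)
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds (assumed value)
    ]);

    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    // The handle is released when $ch goes out of scope.
    // $html holds the raw markup exactly as received, broken parts included.
    return $html;
}

// Usage (needs network access):
// $html = fetch_with_curl('http://whatever.com/page.html');
// preg_match('~<title[^>]*>(.*?)</title>~is', $html, $m);
```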