When you use the Web.Page(Web.Contents('url')) function to read a table from a web page, some sites will cause an error due to inconsistent linefeeds.
DataSource.Error: The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF
There doesn't appear to be any option you can pass to the Web functions to ignore those errors.
This method works for a short while, but doesn't survive a save/refresh:
let
BufferedBinary = Binary.Buffer(Web.Contents("http://vote.sos.ca.gov/returns/president/party/democratic/county/all")),
CleanedUp = Text.Replace(Text.Replace(Text.FromBinary(BufferedBinary), "#(cr,lf)", "#(lf)"), "#(lf)", "#(cr,lf)"),
Table = Web.Page(CleanedUp)
in
Table
To remove rows with errors in Power Query, first select the column that contains errors. On the Home tab, in the Reduce rows group, select Remove rows. From the drop-down menu, select Remove errors. The result of that operation will give you the table that you're looking for.
If so, you could first do a Group By on dataset keep with an Aggregation of Keep all rows. You can then sort on dataset key descending and keep the first 10 rows. You can then expand the column with the tables in it and load.
It might look like the other code sample works, but that probably means it just hasn't been run yet.
The two library functions Web.Contents('url')
and Web.Page(Web.Contents('url'))
use different HTTP clients.
Web.Contents
uses a pretty basic HTTP client, but Web.Page
over a non-transformed Web.Contents
will instead use the IE browser to load the URL directly. IE is very forgiving for any HTTP protocol errors, but the basic HTTP client is much more strict, which causes the error you see.
By calling Binary.Buffer
or Text.Replace
in "between," that will skip the optimization where Web.Page
uses the browser directly, and it instead uses Web.Contents
to make the web request and then loads those bytes into the browser.
So in this case if you invoke:
Web.Page(Web.Contents("http://vote.sos.ca.gov/returns/president/party/democratic/county/all"))
You'll get a good result table, but "From Web" will be broken because the Web.Contents('url')
by itself will error.
Since you shared the URL, I was able to try http://vote.sos.ca.gov/returns/president/party/democratic/county/all and get the same problem.
The underlying problem is the fourth cookie in the HTTP response header has an invalid hex character 0x01 right in the middle:
Maybe this is something that http://vote.sos.ca.gov/ can fix on their server? I tried submitting a "bug report" on their Contact Us page but I'm not sure that is the right channel...
BTW, our HTTP client library doesn't give very good error messages, there's nothing wrong with CR or LF characters. Even if there was, there's nothing that Text.Replace
over the response body can fix in this case because the problem is in the HTTP response headers.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With