Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Power BI (Power Query) Web request results in "CR must be followed by LF" Error

Tags:

When you use the Web.Page(Web.Contents('url')) function to read a table from a web page, some sites will cause an error due to inconsistent linefeeds.

DataSource.Error: The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF

There doesn't appear to be any option you can pass to the Web functions to ignore those errors.

This method works for a short while, but doesn't survive a save/refresh:

let
   BufferedBinary = Binary.Buffer(Web.Contents("http://vote.sos.ca.gov/returns/president/party/democratic/county/all")),
   CleanedUp = Text.Replace(Text.Replace(Text.FromBinary(BufferedBinary), "#(cr,lf)", "#(lf)"), "#(lf)", "#(cr,lf)"),
   Table = Web.Page(CleanedUp)
in
   Table
like image 852
James White Avatar asked Jun 15 '16 14:06

James White


People also ask

How do I fix power BI query errors?

To remove rows with errors in Power Query, first select the column that contains errors. On the Home tab, in the Reduce rows group, select Remove rows. From the drop-down menu, select Remove errors. The result of that operation will give you the table that you're looking for.

How do I keep the top 10 rows in Power Query?

If so, you could first do a Group By on dataset keep with an Aggregation of Keep all rows. You can then sort on dataset key descending and keep the first 10 rows. You can then expand the column with the tables in it and load.


1 Answers

It might look like the other code sample works, but that probably means it just hasn't been run yet.


The two library functions Web.Contents('url') and Web.Page(Web.Contents('url')) use different HTTP clients.

Web.Contents uses a pretty basic HTTP client, but Web.Page over a non-transformed Web.Contents will instead use the IE browser to load the URL directly. IE is very forgiving for any HTTP protocol errors, but the basic HTTP client is much more strict, which causes the error you see.

By calling Binary.Buffer or Text.Replace in "between," that will skip the optimization where Web.Page uses the browser directly, and it instead uses Web.Contents to make the web request and then loads those bytes into the browser.

So in this case if you invoke:

Web.Page(Web.Contents("http://vote.sos.ca.gov/returns/president/party/democratic/county/all"))

You'll get a good result table, but "From Web" will be broken because the Web.Contents('url') by itself will error.


Since you shared the URL, I was able to try http://vote.sos.ca.gov/returns/president/party/democratic/county/all and get the same problem.

The underlying problem is the fourth cookie in the HTTP response header has an invalid hex character 0x01 right in the middle:

Bad Hex

Maybe this is something that http://vote.sos.ca.gov/ can fix on their server? I tried submitting a "bug report" on their Contact Us page but I'm not sure that is the right channel...


BTW, our HTTP client library doesn't give very good error messages, there's nothing wrong with CR or LF characters. Even if there was, there's nothing that Text.Replace over the response body can fix in this case because the problem is in the HTTP response headers.

like image 113
Carl Walsh Avatar answered Sep 28 '22 02:09

Carl Walsh