I have a NodeJS/NextJS app running at http://www.schandillia.com. The project has a robots.txt file accessible at http://www.schandillia.com/robots.txt. As of now, the file is bare-bones for testing purposes:
User-agent: *
Allow: /
However, when I run a Lighthouse audit on my site, it throws a Crawling and Indexing error saying it couldn't download a robots.txt file. I repeat, the file is available at http://www.schandillia.com/robots.txt.
The project's codebase, should you need to take a look, is up at https://github.com/amitschandillia/proost. The robots.txt file is located at proost/web/static/
but accessible at root thanks to the following in my Nginx config:
# ... the rest of your configuration
location = /robots.txt {
proxy_pass http://127.0.0.1:3000/static/robots.txt;
}
The complete config file is available for review on github at https://github.com/amitschandillia/proost/blob/master/.help_docs/configs/nginx.conf.
Please advise if there's something I'm overlooking.
TL;DR: Your robots.txt is served fine, but Lighthouse cannot fetch it, because its audit is incompatible with the connect-src directive of your site's Content Security Policy. This was a known Lighthouse limitation, tracked as issue #4386 and fixed in Chrome 92.
Explanation: Lighthouse attempts to fetch the robots.txt file by way of a script run from the document served at the root of your site. Here is the code it uses to perform this request (found in lighthouse-core):
const response = await fetch(new URL('/robots.txt', location.href).href);
If you try to run this code from your site, you will notice that a "Refused to connect" error is thrown.
This error happens because the browser enforces the Content Security Policy restrictions from the headers served by your site (split on several lines for readability):
content-security-policy:
default-src 'self';
script-src 'self' *.google-analytics.com;
img-src 'self' *.google-analytics.com;
connect-src 'none';
style-src 'self' 'unsafe-inline' fonts.googleapis.com;
font-src 'self' fonts.gstatic.com;
object-src 'self';
media-src 'self';
frame-src 'self'
Notice the connect-src 'none'; part. Per the CSP spec, it means that no URL can be loaded using script interfaces from within the served document. In practice, any fetch is refused.
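The effect of that directive can be illustrated with a small sketch. The function below is NOT the browser's actual CSP matching algorithm (which also handles schemes, paths, and wildcards); it is a simplified, hypothetical model of the decision the browser makes for a fetch under a given policy:

```javascript
// Simplified sketch of the decision a browser makes for a fetch() under a
// Content-Security-Policy header. Illustrative only -- the real CSP source
// matching algorithm is considerably more involved.
function connectAllowed(cspHeader, requestOrigin, pageOrigin) {
  // Parse "name source source; name source ..." into an object.
  const directives = Object.fromEntries(
    cspHeader.split(';').map((d) => {
      const [name, ...sources] = d.trim().split(/\s+/);
      return [name, sources];
    })
  );
  // connect-src governs fetch(); it falls back to default-src if absent.
  const sources = directives['connect-src'] || directives['default-src'] || [];

  if (sources.includes("'none'")) return false;      // nothing may connect
  if (sources.includes("'self'") && requestOrigin === pageOrigin) return true;
  return sources.includes(requestOrigin);            // explicit allow-list
}

// With your current policy, even a same-origin fetch is refused:
console.log(connectAllowed(
  "default-src 'self'; connect-src 'none'",
  'https://www.schandillia.com',
  'https://www.schandillia.com'
)); // false
```

Note that with connect-src 'none', the origin of the request is irrelevant: even Lighthouse's same-origin fetch of /robots.txt is blocked.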
This header is explicitly sent by the server layer of your Next.js application, because of the way you configured your Content Security Policy middleware (from commit a6aef0e):
import csp from 'helmet-csp';

server.use(csp({
  directives: {
    defaultSrc: ["'self'"],
    scriptSrc: ["'self'", '*.google-analytics.com'],
    imgSrc: ["'self'", '*.google-analytics.com'],
    connectSrc: ["'none'"],
    styleSrc: ["'self'", "'unsafe-inline'", 'fonts.googleapis.com'],
    fontSrc: ["'self'", 'fonts.gstatic.com'],
    objectSrc: ["'self'"],
    mediaSrc: ["'self'"],
    frameSrc: ["'self'"]
  }
}));
Solution/Workaround: To fix the problem in the audit report, you can either run the audit with Chrome 92 or later, where this Lighthouse limitation is fixed, or use a more permissive connect-src 'self' directive, which will have the side effect of allowing HTTP requests from the browser side of your Next.js app.
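If you go with the connect-src 'self' route, the change is a one-liner in the directives object passed to helmet-csp: connectSrc: ["'self'"]. The sketch below does not use helmet-csp itself; it is a simplified, illustrative serializer (the camelCase-to-kebab-case mapping is an assumption about how such middleware builds the header) so you can see what the resulting header would look like:

```javascript
// Illustrative sketch: serialize a helmet-csp style directives object into a
// Content-Security-Policy header value. Not helmet-csp's actual code.
const directives = {
  defaultSrc: ["'self'"],
  scriptSrc: ["'self'", '*.google-analytics.com'],
  imgSrc: ["'self'", '*.google-analytics.com'],
  connectSrc: ["'self'"], // was ["'none'"] -- this is the fix
  styleSrc: ["'self'", "'unsafe-inline'", 'fonts.googleapis.com'],
  fontSrc: ["'self'", 'fonts.gstatic.com'],
  objectSrc: ["'self'"],
  mediaSrc: ["'self'"],
  frameSrc: ["'self'"]
};

function serializeCsp(dirs) {
  return Object.entries(dirs)
    .map(([name, sources]) => {
      // camelCase -> kebab-case, e.g. connectSrc -> connect-src
      const kebab = name.replace(/[A-Z]/g, (c) => '-' + c.toLowerCase());
      return `${kebab} ${sources.join(' ')}`;
    })
    .join('; ');
}

console.log(serializeCsp(directives));
// The output now contains "connect-src 'self'", so Lighthouse's same-origin
// fetch of /robots.txt is no longer refused.
```

Keep in mind this widens the policy: any same-origin script on your pages may now issue fetch/XHR requests, which is usually acceptable but worth noting.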