
Proper .htaccess config for Next.js SSG

NextJS exports a static site with the following structure:


|-- index.html
|-- article.html
|-- tag.html
|-- article
|   |-- somearticle.html
|   \-- anotherarticle.html
\-- tag
    |-- tag1.html
    \-- tag2.html

I'm using an .htaccess file to hide the .html extensions:

RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html

Everything works flawlessly, EXCEPT:

  • If I follow a link to domain/article, it displays the article.html page and my address bar shows domain/article <--Good.
  • If I refresh, I get sent to domain/article/ (note the trailing slash), which lists the contents of the article directory <--Bad (same thing with tag)
  • Similarly, manually typing in domain/article takes me to domain/article/ instead of showing article.html without the .html extension.

So...

  • How do I fix this?
  • Is this an .htaccess issue?
  • A nextjs config issue?
  • (Wouldn't it be better for NextJS to create an article/index.html instead of a file in the root directory?)

exportTrailingSlash

I tried playing around with exportTrailingSlash which seems related, but this created other problems like always having a trailing slash at the end of all my links:

E.g. if I go to domain/article/somearticle and hit refresh, something (.htaccess?) adds a / to the end, giving me domain/article/somearticle/. Not horrible, just not very clean, and inconsistent...

Edit: Actually, it's a little more horrible, because on the Next.js links we sometimes get a trailing slash and sometimes we don't... It must be something about how I'm using <Link />, but I can't figure it out.

Regardless, NONE of the .htaccess rules I've tried remove the trailing slash consistently, every time...


More details:

In my Next.js app, I have the folder:

/articles/
   [slug].js
   index.js

In various pages, I use the Next.js Link component:

import Link from 'next/link';

<Link href="/articles" as="/articles">
            <a>Articles</a>
</Link>
Trees4theForest asked Jul 08 '20 17:07


2 Answers

If you request /article and /article exists as a physical directory, then Apache's mod_dir will (by default) append the trailing slash in order to "fix" the URL. This is achieved with a 301 permanent redirect, so it will be cached by the browser.

However, having a physical directory with the same basename as a file, combined with extensionless URLs, creates an ambiguity: is /article supposed to access the directory /article/ or the file /article.html? You don't seem to want to allow direct access to directories anyway, which resolves that ambiguity.

To prevent Apache's mod_dir from appending the trailing slash to directories, we need to disable DirectorySlash. For example:

DirectorySlash Off

But as mentioned, if you have previously visited /article then the redirect to /article/ will have been cached by the browser - so you'll need to clear the browser cache before this will be effective.
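To make the mod_dir behaviour concrete, here is a deliberately simplified JavaScript model (not how Apache is implemented). The `directories` set and the `modDirRedirect` function are hypothetical, standing in for the exported `article` and `tag` directories from the question.

```javascript
// Hypothetical snapshot of the physical directories under DOCUMENT_ROOT.
const directories = new Set(["/article", "/tag"]);

// Simplified model of mod_dir's DirectorySlash handling. Returns the URL that
// mod_dir would 301-redirect to, or null when no redirect is issued.
function modDirRedirect(urlPath, directorySlash) {
  // Only a request for a directory *without* the trailing slash is "fixed".
  const isBareDirectory = directories.has(urlPath);
  if (isBareDirectory && directorySlash) {
    return urlPath + "/"; // DirectorySlash On (the default)
  }
  return null; // DirectorySlash Off, already slashed, or not a directory
}

console.log(modDirRedirect("/article", true));  // /article/
console.log(modDirRedirect("/article", false)); // null
```

Because the real redirect is a 301, the browser remembers the first result, which is why the cache has to be cleared after switching `DirectorySlash Off`.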

Since you are removing the file extension, you also need to ensure that MultiViews is disabled; otherwise mod_negotiation will issue an internal subrequest for the underlying file and potentially conflict with mod_rewrite. MultiViews is disabled by default, although some shared hosts do enable it for some reason. From the output you are getting, it doesn't look like MultiViews is enabled, but better to be sure...

# Ensure that MultiViews is disabled
Options -MultiViews

However, if you need to be able to access the directory itself, then you will need to manually append the trailing slash with an internal rewrite, although this does not seem to be a requirement here. You should, however, ensure that directory listings are disabled:

# Disable directory listings
Options -Indexes

Any request for a directory that does not ultimately map to a file (see below) and does not contain a DirectoryIndex document will then return a 403 Forbidden response instead of a directory listing.

Note that the only difference that could occur between following a link to domain/article, refreshing the page and manually typing domain/article is caching, either by the browser or by any intermediary proxy caches. (Unless you have JavaScript that intercepts the click event on the anchor, which is what the Next.js <Link> component does: it navigates client-side, so following the link never requests /article from the server.)

You do still need to rewrite requests from /foo to /foo.html OR from /foo to /foo/index.html (see below), depending on how you have configured your site, although it would be preferable to choose one or the other rather than both (as you seem to imply could be the case).

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html

It is unclear how this is seemingly "working" for you currently - unless you are seeing a cached response? When you request /article, the first condition fails because this exists as a physical directory and the rule is not processed. Even with MultiViews enabled, mod_dir will take priority and append the trailing slash.
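The following simplified JavaScript model of the question's rule shows why `/article` is never rewritten. It assumes the exported structure from the question; `existingDirs`, `existingFiles` and `askerRuleTarget` are hypothetical names, and `REQUEST_FILENAME` is approximated by the requested URL-path, which only holds when every parent directory of the request exists.

```javascript
// Hypothetical snapshot of the exported site (paths relative to DOCUMENT_ROOT).
const existingDirs = new Set(["/article", "/tag"]);
const existingFiles = new Set([
  "/index.html", "/article.html", "/tag.html",
  "/article/somearticle.html", "/tag/tag1.html",
]);

// Models: RewriteCond %{REQUEST_FILENAME} !-d
//         RewriteCond %{REQUEST_FILENAME}\.html -f
//         RewriteRule ^(.*)$ $1.html
// Returns the rewritten path, or null when the rule is skipped.
function askerRuleTarget(urlPath) {
  if (existingDirs.has(urlPath)) return null;             // first condition fails
  if (!existingFiles.has(urlPath + ".html")) return null; // second condition fails
  return urlPath + ".html";                               // rewrite applies
}

console.log(askerRuleTarget("/article"));             // null
console.log(askerRuleTarget("/article/somearticle")); // /article/somearticle.html
```

Since the rule is skipped for `/article`, nothing stops mod_dir from appending the trailing slash, and with directory listings enabled you then see the contents of the directory.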

The second condition that checks the existence of the .html file isn't necessarily checking the same file that is being rewritten to. eg. If you request /foo/bar, where /foo.html exists, but there is no physical directory /foo then the RewriteCond directive checks for the existence of /foo.html - which is successful, but the request is internally rewritten to /foo/bar.html (from the captured RewriteRule pattern) - this results in an internal rewrite loop and a 500 error response being returned to the client. See my answer to the following ServerFault question that goes into more detail behind what is actually happening here.
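A small JavaScript model can make this mismatch visible. It assumes a hypothetical site where only `/foo.html` exists, and takes `REQUEST_FILENAME` as an input (`/foo` for a request to `/foo/bar`, as explained above) rather than modelling Apache's path resolution. `originalRule`, `fixedRule` and `runOriginal` are hypothetical names.

```javascript
// Hypothetical filesystem: /foo.html exists, but there is no /foo directory,
// so the "!-d" condition always passes and is omitted from the model.
const files = new Set(["/foo.html"]);

// The question's rule: the condition tests REQUEST_FILENAME + ".html",
// but the substitution appends ".html" to the whole captured URL-path.
function originalRule(urlPath, requestFilename) {
  return files.has(requestFilename + ".html") ? urlPath + ".html" : null;
}

// The corrected approach: the condition tests exactly the file it rewrites to.
function fixedRule(urlPath) {
  const target = urlPath + ".html";
  return files.has(target) ? target : null;
}

// In .htaccess, a rewrite triggers another pass through the rules. Keep
// applying the original rule until it stops matching or a round limit is hit
// (Apache's limit is set by LimitInternalRecursion), which ends in a 500.
function runOriginal(urlPath, requestFilename, maxRounds = 10) {
  let current = urlPath;
  for (let round = 0; round < maxRounds; round++) {
    const next = originalRule(current, requestFilename);
    if (next === null) return current;
    current = next;
  }
  return "500 (internal rewrite loop)";
}

// REQUEST_FILENAME stays /foo on every pass, so the condition keeps passing.
console.log(runOriginal("/foo/bar", "/foo")); // 500 (internal rewrite loop)
console.log(fixedRule("/foo/bar"));           // null
```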

We can also make a further optimisation if we assume that any URL that contains what looks like a file extension (eg. your static resources .css, .js and image files) should be ignored, otherwise we are performing filesystem checks on every request, which is relatively expensive.
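The "looks like a file extension" test used in the rules that follow is the regex `\.\w{2,4}$`. JavaScript's regex syntax behaves the same way for this pattern, so a quick check shows which URLs would skip the filesystem lookups. The sample paths are hypothetical.

```javascript
// A dot followed by 2-4 "word" characters ([A-Za-z0-9_]) at the end of the URL.
const FILE_LIKE = /\.\w{2,4}$/;

const samples = [
  "/article",
  "/article/somearticle",
  "/_next/static/chunks/main.js",
  "/images/logo.png",
  "/fonts/inter.woff2",
];

for (const path of samples) {
  // Matching URLs are skipped by the negated (!) RewriteRule pattern.
  console.log(path, FILE_LIKE.test(path) ? "skipped" : "checked against the filesystem");
}
```

Note that `.woff2` (five characters) is not caught, so such requests still incur the filesystem check. They are not rewritten, though, because no matching `.html` file exists; you could widen the quantifier if that matters for your site.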

So, in order to map (internally rewrite) requests of the form /article to /article.html and /article/somearticle to /article/somearticle.html you would need to modify the above rule to read something like:

# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]

There is no need to backslash escape a literal dot in the RewriteCond TestString - the dot carries no special meaning here; it's not a regex.
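A simplified JavaScript model of this rule follows; `files` is a hypothetical set of exported files (paths relative to DOCUMENT_ROOT) and `htmlRuleTarget` a hypothetical name. Unlike the question's rule, the file that is checked and the file that is served are always the same.

```javascript
// Hypothetical exported files, relative to DOCUMENT_ROOT.
const files = new Set(["/index.html", "/article.html", "/article/somearticle.html"]);

// Models:
//   RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
//   RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
// Returns the rewritten path, or null when the rule does not apply.
function htmlRuleTarget(requestUri) {
  if (/\.\w{2,4}$/.test(requestUri)) return null; // negated pattern: skip file-like URLs
  const target = requestUri + ".html";
  return files.has(target) ? target : null;       // check exactly what will be served
}

console.log(htmlRuleTarget("/article"));      // /article.html
console.log(htmlRuleTarget("/article.html")); // null
```

The second call shows why this does not loop: after the internal rewrite the rules are processed again for `/article.html`, but that URL now looks like a file, so the rule no longer matches.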

Then, to handle requests of the form /foo that should map to /foo/index.html you can do something like the following:

# Rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/index.html [L]

Ordinarily, you would allow mod_dir to serve the DirectoryIndex (eg. index.html), but having omitted the trailing slash from the directory, this can be problematic.

Summary

Bringing the above points together, we have:

# Disable directory indexes and MultiViews
Options -Indexes -MultiViews

# Prevent mod_dir appending a slash to directory requests
DirectorySlash Off

RewriteEngine On

# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]

# Otherwise, rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/index.html [L]
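To see how the two rules interact, here is a simplified JavaScript model of the summary ruleset. `files`, the `/docs` and `/both` paths, and `summaryTarget` are hypothetical. `/both` shows that when both `/foo.html` and `/foo/index.html` exist, the `.html` rule wins because it comes first and stops processing with `[L]`.

```javascript
// Hypothetical files under DOCUMENT_ROOT.
const files = new Set([
  "/article.html",
  "/article/somearticle.html",
  "/docs/index.html",
  "/both.html",
  "/both/index.html",
]);

const FILE_LIKE = /\.\w{2,4}$/;

// Applies the two rules in order; the first whose condition passes wins.
// Returns the path that is finally served (unchanged if no rule applies).
function summaryTarget(requestUri) {
  if (FILE_LIKE.test(requestUri)) return requestUri; // neither rule matches
  for (const suffix of [".html", "/index.html"]) {
    if (files.has(requestUri + suffix)) return requestUri + suffix;
  }
  return requestUri;
}

for (const uri of ["/article", "/article/somearticle", "/docs", "/both", "/missing"]) {
  console.log(uri, "->", summaryTarget(uri));
}
```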

This could be further optimised, depending on your site structure and whether you are adding any more directives to the .htaccess file. For example:

  1. You could check for file extensions on the requested URL at the top of the file to prevent any further processing. The RewriteRule regex on each subsequent rule could then be "simplified".
  2. Requests that include a trailing slash could be blocked or redirected (to remove the trailing slash).
  3. If the request is for a .html file then redirect to the extensionless URL. This is made slightly more complicated if you are dealing with both /foo.html and /foo/index.html. But this is only really necessary if you are changing an existing URL structure.

For example, implementing #1 and #2 above would enable the directives to be written like so:

# Disable directory indexes and MultiViews
Options -Indexes -MultiViews

# Prevent mod_dir appending a slash to directory requests
DirectorySlash Off

RewriteEngine On

# Prevent any further processing if the URL already ends with a file extension
RewriteRule \.\w{2,4}$ - [L]

# Redirect any requests to remove a trailing slash
RewriteRule (.*)/$ /$1 [R=301,L]

# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.*) $1.html [L]

# Otherwise, rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}/$1/index.html -f
RewriteRule (.*) $1/index.html [L]

Always test with a 302 (temporary) redirect before changing to a 301 (permanent) redirect in order to avoid caching issues.
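The optimised ruleset can be modelled the same way. In .htaccess the RewriteRule pattern is matched against the URL-path without its leading slash, so `path` below is relative (e.g. `article/`). This sketch assumes the intended `{2,4}` quantifier in the first rule; `files` and `optimisedRules` are hypothetical.

```javascript
// Hypothetical files under DOCUMENT_ROOT.
const files = new Set(["/article.html", "/docs/index.html"]);

// Applies the optimised rules in order to a relative URL-path.
function optimisedRules(path) {
  // 1. URL already ends with a file extension: stop processing.
  if (/\.\w{2,4}$/.test(path)) return { action: "stop", path };

  // 2. Trailing slash: external redirect (R=301 above) to the URL without it.
  const slashed = path.match(/^(.*)\/$/);
  if (slashed) return { action: "redirect", location: "/" + slashed[1] };

  // 3 and 4. Rewrite to foo.html, otherwise to foo/index.html, if it exists.
  for (const suffix of [".html", "/index.html"]) {
    if (files.has("/" + path + suffix)) return { action: "rewrite", path: path + suffix };
  }
  return { action: "none", path };
}

console.log(optimisedRules("article/")); // { action: 'redirect', location: '/article' }
console.log(optimisedRules("article"));  // { action: 'rewrite', path: 'article.html' }
```

After the redirect, the browser requests `/article`, which then falls through to the `.html` rewrite.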

MrWhite answered Oct 12 '22 15:10


  • (Wouldn't it be better for NextJS to create an article/index.html instead of a file in the root directory?)

Yes! And Next can do that for you:

It is possible to configure Next.js to export pages as index.html files and require trailing slashes, /about becomes /about/index.html and is routable via /about/. This was the default behavior prior to Next.js 9.

To switch back and add a trailing slash, open next.config.js and enable the exportTrailingSlash config:

module.exports = {
  // In Next.js 9.5 and later this option is named trailingSlash
  exportTrailingSlash: true,
};
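The quoted behaviour can be summarised with a small JavaScript model of where each route ends up in the export. `exportedFile` is a hypothetical helper for illustration, not part of Next.js.

```javascript
// Hypothetical, simplified model of where a static export writes a page,
// based on the behaviour quoted above.
function exportedFile(route, trailingSlash) {
  const name = route.replace(/^\/+/, ""); // "/about" -> "about"
  if (name === "") return "index.html";   // the home page
  return trailingSlash ? `${name}/index.html` : `${name}.html`;
}

console.log(exportedFile("/article", false)); // article.html
console.log(exportedFile("/article", true));  // article/index.html
```

With the option enabled, `/article` and `/article/somearticle` each become a directory containing `index.html`, so there is no longer a file and a directory sharing the same basename. Apache's default `DirectorySlash` redirect then sends `/article` to `/article/`, where `index.html` is served (assuming the default `DirectoryIndex index.html`), which is the trailing-slash URL this mode is designed for.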

jdaz answered Oct 12 '22 17:10