I am currently writing a web application using AngularJS, but I think this question applies to any client-side JavaScript framework that does routing on the client side (as Angular does).
In a single-page app, what is the right way to deal with wrong URLs?
Looking at a few major sites, I see that Gmail will redirect to the inbox if you type any random URL below https://mail.google.com/mail/. This happens server-side (with an HTTP 3xx redirect) or client-side, depending on whether the wrong path is before or after the # character. On the other hand, Twitter shows a real HTTP 404 for any invalid URL. A third option would be to show a "soft" 404, a purely client-side error page.
These solutions seem appropriate for different situations. Twitter wants the links to Twitter users and tweets to be real links, so people can share them, post them in news articles, etc., so it is important that invalid links be recognized as such (if I have a broken link to a tweet on my website, a simple crawl will tell me that). In Gmail, on the other hand, you are not expected to share links into your inbox, and I'm not even sure if the links are really permanent/persistent: it seems the URL updating mostly serves the purpose of browser history navigation within the single-page app. The third approach of giving soft errors might be appropriate for situations similar to Gmail, but where there is no reasonable "default" page.
After this long introduction, here are some specific questions:
404 error codes are generated when a user attempts to access a webpage that does not exist, has been moved, or has a dead or broken link. The 404 error code is one of the most frequent errors a web user encounters. Servers are required to respond to client requests, such as when a user attempts to visit a webpage.
If you care about SEO, one of the ways that angular.io was able to solve this problem (at least with Google anyway) is by using a noindex meta tag "to indicate soft-404 status which will prevent crawlers from crawling the content of the page". Apparently it can be added to the document via JavaScript.
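A minimal sketch of adding that tag from JavaScript, assuming the router calls it when no route matches. The function name markSoft404 is made up for illustration; in a browser you would pass the global document.

```javascript
// Sketch: flag the current document as a soft 404 for crawlers.
// `doc` is any DOM Document (pass `document` in a browser).
function markSoft404(doc) {
  // Reuse an existing robots meta tag if the page already has one.
  let meta = doc.querySelector('meta[name="robots"]');
  if (!meta) {
    meta = doc.createElement("meta");
    meta.setAttribute("name", "robots");
    doc.head.appendChild(meta);
  }
  meta.setAttribute("content", "noindex");
  return meta;
}
```

This only affects crawlers that execute JavaScript; the HTTP status of the page is still 200.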
Alternatively, using JavaScript, you can redirect to a page that will respond with an actual HTTP 404 status code. Google understands JavaScript redirects just fine. Your original /does-not-exist page, when redirected to /404-error?from=does-not-exist, will be associated with the 404 status code returned by the server. The URL structure does not matter; only the status code and the redirect are important here.
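As a rough sketch of that redirect, assuming the server really answers /404-error with a 404 status (both function names are hypothetical):

```javascript
// Hypothetical helper: build the URL of a server route that responds with HTTP 404.
function notFoundRedirectUrl(path) {
  const from = path.replace(/^\/+/, ""); // "/does-not-exist" -> "does-not-exist"
  return "/404-error?from=" + encodeURIComponent(from);
}

// Call from the router's "no route matched" handler; `win` is the browser `window`.
function redirectToNotFound(win) {
  // replace() keeps the invalid URL out of the back-button history.
  win.location.replace(notFoundRedirectUrl(win.location.pathname));
}
```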
Your other options are SSR (Nuxt.js, Next.js, Angular Universal, etc.) or pre-rendering (prerender.io, Puppeteer, etc.), which Google calls dynamic rendering: you respond to search bot requests with a pre-rendered version while human users get your normal client-side rendered app.
tl;dr: Drop hashbang support and opt for PJAX-like behavior if you care about SEO.
Are you making an app or a website? If it is a website, you need to return a 404 so that you don't confuse Google. It needs to be a real 404, not just a "page not found" message (i.e. a 200 with the message "page not found" is very bad). Also, which browsers do you care to support?
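To illustrate the difference on the server side, here is a small sketch of a Node request handler that sends a real 404 status for unknown paths. KNOWN_PATHS and handleRequest are made-up names, and a real app would derive the list from its route table.

```javascript
// Hypothetical list of paths the application actually serves.
const KNOWN_PATHS = new Set(["/", "/about"]);

// Request handler compatible with Node's http.createServer(handler).
function handleRequest(req, res) {
  const path = new URL(req.url, "http://localhost").pathname;
  if (KNOWN_PATHS.has(path)) {
    res.writeHead(200, { "Content-Type": "text/html" });
    res.end("<!doctype html><title>App</title>");
  } else {
    // A real 404 status, not a 200 carrying a "page not found" message.
    res.writeHead(404, { "Content-Type": "text/html" });
    res.end("<!doctype html><title>Page not found</title>");
  }
}
```

It could be wired up with require("http").createServer(handleRequest).listen(8080).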
My opinion is that the whole hashbang server-side rendering approach should be avoided (i.e. the nasty Google SEO #! hack). Either use real pushState, or, for browsers that don't support pushState, re-render the whole page when the URL changes (a real URL change, not a hash change).
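That choice could look roughly like this. It is a sketch, not a full router: navigate and the render callback are hypothetical names, and win stands for the browser window.

```javascript
// `win` is the browser `window`; `render` is a hypothetical function
// that draws the view for a given URL on the client.
function navigate(win, url, render) {
  const canPush = Boolean(win.history && typeof win.history.pushState === "function");
  if (canPush) {
    win.history.pushState({}, "", url); // real URL change, no hash involved
    render(url);
    return "pushstate";
  }
  // No pushState support: do a normal page load and let the server render.
  win.location.assign(url);
  return "reload";
}
```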
Now the reason this matters is that a #! should never return a 404, because it doesn't make sense and it's impossible to mimic server-side: the server never gets what's after the #! without running JavaScript.
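A quick way to see this is to look at which part of a URL ends up in the HTTP request. The sketch below uses the standard URL class; example.com and the function name requestTarget are placeholders.

```javascript
// Returns the part of a URL that the browser sends in the HTTP request line.
function requestTarget(href) {
  const url = new URL(href);
  // url.hash (the "#!..." part) stays in the browser and is not sent.
  return url.pathname + url.search;
}

// Both a valid and an invalid hashbang route produce the same request,
// so the server has nothing to base a 404 decision on.
console.log(requestTarget("https://example.com/app#!/inbox"));
console.log(requestTarget("https://example.com/app#!/does-not-exist"));
```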
Thus, if you really care about SEO, I would do something like PJAX: use only true pushState for routing and otherwise fall back to old web 1.0 behavior. Consequently, the links I recommend you share, the ones that can truly be a 404, should not have #! (a traditional # is fine so long as the contents of the page don't change drastically).
Finally, the 404 is mostly not the problem; 30X redirect responses are. The browser follows redirects automatically, so your JavaScript AJAX calls will never see a 30X (they get the response of the redirect target instead, i.e. a 200). To handle 30X responses you will have to send a header back for every request that indicates the final URL (i.e. what you were redirected to), so that you don't mess up the pushState history.
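On the client, reading such a header might look like the sketch below. The header name X-Final-URL is an assumption (any name your server sets will do), and urlForHistory is a made-up helper.

```javascript
// Assumed custom header that the server sets on every response.
const FINAL_URL_HEADER = "X-Final-URL";

// `response` is a fetch-style Response (anything with headers.get works).
// Returns the URL that should be recorded in the pushState history.
function urlForHistory(requestedUrl, response) {
  const finalUrl = response.headers.get(FINAL_URL_HEADER);
  return finalUrl || requestedUrl; // fall back to the URL we asked for
}
```

After an AJAX navigation you would then call history.pushState({}, "", urlForHistory(url, response)). With fetch, response.redirected and response.url expose similar information without a custom header.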
Of course, if you need to support hashbang like Twitter used to (and they are the ones that even killed hashbang), you can leverage Google Sitemaps and rel=nofollow to try to mitigate bad SEO.