I'm generating an entire site from a single index.html plus JS scripts.
The JS builds the HTML content from JSON data returned by the server-side API. This works great client-side and makes page loads and interaction very fast, but there is a snag: when a crawler comes to index a page, it sees a blank page.
The obvious solution is to provide an XML sitemap that points to static versions of all the pages. The problem is how to generate a static version of each page when the pages, along with all the logic and templates, exist only client-side.
This is not a new issue. I'm sure anyone generating pages dynamically client-side has hit it and solved it, but I thought I'd ask the dev community before diving in and trying to solve it myself.
Tech has moved on significantly. I would encourage anyone looking to create SSR (server-side rendered) and client-side web apps in one isomorphic code base to take a look at the excellent Next.js.
Next.js wraps React with a server-side routing and rendering system built on Node.js, defines a standard interface for getting page data on both server and client, and comes with some out-of-the-box features that make it one of the best choices (IMHO) for both SSR and CSR web applications.
Oh... and they have a great tutorial too!
I've managed to generate static pages from the client-side output by using PhantomJS and capturing the HTML output after the page and all JS have finished loading and executing. This method is slower than I would like and unlikely to scale well, but it's the only option that I can think of so far.
The site already receives over 10,000 page views a day from over 8,000 unique visitors, so pages are updated regularly as new comments and posts are created. Each change is added to a queue, which is processed on a separate server that generates the static pages with Phantom.
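As a rough illustration of the change queue described above, here is a minimal sketch that deduplicates pending pages, so a page that picks up several comments before the worker reaches it is only snapshotted once. The class name `RegenerationQueue`, its methods, and the example paths are all hypothetical; the real queue lives on a separate server and is not shown in this post.

```javascript
// Hypothetical sketch: a de-duplicating queue of pages that need a fresh static snapshot.
class RegenerationQueue {
  constructor() {
    // A Set keeps insertion order and ignores duplicate paths.
    this.pending = new Set();
  }

  // Call whenever a post or comment changes the page at `path`.
  markDirty(path) {
    this.pending.add(path);
  }

  // Remove and return up to `limit` paths for the snapshot worker to process.
  takeBatch(limit = Infinity) {
    const batch = [];
    for (const path of this.pending) {
      if (batch.length >= limit) break;
      this.pending.delete(path);
      batch.push(path);
    }
    return batch;
  }
}

// Example: three changes, but only two distinct pages to regenerate.
const queue = new RegenerationQueue();
queue.markDirty('/posts/1');
queue.markDirty('/posts/2');
queue.markDirty('/posts/1');
const batch = queue.takeBatch(); // ['/posts/1', '/posts/2']
```

The worker would then run its snapshot step (PhantomJS in my case) once per path in the batch. Taking paths out of the set before processing means a change that arrives mid-snapshot simply marks the page dirty again for the next batch.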
The only other way I can think of doing this is to create a Node.js process that uses the same jsRender library and builds HTML output from the template files and the same JSON data. This would be time-consuming to set up and would not generate exactly the same output that the dynamic site creates. Google may frown on me serving it static pages that don't really represent the dynamic version that "normal" visitors see.
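To make the Node.js idea concrete, here is a minimal sketch of rendering from the same data on the server. It does not use jsRender: `renderPost` is a hypothetical stand-in for a template, written as a plain function from JSON data to an HTML string, and the `post` shape (`title`, `comments`) is assumed for illustration. The point is only that a template with no DOM dependency could be loaded by both the browser and a Node process, which would keep the two outputs from drifting apart.

```javascript
// Escape text before placing it in HTML so user content cannot inject markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical stand-in for a jsRender template: pure data in, HTML string out.
function renderPost(post) {
  const comments = post.comments
    .map((comment) => `<li>${escapeHtml(comment)}</li>`)
    .join('');
  return `<article><h1>${escapeHtml(post.title)}</h1><ul>${comments}</ul></article>`;
}

// Server-side only: wrap the shared fragment in a full document for crawlers.
function renderPage(post) {
  return '<!DOCTYPE html><html><head>' +
    `<title>${escapeHtml(post.title)}</title>` +
    `</head><body>${renderPost(post)}</body></html>`;
}

// Example with made-up data standing in for an API response.
const html = renderPage({ title: 'Hello', comments: ['First!', 'Nice post'] });
```

In the browser, the same `renderPost` would be called with the JSON from the API and its result inserted into the page, so the static and dynamic versions come from one code path.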
This seems like an unsolvable issue. Either I generate the pages entirely server-side, or crawlers cannot index the pages. :(