Gatsby, together with gatsby-plugin-advanced-sitemap, makes generating a robust sitemap.xml file a breeze.
The plugin supports excluding pages either by slug or by regular expression. However, in my case I have two collections of user-generated content pages, and collection B has no slug pattern that would make its pages identifiable with a regular expression. How can I exclude collection B from indexing?
Note that pages you leave out of (or exclude from) your sitemap.xml can still be indexed if your robots.txt allows crawlers to reach them and they don't carry a noindex robots meta tag, or if links pointing to them from other pages are followed rather than marked nofollow.
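If the goal is to keep collection B out of the index itself (not just out of the sitemap), a noindex robots meta tag on the page is the more direct signal. A minimal sketch, assuming the pages are rendered by a Gatsby template component and that react-helmet is installed (the component name and file path are illustrative):

// src/templates/ugc-page.js -- illustrative path
import React from "react"
import { Helmet } from "react-helmet"

const UgcPage = ({ pageContext }) => (
  <>
    {/* Ask crawlers not to index this page or follow its links */}
    <Helmet>
      <meta name="robots" content="noindex, nofollow" />
    </Helmet>
    <main>{/* render the user-generated content from pageContext here */}</main>
  </>
)

export default UgcPage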
Regarding your question, the plugin lets you add an exclude array of pages that won't be added to your sitemap.xml even though they are retrieved by your query. So, in your gatsby-config.js:
{
  resolve: `gatsby-plugin-advanced-sitemap`,
  options: {
    query: `{}`, // your query
    mapping: {}, // add if needed
    exclude: [
      `/dev-404-page`,
      `/404`,
      `/404.html`,
      `/offline-plugin-app-shell-fallback`,
      `/terms-and-conditions`,
      `/terms-of-use`,
      `/cookie-policy`,
      `/privacy-policy`,
      /(\/)?hash-\S*/,
    ],
    createLinkInHead: true,
    addUncaughtPages: true,
    additionalSitemaps: [], // add if needed
  },
},
Ideally, your excluded pages would follow a slug pattern so they could be matched with a regular expression and excluded automatically when generated. If they don't, as the code above shows, you can add them manually. In this case, neither /terms-of-use, /cookie-policy, nor /privacy-policy will be added to your sitemap.xml.
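Since the collection B slugs don't follow a pattern, one option is to build the exclude array programmatically in gatsby-config.js instead of typing each slug by hand. A minimal sketch, assuming you keep the known collection B slugs in a local JSON file (the excluded-slugs.json file name and the example paths are illustrative):

// gatsby-config.js
// excluded-slugs.json is assumed to hold an array like ["/ugc/some-page", "/ugc/another-page"]
const collectionBSlugs = require("./excluded-slugs.json")

module.exports = {
  plugins: [
    {
      resolve: `gatsby-plugin-advanced-sitemap`,
      options: {
        query: `{}`, // your query
        exclude: [
          `/dev-404-page`,
          `/404`,
          `/404.html`,
          /(\/)?hash-\S*/, // pages whose slugs follow a pattern, matched by regex
          ...collectionBSlugs, // pages without a pattern, listed explicitly
        ],
      },
    },
  ],
}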