 

GitHub Pages and Jekyll content duplication and SEO issues

I'm looking at using GitHub Pages to host my blog and Jekyll to present it.

Presumably, whatever I commit will appear at <yourname>.github.io via Jekyll and, in rawer form, at https://github.com/<yourname>/<yourname>.github.io. See this page showing links to live sites and to the source repos used to construct them.

Advice on SEO suggests that duplicating content within and across domains is bad practice. See this Google support page on duplication and this Moz page on issues with duplication, both of which also offer possible solutions.

My question is two-fold:

  • Is content duplication actually a problem for GitHub Pages in practice?
  • If so, how does one apply solutions like canonical linking or noindex to the GitHub repo so that search engines know that your Jekyll site is the canonical one?

Update:

Might be worth noting that I uploaded a "hello world" index file to my GitHub Pages repo and then checked the HTML source of that file's page on GitHub. GitHub's page source already contains a canonical link:

<link rel="canonical" href="https://github.com/guypursey/guypursey.github.io/blob/master/index.html" data-pjax-transient>

I assume it's this that would need changing for each file to point to the Jekyll version of the site, but I can't see a setting in GitHub to handle it.
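That canonical tag is generated by GitHub for its own file-viewer pages and isn't something you can override. What you can control is the canonical tag on your Jekyll site itself. As a sketch (assuming `url` is set in `_config.yml`, and that your layout has an `_includes/head.html` or equivalent), you could add:

```html
<!-- Sketch: in your Jekyll layout's <head>, e.g. _includes/head.html.
     Assumes `url: https://yourname.github.io` is set in _config.yml. -->
<link rel="canonical" href="{{ page.url | absolute_url }}">
```

This marks each rendered Jekyll page as the preferred version of itself; it doesn't change what GitHub emits for the raw repository view.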

asked Jan 24 '16 by guypursey

People also ask

Does duplicate content hurt SEO?

Officially, Google does not impose a penalty for duplicate content. However, it does filter identical content, which has the same impact as a penalty: a loss of rankings for your web pages.

Is GitHub Pages SEO friendly?

No. From an SEO standpoint, GitHub Pages is not special compared to any other web host.

How do I fix duplicate content in SEO?

In many cases, the best way to fix duplicate content is to implement 301 redirects from the non-preferred versions of URLs to the preferred versions. When the URLs need to remain accessible to visitors, you can't use redirects, but you can use either a canonical URL or a robots noindex directive.
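The two non-redirect options above are both plain HTML tags placed in the `<head>` of the non-preferred page. A sketch, using the hypothetical domain example.com:

```html
<!-- On a duplicate page that must stay reachable: point search engines
     at the preferred URL (hypothetical example.com address). -->
<link rel="canonical" href="https://example.com/preferred-page/">

<!-- Or, if the page should be kept out of search results entirely: -->
<meta name="robots" content="noindex">
```

Use one or the other on a given page: canonical consolidates ranking signals onto the preferred URL, while noindex simply drops the page from the index.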


1 Answer

Duplicate content is unavoidable when using GitHub Pages for user and organization sites, because the repository is public.

In general this shouldn't be a problem. See a previous answer.

You do have a couple of options:

  • Make the repository private. Google and other search engines obviously can't access a private repository, though this requires a paid plan.
  • Switch to a project page. This serves the site from a gh-pages branch instead of the master branch. GitHub's robots.txt only allows search engines to crawl the master branch and disallows other branches, so keeping the site in a gh-pages branch prevents Google from seeing the repository copy.
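The second option boils down to a couple of git commands. A minimal sketch (run inside a throwaway demo repository so it's self-contained; in your real repo you'd skip the setup lines and push at the end):

```shell
set -e
tmp=$(mktemp -d)                      # demo repo so the sketch is self-contained
cd "$tmp"
git init -q
git -c user.email=you@example.com -c user.name=you \
    commit -q --allow-empty -m "init"

# Create a gh-pages branch with no shared history and commit the site there.
git checkout -q --orphan gh-pages
git -c user.email=you@example.com -c user.name=you \
    commit -q --allow-empty -m "Publish site on gh-pages"
git branch --show-current             # prints: gh-pages

# In a real repo you would now add your site files and run:
# git push origin gh-pages
```

Note this answer predates GitHub's later option to choose the publishing branch in the repository settings, so check current behavior before relying on the robots.txt detail.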
answered Sep 21 '22 by Jonathon