
Stopping index of Github pages


I have a github page from my repository username.github.io

However I do not want Google to crawl my website and absolutely do not want it to show up on search results.

Will just using a robots.txt on GitHub Pages work? I know there are tutorials on stopping a GitHub repository from being indexed, but what about the actual GitHub Pages site?

asked Sep 25 '15 by user2961712


People also ask

How do I restrict access to GitHub Pages?

With access control for GitHub Pages, you can restrict access to your project site by publishing the site privately. A privately published site can only be accessed by people with read access to the repository the site is published from.

Where do I put robots txt in GitHub?

However, if you're using a custom domain with your GitHub Pages site, you can place a robots.txt file at the root of your repo and it will work as expected.

Is GitHub Pages good for SEO?

If your blog or product landing page uses GitHub Pages, it can now be optimized for SEO. By adding a simple {% seo %} tag to your site, GitHub will automatically add SEO metadata to each page, including the page title, description, canonical URL, next/previous URLs, and post metadata.
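
For illustration, a minimal sketch of how this usually looks, assuming a Jekyll-based GitHub Pages site with the jekyll-seo-tag plugin enabled in _config.yml (the layout file name is just an example):

<!-- _layouts/default.html (hypothetical layout file) -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  {% seo %} <!-- jekyll-seo-tag expands this into title, description, canonical URL, etc. -->
</head>
<body>
  {{ content }}
</body>
</html>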


2 Answers

I don't know if it is still relevant, but Google says you can stop spiders with a meta tag:

<meta name="robots" content="noindex"> 

I'm not sure, however, whether that works for all spiders or only for Google.
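
As a minimal sketch of where that tag goes (the page content here is hypothetical), it belongs in the <head> of every page you want kept out of the index:

<!-- any page you do not want indexed, e.g. index.html -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>My page</title>
  <!-- ask crawlers not to index this page; add "nofollow" as well if links should not be followed -->
  <meta name="robots" content="noindex">
</head>
<body>
  <p>This page should stay out of search results.</p>
</body>
</html>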

answered Oct 03 '22 by Gumbo


Short answer:

You can use a robots.txt to stop indexing of your user's GitHub Pages by adding it to your User Page. This robots.txt will be the active robots.txt for all your project pages, since project pages are reachable as subdirectories (username.github.io/project) of your subdomain (username.github.io).


Longer answer:

You get your own subdomain for GitHub Pages (username.github.io). According to this question on MOZ and Google's reference, each subdomain has/needs its own robots.txt.

This means that the valid/active robots.txt for the project projectname by user username lives at username.github.io/robots.txt. You can put a robots.txt file there by creating a GitHub Pages page for your user.

This is done by creating a new project/repository named username.github.io where username is your username. You can now create a robots.txt file in the master branch of this project/repository and it should be visible at username.github.io/robots.txt. More information about project, user and organization pages can be found here.
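
As a rough sketch of the resulting layout (file names other than robots.txt are just examples), the user repository ends up looking like this, with robots.txt in the root so it is served at the root of the subdomain:

username.github.io/          repository named after your username
├── index.html               your User Page content
└── robots.txt               served at https://username.github.io/robots.txt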

I have tested this with Google, confirming ownership of myusername.github.io by placing an HTML file in my project/repository https://github.com/myusername/myusername.github.io/tree/master, creating a robots.txt file there, and then verifying that my robots.txt works using Google's Search Console webmaster tools (googlebot-fetch). Google does indeed list it as blocked, and the Google Search Console webmaster tools (robots-testing-tool) confirm it.

To block robots for one project's GitHub Page:

User-agent: *
Disallow: /projectname/

To block robots for all GitHub Pages for your user (User Page and all Project Pages):

User-agent: *
Disallow: /

Other options

  • Look into the HTML meta tag
  • Look into custom domain (redirects) for GitHub Pages
answered Oct 03 '22 by olavimmanuel