A Simple Tweak to Solve: How to Fix "Indexed, though blocked by robots.txt"



What is robots.txt?

The robots exclusion protocol or simply robots.txt is a standard used by websites to communicate with web crawlers and other web robots. 

Robots are often used by search engines to categorize websites. When a site owner wishes to give instructions to web robots, they place a text file called robots.txt in the root of the web site. This text file contains the instructions in a specific format. 

The standard specifies how to inform the web robot about which areas of the website should not be processed or scanned.

Robots that choose to follow the instructions try to fetch this file and read the instructions before fetching any other file from the website.

A robots.txt file on a website works as a request that the named robots ignore the specified files or directories when crawling the site. If this file doesn't exist, web robots assume that the website owner does not wish to place any limits on crawling the site.
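To see how a well-behaved robot reads these instructions, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the bot name "ExampleBot" and the example.com URLs are made up for illustration. Real crawlers may resolve conflicting rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/private/notes.html", "/2020/03/post.html"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

An empty robots.txt string is treated as "no limits", which matches the behaviour described above for a site that has no robots.txt file.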

Before we start, you need to know about the custom robots header tags.


In Blogger, you are going to deal with the following custom robots header tags.

1. all – If you set this tag, crawlers are not bound by any constraints. They can freely crawl, index and expose your content.

2. noindex – Not every blog is meant for the public. Even if you don't share the URL of your personal blog with anybody, people may still reach it from search results. In such a scenario, you can use the noindex tag, as it prevents search engines from indexing the pages.

3. nofollow – Nofollow and "dofollow" are about the links on your pages. Following links is the default behaviour, which is what people usually mean by "dofollow". That means search engines can follow your links to the pages you linked to. If you don't want search bots to follow your links, add the nofollow tag.



4. none – none combines the features of both noindex and nofollow tags. The crawlers will neither index your pages nor skim through the links.

5. noarchive – You might have noticed a "Cached" label next to many results on search engine results pages (SERPs). It shows that Google has stored a copy of your page on its servers to display in case your site goes down. The noarchive tag turns off this cached version in search results.

6. nosnippet – The text snippets in search results help people find out what's on the webpage. If you want to keep the content exclusive, you can turn this header tag on so that no snippet is shown.

7. noodp – The Open Directory Project, or DMOZ, was a human-edited directory of websites. Google sometimes used the information from there. This tag turns that off. DMOZ closed in 2017, so the tag has little practical effect today.

8. notranslate – Do you want to disable translation on your site? Then use notranslate for exactly that purpose.

9. noimageindex – If you allow Google to index your images, people may steal them and use them on their own websites. To prevent that, you can keep the images deindexed using the noimageindex tag.

10. unavailable_after – In Blogger, you will get a field next to this tag where you can enter a date and time. The webpage will be deindexed after that time.
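To make the list above more concrete, here is a small sketch of how a set of these directives could be combined into a standard robots meta tag. The function names are hypothetical, and this post does not show the exact markup Blogger writes, so treat it as an illustration of the tag format rather than Blogger's own output.

```python
def expand_directives(directives):
    """Expand the shorthand values 'all' and 'none' into explicit directives."""
    expanded = []
    for directive in directives:
        directive = directive.strip().lower()
        if directive == "none":
            parts = ["noindex", "nofollow"]  # 'none' = noindex + nofollow
        elif directive == "all":
            parts = []  # 'all' adds no restrictions
        else:
            parts = [directive]
        for part in parts:
            if part not in expanded:  # drop duplicates, keep order
                expanded.append(part)
    return expanded


def robots_meta_tag(directives):
    """Build a robots meta tag string from a list of directives."""
    content = ", ".join(expand_directives(directives)) or "all"
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    print(robots_meta_tag(["none", "noarchive"]))
    print(robots_meta_tag(["all"]))
```

The sketch treats "none" as shorthand for noindex plus nofollow, as described in item 4, and falls back to "all" when no restriction is selected.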


Now that you know enough about custom robots header tags, let's fix the main issue.



How to fix Indexed, though blocked by robots.txt?

Step 1: Log in to blogger.com and choose the blog for which you want to fix the "Indexed, though blocked by robots.txt" issue.

Step 2: Then go to Settings >> Search preferences. There you will see two settings:
    1. Custom robots.txt
    2. Custom robots header tags



Step 3: To set "Custom robots.txt", click "Edit" and select "Yes" for "Enable custom robots.txt content?". Then generate the XML sitemap code for your blog from the Ctrl.org site.

Step 4: You will get "XML Sitemap code" like this for your site.

Example:

# Blogger Sitemap generated on 2020.03.01
User-agent: *
Disallow: /search/
Allow: /
Sitemap: https://www.techvigyaan.com/atom.xml?redirect=false&start-index=1&max-results=500
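The Sitemap line in the example covers at most 500 posts (max-results=500). If your blog has more posts than that, one line may not cover all of them. Here is a rough sketch that builds one Sitemap line per block of 500 posts. It assumes the same URL pattern can simply be repeated with a higher start-index. The function name and the post counts are made up for illustration, so compare the result with what the generator gives you.

```python
from typing import List


def sitemap_lines(blog_url: str, total_posts: int, page_size: int = 500) -> List[str]:
    """Build Sitemap lines that page through the blog feed, page_size posts at a time."""
    base = blog_url.rstrip("/")
    lines = []
    # The feed's start-index is 1-based; always emit at least one line.
    for start in range(1, max(total_posts, 1) + 1, page_size):
        lines.append(
            f"Sitemap: {base}/atom.xml?redirect=false"
            f"&start-index={start}&max-results={page_size}"
        )
    return lines


if __name__ == "__main__":
    # Hypothetical blog with 1200 posts.
    for line in sitemap_lines("https://www.techvigyaan.com", 1200):
        print(line)
```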

Step 5: Now add these two lines below "Disallow: /search/" in your XML sitemap code.

Disallow: /category/
Disallow: /tag/

Your code will look like this:

# Blogger Sitemap generated on 2020.03.01
User-agent: *
Disallow: /search/
Disallow: /category/
Disallow: /tag/
Allow: /
Sitemap: https://www.techvigyaan.com/atom.xml?redirect=false&start-index=1&max-results=500
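If you would rather check the edit before saving it, the sketch below makes the same change in Python and then tests a few paths with the standard urllib.robotparser module. The helper names and the sample paths (/tag/seo, /category/news, /2020/03/post.html) are hypothetical. Python's parser is only an approximation of how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

ORIGINAL = """\
# Blogger Sitemap generated on 2020.03.01
User-agent: *
Disallow: /search/
Allow: /
Sitemap: https://www.techvigyaan.com/atom.xml?redirect=false&start-index=1&max-results=500
"""


def add_disallow_rules(robots_txt, new_paths, after="Disallow: /search/"):
    """Insert 'Disallow:' lines for new_paths right below the 'after' line."""
    out = []
    inserted = False
    for line in robots_txt.splitlines():
        out.append(line)
        if not inserted and line.strip() == after:
            out.extend(f"Disallow: {path}" for path in new_paths)
            inserted = True
    if not inserted:
        raise ValueError(f"line not found: {after!r}")
    return "\n".join(out) + "\n"


def is_blocked(robots_txt, path, user_agent="Googlebot"):
    """Return True if robots_txt stops user_agent from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "https://www.techvigyaan.com" + path)


if __name__ == "__main__":
    updated = add_disallow_rules(ORIGINAL, ["/category/", "/tag/"])
    print(updated)
    for path in ("/tag/seo", "/category/news", "/2020/03/post.html"):
        print(path, "blocked" if is_blocked(updated, path) else "allowed")
```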

Step 6: Now paste this robots.txt code into the given text box and click the Save button.

Step 7: Now it's time to set "Custom robots header tags". Click the Edit option and select Yes to enable header tags.


Step 8: You will now see a number of header tags. Set them by following the same settings I chose (refer to the image given below) and hit Save changes.



Final Step 9: Now go to Google Search Console >> Coverage and start the validation process to fix the issue.

That's it, guys. That's all you have to do to fix this issue.



LET ME KNOW IF IT WORKS FOR YOU.
If you still have any problems, feel free to write to us.

Reviewed by Admin on March 09, 2020. Rating: 5
