Google Not Updating robots.txt – Don’t Panic

Valid robots.txt Prevents Site From Being Indexed

So your robots.txt file looks valid, but Google Webmaster Tools says your site can’t be indexed because it’s blocked by your robots.txt file. You’ve tried everything you can think of, but your site is still blocked from the indexing giant. Don’t panic. It’s only a matter of time!

Oops!  What Not To Do.

The other day I was building a new site and figured I’d try something in WordPress that I hadn’t tried before. You can’t really learn new things if you don’t try new things.

WordPress Search Engine Visibility Option

Under “Reading Settings” in WordPress there is an option called “Search Engine Visibility.” When you check this option, WordPress generates a simple robots.txt file for your site that discourages robots from indexing it. NOTE: a robots.txt file only requests that certain files or directories not be crawled by search engine robots (also called crawlers or bots). It’s up to each bot to honor that request. All the major search engine crawlers will respect your robots.txt file, but beware that spammers and troublemakers most likely won’t. DO NOT use your robots.txt file to SECURE anything, because it can’t.
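Compliance with robots.txt is entirely opt-in on the crawler’s side. A well-behaved bot checks the rules before each fetch, along the lines of this Python sketch using the standard library’s robots.txt parser (the rules string and URLs here are just illustrative):

```python
from urllib.robotparser import RobotFileParser

def is_allowed(rules_text: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url.
    A polite crawler calls this before every request; a spam bot simply doesn't."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)

# Example rules: everything under /private/ is off-limits to all bots.
rules = "User-agent: *\nDisallow: /private/\n"

print(is_allowed(rules, "Googlebot", "https://example.com/public/"))    # True
print(is_allowed(rules, "Googlebot", "https://example.com/private/x"))  # False
```

A misbehaving bot skips this check entirely, which is exactly why robots.txt is a request, not a security mechanism.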

For the first time ever I actually checked this option, and sure enough, while I went about creating my site and adding content, Google did not index my site and none of the pages showed up in its index. The robots.txt file that was generated when this option was checked contained the following code:

User-agent: *
Disallow: /

First, “User-agent: *” means the rules that follow apply to ALL bots. If you want to target a specific search engine crawler, you’d change the asterisk to that crawler’s name (for example: Googlebot). Second, “Disallow: /” tells those bots that your entire site is off-limits to crawling, which keeps it out of their indexes.
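You can see how a compliant crawler interprets that two-line file using Python’s standard-library robots.txt parser (example.com stands in for your own domain):

```python
from urllib.robotparser import RobotFileParser

# The file WordPress generated with "Search Engine Visibility" checked.
blocked_rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(blocked_rules.splitlines())

# With "Disallow: /", every path is off-limits to every obedient bot.
print(parser.can_fetch("Googlebot", "https://example.com/"))       # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/"))  # False
```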

Anyway, for the time being all was good and I achieved the results I wanted: Google wasn’t indexing my site and my pages weren’t showing up in searches. Unfortunately, all good things must come to an end; site content gets created, and eventually you want people to find your site when they search for things relevant to what you provide.

Creating a Valid robots.txt

After all the new site content was created, I went back and unchecked “Search Engine Visibility” under “Reading Settings” in WordPress. My mistake was figuring that a valid robots.txt file would be generated and immediately read by the search engines.

Although a valid robots.txt file (as shown below) was generated, it didn’t seem to matter to Google Webmaster Tools.

User-agent: *
Disallow:

For some reason the site still wasn’t getting indexed. Google Webmaster Tools kept seeing the original robots.txt file and displayed a message saying the site was being blocked by robots.txt, even though the new file no longer blocked anything.
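For what it’s worth, the new file really was valid: an empty Disallow value means “allow everything,” which the same standard-library parser confirms (again, example.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The robots.txt WordPress generates with "Search Engine Visibility" unchecked.
unblocked_rules = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(unblocked_rules.splitlines())

# An empty Disallow value permits crawling of every path.
print(parser.can_fetch("Googlebot", "https://example.com/any-page/"))  # True
```

So the problem wasn’t the file itself; Google was simply still working from its cached copy.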

Information Overload

After scratching my head for 20 minutes and digging through all the WordPress panels and theme options, just to make sure no other setting was preventing the site from being indexed, I took my search to the search engines themselves and went looking for a solution.

Like most searches, I got back a million or so results, more than I needed, but none with the solution I needed. There was a lot of forum chatter about how Google should be reading the file and how long it should take, but it had already been far longer than most suggested. The site still wasn’t indexed, and Google Webmaster Tools kept seeing the old file and saying that robots.txt was blocking the site from being indexed.

Finally I found what appeared to be a reliable answer. Unfortunately I didn’t save the URL, so I can’t reference it here. I looked for it again briefly with no luck; it took a couple of hours to find the first time, so I know it’s out there and will look for it later to include as a reference.

Refresh Rate of Robots.txt File

The most reliable information I found said that Google refreshes its cached copy of a robots.txt file within 48 hours of it being changed. Experience, at least in this situation, shows it can take longer than that.

After 48 hours the new site was still not being indexed, and both Google search and Webmaster Tools were still saying the robots.txt file was blocking the site. Finally, at about the 96-hour point (4 days into this saga), Google Webmaster Tools reported a valid robots.txt file, and the message about site URLs being blocked was gone.

Within 24 hours of noticing this, Webmaster Tools reported that seven pages of content had been submitted and one page had been indexed. Finally we were getting somewhere. The next day more pages were indexed, and after another day all the pages were indexed and viewable in Google. By viewable in Google, I mean viewable using the Google SITE search command. For example, entering “site:millionairesuccessnetwork.com” as a Google search will show all the pages of our Millionaire Success Network site that are indexed in Google.

All is Good, Pages are Indexed, Robots.txt is Updated and Working

In the end, it looks like only Google really knows how often they actually go out and refresh their copy of your site’s robots.txt file. Maybe this was just a one-off and this new site’s robots.txt file got lost in the depths of cyberspace. Who knows? Whatever the problem was, it was frustrating to know that a valid robots.txt file was in place but not being read.

Then having to wait days longer than anyone expected for the robots.txt file to actually refresh within Google Webmaster Tools just added to the frustration. Life got easier once the error messages were gone and content started appearing as submitted to the indexing giant. Of course, the stress wasn’t fully gone until pages actually started appearing in the index.

I guess the moral of this story is: if you’re not in any hurry and don’t mind how long it takes for your site to get indexed, don’t worry, and go ahead and block your site using your robots.txt file. However, if you have a new site with content that needs to get indexed soon, set aside at least 48 hours, and maybe twice that, for unblocking your site after using robots.txt to block search engine indexing.

A Millionaire Success Secret

Originally I didn’t expect the robots.txt refresh to take so long, and once I started digging into the problem, I wanted to let a natural refresh of the file happen to see how quickly the problem resolved on its own. Had I been in a real hurry, I would have sent out a Twitter update for a couple of the new site’s pages that were just published.

Usually when you send out a message via Twitter with a link to a new page you created, the search engines will pick up the page and index it pretty quickly. I’m not sure whether that would have helped refresh the robots.txt file in this case, but that might be a good test for another time.

Today’s secret – if you want some new content indexed quickly, just Tweet about it!


© Copyright 2013 - · Millionaire Success Network · All Rights Reserved