Friday, August 17, 2018

Google Search Console: Blogger #Error Robots Exclusion Defaults

A few days ago, I received an error alert email from Google Search Console stating that Google's search engine spider had crawled my ModDoll Fun! site and encountered an issue.

New issue found:

Indexed, though blocked by robots.txt

The alert explained that robots.txt instructed crawlers to exclude (block) one particular URL, but that the URL had been indexed anyway because other site content referenced it. The warning basically said, "Our spider likes to respect robots.txt rules, but we think this page is relevant to search. Do you really want this page excluded from Google Search results? Fix this!"

I found this odd. Not that the spider was polite, but that the ModDoll Fun! robots.txt file was blocking a specific URL from the search index. ModDoll Fun! is hosted on Blogger.com, and by default Blogger automatically generates a robots.txt file (and cautions you against fiddling with it). ModDoll Fun! is a playtime/hobby site without any extraordinary web development needs. I hadn't touched the robots.txt file, which means the block came entirely from Blogger (or rather its algorithm) deciding that the URL in question would somehow be a problem, or that it contained content a search spider shouldn't see. The spider disagreed. And so did I.
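For anyone curious how a robots.txt rule decides whether a given URL is blocked, here is a minimal sketch using Python's standard-library urllib.robotparser. The rule set is hypothetical: it was written only to reproduce the single blocked URL from the alert and is not a copy of the file Blogger actually generated for ModDoll Fun!. The second label URL is likewise an assumed example of a differently named label page under the same /search/label/ path.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL robots.txt contents, for illustration only: one rule that
# blocks the "events" label page, as described in the Search Console alert.
# This is NOT the file Blogger actually generated for the site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search/label/events
Allow: /
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines instead of fetching it over HTTP.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.moddoll.fun/search/label/events",
        # Assumed URL for a label with a different name (space is %20-encoded).
        "http://www.moddoll.fun/search/label/in-store%20events",
    ):
        verdict = "allowed" if is_crawlable(SAMPLE_ROBOTS_TXT, url) else "blocked"
        print(url, "->", verdict)
```

One design note: Python's parser applies the first matching rule in file order, while Google documents that the most specific (longest) matching rule wins. For a file this small, both approaches give the same answer.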

This is the URL: http://www.moddoll.fun/search/label/events.

Can you think of any reason why an auto-generated page containing all the posts tagged with a label called "events" would freak out a robot or, in this case, a software program? Once I considered it for a minute, I chuckled. Yes, boys and girls, it would seem that Blogger erroneously assumed each post labeled "events" was an actual "event" in the programming sense, i.e., a computer programming subroutine. By default Blogger said, "No, no, no, you can't try to surreptitiously push coding snippets onto an unsuspecting spider." As a result, it marked this URL as excluded, or blocked, in the robots.txt file.

The problem is that on ModDoll Fun!, which is an 18" Doll Collector's & DIY Modeler's site, posts labeled "events" are NOT computer program subroutines; instead, they are posts about Doll/Toy Events I've attended or am interested in. It's an obscure error, and I solved it by renaming the label from "events" to "in-store events," but really, Blogger needs to get a grip.
In all seriousness... Is anyone else able to replicate this error? Or can anyone come up with an alternative reason for blocking only this one specific page, the auto-generated "posts tagged with the label events" page? Renaming the label corrected the Search Console issue, but....
Kudos to the curious Google Search spider that indexed the page despite instructions to do otherwise (though, obviously, you wouldn't want a spider to ignore a rules document at its own discretion in every case) and then sent me a warning alert asking whether its actions were correct. Yes, Google Search spider, the URL should have been indexed. Tell your cousin, the Blogger bot, to adjust its algorithm for determining excluded URLs.

Not all "events" are equal, or a hacking attempt. (I'm blowing virtual raspberries.)