Google Offers Robots.txt Generator


Danny Sullivan on March 27, 2008 at 5:39 pm

Google’s rolled out a new tool at Google Webmaster Central, a robots.txt generator. It’s designed to allow site owners to easily create a robots.txt file, one of the two main ways (along with the meta robots tag) to prevent search engines from indexing content. Robots.txt generators aren’t new. You can find many of them out there by searching. But this is the first time a major search engine has provided a generator tool of its own.

It’s nice to see the addition. Robots.txt files aren’t complicated to create. You can write them using a text editor such as Notepad with just a few simple commands. But they can still be scary or hard for some site owners to contemplate.

To access the tool, log in to your Google Webmaster Tools account, then click on the Tools menu option on the left-hand side of the screen after you select one of your verified sites. You’ll see a "Generate robots.txt" link among the tool options. That’s what you want.

By default, the tool is designed to let you create a robots.txt file to allow
all robots into your site. That’s kind of odd. By default, all robots will come
into your site. If you want them, then there’s no need to have a robots.txt file
at all. It’s like pinning a note to your chest reminding yourself to breathe.
Promise, you’ll keep breathing even if you forget to look at the note.

Instead, you generally want to put up a robots.txt file to block crawling of some type. I may dig into this in a future article and examine when you might want to mix allow and disallow statements, but off the top of my head, there aren’t a lot of reasons to do so.

You can change the default option to "Block all robots" easily enough. Do
that, and you get the standard and familiar two line keep out code:

User-Agent: *
Disallow: /

The first line — User-Agent — is how you tell particular spiders or robots
to pay attention to the following instructions. Using the wildcard — * — says
"hey ALL spiders, listen up."

The second line says what they can’t access. In this case, the / means to not
spider anything within the web site. You know how pages within a web site all
begin domain/something, like this:

http://website.com/page.html

See that / between website.com and page.html? Technically, that slash is the start of the URL path. So if you disallow all pages beginning with a slash, you’re blocking all pages within the entire site.
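
If you want to sanity-check that behavior yourself, Python’s standard library ships a small robots.txt parser. Here’s a minimal sketch, assuming Python 3 and a made-up URL; it’s a rough stand-in for how a crawler reads the rules, not anything Google’s tool produces:

from urllib import robotparser

rules = """\
User-Agent: *
Disallow: /
"""

# Feed the two-line "keep out" file to the parser.
parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Every page path begins with /, so every page is blocked.
print(parser.can_fetch("Googlebot", "http://website.com/page.html"))  # False
print(parser.can_fetch("AnyBot", "http://website.com/"))              # False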

Let’s move on from our mini-robots.txt 101 course. Maybe you only want to
block Google. Well, the tool is supposed to make this type of thing easy, but I
was perplexed. Step one is to either allow or block ALL robots. Then in Step 2,
you decide if you want to block specific robots. So which do you go with in step
1, block all or none?

I figured you’d want to allow all robots, then believe the reassuring text
next to that option that said "you can fine-tune this rule in the next step."
The problem is, I couldn’t. If I tried to block Googlebot, the instructions
didn’t change. If I tried to choose, say, Googlebot-Mobile, same thing.

Eventually, I figured it out. If you decide to block specific spiders, you have to choose the spider, then also specify what you want to block in the "Files or directories" box, such as a particular file or directory. So say I kept all print-only versions of stories in a directory called /print. I’d enter that directory to get this:

User-Agent: *
Allow: /

User-Agent: Googlebot
Disallow: /print
Allow: /

The first part tells spiders they can access the entire site. As I said, this
is entirely unnecessary, but you get it anyway. The second part says that
Googlebot cannot access the /print area.
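
To see those mixed rules in action, you can run them through the same standard-library parser as before. One caveat: Python’s parser applies the rules in file order, and individual crawlers may resolve allow/disallow conflicts differently, so treat this as a rough check rather than a guarantee of how Googlebot behaves:

from urllib import robotparser

rules = """\
User-Agent: *
Allow: /

User-Agent: Googlebot
Disallow: /print
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot is shut out of /print but can crawl everything else.
print(parser.can_fetch("Googlebot", "http://website.com/print/story.html"))  # False
print(parser.can_fetch("Googlebot", "http://website.com/story.html"))        # True

# Other crawlers fall back to the catch-all rule and get the whole site.
print(parser.can_fetch("Slurp", "http://website.com/print/story.html"))      # True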

The tool lets you craft specific rules for these particular Google crawlers:

  • Googlebot
  • Googlebot-Mobile
  • Googlebot-Image
  • Mediapartners-Google
  • Adsbot-Google

I wish the names were accompanied by parentheses quickly explaining what each crawler does, and what blocking them will do, say, something like this:

  • Googlebot-Mobile (allows or blocks content from Google mobile search)

Instead, you have to look through the various help files to understand what each does. Ironically, the older Analyze Robots.txt tool within Google Webmaster Tools DOES have these helpful explanations, so I expect they’ll migrate over.

You can also use the tool to enter a name for another crawler. The problem
is, someone using this tool probably doesn’t know the crawler names out there
that they want to block. I’d have given Google serious kudos points if they’d added some of the other major crawlers. But then again, if they had, no doubt someone would have accused them of trying to get people to block other search engines :)

Another thing that would have been nice: letting people paste full URLs into the box and have them converted. A site owner using this tool might not realize they need to drop the domain portion of a URL to block a particular page. It would be handy if you could paste something like this:

http://website.com/page-i-want-to-block.html

and have the tool automatically turn it into this:

User-Agent: *
Disallow: /page-i-want-to-block.html
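
That conversion is simple to script. Here’s a minimal sketch of the idea; the url_to_disallow helper is hypothetical, just an illustration of what the tool could do with a pasted URL:

from urllib.parse import urlsplit

def url_to_disallow(page_url):
    # Drop the scheme and domain; robots.txt rules only match the path portion.
    path = urlsplit(page_url).path or "/"
    return "User-Agent: *\nDisallow: " + path

print(url_to_disallow("http://website.com/page-i-want-to-block.html"))
# User-Agent: *
# Disallow: /page-i-want-to-block.html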

After you make your file, upload it to the root directory of your web site.
If you don’t know what that is, find someone who does! This is important. Google
allows for subdirectories of web sites to be registered within Google Webmaster
Tools. However, robots.txt files do NOT work on a subdirectory basis. They have
to go at the root level of a web site. If you don’t put them there, then you
won’t be preventing access to any part of the site. Remember, after you upload
to the root level, you can go back into Google Webmaster Tools and use that
aforementioned analysis tool to see if it is really blocking the pages you want
to keep out.
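
If the root-level requirement seems abstract, this little sketch shows why a copy in a subdirectory never gets read: crawlers build the robots.txt address from the host alone and ignore the rest of the URL. The robots_location helper is hypothetical, just to illustrate the point:

from urllib.parse import urlsplit, urlunsplit

def robots_location(page_url):
    # Only the scheme and host matter; the page's own path is discarded.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_location("http://website.com/blog/archive/page.html"))
# http://website.com/robots.txt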

Overall, I’m glad to see the new tool, and I imagine it will improve over time to become even more user friendly.

In related news, Google says that the Web Crawl diagnostics area now has a new filter letting you see only web crawl errors related to sitemaps you’ve submitted. Also, there have been some UI tweaks to the iGoogle gadgets from Webmaster Central that were rolled out last month.

For more about Google’s webmaster tools, be sure to check out the quick start guide they offer and see our Google Webmaster Central archives.


About The Author

Danny Sullivan
Danny Sullivan was a journalist and analyst who covered the digital and search marketing space from 1996 through 2017. He was also a cofounder of Third Door Media, which publishes Search Engine Land and MarTech, and produces the SMX: Search Marketing Expo and MarTech events. He retired from journalism and Third Door Media in June 2017. You can learn more about him on his personal site & blog. He can also be found on Facebook and Twitter.
