Viewing Your Site with an All-Text Browser
Improvement implies a feedback loop: you can't know how well you are doing without a mechanism for examining your current status. From an SEO perspective, that feedback mechanism is to view your site as the bot sees it, which means viewing the site in a text-only browser. A text-only browser, just like a search engine bot, ignores images and graphics and processes only the text on a page.
The best-known text-only web browser is Lynx. You can find more information about Lynx at http://lynx.isc.org/. Generally, the process of
installing Lynx involves downloading source code and compiling it.
The Lynx site also provides links to a variety of precompiled Lynx builds you can download.
Don't want to compile source code or figure out which idiosyncratic Lynx build to download? There is a simple Lynx Viewer
available on the Web at http://www.delorie.com/web/lynxview.html.
First, open the Lynx Viewer web page. Next, follow the directions to make sure that a file named delorie.htm is saved in the
root directory of your web site. To do this, you'll need either FTP access to upload a file to your web server or the ability to create an
empty page on your site.
It doesn't matter what's in this file; its sole purpose is to verify that you own or control the site you are
testing.
Finally, simply enter your URL, and see what your site looks like in a text-only version.
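If you'd rather script a rough approximation than use a hosted viewer, the idea behind a text-only view can be sketched in a few lines of Python using only the standard library. The sample HTML below is purely illustrative; point the function at your own page markup.

```python
# Approximate a text-only view of a page: keep the text, drop the markup,
# roughly the way Lynx or a search engine bot would.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def text_only(html):
    """Return just the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)


# Illustrative input: styling is discarded, text survives.
sample = ("<html><head><style>p{color:red}</style></head>"
          "<body><h1>Welcome</h1><p>Hello, <b>world</b>.</p></body></html>")
print(text_only(sample))
```

Running this against your own pages is a quick sanity check: anything that disappears here (text embedded in images, for instance) is also invisible to the bot.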

Excluding the Bot
There are a number of reasons you might want to block robots (bots) from all or part of your site. For example, if your site is not
complete, if you have broken links, or if you haven't prepared your site for a search engine visit, you probably don't want it indexed yet.
You may also want to keep parts of your site from being indexed if they contain sensitive information or pages that you know
cannot be accurately traversed or parsed.
If you need to, you can make sure that part of your site does not get indexed by any search engine.
Following the robots exclusion protocol is voluntary and based on the honor system. So all you can really be
sure of is that a legitimate search engine that follows the protocol will not index the prohibited parts of
your site.




The robots.txt File
To block bots from traversing your site, place a text file named robots.txt in your site's web root directory (where the HTML files for your
site are placed). The following syntax in the robots.txt file blocks all compliant bots from traversing your entire site:
User-agent: *
Disallow: /
You can exercise more granular control over both which bots you ban and which parts of your site are off-limits as follows:
The User-agent line specifies the bot that is to be banned.
The Disallow line specifies a path, relative to your root directory, that is off-limits.
A single robots.txt file can include multiple User-agent records, each disallowing different paths.
For example, you would tell the Google search bot not to look in your images directory (assuming the images directory is right beneath
your web root directory) by placing the following two lines in your robots.txt file:
User-agent: googlebot
Disallow: /images
The robots.txt mechanism relies on the honor system. By definition, robots.txt is a text file that anyone
with a browser can read. So don't rely on every bot honoring the requests in a robots.txt
file, and don't use robots.txt in an attempt to protect sensitive information on your
site from humans (this is a different issue from using it to keep sensitive information out of search
engine indexes).
For more information about working with the robots.txt file, see the Web Robots FAQ, http://www.robotstxt.org/wc/faq.html. You can also
find tools for generating custom robots.txt files and robot meta tags (explained below) at http://www.rietta.com/robogen/.