Guidelines for Search Friendly Development: from Google I/O

Maile Ohye gave a presentation at Google I/O (Google’s developer conference) on making search friendly websites, from the perspective of Googlebot as well as the visitor. Here are some highlights from what was covered:

Avoid PageRank Dilution by choosing one domain (either www or non-www) and having only one URL for each page.  Dynamic URLs with query-string parameters are a common cause of PageRank dilution.  Googlebot will attempt to find only the relevant name/value pairs, so having well-named parameters can help.  Including session IDs in the URL can lead to an infinite space problem, where Googlebot sees far more URLs than the amount of content justifies.  The rel=canonical link element lets you tell Googlebot which URL it should use for the content it is reading.  This is the only sure-fire way to prevent PageRank dilution.
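
To make the "one URL per page" idea concrete, here is a minimal Python sketch that folds URL variants onto a single canonical form and builds the matching rel=canonical element.  The host names, the list of ignored parameters (sid, sessionid) and both function names are assumptions made up for this example; they were not part of the talk.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical values for illustration only.
SITE_HOSTS = {"example.com", "www.example.com"}  # variants of the same site
CANONICAL_HOST = "www.example.com"               # the one host we chose
IGNORED_PARAMS = {"sid", "sessionid"}            # do not change page content


def canonical_url(url: str) -> str:
    """Map the many URL variants of a page onto one canonical URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Fold the www and non-www variants onto the chosen host.
    if host in SITE_HOSTS:
        host = CANONICAL_HOST
    # Drop parameters that do not affect content, then sort the rest so
    # that parameter order alone never produces a second URL.
    params = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, host, parts.path, urlencode(params), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'
```

With these assumptions, a request for http://example.com/shoes?sid=abc123&color=red and one for http://www.example.com/shoes?color=red both map to the same canonical URL, so the session ID no longer multiplies the number of URLs.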

Design your site to help users find what they want.

  • Use breadcrumbs to help users determine their location within your site.
  • Provide logical category navigation, homepage to child-page, and child-page to homepage navigation to help users get to where they want to go.
  • Give hints about what they’ve already looked at by using visited link colors or providing a list of “items you’ve looked at”.
  • Use a “helpful 404” page to suggest the closest match or provide a search box for the user to search your site.  Webmaster Tools can generate a helpful 404 page for you to use on your own site.
  • Use the language of your users – don’t use “athletic footwear” as your keyword when people are more likely to search for “running shoes”.
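
The "closest match" suggestion on a helpful 404 page can be approximated with fuzzy string matching.  This is only a sketch using Python's standard difflib module; the list of known paths, the function name and the 0.6 similarity cutoff are assumptions for illustration, and it is not how the Webmaster Tools widget works.

```python
from difflib import get_close_matches

# Hypothetical list of paths that really exist on the site.
KNOWN_PATHS = ["/running-shoes", "/web-design-services", "/contact", "/blog"]


def suggest_pages(requested_path: str, limit: int = 3) -> list[str]:
    """Return the known paths that look most like the one that was not found."""
    # A cutoff of 0.6 is difflib's default similarity threshold; tune to taste.
    return get_close_matches(requested_path.lower(), KNOWN_PATHS, n=limit, cutoff=0.6)
```

A mistyped request such as /runing-shoes would get /running-shoes as its first suggestion, while a path that resembles nothing on the site returns an empty list, in which case the page can fall back to the search box.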

Create shareable URLs by navigating to a specific page instead of a single generic URL.  Use descriptive filenames for pages and images (hyphens are preferred to underscores as word separators).  As much as possible, use lowercase URLs (robots.txt is case sensitive!).
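
As a rough illustration of the filename advice, the sketch below turns a page title into a lowercase, hyphen-separated name.  The function name slugify and the choice to drop non-ASCII characters are assumptions for this example, not something from the presentation.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated filename stem."""
    # Split accented letters into base letter + accent, then drop anything
    # that is not plain ASCII (an assumption; some sites keep Unicode URLs).
    ascii_text = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Every run of characters other than a-z and 0-9 (spaces, underscores,
    # punctuation) becomes a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")
```

For example, a page titled "Web Design Services" would be saved as web-design-services.html, and an underscore-separated name like "Running_Shoes" would come out as running-shoes.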

Design your site to help search engines know what is important.

  • Secure private content that you don’t want crawled (with passwords or robots.txt).
  • Make sure important pages are well linked from the homepage and other pages.
  • Submit a sitemap to help the crawler know about your pages.
  • Disallow shopping carts and login pages – Googlebot will never buy a book or sign up for an account.
  • Use keywords in links.  “Click here to read more” isn’t nearly as good as “Read more about our web design services”.
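
One way to sanity-check the "disallow shopping carts and login pages" rule is to run a draft robots.txt through Python's standard urllib.robotparser before deploying it.  The /cart/ and /login paths and the example.com host below are assumptions for illustration, and this parser is not Googlebot, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a small shop; the paths are made up.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /login
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Report whether the given user agent may fetch the URL."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)
```

With this file, is_crawlable(parser, "http://www.example.com/cart/checkout") is false while /running-shoes stays crawlable.  Because rule matching is case sensitive, /Cart/checkout would slip past the /cart/ rule, which is one more reason to stick to lowercase URLs.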

Some notes about dynamic URLs:

  • Googlebot can recognize query string parameters as name/value pairs.
  • When Googlebot sees multiple URLs with the same content, it attempts to determine which name/value pairs are important and which ones aren’t.  (See Avoid PageRank Dilution above.)
  • Keywords in the actual file path are weighted the same as keywords in the query string.  So “web-design-services.html” would score just as well as “index.php?page=web-design-services”.

When selecting technologies for a rich user experience, HTML is your friend and can help you reach the widest audience.  When adding AJAX, consider “hijax”, where AJAX/JS can fall back to static links when JavaScript isn’t enabled by the user or supported by the crawler.  But note that Googlebot can:

  • Index user-visible text in a Flash file
  • Index external resources loaded by a Flash file
  • Process onclick events for JavaScript-executed navigation IF the function is defined on the page itself (not in an external .js file)
  • Discover both frames and iframes (frames are indexed with their parent while iframes are indexed separately)

Images are indexed based on the surrounding text as well as the alt attribute, the image filename, and the quality of the image (resolution and pixels).

Text replacement using “text-indent: -999” is considered a risky behavior by Googlebot and you may be penalized or thought to be spamming if you use this method.  Preferred methods of replacing text are alt attributes, sIFR, sprites and noscript tags.  Use “text-indent” if you really must, but be aware of the risks.

Google Webmaster Tools provides lots of very helpful resources for making your site search friendly.  The messages and status provided there can give you insight into how Googlebot sees your site and any errors or warnings it may have for you.  It’s a good practice to verify your site and check in with Webmaster Tools every so often.  The Webmaster Tools API also provides access to crawl errors, so we could build a tool to automatically check in on our sites and alert us to any errors.  If there are missing pages with links from external sites, set up a 301 redirect for these or try to contact the site owner to have the link corrected.  “Cleaning up the web” benefits you and your visitors.

Server response codes should only be 200 when the requested page is actually returned.  Don’t return 200, directly or after a redirect, when the requested page doesn’t exist.  This “soft-404” can lead to the infinite space problem and PageRank dilution.  When content has moved permanently, use a 301 redirect to the new content and Googlebot will transfer the PageRank to the new location.  If you must take down your site for maintenance, don’t use a 404 error.  Instead, return a 503 or another 5xx code and Googlebot will come back later.
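
The response-code rules above can be summed up in a small decision function.  This is a framework-neutral sketch: the PAGES and MOVED tables, the respond function and the Retry-After value are all assumptions for illustration (Retry-After is an optional standard HTTP header, not something mentioned in the talk).

```python
from http import HTTPStatus

# Hypothetical site data for illustration only.
PAGES = {"/running-shoes": "<h1>Running shoes</h1>"}
MOVED = {"/athletic-footwear": "/running-shoes"}  # old path -> new path


def respond(path: str, maintenance: bool = False):
    """Return (status, headers, body) for a requested path."""
    if maintenance:
        # 503 tells the crawler the outage is temporary; Retry-After is an
        # optional hint (value in seconds, assumed here).
        return HTTPStatus.SERVICE_UNAVAILABLE, {"Retry-After": "3600"}, "Down for maintenance"
    if path in MOVED:
        # Content moved permanently: 301 plus the new location.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": MOVED[path]}, ""
    if path in PAGES:
        # 200 only when the requested page is actually returned.
        return HTTPStatus.OK, {}, PAGES[path]
    # A real 404 rather than a "soft 404" that answers 200 with an error page.
    return HTTPStatus.NOT_FOUND, {}, "Page not found"
```

The body of the 404 branch is where a helpful 404 page would go; the important part is that the status line still says 404.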

Engaging the community with a blog can also be an effective way of improving your site’s rank.  If you start a blog, choose categories that make good keywords.  Use your blog to become an authority in your field, where people will come to your blog for good information and even link to your blog.  (Also see Matt Cutts’ blog for some tips.)

Comments or reviews of items or products on your site can be helpful to your users in making informed decisions.  But be sure to place the comments or reviews on the same page as the product to consolidate the PageRank, and don’t forget to watch out for spam!

Geographic targeting for region-specific websites can be done by selecting a country-specific TLD (ccTLD) or by setting the geotargeting region in Webmaster Tools.  Also, the IP address of the server hosting the website can have an impact on the geographic targeting of a website if no other hints are provided to Google via the TLD or Webmaster Tools.

Ultimately a conversion on your site comes from the user having a good feeling about you and your product.  This good feeling is mostly a result of what they can learn about you from their research on your site and others.  Word of mouth via social media helps, but nothing replaces a well-designed website.
