I’ve been writing web applications for years, but I’ve never had to put much thought into whether search engines such as Google and Bing were finding the sites I worked on, whether they were deducing appropriate information about those sites, or whether they were ranking them appropriately in search results. The reason is that most of the web development I have been involved with has tended to be web “applications”, where the bulk of the interesting stuff is hidden away behind a login; without logging in there’s not much to look at, and in many cases the content of the site isn’t something you would want search engines to look at anyway … so who cares about SEO!
However, if you have a web site that promotes your company, products and services then you probably do care about SEO. Or if you don’t, you probably should! Improving your ranking with the search engines could have a positive impact on your overall business, and in these economic times we all need all the help we can get.
It turns out that the basic principles of SEO are pretty straightforward: it’s mainly about making sure that your site conforms to certain standards and doesn’t contain errors. Sounds simple, right? You’d be surprised how many companies pay little, if any, attention to this potentially important subject, and suffer the consequences of doing so … probably without even realizing it.
You have little or no control over when search engine robots visit your site, so all you can do is try to ensure that everything you publish on the web is up to scratch. Here are a few things you can do to help improve your search engine rankings:
- Ensure that the HTML that makes up your site is properly formatted. Robots take a dim view of malformed HTML, and the more errors they find, the lower your rankings are likely to be.
- Don’t assume that HTML editors will always produce properly formatted HTML, because it’s not always the case!
- Try to limit the physical size of each page. Robots have limits regarding the physical amount of data that they will search on any given page. After reading a certain amount of data from a page a robot may simply give up, and if there is important information at the bottom of a large page, it may never get indexed. Unfortunately these limits may be different from robot to robot, and are not published.
- Ensure that every page has a title specified with the <TITLE> tag, and that the title is short and descriptive. Page titles are very important to search engine robots.
- Use HTML headings carefully. Robots typically place a lot of importance on HTML heading tags, because it is assumed that the headings give a good overall description of what the page is about. It is generally recommended that a page have only a single <H1> tag and use subheadings (<H2>, <H3> etc.) sparingly.
- Use meta tags in each page. In particular, use the meta keywords and meta description tags to describe what the page content is about, but also consider adding other meta tags like meta author and meta copyright. Search engine robots place high importance on the data in meta tags. (A minimal page skeleton showing titles, headings, and meta tags appears after this list.)
- Don’t get too deep! Search engines have (undocumented) rules about how many levels deep they will go when indexing a site. If you have important content that is buried several levels down in your site it may never get indexed.
- Avoid having multiple URLs that point to the same content, especially if you have external links in to your site. The number of external links pointing to your content is an important indicator of how relevant other sites consider your content to be, and having multiple URLs for the same content can dilute the search engine crawler’s view of that relevance. (One common mitigation is sketched after this list.)
- Be careful how much use you make of technologies like Flash and Silverlight. If a site’s UI consists entirely of pages that make heavy use of these technologies, then the site will contain lots of <OBJECT> tags that point the browser to the Flash or Silverlight content, but not much else! Robots don’t look at <OBJECT> tags – there’s no point, because they wouldn’t know what to do with the binary content anyway – so if you’re not careful you can create a very rich site that looks great in a browser … but has absolutely no content that a search engine robot can index!
- If your pages do make a lot of use of technologies like Flash and Silverlight, consider using a <NOSCRIPT> tag to add content for search engine robots to index. The <NOSCRIPT> tag holds content to display in browsers that don’t support JavaScript, and these days pretty much all browsers do. However, search engine robots DO NOT support JavaScript, so they WILL see the content in a <NOSCRIPT> section of a page! (A short example follows this list.)
- Related to the previous item, avoid having content that is only available via the execution of JavaScript – the robots won’t execute any JavaScript code, so your valuable content may be hidden.
- Try to get other web sites, particularly “popular” web sites, to link to your content. Search engine robots consider inbound links to your site a good indicator of the relevance and popularity of your content, and links from sites that themselves rank highly are considered even more important.
- Tell search engine robots what NOT to look at. If you have content that should not be indexed, for any reason, you can create a special file called robots.txt in the root folder of your site, in which you specify rules for what robots should ignore. In particular, make sure you exclude any binary content (images, videos, documents, PDF files, etc.), because these files are relatively large and may cause a robot to give up indexing your entire site! (An example appears after this list.) For more information about the robots.txt file refer to http://www.robotstxt.org.
- Tell search engines what content they SHOULD look at by adding a sitemap.xml file to the root folder of your site. A sitemap.xml file contains information about the pages that you DO want search engine robots to process. (An example appears after this list.) For more information refer to http://www.sitemaps.org.
- Ensure that you don’t host ANY malware on your site. Search engine robots are getting pretty good at identifying malware, and if they detect malware hosted on your site they are likely to not only give up processing the site, but also blacklist the site and never return.
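To pull the title, heading, and meta-tag advice together, here is a minimal sketch of a page skeleton. The company name, wording, and values are invented purely for illustration:

```html
<!DOCTYPE html>
<html>
<head>
  <!-- A short, descriptive title: one of the first things a robot reads -->
  <title>Acme Widgets – Industrial Widget Supplier</title>
  <!-- Describe the page for robots; keep it brief and accurate -->
  <meta name="description" content="Acme Widgets supplies industrial widgets and fittings to manufacturers worldwide." />
  <meta name="keywords" content="widgets, industrial widgets, widget supplier" />
  <!-- Optional extras mentioned above -->
  <meta name="author" content="Acme Widgets Inc." />
  <meta name="copyright" content="Copyright 2010 Acme Widgets Inc." />
</head>
<body>
  <!-- A single H1 describing the page, with subheadings used sparingly -->
  <h1>Industrial Widgets</h1>
  <h2>Why choose Acme?</h2>
  <p>Our widgets are machined to the tightest tolerances in the industry.</p>
</body>
</html>
```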
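For the duplicate-URL problem, one widely supported mitigation (my addition here, not something the list above prescribes) is the rel="canonical" link element, which the major search engines introduced in 2009. Placed in the <HEAD> of each duplicate page, it tells crawlers which URL to treat as authoritative:

```html
<!-- On a duplicate page such as http://example.com/products?sort=price -->
<link rel="canonical" href="http://example.com/products" />
```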
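As a sketch of the <NOSCRIPT> technique for a Silverlight-heavy page: the file name and fallback text below are hypothetical, but the shape is what matters – the <OBJECT> tag carries the binary content for browsers, while the <NOSCRIPT> block carries plain HTML for robots:

```html
<object data="data:application/x-silverlight-2," type="application/x-silverlight-2">
  <!-- Robots ignore this binary content -->
  <param name="source" value="ClientBin/ProductCatalog.xap" />
</object>
<noscript>
  <!-- Never shown by JavaScript-capable browsers, but indexable by robots -->
  <h2>Acme Product Catalog</h2>
  <p>Browse our full range of industrial widgets, fasteners and fittings.</p>
</noscript>
```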
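A robots.txt along the lines suggested above might look like the following; the folder names are hypothetical and should match wherever your site actually keeps its binary content:

```
# Rules for all robots
User-agent: *
# Keep robots out of large binary content and private areas
Disallow: /images/
Disallow: /videos/
Disallow: /downloads/
```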
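And a minimal sitemap.xml, using the schema documented at sitemaps.org (the URLs and dates are invented for the example):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <lastmod>2010-06-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://example.com/products</loc>
    <lastmod>2010-05-15</lastmod>
  </url>
</urlset>
```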
Getting to grips with all of these things can be a real challenge, especially on larger sites, but there are tools out there to help. In particular, I recently saw a demo of a free tool from Microsoft called the SEO Toolkit. This is a simple application that analyzes a web site in much the same way a search engine robot does, then produces detailed reports and suggestions on what can be done to improve the site’s SEO. You can also use the tool to compare changes over time, so you can see whether changes you make to the site have improved or worsened your likely ranking. For more information refer to http://www.microsoft.com/web/spotlight/seo.aspx.
This article only scratches the surface of what is an extensive and complex subject, but hopefully, armed with these basics, you can start to improve the rankings for your site.