What kind of information does a search engine collect?

A search engine cannot actually ‘see’ a web page the way a human user can.

A search engine finds out about each web page by ‘reading’ its underlying HTML code. (To see this code, open a web page and choose ‘View’ and then ‘Source’ from your web browser’s menu; the exact wording varies between browsers.)

Search engines pay particular attention to certain parts of this code. These are generally divided into two categories:

•  ‘on the page’ content, which is visible to the user, e.g. the web page title, headings, text and links

•  ‘off the page’ content, which is information contained in the code that is NOT visible to the user, e.g. ‘meta tags’ and ‘alt text’ (names given to specific parts of the HTML code that creates web pages).
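To make the two categories concrete, here is a minimal sketch, using only Python’s standard library, of how a program might read a page’s code and sort it into ‘on the page’ and ‘off the page’ content. The class name PageContentParser and the sample page are hypothetical, and the model is deliberately simple: real search engines parse pages in far more sophisticated ways.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageContentParser(HTMLParser):
    """Hypothetical parser that sorts HTML into 'on the page' and
    'off the page' content. A simplified model, not a real search engine."""

    def __init__(self):
        super().__init__()
        # Content the user can see.
        self.on_page = {"title": [], "headings": [], "text": [], "links": []}
        # Content present in the code but not shown as page content.
        self.off_page = {"meta": {}, "alt_text": []}
        self._current_tag = None
        self._in_script_or_style = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._in_script_or_style = True
        elif tag == "meta" and attrs.get("name"):
            self.off_page["meta"][attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "img" and attrs.get("alt"):
            self.off_page["alt_text"].append(attrs["alt"])
        elif tag == "a" and attrs.get("href"):
            self.on_page["links"].append(attrs["href"])
        self._current_tag = tag

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._in_script_or_style = False
        self._current_tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._in_script_or_style:
            return  # skip whitespace and script/style source code
        # Classify text by the last tag opened (reset to None at any closing tag).
        if self._current_tag == "title":
            self.on_page["title"].append(text)
        elif self._current_tag in HEADING_TAGS:
            self.on_page["headings"].append(text)
        else:
            self.on_page["text"].append(text)


SAMPLE = """<html><head>
<title>Garden Tools</title>
<meta name="description" content="Hand tools for gardeners">
</head><body>
<h1>Spades and Forks</h1>
<p>Our spades are made from steel.</p>
<a href="/forks.html">See forks</a>
<img src="spade.jpg" alt="Steel garden spade">
</body></html>"""

parser = PageContentParser()
parser.feed(SAMPLE)
print(parser.on_page)
print(parser.off_page)
```

The meta description and the image’s alt text end up in off_page even though neither appears as text on the rendered page.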

The information that can be indexed from a web page depends on how the page is written (on the page content) and built (off the page content). Together these determine what information is made available to a search engine.

Some web page content is difficult or impossible for search engines to access and index. For example, purely visual content, such as a picture of a word, cannot be ‘seen’ by a search engine, so the typed version of the word is often preferable. Other types of content that a search engine cannot easily index include dynamic content such as that generated by an online database, frames-based web pages, content created with Macromedia Flash and content accessed via JavaScript.
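The sketch below illustrates why some of this content is invisible. It assumes a simple crawler that reads only the raw HTML: it does not run JavaScript or read text inside images. The class IndexableTextExtractor and the sample page are hypothetical; the image is imagined to show the opening hours, and the script inserts a ‘Spring sale’ message when run in a browser.

```python
from html.parser import HTMLParser


class IndexableTextExtractor(HTMLParser):
    """Collects the words a simple, non-rendering crawler could index.

    Assumption: only raw HTML text is read; scripts are not executed
    and images are not analysed."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script source is code, not page text, so it is skipped.
        if not self._in_script:
            self.words.extend(data.split())


PAGE = """<body>
<h1>Opening hours</h1>
<img src="hours.png">
<div id="offers"></div>
<script>
document.getElementById("offers").textContent = "Spring sale";
</script>
<p>Open daily</p>
</body>"""

extractor = IndexableTextExtractor()
extractor.feed(PAGE)
print(extractor.words)  # ['Opening', 'hours', 'Open', 'daily']
```

Neither the hours shown in the image nor the ‘Spring sale’ text added by JavaScript reaches the crawler; only the typed HTML text does.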

The first step in search engine optimisation is to make sure page content is accessible to search engines so that it can be properly indexed. A web page built with care for search engine accessibility is commonly called a search engine friendly web page.
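As a rough illustration of checking for search engine friendliness, here is a hypothetical audit that flags a few of the problems described above: a missing title, images without alt text, frames, and Flash files. The rules, including spotting Flash by a ‘.swf’ file extension, are assumptions for this sketch rather than any search engine’s actual criteria, and it cannot detect problems such as database-generated content.

```python
from html.parser import HTMLParser


class FriendlinessAudit(HTMLParser):
    """Hypothetical checker for a few common accessibility problems."""

    def __init__(self):
        super().__init__()
        self.has_title = False
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.has_title = True
        elif tag == "img" and not attrs.get("alt"):
            self.problems.append(f"image without alt text: {attrs.get('src', '?')}")
        elif tag in ("frameset", "frame", "iframe"):
            self.problems.append(f"frame-based content: <{tag}>")
        elif tag in ("embed", "object"):
            # Assumption: Flash files are recognised by their .swf extension.
            source = attrs.get("src") or attrs.get("data") or ""
            if source.lower().endswith(".swf"):
                self.problems.append(f"Flash content: {source}")


def audit(html):
    """Return a list of problems found in the given HTML string."""
    checker = FriendlinessAudit()
    checker.feed(html)
    checker.close()
    if not checker.has_title:
        checker.problems.insert(0, "missing <title>")
    return checker.problems


print(audit('<html><body><img src="logo.gif"><embed src="intro.swf"></body></html>'))
# ['missing <title>', 'image without alt text: logo.gif', 'Flash content: intro.swf']
```

An empty list means none of these particular checks failed, not that the page is fully search engine friendly.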