10 April 2006
A brand new Microsoft research paper caught my attention recently. The paper reinforced my firm belief that one of the most important SEO factors in the future is going to be usage statistics (Google’s terminology) or popularity (Microsoft’s terminology, not to be confused with link popularity).
The paper, “Beyond PageRank: Machine Learning for Static Ranking”, deals with Microsoft’s RankNet technology and with devising query-independent ranking factors. RankNet is a relevancy technology based on training neural networks.
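For readers who want to see what "training a neural network to rank" boils down to, here is a minimal sketch of the pairwise loss that RankNet is generally described as using: the network scores two pages, and the loss penalises it when the page that should rank higher does not get the higher score. The function name and the toy scores are my own illustration, not code from the paper, and the network that produces the scores is left out entirely.

```python
import math

def ranknet_pair_loss(score_i: float, score_j: float, target: float = 1.0) -> float:
    """Cross-entropy loss for one pair of pages.

    target=1.0 means page i should rank above page j, 0.0 means the opposite.
    The modelled probability that i outranks j is the logistic function of the
    score difference; the expression below is that cross-entropy written in a
    simplified form. (Hypothetical helper for illustration only.)
    """
    diff = score_i - score_j
    return (1.0 - target) * diff + math.log(1.0 + math.exp(-diff))

# Equal scores: the model is undecided, loss is log(2).
print(ranknet_pair_loss(0.0, 0.0))
# Correct order gives a small loss, reversed order gives a large one.
print(ranknet_pair_loss(2.0, 0.0), ranknet_pair_loss(0.0, 2.0))
```

Training then simply means adjusting the network so that this loss, summed over many pairs, goes down.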
Microsoft has built a large dataset of common queries for which human judges ordered the top results. In the paper, the researchers tried to find which ranking factors produce rankings as close as possible to the ones made by those human judges.
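One simple way to measure "as similar as possible to the human ordering" is to count how many pairs of pages a ranking puts in the same order as the humans did. The sketch below assumes that measure; the function name, page names and scores are made up for illustration and are not taken from the paper.

```python
from itertools import combinations

def pairwise_agreement(human_order, scores):
    """Fraction of page pairs that the scores order the same way as the humans.

    human_order: list of page ids, best first (the human judgement).
    scores: dict mapping page id to the score a ranking factor assigned.
    """
    # combinations() keeps list order, so each pair is (higher, lower) per the humans.
    pairs = list(combinations(human_order, 2))
    if not pairs:
        return 0.0
    agree = sum(1 for higher, lower in pairs if scores[higher] > scores[lower])
    return agree / len(pairs)

human_order = ["page_a", "page_b", "page_c"]
scores = {"page_a": 3.0, "page_b": 1.0, "page_c": 2.0}
print(pairwise_agreement(human_order, scores))  # 2 of the 3 pairs agree
```

A factor that agrees with the human ordering on more pairs is, by this measure, the better ranking signal.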
The tested groups of ranking factors were: PageRank, Popularity, Anchor Text and inlinks, Page factors, Domain factors.
Popularity
The researchers tested the actual popularity of a web page (the number of times the page has been visited by users over a period of time). This is not link popularity, but usage popularity. There are 3 basic sources of popularity data: Microsoft’s toolbar, proxy logs and click tracking within the SERPs.
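To make the definition concrete, here is a rough sketch of turning a visit log into per-page popularity counts for a given period. The log format (timestamp and URL pairs) is an assumption of mine; real toolbar or proxy logs will look different, and the paper's own processing may well be more involved.

```python
from collections import Counter
from datetime import datetime

def popularity_counts(visits, start, end):
    """Count visits per URL with start <= timestamp < end.

    visits: iterable of (timestamp, url) tuples (hypothetical log format).
    """
    return Counter(url for ts, url in visits if start <= ts < end)

visits = [
    (datetime(2006, 4, 1), "example.com/a"),
    (datetime(2006, 4, 2), "example.com/a"),
    (datetime(2006, 4, 3), "example.com/b"),
    (datetime(2006, 5, 1), "example.com/a"),  # outside the April window
]
counts = popularity_counts(visits, datetime(2006, 4, 1), datetime(2006, 5, 1))
print(counts.most_common())
```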
Anchor text and inlinks
Information related to the anchor text of the page’s incoming links (total text, unique words etc).
Page factors
These were factors associated with the page and its URL. The researchers pointed out that they used 8 simple features and named only 2 of them specifically: the number of words in the body and the frequency of the most common word on the page.
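The two named features are easy to picture in code. The sketch below is only my reading of them: the tokenisation rule is an assumption, and I report the frequency of the most common word as a raw count, although a relative frequency would be an equally plausible interpretation.

```python
import re
from collections import Counter

def page_features(body_text):
    """Return the two page features named above for a page's body text."""
    # Assumed tokenisation: lowercase runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", body_text.lower())
    if not words:
        return {"word_count": 0, "top_word_frequency": 0}
    _top_word, top_count = Counter(words).most_common(1)[0]
    return {"word_count": len(words), "top_word_frequency": top_count}

print(page_features("The cat and the hat"))
```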
Domain factors
Factors computed as averages across all pages in the domain (like average number of outlinks, average PageRank per page etc.).
The researchers found that when PageRank alone was compared against the other, much simpler factors (such as Popularity and Page factors), PageRank was outscored. Basically, the other tested factors provided much better relevancy (measured against the human ranking).
Now comes the meaty part. The 2 factors that boosted ranking relevancy the most were Page factors and Popularity. Among the Page factors, the researchers found that the URL was insignificant and that the important factors were the ones based on on-page text.
Even though the researchers had very limited Popularity data, they found that this data was a very significant relevancy booster (second only to Page factors). They also found that the more Popularity data they gathered, the better the relevancy boost.
All of this reinforced my contention that one of the hottest new directions in search engine relevancy is usage statistics, or user/usage popularity, or in simple words: how many people visit a web site or page, how long visitors stay and how often they return.
The tricky part here is how to obtain this popularity data: toolbars, proxy logs, click tracking etc. Microsoft could have an edge in the future by having the most popular operating system and browser. Google has the toolbar, AdSense, AdWords, Analytics etc. The war over gathering usage data will be interesting, as Google has at least 2 patents dealing with it.
The old saying “make web pages for users, not for search engines” should be the new webmaster mantra.