Well, there’s no time like the present to clear the last of the IBA free site analyses. As they say, better late than never!
Please note: while the following analysis relates to bifsniff.com, many of the suggestions are highly relevant to any website.
Bifsniff.com - formerly the cartoon guys
I had the good fortune to meet Frank a few months back when he came along to the inaugural ShareIT event in Cork. Frank has also been kind enough to let me know every time WordPress has decided to rewrite my .htaccess file (which is more times than I care to mention - damn WP).
Page Navigation
- First port of call - the Canonical URL
- Comment Feeds in the Supplemental Bin
- Why pages go supplemental
- Controlling search engine access to your site
- Don’t channel PageRank to useless pages
- Having unique page titles and descriptions
First port of call - the Canonical URL
The very first thing I check when I go to a site is whether or not it resolves to a single canonical URL. This all sounds very technical, but in fact it’s a very simple test:
- In your browser address bar type your website address WITHOUT the www.
- Now do the same thing again, except this time with the www.
*IF* your site appeared for both addresses *AND* the address bar didn’t switch from one to the other (e.g. you typed www.[mysite].[myTLD] and it was not automatically changed to [mysite].[myTLD], or vice versa), then you are effectively publishing the same site twice. This is known as the “Canonical URL” issue.
You see, if Google can access your site via both the www. and non-www. addresses, it sees them as two different sites. Google does a pretty good job of filtering one or the other out of its results, but where this can hurt you is your backlink profile. Say lots of people have linked to your site, and those links point at the non-www. and www. addresses more or less evenly. In that situation you are effectively diluting the link love by splitting it between two sites. If you set up a really simple permanent (301) redirect from non-www. to www. (or vice versa), you’ll effectively double the link love pointing at the one remaining address in this example. This could have an effect on how your site ranks overall.
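The redirect itself is normally a couple of lines of server configuration (on Apache, usually mod_rewrite rules in .htaccess). Purely to illustrate the logic, here is a minimal Python sketch, assuming the bare domain is the canonical host as it is on bifsniff.com. The function name `canonical_redirect` is hypothetical; it returns the address a 301 redirect should point at, or `None` when the request is already on the canonical host.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the bare domain is canonical (as on bifsniff.com).
# To make the www. form canonical instead, swap these two values.
CANONICAL_HOST = "bifsniff.com"
ALIAS_HOST = "www.bifsniff.com"

def canonical_redirect(url):
    """Return the 301 target for a non-canonical URL, or None if it is canonical."""
    parts = urlsplit(url)
    if parts.hostname != ALIAS_HOST:  # .hostname is already lower-cased
        return None
    netloc = CANONICAL_HOST if parts.port is None else f"{CANONICAL_HOST}:{parts.port}"
    # Keep scheme, path, query and fragment so deep links keep their value.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(canonical_redirect("http://www.bifsniff.com/archives/?p=42"))  # -> http://bifsniff.com/archives/?p=42
print(canonical_redirect("http://bifsniff.com/archives/?p=42"))      # -> None
```

Preserving the path and query matters: a redirect that sends every www. request to the homepage would throw away the value of links pointing at individual posts.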
Now if Frank is reading this, he’s probably saying ‘tell me something I don’t know’. That’s because Frank mastered this a long time ago. Try typing www.bifsniff.com into your browser. Now take a look at the address bar: no www. there now, is there?
Comment Feeds in the Supplemental Bin
Frank mentioned that he had a lot of pages in the supplemental index. In particular, the comment feeds seemed to get “supped” (dropped into the supplemental index). This is actually quite common. The comment feed is generally only linked to from within the post itself, and you will rarely have external links pointing at your comment feed URL.
Curiously, this issue has been the focus of quite a bit of discussion (overview here, some more here) in the SEO field recently.
Before we go any further let’s take a step back and look at the problem faced by bifsniff.com. First we need to take a snapshot of the pages indexed in both Google’s main and supplemental indices. The following advanced operator commands will help us:
- Total indexed pages:
site:bifsniff.com
- Pages in supplemental index:
*** -RCredCardinalIE site:bifsniff.com
Query 1 gives us the total number of pages indexed (2,060), query 2 the number of pages in the supplemental index (1,180), and the difference between the two (880) is the number of pages in the primary index. The comment feeds are of low value and deservedly end up in the supplemental index. This is quite normal and generally won’t hurt your site. An argument can be made, however, for reducing the number of pages indexed to help ensure that Google gets your most important pages into the primary index.
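If you want to track these figures over time, the arithmetic is easy to script. A trivial Python sketch using the counts above (bear in mind that the site: counts Google reports are estimates; the function name is hypothetical):

```python
def index_breakdown(total_indexed, supplemental):
    """Return (pages in the primary index, share of indexed pages that are supplemental)."""
    primary = total_indexed - supplemental
    return primary, supplemental / total_indexed

primary, supp_share = index_breakdown(2060, 1180)
print(f"primary index: {primary} pages; supplemental: {supp_share:.0%}")
# -> primary index: 880 pages; supplemental: 57%
```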
Another Step Back - Why Pages go Supplemental
To better understand why you want to control which pages get indexed, you need to know why pages go supplemental. There have been many rumours and myths about this topic. Recently, however, Google has said on a number of occasions that the main reason a page ends up in the supplemental index is a lack of PageRank:
Get more quality backlinks. This is a key way that our algorithms will view your pages as more valuable to retain in our main index.
Source: Adam Lasnik comment here
…the main determinant of whether a url is in our main web index or in the supplemental index is PageRank.
If a page doesn’t have enough PageRank to be included in our main web index, the supplemental results represent an additional chance for users to find that page, as opposed to Google not indexing the page.
Source: Matt Cutts Google Hell post
PageRank is passed from one resource (page) to another via links. The pages that make up any website therefore have a calculable amount of PageRank to share between them. Let’s take a very simple example to show this.
So let’s say that your site has 6 PageRank units to share amongst its pages, and that all external links point at the homepage only (rarely the case in practice, although the homepage usually has the highest PageRank). Here’s how the site might look:

The homepage (‘Home’) links to 3 sub-category pages (‘SubCat1’, ‘SubCat2’, ‘SubCat3’), so each of these sub-category pages receives 2 PageRank units (i.e. 6/3) from the homepage. In turn, each sub-category page links to the other sub-category pages, to 2 inner pages, and back to the homepage (this is a classic ‘silo’ architecture).
- the homepage funnels PageRank to 3 sub-category pages
- each sub-category page funnels PageRank to 2 other sub-category pages, two inner pages, and back to the homepage
- each inner page funnels PageRank to 1 other inner page and back to its parent sub-category page
There are many reciprocal relationships within this very small example, and calculating the actual PageRank in and out of any page can become very complex as the number of pages and links on each page increases (I’m not even going to try).
What should be obvious, though, is that reducing the number of pages sharing the initial PageRank should increase the PageRank each remaining page receives. That in turn may bring some pages out of the supplemental index and into the primary index.
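For anyone who does want to try the numbers, here is a rough Python sketch of the example above. It is a simplified, PageRank-style model, not Google’s actual calculation and not the undamped 6/3 arithmetic used earlier. Its assumptions: all external link equity arrives at the homepage, every page passes on 85% of its score (the damping factor from the original PageRank paper) split evenly across its links, and the second run adds hypothetical comment feeds, one per inner page, each linking back to its post. Because no links leave the site, the 6 units are conserved, so the only question is how they are shared.

```python
# Simplified PageRank-style model (illustration only, not Google's formula):
# external equity enters at Home; each page passes a damped share of its score,
# split evenly across its outgoing internal links.
DAMPING = 0.85  # damping factor from the original PageRank paper
BUDGET = 6.0    # the '6 PageRank units' from the example

def silo_site(with_feeds=False):
    """Build the Home -> SubCat -> inner-page silo described above."""
    links = {"Home": ["SubCat1", "SubCat2", "SubCat3"]}
    for k in (1, 2, 3):
        sub = f"SubCat{k}"
        inner = [f"Inner{k}a", f"Inner{k}b"]
        others = [f"SubCat{j}" for j in (1, 2, 3) if j != k]
        links[sub] = others + inner + ["Home"]
        for page, sibling in ((inner[0], inner[1]), (inner[1], inner[0])):
            links[page] = [sibling, sub]
            if with_feeds:
                # Hypothetical comment feed per inner page, linking back to its post.
                feed = page + "-feed"
                links[page].append(feed)
                links[feed] = [page]
    return links

def scores(links, tol=1e-12, max_iter=1000):
    """Iterate until the scores stop changing (power iteration)."""
    pr = {page: BUDGET / len(links) for page in links}
    for _ in range(max_iter):
        new = {page: 0.0 for page in links}
        new["Home"] += (1 - DAMPING) * BUDGET  # external links feed the homepage
        for page, outs in links.items():
            share = DAMPING * pr[page] / len(outs)
            for target in outs:
                new[target] += share
        delta = max(abs(new[p] - pr[p]) for p in links)
        pr = new
        if delta < tol:
            break
    return pr

lean = scores(silo_site())
bloated = scores(silo_site(with_feeds=True))
for name, pr in (("silo", lean), ("with feeds", bloated)):
    total = sum(pr.values())
    print(f"{name}: {len(pr)} pages, total {total:.3f}, "
          f"average {total / len(pr):.3f}, Home {pr['Home']:.3f}")
```

Both runs end with the same 6-unit total, but the average per page falls from 0.6 (6/10) to 0.375 (6/16), and in this toy model the homepage and sub-category pages each lose score to the feeds. A real site leaks PageRank through external links and has far more pages, so treat the numbers as illustrative only.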
In bifsniff’s case the site has too many pages, and either not enough PageRank to support all of them or PageRank that isn’t being channelled optimally to support each page.
Controlling search engine access to your site
One trick here is to explicitly exclude pages that you don’t want indexed. For WordPress feeds you can use this useful plugin written by Joost de Valk. The plugin adds a noindex tag to your feeds so they will be followed but won’t get indexed.
As for the comment feeds that end up in the supps, it would be just as well to add a NOFOLLOW to the links pointing at them. This will take a bit of digging into your code, as the link is generated within /wp-includes/feed.php:
92. function comments_rss_link($link_text = 'Comments RSS', $commentsrssfilename = '') {
93. $url = comments_rss($commentsrssfilename);
94. echo "<a href='$url'>$link_text</a>";
95. }
Line 94 needs to be changed to:
echo "<a href='$url' rel='nofollow'>$link_text</a>";
That places the required “nofollow” value in the rel attribute, so Google should stop following those links and the comment feeds become much less likely to be indexed. By changing this code (rather than using Joost’s plugin) you keep your main feed indexed while keeping those pesky comment feeds out of the index.
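After editing feed.php it is worth checking the rendered pages. The following Python sketch (standard library only) lists comment-feed links that still lack rel="nofollow". It assumes WordPress-style permalinks, where a post’s comment feed lives at `<post permalink>/feed/` and the site-wide feed at `/feed/`; the helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

def looks_like_comment_feed(href):
    # Assumption: "/2007/06/some-post/feed/" is a comment feed, while the
    # site-wide "/feed/" (a single path segment) is the main feed.
    segments = urlsplit(href).path.strip("/").split("/")
    return len(segments) > 1 and segments[-1] == "feed"

class FollowedFeedLinks(HTMLParser):
    """Collects comment-feed <a> links that do not carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        rel = (attrs.get("rel") or "").lower().split()
        if looks_like_comment_feed(href) and "nofollow" not in rel:
            self.links.append(href)

def followed_feed_links(html):
    parser = FollowedFeedLinks()
    parser.feed(html)  # HTMLParser.feed() pushes markup into the parser
    parser.close()
    return parser.links
```

An empty list for a post page means every comment-feed link on it now carries nofollow.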
Don’t channel PageRank to useless pages
When I ran the site: operator command in Google I got the following results:

Here’s the ‘Secret Page’:

That page has a PageRank of 4 and, according to Yahoo!, no external links. The same goes for the Authors Login page.
I would NOFOLLOW those links (this might require some hard-coding hackery) and exclude those pages in my robots.txt file:
User-agent: *
Disallow: /secret/
Disallow: /private-authors-area/
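Before uploading a robots.txt change, it is easy to sanity-check which URLs it blocks. This sketch runs the rules above through Python’s standard `urllib.robotparser` module; `/secret/` and `/private-authors-area/` are the example paths from this post, and the other URLs are made up for illustration. It only tells you what a well-behaved crawler may fetch.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from above (example paths from this post).
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret/
Disallow: /private-authors-area/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Illustrative URLs: two should be blocked, two allowed.
for path in ("/", "/secret/", "/private-authors-area/login", "/2007/06/some-post/"):
    url = "http://bifsniff.com" + path
    print(path, "allowed" if parser.can_fetch("Googlebot", url) else "blocked")
```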
Any other pages that don’t add to the user experience could be excluded in a similar fashion.
Having unique page titles and descriptions
I noticed that bifsniff uses an identical META description throughout the site. I think it is always better to make each page’s META as unique as possible. There are two benefits here:
- search engines often use the META description as the snippet, so you should view your META as a call-to-action;
- unique META data *may* assist you when a page is at the margin of being duplicate content.
If you use WordPress, there are a number of plugins that let you add unique META descriptions and keywords to each page and post.
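Whichever plugin you use, a quick audit helps spot pages that still share a description. A minimal Python sketch, assuming you already have each page’s HTML in a dict keyed by URL (how you fetch it is up to you); the names are hypothetical:

```python
from collections import defaultdict
from html.parser import HTMLParser

class MetaDescription(HTMLParser):
    """Records the content of <meta name="description" ...>, if present."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

def duplicate_descriptions(pages):
    """pages maps URL -> HTML; returns {description: [urls]} for shared descriptions."""
    by_description = defaultdict(list)
    for url, html in pages.items():
        parser = MetaDescription()
        parser.feed(html)
        by_description[parser.description].append(url)
    return {d: urls for d, urls in by_description.items() if len(urls) > 1}
```

Pages with no description at all end up grouped under `None`, which is worth fixing too.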
Conclusion
Well, hopefully there’s quite a bit for Frank to go on there. There were a few other items that I came across (linking to archives), but I think this post is long enough.
Hopefully this post helps explain how search engines see each site in aggregate, and how Google in particular decides which pages to include in the primary and supplemental indices.
If you have any questions, or any of the above technical issues need de-mystifying please do leave me a comment below and I’ll try to better explain. (Glances at Aide from www.simplythebest.ie)