I’m a big fan of the new canonical tag (er, element, to be more technically correct). It’s a powerful tool for dealing with duplicate content. But it’s not exactly reliable yet. Google wants us to use it as if it were. Unquestionably, it’s a signal. But it can be ignored, even when it should clearly be obeyed.
Case in point: Northernsafety.com. Many thousands of non-canonical URLs are indexed. For example, click on some of the listings on this SERP and compare the URLs you were led to by Google to what’s listed as the canonical URL in the HTML source of these pages. You’ll see that the parameters OPC and PFM are present in the URLs in the search listings but are not present in the canonical link element. Hmmm.
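If you want to run that comparison across more than a handful of pages, something like the following rough sketch would do it: pull the canonical link element out of each page's HTML and list the query parameters that appear in the indexed URL but not in the canonical one. The URLs and HTML here are made-up placeholders (only the OPC and PFM parameter names come from the example above), and the function names are my own.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qsl


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens; compare case-insensitively
        if "canonical" in attr_map.get("rel", "").lower().split():
            self.canonical = attr_map.get("href") or None


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


def extra_params(indexed_url, canonical_url):
    """Query parameter names present in the indexed URL but not the canonical one."""
    indexed = {k for k, _ in parse_qsl(urlsplit(indexed_url).query, keep_blank_values=True)}
    canonical = {k for k, _ in parse_qsl(urlsplit(canonical_url).query, keep_blank_values=True)}
    return sorted(indexed - canonical)


# Hypothetical data standing in for a search listing and the page source behind it.
indexed_url = "http://www.example.com/product.aspx?id=123&OPC=456&PFM=search"
page_html = """<html><head>
<link rel="canonical" href="http://www.example.com/product.aspx?id=123">
</head><body>...</body></html>"""

canonical_url = find_canonical(page_html)
print(extra_params(indexed_url, canonical_url))
```

Any non-empty result means the indexed URL carries parameters the page itself says are not part of its canonical address.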
I know Google uses the element as a strong hint rather than an absolute directive; however, it sounded from Matt’s video like it’s about as strong a hint as a 301 redirect. If that were the case, I wouldn’t have expected to see this behavior. This example I found doesn’t look to me to be an “edge case,” and I don’t see any reason why Google shouldn’t trust or adhere to the canonical tag in this particular situation. So what gives?
If you’re thinking that perhaps the canonical tags were just added and haven’t had time to kick in yet, take a look at the Cached links on some of those search listings. Some of these pages were cached way back in March and yet still have the canonical tag present in the Cached version. Surely 2+ months is ample time for Google to canonicalize these pages?
I like canonical tags and I use them. But I always prefer 301 redirects over canonical tags, as 301s are pretty much *always* obeyed.
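For the parameter problem above, the 301 route could look roughly like this: work out the clean URL and, if it differs from the one requested, redirect to it. This is only a sketch. The parameter list and the function names are assumptions, and a real site would wire this into its web server or framework rather than call it by hand.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed list of tracking parameters that should never appear in an indexed URL.
TRACKING_PARAMS = {"OPC", "PFM"}


def redirect_target(url, tracking_params=TRACKING_PARAMS):
    """Return the URL to 301 to, or None if the URL is already clean."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k not in tracking_params]
    if len(kept) == len(pairs):
        return None
    return urlunsplit(parts._replace(query=urlencode(kept)))


def respond(url):
    """Return a (status, headers) pair that a request handler could send back."""
    target = redirect_target(url)
    if target is None:
        return "200 OK", []
    return "301 Moved Permanently", [("Location", target)]


print(respond("http://www.example.com/product.aspx?id=123&OPC=456&PFM=search"))
print(respond("http://www.example.com/product.aspx?id=123"))
```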
The lesson here: I wouldn’t bet my business on the canonical tag being obeyed by Google.
Yeah, a 301 is still the best in the fight against duplicates.
Stephan, have you seen an example of a URL using the canonical tag that has been removed from the index in the same way that a 301’d URL would be?
Hey Stephan,
I have noticed the same thing even on very trusted sites / places where the canonical tag should be trusted – I see the top search result for podcasting:
http://www.google.co.uk/search?q=podcasting
being the Wikipedia page which is not the canonical version according to the canonical tag (the ‘podcast’ page is).
I discovered this while researching my SMX London “give it up” presentation – it’s very interesting to see them talking as though it is implemented when it very clearly appears not to be.
One of the problems with using SEO commands in Google (such as site: and inurl:) together is that you don’t often get a real representation of what is showing up in the SERPs, Stephan. In this case, a search for “electrical arc protective clothing” http://www.google.com/search?q=electrical+arc+protective+clothing&hl=en&rlz=1T4GGLL_en&start=10&sa=N shows that the top northernsafety.com page is the canonical-defined version. What Google shows with these hacks is useful information regarding what they still have in their indexes, but an actual search without SEO commands seems to deliver the right URL, no?
@Chris – my Wikipedia example shows it happening ‘in the wild’ as it were…
@Will,
I appreciate your example. From what I can see, the cached version of /wiki/Podcasting (http://72.14.205.104/search?q=cache%3Ahttp%3A%2F%2Fen.wikipedia.org%2Fwiki%2FPodcasting&strip=1) from May 15 does not have the canonical element inside it; but notice that while my URL asks for the cache of /wiki/Podcasting, the Google box at the top of the page says it’s giving the cache of /wiki/Podcast.
But if you click through to the cached version of /wiki/Podcasting from the SERP you gave, the cache *does* show the canonical element.
That, and Wiki’s own weird internal redirection (e.g., when you’re on /wiki/Podcasting it says you’ve been “redirected from Podcasting” — meaning you’re seeing /wiki/Podcast content on the /wiki/Podcasting URL) makes me think there are more variables at play here than just the canonical element’s failure, although I’m not sure exactly what those variables are.
It’s interesting to see others’ perceptions via the post here. We are running some tests on a live site at present, and I will be happy to share any learnings that come about as a result.
As regards the canonical ‘tag’, where it differs is that it isn’t a 301 redirect, which by its nature forces users onto the target URL. That isn’t always realistic in commercial spheres, particularly with large organisations (no matter how hard you try), and a page-level element which does not impact on user experience gives us a potentially useful tool (obviously, if it works as Matt Cutts mentioned).
Given the fact it has been agreed to by the big 3, one can’t help but think it’s only a matter of time before things sort themselves out, surely!
@Erik,
There’s a simple explanation as to why you can’t see a canonical tag in the cache URL that you supplied. Your URL includes &strip=1 for the Text Only version of the cache. To make the page text-only, Google strips all <link> tags from the HTML as well as <img> tags.
So Will’s example of nonfunctioning canonical tags inside Wikipedia still stands as valid.
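To check this yourself, drop the strip parameter from the cache URL before fetching it. A small sketch (the helper name is mine, and it assumes the cache URL format quoted above still behaves as described):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def full_cache_url(cache_url):
    """Remove the strip parameter so the cache is not requested as text-only."""
    parts = urlsplit(cache_url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != "strip"]
    return urlunsplit(parts._replace(query=urlencode(kept)))


text_only = ("http://72.14.205.104/search"
             "?q=cache%3Ahttp%3A%2F%2Fen.wikipedia.org%2Fwiki%2FPodcasting&strip=1")
print(full_cache_url(text_only))
```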
As always, Stephan’s post is right on.
I made a comment over on SEOmoz’s post (tinyurl. com/l67rdf) which contained the top 5 SEO requests to dev teams.
Every SEO keeps praising the canonical tag. It works, but not as well as it could.
We have 37+ market based subdomains. Each market creates their own unique content but we also produce some “national” content that is available for them to use if they choose.
We use the canonical tag on national content to specify a market that should be seen as the content originator. This is an amazing resolution to a potential duplicate content issue, right? Well, not exactly. Since Google only uses the tag as a suggestion, it tends to be hit or miss. After much tracking and evaluation, Google still seems to rely more heavily on internal linking.
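To make the setup concrete, here is roughly what emitting that tag looks like. The hostname, path, and function name are invented for illustration only.

```python
from html import escape


def canonical_link(originator_host, path):
    """Build a canonical link element pointing at the originating market's copy."""
    href = f"http://{originator_host}{path}"
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


# Hypothetical: the same national article is served on every market subdomain,
# but each copy names one market as the originator.
print(canonical_link("chicago.example.com", "/national/winter-driving-tips"))
```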
Fortunately, 99% of our URLs are canonicalized already and we don’t have to worry about tracking tags, or other DUST issues. I would imagine that the canonical tag works better for these situations.
So it’s cool, but not that cool.