Append Tracking Information Without Creating Duplicate Content

I mentioned towards the end of my Search Engine Land article about redirects how you can use the hash or pound symbol (#) in a URL to append tracking information.

Why do this? Because it would prevent duplicate content (i.e. the same page at multiple URLs that look unique to the engines), and it would aggregate all link juice to the one canonical URL.

The # in a URL is usually used for sending visitors to an anchor within the page they are on (e.g. “Jump to top of page” or “Jump to Table of Contents”).

Appending tracking information to URLs with a # works from an SEO perspective because search engines ignore the # and everything after it. This effectively collapses the tracked URLs together.

Let’s take a look at a concrete example to see how this plays out. Imagine you linked to your “About Us” page from your blog and that link pointed to:

www.mythicalcompany.com/aboutus.php#blog

and from your site-wide footer on your ecommerce site you linked to:

www.mythicalcompany.com/aboutus.php#footer

Both URLs would be interpreted by Google and the other engines as:
www.mythicalcompany.com/aboutus.php
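
To make the collapsing concrete, here is a minimal TypeScript sketch of the normalization described above. The function name `stripFragment` is hypothetical, and real search engines do far more than this; the sketch only illustrates dropping the # and everything after it.

```typescript
// Hypothetical helper: drop the "#" and everything after it,
// which is roughly how a crawler treats the fragment part of a URL.
function stripFragment(url: string): string {
  const hashIndex = url.indexOf("#");
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

// The two tracked links from the example above.
const trackedLinks: string[] = [
  "www.mythicalcompany.com/aboutus.php#blog",
  "www.mythicalcompany.com/aboutus.php#footer",
];

// Both entries come out identical once the fragment is removed.
const collapsed: string[] = trackedLinks.map(stripFragment);
console.log(collapsed);
```

Because both results are the same string, the engines see one page at one URL rather than two near-duplicates.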

Yet the full URL (www.mythicalcompany.com/aboutus.php#footer) is available to any client-side JavaScript. So you could write a script that would pull what’s after the # and insert it into a cookie or otherwise send it to your server and/or web analytics.
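
Here is a rough sketch of such a script, written in TypeScript as two small functions so the logic can be checked outside a browser. The helper names, the cookie name `link_source`, and the 30-day lifetime are all assumptions made for illustration, not part of any particular analytics package.

```typescript
// Hypothetical helper: return the text after the "#", or null if there is none.
function extractTrackingTag(url: string): string | null {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1 || hashIndex === url.length - 1) {
    return null;
  }
  return url.slice(hashIndex + 1);
}

// Hypothetical helper: build a cookie string holding the tag.
// The cookie name "link_source" and the default lifetime are assumptions.
function buildTrackingCookie(tag: string, maxAgeDays: number = 30): string {
  const maxAgeSeconds = maxAgeDays * 24 * 60 * 60;
  return "link_source=" + encodeURIComponent(tag) + "; max-age=" + maxAgeSeconds + "; path=/";
}

// Example with the footer link from above.
const footerTag = extractTrackingTag("www.mythicalcompany.com/aboutus.php#footer");
if (footerTag !== null) {
  console.log(buildTrackingCookie(footerTag));
}
```

In a real page you would pass `window.location.href` to `extractTrackingTag` and assign the result of `buildTrackingCookie` to `document.cookie`.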

Note that the full URL will NOT show up in your log files, because web browsers only use what’s after the # to jump to the anchor within the page, and that’s done locally within the browser. In other words, the browser doesn’t send the full URL, so the anchor information (i.e. any text after the #) is not stored within environment variables like REQUEST_URI. Thus you cannot use a hash for passing parameters in your URL for use by your PHP (or ASP or whatever) scripts (at least not directly).

If you have a stats package that uses log file analysis, the anchor in a hash-containing URL never reaches your server logs. A workaround is to write and include a client-side script that sends a ping to a second URL with the necessary tracking info appended as a query string. Your script ignores any content returned from that ping URL. That way the stats package can pick up the tracking info from query string parameters as normal, but through the second URL requested by your script, not the first one originally requested by the web browser. Make sense?
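
A minimal TypeScript sketch of that workaround follows. The function name `buildPingUrl`, the ping path `/track.gif`, and the `src` and `page` parameter names are hypothetical; substitute whatever your stats package expects to find in the query string.

```typescript
// Hypothetical helper: build the ping URL that carries the tag as a query string.
// "/track.gif" and the "src" / "page" parameter names are assumptions.
function buildPingUrl(pageUrl: string, pingPath: string = "/track.gif"): string | null {
  const hashIndex = pageUrl.indexOf("#");
  if (hashIndex === -1 || hashIndex === pageUrl.length - 1) {
    return null; // no tracking tag, so there is nothing to ping
  }
  const page = pageUrl.slice(0, hashIndex);
  const tag = pageUrl.slice(hashIndex + 1);
  return pingPath + "?src=" + encodeURIComponent(tag) + "&page=" + encodeURIComponent(page);
}

// Example with the footer link from above.
const pingUrl = buildPingUrl("www.mythicalcompany.com/aboutus.php#footer");
console.log(pingUrl);
```

In a browser, one common way to fire the ping and discard the response is `new Image().src = pingUrl`, called with `window.location.href` as the input. The request for the ping URL then lands in your server logs with the tag in its query string.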

Comments

SEO Chatter says

August 22, 2008 at 8:29 pm

You rock.

Any ideas on possibly pulling the logged “#” URL into a PHP function? (maybe export Log entries with “#” URL requests to a site-internal dbase? can that be done on-the-fly in real time?)

Don’t hate a non-programmer, please. I just think you have the right viewpoint to be able to see and/or develop this idea, if it’s possible.

No matter, GREAT TIP!

Chat Man