I previously mentioned that Matt Cutts from Google gave some advice to webmasters of dynamic (database driven) web sites.
For one thing, Matt advised that if you have a dynamic web site, you should minimize the number of parameters in your URLs. You're very safe with fewer than 2 parameters, and you should keep the values of those parameters to fewer than 5 digits. Also, don't name a parameter "id", because Google is suspicious that such a parameter is a session ID or something other than a key field. Even if it's the only parameter in your URLs, try not to use it, particularly if that variable's value is long (5 digits or more). "sid" would be a bad choice too, since it could stand for session ID just as easily as a key field like story ID. Using these parameter names doesn't mean your pages won't be indexed; it just means those pages are at greater risk of not being included. You should be fine, though, if your pages are all already in Google.
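To make those guidelines concrete, here's a small sketch of how you might audit your own URLs against them. This is my own illustration, not anything Google publishes: the function name, the warning strings, and the exact thresholds (2+ parameters, 5+ digit values, the names "id" and "sid") are just the rules of thumb from the advice above, coded up.

```python
# Hypothetical audit helper based on the guidelines above.
# The thresholds and risky names come from Matt's advice, not from any
# official Google specification.
from urllib.parse import urlparse, parse_qsl

RISKY_NAMES = {"id", "sid"}  # names that look like session IDs to a crawler

def crawl_risk_warnings(url):
    """Return a list of warnings for a dynamic URL, per the guidelines."""
    params = parse_qsl(urlparse(url).query, keep_blank_values=True)
    warnings = []
    if len(params) >= 2:
        warnings.append("2 or more parameters")
    for name, value in params:
        if name.lower() in RISKY_NAMES:
            warnings.append(f"risky parameter name: {name}")
        if value.isdigit() and len(value) >= 5:
            warnings.append(f"long numeric value: {name}={value}")
    return warnings
```

Running it on a URL like story.php?id=36575 would flag both the "id" name and the 5-digit value, while story.php?story=42 would come back clean.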
Matt also mentioned something that should be a bit alarming to anyone with a dynamic site: Googlebot sometimes tries variations of URLs by dropping parameters. In other words, Googlebot may experiment with removing name-value pairs from the query string portion of your URLs (the part of the URL after the question mark) and seeing whether the pages still load. As I understand it, the reasoning is that if these variant pages show the same content as the page at the original URL, that tells Googlebot the omitted parameters are superfluous. So for example, a URL such as this:
www.bigyellow.com/cgi-bin/php/cities/unitedstates/mtg_detail.php?xsrc=&PID=36575&S=NY&T=&MTG=PR
might be shortened by Googlebot to:
www.bigyellow.com/cgi-bin/php/cities/unitedstates/mtg_detail.php?xsrc=&PID=36575&S=NY&T=
and
www.bigyellow.com/cgi-bin/php/cities/unitedstates/mtg_detail.php?xsrc=&PID=36575
and
www.bigyellow.com/cgi-bin/php/cities/unitedstates/mtg_detail.php?S=NY&T=&MTG=PR
etc.
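The experiment described above amounts to enumerating every way of dropping one or more name-value pairs from the query string. Here's a sketch of that enumeration; the function name is mine, and I'm obviously not claiming this is Googlebot's actual code, just that the combinatorics look something like this.

```python
# A sketch of the parameter-dropping experiment: generate every URL
# variant that omits one or more name-value pairs from the query string.
# This is an illustration of the idea, not Googlebot's actual behavior.
from itertools import combinations
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def query_variants(url):
    """Yield URLs with successively fewer query-string parameters."""
    scheme, netloc, path, query, frag = urlsplit(url)
    pairs = parse_qsl(query, keep_blank_values=True)
    # Keep n-1 pairs down to 1 pair, trying every combination at each size.
    for keep in range(len(pairs) - 1, 0, -1):
        for subset in combinations(pairs, keep):
            yield urlunsplit((scheme, netloc, path, urlencode(subset), frag))
```

A URL with 3 parameters yields 6 variants (3 with two parameters kept, 3 with one), so the number of candidate pages grows quickly with parameter count, which is presumably one reason Google reserves this kind of analysis for sites it considers worth the crawl budget.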
Then these URL variations would get spidered and compared with each other. I've heard of big websites getting hit by this, and of it causing them serious problems. Don't worry about Googlebot doing this to your site if you're not a big and important site, though; Matt stated that Google only does this deep-level analysis on big, quality sites. Anyone been subjected to this? And if so, what damage or inconvenience did it inflict on you?