AuthorityLabs Blog

Solving Canonical Problems with WWW

by Dawn Wentzell on September 29, 2011

One of the most common problems I see in websites is the same content being available at both the WWW and non-WWW versions of a domain. I’ve encountered this in nearly every website I’ve done an SEO audit for, and I see it every day when browsing the web. Despite it being so prevalent, it is indeed a problem.

Having the same content available on both the WWW and non-WWW versions of a domain (such as authoritylabs.com and www.authoritylabs.com) is a canonicalization problem. While you and I might realize they are in fact the same page, search engines can treat them as separate pages.

Most of the time, search engines can figure out that they are the same page and only include the canonical URL in their index. SEObook explains the canonical URL as:

The canonical version of any URL is the single most authoritative version indexed by major search engines. Search engines typically use PageRank or a similar measure to determine which version of a URL is the canonical URL.

Regardless, an unresolved canonicalization problem can result in indexing problems and duplicate content issues. Most importantly, it splits the link juice between the two versions as people link to and share both.

What you want to see is a redirection from the WWW to the non-WWW, or vice versa, so that if the wrong version is entered or linked to, the user is automatically taken to the canonical URL. Fortunately, this is relatively easy to set up.
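
In other words, the redirect keeps the path and query string and swaps only the hostname. Here is a minimal Python sketch of that mapping, assuming the WWW version is the one you chose as canonical. The names canonical_url and CANONICAL_HOST are made up for this illustration and are not part of any server configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the WWW hostname is the preferred one.
CANONICAL_HOST = "www.yourdomain.com"


def canonical_url(url: str, canonical_host: str = CANONICAL_HOST) -> str:
    """Return the URL a redirect should point to: same scheme, path and
    query string as the request, but on the canonical hostname."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme, canonical_host, parts.path, parts.query, parts.fragment)
    )
```

A link to the non-WWW version of a page and a link to the WWW version both map to the same address, which is exactly what you want search engines and visitors to end up with.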

Google Webmaster Tools

If you’ve verified your site with Google Webmaster Tools, you can set your preferred domain by going to Site Configuration > Settings, and selecting either ‘Display URLs as www.yourdomain.com’ or ‘Display URLs as yourdomain.com’.

This will make sure that Google only indexes your preferred canonical URL. However, it doesn’t fix the problem of splitting your link juice, so you should still set up a redirect using one of the following methods.

 

Redirect Using .htaccess

If your site is hosted on Apache, you can redirect from the WWW to the non-WWW, or vice versa, with a few lines in your .htaccess file.

Redirect WWW to non-WWW:

RewriteEngine On
RewriteCond %{HTTP_HOST} !^(yourdomain\.com)?$
RewriteRule ^(.*)$ http://yourdomain.com/$1 [R=301,L]

Redirect non-WWW to WWW:

RewriteEngine On
RewriteCond %{HTTP_HOST} !^(www\.yourdomain\.com)?$
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [R=301,L]
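
If regex makes your brain hurt, it can help to try the condition against a few hostnames before you touch the server. The following is a rough Python sketch that reuses the pattern from the non-WWW to WWW rule above. The function name needs_redirect is hypothetical, and Python's re module only stands in for Apache's regex engine, so treat the result as an approximation.

```python
import re

# Same pattern as the RewriteCond in the non-WWW to WWW rule above.
CANONICAL_HOST = re.compile(r"^(www\.yourdomain\.com)?$")


def needs_redirect(host_header: str) -> bool:
    """Mirror the negated RewriteCond: the rule fires when the
    Host value does NOT match the canonical pattern."""
    return CANONICAL_HOST.search(host_header) is None
```

With this pattern, yourdomain.com is redirected, and so are variants of the WWW hostname with a trailing period or a port number. The exact canonical hostname is left alone, and because the group is optional, so is an empty Host value.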

 

Redirect Using cPanel

aka The Lazy Way to Redirect Using .htaccess

If your website is hosted with a provider that uses cPanel, you can even set up your redirects without touching a line of code. This actually adds the redirect rule directly to the .htaccess file, but sometimes I’d rather not get my hands dirty. To do this, log in to your cPanel, and go to Redirects.

Redirect WWW to non-WWW: [screenshot of the cPanel Redirects form]

Redirect non-WWW to WWW: [screenshot of the cPanel Redirects form]

 

Redirect Using IIS7

With IIS7, there are two ways to do this, and both require the URL Rewrite extension.

The first method involves adding the following as the first rule in the system.webServer section of the web.config file of the site in question.

Redirect WWW to non-WWW:

<rewrite>
   <rules>
      <rule name="www to non www" enabled="true">
         <match url="(.*)" />
         <conditions>
            <!-- Fire only when the request came in on the WWW hostname -->
            <add input="{HTTP_HOST}" pattern="^www\.yourdomain\.com$" />
         </conditions>
         <action type="Redirect" url="http://yourdomain.com/{R:1}" redirectType="Permanent" />
      </rule>
   </rules>
</rewrite>

Redirect non-WWW to WWW:

<rewrite>
   <rules>
      <rule name="non www to www" enabled="true">
         <match url="(.*)" />
         <conditions>
            <!-- Fire for any hostname other than the WWW one -->
            <add input="{HTTP_HOST}" negate="true" pattern="^www\.yourdomain\.com$" />
         </conditions>
         <action type="Redirect" url="http://www.yourdomain.com/{R:0}" redirectType="Permanent" />
      </rule>
   </rules>
</rewrite>

The second way is using the user interface of the URL Rewrite module. You can follow the steps outlined on Scott Forsyth’s blog. I suppose you could call that the lazy way to redirect in IIS7.

 

Redirect Using nginx

Nginx is starting to gain popularity due to lower overhead and higher performance than other servers. For the redirect, you will add one of the following to the top of your site’s config file.

Redirect WWW to non-WWW:

server {
    listen 80;
    server_name www.yourdomain.com;
    rewrite ^/(.*) http://yourdomain.com/$1 permanent;
}

Redirect non-WWW to WWW:

server {
    listen 80;
    server_name yourdomain.com;
    rewrite ^/(.*) http://www.yourdomain.com/$1 permanent;
}

 

Whether you’re on Apache, IIS, or nginx, these methods only take a few minutes to set up, so you don’t have much of an excuse not to.
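
Once a redirect is in place, it is worth confirming that the non-canonical hostname really answers with a 301 pointing at the canonical one. Below is a small Python sketch of such a check using only the standard library. The function names is_canonical_redirect and check_host are invented for this example, and it assumes plain HTTP on port 80, as in the rules above.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def is_canonical_redirect(status: int, location: Optional[str], canonical_host: str) -> bool:
    """True when the response is a 301 whose Location header points
    at the canonical hostname."""
    return status == 301 and urlsplit(location or "").hostname == canonical_host


def check_host(host: str, canonical_host: str, path: str = "/") -> bool:
    """Send a HEAD request for `path` to `host` and inspect the answer.
    http.client does not follow redirects, so the 301 itself is visible."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return is_canonical_redirect(
            response.status, response.getheader("Location"), canonical_host
        )
    finally:
        conn.close()
```

For a site that prefers the WWW version, calling check_host("yourdomain.com", "www.yourdomain.com") should return True once the redirect works. A 302, a 200, or a Location on the wrong hostname all come back as False.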

Photo: Fabrizio Sciami/Flickr

About Dawn Wentzell

Dawn Wentzell is currently working in custom mobile app development as Project Manager, Mobile Technology at SpeakFeel Corporation. She has experience with SEO for both local businesses and national markets, loves to do site audits, and hates IIS hosting. You can find her at dawnwentzell.com or on Twitter at @saffyre9.

17 comments

Dawn Wentzell September 29, 2011 at 10:47 am

Thanks to Brian for the addition of nginx to this.

Brian LaFrance September 29, 2011 at 11:19 am

Starting to love me some nginx :D

g1smd September 29, 2011 at 11:20 am

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^www.example.com [NC]
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]

RewriteBase / is the default. No need to specify it.
You must escape literal periods in the RewriteCond pattern.
The code fails to redirect non-canonical non-www URLs with a trailing period and/or port number.

RewriteEngine On
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

Dawn Wentzell September 29, 2011 at 12:06 pm

Awesome, thanks for correcting me! I’ve updated the post, now I’m going to go update my website…

Dawn Wentzell September 29, 2011 at 12:07 pm

Also, regex is hard. My brain hurts a little bit now.

netmeg September 29, 2011 at 12:12 pm

Your Brian?

Dawn Wentzell September 29, 2011 at 12:14 pm

See? That’s how much my brain hurts! :P

g1smd September 29, 2011 at 11:25 am

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

RewriteBase / is the default. No need to specify it.
You must escape literal periods in the RewriteCond pattern.
The code fails to redirect non-canonical www URLs with a trailing period and/or port number.

RewriteEngine On
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Use example.com in blog posts. RFC 2606 reserves example.com, example.net, and example.org for this very purpose.

Rhea Drysdale September 29, 2011 at 12:54 pm

Love it. Simple wrap-up of solutions. Bookmarked for future client implementations. Really nice work here.

Tom Howlett September 30, 2011 at 6:08 am

Hi,
Great article.
Have you got any information about how to redirect /index.html to a version without the /index.html?
My preferred method of making these changes is through cPanel.

Dawn Wentzell September 30, 2011 at 9:46 am

Tom, that should be pretty easy through cPanel – on the Redirects page, select the 301 redirect and your domain, add index.html to the field after the slash. In the redirects field, enter http:// and your domain – with or without the www, however you want it to end up – and check off “Redirect with or without www.”

Hope that helps!

g1smd September 30, 2011 at 1:01 pm

For URLs with the index filename mentioned in the path part of the HTTP request, use this rule to strip the filename in a redirect:

RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(php[45]?|html?)(\?[^\ ]+)?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(php[45]?|html?) http://www.example.com/$1? [R=301,L]

Normally a request for http://www.example.com/folder/ is internally rewritten to /folder/index.html via the DirectoryIndex mechanism in order to serve the content. However, this rewritten path matches the RewriteRule pattern and if there was just a RewriteRule the rule would redirect again. You do not want that to happen.

To avoid this, you MUST also test THE_REQUEST in a preceding RewriteCond in order to be sure that the internal pointer is set to /folder/index.html because those things were in the original incoming external HTTP request and not because they have been recently set by a preceding internal rewrite. You MUST test THE_REQUEST otherwise you will end up with an infinite redirect-rewrite loop.

The index redirect MUST be placed before the non-www/www redirect. Failure to do so invokes an unwanted multiple step redirection chain for index requests for the non-canonical hostname.

The above code strips parameters too. It can be modified to not do so if you want.

Finally, the RewriteEngine On directive must appear just ONCE in the .htaccess file, and it must be placed before the very first ruleset.

Inform if there’s a problem. The above code was typed from memory.

Dawn Wentzell September 30, 2011 at 1:26 pm

Annnnd g1smd brings more awesomeness to the table.

Anthony Baker March 12, 2012 at 5:16 am

Thank you so much for making this so easy. I was looking at other ways of doing this and it was intensely complicated. This was a simple solution that worked beautifully! :-)

Usman Latif March 30, 2012 at 3:52 pm

I had been searching for a simple, to-the-point article to send to my client to help him understand the issues related to URL canonicalization and how to fix them. I found this article useful and have sent the client a link.
Thanks for making it easy.

Dawn Wentzell March 31, 2012 at 9:56 am

Thanks Usman, glad you were able to use it!

VBart April 9, 2012 at 9:05 am

server {
listen 80;
server_name yourdomain.com;
return 301 http://www.yourdomain.com$request_uri;
}
