In 2009, Google released support for a rel="canonical" tag, later adopted by Yahoo and Bing, so that webmasters could specify a preferred URL. In 2011, Google also added support for a rel="canonical" HTTP header, which lets webmasters specify a preferred URL for non-HTML files such as PDFs. A Bulletproof Digital expert who wants an easy way to link to these files without risking duplication issues should use this fix.
However, this method has been slow to catch on in the SEO industry for several reasons:
- SEOs tend to focus on traditional URL consolidation.
- Canonical headers may be more difficult to implement dynamically than an HTML tag.
- Access privileges may create issues.
- Additional server modules may be required.
- Server errors are a possibility if the implementation is not done correctly.
Even so, there are many advantages to using canonical tags, especially for PDF files. These tags can raise a site's value by allowing spiders to crawl and index these "pages" easily, so that the files form a natural part of link building. The files can also do well in PageRank. Unless webmasters know techniques that make the implementation easy and accurate, though, the number of users will remain small. Here are some tips for canonical tag implementation that will help you use these tags easily in your projects. If you need assistance implementing these tags, please contact our SEO-friendly web design team for help.
- HTTP Headers Utilizing PHP. If your document is served by PHP, you can easily add the rel="canonical" header with the header() function. Google's documentation gives the syntax: a Link header of the form Link: <URL>; rel="canonical".
Call header() before any HTML output is sent. The header takes the place of the traditional <link> tag in the page; the <link> tag format can still be used for other purposes.
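As a minimal sketch of the PHP approach, the snippet below builds the Link header in the form Google documents and sends it before any output. The helper name canonicalLinkHeader() and the example.com URL are illustrative assumptions, not part of any library.

```php
<?php
// Hypothetical helper: builds the header line in the documented
// 'Link: <URL>; rel="canonical"' form.
function canonicalLinkHeader(string $canonicalUrl): string
{
    return 'Link: <' . $canonicalUrl . '>; rel="canonical"';
}

// header() only works before output (even whitespace) has been sent,
// so guard the call. The URL is a placeholder for your preferred page.
if (!headers_sent()) {
    header(canonicalLinkHeader('http://www.example.com/page.html'));
}
```

Any HTML or file output would follow this call. If headers_sent() already returns true, something earlier in the script has produced output and the header is skipped.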
- HTTP Headers Utilizing .htaccess. If you have only a few PDF files, try the .htaccess method: place a Header directive that sets the Link header inside a <Files> block naming the PDF.
The directive points the PDF to its canonical URL, for example one ending in /page.html. You can also use the ~ character to match filenames with a regular expression, or use a wildcard string in the filename argument.
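The .htaccess method described above might look like the following sketch. It assumes mod_headers is enabled; the filenames and the example.com URL are placeholders.

```apache
# Requires mod_headers. Filenames and URLs below are placeholders.

# Exact filename: point one PDF at its canonical HTML page.
<Files "white-paper.pdf">
    Header add Link "<http://www.example.com/page.html>; rel=\"canonical\""
</Files>

# Regular-expression form: the ~ switches <Files> to regex matching, so
# white-paper.pdf, white-paper-v2.pdf, and so on share one canonical URL.
<Files ~ "^white-paper(-v[0-9]+)?\.pdf$">
    Header add Link "<http://www.example.com/page.html>; rel=\"canonical\""
</Files>
```

In practice you would keep only one of the two blocks, since both match white-paper.pdf and Header add would then send the Link header twice.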
- Advanced Dynamic HTTP Headers. First, create a PHP file to control PDF output. This is normally done by rewriting the URL so that requests for the PDF are routed through the PHP file.
This technique allows you to add the canonical HTTP header using conditional logic.
The script checks that the PDF exists. If it does, the script adds the header; if not, it returns a 404 error message. The script can also read a .csv or .txt file that lists more than one PDF. Be sure to send a Content-Type header indicating PDF; otherwise, the file will be handled as a text file.
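The dynamic approach can be sketched roughly as follows. This is not the original script: the function names, the pdfs directory, the canonicals.csv file, and its two-column layout (PDF filename, canonical URL) are all assumptions made for illustration, as is the rewrite rule mentioned in the comments.

```php
<?php
// Sketch of a hypothetical pdf.php handler. It assumes a rewrite rule
// along the lines of:  RewriteRule ^(.+\.pdf)$ pdf.php?file=$1 [L,QSA]

// Reads "filename,canonical-url" rows into an array keyed by filename.
function loadCanonicalMap(string $csvPath): array
{
    $map = [];
    $handle = @fopen($csvPath, 'r');
    if ($handle === false) {
        return $map; // missing map file: no canonical headers are added
    }
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($row) >= 2) {
            $map[trim($row[0])] = trim($row[1]);
        }
    }
    fclose($handle);
    return $map;
}

// Decides the status and headers without sending anything, so the
// conditional logic can be checked separately from the output.
function buildPdfResponse(string $file, string $pdfDir, array $canonicalMap): array
{
    $name = basename($file); // drop any directory components
    $path = $pdfDir . '/' . $name;
    $isPdf = strtolower(pathinfo($name, PATHINFO_EXTENSION)) === 'pdf';
    if (!$isPdf || !is_file($path)) {
        return ['status' => 404, 'headers' => [], 'path' => null];
    }
    // Without this, PHP's default text/html content type would be sent.
    $headers = ['Content-Type: application/pdf'];
    if (isset($canonicalMap[$name])) {
        $headers[] = 'Link: <' . $canonicalMap[$name] . '>; rel="canonical"';
    }
    return ['status' => 200, 'headers' => $headers, 'path' => $path];
}

// Sends the response built above: headers first, then the file or a 404 body.
function sendPdfResponse(array $response): void
{
    http_response_code($response['status']);
    foreach ($response['headers'] as $line) {
        header($line);
    }
    if ($response['path'] !== null) {
        readfile($response['path']);
    } else {
        echo 'PDF not found';
    }
}

// In pdf.php itself, the three pieces would be wired together like this:
// $map = loadCanonicalMap(__DIR__ . '/canonicals.csv');
// sendPdfResponse(buildPdfResponse($_GET['file'] ?? '', __DIR__ . '/pdfs', $map));
```

Keeping the decision in buildPdfResponse() separate from the output makes it easier to test the 404 and canonical cases before the script goes live.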
- Check headers carefully. You can verify your HTTP headers using Web Developer Toolbar for Firefox or Live HTTP Headers, or you could choose a third-party web-based tool.
Be sure to test dynamic HTTP headers before pushing them to the web to ensure quality control.
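If you would rather script the check, here is a small sketch in PHP. The function findCanonicalTarget() is a hypothetical helper, and its pattern only handles the simple case where rel="canonical" directly follows the URL; PHP's built-in get_headers() supplies the header lines.

```php
<?php
// Hypothetical helper: returns the rel="canonical" target found in an
// array of raw response header lines, or null if none is present.
function findCanonicalTarget(array $headerLines): ?string
{
    foreach ($headerLines as $line) {
        if (!is_string($line) || stripos($line, 'Link:') !== 0) {
            continue; // not a Link header line
        }
        // Match <URL> directly followed by rel="canonical" (quotes optional).
        if (preg_match('/<([^>]+)>\s*;\s*rel="?canonical"?/i', $line, $m)) {
            return $m[1];
        }
    }
    return null;
}

// Usage against a live URL (placeholder). get_headers() performs an HTTP
// request and returns the response header lines, or false on failure.
// $lines = get_headers('http://www.example.com/white-paper.pdf');
// echo $lines === false
//     ? 'request failed'
//     : (findCanonicalTarget($lines) ?? 'no canonical header');
```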
It is that simple! All you need is PHP and Apache with mod_rewrite and mod_headers enabled. These scripts are simple and represent only one way to add canonical tags; you can probably find more as you experiment. Note that Windows running IIS does not utilize .htaccess files without third-party extensions.