Technical SEO issues can be challenging, but solving them is very rewarding. The same problems tend to occur across many websites, so it pays to learn the methods for fixing the most common ones. Here are solutions to ten common SEO challenges that will have your site running clean in no time.
1) Uppercase vs. lowercase in URL names. Websites built on .NET often serve the same page at both uppercase and lowercase URLs, which search engines can treat as duplicate content. While search engines are getting better at choosing the canonical version on their own, there is still room for improvement. Use the URL Rewrite module available on IIS 7 servers, which lets you enforce lowercase URLs: just add the rule to your web.config file (a sample rule is sketched after the articles below). You can also take a look at these articles:
What every SEO should know about IIS by Dave Sottimano
IIS SEO Toolkit secrets you may not know by Dan Butler
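As a rough illustration, a rule along these lines inside the <system.webServer> section of web.config will 301 any URL path containing uppercase letters to its lowercase form (a minimal sketch, not a drop-in fix; test it against your own URLs, since it will also lowercase paths that legitimately contain capitals, and it leaves query strings untouched):

    <rewrite>
      <rules>
        <!-- 301-redirect any request whose path contains an uppercase letter -->
        <rule name="LowercaseUrls" stopProcessing="true">
          <match url=".*[A-Z].*" ignoreCase="false" />
          <action type="Redirect" url="{ToLower:{R:0}}" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>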
2) More than one homepage version. Look out for pages like www.example.com/default.aspx: search engines see them as duplicates of your homepage. The duplicate may also appear as www.example.com/index.html or www.example.com/home. To find them, export a crawl of your site to a .csv file, filter by META title, and search for your homepage title. Point each duplicate to your "real" homepage with a 301 redirect (a sample rule is sketched after the links below). To find internal links that point to the duplicate page, use a tool like Screaming Frog. You can also compare the cache dates and PageRank of the different versions to work out which duplicates search engines have picked up.
Here are two more articles on this topic:
How to implement redirects using htaccess
Google guidelines on 301 redirects
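On an Apache server, for example, a .htaccess rule along these lines 301s the duplicate homepage URLs listed above back to the root; it matches THE_REQUEST so it will not loop if the server internally maps / to index.html (a sketch only; on IIS the equivalent is a URL Rewrite rule in web.config):

    # 301 duplicate homepage versions to the canonical root
    RewriteEngine On
    RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(default\.aspx|index\.html|home)[\s?] [NC]
    RewriteRule ^ / [R=301,L]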
3) URLs with query parameters at the end. While you generally see these on eCommerce sites, they can occur anywhere. For example, you might find them at the end of a URL that filters a category, such as www.example.com/product-category?colour=12. These URLs can use up a lot of your crawl budget, especially when there are two or more parameters, such as size and colour, that can be combined in more than one way.
This is a more complex issue and requires a bit of thinking on the webmaster's part. First, decide which of these pages you actually want crawled and indexed, based on user search volume. If the pages are already indexed, fix them with a rel=canonical tag. If they are not yet indexed, you can block the URL pattern in your robots.txt file instead; a sketch of both options follows. You can also use the Fetch as Google tool to check how Googlebot sees an individual URL.
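As a sketch using the example URL above: the canonical tag goes in the <head> of the filtered page and points at the main category page, while the robots.txt pattern keeps unindexed parameterised URLs out of the crawl. Use one approach per URL set; once a URL is blocked in robots.txt, crawlers can no longer see the canonical tag on it.

    <!-- On www.example.com/product-category?colour=12 (already indexed) -->
    <link rel="canonical" href="http://www.example.com/product-category" />

    # robots.txt, for parameterised URLs that are not yet indexed
    User-agent: *
    Disallow: /*?colour=
    Disallow: /*?size=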
4) Soft 404s. A soft 404 looks like a "real" 404 but returns a status code of 200, which tells crawlers the page is working correctly. Every error page that gets crawled this way is a waste of your crawl budget. Although you may want to take the time to find the broken links that cause many of these errors, it is easier to simply configure the error page to return a true 404 code. Use Google Webmaster Tools to locate soft 404s, or check the status codes yourself with Web Sniffer or the Ayima tool for Chrome.
An additional resource for this problem is the Google Webmaster blog post on soft 404s.
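To confirm whether a supposedly missing page really returns a 404, you can also check the status code from the command line (curl assumed available; the URL is a placeholder):

    # Should print 404 for a genuinely missing page, not 200
    curl -s -o /dev/null -w "%{http_code}\n" http://www.example.com/this-page-does-not-exist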
5) 302 instead of 301 redirects. Users do not see the difference, but search engines treat these two redirects very differently. A 301 is permanent and passes authority to the destination URL; a 302 is temporary, so search engines keep treating the original URL as the valid one. Use Screaming Frog or the IIS SEO Toolkit to filter your redirects by status code, then update your redirect rules to return 301s where the move is permanent (a quick illustration follows the links below).
You can read more here:
SEOmoz guide on learning redirects
Ultimate guide to 301 redirects by Internet Marketing Ninjas
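On Apache, for instance, both the Redirect directive and the mod_rewrite R flag default to a 302, so the permanent status has to be stated explicitly (paths are placeholders):

    # mod_alias: "Redirect" without a status code sends a 302
    Redirect 301 /old-page http://www.example.com/new-page

    # mod_rewrite equivalent
    RewriteEngine On
    RewriteRule ^old-page$ http://www.example.com/new-page [R=301,L]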
6) Sitemaps with dated or broken information. Update your XML sitemaps on a regular basis to avoid broken links; some search engines will flag your site if too many broken URLs are returned from your sitemap. Audit the sitemap for broken links, then ask your developers to make it dynamic so it stays current. You can also break your sitemap into separate files, one for frequently updated content and one for content that rarely changes (see the sketch after the link below).
Read this article for more on this topic:
How to check for dirt in your sitemap by Everett Sizemore
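One way to handle the split is a sitemap index file pointing at the separate sitemaps, so each file can be regenerated on its own schedule (a minimal sketch; filenames and dates are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- regenerated frequently -->
      <sitemap>
        <loc>http://www.example.com/sitemap-products.xml</loc>
        <lastmod>2013-06-01</lastmod>
      </sitemap>
      <!-- rarely changes -->
      <sitemap>
        <loc>http://www.example.com/sitemap-static.xml</loc>
        <lastmod>2013-01-15</lastmod>
      </sitemap>
    </sitemapindex>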
7) Wrong ordering in robots.txt files. Your robots.txt file has to be written correctly or search engines may still crawl pages you meant to block. This usually happens when the directives are correct individually but do not work together as expected; Google's guidelines spell this out. In particular, if you give Googlebot its own User-agent group, it obeys only that group and ignores the general User-agent: * rules, so repeat any directives you also want Googlebot to follow inside its own group (see the example below).
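For example, with hypothetical paths, the file below blocks /private/ for every crawler except Googlebot, because Googlebot reads only its own group; any rule it should also obey has to be repeated there:

    User-agent: *
    Disallow: /private/

    # Googlebot ignores the * group above once this group exists
    User-agent: Googlebot
    Disallow: /staging/
    Disallow: /private/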
8) Invisible characters in robots.txt. Although rare, an "invisible character" (for example, a stray byte-order mark at the start of the file) can show up in your robots.txt file and stop directives from being read. If all else fails, look for the character from the command line as shown below, or simply rewrite the file and check it again for errors. You can also get help from Craig Bradford's write-up of this problem over at Distilled.
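A quick way to hunt for such a character is to dump the raw bytes of the live file; a UTF-8 byte-order mark, for instance, shows up as ef bb bf before the first directive (curl and hexdump assumed available):

    # Anything before "User-agent" in the output is suspect
    curl -s http://www.example.com/robots.txt | hexdump -C | head -n 3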
9) Base64 URL problems with the Google crawler. If you experience a massive number of 404 errors, check the format of the reported URLs. If you see one that looks like this:
/aWYgeW91IGhhdmUgZGVjb2RlZA0KdGhpcyB5b3Ugc2hvdWxkIGRlZmluaXRlbHkNCmdldCBhIGxpZmU=/
you might have an authentication problem: some authentication or session modules generate token URLs like this, which Googlebot then tries to crawl. Add a wildcard pattern to your robots.txt file to stop Google from crawling these links; robots.txt does not support full regular expressions, but Google honours * and $ wildcards. You may have to trial-and-error this fix, so verify the result in Webmaster Tools (a sketch follows).
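A hypothetical starting point, based on the example above (whose base64 padding puts an = before the trailing slash), is to block that pattern for Googlebot and then confirm that no legitimate URLs are caught:

    # robots.txt: Google honours * wildcards; adjust the pattern to whatever your generated URLs share
    User-agent: Googlebot
    Disallow: /*=/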
10) Server misconfigurations. The "Accept" header sent by a browser tells the server which content types it understands; a server that mismatches content types against this header can return the wrong response to a crawler. Googlebot sends "Accept: */*" when crawling, a generic value meaning it accepts any content type, so make sure your server handles that header correctly. The HTTP header viewer at http://www.ericgiguere.com/tools/http-header-viewer.html is a handy way to inspect the headers being sent and received.
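To see how your server behaves for the crawler, you can approximate Googlebot's request from the command line and inspect the response headers (a sketch; the user-agent string and URL are illustrative):

    # Fetch headers the way Googlebot would: wildcard Accept, Googlebot user agent
    curl -s -I -H "Accept: */*" \
         -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
         http://www.example.com/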
To receive help with your technical SEO problems, contact a knowledgeable SEO company in Orange County like Bulletproof Digital.