Canonicalization is the process of telling search engines which URL is the preferred version when the same content is available at more than one URL. This matters for SEO because it helps prevent duplicate content issues. This blog post will show you how to use Screaming Frog to audit canonicalized URLs; it is an excellent tool for diagnosing canonicalization issues on your website. Let’s get started!
What are Canonical Tags?
A canonical tag is an HTML element that helps webmasters prevent duplicate content issues on their sites. The tag is used to specify the “canonical” or “preferred” version of a page and is placed within the <head> section of the code.
When search engines crawl a site, they often find multiple versions of the same page (for example, when the same content is accessible via several URLs). This creates problems for webmasters and users alike, because it becomes harder to tell which version is the original source of the content.
For example, search engines treat each of these as a different URL:
- http://seonorth.ca
- https://seonorth.ca
- https://www.seonorth.ca
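To see why these count as separate URLs, the short Python sketch below compares them as plain strings and then maps each one to a single preferred form. The preferred form (HTTPS, no ‘www.’ prefix, and a trailing slash on the bare domain) is an assumption chosen to match the canonical tag example in this post, and the `normalize` helper is hypothetical; it illustrates the idea rather than how any search engine actually groups URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Map a URL variant to one preferred form (hypothetical helper).

    Assumed preferred form: https scheme, no "www." prefix, "/" for an
    empty path. Ports and fragments are dropped; the query string is kept.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    return urlunsplit(("https", host, path, parts.query, ""))


variants = [
    "http://seonorth.ca",
    "https://seonorth.ca",
    "https://www.seonorth.ca",
]

print(len(set(variants)))                # 3: three distinct URLs as strings
print({normalize(u) for u in variants})  # {'https://seonorth.ca/'}
```

On a real site this mapping is usually enforced with redirects, while the canonical tag tells search engines which of the surviving versions to index.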
The canonical tag tells search engines which version of the page they should index, making it easier to find and assess the quality of the content.
For example, adding this tag to the <head> of each version specifies which URL is the preferred one:
<link rel="canonical" href="https://seonorth.ca/" />
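A crawler such as the SEO Spider has to read this element out of each page’s HTML. As a rough illustration only (not Screaming Frog’s actual implementation), here is a Python sketch using the standard library’s `html.parser`; the `CanonicalFinder` class is hypothetical. It collects every `rel="canonical"` link it sees and, for simplicity, does not check that the element sits inside `<head>`.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rel_values and href:
            self.canonicals.append(href)

    # Self-closing tags such as <link ... /> go through handle_startendtag,
    # which calls handle_starttag by default, so they are covered too.


page_html = """<html><head>
<link rel="canonical" href="https://seonorth.ca/" />
</head><body><p>Hello</p></body></html>"""

finder = CanonicalFinder()
finder.feed(page_html)
print(finder.canonicals)  # ['https://seonorth.ca/']
```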
Canonical tags can also support a site’s search engine rankings: instead of splitting links and other signals across several duplicate URLs, they consolidate them onto the preferred one. Search engines treat the tag as a strong hint rather than a strict directive, but canonical tags are still an essential tool for any webmaster who wants to make sure their site is being properly indexed by search engines.
To begin, download the Screaming Frog SEO Spider, which is free for crawling up to 500 URLs: https://www.screamingfrog.co.uk/seo-spider/
Store & Crawl Canonical Settings
This option is enabled by default, so no setup is needed unless you’ve changed the configuration.
To manage these settings, go to Configuration in the top navigation, then Spider, and find the Canonicals options in the Page Links section (shown in the image).
Crawl the Website
Now you need to crawl the website to gather the canonical data from your website.
To begin the crawl, enter your URL in the ‘Enter URL to spider’ box and hit ‘Start’.
When the crawl has been completed, you can see all of the pages crawled in the results box.
Select the Canonicals Tab
Each row in the main window pane shows a URL discovered during the crawl, with its rel="canonical" link element and HTTP canonical in separate columns.
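HTTP canonicals are sent in the HTTP response rather than in the HTML, using a `Link` header such as `Link: <https://seonorth.ca/>; rel="canonical"`, which is useful for non-HTML files like PDFs. The Python sketch below pulls canonical URLs out of such a header value. `canonicals_from_link_header` is a hypothetical helper, and its regular-expression parsing is deliberately simplified: it does not handle commas inside quoted parameter values.

```python
import re

# One Link header entry: "<URL>" followed by any ";"-separated parameters,
# stopping at the "," that separates entries.
LINK_ENTRY = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def canonicals_from_link_header(header: str) -> list[str]:
    """Return URLs whose rel parameter includes "canonical" (hypothetical helper)."""
    urls = []
    for match in LINK_ENTRY.finditer(header):
        url, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() != "rel":
                continue
            rel_values = value.strip().strip('"').lower().split()
            if "canonical" in rel_values:
                urls.append(url)
    return urls


header = '<https://seonorth.ca/style.css>; rel="preload", <https://seonorth.ca/>; rel="canonical"'
print(canonicals_from_link_header(header))  # ['https://seonorth.ca/']
```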
The Canonicals tab has six filters that can help you understand and detect common canonical issues.
You can narrow down your search by selecting the following in the right-hand overview pane:
- Contains Canonical – The page has a canonical URL tag on the page. This may be a self-referencing canonical URL where the page URL is identical to the canonical URL, or it may be ‘canonicalized,’ with the canonical URL being distinct from the page URL.
- Self-Referencing – The page’s canonical URL is the same as the URL being crawled (hence, it’s self-referencing). Ideally, only canonical versions of URLs should be linked internally, and every URL should have a self-referencing canonical to avoid potential duplicate content problems.
- Canonicalised – The page has a canonical URL that is different from its own URL; the address is ‘canonicalized’ to another location. This tells search engines not to index the page and to consolidate its indexing and linking properties to the target canonical URL. These addresses should be reviewed carefully.
- Missing – There’s no link element or HTTP header specifying a canonical URL. If a page does not include a canonical URL, Google will choose what it believes to be the best version of the URL. This can lead to ranking unpredictability, so every page should specify a canonical URL.
- Multiple – The page has more than one canonical set (multiple link elements, multiple HTTP headers, or both combined). This can cause problems, because a page should have only one canonical URL, specified through a single implementation (link element or HTTP header).
- Non-Indexable Canonical – The canonical URL points to a non-indexable page. This includes canonicals that are blocked by robots.txt, return no response, redirect (3XX), return a client error (4XX) or server error (5XX), or are ‘noindex’. Canonical URLs should always be indexable pages that return a ‘200’ response.
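The filter definitions above boil down to a few comparisons per page. The Python sketch below applies them to hypothetical crawl records; the `Page` record, the `canonical_filters` function, and the example URLs are illustrative assumptions, not Screaming Frog’s code or export format. It simplifies in two ways: only the first canonical is compared against the page URL, and a canonical target that is missing from the crawl data is treated as unknown rather than flagged.

```python
from dataclasses import dataclass, field
from urllib.parse import urljoin


@dataclass
class Page:
    """Hypothetical crawl record for a single URL."""
    url: str
    status: int = 200       # HTTP status code returned by the URL
    noindex: bool = False   # True if the page carries a 'noindex'
    canonicals: list[str] = field(default_factory=list)  # link elements + HTTP headers


def canonical_filters(page: Page, crawl: dict[str, Page]) -> set[str]:
    """Return the Canonicals-tab-style filters that apply to `page`."""
    if not page.canonicals:
        return {"Missing"}
    filters = {"Contains Canonical"}
    if len(page.canonicals) > 1:
        filters.add("Multiple")
    # Resolve a relative canonical (e.g. "/") against the page URL first.
    target = urljoin(page.url, page.canonicals[0])
    filters.add("Self-Referencing" if target == page.url else "Canonicalised")
    # A target absent from the crawl is treated as unknown, not flagged.
    target_page = crawl.get(target)
    if target_page is not None and (target_page.status != 200 or target_page.noindex):
        filters.add("Non-Indexable Canonical")
    return filters


home = Page("https://seonorth.ca/", canonicals=["https://seonorth.ca/"])
dup = Page("https://seonorth.ca/?ref=x", canonicals=["/"])
old = Page("https://seonorth.ca/old", status=301)
bad = Page("https://seonorth.ca/a", canonicals=["https://seonorth.ca/old"])
bare = Page("https://seonorth.ca/b")
crawl = {p.url: p for p in (home, dup, old, bad, bare)}

for p in (home, dup, bad, bare):
    print(p.url, sorted(canonical_filters(p, crawl)))
```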
Auditing Canonical Tags
Canonical tags are among the most important elements to get right when optimizing pages for search engine visibility. Without them, search engines may index multiple versions of the same page, which splits ranking signals and makes it less predictable which version appears in results. That’s why auditing your canonical tags regularly is so important: it ensures that search engines index the correct version of each page on your site. Fortunately, auditing canonical tags is relatively simple.
Viewing Non-Indexable Canonical URLs
The ‘URL Info’ tab in the lower window pane shows why a canonical is non-indexable. In the example below, the canonical URL is non-indexable because it redirects.
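The reasons shown in the ‘URL Info’ tab correspond to the conditions listed under the Non-Indexable Canonical filter. The Python sketch below turns those conditions into a simple lookup; the function name, its parameters, and the wording of the returned reasons are hypothetical and may differ from the labels Screaming Frog displays.

```python
from typing import Optional


def non_indexable_reason(status: Optional[int], noindex: bool = False,
                         robots_blocked: bool = False) -> Optional[str]:
    """Explain why a canonical target is non-indexable (hypothetical helper).

    `status` is the HTTP status code, or None if the server gave no response.
    Returns None when the target looks indexable (a 200 without 'noindex').
    """
    if robots_blocked:
        return "Blocked by robots.txt"
    if status is None:
        return "No response"
    if 300 <= status < 400:
        return "Redirect (3XX)"
    if 400 <= status < 500:
        return "Client error (4XX)"
    if 500 <= status < 600:
        return "Server error (5XX)"
    if status != 200:
        return f"Non-200 response ({status})"
    if noindex:
        return "noindex"
    return None


print(non_indexable_reason(301))                # Redirect (3XX)
print(non_indexable_reason(200, noindex=True))  # noindex
print(non_indexable_reason(200))                # None
```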
You can also create bulk canonical reports from the ‘Reports’ menu. These reports export data about:
- Canonical Chains – A canonical chain occurs when a page’s canonical URL points to a URL that is itself canonicalized or redirects elsewhere (for example, page A canonicalizes to B, and B canonicalizes to C). Canonicals, like redirects, can be chained and can even include loops. Search engines have to follow the chain to reach the final URL, so each page should point directly to the final canonical URL instead.
- Non-Indexable Canonicals – Pages whose canonical URL points to a non-indexable target, for example one that is blocked by robots.txt, redirects (3XX), returns a client or server error (4XX/5XX), or is marked ‘noindex’. Search engines may ignore such canonicals, so each one should be updated to point to an indexable URL that returns a ‘200’ response.
These exports are often much easier to digest and work through than the in-app view, and they can be sent straight to a developer to fix.
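To make chains and loops concrete, here is a Python sketch that follows canonical targets from a starting URL until it reaches a self-referencing or unmapped URL, or revisits a URL it has already seen (a loop). The `follow_canonicals` function and the example mapping of paths are hypothetical, and the sketch treats a redirect the same as a canonical hop, which is a simplifying assumption.

```python
def follow_canonicals(start: str, canonical_of: dict[str, str]) -> tuple[list[str], bool]:
    """Follow canonical (or redirect) targets starting from `start`.

    Returns the path of URLs visited and whether it ended in a loop.
    The walk stops at a URL that is self-referencing or has no entry in
    `canonical_of`, or when a URL repeats (a loop).
    """
    path = [start]
    while True:
        current = path[-1]
        target = canonical_of.get(current)
        if target is None or target == current:
            return path, False
        if target in path:
            return path + [target], True
        path.append(target)


canonical_of = {
    "/a": "/b",  # /a canonicalizes to /b ...
    "/b": "/c",  # ... which canonicalizes (or redirects) to /c
    "/c": "/c",  # /c is self-referencing: the end of the chain
    "/x": "/y",  # /x and /y point at each other: a loop
    "/y": "/x",
}

for url in ("/a", "/c", "/x"):
    path, is_loop = follow_canonicals(url, canonical_of)
    hops = len(path) - 1
    label = "loop" if is_loop else ("chain" if hops > 1 else "ok")
    print(url, " -> ".join(path), label)
```

A single hop (A canonicalized to B, where B is self-referencing) is a normal canonicalized page; only two or more hops count as a chain here.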
Canonicalization is essential to SEO and should be implemented on your website to avoid duplicate content issues. Fortunately, Screaming Frog makes it quick to diagnose canonicalization issues so you can fix them, which can improve how search engines index and rank your website. Please let me know if I forgot anything in the feedback form below.
What is a Canonicalized URL?
A canonical URL is the URL of the best representative page from a group of duplicate pages, according to Google. For example, if you have two URLs for the same page (such as example.com?dress=1234 and example.com/dresses/1234), Google chooses one as canonical.
Published on: 2022-07-14
Updated on: 2023-07-27