A website audit is a great way to make sure your site is up to date with the latest SEO standards and complies with Google’s guidelines. But what if I told you there was an easier, more efficient way? Screaming Frog offers many features to help you do a complete audit of your site, both onsite and offsite. In this post, we’ll dive into some of the best settings for audits so you can get started asap!
This is not a list of all the possible settings in this tool, only the ones I change from the default. So for any setting that isn’t mentioned here, it’s best to keep it at its default.
Configuration >> System >> Storage Mode
In Screaming Frog, you have two options for processing and saving the data: RAM or HDD/SSD Storage.
RAM vs. SSD
The default RAM storage mode is ideal if your site has fewer than 500 URLs and you don’t have an SSD.
However, this option has some drawbacks: sites with more content may cause slower crawls because all of the crawl data is held in memory rather than on your machine’s drive. The more RAM is used up, the slower your computer gets. It’s a vicious cycle in which Screaming Frog is slowed down too and may run out of memory before it finishes crawling.
The Database storage mode is recommended for machines that have an SSD and for crawling sites at scale.
As Screaming Frog has to write information continually to the database on your drive, it’s recommended to use this mode only with an SSD. On a mechanical HDD, it will be much slower.
Configuration >> System >> Memory Allocation
How much Memory to allocate
The more RAM, the better. By default, Screaming Frog allocates 1GB on 32-bit machines and 2GB on 64-bit machines.
As per the official recommendation, set the allocation 2GB below your machine’s total RAM. So if I have 8GB of RAM, I allocate 6GB to Screaming Frog.
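The rule of thumb above is simple arithmetic, and a quick sketch makes it easy to apply to any machine. The function name and the `reserve_gb` parameter are my own, for illustration only; they are not part of Screaming Frog.

```python
def recommended_allocation_gb(total_ram_gb: int, reserve_gb: int = 2) -> int:
    """Return how many GB to allocate, leaving reserve_gb free for the rest of the system."""
    if total_ram_gb <= reserve_gb:
        raise ValueError("not enough RAM to leave the reserve free")
    return total_ram_gb - reserve_gb


# Apply the "2GB below total RAM" rule to a few common machine sizes.
for total in (8, 16, 32):
    print(f"{total}GB machine: allocate {recommended_allocation_gb(total)}GB")
```

For the 8GB machine in the example, this gives the same 6GB figure.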
Configuration >> Spider >> Crawl
You should select these four boxes.
Although Google has stated it no longer supports rel=next/prev links, we still want Screaming Frog to crawl and store them. There could be paginated pages that are only linked via these elements and not via links in the HTML body of a page.
If there are alternate URLs for different languages/locales, we want to ensure all links get crawled so that the audit gives the complete picture of the site’s health.
If you forget to tick the AMP option, your audit will not know about those AMP pages and their issues. A website may be using AMP without those pages showing up in any list, so you miss out on valuable information about how well a page performs.
You should select these four boxes.
Crawl Outside of Start Folder
You want to get a complete picture of all of the folders within the website.
Crawl All Subdomains
Similar to the above, I want to know about any subdomains there may be.
Follow Internal “nofollow”
You want to discover as many URLs as possible to ensure you’re doing a thorough site audit.
It also allows you to investigate and understand why a site is using it on internal links. Using rel="nofollow" internally isn’t optimal.
Follow External “nofollow”
Similar to the above, I want the Frog to crawl all possible URLs. Not crawling those could miss 404s or even link spam.
You will want to click on the first two checkboxes.
Crawl Linked XML Sitemaps.
Linked sitemaps are one of the best ways to discover orphaned content. Orphaned content is pages that are listed in the XML sitemap but not linked internally anywhere on the site.
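To make the idea of orphaned content concrete, here is a minimal sketch of the comparison, assuming you already have two lists of URLs: one from the XML sitemap and one of the URLs found through internal links. The function name, the trailing-slash normalisation, and the example.com URLs are my own assumptions, not how Screaming Frog works internally.

```python
def find_orphans(sitemap_urls, internally_linked_urls):
    """Return the sitemap URLs that no internal link points to."""
    def normalise(url):
        # Assumption: treat URLs with and without a trailing slash as the same page.
        return url.strip().rstrip("/")

    linked = {normalise(url) for url in internally_linked_urls}
    return sorted(url for url in sitemap_urls if normalise(url) not in linked)


sitemap = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/old-landing-page",
]
linked = [
    "https://example.com",
    "https://example.com/about/",
]

print(find_orphans(sitemap, linked))
# prints ['https://example.com/old-landing-page']
```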
Auto Discover XML Sitemaps via robots.txt
Automatically finding sitemaps linked in the robots.txt file helps find sitemaps added by the webmaster.
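If you want to see which sitemaps a robots.txt file declares, a rough sketch like the one below pulls out the Sitemap lines. The function name and the sample file are hypothetical, and this is only an illustration of the idea, not Screaming Frog’s own discovery logic.

```python
def sitemaps_from_robots(robots_txt):
    """Return the URLs declared on Sitemap: lines in a robots.txt file."""
    sitemaps = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, separator, value = line.partition(":")
        # The field name is matched case-insensitively.
        if separator and key.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps


sample = """User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/news-sitemap.xml  # added later
"""

print(sitemaps_from_robots(sample))
```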
Configuration >> Spider >> Extraction
You will want to click on HTTP Headers.
Checking the Headers allows you to see if the content is different between mobile and desktop views.
You will want to select all options under Structured Data.
This ensures the schema is validated no matter which format the site uses.
A great feature they have implemented here helps ensure all the schema validates.
Google Rich Results Feature Validation.
The markup for a site can be easily validated against Google’s documentation to ensure that it complies with their guidelines. As schema guides are constantly changing and not all of them are compiled in one place, it may also be worth checking over your implementation manually in case you’ve missed something or been confused by different requirements between various sites.
The correct case is essential for using Schema markup.
Screaming Frog will save the HTML for every page. This is extremely useful for double-checking any elements Screaming Frog reports on.
Store Rendered HTML.
Configuration >> Spider >> Limits
Max Redirects to Follow
Raising the maximum number of redirects to follow will help you identify the full extent of redirect chains and loops.
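Here is a small sketch of why the limit matters. It follows redirects from a plain dictionary (source URL to target URL) rather than making real requests, and the function name, the status labels, and the default limit of 5 are all my own choices for illustration.

```python
def follow_redirects(start, redirects, max_redirects=5):
    """Follow start through the redirects mapping.

    Returns (chain, status), where status is "ok", "loop" or "limit".
    """
    chain = [start]
    seen = {start}
    current = start
    while current in redirects:
        if len(chain) - 1 >= max_redirects:
            return chain, "limit"  # gave up before reaching the end
        current = redirects[current]
        chain.append(current)
        if current in seen:
            return chain, "loop"  # we have come back to a URL already visited
        seen.add(current)
    return chain, "ok"


redirects = {"/a": "/b", "/b": "/c", "/c": "/a", "/old": "/new"}

print(follow_redirects("/old", redirects))
print(follow_redirects("/a", redirects, max_redirects=5))
print(follow_redirects("/a", redirects, max_redirects=2))
```

With the limit at 2, the crawl stops at /c and you cannot tell whether it is a long chain or a loop. With a higher limit, the loop back to /a is visible.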
Configuration >> Spider >> Advanced
You will want to click the following to ensure every URL is discovered:
Always Follow Redirects
Always Follow Canonicals
There could be URLs linked to in a canonical tag or redirected in the HTML code.
Respect self-referencing meta refresh.
Self-referencing meta refresh won’t stop the page from being indexed, but it will be easy to see and flag, so I can investigate why this is happening.
Extract images from IMG and SRCSET attributes.
Google can crawl images implemented in the SRCSET attribute, so I tick this to ensure the audit extracts the same images Google would.
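As a rough illustration of what extracting from both attributes means, the sketch below collects image URLs from the src and srcset attributes of img tags using Python’s standard HTML parser. The class name is hypothetical and the srcset handling is simplified (it splits candidates on commas), so treat it as a sketch rather than how Screaming Frog does it.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect image URLs from the src and srcset attributes of <img> tags."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if attributes.get("src"):
            self._add(attributes["src"])
        # Each srcset candidate looks like "url descriptor", separated by commas.
        for candidate in (attributes.get("srcset") or "").split(","):
            parts = candidate.split()
            if parts:
                self._add(parts[0])

    def _add(self, url):
        if url not in self.images:
            self.images.append(url)


html = '<img src="/hero.jpg" srcset="/hero-480.jpg 480w, /hero-960.jpg 960w" alt="Hero">'
collector = ImageCollector()
collector.feed(html)
print(collector.images)
# prints ['/hero.jpg', '/hero-480.jpg', '/hero-960.jpg']
```

An audit that only reads src would miss the two srcset images here.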
Keep these Un-checked:
If they were selected, any URLs set to noindex, or canonicalized to a different URL, would not be reported in the audit.
Respect HSTS Policy.
If this were ticked and the site had not redirected HTTP to HTTPS, the crawl would still be redirected to HTTPS. We want to check that HTTP has been permanently redirected to HTTPS, so it stays un-checked.
Configuration >> Content >> Duplicates
Un-check: Only Check Indexable Pages for Duplicates.
Even if pages are currently set to noindex, I still want to know if they are duplicating content in case they should be set to index.
Enable Near Duplicates.
Set it to around 80-90%. The right value can vary depending on whether the site uses large sections of boilerplate content.
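To get a feel for what a similarity threshold does, here is a minimal sketch that compares two pieces of text using overlapping three-word groups (shingles) and Jaccard similarity. This is a stand-in technique I picked for illustration; the article does not describe how Screaming Frog calculates similarity, and all the function names are hypothetical.

```python
def shingles(text, size=3):
    """Return the set of overlapping word groups of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def is_near_duplicate(a, b, threshold=0.9):
    """True when the similarity meets the threshold (0.9 means 90%)."""
    return similarity(a, b) >= threshold


page_a = "our red widgets ship free to every customer in the country"
page_b = "our blue widgets ship free to every customer in the country"

print(f"{similarity(page_a, page_b):.0%} similar")
print(is_near_duplicate(page_a, page_b, threshold=0.9))
```

Pages that share a lot of boilerplate score higher against each other, which is why the threshold may need adjusting from site to site.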
Configuration >> Robots.txt >> Settings
Set your robots.txt to:
Ignore robots.txt but report status
Even if a page is blocked in robots.txt, it may still be indexed if Google found a link to it elsewhere or indexed it before it was blocked. It is essential to know all of the pages within your website.
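The idea of ignoring robots.txt while still reporting its status can be sketched with Python’s standard `urllib.robotparser`: every URL stays in the list, and each one is simply labelled. The function name, the labels, and the sample rules are my own assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def robots_status(robots_txt, urls, user_agent="*"):
    """Label every URL as allowed or blocked without dropping any of them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        url: "allowed" if parser.can_fetch(user_agent, url) else "blocked"
        for url in urls
    }


rules = """User-agent: *
Disallow: /private/
"""

report = robots_status(
    rules,
    ["https://example.com/blog/", "https://example.com/private/report"],
)
for url, status in report.items():
    print(status, url)
```

Blocked URLs still appear in the report, so you can check whether any of them are indexed.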
Configuration >> User-Agent
Set your user-agent to Googlebot (Smart Phone) to understand how Google is viewing the site. You want to see exactly what Google is seeing.
Since Google has switched to mobile-first indexing, you will want to see the mobile version of the code, as that is what Google will base its crawling, indexing, and ranking on.
Saving Your Configuration
Once you have set all these settings, or your preferred ones, don’t forget to save them! Otherwise, you will lose them on restart.
Go to File > Configuration > Save Current Configuration As Default.
I hope you enjoyed this article, and I look forward to hearing your feedback.
Published on: 2021-08-12
Updated on: 2021-09-03