Advanced: Screaming Frog Audit Settings

A website audit is a great way to make sure your site is up to date with the latest SEO standards and in compliance with Google’s guidelines. But what if I told you there was an easier, more efficient way? Screaming Frog offers many features to help you do a complete audit of your site – both onsite and offsite. In this post, we’ll dive into the best settings for audits so you can get started ASAP!

Screaming Frog Audit Settings

This is not a list of every possible setting in the tool, only the ones I change from the default. Any setting that isn’t mentioned here is best left at its default.

  1. Configuration >> System >> Storage Mode

    Storage Mode

    In Screaming Frog, you have two options for processing and saving crawl data: RAM storage or database (HDD/SSD) storage.

    RAM vs. SSD

    The default RAM storage mode is ideal if you don’t have an SSD and your site has fewer than 500,000 URLs.

    However, this option has drawbacks on sites with more content, because the entire crawl is held in memory. The more RAM Screaming Frog uses up, the slower your computer gets. It’s a vicious cycle where the crawl itself also slows down, and Screaming Frog may run out of memory before it can finish.

    Database Storage

    The Database storage mode is recommended for machines that have an SSD and for crawling sites at scale.

    As Screaming Frog has to write information to the database on your drive continually, it’s recommended only with an SSD; on a mechanical HDD it will be much slower.

  2. Configuration >> System >> Memory Allocation

    How much Memory to allocate

    The more RAM, the better. By default, Screaming Frog allocates 1GB on 32-bit machines and 2GB on 64-bit machines.

    As per the official recommendation, set the allocation 2GB below your machine’s maximum RAM. So if you have 8GB of RAM, allocate 6GB to Screaming Frog.
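
    Under the hood, this setting controls the Java heap size. As a rough illustration (the file name and location vary by operating system and Screaming Frog version, and current versions manage this for you through the dialog), the underlying value is a JVM maximum-heap flag:

      # ScreamingFrogSEOSpider.l4j.ini (illustrative; modern versions set this via the UI)
      -Xmx6g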

  3. Configuration >> Spider >> Crawl

    Page Links

    You should select the following boxes:

    Pagination (Rel Next/Prev)

    Although Google has stated they no longer support rel=next/prev links, we still want Screaming Frog to crawl and store them. The reason being, there could be paginated pages that are only linked via these elements and not via regular link tags in the HTML body of a page.
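
    For reference, these are plain link elements in the page’s head (URLs illustrative):

      <link rel="prev" href="https://example.com/category/page/1/">
      <link rel="next" href="https://example.com/category/page/3/">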

    Hreflang

    If there are alternate URLs for different languages/locales, we want to ensure they all get crawled so the audit gives the complete picture of the site’s health.
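
    These alternates typically appear as link elements in the head (they can also live in the XML sitemap or HTTP headers); an illustrative example:

      <link rel="alternate" hreflang="en-ca" href="https://example.com/en/page/">
      <link rel="alternate" hreflang="fr-ca" href="https://example.com/fr/page/">
      <link rel="alternate" hreflang="x-default" href="https://example.com/page/">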

    AMP

    If you forget to tick the AMP option, the audit won’t surface AMP pages or their issues. A site could be using AMP without those URLs showing up in any list, and you would miss valuable information about how well those pages perform.
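
    This option follows the rel="amphtml" reference that a canonical page uses to point at its AMP version (URL illustrative):

      <link rel="amphtml" href="https://example.com/page/amp/">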

    Crawl Behaviour

    You should select these four boxes.

    Crawl Outside of Start Folder

    You want to get a complete picture of all the folders within the website.

    Crawl All Subdomains

    Similar to the above, I want to know about any subdomains there may be.

    Follow Internal “nofollow”

    You want to discover as many URLs as possible to ensure you’re doing a thorough site audit.

    It also allows you to investigate and understand why a site is using it on internal links; using rel=”nofollow” internally isn’t optimal.

    Follow External “nofollow”

    Similar to the above, I want the Frog to crawl all possible URLs. Not crawling them could mean missing 404s or even link spam.

    XML Sitemaps

    You will want to tick the first two checkboxes.

    Crawl Linked XML Sitemaps.

    Linked sitemaps are one of the best ways to discover orphaned content – pages listed in the XML sitemap but not internally linked anywhere on the site.
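
    For context, an XML sitemap is just a list of URLs; any entry that never appears as an internal link during the crawl is a candidate orphan (URL illustrative):

      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <url><loc>https://example.com/orphaned-page/</loc></url>
      </urlset>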

    Auto Discover XML Sitemaps via robots.txt

    Automatically discovering sitemaps referenced in the robots.txt file helps surface sitemaps the webmaster has added.
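
    This works because robots.txt supports a Sitemap directive (URL illustrative):

      User-agent: *
      Sitemap: https://example.com/sitemap.xml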

  4. Configuration >> Spider >> Extraction

    URL Details

    You will want to tick HTTP Headers.

    HTTP Headers

    Storing the full HTTP headers lets you check whether the server returns different content for mobile and desktop requests.
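
    For example, a server that serves different HTML to mobile and desktop visitors on the same URL should signal it with a Vary header, which you can spot in the stored headers (values illustrative):

      HTTP/1.1 200 OK
      Content-Type: text/html; charset=UTF-8
      Vary: User-Agent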

    Structured Data

    You will want to select all options under Structured Data.

    JSON-LD
    Microdata
    RDFa

    This ensures you validate the schema no matter which format the site uses.
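
    JSON-LD is the most common of the three; a minimal illustrative example of the kind of block Screaming Frog will extract and validate:

      <script type="application/ld+json">
      {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": "Advanced: Screaming Frog Audit Settings",
        "author": { "@type": "Person", "name": "Isaac Adams-Hands" }
      }
      </script>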

    Schema.org Validation.

    A great feature they have implemented here: it checks that all the schema validates against Schema.org.

    Google Rich Results Feature Validation.

    This validates a site’s markup against Google’s documentation to ensure it complies with their guidelines. Since structured data requirements change constantly and aren’t all compiled in one place, it’s also worth checking your implementation manually, in case you’ve missed something or been confused by differing requirements between features.

    Case-Sensitive

    Schema markup is case-sensitive, so checking the correct case is essential (for example, ”@type”: ”Article” validates, while ”article” does not).

    HTML

    Store HTML.

    Screaming Frog will save the HTML for every page. This is extremely useful for double-checking any element the tool reports on.

    Store Rendered HTML.

    This will save the HTML after it has been rendered. It’s useful when auditing JavaScript sites, letting you compare what was sent from the server with what the browser actually displays after JavaScript runs.

  5. Configuration >> Spider >> Limits

    Max Redirects to Follow

    Raising the maximum redirect limit (the default is 5) helps you identify the full magnitude of redirect chains and loops.

  6. Configuration >> Spider >> Advanced

    You will want to tick the following to ensure every URL is discovered:

    Always Follow Redirects
    Always Follow Canonicals

    There could be URLs that are only referenced in a canonical tag or only reachable through a redirect.
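
    For example, a canonical tag pointing at a URL that has no other internal links (URL illustrative); with the setting on, Screaming Frog will crawl the target anyway:

      <link rel="canonical" href="https://example.com/preferred-version/">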

    Respect Self-Referencing Meta Refresh.

    A self-referencing meta refresh won’t stop the page from being indexed, but with this ticked it will be easy to see and flag, so I can investigate why it’s happening.
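
    A self-referencing meta refresh is a tag like this, where the target URL is the page it sits on (URL illustrative):

      <meta http-equiv="refresh" content="0; url=https://example.com/this-page/">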

    Extract images from IMG and SRCSET attributes.

    Google can crawl images implemented in the SRCSET attribute, so I tick this to ensure the audit extracts the same images Google would.
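
    With this ticked, every image candidate below gets extracted, not just the src fallback (file names illustrative):

      <img src="photo-480.jpg"
           srcset="photo-480.jpg 480w, photo-1080.jpg 1080w"
           sizes="(max-width: 600px) 480px, 1080px"
           alt="Product photo">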

    Keep these Un-checked:

    Respect Noindex
    Respect Canonicals
    Respect next/prev

    If these were selected, any URLs set to noindex, or canonicalized to a different URL, would not be reported in the audit.

    Respect HSTS Policy.

    Keep this un-checked as well. If a site doesn’t redirect HTTP to HTTPS, an audit that respects HSTS would be internally redirected to HTTPS anyway, hiding the problem; we want to check that HTTP is genuinely being permanently redirected (301) to HTTPS by the server.
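
    HSTS is delivered as a response header like the one below; once a client that respects it has seen the header, it upgrades its own requests to HTTPS – which is exactly what would mask a missing server-side redirect (values illustrative):

      Strict-Transport-Security: max-age=31536000; includeSubDomains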

  7. Configuration >> Content >> Duplicates

    Un-check: Only Check Indexable Pages for Duplicates.

    Even if pages are currently set to noindex, I still want to know if they duplicate content, in case they should be set to index.

    Enable Near Duplicates.

    Set the similarity threshold to around 80–90%. The right value varies depending on whether the site uses large sections of boilerplate content.

  8. Configuration >> Robots.txt >> Settings

    Set your robots.txt to:

    Ignore robots.txt but report status

    Even if a page is blocked in robots.txt, it may still be indexed if Google found a link to it elsewhere or indexed it before it was blocked. It is essential to know about all of the pages within your website.

  9. Configuration >> User-Agent

    Set your user-agent to Googlebot (Smartphone) to understand how Google is viewing the site. You want to see exactly what Google is seeing.

    Since Google has switched to mobile-first indexing, you will want to see the mobile version of the code, as that is what Google bases its crawling, indexing, and ranking on.
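
    You don’t have to type the string yourself – Screaming Frog includes the preset – but for reference, Google documents the Googlebot Smartphone user-agent as follows (W.X.Y.Z stands for the Chrome version, which changes as Googlebot’s renderer updates):

      Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)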

  10. Saving Your Configuration

    Once you have configured all these settings, or your preferred ones, don’t forget to save them! Otherwise, you will lose them on restart.

    Go to File > Configuration > Save Current Configuration As Default.

I hope you enjoy this article, and I look forward to hearing your feedback.


Published on: 2021-08-12
Updated on: 2021-09-03

Isaac Adams-Hands

Isaac Adams-Hands is the SEO Director at SEO North, where he helps the team plan keyword-optimized, measurable marketing goals for over 30 clients simultaneously. He has worked at Microsoft, the Institute of Chartered Accountants in Australia, Auto Trader, Le Cordon Bleu, and Algonquin College in various digital marketing roles. Isaac is qualified as a full-stack developer, server administrator, and cyber security expert, adding further experience to his Search Engine Optimization knowledge. His Inuit heritage brought him to the Arctic to hunt and fish for most summers, which grew his passion for 4-wheelers and dirtbikes.