Why Isn't Screaming Frog Crawling All of My URLs?

Screaming Frog (https://www.screamingfrog.co.uk) is an excellent tool for crawling websites and extracting data, but if it isn't crawling all of your URLs, you won't be performing a quality technical SEO audit (on-page meta descriptions, response codes, internal linking, duplicate content, page titles, backlinks, alt text, and so on) on your e-commerce sites. In this blog post, we'll examine why Screaming Frog isn't crawling all URLs and how you can fix the issue. So, if you're having trouble getting Screaming Frog to crawl all of your URLs, stay tuned! You're in for a treat.


How to Fix Screaming Frog Not Crawling All URLs

There are several reasons Screaming Frog may not crawl all the URLs on a website. Most commonly, the site is configured to block crawlers like Screaming Frog.

  1. The website is blocked by robots.txt.

    Respect Noindex setting

    Robots.txt can block Screaming Frog from crawling pages. You can configure the SEO Spider to ignore robots.txt by going to Configuration >> robots.txt >> Settings and selecting 'Ignore robots.txt'.

    You can also change your User-Agent to Googlebot to see if the website allows that crawl.

    Robots.txt is used to instruct web crawlers, or "bots," on what they are allowed to access on a given website. When a bot tries to access a page that is specifically disallowed in the robots.txt file, it is being told that the webmaster does not want that page crawled. In some cases, this is intentional; for example, a site owner may want to prevent bots from indexing sensitive information. In other cases, it may simply be an oversight. Either way, a page that is blocked by robots.txt will not be crawled by any crawler that respects the file.
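
    If you want to confirm what robots.txt allows before crawling, here is a minimal Python sketch (standard library only, not part of Screaming Frog) that checks a URL against a site's robots.txt for different user agents. The URL and user-agent strings are placeholders, not values taken from this article.

    ```python
    # Check whether robots.txt allows a URL for a given user agent.
    # The site URL and user-agent strings below are hypothetical examples.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")  # placeholder site
    rp.read()

    for agent in ("Screaming Frog SEO Spider", "Googlebot"):
        allowed = rp.can_fetch(agent, "https://www.example.com/some-page/")
        print(f"{agent}: {'allowed' if allowed else 'blocked by robots.txt'}")
    ```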

  2. There is a 'nofollow' attribute on the links that aren't being crawled.

    Nofollow links

    Nofollow links tell crawlers not to follow them. If every link on a page is set to nofollow, Screaming Frog has nowhere to go. To bypass this, you can set Screaming Frog to follow internal 'nofollow' links.

    You can update this option under Configuration >> Spider, on the Crawl tab, by ticking Follow Internal 'Nofollow' Links.
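
    To see whether nofollow is the culprit, you can list the nofollow links on a page yourself. Below is a minimal Python sketch (standard library only, not part of Screaming Frog); the URL is a placeholder.

    ```python
    # List anchor tags whose rel attribute contains "nofollow".
    # If every internal link is nofollow, a crawler that respects the
    # attribute has nowhere left to go on this page.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class NofollowFinder(HTMLParser):
        def handle_starttag(self, tag, attrs):
            if tag != "a":
                return
            attrs = dict(attrs)
            if "nofollow" in (attrs.get("rel") or "").lower():
                print("nofollow link:", attrs.get("href"))

    url = "https://www.example.com/"  # placeholder page
    html = urlopen(url).read().decode("utf-8", errors="replace")
    NofollowFinder().feed(html)
    ```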

  3. The page has a page-level 'nofollow' attribute.

    Respect Noindex setting

    The page-level nofollow attribute is set by either a meta robots tag or an X-Robots-Tag in the HTTP header. These can be seen in the "Directives" tab under the "Nofollow" filter. The page-level nofollow attribute is used to prevent search engines from following the links on a page.

    This is useful for pages that contain links to unreliable or unimportant sources: by setting the nofollow attribute, you are telling search engines that they should not follow the links on the page. That can help your site's search engine rankings, but it also stops you from crawling past that page when you audit the website.

    To ignore the noindex tag, you must go to Configuration >> Spider >> Advanced and untick the 'Respect Noindex' setting.
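
    The following Python sketch (standard library only) checks both places a page-level nofollow can live: the X-Robots-Tag response header and the meta robots tag in the HTML. The URL is a placeholder.

    ```python
    # Detect a page-level nofollow set via the X-Robots-Tag header
    # or a <meta name="robots"> tag. The URL is a hypothetical example.
    import re
    from urllib.request import urlopen

    url = "https://www.example.com/"  # placeholder page
    resp = urlopen(url)

    if "nofollow" in resp.headers.get("X-Robots-Tag", "").lower():
        print("nofollow set via the X-Robots-Tag HTTP header")

    body = resp.read().decode("utf-8", errors="replace")
    meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', body, re.I)
    if meta and "nofollow" in meta.group(0).lower():
        print("nofollow set via a meta robots tag")
    ```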

  4. The user agent is blocked.

    User-Agent configuration

    The User-Agent is a text string your browser sends to the website you are visiting. It can reveal information about your browser, operating system, and even your device, and a website can change how it behaves based on that information. For example, if you visit a site from a mobile device, it may redirect you to its mobile-friendly version. If you change your User-Agent to pretend to be a different browser, you may be able to access features your actual browser doesn't offer. Likewise, some websites block certain browsers or crawlers entirely. By changing the User-Agent, you change how a website responds, giving you more control over your browsing experience.

    You can change the User-Agent under Configuration >> User-Agent.
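
    A quick way to test this outside Screaming Frog is to request the same page with different User-Agent strings and compare the response codes. The Python sketch below does that; the URL and User-Agent strings are illustrative placeholders, not the tool's real values.

    ```python
    # Fetch the same URL with different User-Agent strings to see
    # whether the server blocks one of them. All values are placeholders.
    from urllib.error import HTTPError
    from urllib.request import Request, urlopen

    url = "https://www.example.com/"  # placeholder page
    user_agents = {
        "SEO crawler": "Screaming Frog SEO Spider/19.0",
        "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Chrome": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
    }

    for name, ua in user_agents.items():
        try:
            status = urlopen(Request(url, headers={"User-Agent": ua})).status
            print(f"{name}: HTTP {status}")
        except HTTPError as err:
            print(f"{name}: HTTP {err.code} (possibly blocked)")
    ```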

  5. The website requires JavaScript.

    Screaming Frog JavaScript rendering

    JavaScript is a programming language that is commonly used to create interactive web pages. When JavaScript is enabled, it can run automatically when a page is loaded, making it possible for items on the page to change without refreshing the entire page. For example, JavaScript can be used to create drop-down menus, display images based on user input, and much more. While JavaScript can be beneficial, some users prefer to disable it in their browser for various reasons, one being that JavaScript can be used to track browsing activity. However, disabling JavaScript can lead to issues with how a website is displayed or how certain features work. The same applies to a crawler: if a site only outputs its content and links via JavaScript, a crawler that doesn't render JavaScript will see very little.

    Try enabling JavaScript rendering in Screaming Frog under Configuration >> Spider >> Rendering.
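
    One way to tell whether a page depends on JavaScript is to look at the raw HTML that a non-rendering crawler receives. The minimal Python sketch below counts the anchor tags in the raw source; the URL is a placeholder.

    ```python
    # Count anchor tags in the raw (un-rendered) HTML. A JavaScript-heavy
    # page often exposes almost no links until scripts run, which is why a
    # non-rendering crawl stalls. The URL is a hypothetical example.
    import re
    from urllib.request import Request, urlopen

    url = "https://www.example.com/"  # placeholder page
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    html = urlopen(req).read().decode("utf-8", errors="replace")

    links = re.findall(r"<a\s[^>]*href=", html, re.I)
    print(f"Anchor tags in raw HTML: {len(links)}")
    if not links:
        print("No links in the raw source; JavaScript rendering is likely required.")
    ```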

  6. The website requires cookies.

    Cookie storage

    Can you browse the website with cookies disabled in your browser? Licensed users can enable cookies by going to Configuration >> Spider and selecting 'Session Only' under 'Cookie Storage' on the Advanced tab.
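
    Outside the tool, you can check whether a site insists on cookies by fetching it with a cookie jar attached, which is roughly what session-only cookie storage does during a crawl. A minimal Python sketch, with a placeholder URL:

    ```python
    # Fetch a page while accepting cookies for the session, then list the
    # cookies the site tried to set. The URL is a hypothetical example.
    from http.cookiejar import CookieJar
    from urllib.request import HTTPCookieProcessor, build_opener

    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))

    resp = opener.open("https://www.example.com/")  # placeholder page
    print("HTTP", resp.status)
    print("Cookies set by the site:", [cookie.name for cookie in jar])
    ```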

  7. The website uses framesets.

    Framesets

    The SEO Spider does not crawl the frame src attribute.
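
    If you suspect frames, you can scan the HTML for frame src attributes, since URLs that are only referenced that way may never be discovered. A minimal Python sketch with a placeholder URL:

    ```python
    # Find <frame>/<iframe> src attributes in a page's HTML.
    # The URL is a hypothetical example.
    import re
    from urllib.request import urlopen

    url = "https://www.example.com/"  # placeholder page
    html = urlopen(url).read().decode("utf-8", errors="replace")

    frames = re.findall(r'<i?frame\s[^>]*src=["\']([^"\']+)', html, re.I)
    if frames:
        print("Frame sources found:", frames)
    ```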

  8. The Content-Type header does not indicate that the page is HTML.

    Invalid content type

    This is shown in the Content column and should be text/html or application/xhtml+xml.
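
    You can verify the header with a quick request. The Python sketch below prints the Content-Type a server returns; the URL is a placeholder.

    ```python
    # Check whether the Content-Type response header marks the page as HTML.
    # The URL is a hypothetical example.
    from urllib.request import urlopen

    resp = urlopen("https://www.example.com/")  # placeholder page
    content_type = resp.headers.get("Content-Type", "")
    print("Content-Type:", content_type)

    if "text/html" not in content_type and "application/xhtml+xml" not in content_type:
        print("Not reported as HTML, so the crawler will not parse it as a page.")
    ```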

Summary

The Screaming Frog SEO Spider can be an excellent tool for auditing your website, but it's vital to ensure that all URLs are crawled. If you're not getting the complete data you need from your audits, there may be an issue with how Screaming Frog is configured. This blog post looked at why Screaming Frog might not be crawling all your URLs and how to fix the problem. By fixing these issues, you'll be able to get more comprehensive data from your Screaming Frog audits and improve your SEO strategy. Have you tried using Screaming Frog for your website audits? What tips do you have for improving its functionality?

Frequently Asked Questions

  • Why isn't Screaming Frog crawling all of my URLs?

Published on: 2022-06-07
Updated on: 2024-04-05


Isaac Adams-Hands

Isaac Adams-Hands is the SEO Director at SEO North, a company that provides search engine optimization services. As an SEO specialist, Isaac has considerable expertise in on-page SEO, off-page SEO, and technical SEO, which gives him an edge over the competition.