Crawl budget is how fast and how many pages a search engine wants to crawl on your site. It's affected by the amount of resources the crawler wants to spend on your site and the amount of crawling your server can support.
Doing more crawls does not mean you will rank higher, but if pages are not crawled and indexed, they will not rank at all.
Most websites don’t need to worry about the crawl budget, but in a few cases, you may want to check it out. Let us look at some of these situations.
Generally, you don't have to worry about crawl budget for popular pages. It's usually newer pages, pages that aren't well linked, and pages that don't change much that aren't crawled often.
Crawl budget can be an issue for newer websites, especially those with a lot of pages. Your server may be able to support more crawling, but because your site is new and likely not very popular yet, a search engine may not want to crawl it very much. This is mostly a disconnect in expectations: you want your pages crawled and indexed, but Google doesn't yet know whether they're worth indexing and may not want to crawl as many of them as you'd like.
Crawl budget can also be an issue for large sites with millions of pages or sites that are updated frequently. Generally, if you have lots of pages that aren't being crawled or updated as often as you'd like, you may want to look into speeding up crawling. We'll cover how to do that later in the article.
If you want to see an overview of Google's crawl activity and any issues it found, the best place to look is the Crawl Stats report in Google Search Console.
There are various reports here to help you identify changes in crawl behavior, spot crawl issues, and get more information about how Google is crawling your site.
You'll definitely want to look into any flagged crawl statuses like the ones shown here:
There is also a timestamp of when a page was last crawled.
If you want to see hits from all bots and users, you'll need access to your log files. Depending on your hosting and setup, you may have access to tools like AWStats and Webalizer, as shown here on shared hosting with cPanel. These tools show some aggregated data from your log files.
For more complex setups, you'll have to get access to and store data from the raw log files, possibly from multiple sources. You may also need specialized tools for larger projects, such as an ELK stack (Elasticsearch, Logstash, Kibana) for storing, processing, and visualizing log files. There are also log analysis tools such as Splunk.
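If you just want a quick look at Googlebot activity before setting up a full log pipeline, a small script can pull Googlebot requests out of a standard access log. The sketch below is a minimal example that assumes a common combined log format and a file named access.log; adjust the pattern and path for your server, and remember the user agent can be spoofed, so real verification needs a reverse DNS check.

```python
import re
from collections import Counter

# Rough pattern for a combined-log-format line:
# IP - - [date] "METHOD path HTTP/x" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<date>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

paths = Counter()
statuses = Counter()

with open("access.log") as f:  # assumed log location
    for line in f:
        m = LINE_RE.match(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue
        paths[m.group("path")] += 1
        statuses[m.group("status")] += 1

print("Most-crawled URLs:", paths.most_common(10))
print("Status codes served to Googlebot:", statuses)
```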
Google finds URLs to crawl by crawling and parsing pages, as well as from various other sources, including sitemaps, RSS feeds, URLs submitted for indexing in Google Search Console, and the Indexing API.
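As an illustration of one of those sources, here's what a minimal XML sitemap entry looks like; the URL and date are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/some-page/</loc>
    <lastmod>2021-09-01</lastmod>
  </url>
</urlset>
```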
There are also multiple Googlebots that share the crawl budget. You can find a list of the various Googlebots crawling your site in the Crawl Stats report in GSC.
Each website will have a different crawl budget, which consists of several different inputs.
Crawl demand is simply how much Google wants to crawl your site. More popular pages and pages that change significantly will be crawled more.
Popular pages and pages with more links pointing to them will generally be prioritized over other pages. Remember, Google has to prioritize your pages somehow, and links are an easy way to determine which pages on your site are more popular. It's not just your site, though; Google has to figure out how to prioritize all the pages across every website on the internet.
You can use the Best by links report in Site Explorer to get an idea of which pages are likely to be crawled more often. It also shows when Ahrefs last crawled your pages.
There's also a concept of staleness. If Google sees that a page isn't changing, it will crawl the page less frequently. For example, if Google crawls a page and sees no changes after a day, it may wait three days before crawling it again, then ten days the next time, then 30 days, 100 days, and so on. There's no set waiting period between crawls, but crawling becomes less frequent over time. However, if Google sees big changes across the site as a whole, or a site move, it will usually increase the crawl rate, at least temporarily.
Crawl rate limit
The crawl rate limit is how much crawling your website can support. Websites can handle a certain amount of crawling before server stability becomes a problem, with slowdowns or errors. Most crawlers will back off if they start to see these issues so they don't harm the site.
Google adjusts based on the crawl health of the website. If the site can handle more crawling, the limit will increase. If the site is having issues, Google will slow down how fast it crawls.
There are a few things you can do to make sure your site can support additional crawling and to increase your site's crawl demand. Let's look at some of those options.
Speed up the server / increase resources
The way Google crawls a page is basically to download the resources and then process them on its end. Page speed as users perceive it isn't quite the same thing. What impacts crawl budget is how fast Google can connect to and download resources, which has more to do with your server and your resources.
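A rough way to spot-check how quickly your server returns a page (ignoring rendering entirely) is to time a few plain requests. This is only a sketch with a placeholder URL; real monitoring should run from multiple locations over time.

```python
import requests

URL = "https://www.example.com/"  # placeholder; use one of your own pages

# r.elapsed measures the time from sending the request until the response
# headers arrive, a rough proxy for how fast a crawler gets a response.
timings = []
for _ in range(5):
    r = requests.get(URL, headers={"User-Agent": "crawl-budget-check"})
    timings.append(r.elapsed.total_seconds())

print(f"Status: {r.status_code}, response times: {timings}")
print(f"Average: {sum(timings) / len(timings):.3f}s")
```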
More links, external and internal
Keep in mind that crawl demand is generally based on popularity or links. You can increase your budget by increasing the number of external links and/or internal links. Internal links are easier since you control your own site. You can find suggested internal links in the Link Opportunities report in Site Audit, which also includes a tutorial explaining how it works.
Fix broken and redirected links
Keeping links to broken or redirected pages on your site will have a small impact on crawl budget. Generally, the pages linked this way are fairly low priority because they likely haven't changed in a while, but cleaning up any issues is good for website maintenance in general and helps your crawl budget a bit.
You can easily find broken (4xx) and redirected (3xx) links on your site in the Internal pages report in Site Audit.
For broken or redirected links in your sitemap, check the All issues report for the “3XX redirect in sitemap” and “4XX page in sitemap” issues.
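If you want a quick spot check outside Site Audit, a small script can flag 3xx and 4xx responses for you. This is a minimal sketch with a hypothetical list of URLs; in practice you'd collect the links from your pages or sitemap first.

```python
import requests

# Hypothetical internal URLs to spot-check; replace with your own.
urls = [
    "https://www.example.com/old-page/",
    "https://www.example.com/blog/some-post/",
]

for url in urls:
    # allow_redirects=False so 3xx responses show up as 3xx
    r = requests.get(url, allow_redirects=False, timeout=10)
    if 300 <= r.status_code < 400:
        print(f"{url} -> {r.status_code} redirect to {r.headers.get('Location')}")
    elif r.status_code >= 400:
        print(f"{url} -> {r.status_code} broken")
```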
Use GET instead of POST where you can
This one is a bit more technical because it involves HTTP request methods. Don't use POST requests where GET requests work. It's basically GET (pull) vs. POST (push). POST requests aren't cached, so they do impact crawl budget, but GET requests can be cached.
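As a simple illustration (with a hypothetical endpoint), the same lookup can often be expressed either way; only the GET version can be cached by proxies, CDNs, and crawlers.

```python
import requests

# Hypothetical product-lookup endpoint, purely for illustration.
BASE = "https://www.example.com/api/products"

# POST version: the parameters ride in the request body,
# so the response generally can't be cached.
post_resp = requests.post(BASE, json={"id": 123})

# GET version of the same lookup: parameters are in the URL,
# so the response can be cached along the way.
get_resp = requests.get(BASE, params={"id": 123})

print(post_resp.status_code, get_resp.status_code)
```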
Use the Indexing API
If you need pages crawled faster, check whether you're eligible for Google's Indexing API. Currently, it's only available for a few use cases, such as job postings or live videos.
Bing also has an Indexing API that anyone can use.
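If you are eligible, notifying Google looks roughly like the sketch below. It assumes the google-auth package and a service account key that has been added as an owner of the property in Search Console; the file path and URL are placeholders, so check the current API documentation before relying on this.

```python
from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

SCOPES = ["https://www.googleapis.com/auth/indexing"]
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

# Placeholder path to a service account key (added as an owner in Search Console).
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)
session = AuthorizedSession(credentials)

# Tell Google the page was added or updated.
response = session.post(
    ENDPOINT,
    json={"url": "https://www.example.com/jobs/some-posting/", "type": "URL_UPDATED"},
)
print(response.status_code, response.json())
```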
What doesn’t work
People sometimes try things that don’t actually help with your crawl budget.
- Small changes to the website. Making minor changes on pages, such as updating dates, spaces, or punctuation, in the hope of getting pages crawled more often. Google is pretty good at determining whether changes are meaningful, so these small changes are unlikely to have any impact on crawling.
- Crawl-delay directive in robots.txt. This directive will slow down many bots, but Googlebot doesn't use it, so it won't have an impact. We do respect it at Ahrefs, so if you ever need to slow down our crawling, you can add a crawl delay in your robots.txt file (see the example after this list).
- Removing third-party scripts. Third-party scripts don't count toward your crawl budget, so removing them won't help.
- Nofollow. Okay, this one is iffy. In the past, nofollowed links didn't use crawl budget. However, nofollow is now treated as a hint, so Google may choose to crawl those links anyway.
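Here's what that crawl delay might look like if you wanted to slow AhrefsBot to one request every ten seconds; the bot name and value are just examples to adapt to your needs.

```
# robots.txt
User-agent: AhrefsBot
Crawl-delay: 10
```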
There are a couple of good ways to slow down Google's crawling. There are a few other adjustments you could technically make, such as slowing down your website, but those aren't methods I'd recommend.
Slow adjustment, but guaranteed
The main control Google gives us to slow down crawling is a rate limiter in Google Search Console. You can use it to reduce the crawl rate, but it can take up to two days to take effect.
Quick adjustment, but risky
If you need a more immediate solution, you can take advantage of the way Google adjusts crawl rate based on site health. If you serve Googlebot a “503 Service Unavailable” or “429 Too Many Requests” status code on your pages, it will start crawling more slowly or may stop crawling temporarily. You don't want to do this for more than a few days, though, or Google may start dropping pages from the index.
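This is usually handled at the server or CDN level, but as a rough sketch of the idea, here is a hypothetical Flask middleware that serves Googlebot a 503 with a Retry-After header. It's only an illustration: the user-agent check is crude (real Googlebot verification uses reverse DNS), and as noted above, nothing like this should stay in place for more than a few days.

```python
from flask import Flask, request

app = Flask(__name__)

@app.before_request
def throttle_googlebot():
    # Very rough check; this should only ever be a temporary measure.
    agent = request.headers.get("User-Agent", "")
    if "Googlebot" in agent:
        return (
            "Temporarily unavailable",
            503,
            {"Retry-After": "3600"},  # ask the crawler to come back in an hour
        )

@app.route("/")
def home():
    return "Hello, regular visitors still get the page."
```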
Once again, I want to reiterate that crawl budget isn't something most people need to worry about. If you do have concerns, I hope this guide was helpful.
I typically only look into it when there are issues with pages not getting crawled and indexed, when I need to explain to someone why they shouldn't worry about it, or when I happen to spot something that concerns me in the Crawl Stats report in Google Search Console.
Have questions? Let me know on Twitter.