
Inside Screaming Frog: Top Takeaways From The Workshop

Ensuring your URLs can be crawled and indexed is a cornerstone of SEO. If Google can’t find that great piece of content you want to get in front of your customers, then what’s the point in writing it?

Screaming Frog SEO Spider is a tool that all of us in the tech team at 3WhiteHats use pretty much every single day. From exporting page titles to checking that 301 redirects have been implemented properly, it has been invaluable to us. So when I noticed that Screaming Frog were running their very first training session, I instantly registered my interest.

Before the training, I suspected there were a huge number of features in this behemoth of a tool that we either didn’t know about or found a little too complicated to tackle. I was hoping the course would help us better harness this croaky box of magic.

For me, these are the top takeaways from the session:  

User Agents

User agents can easily be changed from the configuration dropdown within the tool – nothing new there. Some good applications raised during the session were:

  • Changing the user agent to Googlebot if your client has blocked Screaming Frog from crawling the website (this happens to us a fair bit)
  • If you suspect a website you’re auditing of cloaking (illegitimately showing different versions of a page to users and to search engines)
  • Now that Google has moved most websites to mobile-first indexing, crawling with a smartphone user agent is a sensible default – see the example below
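
To illustrate, this is roughly the user-agent string the tool will send if you select Googlebot (Smartphone) – quoted from Google’s documentation around the time of writing, so the Chrome version it reports may well have changed since:

    Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36
    (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36
    (compatible; Googlebot/2.1; +http://www.google.com/bot.html)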

Robots.txt

You can now create or amend a custom robots.txt file within Screaming Frog and crawl your website under those rules – again through the configuration menu. You’re also able to test a given URL to see whether it would be crawled using the configuration you have created.
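
As a quick illustration, you could paste in a set of rules like the below (the paths are invented for the example) and test individual URLs against them before the real file ever goes live:

    User-agent: *
    Disallow: /checkout/
    Disallow: /search
    Allow: /blog/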

Increasing memory

Most SEOs have, at some point, hit the dreaded high memory usage warning when crawling larger websites.

Nowadays there are several solutions to help resolve this issue. Some of us are old enough to remember when increasing the RAM your computer allocates to Screaming Frog meant amending a config file buried deep within Program Files – well, that’s no longer the case! The memory allocation can now be changed within the configuration menu.
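
For the curious, that old method involved editing a little launcher file (ScreamingFrogSEOSpider.l4j.ini on Windows, if memory serves) whose key line was a single Java heap flag, for example:

    -Xmx8g

That asks the Java runtime for up to 8GB of memory – presumably the configuration menu now just writes the same setting for you.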

What? You want more?! Screaming Frog has a new feature to save crawl data straight to your hard drive (database storage mode), vastly cutting the amount of RAM required to crawl a large website.

The trainers told us that, using this method, they’d managed to crawl the Argos website – some 14 million+ URLs – on a PC with a decent setup. Impressive!

If that’s not enough, it’s also possible to run the program in the cloud – more details of which can be found here.

Lastly, and perhaps most pertinently, it’s important to consider what you’re actually crawling – do you really need to crawl all of the following, using up precious RAM?

  • CSS Files
  • External links
  • Images
  • SWF
  • A crawl depth of more than 10
  • Etc.

The moral of the story – don’t crawl 10 million URLs when 10,000 will do.  

Include & exclude

Tying in nicely with only crawling what you need, the case for the ‘include’ and ‘exclude’ functions in Screaming Frog was made a fair amount throughout the day. Again found in the configuration menu, these features are nothing new, but they now come with a regular expression (regex) tester: give it an example URL and it tells you whether that URL would be crawled under your rules. Handy, right?! A couple of illustrative patterns follow below.
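
For instance (example.com and the paths here are purely hypothetical), you might pair an include rule that restricts the crawl to the blog with an exclude rule that skips any parameterised URLs:

    Include:  https://www.example.com/blog/.*
    Exclude:  .*\?.*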

XML Sitemaps

You can ask Screaming Frog to crawl your XML sitemap with a simple checkbox – a feature which, until now, had passed me by somewhat.

The real value in my eyes, though, is comparing which URLs can be crawled by Screaming Frog but don’t appear in the sitemap, or vice versa. This is possible using the Crawl Analysis dropdown from the top navigation.  

Save your configuration 

You can save as many configurations as you want, and you can set one as a default. These features can be found in the File menu and I will be using them a lot!

Scheduling  

You can now schedule Screaming Frog to run crawls (complete with the aforementioned saved configurations) whenever you want, so long as the machine is on to run them. You can also export any specific reports you want and even create XML sitemaps. The same sort of thing can be driven from the command line, as sketched below.
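
Scheduling leans on the tool’s headless mode, which you can also invoke yourself. Here’s a rough sketch of the command line (this is the Linux/macOS launcher name – Windows has a separate CLI executable – and the flags are taken from Screaming Frog’s documentation, so do check them against your version):

    screamingfrogseospider --crawl https://www.example.com \
      --headless --save-crawl \
      --output-folder ~/crawls --export-tabs "Internal:All"

Here ~/crawls is just a placeholder output folder, and "Internal:All" exports the full Internal tab.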

API Integrations

Pulling Google Analytics (GA) data into Screaming Frog adds some cool features, like:

  • Finding orphan URLs – URLs with sessions in GA but not appearing in a crawl
  • Spotting pages found via a crawl that have no GA data

Orphaned pages can also be found with the Search Console integration. Showing each page’s backlinks is also useful, via a Majestic, Ahrefs or Moz integration.

Adding link data into Screaming Frog can be useful for a content review, to see what’s working and what’s not – saving the need for heaps of VLOOKUPs. This can also be scheduled, with reports produced periodically.

Custom Extraction

Custom extractions are brilliant – the ability to scrape key information like category descriptions from category pages is a popular feature amongst our team.

A clever application of this is copying the (often complicated) CSSPath or XPath straight from Chrome’s Inspect Element feature – right-click the element in question, then Copy > Copy selector or Copy > Copy XPath.

*Tip: ‘Copy selector’ equates to CSSPath in Screaming Frog.
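
Putting that together, the extraction rules for a hypothetical category description (the div class is invented for the example – yours will differ) might look like:

    CSSPath:  div.category-description
    XPath:    //div[@class="category-description"]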

Another notable feature here is pulling in the PageSpeed score for each page on the website. More on that here.

*Tip: Check competitor websites for mentions of a certain keyphrase, then write content of your own on or around that subject.

Handy Graphs

There are so many panels and menus in Screaming Frog, it’s difficult to know where to look. A couple of really helpful graphs that can be tagged onto emails and reports are:

Crawl depth

This graph, found in the lower right-hand panel, can be used to see how shallow (or deep) a website’s structure is.

Page title information

This graph can be used to see how well optimised the website is, as a whole, for key on-page elements.

In summary, a super useful course for any SEO worth their salt – I’d definitely recommend enrolling on the next one!