Using Server Logs for SEO – An Introduction

I think you’ll agree:

No matter what analytics package you use, you can never be sure if the data is 100% accurate.

  • Campaigns might not get tracked,
  • Traffic sources get wrongly attributed,
  • Or acquisition data shows unknown traffic.

These inconsistencies are often insignificant when you review the data only to assess traffic and your website’s performance.

When you’re auditing your site for technical SEO issues, however, they could have a profound effect on your findings.

So is there a way to retrieve accurate data then? One that allows for a thorough investigation of any potential issues with the site?

Short answer – yes – from the server’s log file.

What is a server log file?

A server log is a file that records every hit, that is, every request for a file (webpage, image, CSS, script etc.) the server received, along with information such as:

  • Time of the request,
  • Date,
  • IP address,
  • Requested URL and more.

What does a server log entry look like?

To an untrained eye, a server log file could seem intimidating. After all, it contains nothing but rows and rows of data.

Luckily deciphering this information isn’t that difficult.

A typical log of a server hit looks like this:

123.123.123.123 - - [26/Apr/2014:00:23:48 -0400] "GET /img/infographic.png HTTP/1.0" 200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"

Here’s a short explanation of what this data means:

  • 123.123.123.123 – that’s the IP address that requested the information.
  • [26/Apr/2014:00:23:48 -0400] – this section reveals the time and date of the request.
  • "GET /img/infographic.png HTTP/1.0" – that’s the actual request; it reveals the method used, the file that was requested (in this case a graphic file) and the protocol version.
  • 200 – the server’s response code. This is a hugely important piece of information.
  • 6248 – this number indicates the number of bytes transferred. In other words, that’s the size of the file sent in response.
  • "http://www.jafsoft.com/asctortf/" – this is the referring URL. It helps to identify where the traffic / request came from.
  • "Mozilla/4.05 (Macintosh; I; PPC)" – lastly, the user agent string reveals the browser and operating system that made the request.
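
To make the breakdown above concrete, here is a minimal Python sketch that splits one log line into those fields. It assumes the combined log format used in the example, written with the plain ASCII quotes and hyphens a real log file contains. The names LOG_PATTERN and parse_line are made up for illustration, and your server may be configured to log in a different format.

```python
import re
from datetime import datetime

# Pattern for the "combined" log format (an assumption: check your server's
# configuration, because the format can be customized).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the size column means no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


sample = (
    '123.123.123.123 - - [26/Apr/2014:00:23:48 -0400] '
    '"GET /img/infographic.png HTTP/1.0" 200 6248 '
    '"http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"'
)
entry = parse_line(sample)
print(entry["ip"], entry["url"], entry["status"], entry["size"])
```

Returning None for lines that don't match lets you skip malformed rows instead of stopping halfway through a large file.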

An important thing to remember is that every time someone accesses a page on a site, the server logs one hit for each file requested.

A single page view therefore rarely equals one hit, because a page often consists of additional files (images, CSS, scripts etc.).
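
As a rough illustration of the difference between hits and page views, the sketch below sorts requested URLs into pages and supporting files by their extension. The extension list, the hit_type helper and the sample URLs are all made up for this example; a real site would need its own rules.

```python
import posixpath
from collections import Counter
from urllib.parse import urlparse

# Hypothetical list of extensions treated as supporting files, not pages.
ASSET_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".css", ".js", ".ico", ".svg"}


def hit_type(url):
    """Classify a requested URL as an 'asset' or a 'page' by its file extension."""
    path = urlparse(url).path  # drops any query string such as ?v=3
    extension = posixpath.splitext(path)[1].lower()
    return "asset" if extension in ASSET_EXTENSIONS else "page"


# Made-up requests that one visit to a single blog post might generate.
requested_urls = [
    "/blog/server-logs/",
    "/css/style.css",
    "/js/main.js?v=3",
    "/img/infographic.png",
]
counts = Counter(hit_type(url) for url in requested_urls)
print(counts["page"], "page view,", counts["asset"], "asset hits")
```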

Why is server log data crucial to SEO?

Since it contains information about every single request, a server log can reveal crucial insights about how your site’s working:

  • It can tell when and how search engines crawled your site,
  • Disclose if they had any problems with crawling your pages,
  • Reveal if there are any URLs they couldn’t access,
  • Show how much data was transferred per request,
  • And tell you if there were any errors when processing the request.
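
Here is a minimal sketch of how you might pull search engine visits out of a log, assuming the combined log format. It looks for the tokens "Googlebot" and "bingbot" in the user agent string. Keep in mind that anyone can fake a user agent, so treat a match as a hint, not proof. The function name and the sample lines are made up for illustration.

```python
import re
from collections import defaultdict

# Minimal pattern for combined-format lines; only the fields used below are named.
LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "\S+ (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Substrings that these crawlers include in their user-agent strings.
BOT_TOKENS = ("Googlebot", "bingbot")


def crawler_hits(lines):
    """Group (time, url, status) tuples by the crawler token found in each line."""
    hits = defaultdict(list)
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # skip lines in an unexpected format
        for token in BOT_TOKENS:
            if token in match["agent"]:
                hits[token].append((match["time"], match["url"], int(match["status"])))
    return hits


# Made-up sample lines for illustration only.
sample_lines = [
    '192.0.2.10 - - [26/Apr/2014:01:02:03 -0400] "GET /about/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [26/Apr/2014:01:02:09 -0400] "GET /old-page/ HTTP/1.1" 404 312 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '123.123.123.123 - - [26/Apr/2014:01:05:00 -0400] "GET / HTTP/1.1" 200 7300 "-" '
    '"Mozilla/4.05 (Macintosh; I; PPC)"',
]
hits = crawler_hits(sample_lines)
# URLs the crawler asked for but could not get.
problems = [(url, status) for _, url, status in hits["Googlebot"] if status >= 400]
print(problems)
```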

What information should you look for in a server log file?

1. Server Response Codes

The first thing you want to find out is whether there are any pages that humans or bots had problems accessing.

And the best way to discover that is by checking the server’s response codes.

Each response code corresponds to a particular server response to a file request. Here are the most common ones.

Response Code 200

This number indicates that there were no problems with accessing the page.

Response Code 301

This code signals that traffic to a page is permanently redirected to another page.

It means that the original page has been replaced with a new one: when a user or bot requests the old URL, they are sent to the new page instead.

Response Code 302

Like the code above, 302 indicates a redirect. In this case, however, it is a temporary transfer from one page to another. It suggests that in time, the original page will be accessible again.

Response Code 404

This code indicates that the server could not find the requested file / page, typically because it no longer exists, and thus visitors and bots cannot access it.

You should always investigate all log entries containing a 404 (or another 4xx code) as they indicate a potential problem with accessing a page.

Response Code 500

Another code not to ignore. A 500 indicates that there were problems accessing a page. This time, however, they were related to the server, not the file / page, and can potentially be corrected easily.
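
Once you have extracted the URL and response code from each log entry, a few lines of Python can summarize them. The sketch below is only an illustration: it assumes the entries are already available as (url, status) pairs, and both the function name and the sample data are made up.

```python
from collections import Counter, defaultdict


def summarize_statuses(requests):
    """Count responses per status class and collect URLs behind 3xx/4xx/5xx codes."""
    by_class = Counter()
    urls_by_status = defaultdict(set)
    for url, status in requests:
        by_class[f"{status // 100}xx"] += 1  # e.g. 404 -> "4xx"
        if status >= 300:
            urls_by_status[status].add(url)
    return by_class, urls_by_status


# Made-up (url, status) pairs standing in for parsed log entries.
requests = [
    ("/", 200),
    ("/old-page/", 301),
    ("/sale/", 302),
    ("/missing/", 404),
    ("/missing/", 404),
    ("/checkout/", 500),
]
by_class, urls_by_status = summarize_statuses(requests)
print(dict(by_class))
for status in sorted(urls_by_status):
    print(status, sorted(urls_by_status[status]))
```

Grouping the URLs by code gives you a ready-made to-do list: redirects to double check, missing pages to restore or redirect, and server errors to investigate.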

2. Traffic patterns

A server log can also help you identify traffic patterns you could use to optimize your site.

For instance:

  • Finding out your top and worst performing pages could help you decide what content to focus on in the future.
  • The most often requested pages could similarly point to the content that is most popular among your audience.
  • Referring sites may reveal URLs that help spread the word about your brand (and could be good to build strategic relationships with).
  • Peak traffic times during the day could help you ensure that your server can actually handle those requests.
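
The patterns listed above boil down to counting. As a hedged sketch, the snippet below assumes each log entry has already been parsed into a dictionary with "url", "referrer" and "time" keys (those key names, the traffic_summary function and the sample entries are all made up for this example).

```python
from collections import Counter
from datetime import datetime


def traffic_summary(entries, top_n=3):
    """Return the most requested URLs, the top referrers and the busiest hour."""
    pages = Counter(e["url"] for e in entries)
    # A "-" referrer means the request carried no referring URL.
    referrers = Counter(e["referrer"] for e in entries if e["referrer"] != "-")
    hours = Counter(e["time"].hour for e in entries)
    return {
        "top_pages": pages.most_common(top_n),
        "top_referrers": referrers.most_common(top_n),
        "peak_hour": hours.most_common(1)[0][0] if hours else None,
    }


# Made-up entries standing in for parsed log lines.
entries = [
    {"url": "/blog/", "referrer": "http://example.com/", "time": datetime(2014, 4, 26, 9, 5)},
    {"url": "/blog/", "referrer": "-", "time": datetime(2014, 4, 26, 9, 30)},
    {"url": "/pricing/", "referrer": "http://example.org/", "time": datetime(2014, 4, 26, 9, 45)},
    {"url": "/blog/", "referrer": "http://example.com/", "time": datetime(2014, 4, 26, 14, 0)},
]
summary = traffic_summary(entries)
print(summary)
```

In practice you would probably filter out images, CSS and scripts first, so that the page counts reflect real page views.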

3. Unwanted traffic

It may sound like a line from a science fiction movie but:

It’s not only humans that try to access your site.

  • Search engine bots visit your site to crawl and index the content.
  • Spiders and scrapers go through your pages looking for information they could copy and reuse elsewhere.

Not all this traffic is good though.

While you shouldn’t restrict search engines’ access to your site, you should prevent many other bots, spiders and scrapers from crawling your pages. They might try to harvest email addresses, steal content or simply take up precious site bandwidth.

Server logs help identify those unwanted visitors and block them from accessing the site again.
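
One simple way to spot such visitors is to count requests per client and look at the heaviest ones that don't identify themselves as a search engine. The sketch below assumes you have already extracted an (IP address, user agent) pair for every hit. The threshold, the function name and the sample data are made up, and since user agents can be faked, the output is a list of candidates to review, not a block list.

```python
from collections import Counter

# Tokens that these search engine crawlers include in their user-agent strings.
SEARCH_ENGINE_TOKENS = ("Googlebot", "bingbot")

# Hypothetical cut-off; tune it to your own traffic levels.
REQUEST_THRESHOLD = 3


def suspicious_clients(hits, threshold=REQUEST_THRESHOLD):
    """Return (ip, agent, count) for heavy clients that do not identify as a search engine."""
    counts = Counter(hits)
    flagged = []
    for (ip, agent), count in counts.most_common():
        is_search_engine = any(token in agent for token in SEARCH_ENGINE_TOKENS)
        if count >= threshold and not is_search_engine:
            flagged.append((ip, agent, count))
    return flagged


# Made-up (ip, user agent) pairs, one per logged hit.
hits = (
    [("203.0.113.7", "ExampleScraper/1.0")] * 5
    + [("192.0.2.10", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")] * 4
    + [("198.51.100.2", "Mozilla/4.05 (Macintosh; I; PPC)")]
)
flagged = suspicious_clients(hits)
print(flagged)
```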

And there you have it – using server logs for SEO in a nutshell.

Naturally this post serves only as an introduction.

To find out more about using server logs to perform an SEO audit of your site, check out these in-depth tutorials:

Creative commons image by LinuxScreenshots / Flickr
