Data Collection With Google Spreadsheets

Here at AuthorityLabs, we really like to play with data. Whether it’s checking website rankings or tracking indexed pages, we’re crunching a ton of data all day, every day. Recently, we’ve started playing with some features in Google Spreadsheets that seem to have an almost endless possibility for collecting and manipulating data. Using the functions for external data, we’ve put together a list with a few ways to help speed up common tasks related to SEO. While it’s not a new feature, I think it’s probably underused.

The ideas below make use of XPath. If you’re not familiar with XPath, here’s a good resource to pick up a basic understanding and here’s a Firefox plugin to help finding the XPath for elements in a document.

Check SERPs

While we think we have the best tool around for checking search results, Google Spreadsheets can provide a quick and free way to see how a site is performing. For the results below, we’ve entered the following into the first cell of the spreadsheet:

=importXML("http://www.google.com/search?q=digital+cameras&hl=en&gl=US", "//a[@class='l']/@href")

Let’s break that down a little bit.

=importXML() – This is the function that Google uses to import the document at the specified URL.

http://www.google.com/search?q=digital+cameras&hl=en&gl=US – This is the URL of the search results page for [digital cameras] at US based (gl=US) English (hl=en) results at Google.com. You can add &num=100 to the end of this URL to expand the results to 100. Note that these results may include more than just the standard organic listings, so keep that in mind when figuring out how a site is performing.

//a[@class='l']/@href – This is the XPath query that is used to parse out the data we are looking for. In this case, we are looking for the href attribute of links with a css class of l.

Compare International SERPs

Looking for an easy way to compare site performance in different countries? If you’re looking for English results, just change gl=US in the above example to whichever two character country code Google is using. For the example below, we have used the following to get UK and Australia results -

=importXML("http://www.google.com/search?q=digital+cameras&hl=en&gl=UK", "//a[@class='l']/@href")

=importXML("http://www.google.com/search?q=digital+cameras&hl=en&gl=AU", "//a[@class='l']/@href")

Analyze a Page’s Links

Sometimes a situation comes up where you really need to look at a list of all links on a page and what anchor text those links are using. Google spreadsheets makes it easy to get these links into a spreadsheet for your searching and sorting pleasure. Drop the following ImportXML() functions into cells that are side by side in your spreadsheet and let Google do the work for you.

=importXML("http://www.yourpageurl.com", "//a")

=importXML("http://www.yourpageurl.com", "//a/@href")

The following is an example of the end result. Can anyone clue me in on why SEOmoz recommends using absolute links but uses relative links all over the place?

Other Ideas

Reputation Monitoring and Reporting – Get creative with Twitter’s advanced search, del.icio.us’ popular lists and tags, and Google’s blog search to document and report on a client’s reputation or recent mentions.

Importing Data From a Feed – The ImportFeed() function lets you quickly populate a spreadsheet with data from an rss or similar feed. This method doesn’t even need any XPath knowledge to make it work. The downside is that it appears to be limited to importing 20 feed items at a time and sometimes doesn’t work as expected.

Keep in mind that if you want to take the most advantage of the data you’re grabbing, there is a good chance you will need to export the spreadsheets for use in Excel or as a CSV. It appears that there isn’t an easy way to sort and manipulate the data that results from these functions due to the way that everything is referenced in the cells of the spreadsheets. I’d be interested in hearing how others are using these features. Maybe there are easier ways of doing some of the above items, but paired with a little creativity, someone who has no idea about writing code can create their own system for scraping and analyzing data.

About Brian LaFrance

Brian is the Marketing Director here at AuthorityLabs. He has been involved in various aspects of internet marketing since 2009 and enjoys helping people solve problems.

Filed under: Featured, Insights

5 Comments

Gareth James

Really great post, going to have a real good play with this combination. I’m trying to get your example to scrape 100 results, but only getting 12. Cannot get the &num=100 to work??

Chase Granberry

Glad you like it Gareth!

Yeah, just adding &num=100 doesn’t work anymore with Google Instant, but if you add &num=100&as_qdr=all it will.

Chuck Reynolds

This is fun – does this info refresh on every load or can you archive it? Archive tabs?

Chase Granberry

It doesn’t refresh when you refresh the spreadsheet, I know that. I think if the spreadsheet hasn’t been opened in a while, it refreshes on open. So to archive data you’ll have to copy and paste just the values into another sheet.

Comments are closed.