How Accurate is Alexa? The Stats Don’t Lie

After gathering monthly stats on 40+ blogs and comparing their online rankings from various tools (Alexa, Google Reader, Quantcast, etc.) to actual page views, I wanted to take a few posts to put the data out there on how accurately some of these online tools actually rank sites. Alexa is a site that is designed to rank websites based on traffic. An Alexa rank of 1 would be the most visited site in the world (Google; Facebook is #2). A rank of 1 million would predict that website to be the 1 millionth most visited site on the web. So the higher your rank number, the less traffic you would get. That is the theory.

One of the problems with Alexa is that we don’t often have the actual numbers to compare ourselves to in order to see how accurate Alexa actually is. Well, now we do have some data. After requesting the data from 40+ blogs we have enough data to run some Pearson correlations to see how well Alexa ranks blogs. Alexa does claim to be less accurate on blogs with higher (numerically larger) ranks, so we have to take these numbers with a grain of salt. However, the average user of Alexa does not have a top 100,000 blog, much less a top 50 blog. So are the numbers from Alexa practical at all?

First, there is the correlation on the entire data set. There were 32 blogs that had Alexa ranks (between 139,913 and 19,740,920). Alexa rank was correlated with those 32 blogs’ actual page views as submitted by each blogger. Remember, the lower your rank, the higher you would expect your page views to be. That means you are expecting this correlation to be negative and statistically significant (2-tailed significance less than .05). A Pearson R (2-tailed) correlation was run with SPSS. It showed there was not a significant correlation between actual page views and Alexa rank: R = -.286 (2-tailed significance of .113). In other words, having a low Alexa rank does not actually correlate with having a higher number of total page views in a given month. At least, that seems to be the case for blogs ranked in the Alexa range of this data set as defined above.
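
For anyone who wants to check these significance values without SPSS, here is a minimal sketch in Python. It is not the SPSS procedure itself. It assumes the usual textbook approach of converting R and the number of blogs into a t statistic with n - 2 degrees of freedom and then taking the two-tailed area under the t distribution. The function names (pearson_r, t_pdf, two_tailed_p) are my own for illustration, and the tail area is found by simple numerical integration rather than a statistics library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log(1 + t * t / df))

def two_tailed_p(r, n, steps=2000):
    """Two-tailed significance of a Pearson r computed from n pairs.

    Assumes the standard t test with n - 2 degrees of freedom.
    The central area P(-t < T < t) is integrated with Simpson's rule
    (steps must be even), and the p-value is what is left in the tails.
    """
    if abs(r) >= 1:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total
    return 1 - central_area

# The full data set reported above: R = -.286 across 32 blogs.
print(round(two_tailed_p(-0.286, 32), 3))
```

Because the R values in this post are rounded to three decimals, the result should land close to the reported significance rather than match it to the last digit. The same function can be pointed at the split-sample results below by passing their R values with 18 and 14 blogs.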

Is Alexa actually more accurate the lower your rank? In order to test Alexa’s own claim that it is more accurate on lower ranked (more trafficked) blogs, the data set was split nearly in half, with 18 blogs falling below an Alexa rank of 3 million and 14 blogs above. As Alexa predicted, the lower ranked blogs did have a stronger correlation: R = -.445 (2-tailed significance of .064). That is close to being statistically significant and is in the expected negative direction. The higher ranked blogs did not do as well and actually had a positive correlation. That is not what you want, as a lower Alexa rank should translate into more page views: R = .238 (2-tailed significance of .412). So Alexa’s claim that it gets more accurate the lower the rank gets appears to be true.
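
The split itself is simple to reproduce. Here is a rough Python sketch of the idea: divide the blogs at a rank threshold and correlate rank with page views inside each half. The (rank, page views) pairs are made-up placeholders, not the bloggers’ real numbers, and split_by_rank and pearson_r are illustrative names of my own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def split_by_rank(blogs, threshold):
    """Split (alexa_rank, page_views) pairs into ranks below and at/above threshold."""
    below = [b for b in blogs if b[0] < threshold]
    above = [b for b in blogs if b[0] >= threshold]
    return below, above

# Hypothetical (alexa_rank, monthly_page_views) pairs, for illustration only.
blogs = [
    (150_000, 42_000),
    (400_000, 30_500),
    (900_000, 12_000),
    (2_500_000, 15_000),
    (3_200_000, 2_000),
    (5_000_000, 6_500),
    (9_000_000, 1_200),
    (18_000_000, 3_000),
]

below, above = split_by_rank(blogs, 3_000_000)
for label, group in (("rank under 3m", below), ("rank 3m and over", above)):
    ranks = [rank for rank, _ in group]
    views = [page_views for _, page_views in group]
    print(label, len(group), round(pearson_r(ranks, views), 3))
```

A negative R in a group means lower rank numbers go with more page views, which is the direction you would hope to see in both halves.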

Bottom line, Alexa does improve as your blog gets more traffic but it is still far from perfect. None of these correlations were statistically significant. The closest was on blogs with more traffic. It would be interesting to gather data on blogs with Alexa ranks under 100k and have 50-100 blogs in the data set to see what the results would show.

About mattdabbs
I am a minister, husband, and father. My wife and I live and minister in Saint Petersburg, Florida. My primary ministry responsibilities include: small groups, 20s and 30s, involvement, and adult education.

23 Responses to How Accurate is Alexa? The Stats Don’t Lie

  1. Tim Archer says:

    The work you and Jay are doing has gotten me intrigued with all of these rankings. I installed SiteMeter on Tuesday. According to site meter, my site had 37 visitors yesterday, with 107 pageviews. WordPress shows 101 pageviews… don’t think they record visitors. Google records 65 visitors, with 264 pageviews.

    Being a preacher, I would have to go with Google’s numbers, of course.🙂

    Grace and peace,
    Tim Archer

  2. jamesbrett says:

    matt, i was just wondering about some of this numbers stuff. glad you’re here to answer all of our computer questions…

    i also installed sitemeter as per your (or jay’s?) instructions. it seems that sitemeter counts my own pageviews, and wordpress does not? is that right, and how it’s supposed to be?

    and how much of the discrepancy in alexa rankings is due to some people having alexa toolbars and others not?

  3. jamesbrett says:

    also, this is odd, alexa tells me that 55% of my readership is in ireland — and that my blog is the 20,000th most popular blog in ireland. that’s obviously messed up. i don’t even speak irish…

    • mattdabbs says:

      Sitemeter does count your own page views and wordpress doesn’t. But what is interesting is as I look at my monthly and grand totals in wordpress and sitemeter, wordpress is always more. That is counterintuitive but there has to be a reasonable explanation.

      Sitemeter only records a hit when the sitemeter graphic loads on a page. When you go to a website with sitemeter and its graphic loads, it records 1 visit and 1 pageview. If you click on any other page and the sitemeter bar is installed on that page you are at 1 visit, 2 page views, etc. Also, if you go back to that site within about 5 minutes, sitemeter records it as the same visit (not sure if it still does that but it used to). On blogs we don’t often think about the logistics of this because our sidebars are static and so the meter loads up on all the pages. If you wanted sitemeter on a site you were hosting, say a church website, you would have to paste the code on every page you wanted counted in the stats.

      So why does wordpress count more pageviews than sitemeter when you would think it would be the opposite? I am racking my brain to think of a time when wordpress would pull a page up or the contents of a post for someone without the sidebars being loaded. I can’t think of any instance of that right now but it has to be the case, otherwise sitemeter would always be higher as it counts our own visits.

      You might think an RSS reader would do this but it doesn’t. RSS doesn’t count because an RSS reader like google reader or bloglines uses a bot to pull your information just once. Then it has your blog post in its database to feed to anyone when they open their reader. It could be that wordpress is counting the visit of those bots that pull your content but wouldn’t load the sitemeter graphic in doing so. But I kind of doubt this because there are probably 100+ bots that do this for all the various types of feedreaders. That would mean even a blog with 0 real visits should have say, 100 views, just from bots. That is obviously not the case. So I don’t think RSS readers explain the discrepancy.

      On this blog, wordpress and sitemeter are only 1% off on the totals. Over time some of this seems to even out, while some days are more off than others. I first had sitemeter at blogger. When I switched over here I think I had a couple thousand views at the end of 2006. If I take a couple thousand views out of my current sitemeter total it is currently only 1% off my wordpress total. So over time they seem to run extremely close.

      Having the alexa toolbar installed in your browser should not be enough to raise your rank substantially. According to the alexa site, alexa ranks are based on those who have the toolbar installed (millions of people) and analyzing the sites they frequent. See this link, especially the first paragraph – http://www.alexa.com/help/traffic_learn_more

      So you adding the alexa toolbar, 1 person out of millions, should not substantially increase your alexa rank. Also, you are viewing many other people’s blogs also on this list, while having the toolbar installed, so all these things should even out.

      I have no idea why alexa thinks blogs are so popular in other countries. That data seems ridiculous. I would guess it must have something to do with people in those countries with the alexa toolbars, but the ranks are so high they seem very questionable.

      Hopefully something in there was helpful.

      • jamesbrett says:

        thanks, matt. all very helpful and informative. i don’t have an alexa toolbar, and don’t intend to get one (i like the one that comes on safari just fine), but i was reading about them the other day…

  4. Hey, nice post. I love statistical analyses like this. I thought I would clarify one thing, though. The Alexa Traffic Rank is not based on pageviews, so I’m not surprised that you found little to no correlation between pageviews and rank.

    There are two measurements we use when calculating ranks. The first is the unique number of daily visitors to the site, and the second is the number of daily unique pages each visitor sees. So, if you have a toolbar and you visit the homepage of a blog 1000 times a day, you are weighted the same as someone with a toolbar who visits that same page once.

    We also have some fairly complex algorithms that remove spam, duplication, pay-for-rank services, etc. This is all part of the “secret sauce” for how ranks are determined.

    I hope this answers some of your questions. If not, I’m happy to discuss this further.

    Best,
    Wayne from Alexa

    • mattdabbs says:

      Wayne,

      Thanks for your input and insights. It is imperative that if you are going to correlate two variables that they be attempting to measure similar constructs. Meaning, two things can be accurate in what they measure but still not correlate well because they are measuring two differing constructs.

      From what you are saying it sounds like this correlation needs to be run on alexa rank and unique visitors and/or unique pageviews. I can put that together. What is really needed is a bigger sample size (around 100+ at least). I’ll see what I can do.

      • Glad to help! Since most people stop at a blog’s homepage, my guess is you’ll find a reasonable correlation between unique visits and rank. I also should have pointed out in my last comment that we use 3 months worth of data to calculate the official Alexa Traffic Rank. If you have difficulty getting that much data, you might instead correlate the 1-month ranks with unique visitors.

    • mattdabbs says:

      That is what I was thinking…using the 1 month alexa rank with the same month’s unique visitors. Would it be most accurate to use the alexa rank at the beginning, middle or end of the month in question as the stats wouldn’t be totaled until the end of the month?

      • All of our ranks are “rolling,” meaning they are calculated daily using the previous 30 or 90 days of data. So if your stats are totaled at the end of the month, it would be best to grab ranks from the 1st of the month.

    • mattdabbs says:

      So if I am going to do a correlation on May’s stats, on June 1 I would get May’s unique visitors and Alexa 1 month rank? That sounds right. Just wanted to make sure you weren’t saying get alexa May 1 in that example.

  5. jamesbrett says:

    dude, matt, that is so awesome. a real alexa guy commented on your blog….

  6. Jay Guin says:

    Matt,

    I have to say I’m skeptical that Alexa scores will correlate. I mean they are so very far removed from reality. http://oneinjesus.info/2010/04/13/top-25-progressive-church-of-christ-blogs-technical-discussion/

    Therefore, I’ve not seen anything that explains why Alexa has such different results.

    I mean, the differences to consider are —

    * Alexa one-month vs. 3-month rank
    * Spider traffic
    * Unique visitors
    * Visits vs. page views

    But my own Alexa rank hasn’t changed that much in 3 months. It’s changed, but not nearly enough to explain even 10% of the discrepancy.

    WordPress filters out spider hits. http://en.support.wordpress.com/stats/ SiteMeter evidently does as well, as its numbers closely match WordPress.

    The correlation of Quantcast Unique Visitors and Alexa is only 0.301, whereas QUV correlates with our Page View data at 0.902.

    Visits will be a lower number but I would think the effect would be fairly uniformly distributed. I just can’t see the difference between visits and page views justifying the wide variation we see in Alexa scores.

    On the other hand, it looks like there’s a baseline of decent correlation in Alexa scores, with several anomalies (maybe 20%) in the mix that throw the correlation way off.

    Edward Fudge, Dell Kimberly, Patrick Mead, and Terry Rush would be the ones with the largest discrepancies. Take them out, and Alexa starts to correlate fairly well within the top 25. If only there were a way to know which ones are wrongly scored …

  7. Jay Guin says:

    PS — I wonder if we can persuade the CoC blogging world to install the Alexa toolbar? This is a factor, too.

  8. Jim says:

    that’s a lot of work for something that isn’t at all important. i always figured people blogged just for the joy of doing it. i had no idea they ‘chased the dragon’.

    • mattdabbs says:

      When you are a stats junkie and have the knowledge and equipment to get it done it only makes sense to spend hours doing this…right?

      • Joel says:

        I reckon you could either spend hours examining stats and making sure that every last visitor is counted correctly, or you could blog?

    • jamesbrett says:

      jim, it seems like i saw your name in the number one spot of a list recently (by alexa stats). congratulations on catching the dragon.

  9. mattdabbs says:

    Joel,

    Pretty sure I do at least a little blogging here too…just saying.

  10. Joel says:

    I reckon, that when you said hours, I figured you spent all your time on stats. Of course, this could just be projection🙂
