Predicting the future: South Carolina primary

by John Tomlinson

Before

I'm writing this part of this post on Friday 20th. The South Carolina primary for the Republican Party is tomorrow, Saturday 21st, and we will know the result by Sunday.

Google have been tracking the number of times each candidate has appeared in a Google search term in the South Carolina region. The Google graph only covers up to Wednesday, so the data isn't quite complete, but it's indicative of the interest shown in the candidates by South Carolinians, or at least by those with internet access.

Google normalize the numbers and then average them across the previous seven days. This gives us a quick and simple picture of interest in the four remaining candidates (I'm not including Stephen Colbert), which could translate into voting intentions. The raw numbers at the end of Wednesday were:

  • Ron Paul 35%
  • Mitt Romney 28%
  • Rick Santorum 22%
  • Newt Gingrich 16%
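The normalize-then-average step can be sketched in a few lines of Python. This is a minimal sketch, not Google's actual method: it assumes we had raw daily search counts per candidate, and the function names `daily_shares` and `seven_day_average` and the sample counts are hypothetical. Each day's counts are converted to shares that sum to 1, then each candidate's shares are averaged over the most recent seven days.

```python
def daily_shares(counts):
    """Normalize each day's counts so the candidates' shares sum to 1 for that day."""
    days = len(next(iter(counts.values())))
    totals = [sum(series[d] for series in counts.values()) for d in range(days)]
    return {
        name: [series[d] / totals[d] for d in range(days)]
        for name, series in counts.items()
    }


def seven_day_average(shares, window=7):
    """Average each candidate's daily share over the most recent `window` days."""
    return {name: sum(s[-window:]) / len(s[-window:]) for name, s in shares.items()}


# Hypothetical two-day example (illustrative counts only, not real search data).
counts = {"A": [1, 3], "B": [1, 1]}
shares = daily_shares(counts)      # A: [0.5, 0.75], B: [0.5, 0.25]
print(seven_day_average(shares))   # {'A': 0.625, 'B': 0.375}
```

Because each day is normalized before averaging, a candidate's figure reflects their share of attention rather than the overall volume of searches, which is why the percentages above sum to roughly 100%.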

Because I quite enjoy this sort of thing, I gave more weight to recent days and came up with a fancy calculation to tune the prediction, hoping to capture the recent movement as Gingrich rises (seen in the jump in searches on Tuesday and Wednesday) and Santorum slowly fades. It made little difference. The weighted average showed a very gentle swing (barely noticeable due to percentage rounding) away from Romney and Santorum towards Gingrich and Ron Paul:

  • Ron Paul 35%
  • Mitt Romney 27%
  • Rick Santorum 21%
  • Newt Gingrich 16%
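The post doesn't give the exact weighting, so the following is only a sketch of one common choice: linear recency weights, where the oldest day gets weight 1, the next weight 2, and so on. The function name `weighted_average` and the sample shares are hypothetical.

```python
def weighted_average(shares, weights=None):
    """Recency-weighted average of each candidate's daily shares.

    Assumption: by default day i (0 = oldest) gets weight i + 1, so the most
    recent day counts most. This is an illustrative choice, not the post's formula.
    """
    n = len(next(iter(shares.values())))
    if weights is None:
        weights = list(range(1, n + 1))
    if len(weights) != n:
        raise ValueError("need exactly one weight per day")
    total = sum(weights)
    return {
        name: sum(w * x for w, x in zip(weights, s)) / total
        for name, s in shares.items()
    }


# Hypothetical shares: A rises on the latest day, B falls.
shares = {"A": [0.5, 0.75], "B": [0.5, 0.25]}
print(weighted_average(shares))  # A about 0.667, B about 0.333 (plain average: 0.625 / 0.375)
```

Passing equal weights (for example `[1] * 7` for a seven-day series) recovers the plain average, which makes it easy to compare the weighted and unweighted figures side by side.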

Taking these two sets of figures together and looking at the graph in more detail suggests that Romney and Santorum have slightly softer votes, with Santorum in particular waning. Ron Paul remains strong, and Gingrich has seen a late surge which is likely to grow if Rick Perry's supporters follow the Texan's endorsement of the ex-Speaker.

So the prediction based solely on Google search data:

  • Ron Paul 27-39% - the largest range, reflecting the largest spread of searches. This feels high to me: we know that Ron Paul supporters are very loyal, vocal and internet savvy, so it is likely that this exaggerates his vote.
  • Mitt Romney 24-31% - solid vote, probably just under 30%
  • Rick Santorum 20-24% - this feels a bit high to me. I suspect Santorum's high point was Iowa and he'll fade from now on.
  • Newt Gingrich 15-18% - this is probably low based on his late surge in support and Rick Perry's endorsement.
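One plausible reading of "the largest range reflecting the largest spread of searches" is that each range runs from a candidate's lowest to highest daily share over the week. The sketch below assumes exactly that; the post does not spell out how the ranges were derived, and `share_ranges`, `widest_range` and the sample data are hypothetical.

```python
def share_ranges(shares):
    """Return (low, high) whole percentages per candidate: min and max daily share."""
    return {
        name: (round(min(s) * 100), round(max(s) * 100))
        for name, s in shares.items()
    }


def widest_range(ranges):
    """Name of the candidate whose daily share varied the most."""
    return max(ranges, key=lambda name: ranges[name][1] - ranges[name][0])


# Hypothetical daily shares over three days (not real search data).
shares = {"A": [0.30, 0.36, 0.39], "B": [0.25, 0.27, 0.24]}
ranges = share_ranges(shares)
print(ranges)                # {'A': (30, 39), 'B': (24, 27)}
print(widest_range(ranges))  # A
```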

Politico and Facebook have done similar research based on Facebook mentions in status updates.

This shows a similar Romney / Ron Paul race, with Gingrich showing movement later in the week and Santorum lagging.

Facebook also split out positive and negative comments by candidate. This is a difficult and error-prone process: it's notoriously tricky to decipher social media language and be sure which candidate a keyword refers to (e.g. "Paul good Romney not good" or "Santorum is wicked"). Interestingly, it doesn't make much difference anyway; all the candidates bunch up with similarly flattish lines.
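To see why attribution is tricky, here is a deliberately naive sketch. It is not Facebook's method: it assumes a tiny hypothetical word list (`LEXICON`), simple whitespace tokenization, and hypothetical function names. Scoring the whole post and crediting every candidate named in it cancels out the opposite opinions in "Paul good Romney not good", while crediting each sentiment word to the most recently named candidate separates them. Neither approach can tell whether "wicked" is praise or criticism.

```python
CANDIDATES = {"paul", "romney", "santorum", "gingrich"}
# Hypothetical tiny lexicon; "wicked" is scored as negative even though it can be praise.
LEXICON = {"good": 1, "great": 1, "bad": -1, "wicked": -1}


def tokenize(text):
    return [t.strip(".,!?").lower() for t in text.split()]


def whole_post_sentiment(text):
    """Naive: score the whole post, then credit every candidate named with that score."""
    tokens = tokenize(text)
    score, negate = 0, False
    for t in tokens:
        if t == "not":
            negate = True
        elif t in LEXICON:
            score += -LEXICON[t] if negate else LEXICON[t]
            negate = False
    return {t: score for t in tokens if t in CANDIDATES}


def nearest_mention_sentiment(text):
    """Slightly less naive: credit each sentiment word to the last candidate named."""
    scores, current, negate = {}, None, False
    for t in tokenize(text):
        if t in CANDIDATES:
            current = t
            scores.setdefault(t, 0)
        elif t == "not":
            negate = True
        elif t in LEXICON:
            value = -LEXICON[t] if negate else LEXICON[t]
            negate = False
            if current is not None:
                scores[current] += value
    return scores


print(whole_post_sentiment("Paul good Romney not good"))       # {'paul': 0, 'romney': 0}
print(nearest_mention_sentiment("Paul good Romney not good"))  # {'paul': 1, 'romney': -1}
```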

How does this compare to the real polls? The three polls available here today (Thursday's polls) show the following ranges:

  • Mitt Romney - 28-34%
  • Newt Gingrich - 24-34%
  • Ron Paul - 11-16%
  • Rick Santorum - 10-14%

This is quite different. Romney is the only candidate who polls roughly equivalent to the Google/Facebook notoriety.

However, there is form for social media analysis besting the opinion polls. The Twitter analysis of New Hampshire (see here) was more accurate than the polls and was able to spot key trends in support. Even in the Iowa caucus (not exactly an election), Twitter forecast the Santorum surge that saw the Congressman finish only 8 votes short of Romney (this result has since been overturned, and Santorum may actually have won Iowa by 34 votes).

This infographic from Dan Zarrella (scroll down to the end) shows Twitter mentions in South Carolina - it does not analyse the tweets, and thus is a measure of pure notoriety. It doesn't include percentages but shows the following result:

  1. Mitt Romney
  2. Newt Gingrich
  3. Ron Paul
  4. Rick Santorum

In this article, Mashable discuss the importance of followers:

"National polls happen all the time but it's possible to predict when certain candidates will climb in the rankings based the rate they are followed."

They go on to say:

"It is more important to be followed than to be discussed"

If this is correct, we should expect the result to be:

  1. Newt Gingrich
  2. Mitt Romney
  3. Ron Paul
  4. Rick Santorum

In fact, Gingrich is so far ahead on followers, we should expect to see a resounding victory for the big man, not just in South Carolina, but in the whole primary.

Summary:

  • All agree that Romney will be first or second with around 30% of the vote.
  • Twitter followers suggest a big win for Gingrich.
  • Facebook, and especially Google, expect Ron Paul to be the other candidate vying with Romney for the top spot; the polls and Twitter say it will be Gingrich.
  • Santorum is the only candidate not predicted to win by any of the sources. Google searches are the only source not placing him last.


After

On Sunday, more Google search information was available. The graph (below) clearly shows a massive leap in interest in Gingrich and a general rise in interest in the other three remaining candidates, with Santorum showing the least enthusiastic surge (the flattest line).

The actual result was (source):

  • Newt Gingrich - 40%
  • Mitt Romney - 28%
  • Rick Santorum - 17%
  • Ron Paul - 13%

This shows us that:

  • Google searching is a strong measure of interest in a candidate.
  • Other social media, especially Twitter, are also useful, in particular the number of followers.
  • Internet-savvy campaigns such as Ron Paul's can skew predictions and need to be accounted for.
  • We still have a lot to learn in terms of turning this data into accurate predictions.

Now for Florida ...
