#DataHack1: 2014 South African Election Social Media Hack #HackRU

Snapshot of the Mention Network on 12 April 2014 About SA Elections
Snapshot of the Mention Network on 11 April 2014 About SA Elections

UPDATE: This post made it onto iAfrikan (in a nicer, edited format): Link, Evernote Mirror

UPDATE: DataHack2 and DataHack3 are now available.

I am attending a bit of HackRU Spring 2014 and thought I would take the opportunity to sharpen up my Python and data science skills. The dataset I chose to do some deep shallow dives on was a Twitter archive of tweets about the upcoming 2014 South African elections. I set up a small Python script using Twitter Python. The script grabs all tweets (hopefully) that match any of the following search keywords:

Search keywords:

  • malema
  • zuma
  • zille
  • ramphele
  • mamphele
  • mampheler
  • presidencyza
  • julius_S_Malema
  • EconFreedomZA
  • MyANC_
  • da_news
  • agangsa
  • helenzille

The list is not exhaustive but should still result in some interesting insights. This specific blog post is about the ~7800 tweets collected on 11 April 2014.
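If you are working from the JSON dumps rather than the live API, a quick local check of which keywords a tweet matches might look like the sketch below. This is not the collection script itself: the function name `matched_keywords` is made up for illustration, and plain case-insensitive substring matching is an assumption about how the search behaves.

```python
# Keywords copied from the list above.
KEYWORDS = [
    "malema", "zuma", "zille", "ramphele", "mamphele", "mampheler",
    "presidencyza", "julius_S_Malema", "EconFreedomZA", "MyANC_",
    "da_news", "agangsa", "helenzille",
]

def matched_keywords(text, keywords=KEYWORDS):
    """Return the keywords that occur in the text (hypothetical helper).

    Matching is a case-insensitive substring test, so "zille" also
    matches inside "helenzille".
    """
    lowered = text.lower()
    return [k for k in keywords if k.lower() in lowered]

print(matched_keywords("Zuma and @helenzille debate"))
```

Because it is a substring test, overlapping keywords such as "zille" and "helenzille" will both fire on the same tweet.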

#HashTag Fun

The first bit of analysis was to extract the most used hashtags from all the tweets. The figure below is a histogram of the number of occurrences of each hashtag.

12 April 2014 Hashtag Frequency
11 April 2014 Hashtag Frequency

What is interesting about the most frequent hashtag is that it also gives us a glimpse of what was happening on April 11th. The top hashtag, #ayisafani, coincided with the rigorous campaign by the Democratic Alliance to spread their new "Banned" TV ad [YouTube Ad]. Unsurprisingly, the president also features heavily: #zuma is the second most used hashtag. It would be interesting to do some sentiment analysis on these tweets, but getting a good sentiment model would be hard given the mixture of languages used in South Africa. Anyway, if anyone is interested in implementing this, the data is available and can be accessed from the link provided later.
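For anyone who wants to reproduce the hashtag counts from the JSON dumps, here is a minimal sketch. It assumes each tweet is a dict in the standard Twitter API layout, with hashtags under `entities.hashtags[].text`. The function name `hashtag_counts` and the sample tweets are made up for illustration, and folding hashtags to lower case is my assumption, not necessarily what was done for the histogram.

```python
from collections import Counter

def hashtag_counts(tweets):
    """Count hashtag occurrences across tweet dicts, ignoring case."""
    counts = Counter()
    for tweet in tweets:
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts

# Tiny made-up sample, not real data from the archive.
sample = [
    {"entities": {"hashtags": [{"text": "Ayisafani"}, {"text": "zuma"}]}},
    {"entities": {"hashtags": [{"text": "ayisafani"}]}},
    {"entities": {"hashtags": []}},
]

# most_common(n) returns the n most frequent hashtags with their counts.
print(hashtag_counts(sample).most_common(2))
```

The `Counter` can be handed straight to a plotting library to draw the histogram.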

Klout and Retweets

Now I know this might be a bit circular, but I wanted to measure the correlation between users' Klout scores and the number of retweets they received on a given day. For this I used the Klout API via their Python package to retrieve the Klout score of each retweeted user. The scatter plot of the number of retweets vs. the user Klout score is below.

Number of Retweets vs Klout Score on 12th April 2014
Number of Retweets vs Klout Score on 11th April 2014

I annotated the plot with the labels of the top 5 retweeted accounts. The correlation for the top 30 most retweeted users was 0.51, so the correlation is positive but not strong. This was still interesting, as we can see that @EconFreedomZA is punching above its weight on this day, with more retweets than the @DA_News account, which has the highest Klout score. Hmm. Well, something interesting happens when we look at the number of mentions vs. Klout score instead. Below I plot only the top 20 mentioned users and their Klout scores. I did not annotate it, but to help in reading it I will say that the user with the second highest Klout score is @helenzille (leader of @DA_News). The correlation here is 0.71, which is much stronger.

# Mentions vs Klout Score (12th April 2014)
# Mentions vs Klout Score (11th April 2014)
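The correlation figures quoted above are the kind of number you get from a Pearson correlation coefficient. The sketch below is a plain-Python version, assuming Pearson is the measure in question (the post does not say which coefficient was used). The function name `pearson` and the toy numbers are illustrative only and are not taken from the dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Toy numbers: Klout scores and retweet counts for five made-up accounts.
klout = [40.0, 55.0, 60.0, 72.0, 80.0]
retweets = [3, 10, 8, 25, 30]
print(round(pearson(klout, retweets), 2))
```

Swapping `retweets` for mention counts gives the second correlation.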

Now it might be interesting to see whether we can reverse engineer the complex Klout calculation via linear regression.
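As a starting point for that idea, here is a minimal sketch of an ordinary least squares fit with a single predictor, for example mention count against Klout score. The real Klout calculation presumably depends on many more signals, so treat this purely as an illustration. The function name `fit_line` and the toy numbers are made up.

```python
def fit_line(xs, ys):
    """Least squares fit of y = slope * x + intercept for one predictor."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy numbers: mention counts and Klout scores for four made-up accounts.
mentions = [5, 20, 60, 150]
klout = [42.0, 51.0, 63.0, 78.0]
slope, intercept = fit_line(mentions, klout)

# Predicted Klout score for a hypothetical account with 100 mentions.
print(slope * 100 + intercept)
```

A fuller attempt would add more predictors (retweets, followers and so on) and use a multiple regression instead.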

The Mention Network

The first image in this post is a snapshot of the social network constructed by tracking the mentions of specific accounts in the Twitter dataset. The relative size of each node (account) is the number of mentions that account has received in the network. As you would expect with this metric, @DA_News has the largest size. You can view the full network with accounts that have at least 6 mentions in the network here: 11th April SA Election Mention Network
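A mention network like this can be sketched from the JSON dumps in a few lines. The sketch assumes the standard Twitter API layout (`user.screen_name` for the author, `entities.user_mentions[].screen_name` for the accounts mentioned). The function name `mention_network` is made up, and keeping only edges that point at accounts meeting the threshold is my assumption about how the filtering works.

```python
from collections import Counter

def mention_network(tweets, min_mentions=6):
    """Build a directed mention network from tweet dicts.

    Returns (sizes, edges): sizes maps each kept account to the number
    of mentions it received, and edges maps (author, mentioned) pairs
    to how often that author mentioned that account.
    """
    mentions = Counter()
    edges = Counter()
    for tweet in tweets:
        author = tweet["user"]["screen_name"].lower()
        for m in tweet.get("entities", {}).get("user_mentions", []):
            target = m["screen_name"].lower()
            mentions[target] += 1
            edges[(author, target)] += 1
    # Keep only accounts that reach the mention threshold.
    sizes = {acc: n for acc, n in mentions.items() if n >= min_mentions}
    kept = {e: w for e, w in edges.items() if e[1] in sizes}
    return sizes, kept

# Tiny made-up sample, not real data from the archive.
sample = [
    {"user": {"screen_name": "a"},
     "entities": {"user_mentions": [{"screen_name": "DA_News"}]}},
    {"user": {"screen_name": "b"},
     "entities": {"user_mentions": [{"screen_name": "DA_News"},
                                    {"screen_name": "helenzille"}]}},
    {"user": {"screen_name": "c"}, "entities": {"user_mentions": []}},
]
sizes, edges = mention_network(sample, min_mentions=2)
print(sizes, edges)
```

The `sizes` dict gives the node sizes and `edges` can be exported to whatever graph tool you prefer for drawing.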


As always, I have made the Twitter JSON dumps available on my GitHub. Grab the continuously updated data here -> github:za-2014-election-tweets

iPython Notebooks

Yes, I will upload them when they get interesting.
