#DataHack3: Where are the Tweeting classes from?
Location(s), Location(s), Location(s)
A common criticism of social media analysis in South Africa is that the data is not representative of the general public. I agree, to a point. I think that as time goes on and the (mobile) internet becomes more ubiquitous in South Africa, social media, especially Twitter, will become more and more representative. So the natural question to ask is: where are the election tweets coming from?
Twitter does not make this question easy to answer. A large share of tweets carry no location information. Personally, I think this is a good thing: people are not publicly revealing their location, and so they retain a little more privacy. As a data scientist, it is not that great, because I have to throw away a lot of data to find the Twitter users in the election datasets who happen to be tweeting with location broadcasting turned on. This turned out not to be that bad.
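As a rough illustration of that filtering step, here is a minimal sketch. It assumes the dump holds one tweet JSON object per line and that geotagged tweets carry a `coordinates` field shaped like a GeoJSON Point (longitude first, then latitude). The function name `extract_geotagged` and the sample tweets are made up for this example; this is not necessarily how the datasets were actually processed.

```python
import json


def extract_geotagged(lines):
    """Keep only tweets that carry an exact point location."""
    points = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        tweet = json.loads(line)
        coords = tweet.get("coordinates")
        # Most tweets have "coordinates": null, so they are dropped here.
        if not coords or coords.get("type") != "Point":
            continue
        lon, lat = coords["coordinates"]  # GeoJSON order: longitude, latitude
        points.append({"id": tweet.get("id_str"), "lat": lat, "lon": lon})
    return points


# Made-up sample lines, for illustration only.
sample_lines = [
    '{"id_str": "1", "text": "In the voting queue", '
    '"coordinates": {"type": "Point", "coordinates": [27.85, -26.26]}}',
    '{"id_str": "2", "text": "No location on this one", "coordinates": null}',
    "",
]

tagged = extract_geotagged(sample_lines)
print(len(tagged), "geotagged tweet(s) kept")
```

Everything without a point location is simply discarded, which is the "throwing away" described above.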
The visualization below shows the locations of tweets sent on April 22nd. As you can see, the map is very representative of the country. The metros have the highest densities, but the reported locations cover most of the populated areas in the country.
User Self-Reported Locations
The second visualization shows the self-reported locations of users. This means I mined the user profiles and checked what location they said they were from. Then I used Google GeoLocation APIs to find their locations and mapped them. Again, the user locations are spread all over the country.
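The profile-mining step could look something like the sketch below. It is only an illustration under assumptions: the geocoder is passed in as a plain function (`lookup`), and a tiny hard-coded gazetteer with approximate coordinates stands in for the real geocoding service so the example runs offline. The names `normalise`, `geocode_profiles` and `fake_lookup` are hypothetical. The cache matters because many users type the same place name, and there is no point in looking it up twice.

```python
def normalise(location):
    """Lower-case a free-text profile location and collapse punctuation/spacing."""
    return " ".join(location.lower().replace(",", " ").split())


def geocode_profiles(profiles, lookup):
    """Map each user to (lat, lon) via lookup(place); unresolved users are skipped."""
    cache = {}
    results = []
    for profile in profiles:
        raw = (profile.get("location") or "").strip()
        if not raw:
            continue  # user left the location field empty
        key = normalise(raw)
        if key not in cache:
            cache[key] = lookup(key)  # one lookup per distinct place name
        if cache[key] is not None:
            results.append((profile["screen_name"], cache[key]))
    return results


# Stand-in for a real geocoding service; coordinates are approximate.
FAKE_GAZETTEER = {
    "cape town": (-33.92, 18.42),
    "johannesburg": (-26.20, 28.05),
}


def fake_lookup(place):
    return FAKE_GAZETTEER.get(place)


# Made-up profiles, for illustration only.
profiles = [
    {"screen_name": "user_a", "location": "Cape Town"},
    {"screen_name": "user_b", "location": "  cape   town "},
    {"screen_name": "user_c", "location": "somewhere over the rainbow"},
    {"screen_name": "user_d", "location": None},
]

located = geocode_profiles(profiles, fake_lookup)
print(located)
```

Self-reported locations are free text, so some will never resolve to a real place; in this sketch those users are just left off the map.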
As always, I have made the Twitter JSON/CSV dumps available on my GitHub. Also included are GeoJSON files with the tweet locations and user locations, generated on a daily basis. Grab the continuously updated data here -> github:za-2014-election-tweets
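If you want to produce the same kind of file from your own list of points, a minimal sketch follows. The input shape (dicts with `id`, `lat` and `lon`) and the function name `to_feature_collection` are assumptions made for this example, and the property names in the published files may differ.

```python
import json


def to_feature_collection(points):
    """Wrap lat/lon points in a GeoJSON FeatureCollection."""
    features = [
        {
            "type": "Feature",
            # GeoJSON puts longitude first, then latitude.
            "geometry": {"type": "Point", "coordinates": [p["lon"], p["lat"]]},
            "properties": {"id": p.get("id")},
        }
        for p in points
    ]
    return {"type": "FeatureCollection", "features": features}


# Made-up point, for illustration only.
collection = to_feature_collection([{"id": "1", "lat": -26.26, "lon": 27.85}])
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Most web mapping tools can load a file like this directly, which makes it a convenient format for daily dumps.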