One recent Friday afternoon I decided to write a Python script that would scrape News24 RSS feeds and fetch all the news stories they linked to. I wanted to use it to find the themes that certain columnists covered in their columns. I used BeautifulSoup to collect the text and then Tagxedo to do the visualization.
For example, I took Simon Williamson's News24 RSS feed (http://feeds.news24.com/articles/News24/Columnists/Simon-Williamson/rss), grabbed all the articles it linked to, and then used the resulting text to create the following tag cloud.
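If you want to try the idea yourself, here is a minimal sketch of the same pipeline. It is not the code in the XMLScraper repository, and all the function names are hypothetical. It assumes an RSS 2.0 feed in which each <item> carries a <link>, and it assumes the article body lives in <p> tags. To stay dependency-free it swaps BeautifulSoup for the standard library's xml.etree.ElementTree and html.parser, and instead of drawing a cloud it just counts words, which is the raw material a tool like Tagxedo turns into a picture.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter
from html.parser import HTMLParser


def fetch(url):
    """Download a URL and return its body as text (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def article_links(rss_xml):
    """Return the <link> of every <item> in an RSS 2.0 document.

    Assumes un-namespaced RSS 2.0; Atom or RDF feeds would need namespaces.
    """
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link:
            links.append(link)
    return links


class ParagraphText(HTMLParser):
    """Collect text inside <p> elements: a crude stand-in for BeautifulSoup."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <p> tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def paragraph_text(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)


# A small, hypothetical stopword list; words of two letters or fewer are dropped anyway.
STOPWORDS = {"the", "and", "for", "that", "with", "this", "was", "are",
             "but", "not", "have", "you", "his", "her", "they", "from"}


def word_counts(text, top=50):
    """Return the `top` most frequent non-stopwords as (word, count) pairs."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    kept = (w for w in words if len(w) > 2 and w not in STOPWORDS)
    return Counter(kept).most_common(top)


def columnist_counts(feed_url, top=50):
    """Feed URL -> linked articles -> paragraph text -> word counts (network needed)."""
    links = article_links(fetch(feed_url))
    text = " ".join(paragraph_text(fetch(link)) for link in links)
    return word_counts(text, top)


if __name__ == "__main__":
    # Offline demo on inline sample documents; no network access is used here.
    rss = ("<rss version='2.0'><channel>"
           "<item><title>One</title><link>http://example.com/a</link></item>"
           "</channel></rss>")
    print(article_links(rss))  # ['http://example.com/a']
    html = ("<h1>Headline</h1><p>Election day: the election is close.</p>"
            "<p>Voters &amp; the election.</p>")
    print(word_counts(paragraph_text(html)))
```

Pointing columnist_counts at a columnist's feed URL would give you the word list to paste into a tag cloud generator. Keying on <p> tags is the fragile part: every site lays out its article pages differently, so in practice you end up tuning the extraction per site.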
As you can see, I ran the scraper in the weeks before the US election, and Simon had commented on it in his recent columns. Here are two other columnists. First, the usual suspect, Khaya Dlanga:
Finally, Sibongile Mafu:
This was neither exhaustive nor clean, but it does give a glimpse of each columnist's recent themes. The scraper could be improved in lots of ways; I did not build many features into it. It would be awesome if you could just give it a news website and the name of a journalist or columnist, and it would work out the site's structure automatically and return the text you want. Anyway.
You can get the scraper here: https://github.com/bionicv/XMLScraper