Training future Data Scientists - Part 1: What's in a DSIDE season?
Our work is never over - Kanye West, Stronger
We just finished another season of the Data Science for Insight and Decision Enablement (DSIDE) program. DSIDE is a Data Science training program that recruits 50 students (3rd- and 4th-year undergraduates as well as MSc/PhD students) to come to the CSIR and tackle some of South Africa's challenges using a Data Science approach. The students spend 3 months at the CSIR, split into 1 month in the winter and 2 months in the summer. The program has been running since 2014, and the Department of Science and Technology is the main sponsor. You can find out more on the program website. Now that the formalities are done, I want to look back at the just-finished season and highlight some of the changes, successes and failures. Running such a program is very interesting and stretches the limits of the program team every year. Just a caveat: the DSIDE program is run across the Modelling and Digital Science (MDS) and Meraka units at the CSIR. Some experiences are shared between the groups of students based at both units, but some are unique to each unit. I will highlight this in the post.
What's in a Season?
First, let's start by describing what actually happens during a season. The 50 students recruited work in groups of 2 or 3 on a number of projects. We have had about 16 projects a year in the last few years. I believe a team of about 3 per project is a good number. It makes it easy to break ties and make decisions :D. Our Data Science team at MDS takes 18 students, so 6 projects a year. The students are split into these teams and then assigned a project topic and a mentor. The project topic is not simply a description of the project; it also comes with access to a partner (who contributed the project topic) and data. The teams work to tackle the project challenge during the 3-month period they are given. The 1st month focuses on exploratory data analysis (EDA). The teams refine their project challenge after spending some time with the data, essentially working out how feasible it is to tackle the challenge with the data given, the partner interactions and the tools available.
The last 2 months are spent doing advanced analysis and modelling to create a product the teams can show as an output of their project. The modelling might involve Machine Learning (ML)/Artificial Intelligence (AI), Statistics, Mathematical models, and other tools in the Data Scientist's toolbox. The "product" can be insights, a dashboard, a model, etc. that can be used by a decision maker. We have been adjusting what these outputs are over time, and we still retain flexibility depending on the project challenge.
During the project execution, the students also get enrichment. This takes the form of workshops on specific topics such as Exploratory Data Analysis, Machine Learning, Code Management and other advanced topics. They are also encouraged to learn on their own. We make heavy use of online resources, encouraging students to use online lectures and tutorials. You can see some of the resources we recommend on our website. The students also author reports on their projects and create artefacts such as posters or presentations. The capstone of the program is a public presentation and exhibition where the students talk to the public about their projects and exhibit their artefacts and insights. The program prepares the students to be Data Scientists, allows researchers to look at interesting problems in society and gets South Africa ready for the 4th Industrial Revolution. Whew, that was a 3-paragraph synopsis of DSIDE.
This is the first post in a series of blog posts covering the 2017/2018 DSIDE program. In the next few blog posts I'll cover what happens in preplanning, what has changed over the years and where we might go. Thank you to Rebone Meraba for assisting with this post.