Training future Data Scientists - Part 3: Breaking down the process
Nothing's ever promised tomorrow today, But we'll find a way - Kanye West, Heard 'Em Say
In this post I want to break down how we tackle the challenges that are chosen for the DSIDE program. So let's walk through the process.
How we look at problems
When we decide on the final challenge with a partner, we have a question that needs to be answered, some data to be delved into, and background that the project team will have to acquire. We now work with partners to define their project. First, potential partners fill in the web form with the information needed to start scoping a project. We then follow up with promising partners by sending a scoping sheet, currently one inspired by the DSSG program. Once all of this is done, we can get to tackling the problem.
To get to a final solution or product with the project team, we follow a three-step process.
- Exploration of the problem and data to refine the questions.
  - The problem might be ill-posed, even with the work we have done beforehand with the partner to refine it.
  - The data might not be useful, or might be harder to use than first imagined.
  - The depth might not be adequate after the exploratory phase.
  - We might need more computing power or storage.
- Modelling as an approach to tackle the heavy lifting.
  - Depending on the problem, the flavour of modelling might differ. A lot of the senior researchers at MDS are Machine Learning researchers, but that does not mean every problem is an ML problem.
  - Statistical and economic modelling are also explored.
- Product development.
  - The final outputs of a DSIDE project can be few or many.
  - One output might be a set of insights that can be used by the partner.
  - Another output can be a web application that showcases dynamic insights.
  - Yet another can be a model that is shared with the partner.
What then happens?
DSIDE embraces a project-based learning approach. We work to prepare ourselves and our partners to follow, more or less, our approach to breaking down problems and working towards a solution. In the June/July period the students work mostly to understand the project challenge they have been given. They work with their mentor and leads to unpack the problem, audit the data and form an initial approach to how the challenge will be tackled. During this time they interact with the partner, and there might need to be refinements to the problem or changes in the expectations of what is possible in the time allotted. All of these pre-modelling steps are important: jumping in and committing to a path without really understanding what is being solved and what is at hand can be disastrous. In the December/January period modelling takes centre stage, and towards the end the products are finalised.
Student Development and Team Composition
The development of the students during the program is paramount. We would like to use their strengths to tackle the problems we have, but we also have to stretch them and have them learn new things. At the beginning of the program, after recruitment is done, we sit down as project leaders to start assigning students to teams. We have a number of things we optimise for:
- Skill Sets to attempt the problem.
- Balancing different educational institutions.
- Mixing educational level.
- Allowing for student exposure.
By investing in the growth of the students, we hope to arrive at better solutions informed by their insights.
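To make the trade-offs in the list above a little more concrete, here is a minimal sketch of how such an assignment could be expressed in code. It is an illustration only, not a description of how the DSIDE project leaders actually assign students: the `Student` and `Team` classes, the `cost` weights, the greedy strategy and all of the sample data are hypothetical. It also does not try to model "student exposure", which is much harder to score.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    name: str
    institution: str
    level: str          # e.g. "honours" or "masters" (illustrative labels)
    skills: frozenset


@dataclass
class Team:
    project: str
    needed_skills: frozenset
    members: list = field(default_factory=list)


def cost(team, student):
    """Lower is better. The weights are arbitrary assumptions."""
    covered = set().union(*(m.skills for m in team.members))
    # Skills the project still lacks that this student would bring.
    new_skills = len((team.needed_skills - covered) & student.skills)
    # Penalise piling up students from one institution or one level.
    same_institution = sum(m.institution == student.institution for m in team.members)
    same_level = sum(m.level == student.level for m in team.members)
    return same_institution + same_level - 2 * new_skills


def assign(students, teams, max_size):
    """Greedy assignment: place each student on the cheapest open team."""
    for student in students:
        open_teams = [t for t in teams if len(t.members) < max_size]
        if not open_teams:
            raise ValueError("not enough team slots for all students")
        # Ties are broken in favour of the smaller team.
        best = min(open_teams, key=lambda t: (cost(t, student), len(t.members)))
        best.members.append(student)
    return teams


# Made-up students and projects, purely for demonstration.
students = [
    Student("A", "Uni X", "honours", frozenset({"statistics"})),
    Student("B", "Uni X", "masters", frozenset({"machine learning"})),
    Student("C", "Uni Y", "honours", frozenset({"web development"})),
    Student("D", "Uni Y", "masters", frozenset({"statistics", "machine learning"})),
]
teams = [
    Team("Project 1", frozenset({"statistics", "web development"})),
    Team("Project 2", frozenset({"machine learning", "statistics"})),
]

assign(students, teams, max_size=2)
for team in teams:
    print(team.project, [m.name for m in team.members])
```

A greedy pass like this depends on the order in which students are considered and will not always find the best overall split. In practice the conversation between project leaders matters far more than any score, but writing the criteria down this way can help make the trade-offs explicit.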
What’s so hard about this?
It takes practice to look for warning signs that there might not be enough (information, data, skill) to tackle a problem. You also have to manage expectations, not just of the partner, but also of the student team. Data Science is not a silver bullet, but with a careful, sustained approach we can find wonderful and interesting insights in the challenge data. Things might not go the way one initially thought they would, but they can still result in a good outcome for all. We put students through this journey without handling them with kid gloves. Yes, the students sometimes get challenges that are a little out of reach. Yes, the students still have to interact with partners, discussing the challenge and refining their problem. They need to be responsible for their own outputs and learn that you have to keep communicating with the partner and with each other.
This is the third in a series of blog posts covering the 2017/2018 DSIDE program. In the next few blog posts I'll cover what happens in preplanning, what has changed over the years and where we might go. Thanks to Donald Ntjana for input on this post.