Education, whether a set of online courses, an article, or an undergraduate or graduate program, often does not cover the professional side of data science. Of course, complex machine learning algorithms and model deployment are crucial for a data scientist to learn, but other skills are especially important for a professional data scientist, particularly one who works directly with customers. A customer is not necessarily the customer of a product; the term can also refer to a stakeholder within the company. With that said, let’s discuss three critical soft skills that every data scientist should know as they transition from a student of data science to a professional data scientist.
Stakeholder Relationship
This point is both a skill and a reminder that you do not work by yourself. Soft skills are, inherently, skills that involve communicating with others, unlike technical skills such as Python that do not involve interacting with people.
You will have to become accustomed to communicating with many other people in different roles within your company. Most of the time, a project is led by a stakeholder, a person in your company who has come up with the ask and has therefore organized a team. That team usually consists of a data scientist, a product manager (most of the time the stakeholder themselves), a data engineer, a software engineer, and a specialist chosen for the project you are working on.
With that said, here are some actionable ways to work on this skill at your company:
- Break down data science terms into something that anyone can understand
For example, here are a poor and a great example of explaining data science to a stakeholder:
Poor — “let’s use a supervised, CatBoost regression algorithm for our model to predict the target variable”
Great — “we can utilize past data to feed into our algorithm that will help to predict a future value”
- Work on the relationship itself, by participating in meetings around product and research, rather than only data science meetings
- Become familiar with company KPIs (Key Performance Indicators), because much of the goal of data science is to improve those KPIs through the effects of the model that you will build. For example, become familiar with at least five to 10 KPIs, like clicks per user or average time to drive. These are usually company-dependent, while some are more generic and can be applied to most companies. It is also important to know the language of the company, and KPIs are one of the main terms of that language.
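To make a KPI such as clicks per user concrete, here is a minimal sketch of how it might be computed. The session log, the `clicks_per_user` function, and the numbers are all hypothetical and made up for illustration; a real company would define the KPI against its own data and rules.

```python
from collections import defaultdict

# Hypothetical click log: one (user_id, clicks_in_session) pair per session.
sessions = [("u1", 3), ("u2", 5), ("u1", 2), ("u3", 0), ("u2", 1)]


def clicks_per_user(sessions):
    """Return the average number of clicks per distinct user."""
    totals = defaultdict(int)
    for user_id, clicks in sessions:
        totals[user_id] += clicks
    if not totals:
        return 0.0  # no users, so report zero rather than divide by zero
    return sum(totals.values()) / len(totals)


kpi = clicks_per_user(sessions)
print(f"Clicks per user: {kpi:.2f}")
```

Knowing exactly how a KPI like this is calculated makes it easier to discuss with a stakeholder what a model is expected to move.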
Problem Statement Definition
Now that you have developed a relationship with your stakeholder, you will become better at defining problems and solutions with them. The stakeholder will most likely not know how the algorithm works, because that is not their job and not important to them, but they will need to know what the predictions are, how often they are produced, and so on. With that said, here are another poor and great example, this time of a problem statement that both you and the stakeholder will use.
Poor — “the utilization of the number of people in a movie would be easier to predict with a machine learning algorithm from the data science team, so we should try to calculate previous statistics to know how many people will see a movie”
So, what’s wrong with it?
- It puts a possible solution into the wording of the problem, which limits opportunities
- It’s too long
- It’s difficult to discern what the actual problem is
- Sometimes, stakeholders will provide a long, wordy, and involved solution, when the actual data science model is just one part of the solution
Great — “we do not know how many people will see a particular movie”
What’s right with it?
- Yes, most of the time, simple wording is best
- Now we can look into the solution, such as an algorithm, and discuss possible data or features to feed into it
- It easily highlights or isolates what we do not know — the problem
Overall, as you can see, the best way to approach a problem is to define it in its simplest terms. Of course, you can then build on it and get more specific, but starting off more general makes it easier for all parties involved to truly understand why help is wanted for a particular problem, and perhaps to see that more than just a data scientist can help solve it.
Results Communication
Following the trend of the skills mentioned above, this skill also centers on communication that you can improve. For example, keeping things simple and easy to understand is key to being a successful data scientist. Do not throw complex mathematics and confusing statistics at your audience; instead, focus on how the results of the model impact the business.
Here are some things to think about when communicating data science model results:
- What is the general impact of the model?
- How much does the model cost in production?
- What was the goal, and has the model reached that goal?
- What is the percent increase of the KPI(s)? For example, “because of this data science model, for 80% of people, we were able to predict the number of people in a movie theater within ±10 people”.
- Use one or more visualizations to describe your results; the simplest is best
The main thing is to keep in mind the impact of your model on the business, including concepts like time, money, product, and scalability.
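A business-facing summary like the “within ±10 people” statement above can be computed directly from model predictions. The following is a minimal sketch, assuming you have lists of actual and predicted values; the function names and the attendance numbers are hypothetical and made up purely for illustration.

```python
def share_within_tolerance(actual, predicted, tolerance):
    """Return the fraction of predictions whose absolute error is at most `tolerance`."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        return 0.0
    hits = sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)


def percent_change(before, after):
    """Return the percent change of a KPI from `before` to `after`."""
    if before == 0:
        raise ValueError("percent change is undefined when the baseline is 0")
    return (after - before) / before * 100


# Made-up attendance numbers, for illustration only.
actual_attendance = [120, 85, 200, 60, 150]
predicted_attendance = [115, 90, 230, 58, 141]

share = share_within_tolerance(actual_attendance, predicted_attendance, tolerance=10)
print(f"{share:.0%} of screenings were predicted within +/- 10 people")
```

A single sentence built from numbers like these is usually easier for a stakeholder to act on than an error metric such as RMSE.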
These three skills are often not taught, but learned as you develop in your data science career. With that said, studying these concepts beforehand can benefit both you and the company you eventually work for, and if you are already working as a data scientist, you can improve on them at your current company.
In summary, here are the three main professional soft skills every data scientist must know:
* Stakeholder Relationship
* Problem Statement Definition
* Results Communication