There I was, lost in the vast number of online courses available for data science, visualization, analysis, exploration. It is usually quite easy to get excited about a new topic, to delve into the basics and to learn just enough to recognize the buzzwords around it. However, in every learning journey, there comes a time when you stand at that metaphorical junction and have to decide where to go next. Except that the junction is in your head and has more than the usual three or four branches. Oh, and there are next to no signs, and where there are some, you cannot really judge whether they point in the direction you actually want to go.
The passion for data science
So, on a sunny spring day in 2020, this was exactly where I stood. I had already finished several courses on basic maths and an introduction to data science, and had also refreshed my Python knowledge. Up to this point, it had been quite clear what I needed to do next – I had to understand the mathematics before going into the details of working with data sets, and I had to learn how to work with data in Python before visualizing it. Everything so far was straightforward. Then there was this junction. There was only one sign, a very prominent one, floating directly over my head. It said: What is your project?
My project. My very own, first, baby project. Not the real baby (which kept kicking my intestines on a very regular basis at the time) – a metaphorical one to which I would give birth out of my passions and desires. As usual when I come across situations in my life that make me focus on my passions and desires, I felt just one thing: emptiness.
It crossed my mind that this was exactly why I had stopped learning Python the first time. I couldn’t find any project that I was passionate about and felt capable of doing (not that I lack passions… that is usually not my problem). So, for the second time, I sat at this junction, with this big and heavy sign floating over my head. What is my project? Why do I want to learn how to code? And why am I so interested in data science? As many stories do, this one starts with a trauma.
A severe mistake and its consequences
As a manager of several teams, and part of the company’s strategic decision council, I came across many situations where good data was key to good decisions. Being the one in charge, I dug through the data I could find, and tried to base my decisions on the findings. In more than one case, I wished for more data, or for better ways to analyze it. I used my Excel skills and developed a habit of analyzing the same data at different points in time, so that my colleagues had what they needed to make informed decisions as well. When you work with data on a regular basis, you know how much influence even a small change can have. One severe mistake of mine happened during my first year of analyzing revenue data for certain projects in the company.
I drew my data from two different sources. First, I removed the rows I did not need. Then I combined both sources in one table. The next step was a manual one: I added more data about the revenue of each relevant project in new columns of that same table. As this was a manual step, I was aware that it was horribly error-prone, and checked each line twice. Afterwards, I switched to automatic summarization of my data with the help of basic data analysis tools in Excel. The numbers were presented in the annual presentation of my company – that is how important they were. And they were wrong.
Now, where was the error introduced? It was not during the manual step. I was far too cautious for that. Afterwards, however, I relied on the automatic steps, and as they all spat out numbers that looked quite okay, I did not double-check them. Do not ever, ever trust numbers without checking where they come from! It turned out that during the merging step of my different data sources, one column from table A was not present in table B. I had around 30 columns in total, with around 600-1000 lines in each file per month. Of course, I didn’t check the whole table. I just checked whether the numbers in the end looked plausible, which they did, and missed the fact that after merging, half of the revenue figures were taken from the wrong column: one that held revenue with a specific filter applied. I didn’t need that specific filter, but it was in my export – because real-life data is never, ever completely clean and adjusted to your purpose. The filter reduced the numbers by some percentage, so that around 2% of my total revenue went missing from the result. Additionally, it was quite embarrassing to have made that mistake. I could easily have seen it if I had been more thorough with my data. I was lucky that my former manager double-checked the numbers and found the mistake. We could correct the numbers in time, and there were no consequences. Well, not none. In that moment, I swore to myself to be better with data. And that was it. That was the point when I knew I wanted to learn about data analysis.
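For illustration, a check like the following could catch that kind of column mismatch before two tables are combined. It is a minimal sketch in plain Python, not the Excel workflow described above; the function names, the column names, and the sample rows are all made up for the example.

```python
def column_mismatch(rows_a, rows_b):
    """Return (only_in_a, only_in_b): column names found in just one table."""
    cols_a = set().union(*(row.keys() for row in rows_a))
    cols_b = set().union(*(row.keys() for row in rows_b))
    return sorted(cols_a - cols_b), sorted(cols_b - cols_a)


def combine_checked(rows_a, rows_b):
    """Stack two tables, but refuse to do so if their columns differ."""
    only_a, only_b = column_mismatch(rows_a, rows_b)
    if only_a or only_b:
        raise ValueError(
            f"column mismatch: only in A: {only_a}, only in B: {only_b}"
        )
    return rows_a + rows_b


# Hypothetical sample data: table A carries an extra, filtered revenue column.
source_a = [
    {"project": "P1", "revenue": 1000.0, "revenue_filtered": 800.0},
    {"project": "P2", "revenue": 500.0, "revenue_filtered": 450.0},
]
source_b = [
    {"project": "P3", "revenue": 700.0},
]

only_in_a, only_in_b = column_mismatch(source_a, source_b)
print("Only in A:", only_in_a)
print("Only in B:", only_in_b)
```

With these sample rows, `combine_checked(source_a, source_b)` raises a `ValueError` instead of silently producing a table in which values may end up under the wrong header. The point of the design is that the check fails loudly at the merging step, rather than relying on the final totals looking plausible.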
Can you really trust numbers?
There is another reason why I decided to learn about data science. Around the same time that I made that mistake with the data, I regularly attended strategic meetings as an expert. My job was to provide knowledge to the management team, so that they could make informed decisions. I failed regularly. I had the data they needed (or at least some of it), and I provided it to them. However, they did not listen to what I was saying. I created bar charts and pie charts and line plots to help them understand what the data meant, and something interesting happened. They looked at the charts, but focused on details that were not important. In some cases, they interpreted the data in a way I had not foreseen, resulting in decisions that, from my perspective, did not reflect what the data really meant. I tried to argue against this, but it was pointless: the moment I introduced the charts, the management team trusted them more than me. Arguing against the decisions they made based on the charts was like arguing against my own data – it made me look inconsistent in my own analyses and less trustworthy. This was a time when I struggled heavily with my role, and with the benefits I could bring to these meetings.
The same situation repeated itself later, when I was part of the strategic council myself. Whenever we talked about data, I noticed a certain behavior in my colleagues. They blindly trusted the charts they saw and the numbers that were presented. The moment a number is part of a nice chart or PowerPoint presentation, it becomes immutable. If numbers were suspicious, the question would not be whether the data source was reliable. The question would be how to react to this number. Together with my previous experience in those strategic meetings, this made me realize the true power of numbers – and thus the crucial need to make sure that numbers are not only correct, but also presented in a way that can be understood by everybody. I also realized that I didn’t want to make strategic decisions without understanding data. And in this situation, my desire to learn about data science, a term that I had never heard before and that I googled that evening, was born.