Chemical Engineering to Data Science — My Journey So Far
A reflection before my first days as a data analyst
I recently graduated from Nanyang Technological University! Without a doubt, it has been an amazing 4 years of learning.
Having graduated in Chemical Engineering and Marketing, I’m excited to start my role as a Data Analyst at a mobile tech start-up soon.
At this point, you’ve probably noticed the mismatch between my academic background and what I ended up doing. I am often asked about it:
‘Aren’t data science, marketing and chemical engineering very different from one another?’
In this post, I will reflect on my journey as a Chemical Engineering and Marketing undergraduate and connect the dots that led me to become a Data Analyst. I will briefly outline the problems I faced in my internships, how I addressed them and what I learnt from them, and hopefully help those who are contemplating a similar move.
1. Marketing Internship in Xfers: Creating a Mini Data Pipeline
My very first internship experience was with Xfers in Singapore.
The problem
At that point in time, Xfers had a manual client on-boarding process: a potential client would first send an inquiry form to the Xfers sales team, who would then contact the client to decide whether they were suited to Xfers’ services. However, this process was tedious and somewhat inconsistent.
The solution
What if we could automate this process? The idea was to let clients fill in a form based on their needs. The responses automatically populate a spreadsheet, where a ‘suitability score’ is calculated. If the score hits a threshold, the client is automatically added to Xfers’ customer relationship management (CRM) software. The sales team can thus save time by ignoring irrelevant leads.
There I was, a chemical engineer with no coding background, writing JavaScript (read: plagiarizing from Stack Overflow). Little did I know that I was setting up a mini data pipeline and automating a data-driven decision-making process. Sure, it was a far cry from production-level code, but it was an eye-opener, and I started pursuing coding and data analytics more seriously.
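To make the scoring step concrete, here is a minimal Python sketch of the same idea (the original was JavaScript, which is not shown here). The form fields, weights, threshold and the `add_to_crm` function are all hypothetical placeholders, not Xfers’ actual criteria or CRM API.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical weights per form answer; the real criteria are not described in the post.
WEIGHTS: Dict[str, float] = {
    "monthly_volume": 0.5,
    "is_registered_business": 2.0,
    "needs_payouts": 1.5,
}
THRESHOLD = 3.0  # hypothetical cut-off for passing a lead to the sales team


@dataclass
class FormResponse:
    company: str
    monthly_volume_sgd: float
    is_registered_business: bool
    needs_payouts: bool


def suitability_score(r: FormResponse) -> float:
    """Weighted sum of the (scaled) form answers."""
    # Scale volume to tens of thousands and cap it so one field cannot dominate.
    volume_scaled = min(r.monthly_volume_sgd / 10_000, 4.0)
    return (
        WEIGHTS["monthly_volume"] * volume_scaled
        + WEIGHTS["is_registered_business"] * float(r.is_registered_business)
        + WEIGHTS["needs_payouts"] * float(r.needs_payouts)
    )


def add_to_crm(r: FormResponse) -> None:
    # Hypothetical placeholder: a real pipeline would call the CRM's own API here.
    print(f"Added {r.company} to CRM")


def process_responses(responses: List[FormResponse]) -> List[str]:
    """Score each new spreadsheet row and forward only leads at or above the threshold."""
    qualified = []
    for r in responses:
        if suitability_score(r) >= THRESHOLD:
            add_to_crm(r)
            qualified.append(r.company)
    return qualified
```

Keeping the weights and threshold in one place makes it easy for the sales team to tune them later without touching the pipeline logic.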
2. Marketing Analyst Internship in Fave: Making Data-Driven Decisions
Next stop: Fave.
The problem
Fave’s daily newsletter was one of the main drivers of revenue at Fave: it recommended Fave’s most popular deals to our users. To select the most relevant deals, the team would manually rank them based on their past performance. This was a practical heuristic that served the team well.
However, the recommendations made were not personalized. Our first prototype was to segment the customers and recommend different deals to each segment. I was tasked with manually selecting the deals for the different segments daily. Unsurprisingly, this process was rather slow and inconsistent.
The solution
So, once again, I implemented an automated scoring system, except this time I had to design the scoring rubric myself: should the rubric choose the recommended deal based on metric A, metric B or metric C? Eventually, the rubric made its decision based on a combination of several metrics, and we tested whether it was able to recommend the best deals.
Looking back, I realized I was performing some simple feature engineering to predict the best deal and then testing the performance of the model. The thinking process is not that different from that of a data scientist who looks at a data set, uses domain knowledge to create new features, trains a supervised learning algorithm, and then checks its performance using a train/test split.
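As an illustration of what such a rubric can look like, the Python sketch below min-max normalises each metric across the candidate deals and combines them with weights, applying the same rubric separately within each customer segment. The metric names (`clicks`, `purchases`, `revenue`) and the weights are assumptions standing in for “metric A, B and C”; the post does not say which metrics the real rubric used.

```python
from typing import Dict, List

Deal = Dict[str, float]

# Hypothetical rubric weights; the real metrics and weights are not given in the post.
RUBRIC: Dict[str, float] = {"clicks": 0.2, "purchases": 0.5, "revenue": 0.3}


def normalise(values: List[float]) -> List[float]:
    """Min-max scale to [0, 1] so metrics on very different scales are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_deals(deals: Dict[str, Deal], rubric: Dict[str, float]) -> List[str]:
    """Return deal names ordered from highest to lowest weighted rubric score."""
    names = list(deals)
    scores = {name: 0.0 for name in names}
    for metric, weight in rubric.items():
        scaled = normalise([deals[n][metric] for n in names])
        for name, s in zip(names, scaled):
            scores[name] += weight * s
    return sorted(names, key=lambda n: scores[n], reverse=True)


def pick_per_segment(
    deals_by_segment: Dict[str, Dict[str, Deal]], top_k: int = 1
) -> Dict[str, List[str]]:
    """Apply the same rubric independently within each customer segment."""
    return {seg: rank_deals(deals, RUBRIC)[:top_k] for seg, deals in deals_by_segment.items()}
```

Normalising within each segment means a deal is judged relative to the other deals that segment actually sees, rather than against the whole catalogue.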
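The analogy can be sketched in a few lines of Python. Here each historical day holds some engineered features per candidate deal plus the deal that actually performed best; “training” is a brute-force search over rubric weights on earlier days, and the check is the hit rate on held-out later days. The data layout, weight grid and chronological split are assumptions for illustration, not how the Fave rubric was actually evaluated.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

# One historical day: engineered features per candidate deal, plus the deal that actually did best.
Day = Tuple[Dict[str, Sequence[float]], str]


def score(features: Sequence[float], weights: Sequence[float]) -> float:
    return sum(f * w for f, w in zip(features, weights))


def hit_rate(days: List[Day], weights: Sequence[float]) -> float:
    """Fraction of days on which the top-scoring deal was the actual best performer."""
    hits = 0
    for deals, best in days:
        predicted = max(deals, key=lambda name: score(deals[name], weights))
        if predicted == best:
            hits += 1
    return hits / len(days)


def fit_weights(
    train: List[Day], grid: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0)
) -> Tuple[float, ...]:
    """'Training' here is a brute-force search for the weights with the best training hit rate."""
    n_features = len(next(iter(train[0][0].values())))
    return max(itertools.product(grid, repeat=n_features), key=lambda w: hit_rate(train, w))


def train_test_evaluate(
    days: List[Day], test_fraction: float = 0.3
) -> Tuple[Tuple[float, ...], float]:
    """Fit on the earlier days and report the hit rate on the most recent, held-out days."""
    n_test = max(1, int(len(days) * test_fraction))
    train, test = days[:-n_test], days[-n_test:]
    weights = fit_weights(train)
    return weights, hit_rate(test, weights)
```

Holding out the most recent days, rather than a random subset, is a stricter check when deal popularity drifts over time.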
3. Research Internship in Silicon Valley: Communicating Technical Data
Having interned at Xfers and Fave, I realized that I really like start-ups. So it was a dream come true to head to the land of dreams… San Francisco!
I was incredibly excited to land an opportunity with Kuprion, an electronics start-up developing nanocopper pastes and inks: a flowable copper “metal-adhesive” with superior thermal and electrical performance.
The problem
As Kuprion was developing a novel material, we were searching for the ‘perfect recipe’: one that would produce a material that performs well under different conditions. The team was testing multiple hypotheses and was therefore working on several different recipes.
My role as a research intern was to run experiments that confirmed (or debunked) these hypotheses and to present the findings to a team of scientists. The challenge was to identify patterns in noisy data, interpret them, and present the findings clearly to my team and to clients.
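One generic way to decide whether a difference between two recipes is real or just noise is a permutation test. The Python sketch below illustrates that technique in general terms; it is an assumption for illustration, not the analysis actually used at Kuprion, and the function takes any two lists of replicate measurements.

```python
import random
import statistics
from typing import Sequence


def permutation_test(
    a: Sequence[float], b: Sequence[float], n_permutations: int = 10_000, seed: int = 0
) -> float:
    """Two-sided permutation test for a difference in means between groups a and b.

    Returns the estimated p-value: the share of random relabellings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign measurements to the two "recipes"
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to both counts includes the observed labelling itself and avoids p = 0.
    return (extreme + 1) / (n_permutations + 1)
```

A small p-value suggests the gap between recipes is unlikely to come from noise alone; the test makes no normality assumption, which suits small, noisy experimental batches.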
The solution
In this internship, my domain knowledge in chemical engineering provided a good starting point for understanding the material’s chemistry. This helped me make sense of the data we collected and spot patterns in it more easily. That is when I realized that data is only useful if one can understand it properly and assign meaning to it. To be a truly proficient data scientist, one has to be a subject-matter expert as well.
Moreover, this experience taught me to design visualizations with the end audience in mind, as different audiences have different levels of technical knowledge. Appropriate visualizations make presentations much more intuitive and understandable.
4. Data Science MOOCs
I connected the dots between my internship experiences and realized that I had been pretty interested in data all along, so I started seeking out data science courses. My first course was Andrew Ng’s Machine Learning course on Coursera.
I was hooked and continued with more courses on data structures, linear algebra, probability, statistics, deep learning and more. In the following posts, I describe how I taught myself data science:
- Part 1: SQL, Python and R
- Part 2: Machine learning (coming soon)
- Part 3: Statistics and probability (coming soon)
My background in engineering helped me tremendously in understanding machine learning mathematically, using concepts from linear algebra and multivariate calculus. For instance, chemical engineers learn linear algebra and multivariate calculus to perform numerical optimization, whether using gradient descent or Newton’s method, because they are constantly optimizing processes for the best performance. Many machine learning models are fitted with exactly these optimization methods.
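To show the difference between the two methods, here is a small Python sketch that minimises a toy objective, f(x, y) = (x − 1)² + 10(y + 2)², chosen purely for illustration. Gradient descent repeatedly applies p ← p − η∇f(p) with learning rate η, while Newton’s method applies p ← p − H⁻¹∇f(p), using the Hessian H (the matrix of second derivatives) to account for curvature.

```python
from typing import Tuple

Point = Tuple[float, float]


# Toy objective (an assumption for illustration): f(x, y) = (x - 1)^2 + 10 * (y + 2)^2,
# a stretched bowl with its minimum at (1, -2).
def gradient(p: Point) -> Point:
    x, y = p
    return (2.0 * (x - 1.0), 20.0 * (y + 2.0))


def hessian(p: Point) -> Tuple[Point, Point]:
    # Constant for a quadratic; p stays in the signature so the code generalises.
    return ((2.0, 0.0), (0.0, 20.0))


def gradient_descent(p: Point, learning_rate: float = 0.04, steps: int = 200) -> Point:
    """Repeatedly step against the gradient: p <- p - lr * grad f(p)."""
    x, y = p
    for _ in range(steps):
        gx, gy = gradient((x, y))
        x, y = x - learning_rate * gx, y - learning_rate * gy
    return (x, y)


def newton(p: Point, steps: int = 1) -> Point:
    """Newton's method: p <- p - H^-1 grad f(p), inverting the 2x2 Hessian directly."""
    x, y = p
    for _ in range(steps):
        gx, gy = gradient((x, y))
        (a, b), (c, d) = hessian((x, y))
        det = a * d - b * c
        dx = (d * gx - b * gy) / det
        dy = (-c * gx + a * gy) / det
        x, y = x - dx, y - dy
    return (x, y)


print(gradient_descent((0.0, 0.0)))  # close to (1, -2) after 200 small steps
print(newton((0.0, 0.0)))            # (1.0, -2.0) after a single step
```

Because the objective is quadratic, Newton’s method lands on the minimum in one step, whereas gradient descent needs many steps, and its learning rate must stay below 2/20 = 0.1 here to avoid diverging along the steep y direction.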
All these experiences helped me land my first actual data science role at the Land Transport Authority (LTA).
5. Data Science Internship with Land Transport Authority (LTA): Bringing it all together
This was my first opportunity to experience the data science pipeline from start to finish.
The problem
Every day, we travel from one place to another on different modes of transport. Being able to predict how our citizens move across these modes would significantly help urban planning in Singapore. So, can we predict which mode of transport a user is on, based on the sensors in their phone?
The solution
I collected acceleration and rotation sensor data on my phone while travelling. The data was then pre-processed using signal processing techniques (the Fourier transform). Using this data, a multi-class model was built to predict four travel modes: Walking, Train, Bus and Idle (not moving).
The results were not too shabby: the model performed significantly better than a baseline model that always predicts the same mode of transport. Eventually, I got to present the work to one of LTA’s C-level executives, which was a great personal confidence booster and really motivated me to continue on this path.
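A rough Python sketch of the kind of frequency-domain features such a pipeline can compute for each fixed-length window of accelerometer readings is shown below. The feature set, window length and sampling rate are assumptions, not the actual LTA pipeline; the plain O(n²) DFT keeps the example dependency-free, and in practice `numpy.fft.rfft` would do the same job much faster.

```python
import cmath
import math
import statistics
from typing import Dict, List, Sequence


def magnitude(ax: Sequence[float], ay: Sequence[float], az: Sequence[float]) -> List[float]:
    """Combine the three axes so features depend less on how the phone is held."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]


def dft_magnitudes(signal: Sequence[float]) -> List[float]:
    """|X_k| for k = 0..n/2 of the mean-removed signal (plain discrete Fourier transform)."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]
    return [
        abs(sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def window_features(signal: Sequence[float], sample_rate_hz: float) -> Dict[str, float]:
    """Summary features for one window of readings, ready to feed a classifier."""
    n = len(signal)
    spectrum = dft_magnitudes(signal)
    # Skip bin 0 (the DC component, already removed) when looking for the strongest rhythm.
    peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return {
        "mean": statistics.fmean(signal),
        "std": statistics.pstdev(signal),
        "dominant_freq_hz": peak_bin * sample_rate_hz / n,
        "peak_magnitude": spectrum[peak_bin],
    }
```

Periodic motion such as walking tends to show up as a clear spectral peak at the step frequency, which time-domain statistics like the mean and standard deviation alone would miss.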
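The baseline mentioned above is easy to compute and worth reporting alongside any classifier. A minimal Python sketch, assuming labels are plain strings such as the four travel modes: the baseline predicts the most common training label for every example, so its accuracy equals that label’s share of the test set.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train: Sequence[Hashable], n: int) -> List[Hashable]:
    """Predict the most common training label for every one of n test examples."""
    most_common_label, _ = Counter(y_train).most_common(1)[0]
    return [most_common_label] * n
```

If one mode dominates the data (for example, long stretches of Idle), this baseline can look deceptively strong, which is why beating it by a clear margin matters more than a high accuracy number on its own.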
6. Projects
Along the way, I have also built several other projects, which have really helped deepen my understanding of these topics.
- A computer vision project: A Real-Time Emotion Detector using Keras and OpenCV
- A Tableau visualization project: ‘How Different is Data Science Across the World?’
- A statistics project: ‘Does Higher Spending Lead to Poorer Education?’
In my free time, I also write about machine learning and statistics. Here are some of my blog posts:
- What Makes Great Wine… Great? (Using ML and Partial Dependence Plots in the Quest of a Good Wine)
- Interpreting Black Box Machine Learning Models Using LIME (Understanding Breast Cancer Predictor)
7. Closing Words
All these experiences bring me to where I am today — incredibly excited for what is to come for me as a data analyst.
I plan to keep writing about what I learn along the way as a data analyst. Meanwhile, I am also pursuing a MicroMasters in Statistics and Data Science from MITx (which I dare say is one of the most challenging courses I have taken so far), and hopefully a Master’s in Data Science, Artificial Intelligence or Computer Science in the near future.
If you have any advice or questions for me, please hit me up on my LinkedIn or leave a comment down below. :)
Many thanks to Sophie for her feedback on this post.