Your First Data Science Project Idea as a Beginner
Build a tailored project for a strong resume
This is the final part of a 5-part blog series on learning data science. Feel free to follow the whole series:
- Part 1 — Data Processing with SQL, Python and R
- Part 2 — Mathematics, Probability and Statistics
- Part 3 — Computer Science Fundamentals (coming soon)
- Part 4 — Machine Learning
- Part 5 — Building your first Machine Learning Project (you’re here!)
I recently graduated with a degree in Chemical Engineering and landed my first data role at a tech company. During my interviews for data science positions, one of the most frequently asked questions was:
‘What are some data science projects that you have done?’
This question is perhaps unsurprising. An interview for a data science position is almost never complete without a question about the interviewee’s data science projects.
Especially for someone without formal training in data science, a project is crucial for standing out among other candidates because it
- demonstrates your technical capabilities
- showcases your commitment to learning the ropes of data science
- attests to your interest in a particular topic (based on the topics of your project)
Therefore, if I were to give one piece of advice to new data science learners, it would be this:
Create a strong project that showcases your technical abilities and your unique insight into your area of interest.
This post provides a simple guideline on how to do that in three sections:
- Pre-requisites: what knowledge do I need?
- Warming up: your first data science project
- Brainstorming: find the perfect project idea with 3 questions
1. Pre-requisites: What Knowledge Do I Need?
Before diving into a project, it is generally a good idea to brush up on your technical knowledge and coding skills. This flattens the learning curve during the project and makes the work smoother. Without such knowledge, you might find yourself stuck, or unsure how to troubleshoot when something breaks.
My recommendation is thus to have some basic knowledge of a programming language (R or Python) and of machine learning algorithms. To find out where to learn these, feel free to refer to the earlier parts of this guide:
- Part 1 — Data Processing with SQL, Python and R
- Part 2 — Mathematics, Probability and Statistics
- Part 3 — Computer Science Fundamentals (coming soon)
- Part 4 — Machine Learning
- Part 5 — Your Personal Data Science Capstone Project (you’re here!)
2. Warming Up: Your First Mock Data Science Project
When you learn a programming language, the first step is to print a hello world statement. Similarly, the ‘hello world’ of data science is to take a simple data set through a project from start to finish.
This gives you a glimpse of the full data science project cycle, which runs from understanding the business problem, to preparing the data, modelling the data, evaluating the model and finally deploying the model.
Here are some examples of warm-up data science projects:
Titanic Survivorship Project
My favorite beginner-friendly project is the Titanic Survivorship Project. In this project, you build a machine learning model that predicts which passengers survived the Titanic shipwreck.
There are at least two good reasons why this is a great first project. Firstly, the data set is relatively clean and does not require extensive cleaning. Secondly, there are many excellent example projects on this data set that are instructive for the beginner.
For instance, here are a few notebooks that I referred to extensively when I was implementing the project.
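To make the cycle concrete, here is a minimal sketch of the prepare, model and evaluate steps on a Titanic-style problem. It is only an illustration: the rows are made-up toy data rather than real passenger records, the function names are my own, and it uses plain Python so that it runs without any installs. In a real project you would more likely load the actual data set with pandas and train a model with scikit-learn.

```python
import random
from collections import Counter, defaultdict

# Made-up toy rows (NOT real Titanic records): (sex, passenger_class, survived)
ROWS = [
    ("female", 1, 1), ("female", 1, 1), ("female", 2, 1), ("female", 3, 0),
    ("female", 3, 1), ("female", 2, 1), ("male", 1, 0), ("male", 1, 1),
    ("male", 2, 0), ("male", 2, 0), ("male", 3, 0), ("male", 3, 0),
    ("male", 3, 0), ("female", 1, 1), ("male", 2, 0), ("female", 3, 0),
]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_group_baseline(train_rows):
    """Learn the most common outcome for each value of `sex`."""
    outcomes = defaultdict(Counter)
    for sex, _pclass, survived in train_rows:
        outcomes[sex][survived] += 1
    overall = Counter(survived for _, _, survived in train_rows)
    model = {sex: counts.most_common(1)[0][0] for sex, counts in outcomes.items()}
    # Fallback prediction for a group that never appeared in training.
    model["__default__"] = overall.most_common(1)[0][0]
    return model


def predict(model, sex):
    """Predict 1 (survived) or 0 (did not survive) from the passenger's sex."""
    return model.get(sex, model["__default__"])


def accuracy(model, rows):
    """Fraction of rows whose outcome the model predicts correctly."""
    correct = sum(predict(model, sex) == survived for sex, _, survived in rows)
    return correct / len(rows)


train_rows, test_rows = train_test_split(ROWS)
model = fit_group_baseline(train_rows)
print(f"train accuracy: {accuracy(model, train_rows):.2f}")
print(f"test accuracy:  {accuracy(model, test_rows):.2f}")
```

A simple baseline like this gives you a number to beat. Any more sophisticated model you train later should score better than it on the held-out rows.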
Predicting Housing Prices
Another great warm-up project is the prediction of housing prices. For this, the King County housing data set is relatively clean and provides a reasonable number of features. There are also some excellent notebooks that you can emulate:
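Independently of any particular notebook, the core of a price-prediction project can be sketched in a few lines: fit a model on training rows, then compare its error on held-out rows against a naive baseline. The sketch below assumes a single feature (living area) and uses made-up toy numbers, not values from the real data set. The helper names are hypothetical, and a real project would use many features and a library such as scikit-learn.

```python
import math

# Made-up toy data (NOT from the King County data set):
# living area in square feet -> sale price in dollars.
SQFT = [800, 1000, 1200, 1500, 1800, 2000, 2400, 3000]
PRICE = [200_000, 240_000, 285_000, 330_000, 410_000, 430_000, 520_000, 640_000]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hold out two rows for evaluation; train on the rest.
test_idx = {2, 5}
pairs = list(zip(SQFT, PRICE))
train = [p for i, p in enumerate(pairs) if i not in test_idx]
test = [p for i, p in enumerate(pairs) if i in test_idx]

intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])

test_prices = [y for _, y in test]
model_pred = [intercept + slope * x for x, _ in test]
# Naive baseline: always predict the mean training price.
mean_train_price = sum(y for _, y in train) / len(train)
baseline_pred = [mean_train_price] * len(test)

model_rmse = rmse(test_prices, model_pred)
baseline_rmse = rmse(test_prices, baseline_pred)
print(f"model RMSE:    {model_rmse:,.0f}")
print(f"baseline RMSE: {baseline_rmse:,.0f}")
```

Reporting the model's error next to a baseline shows how much of the performance comes from the model rather than from the data being easy to predict.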
A word of caution about warm-up projects: it is often said that one should not present a project that has been done many times. When a data science practitioner presents a heavily reused project, the reader might wonder how much of the work is original. To avoid any suspicion of plagiarism, inadvertent or otherwise, treat these data sets as practice and build an original project for your portfolio.
Now, you might ask: how do I find a project that is new and interesting? The next section goes through that in detail.
3. Brainstorming: Find the Perfect Project Idea with 3 Questions
The project brainstorming phase can be broken down into three essential questions, as illustrated in this diagram.
Once you have answered all three questions below, your ideal project is the one that lies at their intersection:
- What topic is the project on: What topics are you interested in, or are very knowledgeable in?
- Who am I showing it to: Who is the intended audience of this project?
- What skill sets do I want to demonstrate: How do you want to visualize your data or train your machine learning model?
Let me explain these in a little more detail.
What topic is the project on?
The topic of the project should be one in which you can provide a unique insight. It could be something that you are extremely passionate about, something quirky and interesting, or something related to the company that you are applying to.
Did you come from a marketing background? You may do well in a project that involves customer segmentation.
Are you a sports junkie? Maybe sports analytics is your turf. Here’s a site that provides a list of sports-related data sets.
Do you have a background in finance? Perhaps you can look at data sets related to banking and finance. For instance, credit card fraud detection might be a topic that you are very knowledgeable in.
These are only a few examples of topics you could pick from. You should ask yourself: what are the topics that interest me?
Today, data science learners can access excellent data sets with a click. Here are some data sources that are popular among data scientists:
- Kaggle Datasets
- Google Dataset Search
- Data.world, a site for user-contributed datasets.
- OpenDaL, a data aggregator that allows searching by metadata
- Data.gov for data provided by the U.S. government.
Who am I showing the project to?
Are you interested in a digital marketing company? Or a start-up? A data consultancy firm? Or a big tech firm?
As much as possible, you should customize your project idea to the industry or the company that you are interested in.
Why’s that? If you can showcase a project that is highly relevant to the company you are interested in, you demonstrate not just an awareness of the problems the company faces, but also the competence to solve them.
Put yourself in the shoes of the hiring manager and ask yourself — what does a relevant project look like? What projects will I be impressed with?
For instance, if you are interested in a role at an e-commerce company like Amazon, these projects (highlighted in this blog post) are likely to be very relevant.
- Recommendation system for customers
- Customer lifetime value modeling
- Customer retention / churn modeling
- Fraud detection
What skill sets do I want to demonstrate?
The skill set of a data scientist is broad and somewhat ambiguous, and different companies have different requirements. As such, you might want to research job descriptions for your ideal role to find out which skills to demonstrate.
Regardless, several core technical competencies are common in the job description of a data scientist.
- data cleaning (using R tidyverse or Python pandas)
- data visualization (using R ggplot2, Python matplotlib or seaborn)
- dashboarding (using R Shiny, Tableau, Google Data Studio)
- probability and statistics
- machine learning algorithms (regression, classification, clustering, dimensionality reduction, reinforcement learning…)
- deep learning algorithms (natural language processing, computer vision…)
It might not be possible to demonstrate every skill set in a single project. Therefore, you might want to strategically select a few to demonstrate.
In deciding on a project, you should ask yourself:
‘Does this project allow me to showcase my capability in these skill sets?’
Some Project Inspirations
Here are some resources to consider for ideas and inspiration.
- Data Science Project Ideas by Analytics India Mag lists a range of possible projects.
- Ken Jee’s YouTube channel provides examples of very insightful data science projects.
- FiveThirtyEight has some of the sleekest and most insightful visualizations of current affairs that I have seen.
Closing
My data science projects helped me land my role in the data realm, and I would strongly recommend that you start brainstorming yours today if you are serious about breaking into the field.
Don’t have an idea? Have an idea and want to see if it’s feasible? Feel free to brainstorm with me by contacting me on LinkedIn. All the best on your learning journey!
Other Readings
If you enjoyed this blog post, feel free to read my other articles on Machine Learning: