Cyclistic Case study

This case study is my analysis of the Cyclistic capstone project from the Google Data Analytics Professional Certificate course, using the 2021 ride data. The capstone project includes:

  • data cleaning, processing, and analysis in RStudio and Python 3.9,
  • visualizations of the results in Tableau.

The goal of this case study is to find a strategy that helps Cyclistic, a bike-share company based in Chicago, maximize the number of annual memberships and increase its profit, using historical ride data from 2021. The data contains the following information for each ride:

  • rideable_type: the type of bike used in this ride. There are three types of bikes: electric, classic, and docked.
  • started_at and ended_at: the start and end time of this ride.
  • member_casual: the type of customer. member represents customers who hold annual memberships, while casual represents customers who do not.

Note that start_station (start_station_id) and end_station (end_station_id) are not considered in this case study. They could still hold important reasons why customers choose to become members, mostly geographical ones (e.g. customers who live near a station but far from other transportation may choose to join). In case this information is needed in the future, for now we only clean this data by double-checking that station names and station ids pair up one-to-one.
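The one-to-one check described above can be sketched in a few lines. This is only an illustration using the Python standard library, not the code used in the case study; the column names start_station_name and start_station_id, the helper find_station_mismatches, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict


def find_station_mismatches(rides):
    """Return names that map to several ids, and ids that map to several names."""
    ids_by_name = defaultdict(set)
    names_by_id = defaultdict(set)
    for ride in rides:
        name = ride["start_station_name"]  # assumed column name
        station_id = ride["start_station_id"]
        ids_by_name[name].add(station_id)
        names_by_id[station_id].add(name)
    # A one-to-one mapping means every set holds exactly one element.
    bad_names = {n: ids for n, ids in ids_by_name.items() if len(ids) > 1}
    bad_ids = {i: names for i, names in names_by_id.items() if len(names) > 1}
    return bad_names, bad_ids


# Made-up rows for illustration; they are not taken from the Cyclistic data.
sample_rides = [
    {"start_station_name": "Station A", "start_station_id": "id-1"},
    {"start_station_name": "Station A", "start_station_id": "id-1"},
    {"start_station_name": "Station B", "start_station_id": "id-2"},
    {"start_station_name": "Station B", "start_station_id": "id-3"},
]

bad_names, bad_ids = find_station_mismatches(sample_rides)
print(bad_names)  # Station B is paired with two different ids
print(bad_ids)    # empty: no id is shared by two names
```

The same function can be run on the end-station columns by swapping the two keys.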

Data processing/cleaning (RStudio)

  • Check for missing values and duplicates (none found).
  • Check for unexpected categories in member_casual and rideable_type (no extra categories found).
  • Compute the duration of each ride (hms package).
  • Compute the month and weekday of each ride's start date (lubridate package).
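The case study did these steps in R with hms and lubridate. As a rough Python equivalent, the sketch below derives the same three values from started_at and ended_at. The timestamp layout, the helper name add_time_features, and the sample ride are assumptions for illustration.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the timestamps


def add_time_features(ride):
    """Return a copy of the ride with ride length (minutes), month, and weekday."""
    start = datetime.strptime(ride["started_at"], TIMESTAMP_FORMAT)
    end = datetime.strptime(ride["ended_at"], TIMESTAMP_FORMAT)
    return {
        **ride,
        "ride_length_min": (end - start).total_seconds() / 60,
        "month": MONTHS[start.month - 1],
        "weekday": WEEKDAYS[start.weekday()],  # weekday() counts Monday as 0
    }


# A made-up ride, not a row from the real data.
ride = {
    "started_at": "2021-07-04 14:05:00",
    "ended_at": "2021-07-04 14:35:30",
    "member_casual": "casual",
}
features = add_time_features(ride)
print(features["ride_length_min"], features["month"], features["weekday"])
```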

Visualizations and results

  • Only casual customers use docked bikes. Both groups use electric bikes in similar numbers, while more annual members use classic bikes.
  • The weekly distribution of rides shows that usage among annual members does not change much over the week. Casual customers, however, tend to ride more on weekends.
  • Surprisingly, casual customers also spend more time per ride on average, especially on weekends.

Dashboard 1

Moreover, to show the monthly trend of bike usage in 2021, I also plotted the number of rides in each month.

  • The number of rides surges between May and October (half of the year).
  • During the summer, casual customers and annual members account for similar shares of the rides. About half of all rides come from casual customers.
  • However, during this period casual customers ride for more than twice as long as annual members.

Dashboard 1

Strategy of increasing memberships

The behaviour of the customers can be summarized as follows:

  • Only casual customers use docked bikes.
  • Casual customers seem to ride more on weekends, while annual members use bikes daily.
  • Usage peaks during the summer, from May to October, when the number of rides increases greatly. But only half of these rides come from members, and casual customers tend to ride longer than annual members.

We could design a membership strategy that helps casual customers choose the option that best fits their needs:

  • Offer a special membership for docked bikes.
  • Offer a family membership or a weekend membership.
  • Offer a half-year membership, targeted at the summer period from May to October.
  • This could also be combined with a discount for the other half of the year.

Manish Gupta


Learner at Simplilearn-Purdue University | Data Scientist | ML | DL | AI | NLP

CUSTOMER SEGMENTATION

Introduction: Customer segmentation is the practice of dividing the customer base into groups of individuals that share common characteristics such as age, gender, interests, and spending habits. It is a way for organizations to understand their customers. Knowing the differences between customer groups makes it easier to make strategic decisions about business growth and marketing campaigns. Customer segmentation opens up new business opportunities and lets a business optimize budgeting, product design, promotion, marketing, and customer satisfaction. The ways to segment are nearly endless and depend mainly on how much customer data you have at your disposal. Machine learning methods are a good fit for analyzing customer data and finding insights and patterns: they can identify customer segments that are much harder to find manually or with conventional analytical methods. There are many machine learning algorithms, each suited to a specific type of problem. One very common algorithm for customer segmentation is k-means clustering, which I used for this project. Other clustering algorithms include DBSCAN, Agglomerative Clustering, and BIRCH.


Objective: This was my first capstone project and part of the final assessment for the PGP in Data Science course from Simplilearn-Purdue University. My job was to analyze transactional data for a UK-based online retail company and create a customer segmentation so that the company can run effective marketing campaigns. This is a transnational data set that contains all the transactions that occurred between 01/12/2010 and 09/12/2011. The company mainly sells unique all-occasion gifts.

I performed the following tasks in this project:

  • Data Cleaning
  • Data Transformation
  • Data Modeling - RFM (Recency Frequency Monetary) model
  • Data Modeling - K-means clustering algorithm
  • Data Reporting - Dashboarding in tableau


K-means clustering, an unsupervised algorithm, is one of the techniques that are useful for customer segmentation. The basic idea is to group data points into k clusters so that points in the same cluster are more similar to each other than to points in other clusters.
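To make the idea concrete, here is a minimal sketch of the standard k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) in plain Python. It is an illustration only: the project itself used Scikit-learn, and the function name kmeans, the seed, and the six sample points are assumptions.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster tuples of numbers into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
              for p in points]
    return centroids, labels


# Made-up 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

Which group gets label 0 depends on the random start, so only the grouping is meaningful, not the label values.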


Problem Statement: It is a critical requirement for business to understand the value derived from a customer. RFM (Recency, Frequency, Monetary) is a method used for analyzing customer value. Perform customer segmentation using RFM analysis.

Data Cleaning:

  • Perform a preliminary data inspection and data cleaning.
      a. Check for missing data and formulate an apt strategy to treat them.
      b. Remove duplicate data records.
      c. Perform descriptive analytics on the given data.

Data Transformation:

  • Perform cohort analysis (a cohort is a group of subjects that share a defining characteristic). Observe how a cohort behaves across time and compare it to other cohorts.
      a. Create month cohorts and analyze active customers for each cohort.
      b. Analyze the retention rate of customers.
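A month cohort can be defined by each customer's first purchase month, and retention is then the share of that cohort that is active N months later. The sketch below shows one way to compute this in plain Python; the function names and the three sample customers are hypothetical, and the project's own pandas code may differ.

```python
from collections import Counter, defaultdict


def month_index(year_month):
    """Turn 'YYYY-MM' into a running month number so offsets can be subtracted."""
    year, month = map(int, year_month.split("-"))
    return year * 12 + month


def retention_table(transactions):
    """Map cohort month -> {months since first purchase: share of cohort active}."""
    first_month = {}
    for customer, ym in transactions:
        # 'YYYY-MM' strings sort chronologically, so min() finds the first month.
        first_month[customer] = min(first_month.get(customer, ym), ym)
    cohort_size = Counter(first_month.values())

    active = defaultdict(set)  # (cohort, offset) -> customers seen in that month
    for customer, ym in transactions:
        cohort = first_month[customer]
        offset = month_index(ym) - month_index(cohort)
        active[(cohort, offset)].add(customer)

    table = defaultdict(dict)
    for (cohort, offset), customers in active.items():
        table[cohort][offset] = len(customers) / cohort_size[cohort]
    return dict(table)


# Made-up (customer, month) pairs for illustration.
transactions = [
    ("A", "2010-12"), ("B", "2010-12"),
    ("A", "2011-01"), ("C", "2011-01"),
    ("A", "2011-02"), ("B", "2011-02"), ("C", "2011-02"),
]
table = retention_table(transactions)
print(table["2010-12"])  # half of the December cohort is active one month later
```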

Data Modeling-I:

  • Build an RFM (Recency Frequency Monetary) model.
  • Calculate RFM metrics.
  • Build RFM segments. Give recency, frequency, and monetary scores individually by dividing them into quartiles.
      b1. Combine the three ratings to get an RFM segment (as strings).
      b2. Get the RFM score by adding up the three ratings.
      b3. Analyze the RFM segments by summarizing them and comment on the findings.
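The quartile scoring in steps b1 and b2 can be sketched as follows. This is a standard-library illustration rather than the project's code: the helper names, the eight sample customers, and the choice that a low recency earns a high score are assumptions.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_rank(value, cuts):
    """Return 1-4: one plus the number of quartile cut points below the value."""
    return bisect_left(cuts, value) + 1


def rfm_scores(customers):
    """Score each customer 1-4 on R, F, and M; build the segment string and sum."""
    r_cuts = quantiles([c["recency"] for c in customers.values()], n=4)
    f_cuts = quantiles([c["frequency"] for c in customers.values()], n=4)
    m_cuts = quantiles([c["monetary"] for c in customers.values()], n=4)
    result = {}
    for customer_id, c in customers.items():
        # Assumption: fewer days since the last purchase is better, so invert R.
        r = 5 - quartile_rank(c["recency"], r_cuts)
        f = quartile_rank(c["frequency"], f_cuts)
        m = quartile_rank(c["monetary"], m_cuts)
        result[customer_id] = {"segment": f"{r}{f}{m}", "score": r + f + m}
    return result


# Made-up customers: recency in days, frequency in orders, monetary in currency.
customers = {
    "c1": {"recency": 1, "frequency": 8, "monetary": 800},
    "c2": {"recency": 5, "frequency": 7, "monetary": 700},
    "c3": {"recency": 10, "frequency": 6, "monetary": 600},
    "c4": {"recency": 20, "frequency": 5, "monetary": 500},
    "c5": {"recency": 40, "frequency": 4, "monetary": 400},
    "c6": {"recency": 80, "frequency": 3, "monetary": 300},
    "c7": {"recency": 160, "frequency": 2, "monetary": 200},
    "c8": {"recency": 320, "frequency": 1, "monetary": 100},
}
scores = rfm_scores(customers)
print(scores["c1"], scores["c8"])
```

Here c1 (recent, frequent, high spend) lands in the top segment and c8 in the bottom one, which is the kind of contrast step b3 asks you to summarize.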

Data Modeling-II:

  • Create clusters using the k-means clustering algorithm.
      a. Prepare the data for the algorithm. If the data is asymmetrically distributed, manage the skewness with an appropriate transformation. Standardize the data.
      b. Decide the optimum number of clusters to be formed.
      c. Analyze these clusters and comment on the results.
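Step a can be illustrated with a log transform followed by standardization. The brief does not say which transformation was used, so the log1p choice here is an assumption, as are the function names and the sample spend values.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean cubed z-score."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / stdev) ** 3 for v in values)


def log_standardize(values):
    """Apply log(1 + x) to reduce right skew, then scale to mean 0 and stdev 1."""
    logged = [math.log1p(v) for v in values]
    mean = statistics.fmean(logged)
    stdev = statistics.pstdev(logged)
    return [(x - mean) / stdev for x in logged]


# Made-up monetary values with one very large spender (right-skewed).
monetary = [10, 20, 50, 100, 5000]
scaled = log_standardize(monetary)

print(round(skewness(monetary), 2))  # skew of the raw values
print(round(skewness(scaled), 2))    # smaller after the log transform
```

Standardizing matters for k-means because the algorithm relies on distances, so a feature with a larger numeric range would otherwise dominate the clustering.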

Data Reporting:

  • Create a dashboard in Tableau by choosing appropriate chart types and metrics useful for the business. The dashboard must include the following:
      a. Country-wise analysis to demonstrate average spend. Use a bar chart to show the monthly figures.
      b. Bar graph of the top 15 most-ordered products, showing the number of products sold.
      c. Bar graph showing the count of orders vs. hour of the day.
      d. Distribution of RFM values, plotted using histograms and frequency charts.
      e. Plot of error (cost) vs. number of clusters selected.
      f. Heatmap comparing the RFM values of the clusters.

Tools used: This project was done in Python, using popular libraries such as Pandas, NumPy, Matplotlib, Seaborn, and Scikit-learn (for k-means clustering) for data preprocessing, transformation, and modeling. Finally, a dashboard was created in Tableau for visualization.


Copyright (c) Manish Gupta

Associate Data Analyst Programme Portfolio by Chris Simon


View the Project on GitHub chrisjsimon/data_analyst_prog_capstones

Capstone Projects 2020 - 2021

Capstone 1: CO2 Emissions Report in Excel Dashboards


Our cohort's first capstone was an exciting and interesting project, since before this I was unaware that Excel could do so much more than the usual spreadsheet functions I was used to.

In the module, I learned that Excel is capable of Data Preparation, Data Transformation, and Dashboards that visualise stories for presentations. I had heard a lot about Big Data, but until this programme my knowledge of the term was minimal. I think that Excel Dashboards were a good introduction to Big Data and how to handle it, which is also an important precursor for our subsequent modules.

For my Capstone, I initially shortlisted several topics and datasets sourced from popular websites such as Kaggle and Data.world before eventually deciding to take on the CO2 Emissions by Developed and Developing Countries datasets. I used the datasets available on the World Bank's and BP's websites and combined the columns I needed for my report.

Technologies and Applications used: MS Excel 2010, Power Query
Data source: World Bank, BP

Capstone 2: Relational Database and MS Excel Dashboard (MS SQL Server)


For our second Capstone project, we were tasked with using MS SQL Server for the relational database side of the project, and MS Excel for the Data Preparation and Data Wrangling portion.

Technologies and Applications used: MS Excel 2010, Power Query, MS SQL Server, MS SQL Server Management Studio 18, MS PowerPoint
Data source: Data.world

Capstone 3: Singapore Private Housing Transaction Report (PowerBI)

Technologies and Applications used: PowerBI Desktop
Data source: Data.gov.sg

Capstone 4: Bank Term Deposit Predictive Model (Python, Jupyter)


Technologies and Applications used: Jupyter Notebook, Anaconda, NumPy, Python 3.8, MS PowerPoint
Data source: Data.gov.sg

  • Oct 18, 2021

Google Data Analytics Capstone Project

Updated: Jul 5, 2023

I worked on the Google Data Analytics Capstone Project, Track 1, Case Study 1. I will be diving into the background, my full process of cleaning, analyzing and visualizing the data, along with my final suggestions and summary of the data.

Quick Links :

Tableau Dashboard | Github R Code for Analysis | Github R Code for Tableau Visualization | LinkedIn Post

Below is a table of contents in case you want to go to a specific section.

Table of Contents:

Microsoft Excel

Finished Project

Summary of Data

Business Suggestions

What I Learned

Cyclistic is a bike sharing program which features more than 5,800 bikes and 600 docking stations. It offers reclining bikes, hand tricycles, and cargo bikes, making it more inclusive to people with disabilities and riders who can't use a standard two-wheeled bike. It was founded in 2016 and has grown tremendously into a fleet of bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Previously, Cyclistic's marketing strategy focused on building general awareness and appealing to broad consumer segments. It has flexible pricing plans: single-ride passes, full-day passes, and annual memberships. Those who purchase single-ride or full-day passes are referred to as casual riders, while those who purchase annual memberships are Cyclistic members.

My Role : In this scenario I am a junior data analyst at Cyclistic, and my team has been tasked with the overall goal (see below) of designing marketing strategies.

Overall Goal : Design marketing strategies aimed at converting casual riders into annual members.

Business Question : "How do annual members and casual riders use Cyclistic bikes differently?"

Below I will describe, step by step, the process I used for this project. If you want to skip ahead to the business suggestions, move on to the section "Business Suggestions".

Overview : I first analyzed the data separately (each month) in Excel, then used R to analyze the data as a whole (one year). Finally I created a dashboard in Tableau and used Figma to support the design elements.

I initially wanted to gather and analyze my data in Excel because it was the tool I was most familiar with and I could get a general understanding of the data quicker. I did not combine all of the spreadsheets into one because that would've taken more processing power than my computer had.

I began by downloading the data from divvy-tripdata and turning the .csv files into Excel spreadsheets. I downloaded the most recent year of data available when I started my project:

August 2020

September 2020

October 2020

November 2020

December 2020

January 2021

February 2021

Added two columns to all of the months:

ride_length - calculated the total ride length for each trip as the ending time minus the starting time (ended_at minus started_at).

day_of_week - calculated the day of the week for each trip from the date in the started_at column.

Went over the business task and the information I had at hand and how that could be used to figure out how members and casual riders use the bike service differently

Came up with metrics to look at such as :

total number of rides per hour, per day of the month, per season, per day of the week, and for different bike types

Average ride length between members and casual

For every month, I created pivot tables and charts in Excel to go with the analysis (this took the longest):

Total Rides per Weekday - calculated the total rides for members and casual riders, separated by day of the week; used a clustered column chart

Average Ride Length - calculated the average ride length for members and casual riders, separated by day of the week; used a clustered column chart

Total Rides per Hour - calculated the total rides for members and casual riders, separated by hour of the day (24hr); used a line comparison chart

Total Rides per Day - calculated the total rides for members and casual riders, separated by day of the month; used a line comparison chart

Total Rides per Bike Type - calculated the total rides for members and casual riders, separated by bike type; used a stacked column chart

I also created a Google docs Notes list where I wrote down the exact steps for each month (had a checklist) and included my insights for each month

Time Spent:

535 minutes or just under 9 hours to complete.

I originally wanted to use SQL, but the files were too big to upload and I couldn't figure out how to utilize Google Cloud Platform. Instead I used R to analyze the data because it could handle all of the information more quickly than Excel, and I wanted to work on my R skills. Below is my general process in R; I left out my mistakes, missteps, and errors for the sake of brevity.

View my full code on my Github for this capstone project here .

Load all of the libraries I used: tidyverse, lubridate, hms, data.table

Uploaded all of the original data from the data source divvy-tripdata into R, using the read_csv function to load each individual csv file into a separate data frame. The August 2020 data went into aug08_df, September 2020 into sep09_df, and so on.

Merged the 12 months of data together using rbind to create a one year view

Created a new data frame called cyclistic_date that would contain all of my new columns

Created new columns for:

Ride Length - did this by subtracting the started_at time from the ended_at time

Day of the Week

Time - convert the time to HH:MM:SS format

Season - Spring, Summer, Winter or Fall

Time of Day - Night, Morning, Afternoon or Evening
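The Season and Time of Day columns come down to bucketing the start month and hour. The article does not give the exact cut-offs it used in R, so the boundaries below (meteorological seasons and six-hour day parts) are assumptions, as are the function names.

```python
from datetime import datetime


def season(month):
    """Map a month number to a season (assumed Northern Hemisphere boundaries)."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"


def time_of_day(hour):
    """Map an hour (0-23) to a day part; the six-hour buckets are an assumption."""
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Afternoon"
    if 18 <= hour < 24:
        return "Evening"
    return "Night"


def derive_columns(started_at):
    """Build the Time, Season, and Time of Day values for one ride start."""
    start = datetime.strptime(started_at, "%Y-%m-%d %H:%M:%S")
    return {
        "time": start.strftime("%H:%M:%S"),
        "season": season(start.month),
        "time_of_day": time_of_day(start.hour),
    }


summer_ride = derive_columns("2020-08-15 17:30:00")
winter_ride = derive_columns("2021-02-01 04:00:00")
print(summer_ride)
print(winter_ride)
```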

Cleaned the data by:

Removing duplicate rows

Remove rows with NA values (blank rows)

Remove where ride_length is 0 or negative (ride_length should be a positive number)

Remove unnecessary columns: ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng
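The four cleaning rules above can be sketched in Python as a single pass over the rows. This is an illustration rather than a translation of the R code: the function name clean, the reduced set of columns in the sample rows, and the use of start_lng as the longitude column name are assumptions.

```python
DROP_COLUMNS = {
    "ride_id", "start_station_id", "end_station_id",
    "start_lat", "start_lng", "end_lat", "end_lng",  # start_lng is assumed
}


def clean(rows):
    """Drop duplicate rows, rows with missing values, and non-positive rides."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the full row
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if any(value is None or value == "" for value in row.values()):
            continue  # missing value somewhere in the row
        if row["ride_length"] <= 0:
            continue  # ride_length should be a positive number
        cleaned.append({k: v for k, v in row.items() if k not in DROP_COLUMNS})
    return cleaned


# Made-up rows: one valid ride, its duplicate, and three invalid rides.
rows = [
    {"ride_id": "r1", "member_casual": "member", "ride_length": 12.5, "start_lat": 41.9},
    {"ride_id": "r1", "member_casual": "member", "ride_length": 12.5, "start_lat": 41.9},
    {"ride_id": "r2", "member_casual": None, "ride_length": 8.0, "start_lat": 41.8},
    {"ride_id": "r3", "member_casual": "casual", "ride_length": 0, "start_lat": 41.7},
    {"ride_id": "r4", "member_casual": "casual", "ride_length": -3.0, "start_lat": 41.7},
]
cleaned = clean(rows)
print(cleaned)
```

Duplicates are checked before the columns are dropped, so two different rides that happen to share a length and rider type are not merged by accident.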

Calculated Total Rides for:

Total number of rides which was just the row count = 4,152,139

Member type - casual riders vs. annual members

Type of Bike - classic vs docked vs electric; separated by member type and total rides for each bike type

Hour - separated by member type and total rides for each hour in a day

Time of Day - separated by member type and total rides for each time of day (morning, afternoon, evening, night)

Day of the Week - separated by member type and total rides for each day of the week

Day of the Month - separated by member type and total rides for each day of the month

Month - separated by member type and total rides for each month

Season - separated by member type and total rides for each season (spring, summer, fall, winter)

Calculated Average Ride Length for:

Total average ride length

Type of Bike - separated by member type and average ride length for each bike type

Hour - separated by member type and average ride length for each hour in a day

Time of Day - separated by member type and average ride length for each time of day (morning, afternoon, evening, night)

Day of the Week - separated by member type and average ride length for each day of the week

Day of the Month - separated by member type and average ride length for each day of the month

Month - separated by member type and average ride length for each month

Season - separated by member type and average ride lengths for each season (spring, summer, fall, winter)
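Every breakdown listed above follows the same pattern: group the rides by member type plus one other column, then count the rows and average ride_length. A minimal Python sketch of that pattern is below; the function name summarize and the sample rides are hypothetical, and the original analysis did this in R.

```python
from collections import defaultdict


def summarize(rides, column):
    """Total rides and average ride length per (member type, column value)."""
    totals = defaultdict(lambda: [0, 0.0])  # group -> [ride count, length sum]
    for ride in rides:
        group = (ride["member_casual"], ride[column])
        totals[group][0] += 1
        totals[group][1] += ride["ride_length"]
    return {
        group: {"total_rides": count, "avg_ride_length": length_sum / count}
        for group, (count, length_sum) in totals.items()
    }


# Made-up rides with ride_length in minutes.
rides = [
    {"member_casual": "member", "day_of_week": "Monday", "ride_length": 10.0},
    {"member_casual": "member", "day_of_week": "Monday", "ride_length": 14.0},
    {"member_casual": "casual", "day_of_week": "Saturday", "ride_length": 30.0},
    {"member_casual": "casual", "day_of_week": "Saturday", "ride_length": 40.0},
    {"member_casual": "casual", "day_of_week": "Sunday", "ride_length": 35.0},
]
by_weekday = summarize(rides, "day_of_week")
print(by_weekday[("member", "Monday")])
print(by_weekday[("casual", "Saturday")])
```

Passing a different column name (for example a season or hour column) reuses the same function for the other breakdowns.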

Then using all of this data I created my own summary in my case notes and took note of the: total rides for each variable, average ride lengths for each variable, and the difference between members versus casual riders. I originally wanted to create a report using R Markdown as well but for the sake of time (I had already spent over 20 hours on the project so far), I decided to skip this step, and write this article instead.

1045 minutes or about 17 and a half hours to complete.

While I learned the basics of Tableau in the Google Course I wanted more practice with visualizing data and creating dashboards.

To view my completed dashboard click here .

I created a separate R code (you can view it here on Github) that made some changes for specifically the Tableau portion.

For ride length, I rounded to one decimal place, so my numbers looked like 29.8 or 12.5.

Revised how I created my "month" column. I used mutate() to create a column that holds the month as a name rather than a number, so instead of 01 it would say "January".

Cleaned the data: removed rows with NA values, removed duplicate rows, removed rows where ride_length was 0 or negative, and removed unnecessary columns like ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng

Created a new dataframe with this information so I could test the difference between the original data frame (cyclistic_date) that I used for my analysis and the data frame I would use for Tableau (cyclistic_tableau).

In this new data frame I removed more columns to make calculations quicker in Tableau. I removed: start_station_name, end_station_name, time, started_at, ended_at

Downloaded this data frame into a .csv file which I uploaded to Tableau

Created graphs similar to those I created in Excel but added a few:

Total Rides by Bike Type

Ride Length by Weekday

Total Rides by Weekday

Total rides by hour, total rides by month.

Then I created a basic dashboard with all of that information, a prototype for me to view while I was creating the final dashboard ( Figure 1 below).

Created a prototype mockup in Figma

Created a final version of the mockup in Figma

Edited Dashboard in Tableau to reflect design in Figma

Edited graphs in Tableau

Made bar graphs round

Added annotations

Highlights to specific important notes

Got rid of labels for visual purposes

Combined Figma and Tableau (used dashboard created in Figma as the background for my Tableau Dashboard) to create a final prototype ( Figure 2 below)

Made minor edits to design elements and created final dashboard ( Figure 3 - Cyclistic Dashboard V1 )

On April 24, 2023 I decided to update my dashboard (see Finished Project, image Final Dashboard - Cyclistic Dashboard V2). All of the analysis is the same; the only changes are to the dashboard, and they include:

Adding horizontal grid lines to a few of the charts

Updating the tool tips.

Making all of the top metric values (e.g. Total Rides, Average Ride Length, etc.) interactive in Tableau instead of in Figma.

765 minutes or almost 13 hours to complete.

Tableau Prototype

Below was my first draft of the dashboard only using Tableau.

Prototype of my dashboard for my google capstone project

Prototype using Figma Background

Combined Figma and Tableau (used dashboard created in Figma as the background for my Tableau Dashboard) to create a final prototype.

Dashboard Prototype with Figma background

Final Dashboard V1

Made minor edits to design elements and created final dashboard. This was the original final dashboard.


I am including the other tools I used.

Figma to create my background and help develop the dashboard aesthetics.

Google Docs helped me keep track of all of my documents for this project like:

Date Log - I wrote down what I did that day related to my project

Resources - A list of resources I frequently used

Case Notes - Notes for the case study including the final insights, what I was looking for, and anything else having to do with the case

Evernote to draft this article before I uploaded it here.

FINISHED PROJECT

Here is my finished project: Google Capstone Project (V2) . You can view the links to my R code on Github used for analysis here and the code for Tableau here .

Note: This is V2, with a few minor changes to the dashboard.

Final dashboard for capstone project

SUMMARY OF DATA

Those who purchase single-ride or full-day passes are referred to as casual riders while those who purchase annual memberships are Cyclistic members .

Total Rides by User Type

Average Ride Length per User Type

Average Ride per Weekday

Members had more rides, with 2,328,763 total rides (56%), while casual riders had 1,823,376 total rides (44%).

Total Rides by Rider Type Pie chart
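The percentages follow directly from the ride counts. As a quick arithmetic check, the sketch below recomputes each rider type's share of the total; the function name shares is hypothetical, while the two counts are the ones reported above.

```python
def shares(counts):
    """Return each group's share of the total, in percent with one decimal."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


rides_by_type = {"member": 2_328_763, "casual": 1_823_376}
total_rides = sum(rides_by_type.values())
result = shares(rides_by_type)

print(total_rides)  # 4152139, matching the row count from the R analysis
print(result)       # {'member': 56.1, 'casual': 43.9}
```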

Total Rides per Bike Type

Both casual riders and members used the classic bike the most, with 1,777,593 rides or 43% of total rides, followed by docked bikes with 1,545,936 rides or 37% of total rides, and lastly electric bikes with 828,610 rides or 20% of total rides.

Total Rides per Bike Type - bar chart

Average Ride Length by User Type

The overall average ride length was 24 minutes. For casual riders it was longer, at 37 minutes, while for members it was 14 minutes.

Average ride length by rider type

Average Ride Length per Weekday

For the average ride length per weekday, both casual riders and members rode longer on the weekends. For both, Sunday had the longest average at 31 minutes.

average ride length per weekday - bar chart

Saturday was the most popular day of the week when combining casual and member rides, with 784,239 rides or 19% of total rides. For members alone, however, Wednesday was the most popular day with 356,060 rides, 5,407 more than Saturday.

Total rides by weekday - bar chart

5PM (17:00) was the busiest hour for both members and casual riders, with 426,685 rides or 10% of total rides. Typically, rides began increasing at 6AM and rose until 5PM, then dropped afterwards. The afternoon was the busiest time of day for both rider types, with 1,905,797 rides or 46% of total rides. 4AM was the least popular hour.

Total rides by hour

July was the busiest month when combining casual and member rides, at 691,476 rides or 17% of total rides, and summer was the most popular season for both, at 1,903,446 rides or 46% of total rides. Looking at just members, August was actually the busiest month with 323,140 rides, 816 more than July. Winter was the least popular season and February the least popular month.

Total bike rides per month - bar chart

Final Summary

The most popular bike among riders was the classic.

Busiest time was afternoon and the peak time was at 5PM for both casual riders and members.

Busiest weekday was Saturday, casual riders used the service the most on the weekends.

Busiest season was Summer for both types of riders.

Most rides by User Type was members but casual riders weren't far behind.

The average ride length was 24 minutes but casual riders on average rode 23 minutes longer than members.

BUSINESS SUGGESTIONS

This was the hardest part for me for the whole project. I have never provided suggestions for a business nor worked in marketing. Any feedback here would be appreciated.

These are my suggestions for the marketing team to convert casual riders to annual members:

Personalize discounts and show perks in the membership program based on their preferences and riding habits.

Emphasize the benefits of memberships, including discounts during busy times of the year like during Summer, or on the weekends.

Have existing members share their stories about how using Cyclistic's system has changed their lives, to create a sense of community. Offering a discount to members who do so will help encourage new riders to join the program.

WHAT I LEARNED

Below is what I learned/practiced from over 40 hours spent on this project:

Pivot Tables in Microsoft Excel

Practice using R for data analysis and cleaning specifically using the tidyverse package for data analysis

Graphs in Tableau, edited visual elements along with creating different charts and filters.

Design elements of an effective dashboard

Combining the design feature of Figma with the functionality of Tableau

For the R portion of my project, I found Itamar's case study on Kaggle, which also uses R, a helpful resource.

For the Tableau portion, I used Navneet Singh's Tableau Dashboard as inspiration.


15 Tableau Projects for Beginners to Practice with Source Code

Sample Tableau Projects for Practice with Examples to help you master using the fundamental features and tools in Tableau for any data science project.


With the explosion of data in all industries, the need for user-friendly business intelligence tools to read it has also increased. Around 63,298 companies across the globe are reported to use Tableau, according to Enlyft.com. Its easy-to-use features and strong compatibility with other software applications have helped it leverage the power of visualization in the IT world.

In the world of analytics, it is extremely important to pick up the skill of storytelling. Storytelling involves showing patterns in a given dataset and then trying to infer actionable insights from the same. With Tableau, the exercise of storytelling has become quite efficient and interesting at the same time. With so much data in hand, business intelligence tools like Tableau aid in giving a direction to the analysis so that one can present the correct inferences from the data.


Tableau is a data visualization tool that can be used across different data-related profiles. These profiles include Data Engineers, Data Scientists, Data Analysts, and Business Analysts, to name a few. The stage at which Tableau can be used may vary from role to role and project to project. Most importantly, the tool is used in the exploration phase, where it showcases vast datasets in different visualizations. On the other hand, Tableau is also used to showcase inferences to end stakeholders and higher management.

Table of Contents

Beginner Level Tableau Sample Project Ideas, Intermediate Level Tableau Project Ideas, Advanced Level Tableau Project Ideas, 15 Sample Tableau Real-Time Projects for Practice in 2021


With the need to learn Tableau as part of an analytical skillset, it becomes essential to understand both where to start and how to start. This article is a one-stop solution for all data enthusiasts to understand Tableau and start working on some interesting datasets for Tableau projects. The example Tableau projects for practice have been categorized into Beginner, Intermediate, and Advanced Level Tableau project ideas.


1) Patient Risk Healthcare Dashboard

This beginner-level Tableau project idea is from the healthcare domain. This can be a part of a Data Analysis or Data Science project based on prediction-related analysis. 

The problem statement here is to analyze a massive dataset of patients in a particular hospital and, based on their information, predict and infer their health risk. You should then integrate all this analysis into Tableau for easy consumption by end users. The dataset has 17 variables, which have to be analyzed thoroughly to get details about the patients at risk.

A sample of the dataset is given below for reference:-

[Image: sample of the dataset]

Source: Power BI

This project is highly critical, and hence accurate analysis has to be done for the same. Since the dataset is huge for this project, Tableau can be used in two ways. One is where the as-is information is taken, and the same is represented through visualizations and interactive capabilities. Another is where it is used to do exploratory data analysis , and then a model is built on the same to predict the risk for patients with similar health conditions.

One can break the analysis down by categorical variables like age group and gender, so that the groups at higher risk can take more precautions.

2) Sales Forecast Analysis Dashboard

This sample Tableau project is a part of the Data Analysis and Data Science project since it involves forecasting. The problem statement here is to analyze the data of sales of a company. The main agenda of the project is to infer the past sales numbers of a company and then forecast their sales for the coming quarters and years. The dataset here has three variables for doing the analysis.

[Image: sample of the dataset]

Source: SAP

With just three variables in the dataset, it becomes easier to analyze each variable and determine which store and type have given what kind of sales. You can use Tableau for creating simple bar charts on sales forecasts. Once that is done, modeling can be done to forecast the trends of sales in the future.


3) Marketing Campaign Dashboard

This is an interesting Tableau project idea in the marketing domain that requires a lot of number crunching. The problem statement here is to analyze a dataset of marketing campaigns and visualize the performance of various marketing campaigns. The dataset here has six variables of multiple data types. A sample of the dataset is given below for reference:-

[Figure: dataset sample. Source: Sisense]

Marketing Analytics is one of the hottest topics across industries. With Tableau, one can easily analyze large datasets of various contacts and infer which segment type has been performing well. With the count and percentage numbers, a marketing analyst can show the aggregate level data on Tableau through area charts and simple pie charts as well.
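The count and percentage figures mentioned above are simple to derive. A minimal sketch, assuming each contact is tagged with a single segment label (the labels below are invented for illustration):

```python
from collections import Counter

def segment_shares(segments):
    """Return {segment: (count, percentage)} ordered by descending count."""
    counts = Counter(segments)
    total = sum(counts.values())
    return {seg: (n, 100.0 * n / total) for seg, n in counts.most_common()}

# Hypothetical segment label per contact.
contacts = ["email", "social", "email", "search",
            "email", "social", "email", "search"]
shares = segment_shares(contacts)
for segment, (count, pct) in shares.items():
    print(f"{segment}: {count} contacts ({pct:.1f}%)")
```

These aggregates are the numbers a pie or area chart in Tableau would display.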


4) Product Availability Dashboard

This project is from the Product domain of the technology industry. The problem statement here is to analyze a dataset of product-related information. The project’s primary goal is to analyze the trends and showcase the availability of any product at any given point in time. The dataset here has 11 variables of different data types. A sample of the dataset is given below for reference:-

[Figure: dataset sample. Source: Datapine]

This project is an ideal case for the e-commerce industry, where it is essential to know whether a particular product is in stock at any given moment. With the variables described above, you can easily prepare a dashboard showing which product is searched for the most and which product has the most stock. Because the dataset includes URLs, a drill-through capability can be built so that a user can jump to the actual product for a quick look.


5) Flight Price Analysis Dashboard

This project is from the Airline Industry. The problem statement here is to analyze a dataset of flight-related information. The main objective is to consider different factors of a flight and infer accurate trends for flight prices. The dataset here has ten variables that involve date and time data types as well.  A sample of the dataset is given below for reference:-

[Figure: dataset sample]

This project allows an analyst to do a lot of exploratory data analysis to understand the pattern of flights with a higher price. With variables like route, one can analyze which routes cost more and also plot trend lines to see if a longer flight costs more. Once the pattern is understood, you can also implement models like Random Forest for predictive analysis.
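To illustrate the route comparison described above, the following sketch ranks routes by their mean ticket price. The `(route, price)` pair layout, the route codes, and the prices are all hypothetical and exist only for this example.

```python
from collections import defaultdict
from statistics import mean

def mean_price_by_route(flights):
    """Return (route, mean price) pairs, most expensive route first."""
    prices = defaultdict(list)
    for route, price in flights:
        prices[route].append(price)
    return sorted(
        ((route, mean(p)) for route, p in prices.items()),
        key=lambda item: item[1],
        reverse=True,
    )

# Invented (route, price) records, for illustration only.
flights = [
    ("DEL-BOM", 5000.0), ("DEL-BOM", 7000.0),
    ("BLR-DEL", 9000.0), ("BLR-DEL", 8000.0),
    ("CCU-BLR", 4000.0),
]
ranking = mean_price_by_route(flights)
for route, avg in ranking:
    print(route, avg)
```

The same grouping idea extends to other factors such as airline or departure time.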


6) Crime Analysis Dashboard

The problem statement here is to analyze a dataset of various crimes happening at a place. It should also showcase their intensity and provide some actionable recommendations to prevent them. The dataset here has eight variables. A sample of the dataset is given below for reference:-

[Figure: dataset sample. Source: Power BI]

The above dataset is also an excellent way to start your Tableau journey. With the above variables, Tableau can be used to see which area has the highest number of crime cases, for example by plotting a treemap. Year-wise trends can also show whether cases have decreased or increased. Alongside the categorical analysis, the numerical variables can be used to see the intensity of the thefts across years and areas.


7)  Air Quality and Pollution Analysis Dashboard

  This project is from the environment protection industry. 

The problem statement here is to analyze a dataset related to different air quality factors and pollution in a particular area. The primary aim is to get an understanding of its causes and take preventive measures. The dataset here has 12 variables for extensive analysis.

[Figure: Air Quality and Pollution Analysis Dashboard]

Pollution and Air Quality is one of the most important challenges that leaders worldwide are trying to solve. With the above dataset and some knowledge about the similar industry, one can use the numbers of different gases to draw inferences about the composition of air in different cities and across different timelines. One can present the data visually and help the end-user infer from it based on their experience in this industry.


8) Sales Pipeline Dashboard

This project is from the Sales domain of the technology industry. The problem statement here is to analyze a dataset to understand how the entire sales funnel is moving. The objective is to help Sales stakeholders understand how they are performing across different revenue segments. The dataset here has 13 variables, which also include currency-related inputs.

For the Sales team, understanding their pipeline is of utmost importance. A business intelligence tool like Tableau can make visualizations using a funnel graph to see the entire sales pipeline. With such a view into the data, Sales stakeholders can make key decisions on their next move.

Sales Pipeline Dashboard
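A funnel graph is essentially a picture of stage-to-stage conversion. The sketch below computes those conversion rates from ordered stage counts. The stage names and counts are invented for illustration and are not taken from the dataset.

```python
def funnel_conversion(stages):
    """Conversion rate between consecutive stages.

    stages: ordered list of (stage name, count), widest stage first.
    """
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        # Guard against an empty earlier stage to avoid division by zero.
        rate = count / prev_count if prev_count else 0.0
        rates.append((f"{prev_name} -> {name}", rate))
    return rates

# Hypothetical pipeline, for illustration only.
pipeline = [("Lead", 200), ("Qualified", 100), ("Proposal", 40), ("Won", 10)]
conversion = funnel_conversion(pipeline)
for step, rate in conversion:
    print(f"{step}: {rate:.0%}")
```

A sharp drop between two stages is usually the first place Sales stakeholders would want to look.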


9) Stock Exchange Analysis Dashboard

The problem statement here is to analyze a dataset of different stocks and derive meaningful information. This will help individuals make better-informed investment decisions. The dataset here has eight variables of integer data type. A sample of the dataset is given below for reference:-

Stock Exchange Analysis Dashboard

This is one of the projects where the data volume is huge and the data change is very frequent. The Stock Market produces large amounts of data with more room for analysis. One can create various combinations of the area and trend charts to show how the index has been moving and what highs and lows they have made. Also, volume can be visualized with pie charts or bar charts.


10)  Global Terrorism Analysis Dashboard

The problem statement here is to analyze a dataset of terrorism across the globe and then help different world leaders take corrective actions against any future attacks. The dataset here has 22 variables of different data types. A sample of the dataset variables is given below for reference:-

i_yearmonth, country_txt, specificity, alternative, alternative_txt

Global Terrorism Analysis Dashboard

This Tableau dashboard above can be used extensively to infer which parts of the world have the most terrorist attacks. Also, with the event dates yearly, you can do analysis using line graphs to see the trend of these attacks. Prediction can be made to prevent any future attacks based on the analysis using Tableau. 


11) Covid-19 Analysis Dashboard

The problem statement here is to analyze a dataset of covid cases worldwide and give real-time numbers for various regions. The dataset here has nine variables of different data types.

Covid-19 Analysis Dashboard

The Covid-19 pandemic turned the world upside down. With so many infections worldwide, it became extremely difficult for governments to measure the intensity of the pandemic. With analytics and Tableau, however, analysts were able to take government-verified data and use it to show the real picture. The above dataset is an ideal example of how you can use Tableau to visualize such volumes of categorical and numerical data.

 12) Credit Card Fraud Detection Dashboard 

The problem statement here is to analyze different credit card transactions and understand the pattern of them to detect anomalies and identify fraudulent transactions. A sample of the dataset is given below for reference:-

V1                V2                   V3                V4                 V5                  V6                   V7                   V8                  V9
-1.3598071336738  -0.0727811733098497  2.53634673796914  1.37815522427443   -0.338320769942518  0.462387777762292    0.239598554061257    0.0986979012610507  0.363786969611213
1.19185711131486  0.26615071205963     0.16648011335321  0.448154078460911  0.0600176492822243  -0.0823608088155687  -0.0788029833323113  0.0851016549148104  -0.255425128109186

Credit Card Fraud Detection Dashboard 

With the increase in credit cards, credit card fraud has also increased. With the above dataset, one can do some exploratory data analysis of different transactions and try and find some pattern in the data. Tableau will be extremely helpful to visualize such complex transactions using boxplots and other methods to identify outliers.
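A boxplot flags outliers with the interquartile range (IQR) rule, and the same rule can be applied directly in code. This is a minimal sketch under stated assumptions: the transaction amounts are invented, the multiplier `k=1.5` is the conventional boxplot default, and an IQR outlier is only a candidate anomaly, not proof of fraud.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], as a boxplot would mark them."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical transaction amounts, for illustration only.
amounts = [12.0, 15.0, 14.0, 13.0, 16.0, 15.0, 14.0, 250.0]
flagged = iqr_outliers(amounts)
print(flagged)
```

Applying the function per variable gives a quick list of transactions worth inspecting in the Tableau boxplots.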

13) Twitter Sentiment Analysis Dashboard

The problem statement here is to analyse a dataset of tweets on Twitter. The main aim is to analyze the sentiment of these tweets and take action accordingly. The dataset here has 20 variables of different data types. A sample of the dataset is given below for reference:-

Twitter Sentiment Analysis Dashboard

Sentiment Analysis is one of the most common data analytics problems that has been solved for the social media industry. With huge volumes of data as above, an analyst can see the importance and impact of positive and negative comments given across time and various segments of the industry.


14)  Account Management Dashboard

The problem statement here is to analyze a dataset of accounts and present it so that account managers can identify and manage the accounts easily. The dataset here has five variables of string and integer data types. A sample of the dataset is given below for reference:-

Account Management Dashboard

It is crucial for any product based company to have a clear picture of where each of their account and client stands. In the above dataset with just a few variables, one can see how many Platinum, Gold, Silver, and Bronze customers are there using a bar chart in the decreasing order. One can also see how each account is contributing to their overall sales by using a pie chart.

15) Video Industry Dashboard

The problem statement here is to analyze a video dataset and its patterns based on user experience. With such analysis, the objective is to identify patterns in viewer behavior and then target audiences better. The dataset here has 17 variables of different data types.

Video Industry Dashboard

This is another example of a recommendation system where, based on a user's browsing history, similar videos are suggested to watch. The above dataset is quite complex in CSV format, but with Tableau, one can easily create bar, line, and pie charts to see how users view video content across regions and timelines. Once the pattern is understood through exploration, machine learning can be used to predict future behavior.

The above Tableau sample project ideas serve only as a starting point for learning and improving your Tableau skills. The best way to go about it is hands-on practice: take different problem statements from different industries and explore all the variables extensively. Once you have worked on a range of Tableau projects, you will be ready to use these skills to solve real-world data science problems.


About the Author


ProjectPro is the only online platform designed to help professionals gain practical, hands-on experience in big data, data engineering, data science, and machine learning related technologies. Having over 270+ reusable project templates in data science and big data with step-by-step walkthroughs,


🏥👩🏽‍⚕️ Data Science Course Capstone Project - Healthcare domain - Diabetes Detection

This is a comprehensive project completed by me as part of the Data Science Post Graduate Programme. This project applies multiple classification algorithms to a dataset of health/diagnostic variables to predict whether a person has diabetes. Apart from extensive EDA to understand the distribution and other aspects of the data, pre-processing was done to identify data that was missing or did not make sense within certain columns, and imputation techniques were used to treat missing values. For classification, the balance of classes was also reviewed and treated using SMOTE. Finally, models were built using various classification algorithms and compared on several metrics. Lastly, the project contains a Tableau dashboard built on the original data.

You can view the full project code on this Github link

Note: This is an academic project completed by me as part of my Post Graduate program in Data Science from Purdue University through Simplilearn. This project was towards final course completion.

Business Scenario

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

Build a model to accurately predict whether the patients in the dataset have diabetes or not.

Analysis Steps

Data cleaning and exploratory data analysis

histogram

The dataset contains both integer and float variables. Create a count (frequency) plot describing the data types and the count of variables.

count plot of data types

Check the balance of the data (to review imbalanced classes for the classification problem) by plotting the count of outcomes by their value. Review findings and plan future course of actions.

class imbalance

We notice that there is class imbalance. The diabetic class (1) is the minority class, with 35% of the samples, while the non-diabetic class (0) has 65% of the total samples. We need to balance the data, either by oversampling the minority class or by undersampling the majority class, so that the model is not biased toward one class. We can apply the SMOTE (synthetic minority oversampling technique) method to balance the samples by oversampling the minority class (class 1 - diabetic), since in our problem we want the model to predict more accurately when an individual has diabetes.
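The core idea behind SMOTE is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a simplified stand-in written for illustration, not the implementation used in this project. The function name, the two-feature points, and the parameters `k` and `seed` are assumptions. In practice a tested implementation, such as the one in the imbalanced-learn library, would normally be used instead.

```python
import math
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour))
        )
    return synthetic

# Hypothetical two-feature minority samples, for illustration only.
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.5), (2.5, 1.5)]
majority_count = 8
synthetic = smote_like_oversample(minority, majority_count - len(minority))
print(len(minority) + len(synthetic), "minority samples after oversampling")
```

Oversampling should be applied to the training split only, so that synthetic points do not leak into the evaluation data.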

Create scatter charts between the pair of variables to understand the relationships. Describe findings.

Pair plots

We review scatter charts for analysing inter-relations between the variables and observe the following

Perform correlation analysis. Visually explore it using a heat map.

correlation matrix plots

Observation: As mentioned in the pair plot analysis, the variable Glucose has the highest correlation with the outcome.
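Each cell of a correlation heat map is a Pearson correlation coefficient. As a hedged illustration of how one such cell is computed, the sketch below correlates a feature with the binary outcome. The glucose readings and outcomes are invented values, not rows from the dataset.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical glucose readings and outcomes (1 = diabetic), for illustration.
glucose = [85.0, 90.0, 120.0, 150.0, 170.0, 95.0]
outcome = [0, 0, 1, 1, 1, 0]
r = pearson(glucose, outcome)
print(round(r, 3))
```

Repeating the calculation for every pair of variables yields the full matrix that the heat map visualizes.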

Model Building

Confusion Matrix

Note: The ROC (Receiver Operating Characteristic) curve shows how well the model can distinguish between two classes (e.g. whether a patient has a disease or not). Better models distinguish between the two accurately, whereas a poor model has difficulty doing so. This is quantified in the AUC score.

Final Analysis based on the classification report:

Data Reporting

Dashboard Tableau

Tools used:

This project was completed in Python using Jupyter notebooks. Common libraries used for analysis include numpy, pandas, scikit-learn, matplotlib, seaborn, and xgboost.

Further Reading

🏡🏷️ California Housing Price Prediction Using Linear Regression in Python

Summary- The project includes analysis on the California Housing Dataset with some exploratory data analysis. There was encoding of categorical data using the one-hot encoding present in pandas. ...

🔎📊 Principal Component Analysis with XGBoost Regression in Python

Summary- This project is based on data from the Mercedes Benz test bench for vehicles at the testing and quality assurance phase during the production cycle. The dataset consists of high number of...

💬⚙️ NLP Project - Phone Review Analysis and Topic Modeling with Latent Dirichlet Allocation in Python

Summary- This is a Natural Language Processing project which includes analysis of buyer’s reviews/comments of a popular mobile phone by Lenovo from an e-commerce website. Analysis done for the proj...


COMMENTS

  1. GitHub

    Final Project for the UC Davis Tableau Specialization. The Project is about the World CO2 Emission using Tableau Storytelling features. - GitHub - Ayomikun17/Tableau-Capstone-Project: Final Project for the UC Davis Tableau Specialization. The Project is about the World CO2 Emission using Tableau Storytelling features.

  2. GitHub

    The company data was stored in the Google cloud database. This project was to query and extract the data, perform any cleaning if necessary, manipulate it, analyse it and create a compelling presentation of results in Tableau, which was be delivered to the senior management meeting.

  3. GitHub

    University of California Davis Data Visualization with Tableau Specialization Certificate. There are three files: Proposal Story Design Check List (Week 1) Credit Card Customer Data and Definition (Week 2) Final Capstone Project Proposal and Storytelling (with link to my Tableau Public Data Visualization)

  4. GitHub

    Steps to do the Project. Choose any Topic. Find the Dataset for it. Seek some knowledge about how to make attractive Dashboard in Tableau from the Internet. Analyze the Dataset and then start building your project. After the Project completes make a video explaining your Project and conclusion (max 2 min).

  5. Diabetes_Prediction_Capstone Project

    Diabetes_Prediction_Capstone Project NIDDK (National Institute of Diabetes and Digestive and Kidney Diseases) research creates knowledge about and treatments for the most chronic, costly, and consequential diseases.

  6. Ketul2219/Airbnb_listing_data_Capstone_project

    This project involves exploratory data analysis (EDA) into Tableau public of Airbnb Listing Data to gain insights into booking patterns, customer demographics, and other key metrics. The analysis is performed using Tableau Public. - Ketul2219/Airbnb_listing_data_Capstone_project

  7. Cyclistic Case study

    This case study is my analysis of the Cyclistic capstone project from the Google Data Analytics Professional Course using the member data in 2021. This capstone project includes: visualizations of the results in Tableau. The target of this case study is to find a strategy to help Cyclistic, a bike share company based in Chicago, to maximize the ...

  8. GitHub

    World Covid-19 data dashboarding using Tableau. Contribute to shub8962/Tableau_capstone_project development by creating an account on GitHub.

  9. Capstone Project: "Empowering HR Decision Making with a Data ...

    For this project, I'm creating a data viz using Tableau and the dataset can be found in my Git Hub. GitHub - Popsy96/Data_Viz_Insights_for_HR: As part of my capstone project, I would like to ...

  10. Tableau Capstone Project

    The Tableau Capstone Project allows you to apply what you've learned about making data-driven decisions to real business challenges companies face every day. Effective business intelligence (BI) is crucial in decision-making and strategizing in today's data-driven business environment. This capstone project explores the utilization of Tableau ...

  11. CUSTOMER SEGMENTATION

    View My Tableau Profile View My Kaggle Profile. GitHub Profile. CUSTOMER SEGMENTATION. Introduction: ... Objective: This is my first capstone project and was part of the final assessment for PGP in Data Science course from Simplilearn-Purdue University. My job was to analyze transactional data for an online UK-based retail company and create ...

  12. Free Course: Tableau Capstone Project from SkillUp EdTech

    This capstone project explores the utilization of Tableau, a leading data visualization tool, to enhance BI capabilities within an organization. The project will showcase how Tableau can convert raw data into actionable insights, enabling informed decision-making. This project begins by identifying key business objectives and data sources ...

  13. Google Capstone Project. Updated: Mar 7, 2024

    View my full code on my Github for this capstone project here. 1. Load all of the libraries I used: tidyverse, lubridate, hms, data.table. 2. Uploaded all of the original data from the data source ...

  14. Projects · Tableau-Capstone-Project · GitHub

    GitHub is where people build software. More than 94 million people use GitHub to discover, fork, and contribute to over 330 million projects.

  15. Workbook: Tableau Capstone project

    Tableau Capstone project by Rishab Kapur. Details . 0 625. This story outlines sales for a supermarket across Europe. The objective is to showcase sales and profits across the categories and segments for the supermart. #Dashboard #Insights. Published: Apr 17, 2021 Updated: Dec 5, 2022. English (US) Deutsch; English (UK)

  16. 6 Tableau Projects to Help Develop Your Skills

    Create calculated fields to dive deeper into sales analysis. Create dual-axis graphs, highlight tables and maps. Apply filters and parameters to make visualizations more dynamic. Combine all the visualizations into a story ready to be shared. Here is an example of a page you might create whilst taking this case study.

  17. Capstone Projects 2020

    Capstone 4: Bank Term Deposit Predictive Model (Python, Jupyter) [Capstone Preview] This is a Github project page dedicated to my Assoc Data Analyst Programme journey. In this page, I will be featuring the Capstone projects that we have embarked on using Excel Dashboard, MS SQL Server, PowerBi and Python. <p> - My Code Academy experience ...

  18. Data Visualization with Tableau Project

    There are 6 modules in this course. In this project-based course, you will follow your own interests to create a portfolio worthy single-frame viz or multi-frame data story that will be shared on Tableau Public. You will use all the skills taught in this Specialization to complete this project step-by-step, with guidance from your instructors ...

  19. Google Data Analytics Capstone Project

    I worked on the Google Data Analytics Capstone Project, Track 1, Case Study 1. I will be diving into the background, my full process of cleaning, analyzing and visualizing the data, along with my final suggestions and summary of the data. Below is a table of contents in case you want to go to a specific section.

  20. Data Science Capstone

    Data Science Capstone - Healthcare by Jay Shembekar. Details . 24. 2,825. Data Science Capstone - Healthcare ... Try again, or contact your Tableau Server Administrator. ...

  21. How I created my first Data Analytics Capstone Project

    I completed this Data Analytics Capstone Project as a part of Google Data Analytics Professional Course on Coursera. Check even this blog for more about Business Intelligence v/s Business Analytics…