Udacity-ML-Capstone-Kaggle-Allstate

Machine Learning Engineer Nanodegree, Capstone Project.

Bryan Luke Lathrop, March 6, 2017

This repository contains my final project for the Udacity Machine Learning Nanodegree. It is based on the Kaggle competition Allstate Claims Severity.

The project is primarily an exercise in various machine learning techniques, with the goal of demonstrating the improvement that can be achieved by using them.

I suggest starting with a brief look at either JustStacking.ipynb or JustLinear.ipynb to get a feel for the code, and then moving on to the project write-up for a full overview of the project.

Note that the input data is not included in this repository, but it can be downloaded from the competition link. Intermediate cache files are also available, so there is no need to re-run calculations that can take as long as several days. The data zip includes the original data files as well as my generated data. Download each of data / cache / output and unzip them into the top-level directory of the project.

  • Project Proposal
  • Final Report

Project completion date: 3/14/2017


Udacity Machine Learning Engineer Capstone Project. This project is my entry in the Kaggle competition Jigsaw unintended bias in toxicity classification. Field: Natural Language Processing

gromag/MachineLearning-Engineer-Specialisation-Capstone-Project-Udacity


Project from the Kaggle competition: Jigsaw unintended bias in toxicity classification

Project Overview

Natural Language Processing is a complex field that is hypothesised to belong to the AI-complete set of problems, meaning that these computational problems are as difficult as the central artificial intelligence problem: making computers as intelligent as people. With over 90% of all data ever generated having been produced in the last 2 years, and a large proportion of it being human-generated unstructured text, there is an ever-increasing need to advance the field of Natural Language Processing.

The recent UK Government proposal to regulate social media companies over harmful content, including "substantial" fines and the ability to block services that do not stick to the rules, is an example of the regulatory need to better manage the content that users generate.

Other initiatives, such as Riot Games' work to predict and reform toxic player behaviour during games, are further examples of this effort to understand user-generated content and moderate toxic content.

However, as highlighted by the Kaggle competition Jigsaw unintended bias in toxicity classification, existing models suffer from unintended bias: they may predict a high likelihood of toxicity for content containing certain words (e.g. "gay") even when the comments are not actually toxic (such as "I am a gay woman"). As a result, machine-only classification models remain sub-standard.
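To make the failure mode concrete, here is a minimal toy sketch. It is not one of the models or metrics used in the competition or in this project: `keyword_flagger` is a hypothetical classifier that flags any comment containing a listed word, and the three sample comments and their labels are made up for illustration (only "I am a gay woman" comes from the text above). It measures how often non-toxic comments are wrongly flagged.

```python
def keyword_flagger(comment, flagged_terms=("gay",)):
    """Toy classifier (hypothetical): flag a comment if it contains a listed term."""
    words = comment.lower().split()
    return any(term in words for term in flagged_terms)


def false_positive_rate(comments, labels, predict):
    """Share of non-toxic comments (label False) that `predict` marks as toxic."""
    non_toxic = [c for c, toxic in zip(comments, labels) if not toxic]
    if not non_toxic:
        return 0.0
    return sum(predict(c) for c in non_toxic) / len(non_toxic)


# Illustrative data only: True means the comment is actually toxic.
comments = ["I am a gay woman", "Have a nice day", "You are an idiot"]
labels = [False, False, True]

print(false_positive_rate(comments, labels, keyword_flagger))  # 0.5
```

The keyword rule flags the harmless first comment purely because of the identity term, which is the kind of unintended bias described above.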

Tools that can flag toxic content without suffering from unintended bias are of paramount importance for preserving the Internet's fairness and freedom of speech.

Project Report

Download the Project-Report.pdf

Acquiring the data

Download the data from https://www.kaggle.com/c/12500/download-all , unzip it, and place the contents in the /input folder.
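The unzip step can be scripted. The following is a minimal sketch using only the Python standard library. It assumes the archive has already been downloaded manually, and that the input folder sits at the top level of the project. The function name `extract_archive` and the archive file name in the usage note are hypothetical, not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir="input"):
    """Unzip `archive_path` into `target_dir` and return the archived file names."""
    target = Path(target_dir)
    # Create the target folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())
```

For example, `extract_archive("downloaded-data.zip")` would unpack a hypothetical archive of that name into `input/` relative to the current working directory.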

Python package requirements

Python entry file.


