
Hands-On with your First ML Model

A DSF Whitepaper
25 March 2020
Mayank Tripathi

I hope you have been following the earlier posts on Data Science and Machine Learning. By now we know that the objective of any data science project is to derive valuable knowledge for the business from data in order to make better decisions. It is the responsibility of data scientists to define the goals to be achieved for a project. When we mention data science, we usually think about machine learning, and at times the two get mixed up and confused with each other.

In short, machine learning is the field of building algorithms that can learn patterns by themselves without being programmed explicitly, which I tried to explain in my previous post.
Refer to https://datascience.foundation/datatalk/understanding-why-machine-learning.

So machine learning is a family of techniques that can be used at the modeling stage of a data science project.

Going deeper, let's understand what a model is, and then get a basic understanding of how machine learning can be done; I mean getting our hands dirty.

What Is a Model?

A machine learning model learns patterns from data and creates a mathematical function to generate predictions.

A supervised learning algorithm will try to find the relationship between a response variable and the given features.

Refer to https://datascience.foundation/datatalk/machine-learning-algorithm to understand the different types of ML algorithms.

If you come from a mathematical background, you will be familiar with the idea of a mathematical function. A model can be represented as a function ƒ() that is applied to some input variables X (composed of multiple features) and calculates an output (or prediction) ŷ. Typically the formula is:

ŷ = ƒ(X) = ƒ(x₁, x₂, …, xₙ)
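To make the notation concrete, here is a minimal sketch of such a function written by hand in Python. The weights and the threshold are made-up values chosen purely for illustration; a real machine learning model would learn its parameters from data rather than have them typed in.

```python
# Hypothetical weights and threshold, picked by hand for illustration only.
# A trained model would learn values like these from the data.
WEIGHTS = [0.5, -0.25, 1.0]
THRESHOLD = 1.0


def f(x):
    """Map the feature values x1..xn to a prediction y_hat (0 or 1)."""
    score = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1 if score > THRESHOLD else 0


# One observation with three features: x1 = 2.0, x2 = 4.0, x3 = 1.5
y_hat = f([2.0, 4.0, 1.5])
print(y_hat)
```

The point is only that a model takes features in and gives a prediction out; everything else in this article is about letting the library find a good ƒ for us.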

I will not make this article more boring with the details; if you need more, please add a comment and I will share them. Not all readers are interested in knowing the process behind the scenes.

We will jump directly into getting our hands dirty.

Here I will be using the scikit-learn (or sklearn) package. Once you have learned how to train one algorithm, it is extremely easy to train another one with very minimal code changes. With sklearn, or any other ML package, there are four main steps to train a machine learning model:

  1. Instantiate a model with specified hyperparameters (if any) → this will configure the machine learning model you want to train.
  2. Train the model with training data → during this step, the model will learn the best parameters to get predictions as close as possible to the actual values of the target.
  3. Predict the outcome from input data → using the learned parameters, the model will predict the outcome for new data.
  4. Assess the performance of the model predictions → for checking whether the model learned the right patterns to get accurate predictions.

Please remember that in a real project, or when testing any model, there might be more steps depending on the situation, but for simplicity we will stick with these four steps for now. I will try to share more posts and articles to cover the other steps. The four above are the generic ones.
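As a rough sketch of how the four steps look in code, and of how little changes when the algorithm is swapped, the snippet below runs the same steps for two sklearn classifiers. The helper name run_four_steps, the choice of DecisionTreeClassifier as the second algorithm, and random_state=0 are my own assumptions for illustration; the dataset is the Breast Cancer dataset used in the rest of this article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


def run_four_steps(model, X, y):
    """Hypothetical helper: takes an already instantiated model (step 1)."""
    model.fit(X, y)                  # step 2: train the model
    preds = model.predict(X)         # step 3: predict the outcome
    return accuracy_score(y, preds)  # step 4: assess the predictions


X, y = load_breast_cancer(return_X_y=True)

# Step 1 happens here: only the class being instantiated differs.
scores = {
    "random_forest": run_four_steps(RandomForestClassifier(random_state=0), X, y),
    "decision_tree": run_four_steps(DecisionTreeClassifier(random_state=0), X, y),
}
print(scores)
```

Because every sklearn estimator exposes the same fit() and predict() methods, the only line that changes between algorithms is the one that instantiates the model.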

First we need to import the dataset. Here I am taking the example of the Breast Cancer dataset, which is freely available with the sklearn package.

Also, I am using Google Colaboratory (it's free to use; one just needs a Google Drive account, refer to https://colab.research.google.com/notebooks/welcome.ipynb).
You can also use Jupyter Notebook (https://cocalc.com/doc/jupyter-notebook.html).

I will attach the complete code for reference.

Assumption: you have a basic understanding of Python.

We will build a machine learning classifier using RandomForest from sklearn to predict whether the breast cancer of a patient is malignant (harmful) or benign (not harmful). If you run the code in a notebook, ignore the number in brackets next to each cell; it is just the execution count of that cell.

 

In this example I am using a very basic method, so I will not go into more detail. In practice we may need to import various other packages for data cleaning, data visualization, etc.

Sklearn has many other datasets, which are listed on the scikit-learn website at
https://scikit-learn.org/stable/datasets/index.html.

Next we will load the dataset into two variables, say features and target. Sklearn provides a parameter return_X_y, which we need to set to True so that X (the features from the dataset) and y (the target from the dataset) are returned and captured in the respective variables.
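A minimal sketch of this step, using the variable names features and target mentioned above:

```python
from sklearn.datasets import load_breast_cancer

# With return_X_y=True the function returns a (data, target) tuple
# instead of a single dataset object, so we can unpack it directly.
features, target = load_breast_cancer(return_X_y=True)

print(type(features), type(target))
```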

 

Now we will see what values we have in our features variable.
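One way to look at the features, sketched here with only the first three rows printed to keep the output short (in a notebook you could also just type features in a cell):

```python
from sklearn.datasets import load_breast_cancer

features, target = load_breast_cancer(return_X_y=True)

# features is a 2-D NumPy array: one row per observation,
# one column per feature.
print(features.shape)
print(features[:3])  # first three observations only
```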

 

The output should be a two-dimensional array of numeric feature values, with one row per observation.
In the same way, we will see what values we have in our target variable.
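A short sketch for inspecting the target. The use of np.unique to count how often each class occurs is my addition; it is not required for the rest of the walkthrough.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

features, target = load_breast_cancer(return_X_y=True)

# target is a 1-D array with one class label per observation.
print(target)

# Count how many observations fall into each class.
classes, counts = np.unique(target, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
```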

 

The output of the target variable shows one of two classes for each instance in the dataset. These classes are 0 and 1, representing whether the cancer is malignant or benign respectively.

Next we will import the machine learning classifier. Generally it's recommended to have all imports at the start of the program, but you are free to import whenever it is required.

 

The seed can take any arbitrary integer value. We will see later what impact it has. Each model has a number of parameters, and each has its own significance; for details you can refer to the documentation at https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier. For now I am using random_state and will set the seed value to it.
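To give a feel for what random_state does, here is a small sketch: two forests built with the same seed on the same data should behave identically, which is what makes a run reproducible. The seed value 42 and the names model_a and model_b are assumptions for illustration, and the snippet uses fit() and predict_proba() slightly ahead of where training and prediction are explained.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

features, target = load_breast_cancer(return_X_y=True)

seed = 42  # hypothetical value; any integer works

# Two separate models, same seed, same data.
model_a = RandomForestClassifier(random_state=seed).fit(features, target)
model_b = RandomForestClassifier(random_state=seed).fit(features, target)

# Compare the predicted class probabilities of both models.
same_output = np.array_equal(
    model_a.predict_proba(features),
    model_b.predict_proba(features),
)
print(same_output)
```

Without a fixed random_state, the randomness inside the forest (which rows and features each tree sees) can differ from run to run, so results may vary slightly each time.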

 

Instantiate the Random Forest classifier with the seed value defined above. I personally prefer variable names that carry some meaning, so I instantiate the model and assign it to the variable rf_model.
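A sketch of the import, seed, and instantiation together. The seed value 42 is an assumption; the value used in the original notebook is not shown here.

```python
from sklearn.ensemble import RandomForestClassifier

seed = 42  # hypothetical value; any integer works

# Step 1: instantiate the model with its hyperparameters.
# All other hyperparameters keep their sklearn defaults.
rf_model = RandomForestClassifier(random_state=seed)

print(rf_model.get_params()["random_state"])
```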

 

Now it's time to train the model using the .fit() method.

We are all set.
We have trained our model on the data we had.
Now it's time to predict; for this we will use the same features which we already have.
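A self-contained sketch of the training step, again assuming a seed of 42:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

features, target = load_breast_cancer(return_X_y=True)

seed = 42  # hypothetical value
rf_model = RandomForestClassifier(random_state=seed)

# Step 2: train the model on the features and the known targets.
rf_model.fit(features, target)

# After fitting, the model holds the individual trees it has learned
# and the class labels it has seen.
print(len(rf_model.estimators_))
print(rf_model.classes_)
```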

 

Checking the predicted values.
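Prediction and a quick look at the result might be sketched as follows. The variable name preds and the seed 42 are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

features, target = load_breast_cancer(return_X_y=True)

seed = 42  # hypothetical value
rf_model = RandomForestClassifier(random_state=seed)
rf_model.fit(features, target)

# Step 3: predict a class for each observation.
# Here we reuse the same features the model was trained on.
preds = rf_model.predict(features)

print(preds.shape)
print(preds[:10])  # first ten predicted classes
```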

 

Once we have trained the model and predicted the values, we have to check how accurate our model is. For this we will use the accuracy_score() function, which is also provided by sklearn and needs to be imported. There are various other metrics and methods available for scoring, but the main goal is to check how accurate our model is.
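The scoring step, sketched end to end so that it runs on its own (seed 42 and the names preds and accuracy are assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

features, target = load_breast_cancer(return_X_y=True)

seed = 42  # hypothetical value
rf_model = RandomForestClassifier(random_state=seed)
rf_model.fit(features, target)
preds = rf_model.predict(features)

# Step 4: compare the predictions with the actual targets.
# accuracy_score returns the fraction of correct predictions.
accuracy = accuracy_score(target, preds)
print(accuracy)
```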

 

Here you go. Our model reports 100% accuracy because we predicted on the same data (features) on which we trained it.

The score lies between 0 and 1, where 0 represents 0% (very bad) and 1 represents 100% (very accurate), which is ideal.
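Since the score above is measured on the training data itself, it can be instructive to also score the model on data it has not seen. The sketch below goes one step beyond the basic method of this article: it assumes sklearn's train_test_split with a 25% hold-out, and the seed 42 and all variable names are illustrative choices rather than part of the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

features, target = load_breast_cancer(return_X_y=True)

seed = 42  # hypothetical value

# Hold back 25% of the observations; the model never sees them in training.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=seed, stratify=target
)

rf_model = RandomForestClassifier(random_state=seed)
rf_model.fit(X_train, y_train)

# Score on the training part and on the held-out part separately.
train_accuracy = accuracy_score(y_train, rf_model.predict(X_train))
test_accuracy = accuracy_score(y_test, rf_model.predict(X_test))
print(train_accuracy, test_accuracy)
```

The held-out score is usually the more honest indication of how the model would do on new patients.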

 

Excellent! Congrats, you have just trained a Random Forest model using sklearn and achieved an accuracy score of 1 in classifying breast cancer observations.

In this simple way one can train a model, and with more and more effort it can be made more robust.

I hope you got the basic idea of what to do in Machine Learning / Data Science and how to do it.

See you in the next article.

The code can be referenced at https://colab.research.google.com/drive/10Lq5YSmRglGQ-yNMBKMW3kGD9CcX3PB6.


Comments:

Abhishek Mishra

19 Apr 2020 06:27:40 PM

A machine learning model learns patterns from data and creates a mathematical function to generate predictions. - I agree

Balakrishnan Subramanian

20 Apr 2020 06:48:43 AM

To get a hands-on experience you'll need a problem statement, data to process and enough time to try different approaches without deadlines pressing on you. This is “work” you can do while volunteering or participating in competitions.

Abhishek Mishra

25 Apr 2020 10:13:30 AM

you'll need a problem statement, data to process and enough time to try different approaches without deadlines pressing on you. - Great point

    © 2022 Data science Foundation. All rights reserved. Data S.F. Limited 09624670
