Fundamentals of machine learning (2024)

Machine learning is in many ways the intersection of two disciplines: data science and software engineering. The goal of machine learning is to use data to create a predictive model that can be incorporated into a software application or service. Achieving this goal requires collaboration between data scientists, who explore and prepare the data before using it to train a machine learning model, and software developers, who integrate the models into applications where they're used to predict new data values (a process known as inferencing).

Machine learning has its origins in statistics and mathematical modeling of data. The fundamental idea of machine learning is to use data from past observations to predict unknown outcomes or values. For example:

  • The proprietor of an ice cream store might use an app that combines historical sales and weather records to predict how many ice creams they're likely to sell on a given day, based on the weather forecast.
  • A doctor might use clinical data from past patients to run automated tests that predict whether a new patient is at risk of diabetes based on factors like weight, blood glucose level, and other measurements.
  • A researcher in the Antarctic might use past observations to automate the identification of different penguin species (such as Adelie, Gentoo, or Chinstrap) based on measurements of a bird's flippers, bill, and other physical attributes.

Machine learning as a function

Because machine learning is based on mathematics and statistics, it's common to think about machine learning models in mathematical terms. Fundamentally, a machine learning model is a software application that encapsulates a function to calculate an output value based on one or more input values. The process of defining that function is known as training. After the function has been defined, you can use it to predict new values in a process called inferencing.

Let's explore the steps involved in training and inferencing.

[Diagram: the training and inferencing phases of machine learning]

  1. The training data consists of past observations. In most cases, the observations include the observed attributes or features of the thing being observed, and the known value of the thing you want to train a model to predict (known as the label).

    In mathematical terms, you'll often see the features referred to using the shorthand variable name x, and the label referred to as y. Usually, an observation consists of multiple feature values, so x is actually a vector (an array with multiple values), like this: [x1, x2, x3, ...].

    To make this clearer, let's consider the examples described previously:

    • In the ice cream sales scenario, our goal is to train a model that can predict the number of ice cream sales based on the weather. The weather measurements for the day (temperature, rainfall, windspeed, and so on) would be the features (x), and the number of ice creams sold on each day would be the label (y).
    • In the medical scenario, the goal is to predict whether or not a patient is at risk of diabetes based on their clinical measurements. The patient's measurements (weight, blood glucose level, and so on) are the features (x), and the likelihood of diabetes (for example, 1 for at risk, 0 for not at risk) is the label (y).
    • In the Antarctic research scenario, we want to predict the species of a penguin based on its physical attributes. The key measurements of the penguin (length of its flippers, width of its bill, and so on) are the features (x), and the species (for example, 0 for Adelie, 1 for Gentoo, or 2 for Chinstrap) is the label (y).
  2. An algorithm is applied to the data to try to determine a relationship between the features and the label, and generalize that relationship as a calculation that can be performed on x to calculate y. The specific algorithm used depends on the kind of predictive problem you're trying to solve (more about this later), but the basic principle is to try to fit a function to the data, in which the values of the features can be used to calculate the label.

  3. The result of the algorithm is a model that encapsulates the calculation derived by the algorithm as a function - let's call it f. In mathematical notation:

    y = f(x)

  4. Now that the training phase is complete, the trained model can be used for inferencing. The model is essentially a software program that encapsulates the function produced by the training process. You can input a set of feature values, and receive as an output a prediction of the corresponding label. Because the output from the model is a prediction that was calculated by the function, and not an observed value, you'll often see the output from the function shown as ŷ (which is rather delightfully verbalized as "y-hat").
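
The four steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the temperature and sales figures are invented, a single feature is used instead of a full feature vector, and ordinary least squares stands in for whichever algorithm a real project would choose. The names `train` and `f` are hypothetical.

```python
# Step 1: training data (invented values for illustration).
# Each observation pairs a feature x (temperature) with a label y (ice creams sold).
temperatures = [50.0, 60.0, 70.0, 80.0, 90.0]  # x
sales = [10.0, 20.0, 30.0, 40.0, 50.0]         # y


def train(xs, ys):
    """Steps 2 and 3: fit a straight line to the data and return it as a function f."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares for a single feature.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x

    def f(x):
        # The trained model: y = f(x)
        return slope * x + intercept

    return f


# Step 4: inferencing. The output is a prediction (y-hat), not an observed value.
f = train(temperatures, sales)
y_hat = f(75.0)
print(f"Predicted sales at 75 degrees: {y_hat:.1f}")
```

Returning `f` as a closure mirrors the idea that a trained model is simply a program that encapsulates the fitted function.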

There are multiple types of machine learning, and you must apply the appropriate type depending on what you're trying to predict. A breakdown of common types of machine learning is shown in the following diagram.

[Diagram: common types of machine learning]

Supervised machine learning

Supervised machine learning is a general term for machine learning algorithms in which the training data includes both feature values and known label values. Supervised machine learning is used to train models by determining a relationship between the features and labels in past observations, so that unknown labels can be predicted for features in future cases.


Regression is a form of supervised machine learning in which the label predicted by the model is a numeric value. For example:

  • The number of ice creams sold on a given day, based on the temperature, rainfall, and windspeed.
  • The selling price of a property based on its size in square feet, the number of bedrooms it contains, and socio-economic metrics for its location.
  • The fuel efficiency (in miles-per-gallon) of a car based on its engine size, weight, width, height, and length.


Classification is a form of supervised machine learning in which the label represents a categorization, or class. There are two common classification scenarios.

Binary classification

In binary classification, the label determines whether the observed item is (or isn't) an instance of a specific class. Put another way, binary classification models predict one of two mutually exclusive outcomes. For example:

  • Whether a patient is at risk for diabetes based on clinical metrics like weight, age, blood glucose level, and so on.
  • Whether a bank customer will default on a loan based on income, credit history, age, and other factors.
  • Whether a mailing list customer will respond positively to a marketing offer based on demographic attributes and past purchases.

In all of these examples, the model makes a binary true/false or positive/negative prediction for a single possible class.
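
As a rough sketch of binary classification, the following Python code learns a single blood glucose threshold that separates at-risk (1) from not-at-risk (0) patients. The data is invented, and a one-feature threshold is an assumption chosen for brevity; practical classifiers usually combine many features and often output a probability rather than a hard 0 or 1. The function names are hypothetical.

```python
# Invented training data: blood glucose level (feature x) and diabetes risk (label y).
glucose = [67, 103, 114, 72, 116, 65]
at_risk = [0, 1, 1, 0, 1, 0]


def train_threshold(xs, ys):
    """Pick the threshold that classifies the most training observations correctly."""
    best_threshold, best_correct = None, -1
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between neighbouring feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    for candidate in candidates:
        correct = sum((x > candidate) == bool(y) for x, y in zip(xs, ys))
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


threshold = train_threshold(glucose, at_risk)


def predict(x):
    """Return 1 (at risk) if the glucose level is above the learned threshold, else 0."""
    return int(x > threshold)


print(threshold, predict(90), predict(80))
```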

Multiclass classification

Multiclass classification extends binary classification to predict a label that represents one of multiple possible classes. For example:

  • The species of a penguin (Adelie, Gentoo, or Chinstrap) based on its physical measurements.
  • The genre of a movie (comedy, horror, romance, adventure, or science fiction) based on its cast, director, and budget.

In most scenarios that involve a known set of multiple classes, multiclass classification is used to predict mutually exclusive labels. For example, a penguin can't be both a Gentoo and an Adelie. However, there are also some algorithms that you can use to train multilabel classification models, in which there may be more than one valid label for a single observation. For example, a movie could potentially be categorized as both science fiction and comedy.
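
To make mutually exclusive multiclass prediction concrete, here is a small sketch that uses a nearest-centroid classifier, one simple technique among many and not one the text above prescribes. The penguin measurements are invented for illustration, and `train_centroids` and `predict` are hypothetical names.

```python
import math

# Invented training data: (flipper length in mm, bill length in mm) per penguin.
# Labels use the encoding 0 = Adelie, 1 = Gentoo, 2 = Chinstrap.
features = [
    (181.0, 39.0), (186.0, 40.0),  # Adelie
    (217.0, 47.0), (221.0, 49.0),  # Gentoo
    (195.0, 50.0), (199.0, 52.0),  # Chinstrap
]
labels = [0, 0, 1, 1, 2, 2]


def train_centroids(xs, ys):
    """Compute the mean feature vector (centroid) of each class."""
    grouped = {}
    for x, y in zip(xs, ys):
        grouped.setdefault(y, []).append(x)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, x):
    """Return the single class whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


centroids = train_centroids(features, labels)
species = ["Adelie", "Gentoo", "Chinstrap"]
print(species[predict(centroids, (219.0, 48.0))])
```

Because `predict` returns exactly one label, the classes are mutually exclusive by construction; a multilabel model would instead return a set of labels for each observation.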

Unsupervised machine learning

Unsupervised machine learning involves training models using data that consists only of feature values without any known labels. Unsupervised machine learning algorithms determine relationships between the features of the observations in the training data.


The most common form of unsupervised machine learning is clustering. A clustering algorithm identifies similarities between observations based on their features, and groups them into discrete clusters. For example:

  • Group similar flowers based on their size, number of leaves, and number of petals.
  • Identify groups of similar customers based on demographic attributes and purchasing behavior.

In some ways, clustering is similar to multiclass classification, in that it categorizes observations into discrete groups. The difference is that when using classification, you already know the classes to which the observations in the training data belong, so the algorithm works by determining the relationship between the features and the known classification label. In clustering, there's no previously known cluster label, and the algorithm groups the observations based purely on the similarity of their features.

In some cases, clustering is used to determine the set of classes that exist before training a classification model. For example, you might use clustering to segment your customers into groups, and then analyze those groups to identify and categorize different classes of customer (high value - low volume, frequent small purchaser, and so on). You could then use your categorizations to label the observations in your clustering results and use the labeled data to train a classification model that predicts to which customer category a new customer might belong.
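
The following sketch shows clustering on unlabeled data using k-means, a common clustering algorithm chosen here as an example; the text above does not name a specific algorithm. The customer values are invented, the number of clusters `k` is assumed to be known in advance, and seeding the centroids with the first `k` observations is a simplification that keeps the run deterministic (practical implementations typically use randomized seeding).

```python
import math

# Invented customer data: (purchases per month, average spend), with no labels.
customers = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
]


def k_means(points, k, max_iterations=100):
    """Group points into k clusters; returns (assignments, centroids)."""
    # Assumption: seed centroids with the first k points to keep the run deterministic.
    centroids = [points[i] for i in range(k)]
    assignments = None
    for _ in range(max_iterations):
        # Assign each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        if new_assignments == assignments:
            break  # Converged: no point changed cluster.
        assignments = new_assignments
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


assignments, centroids = k_means(customers, k=2)
print(assignments)
```

The resulting cluster indices carry no meaning on their own. As described above, an analyst would inspect each group and assign it a category name before using those names as labels to train a classification model.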


Author: Nathanial Hackett
