Hands-On Machine Learning
In the last blog, we discussed the different types of machine learning algorithms, categorizing them into three main types: supervised, unsupervised, and reinforcement learning.

Supervised learning algorithms:

In supervised learning, the machine is trained on labelled data to predict outcomes. A supervised learning algorithm uses training data that contains both input features (x1, x2, x3, …) and their corresponding output labels (y) to learn how to map new inputs to outputs.

There are two main subcategories of problems within the field of supervised learning:

  1. Classification
  2. Regression.

In this blog, we will look at one of the simplest forms of supervised learning: regression.

Introduction To Regression

Regression is a statistical tool for finding the relationship between a continuous target variable and one or more input features. Here we are trying to find a function that maps the input variables to the output variable. The input variables (x1, x2, x3, x4) are also called independent variables or features, while the output variable (y) is called the dependent variable.

For example: house price is a continuous output variable, and we can use input features such as the number of bedrooms, the number of bathrooms, and the square footage of the house to predict it.
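As a sketch of this idea, the toy example below fits a regression model to made-up housing data by solving an ordinary least-squares problem with numpy. The numbers are purely illustrative, not real housing figures.

```python
import numpy as np

# Toy housing data (illustrative values only): each row is one house,
# with columns [bedrooms, bathrooms, square footage].
X = np.array([
    [2, 1,  850],
    [3, 2, 1200],
    [3, 2, 1500],
    [4, 3, 2000],
    [4, 3, 2400],
], dtype=float)
y = np.array([150_000, 200_000, 240_000, 310_000, 360_000], dtype=float)

# Add a column of ones for the intercept, then solve the least-squares
# problem min ||Xb - y|| for the coefficient vector b.
X_design = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Predict the price of an unseen house: 3 bedrooms, 2 bathrooms, 1400 sq ft.
new_house = np.array([1, 3, 2, 1400], dtype=float)
predicted_price = new_house @ coef
print(round(predicted_price))
```

The learned coefficients describe how each feature contributes to the price, which is exactly the input-to-output mapping described above.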

Terminologies associated with Regression

Let us understand the important terminologies associated with regression from the above example.

  • Dependent variable (Y): The continuous output variable (house price) that we are trying to predict or understand.
  • Independent variables (X): The input variables (number of bedrooms, bathrooms, and square footage of the house) that we use to predict the dependent variable (house price).
  • Multicollinearity: A condition where two or more independent variables are highly correlated with each other. This causes problems in regression analysis because we cannot identify the individual effect of each independent variable on the dependent variable.
  • Training data: The subset of the dataset given to the model so it can discover and learn patterns in the data. This is how we train, or fit, the model.
  • Testing data: Unseen data used to check how accurately the model predicts outcomes. The training set is always larger than the testing set (e.g., an 80:20 or 75:25 split) so that most of the data is used for recognising and learning patterns.
  • Underfitting: A condition where the model performs poorly on both the training and the testing data.
  • Overfitting: A condition where the model does well on the training data but performs poorly when predicting on the testing (unseen) data.
  • Outliers: An outlier is an observation that differs significantly (too high or too low) from the other observations in the data. Outliers lead to incorrect predictions and must therefore be treated.
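The train/test terminology above can be made concrete with a small sketch. The example below uses synthetic data (invented for illustration), performs an 80:20 split by shuffling indices, fits a straight line on the training portion only, and measures the error on the held-out testing portion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y depends linearly on x plus some noise (illustrative).
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 5.0 + rng.normal(0, 1.0, size=50)

# 80:20 train/test split: shuffle the indices, reserve the first 80%
# for training and the remaining 20% for testing.
idx = rng.permutation(len(x))
split = int(0.8 * len(x))
x_train, y_train = x[idx[:split]], y[idx[:split]]
x_test, y_test = x[idx[split:]], y[idx[split:]]

# Fit a straight line using the training data only.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# Evaluate on the unseen testing data with mean squared error; a model
# that overfits would score well on training data but badly here.
test_pred = slope * x_test + intercept
test_mse = np.mean((test_pred - y_test) ** 2)
print(len(x_train), len(x_test))
```

Comparing the training error with this testing error is the standard way to spot overfitting (low training error, high testing error) and underfitting (both errors high).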

These terminologies are important to understand as we will use them frequently in further modules.

Importance of Regression

  • Regression is used in many fields, such as finance, economics, and engineering, to predict future trends or outcomes based on historical data.
  • Regression is used to predict continuous output variables, for example in weather forecasting, house price prediction, and product sales forecasting.
  • Through regression, we can understand the relationship between the dependent target variable and the independent feature variables.

Regression is therefore an important tool for making predictions and for understanding the relationships between variables across many fields.

Types of Regression

Regression is a fundamental technique in data science and machine learning. There are different types of regression, each suited to different scenarios. Each method has its own strengths and applications, but all of them try to understand the effect of the independent variables on the dependent variable. Some of the most common types of regression are:

  • Linear Regression.
  • Logistic Regression.
  • Polynomial Regression.
  • Decision Tree Regression.
  • Random Forest Regression.
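To illustrate why different types exist, the sketch below compares a linear fit with a polynomial fit on synthetic curved data (the data is invented for illustration). A straight line cannot follow the curve, while a degree-2 polynomial can, which is the kind of trade-off later modules will explore for each regression type.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic curved data: y is quadratic in x plus a little noise.
x = np.linspace(-3, 3, 40)
y = x**2 + rng.normal(0, 0.2, size=x.shape)

# Linear regression: fit a degree-1 polynomial (a straight line).
linear_coeffs = np.polyfit(x, y, deg=1)
linear_mse = np.mean((np.polyval(linear_coeffs, x) - y) ** 2)

# Polynomial regression: fit a degree-2 polynomial (a parabola).
poly_coeffs = np.polyfit(x, y, deg=2)
poly_mse = np.mean((np.polyval(poly_coeffs, x) - y) ** 2)

# The quadratic fit tracks the curve far better than the straight line,
# which shows up as a much lower mean squared error.
print(linear_mse > poly_mse)
```

Choosing between these models based on the structure of the data, rather than always defaulting to a straight line, is a recurring theme in the regression types listed above.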