What is Linear Regression?
Linear Regression is a Supervised Machine Learning algorithm used to predict a continuous numeric value based on one or more input features.
It tries to find the best-fit straight line between input variables and output values.
Simple Definition
Linear Regression predicts a value by finding the relationship between input and output variables using a straight line.
Examples:
-
Predicting house prices
-
Predicting salary based on experience
-
Predicting sales revenue
-
Predicting temperature
Real-World Example
Suppose we want to predict a person's salary based on their years of experience.
| Experience (Years) | Salary ($) |
|---|---|
| 1 | 30,000 |
| 2 | 35,000 |
| 3 | 40,000 |
| 4 | 45,000 |
| 5 | 50,000 |
We can observe:
-
More experience → higher salary
-
Relationship looks almost like a straight line
Linear Regression finds the best line representing this pattern.
Linear Regression Equation
The equation is:
y = mx + b
Where:
-
y = predicted output
-
x = input feature
-
m = slope of the line
-
b = intercept
Applying It to the Example
Suppose the model learns:
Salary = 5000 \times Experience + 25000
If experience = 6 years:
Prediction:
Salary = 5000(6) + 25000 = 55000
Predicted salary = $55,000
How Linear Regression Works
Steps:
-
Collect data
-
Plot data points
-
Find the best-fit line
-
Minimize prediction error
-
Use the line to predict future values
The algorithm tries to minimize the difference between:
-
Actual value
-
Predicted value
This error is often measured using:
-
Mean Squared Error (MSE)
Types of Linear Regression
1. Simple Linear Regression
Uses:
-
One input variable
-
One output variable
Example:
-
Experience → Salary
2. Multiple Linear Regression
Uses:
-
Multiple input variables
Example:
-
House Size
-
Number of Rooms
-
Location
to predict:
-
House Price
Equation:
y = b_0 + b_1x_1 + b_2x_2 + ... + b_nx_n
Important Interview Terms
1. Dependent Variable
The output being predicted.
Example:
-
Salary
2. Independent Variable
Input feature used for prediction.
Example:
-
Experience
3. Best-Fit Line
The line that minimizes total prediction error.
4. Residual/Error
Difference between actual and predicted value.
Formula:
Residual = Actual - Predicted
Assumptions of Linear Regression
Interviewers often ask this.
Linear Regression assumes:
-
Linear relationship between variables
-
No high multicollinearity
-
Errors are normally distributed
-
Constant variance of errors
-
Independent observations
Advantages
-
Simple and easy to understand
-
Fast to train
-
Works well for linear data
-
Highly interpretable
Limitations
-
Works poorly with non-linear data
-
Sensitive to outliers
-
Assumes linear relationship
-
Can underperform on complex datasets
Common Interview Questions
Q1: Is Linear Regression used for classification?
No.
Linear Regression is used for:
-
Continuous numeric prediction
For classification, we usually use:
-
Logistic Regression
Q2: How do you evaluate Linear Regression?
Common metrics:
-
R² Score
-
MAE
-
MSE
-
RMSE
Q3: What is overfitting?
When the model learns noise instead of actual patterns and performs poorly on new data.
Short Interview Answer
Linear Regression is a supervised learning algorithm used to predict continuous numeric values by finding the best-fit linear relationship between input and output variables. It works using the equation y = mx + b and is commonly used for predictions like salary, sales, and house prices.
Easy Way to Remember
Think:
“Linear Regression draws the best straight line to predict numbers.”
