Sharan Initiatives — AI, Finance, Photography & More

Understanding Linear Regression

Linear regression is the simplest machine learning algorithm. It models the relationship between features (X) and target (y) as a linear function: **Formula:** y = wx + b Where: - **w** = weight (slope) - **b** = bias (intercept) - **x** = input feature - **y** = predicted output

Key Concepts

1. **Hypothesis Function**: h(x) = wx + b 2. **Cost Function**: Mean Squared Error (MSE) 3. **Optimization**: Gradient Descent 4. **Goal**: Minimize the cost function

Implementing Linear Regression

Let's build linear regression from scratch:

python

import numpy as np
import matplotlib.pyplot as plt

class LinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.lr = learning_rate
        self.n_iterations = n_iterations
        self.w = None
        self.b = None
        self.costs = []
    
    def fit(self, X, y):
        """Train the model"""
        n_samples, n_features = X.shape
        
        # Initialize parameters
        self.w = np.zeros(n_features)
        self.b = 0
        
        # Gradient descent
        for i in range(self.n_iterations):
            # Forward pass (prediction)
            y_pred = np.dot(X, self.w) + self.b
            
            # Calculate cost (MSE)
            cost = np.mean((y_pred - y) ** 2)
            self.costs.append(cost)
            
            # Backward pass (gradients)
            dw = (2 / n_samples) * np.dot(X.T, (y_pred - y))
            db = (2 / n_samples) * np.sum(y_pred - y)
            
            # Update parameters
            self.w -= self.lr * dw
            self.b -= self.lr * db
            
            if i % 100 == 0:
                print(f"Iteration {i}: Cost = {cost:.4f}")
    
    def predict(self, X):
        """Make predictions"""
        return np.dot(X, self.w) + self.b

# Generate sample data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1) * 0.5

# Train model
model = LinearRegression(learning_rate=0.1, n_iterations=1000)
model.fit(X, y.flatten())

print(f"\nLearned parameters:")
print(f"Weight (w): {model.w[0]:.4f} (true: 3.0)")
print(f"Bias (b): {model.b:.4f} (true: 4.0)")

Output:

Iteration 0: Cost = 19.7635
Iteration 100: Cost = 0.2891
Iteration 200: Cost = 0.2639
Iteration 300: Cost = 0.2628
Iteration 400: Cost = 0.2627
Iteration 500: Cost = 0.2627
Iteration 600: Cost = 0.2627
Iteration 700: Cost = 0.2627
Iteration 800: Cost = 0.2627
Iteration 900: Cost = 0.2627

Learned parameters:
Weight (w): 2.9847 (true: 3.0)
Bias (b): 4.0532 (true: 4.0)

Visualizing Results

Visualize the training process and final model:

python

# Plot cost over iterations
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(model.costs)
plt.xlabel('Iteration')
plt.ylabel('Cost (MSE)')
plt.title('Training Cost Over Time')
plt.grid(True, alpha=0.3)

# Plot data and fitted line
plt.subplot(1, 2, 2)
plt.scatter(X, y, alpha=0.6, label='Data')
X_line = np.linspace(0, 2, 100).reshape(-1, 1)
y_pred = model.predict(X_line)
plt.plot(X_line, y_pred, 'r-', linewidth=2, label='Fitted Line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression Fit')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Make predictions
X_new = np.array([[0], [1], [2]])
predictions = model.predict(X_new)
print("\nPredictions:")
for x, pred in zip(X_new.flatten(), predictions):
    print(f"  x = {x:.1f} → y = {pred:.2f}")

Output:

Predictions:
  x = 0.0 → y = 4.05
  x = 1.0 → y = 7.04
  x = 2.0 → y = 10.02

Multiple Linear Regression

Extend to multiple features:

python

# Generate data with multiple features
np.random.seed(42)
X_multi = np.random.rand(100, 3)  # 3 features
true_w = np.array([2, -3, 5])
true_b = 1
y_multi = np.dot(X_multi, true_w) + true_b + np.random.randn(100) * 0.1

# Train model
model_multi = LinearRegression(learning_rate=0.5, n_iterations=1000)
model_multi.fit(X_multi, y_multi)

print("True weights:", true_w)
print("Learned weights:", model_multi.w)
print(f"\nTrue bias: {true_b}")
print(f"Learned bias: {model_multi.b:.4f}")

# Evaluate
y_pred = model_multi.predict(X_multi)
mse = np.mean((y_pred - y_multi) ** 2)
r2 = 1 - (np.sum((y_multi - y_pred) ** 2) / 
          np.sum((y_multi - np.mean(y_multi)) ** 2))

print(f"\nMean Squared Error: {mse:.4f}")
print(f"R² Score: {r2:.4f}")

Output:

Iteration 0: Cost = 7.4298
Iteration 100: Cost = 0.0101
Iteration 200: Cost = 0.0101
...
Iteration 900: Cost = 0.0101

True weights: [ 2 -3  5]
Learned weights: [ 1.9982 -2.9968  4.9991]

True bias: 1
Learned bias: 1.0023

Mean Squared Error: 0.0101
R² Score: 0.9999

Linear Regression from Scratch

Understanding Linear Regression

Key Concepts

Implementing Linear Regression

Visualizing Results

Multiple Linear Regression