
Machine Learning: Exam Notes

1. Introduction

Machine Learning (ML) enables computers to learn patterns from data without explicit programming. The term was coined by Arthur Samuel in 1959: "Field of study that gives computers the capability to learn without being explicitly programmed."

ML vs. Traditional Programming:

• Traditional: Input + Program Logic → Output
• ML: Input + Output → Model (logic inferred during training), then Prediction on new inputs

2. Key Terminology

• Model (Hypothesis): Representation learned from data via an ML algorithm.
• Feature: Measurable property of data (e.g., color, size), represented as feature vectors.
• Target (Label): The value to predict (e.g., fruit type).
• Training: Learning model parameters from labeled data.
• Prediction: Applying the trained model to new inputs.
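A minimal sketch of how these terms map onto code, assuming scikit-learn as the library; the fruit measurements, encodings, and labels below are made-up illustrative values:

    from sklearn.tree import DecisionTreeClassifier

    # Feature vectors: each row is one sample, each column a measurable property
    # (here: weight in grams, color encoded as 0 = green, 1 = yellow) -- illustrative values.
    X_train = [[150, 0], [170, 0], [120, 1], [130, 1]]
    # Target (label): the value to predict for each sample.
    y_train = ["apple", "apple", "banana", "banana"]

    # Training: the algorithm learns model parameters from the labeled data.
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # Prediction: apply the trained model (hypothesis) to a new, unseen input.
    print(model.predict([[160, 0]]))  # expected output: ['apple']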

3. Types of Learning

1. Supervised Learning: Learns from labeled data. Tasks: Classification, Regression.
2. Unsupervised Learning: Discovers patterns in unlabeled data. Tasks: Clustering, Association.
3. Semi‑Supervised Learning: Uses a small labeled dataset together with a large unlabeled dataset.

4. Supervised Learning Algorithms

4.1 Classification

• Logistic Regression: Uses the sigmoid function to map features to probabilities. Loss: cross‑entropy.
• k‑Nearest Neighbors (kNN): Classifies based on the majority vote of the k nearest samples.
• Support Vector Machine (SVM): Finds the hyperplane maximizing the margin; kernel trick for non-linear separation.
• Naive Bayes: Probabilistic model with a feature independence assumption (variants: Gaussian, Multinomial, Bernoulli).
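A minimal sketch comparing these four classifiers on a toy dataset, assuming scikit-learn; the Iris dataset, split, and hyperparameters are illustrative choices rather than part of the notes:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Scale features (kNN and SVM are distance/margin based, so scaling matters).
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    classifiers = {
        "Logistic Regression": LogisticRegression(max_iter=1000),  # sigmoid + cross-entropy loss
        "kNN (k=5)": KNeighborsClassifier(n_neighbors=5),          # majority vote of 5 nearest samples
        "SVM (RBF kernel)": SVC(kernel="rbf"),                     # max-margin hyperplane + kernel trick
        "Gaussian Naive Bayes": GaussianNB(),                      # independence assumption, Gaussian likelihoods
    }

    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        print(f"{name}: test accuracy = {clf.score(X_test, y_test):.3f}")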

4.2 Regression

• Linear Regression: Models the relationship by minimizing squared error.
• Multiple Linear Regression: Extends to multiple features.
• Polynomial Regression: Models nonlinear relationships via feature expansion.
• Decision Tree Regression: Recursive partitioning; prone to overfitting.
• Random Forest Regression: Ensemble of trees via bagging; reduces variance.
• XGBoost: Gradient-boosted trees optimized for speed and regularization.
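A minimal sketch comparing several of these regressors on synthetic non-linear data, assuming scikit-learn; XGBoost is omitted because it lives in the separate xgboost package, and the data and hyperparameters are illustrative:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    # Synthetic non-linear data: y = 0.5 * x^2 + noise (illustrative only).
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(300, 1))
    y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.3, size=300)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    models = {
        "Linear Regression": LinearRegression(),                                  # minimizes squared error
        "Polynomial Regression (deg 2)": make_pipeline(PolynomialFeatures(2), LinearRegression()),
        "Decision Tree": DecisionTreeRegressor(random_state=0),                   # recursive partitioning
        "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0), # bagged trees
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        mse = mean_squared_error(y_test, model.predict(X_test))
        print(f"{name}: test MSE = {mse:.3f}")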

5. Unsupervised Learning Algorithms

5.1 Clustering

• k-Means: Partitions data into k clusters by minimizing the within-cluster sum of squares.
• Hierarchical Clustering: Agglomerative or divisive merging/splitting; various linkage criteria.
• DBSCAN: Density-based clustering; parameters ε (radius) and MinPts; identifies core, border, and noise points.
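A minimal sketch of k-Means and DBSCAN on synthetic blob data, assuming scikit-learn; the number of clusters, ε, and MinPts values are illustrative:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans, DBSCAN

    # Synthetic data with 3 well-separated blobs (illustrative only).
    X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

    # k-Means: minimizes the within-cluster sum of squares (inertia) for k = 3.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
    print("k-Means within-cluster sum of squares:", round(kmeans.inertia_, 2))

    # DBSCAN: eps is the neighborhood radius, min_samples plays the role of MinPts.
    db = DBSCAN(eps=0.5, min_samples=5).fit(X)
    labels = db.labels_
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # label -1 marks noise points
    print("DBSCAN clusters found:", n_clusters, "| noise points:", int(np.sum(labels == -1)))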

6. Model Evaluation & Optimization

• Cross‑Validation (k‑Fold): Splits data into k folds; trains on k-1 folds, validates on the held-out fold, and averages performance.
• Hyperparameter Tuning: Grid Search over a parameter grid using CV to select the best combination.
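A minimal sketch of both ideas, assuming scikit-learn; the model, dataset, and parameter grid are illustrative:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score, GridSearchCV

    X, y = load_iris(return_X_y=True)

    # k-fold cross-validation: 5 folds, train on 4, validate on the held-out fold, average.
    scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
    print("5-fold CV accuracy: mean =", scores.mean().round(3), "| per fold =", scores.round(3))

    # Grid search: every combination in the grid is evaluated with cross-validation.
    param_grid = {"n_estimators": [50, 100], "max_depth": [2, 4, None]}
    search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
    search.fit(X, y)
    print("Best params:", search.best_params_, "| best CV score:", round(search.best_score_, 3))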

7. Sample Exam Questions & Answers

7.1 5‑Mark Questions

Q1. Define Machine Learning and differentiate it from traditional programming.

A. ML enables computers to learn patterns from data without explicitly programmed logic, unlike traditional programming, where the logic must be manually coded.

Q2. List and briefly describe three types of learning in ML.

A. 1. Supervised Learning: trains on labeled data (classification/regression).
2. Unsupervised Learning: discovers structure in unlabeled data (clustering/association).
3. Semi‑Supervised Learning: uses both labeled and unlabeled data.

Q3. What is overfitting, and how does Random Forest mitigate it?

A. Overfitting occurs when a model captures noise, reducing generalization. Random Forest mitigates overfitting by averaging multiple decision trees trained on bootstrapped samples, reducing variance.
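A small sketch of this effect on synthetic noisy data, assuming scikit-learn; a fully grown decision tree typically fits the training set almost perfectly but generalizes worse than a bagged forest of such trees:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    # Noisy synthetic target (illustrative only).
    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(400, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.4, size=400)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    for name, model in [("Single deep tree", DecisionTreeRegressor(random_state=1)),
                        ("Random forest (200 trees)", RandomForestRegressor(n_estimators=200, random_state=1))]:
        model.fit(X_train, y_train)
        print(f"{name}: train R2 = {r2_score(y_train, model.predict(X_train)):.2f}, "
              f"test R2 = {r2_score(y_test, model.predict(X_test)):.2f}")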

7.2 10‑Mark Questions

Q1. Explain the working of the k‑Nearest Neighbors algorithm. What are its advantages and disadvantages?

A. The kNN algorithm classifies a sample based on the majority class among its k nearest neighbors (using a distance metric). Advantages: simple, no training time. Disadvantages: high computational cost on large datasets, sensitivity to noise and irrelevant features, and the need for feature scaling.
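A from-scratch sketch of the voting logic described above; the toy points, labels, and k value are made up for illustration, and Euclidean distance is assumed as the metric:

    from collections import Counter
    import math

    def knn_predict(X_train, y_train, x_new, k=3):
        """Classify x_new by majority vote among its k nearest training samples."""
        # Euclidean distance from x_new to every training sample.
        distances = [math.dist(x, x_new) for x in X_train]
        # Indices of the k closest samples.
        nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
        # Majority vote over their labels.
        votes = Counter(y_train[i] for i in nearest)
        return votes.most_common(1)[0][0]

    # Tiny illustrative dataset: two features per sample, two classes.
    X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
    y_train = ["A", "A", "A", "B", "B", "B"]

    print(knn_predict(X_train, y_train, [1.1, 0.9], k=3))  # expected: 'A'
    print(knn_predict(X_train, y_train, [5.0, 5.0], k=3))  # expected: 'B'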

Q2. Describe the concept of the kernel trick in SVMs. Give examples of common kernels.

A. The kernel trick implicitly maps data to a higher-dimensional space, where it can become linearly separable, by computing the dot products in that space via kernel functions instead of computing the mapping explicitly. Common kernels: Linear, Polynomial, Radial Basis Function (RBF), Sigmoid.
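A minimal sketch trying these kernels on a toy problem that is not linearly separable in the original feature space (two concentric circles), assuming scikit-learn; the dataset and parameters are illustrative:

    from sklearn.datasets import make_circles
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Concentric circles: no straight line separates the classes in the original space.
    X, y = make_circles(n_samples=400, factor=0.3, noise=0.1, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # The RBF kernel is expected to separate the circles well; the linear kernel should not.
    for kernel in ["linear", "poly", "rbf", "sigmoid"]:
        clf = SVC(kernel=kernel).fit(X_train, y_train)
        print(f"{kernel:>7} kernel: test accuracy = {clf.score(X_test, y_test):.2f}")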

7.3 20‑Mark Question

Q1. Discuss the steps involved in building, evaluating, and tuning a supervised ML model for a regression problem. Illustrate with a pipeline from data preprocessing to hyperparameter optimization.

In-depth Answer: Regression Model Pipeline

1. Problem Definition & Data Collection

Begin by defining the regression objective (e.g., predicting CO₂ emissions) and gathering a dataset with features and a target.

2. Data Preprocessing

• Handle missing values: impute or remove.
• Encode categorical features: one-hot encoding.
• Scale features: standardize or normalize.
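A sketch of these three steps combined into a single preprocessing transformer, assuming scikit-learn; the column names (engine_size, cylinders, fuel_type) are hypothetical stand-ins for a CO₂-emissions dataset:

    from sklearn.compose import ColumnTransformer
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Hypothetical column names for a CO2-emissions dataset (illustrative only).
    numeric_cols = ["engine_size", "cylinders"]
    categorical_cols = ["fuel_type"]

    preprocessor = ColumnTransformer([
        # Numeric columns: impute missing values with the median, then standardize.
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric_cols),
        # Categorical columns: impute with the most frequent value, then one-hot encode.
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
    ])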

3. Feature Engineering & Selection

• Polynomial Features: expand for non-linear relationships.
• Feature Importance: use tree-based models to drop uninformative features.

4. Train/Test Split

• Hold out a test set (20–30%) for final evaluation.
• Use the remaining data for training and validation.

5. Model Training

Train candidate models: Linear Regression, Decision Tree, Random Forest, XGBoost.

6. Model Evaluation

Use MSE, MAE, R² on validation data.

7. Cross-Validation

Apply k-fold CV: split the data into k folds, train on k-1 folds and validate on the held-out fold, repeat, and average the results.

8. Hyperparameter Tuning

Use GridSearchCV to search parameter combinations (e.g., n_estimators, max_depth) via CV and find the best model.

9. Final Testing & Deployment

Evaluate the tuned model on the held-out test set, then deploy and monitor for drift.
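A compact end-to-end sketch of steps 4–9 on synthetic data, assuming scikit-learn; the dataset, parameter grid, and split size are illustrative, and a real CO₂-emissions dataset would first be preprocessed as sketched under step 2:

    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

    # Steps 1-3 stand-in: synthetic features and target (illustrative only).
    X, y = make_regression(n_samples=500, n_features=8, noise=15.0, random_state=0)

    # Step 4: hold out a test set for the final evaluation.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    # Steps 5-8: pipeline (scaling + model) tuned with grid search and 5-fold CV.
    pipe = Pipeline([("scale", StandardScaler()),
                     ("model", RandomForestRegressor(random_state=0))])
    param_grid = {"model__n_estimators": [100, 300], "model__max_depth": [None, 5, 10]}
    search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
    search.fit(X_train, y_train)
    print("Best hyperparameters:", search.best_params_)

    # Step 9: evaluate the tuned model once on the held-out test set.
    y_pred = search.best_estimator_.predict(X_test)
    print("Test MSE:", round(mean_squared_error(y_test, y_pred), 2))
    print("Test MAE:", round(mean_absolute_error(y_test, y_pred), 2))
    print("Test R2 :", round(r2_score(y_test, y_pred), 3))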
