Python and NumPy for Deep Learning: Zero to Productive
This chapter is your workbench setup. Every neural network you build, from a single-neuron perceptron to a 100-layer ResNet, will be written on top of NumPy arrays, vectorised operations, and matplotlib visualisations. Master these tools now and every subsequent chapter becomes dramatically easier.
Prerequisites
- Basic Python: variables, lists, for-loops, functions, dictionaries
- Class 11–12 Mathematics (CBSE/ISC): basic algebra, simple plotting
- Chapter 2 of this textbook (mathematical notation familiarity is helpful but not required)
Learning Objectives
By the end of this chapter, you will be able to:
- Create, index, slice, and reshape NumPy arrays with confidence
- Explain and apply NumPy broadcasting rules to eliminate explicit loops
- Demonstrate that vectorised code runs 50–200× faster than Python for-loops
- Use np.dot, np.exp, np.log, np.sum, np.maximum, and np.random, the six function families that power every DL model
- Plot loss curves, histograms, and decision boundaries with matplotlib
- Load, inspect, and preprocess CSV datasets using Pandas
- Set up Google Colab with GPU runtime, upload files, and install libraries
The Hook
🛠️ Know Your Tools
Before we build a neural network, we need our tools. Just as a carpenter knows their chisel, a deep learning practitioner must know NumPy cold.
Consider this: a single forward pass through a neural network on 10,000 MNIST images requires roughly 80 million multiply-add operations. Written as a Python for-loop, that takes ~45 seconds. Written as a single NumPy matrix multiplication, it takes ~3 milliseconds. That's a 15,000× speedup.
This chapter gives you the fluency to write deep learning code that's both correct and fast. Every minute invested here pays compound interest across every remaining chapter.
India Connect
Data scientists at Flipkart, Zomato, and Jio use NumPy and Pandas daily, from recommendation engines serving 500 million users to demand forecasting for ₹50,000 crore supply chains. Indian tech interviews at these companies routinely test NumPy fluency. This chapter is your preparation.
3.1 - NumPy Arrays: Creation, Indexing, Slicing, Reshaping
A NumPy array (formally numpy.ndarray) is the fundamental data structure of scientific Python. Unlike Python lists, NumPy arrays are homogeneous (all elements same type), stored in contiguous memory, and support element-wise operations without loops.
Why Not Python Lists?
A Python list of 1 million floats stores 1 million pointers to 1 million separate objects scattered across memory. A NumPy array stores 1 million floats in a single, contiguous block of 8 MB. The result: NumPy is typically 50–200× faster for numerical computation, thanks to cache locality, SIMD instructions, and C-level loops.
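The memory claim is easy to check for yourself. The sketch below is illustrative only: the variable names are ours, and the list-size estimate relies on `sys.getsizeof`, whose numbers are specific to CPython.

```python
import sys
import numpy as np

n = 1_000_000
arr = np.arange(n, dtype=np.float64)      # one contiguous buffer of 8-byte floats
lst = [float(i) for i in range(n)]        # n separate Python float objects

array_bytes = arr.nbytes
# CPython-specific estimate: the list's pointer table plus every float object
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print(f"NumPy array: {array_bytes / 1e6:.1f} MB")
print(f"Python list: {list_bytes / 1e6:.1f} MB")
```

The array needs exactly 8 bytes per element. The list needs a pointer per element plus a full Python object per element, so it is several times larger.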
Creating Arrays
# 1. From Python lists
import numpy as np
a = np.array([1, 2, 3, 4])          # 1D, shape (4,)
print(a.shape, a.dtype)             # (4,) int64
b = np.array([[1, 2, 3],
              [4, 5, 6]])           # 2D, shape (2, 3)
print(b.shape)                      # (2, 3)
# 2. Built-in constructors
zeros = np.zeros((3, 4))            # 3×4 matrix of 0.0
ones = np.ones((2, 5))              # 2×5 matrix of 1.0
eye = np.eye(3)                     # 3×3 identity matrix
rng = np.arange(0, 10, 2)           # [0, 2, 4, 6, 8]
lin = np.linspace(0, 1, 5)          # [0.0, 0.25, 0.5, 0.75, 1.0]
rand = np.random.randn(3, 4)        # 3×4 standard normal
# 3. Specifying dtype (critical for DL)
w = np.zeros((784, 128), dtype=np.float32)  # 32-bit saves GPU memory
print(w.dtype, w.nbytes)            # float32 401408 (≈ 400 KB)
Python
Pro Tip: Always Check Shape
The single most useful debugging habit in deep learning: print(x.shape) after every operation. Shape mismatches are among the most common causes of NumPy bugs. Make it muscle memory.
Indexing and Slicing
X = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])
# Basic indexing (row, col), 0-indexed
print(X[0, 2])        # 30, row 0, col 2
print(X[2, 1])        # 80, row 2, col 1
# Slicing: [start:stop:step] (stop is exclusive)
print(X[0, :])        # [10 20 30], entire first row
print(X[:, 1])        # [20 50 80], entire second column
print(X[0:2, 1:3])    # [[20 30]   top-right 2×2 submatrix
                      #  [50 60]]
# Boolean indexing: incredibly useful for filtering
mask = X > 40
print(mask)
# [[False False False]
#  [False  True  True]
#  [ True  True  True]]
print(X[mask])        # [50 60 70 80 90]
# Fancy indexing: select specific rows
print(X[[0, 2]])      # [[10 20 30]   rows 0 and 2
                      #  [70 80 90]]
Python
Reshaping
Reshaping is one of the most common operations you'll perform in deep learning. Every layer expects inputs in a specific shape. For a contiguous array, reshaping does not copy data; it returns a new view of the same memory. NumPy falls back to a copy only when the requested shape cannot be expressed as a view (for example, after some transposes).
a = np.arange(12)          # [0, 1, 2, ..., 11], shape: (12,)
# Reshape to 2D
b = a.reshape(3, 4)        # shape: (3, 4)
c = a.reshape(4, 3)        # shape: (4, 3)
d = a.reshape(2, 2, 3)     # shape: (2, 2, 3), a 3D array
# Use -1 to auto-infer one dimension
e = a.reshape(3, -1)       # shape: (3, 4), NumPy infers 4
f = a.reshape(-1, 1)       # shape: (12, 1), a column vector
# CRITICAL for DL: flatten a batch of images
images = np.random.randn(64, 28, 28)   # 64 images, 28×28 pixels
flat = images.reshape(64, -1)          # shape: (64, 784)
print(flat.shape)                      # (64, 784), ready for a dense layer
# Transpose
W = np.random.randn(784, 128)
print(W.T.shape)                       # (128, 784)
Python
Reshape vs Resize
np.resize() will silently repeat your data to fill a larger shape, which is almost never what you want. Always use .reshape(). If the total number of elements doesn't match, .reshape() raises a clear error, which is the correct behaviour.
Memory Views
Here a.reshape(3, 4) does not copy data, because a is contiguous. It creates a new view, a different "lens" on the same block of memory. This is part of why NumPy is fast: you can reshape a contiguous 100 MB array in microseconds, because no bytes are moved.
3.2 - Broadcasting: The Most Important NumPy Concept for DL
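To see when a reshape is a view and when it is a copy, np.shares_memory is a convenient check. A minimal sketch with illustrative variable names: the first reshape works on a contiguous array, the second on a transposed (non-contiguous) one.

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)                 # contiguous input: reshape returns a view
view_shares = np.shares_memory(a, b)

b[0, 0] = 99                        # writing through the view also changes a
first_elem = a[0]

t = b.T                             # the transpose is a non-contiguous view
flat = t.reshape(-1)                # no view can express this layout, so NumPy copies
copy_shares = np.shares_memory(t, flat)

print(view_shares, first_elem, copy_shares)
```

The practical takeaway: reshape is cheap on freshly created arrays, but after a transpose it may quietly allocate new memory.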
Broadcasting is the mechanism that allows NumPy to perform element-wise operations on arrays of different shapes. Without broadcasting, you'd need explicit for-loops for most neural network computations. It is, without exaggeration, the single most important NumPy concept for deep learning.
The Broadcasting Rules (Memorise These)
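When operating on two arrays, NumPy compares their shapes dimension by dimension, starting from the trailing (rightmost) dimension. If one array has fewer dimensions, its shape is treated as if it were padded with 1s on the left. Two dimensions are compatible when:
- They are equal, OR
- One of them is 1
If every pair of dimensions is compatible, the smaller array is "broadcast" (virtually stretched) across the larger array. The stretched operand is never materialised in memory; NumPy reuses the same values at run time by adjusting strides.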
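The rules can be tried out on shapes alone, without creating any data. The sketch below assumes NumPy 1.20 or newer (for np.broadcast_shapes); the shapes are illustrative.

```python
import numpy as np

# np.broadcast_shapes applies the rules to shapes only
s1 = np.broadcast_shapes((64, 128), (128,))   # (128,) is treated as (1, 128)
s2 = np.broadcast_shapes((3, 1), (1, 3))      # both sides stretch to (3, 3)

try:
    np.broadcast_shapes((3, 4), (3,))         # trailing dims 4 vs 3: incompatible
    compatible = True
except ValueError:
    compatible = False

# The stretched operand is a view with stride 0 along the repeated axis
row = np.arange(3)
stretched = np.broadcast_to(row, (4, 3))

print(s1, s2, compatible, stretched.strides)
```

A stride of 0 means "stay on the same bytes when moving along this axis", which is how one row can stand in for four without being copied.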
Example 1: Scalar + Array
a = np.array([1, 2, 3]) # shape: (3,)
b = 10 # shape: (), a scalar
print(a + b) # [11 12 13]
# The scalar 10 is "broadcast" to [10, 10, 10]
Python
Example 2: Row Vector + Column Vector → Matrix (Outer Operation)
row = np.array([[1, 2, 3]])   # shape: (1, 3)
col = np.array([[10],
                [20],
                [30]])        # shape: (3, 1)
result = row + col # shape: (3, 3)
print(result)
# [[11 12 13]
# [21 22 23]
# [31 32 33]]
Python
Example 3: Neural Network Bias Addition (The DL Use Case)
In a neural network, we compute Z = X @ W + b where X is (m, n), W is (n, k), and b is (1, k) or (k,). The bias b is broadcast across all m samples:
m = 64    # batch size
n = 784   # input features (28×28 flattened)
k = 128   # hidden units
X = np.random.randn(m, n)   # (64, 784)
W = np.random.randn(n, k)   # (784, 128)
b = np.zeros((1, k))        # (1, 128)
Z = X @ W + b               # (64, 128), b is broadcast across all 64 rows
print(Z.shape)              # (64, 128)
# Without broadcasting, you'd need:
# Z = np.zeros((m, k))
# for i in range(m):
#     Z[i] = X[i] @ W + b   # SLOW!
Python
Example 4: Normalisation (Zero-Mean, Unit-Variance)
# Normalise each feature (column) independently
X = np.random.randn(1000, 5) * 10 + 50 # 1000 samples, 5 features
mean = X.mean(axis=0) # shape: (5,)
std = X.std(axis=0) # shape: (5,)
X_norm = (X - mean) / std   # Broadcasting: (1000, 5) - (5,) gives (1000, 5)
print(X_norm.mean(axis=0))  # ≈ [0, 0, 0, 0, 0]
print(X_norm.std(axis=0))   # ≈ [1, 1, 1, 1, 1]
Python
Example 5: Softmax (Used in Every Classification Network)
def softmax(z):
    """Numerically stable softmax; uses broadcasting throughout."""
    z_shifted = z - np.max(z, axis=1, keepdims=True)     # (m, k) - (m, 1)
    exp_z = np.exp(z_shifted)                            # element-wise
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)  # (m, k) / (m, 1)

logits = np.random.randn(4, 3)   # 4 samples, 3 classes
probs = softmax(logits)
print(probs.sum(axis=1))         # [1. 1. 1. 1.], each row sums to 1
Python
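Why subtract the row maximum at all? The comparison below is a small sketch (function names are ours) that feeds deliberately large logits to a naive softmax and to the shifted version. Subtracting a constant from every logit in a row does not change the softmax output, but it keeps np.exp from overflowing.

```python
import numpy as np

def softmax_naive(z):
    exp_z = np.exp(z)                                  # overflows to inf for large z
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def softmax_stable(z):
    z_shifted = z - np.max(z, axis=1, keepdims=True)   # largest entry becomes 0
    exp_z = np.exp(z_shifted)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

big_logits = np.array([[1000.0, 1001.0, 1002.0]])

# Silence the overflow warnings so the nan result is easy to inspect
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(big_logits)                  # inf / inf gives nan

stable = softmax_stable(big_logits)
print(naive)
print(stable)
```

The naive version returns nan for every class, while the shifted version returns valid probabilities.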
keepdims=True Is Your Best Friend
When using np.sum(), np.mean(), or np.max() with an axis argument, always consider keepdims=True. It preserves the reduced dimension as size 1, making subsequent broadcasting operations work correctly. Without it, the shape drops a dimension and broadcasting may silently produce wrong results.
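The "silently wrong" case is easiest to see with a square matrix, where the shapes happen to line up either way. A minimal sketch with made-up numbers:

```python
import numpy as np

X = np.array([[1.0, 1.0, 2.0],
              [2.0, 2.0, 4.0],
              [1.0, 3.0, 4.0]])

row_sums_flat = X.sum(axis=1)                  # shape (3,)
row_sums_kept = X.sum(axis=1, keepdims=True)   # shape (3, 1)

wrong = X / row_sums_flat   # (3,) aligns with COLUMNS: no error, wrong answer
right = X / row_sums_kept   # (3, 1) aligns with rows, as intended

print(wrong.sum(axis=1))
print(right.sum(axis=1))
```

Only the keepdims version produces rows that sum to 1. With a non-square matrix the first division would raise an error instead, which is the luckier outcome.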
Shape (3,) vs Shape (3, 1) vs Shape (1, 3)
These are three different shapes that broadcast differently:
- (3,): 1D array, broadcasts like a row when added to a 2D array
- (3, 1): 2D column vector, broadcasts across columns
- (1, 3): 2D row vector, broadcasts across rows
Use .reshape(-1, 1) to convert (3,) into (3, 1). This resolves many common broadcasting bugs.
3.3 - Vectorization vs For-Loops: The 100× Speedup
Vectorization means replacing explicit Python for-loops with NumPy array operations that execute in compiled C code. This is not a micro-optimisation; it can be the difference between training taking 5 minutes and taking 8 hours.
Why Is Python Slow at Loops?
For each iteration of a Python for-loop, the interpreter must: (1) check variable types, (2) look up the + operator, (3) create a new Python float object, (4) store the result. NumPy does all of this once at the C level and then processes millions of elements in a tight, compiled loop with SIMD instructions. The overhead is constant, not per-element.
Benchmark: Dot Product
import numpy as np
import time
n = 1_000_000
a = np.random.randn(n)
b = np.random.randn(n)
# ---- Method 1: Python for-loop ----
start = time.perf_counter()   # perf_counter is better suited to short timings
dot_loop = 0.0
for i in range(n):
    dot_loop += a[i] * b[i]
loop_time = time.perf_counter() - start
# ---- Method 2: NumPy vectorised ----
start = time.perf_counter()
dot_np = np.dot(a, b)
np_time = time.perf_counter() - start
print(f"For-loop: {loop_time*1000:.1f} ms")
print(f"NumPy: {np_time*1000:.2f} ms")
print(f"Speedup: {loop_time/np_time:.0f}×")
Python
Benchmark: Element-wise Operations
X = np.random.randn(1000, 1000)
# For-loop: square each element
start = time.perf_counter()
result_loop = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        result_loop[i, j] = X[i, j] ** 2
loop_time = time.perf_counter() - start
# Vectorised
start = time.perf_counter()
result_vec = X ** 2
vec_time = time.perf_counter() - start
print(f"For-loop: {loop_time*1000:.1f} ms")
print(f"Vectorised: {vec_time*1000:.2f} ms")
print(f"Speedup: {loop_time/vec_time:.0f}×")
Python
The Rule of NumPy
If you see a for-loop iterating over array elements in your deep learning code, there's almost certainly a vectorised alternative. The only exceptions are: iterating over training epochs, iterating over mini-batches, and certain sequential operations like RNNs (even those can be partially vectorised).
3.4 - The Six Function Families That Power Deep Learning
You can build any neural network from scratch using just six categories of NumPy functions. Let's master each one.
Family 1: np.dot / @ - Matrix Multiplication
# Vector dot product
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a, b))   # 32 (1×4 + 2×5 + 3×6)
# Matrix multiplication: the core of the forward pass
X = np.random.randn(64, 784)    # 64 samples, 784 features
W = np.random.randn(784, 128)   # weight matrix
Z = X @ W                       # shape: (64, 128)
Z = np.dot(X, W)                # identical to X @ W for 2D arrays
# CAUTION: * is element-wise, @ is matrix multiply
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A * B)   # [[ 5 12] [21 32]], element-wise (Hadamard)
print(A @ B)   # [[19 22] [43 50]], matrix multiplication
Python
Family 2: np.sum - Reduction Along Axes
X = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape: (2, 3)
print(np.sum(X))                   # 21, sum of everything
print(np.sum(X, axis=0))           # [5 7 9], sums down the rows, shape (3,)
print(np.sum(X, axis=1))           # [ 6 15], sums across the columns, shape (2,)
print(np.sum(X, axis=1, keepdims=True))
# [[ 6]     shape (2, 1), ready for broadcasting
#  [15]]
# Also: np.mean(), np.max(), np.min(), np.std() follow the same axis logic
Python
Family 3: np.exp, np.log - Activation & Loss Functions
# Sigmoid activation (used in binary classification)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-2, -1, 0, 1, 2])
print(sigmoid(z))
# [0.119 0.269 0.5 0.731 0.881] (rounded to 3 decimals)

# Binary cross-entropy loss
def binary_cross_entropy(y_true, y_pred):
    """y_true: (m,), y_pred: (m,); both between 0 and 1."""
    epsilon = 1e-8  # avoid log(0)
    return -np.mean(
        y_true * np.log(y_pred + epsilon) +
        (1 - y_true) * np.log(1 - y_pred + epsilon)
    )

y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.1, 0.8, 0.95])
print(f"Loss: {binary_cross_entropy(y, p):.4f}")  # Loss: 0.1213
Python
Family 4: np.maximum - ReLU Activation
# ReLU: f(x) = max(0, x)
def relu(z):
    return np.maximum(0, z)

z = np.array([-3, -1, 0, 2, 5])
print(relu(z))          # [0 0 0 2 5]

# Leaky ReLU: f(x) = max(αx, x)
def leaky_relu(z, alpha=0.01):
    return np.maximum(alpha * z, z)

print(leaky_relu(z))    # [-0.03 -0.01  0.    2.    5.  ]
# IMPORTANT: np.maximum vs np.max
# np.maximum(a, b): element-wise max of two arrays
# np.max(a): single maximum value in an array
Python
Family 5: np.random - Weight Initialisation
# Set seed for reproducibility
np.random.seed(42)
# Standard normal (mean=0, std=1)
W1 = np.random.randn(784, 128)
# Xavier/Glorot initialisation (recommended for sigmoid/tanh)
fan_in, fan_out = 784, 128
W2 = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / (fan_in + fan_out))
# He initialisation (recommended for ReLU)
W3 = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)
# Uniform random
W4 = np.random.uniform(-0.5, 0.5, size=(784, 128))
# Random sample of indices without replacement (useful for sampling mini-batches)
indices = np.random.choice(10000, size=64, replace=False)  # 64 random indices
print(f"Xavier std: {W2.std():.4f}")  # ≈ 0.0468
print(f"He std: {W3.std():.4f}")      # ≈ 0.0505
Python
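The examples above use NumPy's legacy global-seed interface, which is still widely seen in tutorials. NumPy's documentation recommends the newer Generator interface for new code; the sketch below shows the equivalent calls, with variable names chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)     # a seeded Generator, separate from np.random.seed
fan_in, fan_out = 784, 128

# He initialisation, drawn from the Generator instead of np.random.randn
W_he = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

# Mini-batch indices without replacement
batch_idx = rng.choice(10_000, size=64, replace=False)

print(W_he.shape, f"{W_he.std():.4f}", batch_idx[:5])
```

Because the Generator is an object you pass around, two parts of a program can each hold their own seeded stream without interfering with one another.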
Family 6: np.argmax, np.where, np.clip - Utility Functions
# argmax: find the predicted class
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])
predictions = np.argmax(probs, axis=1)  # [1, 0]
# where: conditional selection
x = np.array([-2, 3, -1, 5])
result = np.where(x > 0, x, 0)  # [0, 3, 0, 5], another way to write ReLU!
# clip: cap values (useful for numerical stability)
y_pred = np.array([0.0, 0.5, 1.0])
y_safe = np.clip(y_pred, 1e-7, 1 - 1e-7)  # avoid log(0)
print(y_safe)  # [1e-07, 0.5, 0.9999999]
Python
The DL NumPy Cheat Sheet
| Operation | NumPy | DL Use Case |
|---|---|---|
| Matrix multiply | X @ W or np.dot(X, W) | Forward pass |
| Element-wise multiply | A * B | Attention, gating |
| Sum along axis | np.sum(X, axis=0) | Gradient averaging |
| Exponential | np.exp(z) | Softmax, sigmoid |
| Logarithm | np.log(p) | Cross-entropy loss |
| Element-wise max | np.maximum(0, z) | ReLU activation |
| Random normal | np.random.randn(m, n) | Weight initialisation |
| Argmax | np.argmax(probs, axis=1) | Prediction class |
| Clip | np.clip(p, 1e-7, 1-1e-7) | Numerical stability |
| Transpose | W.T | Backpropagation |
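To see the cheat sheet working as a whole, here is a hedged sketch of a forward pass through a tiny two-layer classifier on random data. The layer sizes, the random labels, and every variable name are illustrative assumptions, not a model used elsewhere in this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, h, k = 8, 784, 128, 10      # batch, inputs, hidden units, classes (illustrative)

X = rng.standard_normal((m, n))
W1 = rng.standard_normal((n, h)) * np.sqrt(2.0 / n)         # He init for the ReLU layer
b1 = np.zeros((1, h))
W2 = rng.standard_normal((h, k)) * np.sqrt(2.0 / (h + k))   # Xavier init for the output layer
b2 = np.zeros((1, k))
labels = rng.integers(0, k, size=m)                         # random stand-in labels

Z1 = X @ W1 + b1                                   # matrix multiply + broadcast bias
A1 = np.maximum(0, Z1)                             # ReLU
Z2 = A1 @ W2 + b2
Z2 = Z2 - Z2.max(axis=1, keepdims=True)            # shift for a stable softmax
P = np.exp(Z2) / np.exp(Z2).sum(axis=1, keepdims=True)

pred = np.argmax(P, axis=1)                        # predicted class per sample
picked = np.clip(P[np.arange(m), labels], 1e-7, 1.0)
loss = -np.mean(np.log(picked))                    # cross-entropy on the true classes

print(P.shape, pred.shape, f"{loss:.3f}")
```

Every row of the table appears at least once except the transpose, which only becomes necessary in the backward pass.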
3.5 - Matplotlib Basics: Plots Every DL Practitioner Needs
Visualisation is not optional in deep learning. You must plot your loss curve to know if training is working. You must visualise data distributions to catch preprocessing bugs. Here are the three plots you'll use most.
Plot 1: Loss Curve
import matplotlib.pyplot as plt
import numpy as np
# Simulate training loss (exponential decay + noise)
epochs = np.arange(1, 101)
train_loss = 2.5 * np.exp(-0.03 * epochs) + 0.1 * np.random.randn(100) * np.exp(-0.02 * epochs)
val_loss = 2.5 * np.exp(-0.025 * epochs) + 0.15 * np.random.randn(100) * np.exp(-0.015 * epochs)
plt.figure(figsize=(8, 5))
plt.plot(epochs, train_loss, label='Train Loss', color='#7c3aed', linewidth=2)
plt.plot(epochs, val_loss, label='Val Loss', color='#f59e0b', linewidth=2)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Progress: MNIST Classifier')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig('loss_curve.png', dpi=150)
plt.show()
Python
Plot 2: Histogram (Data Distribution)
# Visualise weight initialisation distributions
W_normal = np.random.randn(10000)
W_xavier = np.random.randn(10000) * np.sqrt(2/1000)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].hist(W_normal, bins=50, color='#7c3aed', alpha=0.7, edgecolor='white')
axes[0].set_title('Standard Normal (std=1.0)')
axes[0].set_xlabel('Weight Value')
axes[1].hist(W_xavier, bins=50, color='#10b981', alpha=0.7, edgecolor='white')
axes[1].set_title(f'Xavier (std={W_xavier.std():.3f})')
axes[1].set_xlabel('Weight Value')
plt.tight_layout()
plt.show()
Python
Plot 3: Decision Boundary (2D Classification)
def plot_decision_boundary(X, y, predict_fn, title="Decision Boundary"):
    """Plot 2D decision boundary for a binary classifier."""
    h = 0.02  # mesh step size
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    grid_points = np.c_[xx.ravel(), yy.ravel()]  # shape: (N, 2)
    Z = predict_fn(grid_points).reshape(xx.shape)
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, alpha=0.4, cmap='RdYlBu')
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='RdYlBu',
                edgecolors='black', s=30)
    plt.title(title)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.show()
# Usage (with a dummy classifier):
# X = np.random.randn(200, 2)
# y = (X[:, 0] + X[:, 1] > 0).astype(int)
# plot_decision_boundary(X, y, lambda x: (x[:, 0] + x[:, 1] > 0).astype(int))
Python
Plotting Best Practices for DL
- Always label axes: unlabelled plots are useless in reports
- Use plt.tight_layout(): prevents labels from being cut off
- Save with dpi=150: good balance of quality vs file size
- Use plt.grid(alpha=0.3): subtle gridlines aid reading
- Consistent colours: use the same colour for train/val across all plots
3.6 - Pandas: Loading and Preprocessing Data
Before data reaches a neural network, it typically lives in a CSV, database, or API. Pandas is the bridge between raw data and NumPy arrays. You don't need to master Pandas for deep learning; you need a survival kit.
Loading and Inspecting a CSV
import pandas as pd
# Load a dataset (e.g., house prices)
df = pd.read_csv('mumbai_house_prices.csv')
# Quick inspection
print(df.shape) # (5000, 8): 5000 rows, 8 columns
print(df.head()) # first 5 rows
print(df.dtypes) # column data types
print(df.describe()) # mean, std, min, max per column
print(df.isnull().sum()) # count missing values per column
Python
Basic Preprocessing
import numpy as np
# Handle missing values first, so the NumPy arrays below contain no NaNs
df['age_years'] = df['age_years'].fillna(df['age_years'].median())
# Select features and target
features = ['area_sqft', 'bedrooms', 'floor', 'age_years']
target = 'price_lakhs'
X = df[features].values   # Convert to NumPy array, shape: (5000, 4)
y = df[target].values     # shape: (5000,)
# Normalise features
X_mean = X.mean(axis=0)
X_std = X.std(axis=0)
X_norm = (X - X_mean) / X_std # Broadcasting!
# Train/test split (80/20)
np.random.seed(42)
indices = np.random.permutation(len(X_norm))
split = int(0.8 * len(X_norm))
X_train, X_test = X_norm[indices[:split]], X_norm[indices[split:]]
y_train, y_test = y[indices[:split]], y[indices[split:]]
print(f"Train: {X_train.shape}, Test: {X_test.shape}")
# Train: (4000, 4), Test: (1000, 4)
Python
One-Hot Encoding Categorical Variables
# Example: encoding city names for neural network input
cities = pd.Series(['Mumbai', 'Delhi', 'Bengaluru', 'Mumbai', 'Delhi'])
# Method 1: Pandas get_dummies
one_hot = pd.get_dummies(cities, prefix='city', dtype=int)  # dtype=int gives 0/1; recent pandas versions default to True/False
print(one_hot)
#    city_Bengaluru  city_Delhi  city_Mumbai
# 0               0           0            1
# 1               0           1            0
# 2               1           0            0
# 3               0           0            1
# 4               0           1            0
# Convert to NumPy
X_cities = one_hot.values # shape: (5, 3)
Python
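The same encoding can be built in NumPy alone, which is handy once the data has already left Pandas. This is one possible approach rather than the only one: np.unique assigns each city an integer code, and indexing an identity matrix with those codes yields the one-hot rows.

```python
import numpy as np

cities = np.array(['Mumbai', 'Delhi', 'Bengaluru', 'Mumbai', 'Delhi'])

# classes is the sorted list of unique names; codes maps each entry to its class index
classes, codes = np.unique(cities, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(len(classes), dtype=int)[codes]

print(classes)
print(one_hot)
```

The column order follows the sorted class names (Bengaluru, Delhi, Mumbai), matching the get_dummies output above.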
Indian Datasets to Practice With
Kaggle hosts several excellent Indian datasets: IPL ball-by-ball data, Zomato restaurant reviews, Indian census data, NIFTY stock prices, Swiggy delivery times. Search "India" on Kaggle and sort by most votes. These datasets are perfect for building your Pandas and NumPy muscles.
3.7 - Google Colab: Your Free GPU Playground
Google Colab is a free Jupyter notebook environment with GPU access. For this textbook, Colab is the recommended environment: no installation required, works on any laptop (even a ₹25,000 budget laptop), and provides a T4 GPU that's sufficient for all exercises.
Getting Started
colab.research.google.com
Verify GPU Access
# Run this in a Colab cell
import torch
gpu_ok = torch.cuda.is_available()
print(f"GPU available: {gpu_ok}")
if gpu_ok:  # get_device_name raises an error when no GPU is attached
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
# Check NumPy version
import numpy as np
print(f"NumPy: {np.__version__}")
Python
Uploading Files
# Method 1: Upload from local machine
from google.colab import files
uploaded = files.upload() # Opens file picker dialog
# Method 2: Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
# Now access files like:
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/datasets/ipl_data.csv')
# Method 3: Download from URL
!wget https://example.com/dataset.csv -O /content/dataset.csv
Python
Installing Additional Libraries
# Colab pre-installs most ML libraries, but you can add more:
!pip install -q wandb # experiment tracking
!pip install -q torchsummary # model architecture viewer
Python
Colab Sessions Expire!
Free Colab sessions disconnect after ~90 minutes of inactivity and have a maximum runtime of ~12 hours. Always save your notebook to Drive (File > Save a copy in Drive). For large experiments, save checkpoints to Drive periodically:
# Save model checkpoint to Drive
np.save('/content/drive/MyDrive/checkpoints/weights.npy', W)
Python
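Saving is only half of a checkpoint; it is worth confirming that the file loads back correctly. The sketch below uses a temporary directory as a stand-in for a Drive folder, so it runs anywhere; the small weight matrix is illustrative.

```python
import os
import tempfile
import numpy as np

W = np.random.randn(4, 3).astype(np.float32)   # stand-in for trained weights

# A temp directory stands in for a folder such as /content/drive/MyDrive/checkpoints
ckpt_dir = tempfile.mkdtemp()
path = os.path.join(ckpt_dir, "weights.npy")

np.save(path, W)                # writes the array along with its shape and dtype
W_restored = np.load(path)

print(W_restored.dtype, np.array_equal(W, W_restored))
```

The .npy format stores dtype and shape alongside the data, so the restored array is identical to the original.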
Colab Pro vs Free
The free tier is sufficient for all exercises in this textbook. Colab Pro (₹850/month) gives you faster GPUs (A100), longer runtimes, and more RAM. Consider upgrading only if you're training large models for your final project or internship work.
Critical: Why Vectorization Matters
Let's make this concrete with the most important function in deep learning: sigmoid, computed over 1 million samples.
The Definitive Benchmark
import numpy as np
import time
n = 1_000_000
z = np.random.randn(n)
# ----- METHOD 1: Python for-loop -----
start = time.perf_counter()
result_loop = np.zeros(n)
for i in range(n):
    result_loop[i] = 1.0 / (1.0 + np.exp(-z[i]))
loop_time = time.perf_counter() - start
# ----- METHOD 2: NumPy vectorised -----
start = time.perf_counter()
result_vec = 1.0 / (1.0 + np.exp(-z))
vec_time = time.perf_counter() - start
# Verify both give same result
print(f"Max difference: {np.max(np.abs(result_loop - result_vec)):.2e}")
print(f"For-loop: {loop_time*1000:8.1f} ms")
print(f"Vectorised: {vec_time*1000:8.2f} ms")
print(f"Speedup: {loop_time/vec_time:8.0f}×")
Python
What This Means for Training
A single epoch of training on MNIST with a 2-layer network involves computing sigmoid ~1 million times. With for-loops: ~3 seconds per epoch × 100 epochs = 5 minutes. With vectorisation: ~0.006 seconds per epoch × 100 epochs = 0.6 seconds. Over the course of this textbook, you'll run thousands of training experiments. Vectorisation literally saves you days.
Common Vectorisation Patterns
# SLOW: for-loop                       # FAST: vectorised
# -----------------------              # -----------------------
# for i in range(m):                   # Z = X @ W + b
#     z = 0                            #
#     for j in range(n):               # A = sigmoid(Z)
#         z += X[i,j] * W[j]           #
#     z += b                           # L = -np.mean(y*np.log(A)
#     A[i] = sigmoid(z)                #              + (1-y)*np.log(1-A))
#     L += -(y[i]*log(A[i])            #
#            + (1-y[i])*log(1-A[i]))   # dW = X.T @ (A - y) / m
# L /= m                               #
# The vectorised version is also CLEARER and SHORTER!
Python
The Golden Rule
"Whenever you're tempted to write a for-loop over array indices, stop and think: can this be a matrix operation?" In 95% of cases, the answer is yes. The remaining 5% usually involves control flow (if/else per sample), and even those can often be replaced with np.where().
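As a concrete instance of replacing per-sample control flow, the sketch below writes a leaky ReLU twice: once with an if/else inside a loop, once with np.where. The input values and alpha are illustrative.

```python
import numpy as np

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
alpha = 0.01

# Loop version: one if/else decision per element
out_loop = np.empty_like(z)
for i in range(len(z)):
    if z[i] > 0:
        out_loop[i] = z[i]
    else:
        out_loop[i] = alpha * z[i]

# Vectorised version: np.where picks from two arrays using a boolean mask
out_vec = np.where(z > 0, z, alpha * z)

print(out_loop)
print(out_vec)
```

Note that np.where evaluates both branches for every element before selecting, so it suits cheap branches like these better than branches that are expensive or that would raise errors on some inputs.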
Worked Example: End-to-End NumPy Data Pipeline
Problem: Predict Zomato Delivery Time
Given a dataset of Zomato orders with columns [distance_km, rating, num_items, time_of_day, delivery_min], build a complete data preprocessing pipeline using only NumPy and Pandas.
Step 1: Load and inspect
import pandas as pd
import numpy as np
# Simulate Zomato delivery data
np.random.seed(42)
n = 5000
distance = np.random.exponential(3, n) + 0.5
rating = np.clip(np.random.normal(3.8, 0.5, n), 1, 5)
num_items = np.random.randint(1, 8, n)
time_of_day = np.random.randint(8, 24, n)
delivery_time = 10 + 3*distance - 2*rating + 1.5*num_items + np.random.randn(n)*5
df = pd.DataFrame({
'distance_km': np.round(distance, 1),
'rating': np.round(rating, 1),
'num_items': num_items,
'time_of_day': time_of_day,
'delivery_min': np.round(delivery_time, 1)
})
print(df.head())
print(f"\nShape: {df.shape}")
print(f"Any nulls: {df.isnull().any().any()}")
Python
Step 2: Extract features and normalise
# Extract NumPy arrays
feature_cols = ['distance_km', 'rating', 'num_items', 'time_of_day']
X = df[feature_cols].values.astype(np.float64) # (5000, 4)
y = df['delivery_min'].values # (5000,)
# Z-score normalisation
mu = X.mean(axis=0) # shape: (4,)
sigma = X.std(axis=0) # shape: (4,)
X_norm = (X - mu) / sigma # Broadcasting: (5000,4) - (4,) / (4,)
print(f"Means after norm: {X_norm.mean(axis=0).round(6)}")
print(f"Stds after norm: {X_norm.std(axis=0).round(6)}")
Python
Step 3: Train/test split
np.random.seed(42)
indices = np.random.permutation(n)
split = int(0.8 * n)
X_train, X_test = X_norm[indices[:split]], X_norm[indices[split:]]
y_train, y_test = y[indices[:split]], y[indices[split:]]
print(f"Train: {X_train.shape}, Test: {X_test.shape}")
# Train: (4000, 4), Test: (1000, 4)
Python
Step 4: Add bias column and compute closed-form solution (preview of linear regression)
# Add column of ones for bias term
X_train_b = np.c_[np.ones(X_train.shape[0]), X_train] # (4000, 5)
X_test_b = np.c_[np.ones(X_test.shape[0]), X_test] # (1000, 5)
# Normal equation: w = (X^T X)^{-1} X^T y
w = np.linalg.inv(X_train_b.T @ X_train_b) @ X_train_b.T @ y_train
print(f"Weights: {w.round(3)}")
# [bias, dist_coeff, rating_coeff, items_coeff, time_coeff]
# Predict and evaluate
y_pred = X_test_b @ w
rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
print(f"Test RMSE: {rmse:.2f} minutes")
Python
Notice: the entire pipeline, from data loading to prediction, used zero for-loops. Every operation was vectorised.
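The normal equation above inverts a matrix explicitly, which is fine for a small, well-conditioned problem like this one. As an aside, a least-squares solver such as np.linalg.lstsq reaches the same weights without forming the inverse and is generally the more numerically robust choice. The sketch below compares the two on synthetic data; the data and the "true" weights are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 3))
true_w = np.array([5.0, 2.0, -1.0, 0.5])          # [bias, w1, w2, w3], illustrative

X_b = np.c_[np.ones(len(X)), X]                   # prepend the bias column
y = X_b @ true_w + 0.1 * rng.standard_normal(200) # targets with a little noise

# Normal equation, as in Step 4
w_inv = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Least-squares solver: same solution, no explicit inverse
w_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(w_inv.round(3))
print(w_lstsq.round(3))
```

Both approaches recover weights close to the ones used to generate the data.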
Case Study & Mini-Project: IPL Cricket Analytics
IPL Ball-by-Ball Data Analysis
The Indian Premier League (IPL) generates one of the richest sports datasets in the world: every ball bowled, every run scored, every wicket taken across 15+ seasons. Let's use NumPy, Pandas, and matplotlib to extract actionable insights.
Setup: Load the Dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Download from Kaggle: "IPL Complete Dataset (2008-2024)"
# Or use: kaggle datasets download -d patrickb1912/ipl-complete-dataset-20082020
deliveries = pd.read_csv('deliveries.csv')
print(deliveries.shape)
print(deliveries.columns.tolist())
print(deliveries.head())
Python
Task 1: Compute Run Rate per Over
# Group by match_id and over, sum runs
over_runs = deliveries.groupby(['match_id', 'over'])['total_runs'].sum()
# Average runs per over across all matches
avg_runs_per_over = over_runs.groupby('over').mean()
# Plot
plt.figure(figsize=(10, 5))
plt.bar(avg_runs_per_over.index, avg_runs_per_over.values,
color='#7c3aed', edgecolor='white')
plt.xlabel('Over Number')
plt.ylabel('Average Runs')
plt.title('IPL: Average Runs per Over (All Seasons)')
plt.xticks(range(1, 21))
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
Python
Task 2: Strike Rate Calculation Using NumPy
# Strike rate = (total runs / total balls faced) × 100
batsman_stats = deliveries.groupby('batsman').agg(
total_runs=('batsman_runs', 'sum'),
balls_faced=('batsman_runs', 'count')
).reset_index()
# Filter: minimum 500 balls faced
qualified = batsman_stats[batsman_stats['balls_faced'] >= 500]
# Compute strike rate using NumPy (vectorised!)
runs = qualified['total_runs'].values # shape: (N,)
balls = qualified['balls_faced'].values # shape: (N,)
strike_rate = (runs / balls) * 100   # vectorised: element-wise division, then a broadcast scalar
# Top 10 strike rates
top_idx = np.argsort(strike_rate)[-10:][::-1]
print("Top 10 IPL Strike Rates (min 500 balls):")
for i in top_idx:
    name = qualified.iloc[i]['batsman']
    print(f"  {name:25s} SR: {strike_rate[i]:.1f}  Runs: {runs[i]}")
Python
Task 3: Score Progression Plot
# Pick a specific match and plot cumulative score
match_id = deliveries['match_id'].unique()[42] # arbitrary match
match = deliveries[deliveries['match_id'] == match_id]
teams = match['batting_team'].unique()
plt.figure(figsize=(10, 5))
for team in teams:
    innings = match[match['batting_team'] == team]
    cumulative = np.cumsum(innings['total_runs'].values)
    balls = np.arange(1, len(cumulative) + 1)
    plt.plot(balls, cumulative, linewidth=2, label=team)
plt.xlabel('Ball Number')
plt.ylabel('Cumulative Score')
plt.title(f'IPL Match #{match_id}: Score Progression')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
Python
Task 4: Powerplay vs Death Overs Analysis
# Powerplay: overs 1-6, Middle: 7-15, Death: 16-20
def phase(over):
    if over <= 6:
        return 'Powerplay'
    elif over <= 15:
        return 'Middle'
    else:
        return 'Death'

deliveries['phase'] = deliveries['over'].apply(phase)
phase_stats = deliveries.groupby('phase').agg(
    avg_runs_per_ball=('total_runs', 'mean'),
    total_wickets=('is_wicket', 'sum')
)
print(phase_stats.round(3))
#            avg_runs_per_ball  total_wickets
# Death                  1.356          12845
# Middle                 1.121          18923
# Powerplay              1.297          11567
Python
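The phase labelling above calls a Python function once per delivery through .apply(). In the spirit of this chapter, the same labels can be produced in one vectorised step with np.select. The sketch below uses a short made-up array of over numbers (assumed to be 1-indexed) so that it runs without the dataset.

```python
import numpy as np

overs = np.array([1, 6, 7, 15, 16, 20])   # illustrative over numbers, 1-indexed

# Conditions are checked in order; the first one that is True wins
phase_labels = np.select(
    [overs <= 6, overs <= 15],
    ['Powerplay', 'Middle'],
    default='Death',
)

print(phase_labels)
```

With the real data, the same call on the over column's NumPy values would yield an array that can be assigned back as the phase column.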
Mini-Project Extension Ideas
- Build a heatmap: which bowler is most dangerous to which batsman?
- Compute win probability at each ball using historical data (logistic regression preview!)
- Predict total match score from powerplay score (linear regression preview!)
- Find "clutch" players: who performs best in high-pressure death overs?
Common NumPy Bugs & Mistakes
Bug 1: Shape Mismatch in Matrix Multiplication
X = np.random.randn(64, 784)
W = np.random.randn(64, 128) # WRONG shape!
# X @ W raises ValueError: the inner dimensions differ (784 vs 64)
# FIX: W should be (784, 128), not (64, 128)
W = np.random.randn(784, 128) # CORRECT
Z = X @ W # (64, 784) @ (784, 128) = (64, 128)
Python
Rule: For A @ B, the inner dimensions must match: (m, n) @ (n, k) → (m, k).
Bug 2: Axis Confusion
X = np.random.randn(100, 5) # 100 samples, 5 features
# WRONG: normalise across features (axis=1)
mean_wrong = X.mean(axis=1)   # shape: (100,), averaged features per sample
# RIGHT: normalise each feature independently (axis=0)
mean_right = X.mean(axis=0)   # shape: (5,), averaged samples per feature
X_norm = (X - mean_right) / X.std(axis=0)
Python
Rule: axis=0 collapses rows (operates "downward"), axis=1 collapses columns (operates "rightward"). Think: "axis=0 gives you column-wise results".
Bug 3: In-Place Operations vs Copies
a = np.array([1, 2, 3])
b = a          # b is just another name for the SAME array, not a copy!
b[0] = 99
print(a)       # [99  2  3], a was modified too!
# FIX: use .copy()
a = np.array([1, 2, 3])
b = a.copy()   # b is an independent copy
b[0] = 99
print(a)       # [1 2 3], a is unchanged
Python
Rule: Assignment (=) never copies; it binds a second name to the same array. Basic slicing creates a view that shares memory, so b = a[0:3] also writes through to a. Use .copy() when you need an independent copy.
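When you are unsure whether two arrays share data, ask NumPy directly. The sketch below (variable names are illustrative) distinguishes an alias, a view, an explicit copy, and the copy made by fancy indexing.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

alias = a                  # same object under a second name
view = a[1:3]              # new array object that shares a's buffer
copy = a[1:3].copy()       # independent data

same_object = alias is a
view_is_object = view is a
view_shares = np.shares_memory(a, view)
copy_shares = np.shares_memory(a, copy)
fancy_shares = np.shares_memory(a, a[[1, 2]])   # fancy indexing returns a copy

print(same_object, view_is_object, view_shares, copy_shares, fancy_shares)
```

The `is` operator tests object identity, while np.shares_memory tests whether the underlying bytes overlap, which is the property that matters for accidental modification.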
Bug 4: Broadcasting Silent Errors
# Subtle bug: adding a (3,) vector to a (3, 4) matrix
W = np.random.randn(3, 4)
b = np.array([1, 2, 3]) # shape: (3,)
# result = W + b   # raises ValueError: b is treated as a row of shape (1, 3),
#                  # which does not match the 4 columns of W
# If W were square, e.g. (3, 3), the same line would run silently
# and add b along the wrong axis. That is the dangerous case.
# What you probably meant:
b_col = b.reshape(-1, 1)         # shape: (3, 1)
result = W + b_col               # adds b to each column
# OR, if b is per-feature (per column):
b_row = np.array([1, 2, 3, 4])   # shape: (4,)
result = W + b_row               # broadcasts as (1, 4) + (3, 4)
Python
Rule: Always explicitly check shapes before broadcasting. When in doubt, use .reshape() to make the intent clear.
Bug 5: Integer Division Gotcha
import numpy as np

# An array's dtype is fixed once it is created: integer arrays stay integer!
a = np.array([1, 2, 3])
print(a / 2)     # [0.5 1.  1.5]  (true division returns a new float array)
print(a // 2)    # [0 1 1]        (floor division keeps the integer dtype)
# But watch out with typed arrays:
a = np.array([1, 2, 3], dtype=np.int32)
a = a / 2        # This creates a NEW float64 array
print(a.dtype)   # float64, which is fine
# DANGER: in-place operations preserve dtype
a = np.array([1, 2, 3], dtype=np.int32)
# a /= 2         # TypeError! The float64 result cannot be cast to int32 in place
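Two safe ways around the in-place error are to convert the array to a float dtype first, or to stay in integers deliberately with floor division. The sketch below shows both, plus a related trap where a float stored into an integer array is truncated without any warning. Variable names are illustrative.

```python
import numpy as np

counts = np.array([1, 2, 3], dtype=np.int32)

# Option 1: convert to float once; in-place float arithmetic is then allowed.
scaled = counts.astype(np.float64)
scaled /= 2
print(scaled, scaled.dtype)   # [0.5 1.  1.5] float64

# Option 2: stay in integers deliberately with in-place floor division.
halved = counts.copy()
halved //= 2
print(halved, halved.dtype)   # [0 1 1] int32

# Related silent trap: storing a float into an integer array truncates it.
ints = np.array([1, 2, 3])
ints[0] = 2.7
print(ints)                   # [2 2 3]
```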
Misconceptions Busted
| Misconception | Reality |
|---|---|
| "NumPy is just a fancier Python list" | NumPy arrays are a completely different data structure: contiguous C-level memory, typed, with compiled BLAS/LAPACK backends. They're closer to C arrays than Python lists. |
| "Broadcasting copies data" | Broadcasting is a zero-copy operation. NumPy adjusts strides (internal metadata) to "virtually expand" the smaller array, so no expanded copy of it is ever allocated. |
| "I need to learn all of NumPy before starting DL" | You need about 20 functions for 95% of deep learning work. This chapter covers them all. The rest you'll pick up as needed. |
| "Pandas is required for deep learning" | Pandas is for data loading and exploration. Once data is in NumPy arrays (or PyTorch tensors), Pandas is not involved in training. Think of Pandas as the "loading dock" and NumPy as the "factory floor". |
| "np.random.seed(42) makes results perfectly reproducible" | It makes NumPy's randomness reproducible. For full reproducibility in DL, you also need torch.manual_seed(), random.seed(), and CUDA determinism flags. We'll cover this in Chapter 6. |
| "Google Colab is too slow for real deep learning" | Colab's free T4 GPU has 16 GB VRAM and ~65 TFLOPS for FP16. It can train ResNet-50 on ImageNet in ~40 hours. For learning, it's more than enough; professionals at Indian startups often prototype on Colab before deploying on cloud GPUs. |
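The zero-copy claim about broadcasting can be checked directly. The sketch below uses np.broadcast_to to build the "virtually expanded" array and inspects its strides: a stride of 0 along the first axis means every row points at the same bytes. The array sizes are arbitrary.

```python
import numpy as np

b = np.array([1.0, 2.0, 3.0, 4.0])          # shape (4,), 32 bytes of float64
stretched = np.broadcast_to(b, (1000, 4))   # read-only view shaped (1000, 4)

print(stretched.shape)                  # (1000, 4)
print(stretched.strides)                # (0, 8): moving to the next row advances 0 bytes
print(np.shares_memory(b, stretched))   # True
```

Note that the output of an arithmetic operation such as W + b is still a freshly allocated array; only the expansion of the smaller operand is free.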
Exercises
Section A: Multiple Choice Questions
- What is the shape of the result of np.dot(A, B) where A has shape (64, 784) and B has shape (784, 128)?
  (a) (64, 784) (b) (784, 128) (c) (64, 128) (d) (784, 784)
  Answer: (c). Matrix multiply (m, n) @ (n, k) → (m, k).
- Which of the following creates a column vector from a 1D array a of shape (5,)?
  (a) a.T (b) a.reshape(1, -1) (c) a.flatten() (d) a.reshape(-1, 1)
  Answer: (d). reshape(-1, 1) gives shape (5, 1). Note: .T on a 1D array does nothing!
- What does np.sum(X, axis=0) do for X of shape (100, 5)?
  (a) Sums all elements into a scalar (b) Sums each row → shape (100,) (c) Sums each column → shape (5,) (d) Raises an error
  Answer: (c). axis=0 collapses rows, giving one sum per column.
- What is the output of np.maximum(0, np.array([-3, -1, 0, 2, 5]))?
  (a) 5 (b) [-3, -1, 0, 2, 5] (c) [0, 0, 0, 2, 5] (d) [0, 0, 0, 0, 0]
  Answer: (c). np.maximum is an element-wise max, which implements ReLU.
- Broadcasting: What is the shape of A + B where A has shape (3, 1) and B has shape (1, 4)?
  (a) (3, 1) (b) (1, 4) (c) Error (d) (3, 4)
  Answer: (d). (3, 1) and (1, 4) broadcast to (3, 4).
- Why is 1e-8 added inside np.log() in cross-entropy loss?
  (a) To speed up computation (b) To prevent log(0) = −∞ (c) For better accuracy (d) It's a learning rate
  Answer: (b). np.log(0) evaluates to −∞, which turns the loss into inf or NaN; adding a small epsilon keeps the logarithm finite.
- Which initialisation is recommended for layers using ReLU activation?
  (a) All zeros (b) Xavier/Glorot (c) He initialisation (d) Uniform [0, 1]
  Answer: (c). He initialisation scales weights by √(2/fan_in), which compensates for ReLU zeroing out roughly half of its inputs.
- b = a where a is a NumPy array. Modifying b[0] will:
  (a) Only modify b (b) Modify both a and b (c) Raise an error (d) Create a copy
  Answer: (b). Assignment binds b to the same array object; no copy is made. Use a.copy() for independence.
- What is the purpose of keepdims=True in np.sum(X, axis=1, keepdims=True)?
  (a) Faster computation (b) Prevents data loss (c) Preserves the reduced dimension as size 1 for broadcasting (d) No practical effect
  Answer: (c). Without keepdims, the shape drops from (m, n) to (m,). With keepdims, it becomes (m, 1), enabling correct broadcasting.
- A for-loop computing the dot product of two 1M-element vectors takes ~300 ms. The NumPy np.dot equivalent takes ~1 ms. What explains the ~300× speedup?
  (a) NumPy uses a faster algorithm (b) NumPy runs compiled C code with SIMD and avoids Python object overhead (c) NumPy uses the GPU (d) Python for-loops have a bug
  Answer: (b). NumPy's C backend uses contiguous memory and SIMD vectorisation, and avoids per-element Python type checking.
Section B: Short Answer Questions
- Explain the three broadcasting rules in your own words. Give one DL example for each rule.
- Why does .T (transpose) have no effect on a 1D NumPy array of shape (5,)? How would you create a proper column vector?
- Compare np.maximum(A, B) vs np.max(A). When would you use each in deep learning?
- Explain why normalising features (zero mean, unit variance) before training a neural network is important. Write the vectorised NumPy code for normalisation.
- Describe the difference between Xavier and He weight initialisation. Which activation function is each designed for, and why?
Section C: Long Answer Questions
- Derive the normal equation for linear regression: $w = (X^\top X)^{-1} X^\top y$. Then implement it in NumPy on a synthetic dataset of 1000 Bengaluru house prices with 3 features (area, bedrooms, floor). Report the RMSE and plot predicted vs actual prices.
- Write a complete softmax function in NumPy that is numerically stable (handles large logits without overflow). Explain each line. Then verify that your function: (a) outputs values between 0 and 1, (b) outputs sum to 1 per row, (c) handles logits of magnitude 1000 without NaN.
Section D: Programming Exercises
D1: Vectorised Sigmoid & Its Derivative
Implement sigmoid(z) and sigmoid_derivative(z) using only NumPy (no loops). Verify that sigmoid_derivative(z) equals sigmoid(z) * (1 - sigmoid(z)) for z in [-5, 5]. Plot both functions on the same graph.
D2: Mini-Batch Generator
Write a function get_mini_batches(X, y, batch_size=64, shuffle=True) that:
- Shuffles the data if shuffle=True
- Yields tuples of (X_batch, y_batch) of the specified batch size
- Handles the last batch (which may be smaller than batch_size)
- Uses NumPy indexing (no Python list slicing)
Test it on the Zomato delivery dataset from the worked example.
D3: IPL Dataset Exploratory Data Analysis
Using the IPL ball-by-ball dataset:
- Compute the economy rate (runs conceded per over) for the top 20 bowlers by total overs bowled
- Create a bar chart showing the top 10 highest individual scores in IPL history
- Compute and plot the win percentage by batting first vs chasing for each venue
- Build a NumPy-only function that takes a match_id and returns a Manhattan plot (runs per over for each innings, side by side)
Section E: Mini-Project
IPL Score Predictor (Data Pipeline)
Build a complete data pipeline for predicting first-innings total score from powerplay data:
- Data extraction: From the ball-by-ball CSV, compute per-match features: powerplay score, powerplay wickets, run rate in overs 1-3, run rate in overs 4-6, number of boundaries in powerplay
- Target: Total first-innings score
- Preprocessing: Normalise features, handle missing matches, train/test split (80/20)
- Baseline model: Use the normal equation (from the worked example) to fit a linear regression. Report RMSE in runs.
- Visualisation: Plot predicted vs actual scores, residual distribution, and feature correlations
Deliverable: A single Colab notebook with all code, plots, and a 200-word analysis of which powerplay features most strongly predict final score. Save the notebook as ipl_score_predictor.ipynb.
Chapter Summary
Key Takeaways
- NumPy arrays are the foundation of all numerical computing in Python: contiguous, typed, and 50–200× faster than Python lists for numerical operations.
- Broadcasting allows operations between arrays of different shapes by virtually stretching the smaller array. Three rules: pad the shorter shape with leading 1s; compare dimensions starting from the trailing end; two dimensions are compatible if they are equal or one of them is 1.
- Vectorisation replaces Python for-loops with C-level NumPy operations. Computing sigmoid over 1M values: for-loop = 2.8 s, vectorised = 5.6 ms (500× speedup).
- Six function families power all of deep learning: np.dot (forward pass), np.sum (reductions), np.exp/np.log (activations/losses), np.maximum (ReLU), np.random (initialisation), and utilities (np.argmax, np.where, np.clip).
- Matplotlib provides three essential plots: loss curves, histograms, and decision boundaries.
- Pandas is the bridge from raw CSV data to clean NumPy arrays. You need: read_csv, .head(), .describe(), .values, get_dummies(), fillna().
- Google Colab provides free GPU access, sufficient for all exercises in this textbook. Enable the T4 GPU via Runtime → Change runtime type.
- Common bugs: shape mismatches, axis confusion, view vs copy, and silent broadcasting errors. Always print(x.shape).
Formula Quick Reference
| Operation | Formula / Code | DL Usage |
|---|---|---|
| Matrix Multiply | Z = X @ W + b | Forward pass (every layer) |
| Sigmoid | $\sigma(z) = 1 / (1 + e^{-z})$ | Binary classification output |
| ReLU | np.maximum(0, z) | Hidden layer activation |
| Softmax | $e^{z_i} / \sum_j e^{z_j}$ | Multi-class output |
| Cross-Entropy | $-\frac{1}{m}\sum\left[y \log \hat{y} + (1-y)\log(1-\hat{y})\right]$ | Classification loss |
| Z-score Normalisation | $(X - \mu) / \sigma$ | Feature preprocessing |
| Xavier Init | W ~ N(0, 2/(n_in + n_out)) | Sigmoid/Tanh layers |
| He Init | W ~ N(0, 2/n_in) | ReLU layers |
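The table above can be written as a handful of NumPy one-liners. This is a minimal sketch, not the chapter's reference implementation: the function names are illustrative, the softmax subtracts the row-wise maximum (a standard stability trick that leaves the result unchanged), the cross-entropy adds a small eps inside the logarithm, and the Xavier and He initialisers use the variances listed in the table.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    # Subtracting the row-wise max keeps np.exp from overflowing on large logits.
    shifted = z - z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def binary_cross_entropy(y, y_hat, eps=1e-8):
    # eps keeps np.log away from log(0).
    return -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))

def zscore(X):
    # Per-feature normalisation: statistics are taken over samples (axis=0).
    return (X - X.mean(axis=0)) / X.std(axis=0)

def xavier_init(rng, n_in, n_out):
    # Variance 2 / (n_in + n_out), intended for sigmoid/tanh layers.
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / (n_in + n_out))

def he_init(rng, n_in, n_out):
    # Variance 2 / n_in, intended for ReLU layers.
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)

rng = np.random.default_rng(42)
X = zscore(rng.standard_normal((64, 784)))   # synthetic batch of 64 samples
W, b = he_init(rng, 784, 10), np.zeros(10)
Z = X @ W + b                                # forward pass: (64, 784) @ (784, 10) + (10,)
probs = softmax(Z)
print(probs.shape)                           # (64, 10)
```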
What's Next?
In Chapter 4: The Perceptron & Single Neuron, we'll put these tools to work. You'll implement a single neuron that computes z = X @ w + b, applies sigmoid, and learns by gradient descent, all using the vectorised NumPy operations you mastered in this chapter. Every function you learned here (np.dot, np.exp, np.log, np.maximum) will be used in that implementation.
References & Further Reading
Official Documentation
- NumPy Documentation (2024). NumPy User Guide. numpy.org/doc
- Matplotlib Documentation (2024). Tutorials. matplotlib.org
- Pandas Documentation (2024). Getting Started. pandas.pydata.org
- Google Colab (2024). Welcome to Colab. colab.research.google.com
Textbooks
- VanderPlas, J. (2016). Python Data Science Handbook, Chapter 2 (NumPy). O'Reilly. Available free at jakevdp.github.io
- McKinney, W. (2022). Python for Data Analysis, 3rd Edition. O'Reilly.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning, Chapter 2 (Linear Algebra). MIT Press. deeplearningbook.org
Video Resources
- Ng, A. (2017). Vectorization (Coursera Deep Learning Specialization, Course 1, Week 2). Clear explanation of why vectorisation matters for neural networks.
- Corey Schafer. NumPy Tutorial (YouTube). Excellent 1-hour tutorial covering all basics.
- sentdex. Matplotlib Tutorial Series (YouTube). Comprehensive plotting guide.
Indian Context
- NPTEL. Python for Data Science, IIT Madras. Free video lectures with certification.
- NPTEL. Deep Learning, Prof. Mitesh Khapra, IIT Madras. NumPy-based implementations in Weeks 1-4.
- IPL Complete Dataset. Kaggle. kaggle.com (IPL)
- Analytics Vidhya. NumPy Tutorial for Beginners. India-focused data science blog with practical examples.