Chapter 3 · Part I: Foundations

Python and NumPy for Deep Learning — Zero to Productive

⏱ 2.5 hours reading 📄 ~10,000 words 💻 Hands-on code chapter

This chapter is your workbench setup. Every neural network you build — from a single-neuron perceptron to a 100-layer ResNet — will be written on top of NumPy arrays, vectorised operations, and matplotlib visualisations. Master these tools now and every subsequent chapter becomes dramatically easier.

Remember: NumPy API & Syntax
Understand: Broadcasting Rules
Apply: Vectorised Computation
Analyze: Loop vs Vector Perf
Evaluate: Tool Choices
Create: IPL Data Analysis

Prerequisites

Basic Python — variables, lists, for-loops, functions, dictionaries
Class 11–12 Mathematics (CBSE/ISC) — basic algebra, simple plotting
Chapter 2 of this textbook (mathematical notation familiarity is helpful but not required)

Learning Objectives

By the end of this chapter, you will be able to:

Create, index, slice, and reshape NumPy arrays with confidence
Explain and apply NumPy broadcasting rules to eliminate explicit loops
Demonstrate that vectorised code runs tens to hundreds of times faster than equivalent Python for-loops
Use np.dot, np.sum, np.exp/np.log, np.maximum, np.random, and np.argmax/np.where/np.clip: the six function families that power every DL model
Plot loss curves, histograms, and decision boundaries with matplotlib
Load, inspect, and preprocess CSV datasets using Pandas
Set up Google Colab with GPU runtime, upload files, and install libraries

The Hook

🛠️ Know Your Tools

Before we build a neural network, we need our tools. Just as a carpenter knows their chisel, a deep learning practitioner must know NumPy cold.

Consider this: a single forward pass of 10,000 MNIST images through even one small layer (784 inputs, 10 outputs) requires roughly 80 million multiply-add operations (10,000 × 784 × 10). Written as a Python for-loop, that takes on the order of 45 seconds. Written as a single NumPy matrix multiplication, it takes about 3 milliseconds: a speedup of roughly 15,000×.

This chapter gives you the fluency to write deep learning code that's both correct and fast. Every minute invested here pays compound interest across every remaining chapter.

India Connect

Data scientists at Flipkart, Zomato, and Jio use NumPy and Pandas daily — from recommendation engines serving 500 million users to demand forecasting for ₹50,000 crore supply chains. Indian tech interviews at these companies routinely test NumPy fluency. This chapter is your preparation.

3.1 — NumPy Arrays: Creation, Indexing, Slicing, Reshaping

A NumPy array (formally numpy.ndarray) is the fundamental data structure of scientific Python. Unlike Python lists, NumPy arrays are homogeneous (all elements same type), stored in contiguous memory, and support element-wise operations without loops.

Why Not Python Lists?

A Python list of 1 million floats stores 1 million pointers to 1 million separate objects scattered across memory. A NumPy array stores 1 million floats in a single, contiguous block of 8 MB. The result: NumPy is 50–200× faster for numerical computation due to cache locality, SIMD instructions, and C-level loops.
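To see the memory difference for yourself, here is a quick sketch that measures both layouts. It assumes 64-bit CPython, where sys.getsizeof reports about 24 bytes per float object. The exact list total varies slightly between Python versions, and getsizeof only approximates real memory use.

```python
import sys
import numpy as np

n = 1_000_000
py_list = [float(i) for i in range(n)]       # list of separate Python float objects
np_arr = np.arange(n, dtype=np.float64)      # one contiguous float64 buffer

# The list itself stores pointers; every float is its own object elsewhere in memory
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_arr.nbytes                  # size of the raw data buffer only

print(f"List  : ~{list_bytes / 1e6:.1f} MB")  # roughly 4x the array on 64-bit CPython
print(f"NumPy : {array_bytes / 1e6:.1f} MB")  # 8.0 MB
print(np_arr.flags['C_CONTIGUOUS'])          # True
```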

Creating Arrays

# 1. From Python lists
import numpy as np

a = np.array([1, 2, 3, 4])               # 1D — shape (4,)
print(a.shape, a.dtype)                     # (4,) int64  (int32 on Windows with NumPy < 2.0)

b = np.array([[1, 2, 3],
              [4, 5, 6]])                  # 2D — shape (2, 3)
print(b.shape)                               # (2, 3)

# 2. Built-in constructors
zeros  = np.zeros((3, 4))                   # 3×4 matrix of 0.0
ones   = np.ones((2, 5))                    # 2×5 matrix of 1.0
eye    = np.eye(3)                           # 3×3 identity matrix
rng    = np.arange(0, 10, 2)                # [0, 2, 4, 6, 8]
lin    = np.linspace(0, 1, 5)               # [0.0, 0.25, 0.5, 0.75, 1.0]
rand   = np.random.randn(3, 4)              # 3×4 standard normal

# 3. Specifying dtype (critical for DL)
w = np.zeros((784, 128), dtype=np.float32)  # 32-bit saves GPU memory
print(w.dtype, w.nbytes)                     # float32 401408  (784 × 128 × 4 bytes ≈ 400 KB)
Python

Pro Tip: Always Check Shape

The single most useful debugging habit in deep learning: print(x.shape) after every operation. Shape mismatches are among the most common NumPy bugs, so make it muscle memory.

Indexing and Slicing

X = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])

# Basic indexing (row, col) — 0-indexed
print(X[0, 2])       # 30  — row 0, col 2
print(X[2, 1])       # 80  — row 2, col 1

# Slicing — [start:stop:step]  (stop is exclusive)
print(X[0, :])       # [10 20 30]  — entire first row
print(X[:, 1])       # [20 50 80]  — entire second column
print(X[0:2, 1:3])  # [[20 30]    — top-right 2×2 submatrix
                     #  [50 60]]

# Boolean indexing — incredibly useful for filtering
mask = X > 40
print(mask)
# [[False False False]
#  [False  True  True]
#  [ True  True  True]]
print(X[mask])        # [50 60 70 80 90]

# Fancy indexing — select specific rows
print(X[[0, 2]])     # [[10 20 30]  — rows 0 and 2
                     #  [70 80 90]]
Python

Reshaping

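Reshaping is one of the most common operations you'll perform in deep learning, because every layer expects inputs in a specific shape. Whenever the memory layout allows it (as it does for a freshly created, contiguous array), reshaping does not copy data: it returns a new view of the same memory. If the array is not contiguous, for example after a transpose, .reshape() has to make a copy.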

a = np.arange(12)             # [0, 1, 2, ..., 11]  shape: (12,)

# Reshape to 2D
b = a.reshape(3, 4)           # shape: (3, 4)
c = a.reshape(4, 3)           # shape: (4, 3)
d = a.reshape(2, 2, 3)        # shape: (2, 2, 3) — 3D!

# Use -1 to auto-infer one dimension
e = a.reshape(3, -1)           # shape: (3, 4) — NumPy infers 4
f = a.reshape(-1, 1)           # shape: (12, 1) — column vector

# CRITICAL for DL: flatten a batch of images
images = np.random.randn(64, 28, 28)   # 64 images, 28×28 pixels
flat   = images.reshape(64, -1)         # shape: (64, 784)
print(flat.shape)                        # (64, 784) — ready for dense layer

# Transpose
W = np.random.randn(784, 128)
print(W.T.shape)                         # (128, 784)
Python

Reshape vs Resize

np.resize() will silently repeat your data to fill a larger shape, which is almost never what you want. Always use .reshape(). If the total number of elements doesn't match, .reshape() raises a ValueError, which is the correct behaviour.
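A quick way to see the difference is to ask both functions for a shape that doesn't fit the data. Here is a minimal sketch:

```python
import numpy as np

a = np.arange(4)                      # [0 1 2 3], only 4 elements

# np.resize silently recycles elements to fill all 6 slots
r = np.resize(a, (2, 3))
print(r)
# [[0 1 2]
#  [3 0 1]]

# .reshape refuses, because 4 elements cannot fill a (2, 3) shape
reshape_failed = False
try:
    a.reshape(2, 3)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```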

Memory Views

a.reshape(3, 4) does not copy data. It creates a new view — a different "lens" on the same block of memory. This is why NumPy is fast: you can reshape a 100 MB array in microseconds, because no bytes are moved.
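You can check whether two arrays share a buffer with np.shares_memory. The sketch below shows a reshape that is a view. It then shows a case where NumPy has to copy: flattening a transposed array, whose memory is no longer laid out row by row.

```python
import numpy as np

a = np.arange(12)                 # contiguous 1-D array
b = a.reshape(3, 4)               # a view: no bytes are copied
print(np.shares_memory(a, b))     # True

b[0, 0] = 100                     # writing through the view...
print(a[0])                       # 100  ...changes the original array

# b.T is not C-contiguous, so flattening it in row-major order
# cannot be expressed as a view: NumPy makes a copy instead
t = b.T
flat = t.reshape(-1)
print(np.shares_memory(t, flat))  # False
```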

3.2 — Broadcasting: The Most Important NumPy Concept for DL

Broadcasting is the mechanism that allows NumPy to perform element-wise operations on arrays of different shapes. Without broadcasting, you'd need explicit for-loops for most neural network computations. It is, without exaggeration, the single most important NumPy concept for deep learning.

The Broadcasting Rules (Memorise These)

When operating on two arrays, NumPy compares their shapes element-wise, starting from the trailing (rightmost) dimension. Two dimensions are compatible when:

They are equal, OR
One of them is 1

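If one array has fewer dimensions, NumPy first pads its shape with 1s on the left, so (5,) is treated as (1, 5). When every pair of dimensions is compatible, the smaller array is "broadcast" (virtually stretched) across the larger array; if any pair is incompatible, NumPy raises a ValueError. No data is copied: NumPy gives the stretched dimension a stride of 0, so every position along it reads the same memory.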
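The rules are mechanical enough to write down as a small function. The sketch below is a hand-written illustration: broadcast_shape is a hypothetical helper, not a NumPy function. It checks itself against np.broadcast_shapes (available in NumPy 1.20 and later) and uses np.broadcast_to to show the zero stride behind the "virtual stretch".

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply NumPy's broadcasting rules by hand to two shape tuples."""
    # Pad the shorter shape with 1s on the left; this aligns trailing dimensions
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):              # compare dimension by dimension
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)

# Compare with NumPy's own implementation
for sa, sb in [((1, 3), (3, 1)), ((64, 128), (1, 128)), ((1000, 5), (5,)), ((3,), ())]:
    assert broadcast_shape(sa, sb) == np.broadcast_shapes(sa, sb)
    print(sa, "+", sb, "->", broadcast_shape(sa, sb))

# The "stretch" is virtual: the broadcast view has stride 0 along the new axis
row = np.array([1, 2, 3])
stretched = np.broadcast_to(row, (4, 3))
print(stretched.strides)                  # first stride is 0
```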

Example 1: Scalar + Array

a = np.array([1, 2, 3])   # shape: (3,)
b = 10                       # shape: ()  — scalar

print(a + b)                 # [11 12 13]
# The scalar 10 is "broadcast" to [10, 10, 10]
Python

Example 2: Row Vector + Column Vector → Matrix (Outer Operation)

row = np.array([[1, 2, 3]])     # shape: (1, 3)
col = np.array([[10],
                [20],
                [30]])           # shape: (3, 1)

result = row + col               # shape: (3, 3)
print(result)
# [[11 12 13]
#  [21 22 23]
#  [31 32 33]]
Python

Broadcasting: (1, 3) + (3, 1) → (3, 3)

row = [1 2 3]        broadcast →  [1 2 3]
                                  [1 2 3]
                                  [1 2 3]

col = [10]           broadcast →  [10 10 10]
      [20]                        [20 20 20]
      [30]                        [30 30 30]

result = row + col             →  [11 12 13]
                                  [21 22 23]
                                  [31 32 33]

Example 3: Neural Network Bias Addition (The DL Use Case)

In a neural network, we compute Z = X @ W + b where X is (m, n), W is (n, k), and b is (1, k) or (k,). The bias b is broadcast across all m samples:

m = 64    # batch size
n = 784   # input features (28×28 flattened)
k = 128   # hidden units

X = np.random.randn(m, n)    # (64, 784)
W = np.random.randn(n, k)    # (784, 128)
b = np.zeros((1, k))         # (1, 128)

Z = X @ W + b                # (64, 128) — b broadcast across 64 rows!
print(Z.shape)                # (64, 128)

# Without broadcasting, you'd need:
# Z = np.zeros((m, k))
# for i in range(m):
#     Z[i] = X[i] @ W + b     ← SLOW!
Python

Example 4: Normalisation (Zero-Mean, Unit-Variance)

# Normalise each feature (column) independently
X = np.random.randn(1000, 5) * 10 + 50   # 1000 samples, 5 features

mean = X.mean(axis=0)                    # shape: (5,)
std  = X.std(axis=0)                     # shape: (5,)

X_norm = (X - mean) / std                # Broadcasting! (1000,5) - (5,) → (1000,5)

print(X_norm.mean(axis=0))                # ≈ [0, 0, 0, 0, 0]
print(X_norm.std(axis=0))                 # ≈ [1, 1, 1, 1, 1]
Python

Example 5: Softmax (Used in Every Classification Network)

def softmax(z):
    """Numerically stable softmax — uses broadcasting throughout."""
    z_shifted = z - np.max(z, axis=1, keepdims=True)  # (m, k) - (m, 1)
    exp_z = np.exp(z_shifted)                           # element-wise
    return exp_z / np.sum(exp_z, axis=1, keepdims=True) # (m, k) / (m, 1)

logits = np.random.randn(4, 3)   # 4 samples, 3 classes
probs = softmax(logits)
print(probs.sum(axis=1))           # [1. 1. 1. 1.] — each row sums to 1
Python

keepdims=True Is Your Best Friend

When using np.sum(), np.mean(), or np.max() with an axis argument, always consider keepdims=True. It preserves the reduced dimension as size 1, making subsequent broadcasting operations work correctly. Without it, the shape drops a dimension and broadcasting may silently produce wrong results.
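Here is what "silently wrong" looks like on a small example. For a square (3, 3) array, subtracting a (3,) row maximum does not raise an error, because (3,) broadcasts as a row. The result is simply subtracted along the wrong axis.

```python
import numpy as np

z = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])               # (3, 3): 3 samples, 3 classes

row_max = z.max(axis=1)                       # shape (3,)   -> [3. 6. 9.]
row_max_kd = z.max(axis=1, keepdims=True)     # shape (3, 1)

wrong = z - row_max      # (3,3) - (3,): subtracts [3, 6, 9] from every ROW
right = z - row_max_kd   # (3,3) - (3,1): subtracts each row's own maximum

print(wrong)
# [[-2. -4. -6.]
#  [ 1. -1. -3.]
#  [ 4.  2.  0.]]
print(right)
# [[-2. -1.  0.]
#  [-2. -1.  0.]
#  [-2. -1.  0.]]
```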

Shape (3,) vs Shape (3, 1) vs Shape (1, 3)

These are three different shapes that broadcast differently:

(3,) — 1D array, broadcasts like a row when added to a 2D array
(3, 1) — 2D column vector, broadcasts across columns
(1, 3) — 2D row vector, broadcasts across rows

Use .reshape(-1, 1) to convert (3,) into (3, 1). This one call resolves a large share of broadcasting bugs.
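A classic version of this bug appears in loss functions. Predictions come out of a layer as an (m, 1) column, while labels are stored as an (m,) array. The sketch below uses made-up values to show the shapes involved.

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])            # labels, shape (4,)
y_pred = np.array([[0.9], [0.2], [0.7], [0.6]])    # model output column, shape (4, 1)

# Bug: (4, 1) - (4,) broadcasts to (4, 4), and no error is raised
bad = y_pred - y_true
print(bad.shape)       # (4, 4)

# Fix: make both column vectors (or both 1-D) before combining them
good = y_pred - y_true.reshape(-1, 1)
print(good.shape)      # (4, 1)

mse_bad = np.mean(bad ** 2)      # a plausible-looking but meaningless number
mse_good = np.mean(good ** 2)    # the intended mean squared error: 0.075
```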

3.3 — Vectorization vs For-Loops: The 100× Speedup

Vectorization means replacing explicit Python for-loops with NumPy array operations that execute in compiled C code. This is not a micro-optimisation — it's the difference between training taking 5 minutes vs 8 hours.

Why Is Python Slow at Loops?

For each iteration of a Python for-loop, the interpreter must: (1) check variable types, (2) look up the + operator, (3) create a new Python float object, (4) store the result. NumPy does this dispatch once per call at the C level and then processes millions of elements in a tight, compiled loop, often using SIMD instructions. The interpreter overhead is paid once per operation, not once per element.

Benchmark: Dot Product

import numpy as np
import time

n = 1_000_000
a = np.random.randn(n)
b = np.random.randn(n)

# ---- Method 1: Python for-loop ----
start = time.perf_counter()          # high-resolution timer, suited to benchmarks
dot_loop = 0.0
for i in range(n):
    dot_loop += a[i] * b[i]
loop_time = time.perf_counter() - start

# ---- Method 2: NumPy vectorised ----
start = time.perf_counter()
dot_np = np.dot(a, b)
np_time = time.perf_counter() - start

print(f"For-loop: {loop_time*1000:.1f} ms")
print(f"NumPy:    {np_time*1000:.2f} ms")
print(f"Speedup:  {loop_time/np_time:.0f}×")
Python

Example output (timings vary by machine):
For-loop: 312.4 ms
NumPy:    1.23 ms
Speedup:  254×

Benchmark: Element-wise Operations

X = np.random.randn(1000, 1000)

# For-loop: square each element
start = time.perf_counter()
result_loop = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        result_loop[i, j] = X[i, j] ** 2
loop_time = time.perf_counter() - start

# Vectorised
start = time.perf_counter()
result_vec = X ** 2
vec_time = time.perf_counter() - start

print(f"For-loop: {loop_time*1000:.1f} ms")
print(f"Vectorised: {vec_time*1000:.2f} ms")
print(f"Speedup: {loop_time/vec_time:.0f}×")
Python

Example output (timings vary by machine):
For-loop:   487.3 ms
Vectorised: 1.92 ms
Speedup:    254×

The Rule of NumPy

If you see a for-loop iterating over array elements in your deep learning code, there's almost certainly a vectorised alternative. The only exceptions are: iterating over training epochs, iterating over mini-batches, and certain sequential operations like RNNs (even those can be partially vectorised).
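The two legitimate loops from the rule above, over epochs and over mini-batches, look like this in practice. This is a sketch on random, hypothetical data. The line computing batch_mean stands in for the vectorised forward and backward pass you will write in later chapters.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))     # hypothetical dataset: 1000 samples, 20 features
y = rng.integers(0, 2, size=1000)       # hypothetical binary labels

batch_size = 64
n_epochs = 3
batch_sizes = []                        # recorded only to show how batching works

for epoch in range(n_epochs):                      # loop 1: epochs (legitimate)
    order = rng.permutation(len(X))                # reshuffle the samples every epoch
    for start in range(0, len(X), batch_size):     # loop 2: mini-batches (legitimate)
        idx = order[start:start + batch_size]
        X_batch, y_batch = X[idx], y[idx]          # fancy indexing, no per-sample loop
        batch_mean = X_batch.mean(axis=0)          # stand-in for a vectorised forward pass
        batch_sizes.append(len(idx))

print(len(batch_sizes), batch_sizes[-1])           # 48 40
```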

3.4 — The Six Function Families That Power Deep Learning

You can build a complete feedforward neural network from scratch using just six categories of NumPy functions. Let's master each one.

Family 1: `np.dot` / `@` — Matrix Multiplication

# Vector dot product
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a, b))            # 32  (1×4 + 2×5 + 3×6)

# Matrix multiplication — the core of forward pass
X = np.random.randn(64, 784)   # 64 samples, 784 features
W = np.random.randn(784, 128)  # weight matrix

Z = X @ W                      # shape: (64, 128)
Z = np.dot(X, W)                # identical to X @ W

# CAUTION: * is element-wise, @ is matrix multiply
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A * B)    # [[ 5 12] [21 32]]  — element-wise (Hadamard)
print(A @ B)    # [[19 22] [43 50]]  — matrix multiplication
Python

Family 2: `np.sum` — Reduction Along Axes

X = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape: (2, 3)

print(np.sum(X))                 # 21         — sum everything
print(np.sum(X, axis=0))        # [5  7  9]  — sum across rows → shape (3,)
print(np.sum(X, axis=1))        # [6  15]    — sum across cols → shape (2,)
print(np.sum(X, axis=1, keepdims=True))
# [[ 6]                        — shape (2, 1) ← for broadcasting
#  [15]]

# Also: np.mean(), np.max(), np.min(), np.std() — same axis logic
Python

Understanding axis in NumPy:

axis=0: collapse ROWS (operate "downward")
┌───────────┐       ┌───────────┐
│ 1   2   3 │       │           │
│ ↓   ↓   ↓ │   →   │ 5   7   9 │   shape: (3,)
│ 4   5   6 │       │           │
└───────────┘       └───────────┘

axis=1: collapse COLUMNS (operate "rightward")
┌───────────┐       ┌────┐
│ 1 → 2 → 3 │   →   │  6 │
│ 4 → 5 → 6 │       │ 15 │   shape: (2,)
└───────────┘       └────┘

Family 3: `np.exp`, `np.log` — Activation & Loss Functions

# Sigmoid activation (used in binary classification)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-2, -1, 0, 1, 2])
print(sigmoid(z))
# [0.119  0.269  0.5  0.731  0.881]

# Binary cross-entropy loss
def binary_cross_entropy(y_true, y_pred):
    """y_true: (m,), y_pred: (m,) — both between 0 and 1."""
    epsilon = 1e-8  # avoid log(0)
    return -np.mean(
        y_true * np.log(y_pred + epsilon) +
        (1 - y_true) * np.log(1 - y_pred + epsilon)
    )

y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.1, 0.8, 0.95])
print(f"Loss: {binary_cross_entropy(y, p):.4f}")   # Loss: 0.1213
Python

Family 4: `np.maximum` — ReLU Activation

# ReLU: f(x) = max(0, x)
def relu(z):
    return np.maximum(0, z)

z = np.array([-3, -1, 0, 2, 5])
print(relu(z))         # [0 0 0 2 5]

# Leaky ReLU: f(x) = max(αx, x)
def leaky_relu(z, alpha=0.01):
    return np.maximum(alpha * z, z)

print(leaky_relu(z))   # [-0.03 -0.01  0.  2.  5.]

# IMPORTANT: np.maximum vs np.max
# np.maximum(a, b) — element-wise max of two arrays
# np.max(a)        — single maximum value in array
Python

Family 5: `np.random` — Weight Initialisation

# Set seed for reproducibility
np.random.seed(42)

# Standard normal (mean=0, std=1)
W1 = np.random.randn(784, 128)

# Xavier/Glorot initialisation (recommended for sigmoid/tanh)
fan_in, fan_out = 784, 128
W2 = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / (fan_in + fan_out))

# He initialisation (recommended for ReLU)
W3 = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

# Uniform random
W4 = np.random.uniform(-0.5, 0.5, size=(784, 128))

# Random integers (useful for sampling mini-batches)
indices = np.random.choice(10000, size=64, replace=False)  # 64 random indices

print(f"Xavier std: {W2.std():.4f}")   # ≈ 0.0468  (theory: sqrt(2/912))
print(f"He std:     {W3.std():.4f}")   # ≈ 0.0505  (theory: sqrt(2/784))
Python
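NumPy's documentation recommends the Generator API (np.random.default_rng) for new code. The legacy np.random.seed / np.random.randn functions used above still work. Here is a sketch of the same initialisations written with a Generator, using the layer sizes from above.

```python
import numpy as np

rng = np.random.default_rng(42)          # independent, seedable generator

fan_in, fan_out = 784, 128
W_he = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)       # He init
W_xavier = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))
W_unif = rng.uniform(-0.5, 0.5, size=(fan_in, fan_out))
batch_idx = rng.choice(10000, size=64, replace=False)   # 64 distinct indices
perm = rng.permutation(10000)                           # shuffled 0..9999

print(f"He std:     {W_he.std():.4f}")      # close to sqrt(2/784) ≈ 0.0505
print(f"Xavier std: {W_xavier.std():.4f}")  # close to sqrt(2/912) ≈ 0.0468
```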

Family 6: `np.argmax`, `np.where`, `np.clip` — Utility Functions

# argmax — find predicted class
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])
predictions = np.argmax(probs, axis=1)   # [1, 0]

# where — conditional selection
x = np.array([-2, 3, -1, 5])
result = np.where(x > 0, x, 0)       # [0, 3, 0, 5]  — another way to write ReLU!

# clip — cap values (useful for numerical stability)
y_pred = np.array([0.0, 0.5, 1.0])
y_safe = np.clip(y_pred, 1e-7, 1 - 1e-7)  # avoid log(0)
print(y_safe)   # values: 1e-07, 0.5, 0.9999999
Python

The DL NumPy Cheat Sheet

Operation	NumPy	DL Use Case
Matrix multiply	`X @ W` or `np.dot(X, W)`	Forward pass
Element-wise multiply	`A * B`	Attention, gating
Sum along axis	`np.sum(X, axis=0)`	Gradient averaging
Exponential	`np.exp(z)`	Softmax, sigmoid
Logarithm	`np.log(p)`	Cross-entropy loss
Element-wise max	`np.maximum(0, z)`	ReLU activation
Random normal	`np.random.randn(m, n)`	Weight initialisation
Argmax	`np.argmax(probs, axis=1)`	Predicted class
Clip	`np.clip(p, 1e-7, 1-1e-7)`	Numerical stability
Transpose	`W.T`	Backpropagation
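To see how the six families fit together, here is a sketch of a forward pass through a tiny, untrained two-layer classifier, followed by the loss and accuracy. The data, layer sizes and labels are hypothetical placeholders chosen to match the MNIST shapes used in this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mini-batch: 8 flattened 28x28 images, 10 classes
X = rng.standard_normal((8, 784))
y = rng.integers(0, 10, size=8)

# Family 5: He-initialised weights; biases start at zero
W1 = rng.standard_normal((784, 128)) * np.sqrt(2.0 / 784)
b1 = np.zeros((1, 128))
W2 = rng.standard_normal((128, 10)) * np.sqrt(2.0 / 128)
b2 = np.zeros((1, 10))

# Families 1 + 4: affine layer, then ReLU (bias broadcast over the batch)
H = np.maximum(0, X @ W1 + b1)                     # (8, 128)

# Families 1 + 2 + 3: logits, then numerically stable softmax
logits = H @ W2 + b2                               # (8, 10)
logits -= np.max(logits, axis=1, keepdims=True)
exp_l = np.exp(logits)
probs = exp_l / np.sum(exp_l, axis=1, keepdims=True)

# Families 3 + 6: cross-entropy with clipping, then predicted classes
p_true = np.clip(probs[np.arange(len(y)), y], 1e-7, 1.0)
loss = -np.mean(np.log(p_true))
preds = np.argmax(probs, axis=1)
accuracy = np.mean(preds == y)

print(H.shape, probs.shape)      # (8, 128) (8, 10)
print(f"loss={loss:.3f}, accuracy={accuracy:.2f}")
```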

3.5 — Matplotlib Basics: Plots Every DL Practitioner Needs

Visualisation is not optional in deep learning. You must plot your loss curve to know if training is working. You must visualise data distributions to catch preprocessing bugs. Here are the three plots you'll use most.

Plot 1: Loss Curve

import matplotlib.pyplot as plt
import numpy as np

# Simulate training loss (exponential decay + noise)
epochs = np.arange(1, 101)
train_loss = 2.5 * np.exp(-0.03 * epochs) + 0.1 * np.random.randn(100) * np.exp(-0.02 * epochs)
val_loss   = 2.5 * np.exp(-0.025 * epochs) + 0.15 * np.random.randn(100) * np.exp(-0.015 * epochs)

plt.figure(figsize=(8, 5))
plt.plot(epochs, train_loss, label='Train Loss', color='#7c3aed', linewidth=2)
plt.plot(epochs, val_loss,   label='Val Loss',   color='#f59e0b', linewidth=2)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Progress — MNIST Classifier')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig('loss_curve.png', dpi=150)
plt.show()
Python

Plot 2: Histogram (Data Distribution)

# Visualise weight initialisation distributions
W_normal = np.random.randn(10000)
W_xavier = np.random.randn(10000) * np.sqrt(2/1000)   # Xavier std, assuming fan_in + fan_out = 1000

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].hist(W_normal, bins=50, color='#7c3aed', alpha=0.7, edgecolor='white')
axes[0].set_title('Standard Normal (std=1.0)')
axes[0].set_xlabel('Weight Value')

axes[1].hist(W_xavier, bins=50, color='#10b981', alpha=0.7, edgecolor='white')
axes[1].set_title(f'Xavier (std={W_xavier.std():.3f})')
axes[1].set_xlabel('Weight Value')

plt.tight_layout()
plt.show()
Python

Plot 3: Decision Boundary (2D Classification)

def plot_decision_boundary(X, y, predict_fn, title="Decision Boundary"):
    """Plot 2D decision boundary for a binary classifier."""
    h = 0.02  # mesh step size
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1

    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    grid_points = np.c_[xx.ravel(), yy.ravel()]   # shape: (N, 2)
    Z = predict_fn(grid_points).reshape(xx.shape)

    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, alpha=0.4, cmap='RdYlBu')
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='RdYlBu',
                edgecolors='black', s=30)
    plt.title(title)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.show()

# Usage (with a dummy classifier):
# X = np.random.randn(200, 2)
# y = (X[:, 0] + X[:, 1] > 0).astype(int)
# plot_decision_boundary(X, y, lambda x: (x[:, 0] + x[:, 1] > 0).astype(int))
Python

Plotting Best Practices for DL

Always label axes — unlabelled plots are useless in reports
Use plt.tight_layout() — prevents labels from being cut off
Save with dpi=150 — good balance of quality vs file size
Use plt.grid(alpha=0.3) — subtle gridlines aid reading
Consistent colours — use the same colour for train/val across all plots

3.6 — Pandas: Loading and Preprocessing Data

Before data reaches a neural network, it typically lives in a CSV, database, or API. Pandas is the bridge between raw data and NumPy arrays. You don't need to master Pandas for deep learning — you need a survival kit.

Loading and Inspecting a CSV

import pandas as pd

# Load a dataset (e.g., house prices)
df = pd.read_csv('mumbai_house_prices.csv')

# Quick inspection
print(df.shape)               # (5000, 8) — 5000 rows, 8 columns
print(df.head())               # first 5 rows
print(df.dtypes)               # column data types
print(df.describe())           # mean, std, min, max per column
print(df.isnull().sum())       # count missing values per column
Python

Basic Preprocessing

import numpy as np

# Select features and target
features = ['area_sqft', 'bedrooms', 'floor', 'age_years']
target   = 'price_lakhs'

# Handle missing values BEFORE converting to NumPy, otherwise X keeps the NaNs
# (plain assignment instead of inplace=True avoids pandas chained-assignment issues)
df['age_years'] = df['age_years'].fillna(df['age_years'].median())

X = df[features].values         # Convert to NumPy array, shape: (5000, 4)
y = df[target].values           # shape: (5000,)

# Normalise features
# (a stricter pipeline computes mean/std on the training split only)
X_mean = X.mean(axis=0)
X_std  = X.std(axis=0)
X_norm = (X - X_mean) / X_std   # Broadcasting!

# Train/test split (80/20)
np.random.seed(42)
indices = np.random.permutation(len(X_norm))
split   = int(0.8 * len(X_norm))

X_train, X_test = X_norm[indices[:split]], X_norm[indices[split:]]
y_train, y_test = y[indices[:split]], y[indices[split:]]

print(f"Train: {X_train.shape}, Test: {X_test.shape}")
# Train: (4000, 4), Test: (1000, 4)
Python
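The block above computes the mean and standard deviation on all 5000 rows before splitting, which lets test-set statistics leak into training. A common stricter pattern fits the statistics on the training split only and reuses them for the test split. The sketch below shows it on hypothetical random data. The eps floor is an assumption, not part of the original pipeline; it keeps a constant feature from causing division by zero.

```python
import numpy as np

def fit_standardiser(X_train, eps=1e-8):
    """Compute per-feature mean and std from the TRAINING split only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    return mu, np.maximum(sigma, eps)       # guard against constant features

def apply_standardiser(X, mu, sigma):
    return (X - mu) / sigma                 # broadcasting: (m, n) - (n,)

rng = np.random.default_rng(42)
X = rng.normal(50, 10, size=(5000, 4))      # hypothetical raw features
X[:, 3] = 7.0                               # one constant column, to exercise the guard

order = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, X_test = X[order[:split]], X[order[split:]]

mu, sigma = fit_standardiser(X_train)
X_train_n = apply_standardiser(X_train, mu, sigma)
X_test_n = apply_standardiser(X_test, mu, sigma)    # test reuses train statistics
```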

One-Hot Encoding Categorical Variables

# Example: encoding city names for neural network input
cities = pd.Series(['Mumbai', 'Delhi', 'Bengaluru', 'Mumbai', 'Delhi'])

# Method 1: Pandas get_dummies
one_hot = pd.get_dummies(cities, prefix='city', dtype=int)   # dtype=int gives 0/1 (pandas 2.x defaults to True/False)
print(one_hot)
#    city_Bengaluru  city_Delhi  city_Mumbai
# 0              0           0            1
# 1              0           1            0
# 2              1           0            0
# 3              0           0            1
# 4              0           1            0

# Convert to NumPy
X_cities = one_hot.values     # shape: (5, 3)
Python
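The same encoding can be done in pure NumPy, which is handy for class labels that feed a softmax output. This sketch uses np.unique(..., return_inverse=True) to map each string to an integer code, then picks rows of an identity matrix. At prediction time, reuse the categories learned from the training data rather than calling np.unique again on new data.

```python
import numpy as np

cities = np.array(['Mumbai', 'Delhi', 'Bengaluru', 'Mumbai', 'Delhi'])

# categories come back sorted; codes[i] is the position of cities[i] in categories
categories, codes = np.unique(cities, return_inverse=True)
one_hot = np.eye(len(categories), dtype=np.int64)[codes]   # one identity row per sample

print(categories)   # ['Bengaluru' 'Delhi' 'Mumbai']
print(one_hot)
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]
#  [0 0 1]
#  [0 1 0]]
```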

Indian Datasets to Practice With

Kaggle hosts several excellent Indian datasets: IPL ball-by-ball data, Zomato restaurant reviews, Indian census data, NIFTY stock prices, Swiggy delivery times. Search "India" on Kaggle and sort by most votes. These datasets are perfect for building your Pandas and NumPy muscles.

3.7 — Google Colab: Your Free GPU Playground

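Google Colab is a free Jupyter notebook environment with GPU access. For this textbook, Colab is the recommended environment: no installation is required, it works on any laptop (even a ₹25,000 budget laptop), and the free tier usually provides an NVIDIA T4 GPU, which is sufficient for all exercises. Free GPU access depends on availability and usage limits, so it is not guaranteed at every moment.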

Getting Started

1. Go to colab.research.google.com
2. Sign in with your Google account (any @gmail.com works)
3. Click New Notebook
4. Enable GPU: Runtime → Change runtime type → T4 GPU → Save

Verify GPU Access

# Run this in a Colab cell
import torch
print(f"GPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():        # get_device_name raises an error when no GPU is attached
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

# Check NumPy version
import numpy as np
print(f"NumPy: {np.__version__}")
Python

Example output (versions change as Colab updates its images):
GPU available: True
GPU name: Tesla T4
NumPy: 1.26.4

Uploading Files

# Method 1: Upload from local machine
from google.colab import files
uploaded = files.upload()   # Opens file picker dialog

# Method 2: Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Now access files like:
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/datasets/ipl_data.csv')

# Method 3: Download from URL
!wget https://example.com/dataset.csv -O /content/dataset.csv
Python

Installing Additional Libraries

# Colab pre-installs most ML libraries, but you can add more:
!pip install -q wandb          # experiment tracking
!pip install -q torchsummary   # model architecture viewer
Python

Colab Sessions Expire!

Free Colab sessions disconnect after ~90 minutes of inactivity and have a maximum runtime of ~12 hours. Always save your notebook to Drive (File → Save a copy in Drive). For large experiments, save checkpoints to Drive periodically:

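# Save model checkpoint to Drive (assumes Drive is mounted and W is your weight array)
import os
os.makedirs('/content/drive/MyDrive/checkpoints', exist_ok=True)   # np.save won't create folders
np.save('/content/drive/MyDrive/checkpoints/weights.npy', W)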
Python
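For a model with several weight matrices, np.savez can store them all in one .npz file. The helpers below (save_checkpoint and load_checkpoint) are hypothetical names for this sketch. In Colab you would point them at a folder on your mounted Drive; the sketch uses a temporary directory so it runs anywhere.

```python
import os
import tempfile
import numpy as np

def save_checkpoint(path, **arrays):
    """Save several named arrays into one .npz file (hypothetical helper)."""
    os.makedirs(os.path.dirname(path), exist_ok=True)   # create the folder if needed
    np.savez(path, **arrays)

def load_checkpoint(path):
    """Load every array from a .npz file into a plain dict."""
    with np.load(path) as data:                         # NpzFile closes the file on exit
        return {name: data[name] for name in data.files}

# In Colab this might be '/content/drive/MyDrive/checkpoints/epoch_10.npz'
ckpt_dir = tempfile.mkdtemp()
path = os.path.join(ckpt_dir, "checkpoints", "epoch_10.npz")

W1 = np.random.randn(784, 128).astype(np.float32)
b1 = np.zeros(128, dtype=np.float32)
save_checkpoint(path, W1=W1, b1=b1)

restored = load_checkpoint(path)
print(sorted(restored))            # ['W1', 'b1']
```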

Colab Pro vs Free

The free tier is sufficient for all exercises in this textbook. Colab Pro (about ₹850/month at the time of writing; check current pricing) gives you access to faster GPUs such as the A100 (subject to availability), longer runtimes, and more RAM. Consider upgrading only if you're training large models for your final project or internship work.

Critical: Why Vectorization Matters

Let's make this concrete with the most important function in deep learning: sigmoid, computed over 1 million samples.

The Definitive Benchmark

Sigmoid Function

σ(z) = 1 / (1 + e^−z)

import numpy as np
import time

n = 1_000_000
z = np.random.randn(n)

# ───── METHOD 1: Python for-loop ─────
start = time.perf_counter()          # high-resolution timer, suited to benchmarks
result_loop = np.zeros(n)
for i in range(n):
    result_loop[i] = 1.0 / (1.0 + np.exp(-z[i]))
loop_time = time.perf_counter() - start

# ───── METHOD 2: NumPy vectorised ─────
start = time.perf_counter()
result_vec = 1.0 / (1.0 + np.exp(-z))
vec_time = time.perf_counter() - start

# Verify both give same result
print(f"Max difference: {np.max(np.abs(result_loop - result_vec)):.2e}")
print(f"For-loop:   {loop_time*1000:8.1f} ms")
print(f"Vectorised: {vec_time*1000:8.2f} ms")
print(f"Speedup:    {loop_time/vec_time:8.0f}×")
Python

Example output (timings vary by machine):
Max difference: 0.00e+00
For-loop:     2847.3 ms
Vectorised:      5.61 ms
Speedup:          507×

What This Means for Training

A single epoch of training on MNIST with a 2-layer network involves computing sigmoid ~1 million times. With for-loops: ~3 seconds per epoch × 100 epochs = 5 minutes. With vectorisation: ~0.006 seconds per epoch × 100 epochs = 0.6 seconds. Over the course of this textbook, you'll run thousands of training experiments. Vectorisation literally saves you days.

Common Vectorisation Patterns

# ❌ SLOW: for-loop              # ✅ FAST: vectorised
# -----------------------         # -----------------------
# for i in range(m):              # Z = X @ W + b
#   z = 0                          #
#   for j in range(n):            # A = sigmoid(Z)
#     z += X[i,j] * W[j]          #
#   z += b                         # L = -np.mean(y*np.log(A)
#   A[i] = sigmoid(z)             #       + (1-y)*np.log(1-A))
#   L += -(y[i]*log(A[i])         #
#       + (1-y[i])*log(1-A[i]))   # dW = X.T @ (A - y) / m
# L /= m                           #

# The vectorised version is also CLEARER and SHORTER!
Python
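To convince yourself that the two columns above compute the same thing, here is a runnable sketch on small random data. The sizes, seed and weights are arbitrary assumptions. It also computes the gradient line dW from the right-hand column.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
m, n = 200, 5                                   # hypothetical: 200 samples, 5 features
X = rng.standard_normal((m, n))
y = rng.integers(0, 2, size=m).astype(float)
W = rng.standard_normal(n) * 0.1
b = 0.5

# Loop version (left column)
A_loop = np.zeros(m)
L_loop = 0.0
for i in range(m):
    z = 0.0
    for j in range(n):
        z += X[i, j] * W[j]
    z += b
    A_loop[i] = sigmoid(z)
    L_loop += -(y[i] * np.log(A_loop[i]) + (1 - y[i]) * np.log(1 - A_loop[i]))
L_loop /= m

# Vectorised version (right column)
Z = X @ W + b                                   # (m,)
A = sigmoid(Z)
L = -np.mean(y * np.log(A) + (1 - y) * np.log(1 - A))
dW = X.T @ (A - y) / m                          # gradient of L with respect to W, shape (n,)

print(np.allclose(A_loop, A), np.isclose(L_loop, L))   # True True
```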

The Golden Rule

"Whenever you're tempted to write a for-loop over array indices, stop and think: can this be a matrix operation?" In the vast majority of cases, the answer is yes. The exceptions usually involve control flow (if/else per sample), and even those can often be replaced with np.where().

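As an example of replacing a per-sample if/else, the sketch below uses the Huber loss. It is chosen here for illustration and is not discussed elsewhere in this chapter. The loss is quadratic for small residuals and linear for large ones. Note that np.where evaluates both branches for every element before choosing, so each branch must be safe to compute everywhere.

```python
import numpy as np

def huber_loop(r, delta=1.0):
    """Per-sample if/else: the kind of loop the golden rule targets."""
    out = np.empty_like(r)
    for i in range(len(r)):
        if abs(r[i]) <= delta:
            out[i] = 0.5 * r[i] ** 2
        else:
            out[i] = delta * (abs(r[i]) - 0.5 * delta)
    return out

def huber_vec(r, delta=1.0):
    """Same branch logic with np.where (both branches are computed, then selected)."""
    abs_r = np.abs(r)
    return np.where(abs_r <= delta,
                    0.5 * r ** 2,
                    delta * (abs_r - 0.5 * delta))

residuals = np.array([-3.0, -0.5, 0.0, 0.8, 2.0])
print(huber_vec(residuals))   # values: 2.5, 0.125, 0.0, 0.32, 1.5
```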
Worked Example: End-to-End NumPy Data Pipeline

Problem: Predict Zomato Delivery Time

Given a simulated dataset of Zomato-style orders with columns [distance_km, rating, num_items, time_of_day, delivery_min], build a complete data preprocessing pipeline using only NumPy and Pandas.

Step 1: Load and inspect

import pandas as pd
import numpy as np

# Simulate Zomato delivery data
np.random.seed(42)
n = 5000

distance      = np.random.exponential(3, n) + 0.5
rating        = np.clip(np.random.normal(3.8, 0.5, n), 1, 5)
num_items     = np.random.randint(1, 8, n)
time_of_day   = np.random.randint(8, 24, n)
delivery_time = 10 + 3*distance - 2*rating + 1.5*num_items + np.random.randn(n)*5

df = pd.DataFrame({
    'distance_km': np.round(distance, 1),
    'rating': np.round(rating, 1),
    'num_items': num_items,
    'time_of_day': time_of_day,
    'delivery_min': np.round(delivery_time, 1)
})
print(df.head())
print(f"\nShape: {df.shape}")
print(f"Any nulls: {df.isnull().any().any()}")
Python

Step 2: Extract features and normalise

# Extract NumPy arrays
feature_cols = ['distance_km', 'rating', 'num_items', 'time_of_day']
X = df[feature_cols].values.astype(np.float64)   # (5000, 4)
y = df['delivery_min'].values                    # (5000,)

# Z-score normalisation (a stricter pipeline computes mu/sigma on the training split only)
mu    = X.mean(axis=0)    # shape: (4,)
sigma = X.std(axis=0)     # shape: (4,)
X_norm = (X - mu) / sigma  # Broadcasting: ((5000,4) - (4,)) / (4,)

print(f"Means after norm:  {X_norm.mean(axis=0).round(6)}")
print(f"Stds after norm:   {X_norm.std(axis=0).round(6)}")
Python

Step 3: Train/test split

np.random.seed(42)
indices = np.random.permutation(n)
split = int(0.8 * n)

X_train, X_test = X_norm[indices[:split]], X_norm[indices[split:]]
y_train, y_test = y[indices[:split]], y[indices[split:]]

print(f"Train: {X_train.shape}, Test: {X_test.shape}")
# Train: (4000, 4), Test: (1000, 4)
Python

Step 4: Add bias column and compute closed-form solution (preview of linear regression)

# Add column of ones for bias term
X_train_b = np.c_[np.ones(X_train.shape[0]), X_train]   # (4000, 5)
X_test_b  = np.c_[np.ones(X_test.shape[0]), X_test]     # (1000, 5)

# Normal equation: w = (X^T X)^{-1} X^T y
# Solving the linear system is more numerically stable than forming the inverse
w = np.linalg.solve(X_train_b.T @ X_train_b, X_train_b.T @ y_train)
print(f"Weights: {w.round(3)}")
# [bias, dist_coeff, rating_coeff, items_coeff, time_coeff]

# Predict and evaluate
y_pred = X_test_b @ w
rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
print(f"Test RMSE: {rmse:.2f} minutes")
Python

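Approximate output (exact digits depend on the random draw):
Weights ≈ [18.9  9.0  -1.0  3.0  0.0]
Test RMSE ≈ 5 minutes

Because the features were standardised, each weight is roughly the coefficient used to simulate the data multiplied by that feature's standard deviation: 3 × 3 ≈ 9 for distance, -2 × 0.5 ≈ -1 for rating, 1.5 × 2 ≈ 3 for items. Time of day has no effect in the simulation, so its weight is near 0. The bias is the mean delivery time (10 + 3 × 3.5 - 2 × 3.8 + 1.5 × 4 ≈ 18.9), and the RMSE lands close to the simulated noise level of 5 minutes.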

Notice: the entire pipeline — from data loading to prediction — used zero for-loops. Every operation was vectorised.

Case Study & Mini-Project: IPL Cricket Analytics

IPL Ball-by-Ball Data Analysis

The Indian Premier League (IPL) generates one of the richest sports datasets in the world — every ball bowled, every run scored, every wicket taken across 15+ seasons. Let's use NumPy, Pandas, and matplotlib to extract actionable insights.

Setup: Load the Dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

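# Download from Kaggle: "IPL Complete Dataset (2008-2024)"
# Or use: kaggle datasets download -d patrickb1912/ipl-complete-dataset-20082020
# Column names and over numbering (0- or 1-based) can differ between dataset
# versions (e.g. 'batsman' vs 'batter'): check the printed columns and adjust below.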
deliveries = pd.read_csv('deliveries.csv')

print(deliveries.shape)
print(deliveries.columns.tolist())
print(deliveries.head())
Python

Task 1: Compute Run Rate per Over

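# Group by match, innings and over, then sum runs (without 'inning', both
# teams' runs in the same over number would be added together)
over_runs = deliveries.groupby(['match_id', 'inning', 'over'])['total_runs'].sum()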

# Average runs per over across all matches
avg_runs_per_over = over_runs.groupby('over').mean()

# Plot
plt.figure(figsize=(10, 5))
plt.bar(avg_runs_per_over.index, avg_runs_per_over.values,
        color='#7c3aed', edgecolor='white')
plt.xlabel('Over Number')
plt.ylabel('Average Runs')
plt.title('IPL: Average Runs per Over (All Seasons)')
plt.xticks(range(1, 21))
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
Python

Task 2: Strike Rate Calculation Using NumPy

# Strike rate = (total runs / total balls faced) × 100
batsman_stats = deliveries.groupby('batsman').agg(
    total_runs=('batsman_runs', 'sum'),
    balls_faced=('batsman_runs', 'count')
).reset_index()

# Filter: minimum 500 balls faced
qualified = batsman_stats[batsman_stats['balls_faced'] >= 500]

# Compute strike rate using NumPy (vectorised!)
runs  = qualified['total_runs'].values    # shape: (N,)
balls = qualified['balls_faced'].values   # shape: (N,)
strike_rate = (runs / balls) * 100          # Broadcasting: scalar × array

# Top 10 strike rates
top_idx = np.argsort(strike_rate)[-10:][::-1]
print("Top 10 IPL Strike Rates (min 500 balls):")
for i in top_idx:
    name = qualified.iloc[i]['batsman']
    print(f"  {name:25s} SR: {strike_rate[i]:.1f}  Runs: {runs[i]}")
Python

Task 3: Score Progression Plot

# Pick a specific match and plot cumulative score
match_id = deliveries['match_id'].unique()[42]   # arbitrary match
match = deliveries[deliveries['match_id'] == match_id]

teams = match['batting_team'].unique()

plt.figure(figsize=(10, 5))
for team in teams:
    innings = match[match['batting_team'] == team]
    cumulative = np.cumsum(innings['total_runs'].values)
    balls = np.arange(1, len(cumulative) + 1)
    plt.plot(balls, cumulative, linewidth=2, label=team)

plt.xlabel('Ball Number')
plt.ylabel('Cumulative Score')
plt.title(f'IPL Match #{match_id} — Score Progression')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
Python

Task 4: Powerplay vs Death Overs Analysis

# Powerplay: overs 1-6, Middle: 7-15, Death: 16-20
def phase(over):
    if over <= 6: return 'Powerplay'
    elif over <= 15: return 'Middle'
    else: return 'Death'

deliveries['phase'] = deliveries['over'].apply(phase)

phase_stats = deliveries.groupby('phase').agg(
    avg_runs_per_ball=('total_runs', 'mean'),
    total_wickets=('is_wicket', 'sum')
)
print(phase_stats.round(3))
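# Output: one row per phase (Death, Middle, Powerplay, in alphabetical order)
# with columns avg_runs_per_ball and total_wickets. Exact values depend on
# which seasons your copy of the dataset covers.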
Python
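Using deliveries['over'].apply(phase) calls the Python function once per delivery. That is exactly the kind of per-element loop this chapter tries to avoid. A vectorised alternative uses np.select, which evaluates each condition as a whole-array comparison. The sketch below applies it to a small hypothetical array of over numbers. The helper name phase_vectorised is our own, and like phase() it assumes overs are numbered 1 to 20.

```python
import numpy as np

def phase_vectorised(overs):
    """Label each over as Powerplay / Middle / Death without a Python loop.

    Hypothetical helper; assumes overs are numbered 1-20, like phase().
    """
    overs = np.asarray(overs)
    conditions = [overs <= 6, overs <= 15]   # checked in order: first True wins
    choices = ['Powerplay', 'Middle']
    return np.select(conditions, choices, default='Death')

overs = np.array([1, 6, 7, 15, 16, 20])       # hypothetical sample of over numbers
print(phase_vectorised(overs))
# ['Powerplay' 'Powerplay' 'Middle' 'Middle' 'Death' 'Death']
```

On the real data this would be deliveries['phase'] = phase_vectorised(deliveries['over'].values). Another vectorised option is pd.cut(deliveries['over'], bins=[0, 6, 15, 20], labels=['Powerplay', 'Middle', 'Death']). Both assume over numbers start at 1.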

Mini-Project Extension Ideas

Build a heatmap: which bowler is most dangerous to which batsman?
Compute win probability at each ball using historical data (logistic regression preview!)
Predict total match score from powerplay score (linear regression preview!)
Find "clutch" players: who performs best in high-pressure death overs?

Common NumPy Bugs & Mistakes

Bug 1: Shape Mismatch in Matrix Multiplication

X = np.random.randn(64, 784)
W = np.random.randn(64, 128)   # WRONG shape!

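# X @ W → ValueError: matmul reports a core-dimension mismatch (size 64 is different from 784)
# np.dot(X, W) → ValueError: shapes (64,784) and (64,128) not aligned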
# FIX: W should be (784, 128), not (64, 128)
W = np.random.randn(784, 128)  # CORRECT
Z = X @ W                       # (64, 784) @ (784, 128) = (64, 128) ✓
Python

Rule: For A @ B, the inner dimensions must match: (m, n) @ (n, k) → (m, k).
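A habit that catches this bug early is checking the inner dimensions before multiplying. The helper below is a hypothetical debugging aid, not a NumPy function. It applies the rule above and either returns the output shape or raises an error that names both shapes.

```python
import numpy as np

def matmul_output_shape(a, b):
    """Hypothetical helper: return the shape of a @ b for 2-D arrays, or raise."""
    (m, n), (n2, k) = a.shape, b.shape
    if n != n2:
        raise ValueError(f"inner dimensions differ: {a.shape} @ {b.shape} ({n} != {n2})")
    return (m, k)

X = np.random.randn(64, 784)
W = np.random.randn(784, 128)
print(matmul_output_shape(X, W))   # (64, 128)
print((X @ W).shape)               # (64, 128), agrees with the helper
```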

Bug 2: Axis Confusion

X = np.random.randn(100, 5)    # 100 samples, 5 features

# WRONG: normalise across features (axis=1)
mean_wrong = X.mean(axis=1)     # shape: (100,) — averaged features per sample

# RIGHT: normalise each feature independently (axis=0)
mean_right = X.mean(axis=0)     # shape: (5,) — averaged samples per feature
X_norm = (X - mean_right) / X.std(axis=0)
Python

Rule: axis=0 collapses rows (operates "downward"), axis=1 collapses columns (operates "rightward"). Think: "axis=0 gives you column-wise results".
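A tiny array makes the rule easy to check by hand. Take X = [[0, 1, 2], [3, 4, 5]], which has 2 rows and 3 columns:

- axis=0 sums down each column.
- axis=1 sums across each row.
- keepdims=True keeps the collapsed axis as size 1, so the result still broadcasts against X.

```python
import numpy as np

X = np.arange(6).reshape(2, 3)      # [[0 1 2]
                                    #  [3 4 5]]
col_sums = X.sum(axis=0)            # collapse rows    -> [3 5 7], shape (3,)
row_sums = X.sum(axis=1)            # collapse columns -> [ 3 12], shape (2,)

# keepdims=True keeps a size-1 axis, so dividing by it broadcasts across each row
row_sums_kd = X.sum(axis=1, keepdims=True)   # shape (2, 1)
row_fractions = X / row_sums_kd              # each row now sums to 1
print(col_sums, row_sums, row_sums_kd.shape)
print(row_fractions.sum(axis=1))             # [1. 1.]
```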

Bug 3: Aliases, Views, and Copies

a = np.array([1, 2, 3])
b = a            # b is the SAME array object (an alias), not a copy!
b[0] = 99
print(a)          # [99  2  3]: a was modified too!

# FIX: use .copy()
a = np.array([1, 2, 3])
b = a.copy()     # b is an independent copy
b[0] = 99
print(a)          # [1 2 3]: a unchanged ✓
Python

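Rule: Assignment (=) never copies; it just gives the same array a second name. Basic slicing such as b = a[0:3] creates a view, which is a new array object sharing a's memory, so writing to b also changes a. Use .copy() whenever you need an independent array.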
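You can check the difference between an alias, a view, and a copy directly with np.shares_memory. The sketch below also covers one more case worth knowing: basic slicing returns a view, while fancy (integer-array) indexing returns a copy.

```python
import numpy as np

a = np.arange(6)

alias = a                   # same object: alias is a
view = a[1:4]               # basic slice: new array object, same memory
fancy_copy = a[[1, 2, 3]]   # fancy indexing: new array with its own memory

print(alias is a, view is a)            # True False
print(np.shares_memory(view, a))        # True
print(np.shares_memory(fancy_copy, a))  # False

view[0] = 100            # writes through to a[1]
fancy_copy[1] = -1       # does NOT touch a
print(a)                 # [  0 100   2   3   4   5]
print(view.base is a)    # True: a view keeps a reference to the array owning the data
```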

Bug 4: Broadcasting Silent Errors

# Subtle bug: adding a (3,) vector to a (3, 4) matrix
W = np.random.randn(3, 4)
b = np.array([1, 2, 3])        # shape: (3,)

# result = W + b   → ValueError: operands could not be broadcast together with shapes (3,4) (3,)
# NumPy aligns TRAILING dimensions: b is treated as (1, 3), and 3 ≠ 4, so this fails.

# What you probably meant:
b_col = b.reshape(-1, 1)        # shape: (3, 1)
result = W + b_col               # ✓ adds b to each column

# OR, if b is per-feature (per column):
b_row = np.array([1, 2, 3, 4])  # shape: (4,)
result = W + b_row               # ✓ broadcasts as (1, 4) + (3, 4)
Python

Rule: Always explicitly check shapes before broadcasting. When in doubt, use .reshape() to make the intent clear.
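The example above at least fails loudly. The truly silent version happens when the shapes are compatible but not what you meant. A common case is a label vector of shape (m,) combined with predictions of shape (m, 1). NumPy broadcasts both to (m, m), and the loss is then computed over m² pairs without any error. A minimal sketch with hypothetical values:

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0])           # labels, shape (3,)
y_pred = np.array([[0.9], [0.2], [0.8]])     # model output, shape (3, 1)

diff = y_true - y_pred                        # (3,) vs (3, 1) broadcasts to (3, 3)!
print(diff.shape)                             # (3, 3): no error, wrong result

# Fix: make the shapes agree explicitly before combining them
diff_ok = y_true - y_pred.reshape(-1)         # (3,) - (3,) -> (3,)
assert diff_ok.shape == y_true.shape, diff_ok.shape
mse = np.mean(diff_ok ** 2)
print(mse)
```

A one-line shape assert like the one above costs almost nothing next to the array arithmetic, and it turns a silent bug into a loud one.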

Bug 5: Integer Division Gotcha

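# / always returns floats, but // and in-place ops keep the integer dtype!
a = np.array([1, 2, 3])
print(a / 2)        # [0.5 1.  1.5]: true division always returns floats ✓
print(a // 2)       # [0 1 1]: floor division stays integer

# Out-of-place division creates a NEW array:
a = np.array([1, 2, 3], dtype=np.int32)
a = a / 2           # rebinds the name a to a new float64 array
print(a.dtype)      # float64: OK

# DANGER: in-place operations write back into the existing int32 array
a = np.array([1, 2, 3], dtype=np.int32)
try:
    a /= 2          # can't store float results in an int32 array
except TypeError as err:   # NumPy raises UFuncTypeError, a TypeError subclass
    print("TypeError:", err)
# FIX: convert first, e.g. a = a.astype(np.float64), then a /= 2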
Python
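Suppose you genuinely want in-place float division on data that started as integers. The sketch below shows two safe patterns:

- Convert once with .astype(), then operate in place.
- Have the ufunc write into a preallocated float array through its out= argument.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int32)

# Option 1: convert once (astype returns a new array), then in-place ops are safe
a_f = a.astype(np.float32)
a_f /= 2
print(a_f)                    # [0.5 1.  1.5]

# Option 2: write the float result into a preallocated float64 buffer
out = np.empty(a.shape, dtype=np.float64)
np.divide(a, 2, out=out)
print(out)                    # [0.5 1.  1.5]
print(a.dtype)                # int32: the original array is untouched
```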

Misconceptions Busted

❌ Misconception	✅ Reality
"NumPy is just a fancier Python list"	NumPy arrays are a completely different data structure: a single block of typed memory (usually contiguous) processed by compiled C loops, with BLAS/LAPACK libraries behind the linear-algebra routines. They're closer to C arrays than Python lists.
"Broadcasting copies data"	Broadcasting never copies the smaller input. NumPy adjusts strides (internal metadata, using a stride of 0 along each stretched axis) to "virtually expand" it, so no expanded copy is ever allocated. Only the result of the operation needs new memory.
"I need to learn all of NumPy before starting DL"	You need about 20 functions for 95% of deep learning work. This chapter covers them all. The rest you'll pick up as needed.
"Pandas is required for deep learning"	Pandas is for data loading and exploration. Once data is in NumPy arrays (or PyTorch tensors), Pandas is not involved in training. Think of Pandas as the "loading dock" and NumPy as the "factory floor".
"`np.random.seed(42)` makes results perfectly reproducible"	It makes NumPy's randomness reproducible. For full reproducibility in DL, you also need `torch.manual_seed()`, `random.seed()`, and CUDA determinism flags. We'll cover this in Chapter 6.
"Google Colab is too slow for real deep learning"	Colab's free T4 GPU has 16 GB VRAM and is rated at ~65 TFLOPS for FP16 mixed precision. Free sessions are time-limited (a runtime lasts at most 12 hours, often less), so very long training runs belong on paid tiers or cloud GPUs, but for learning it's more than enough. Professionals at Indian startups often prototype on Colab before deploying on cloud GPUs.
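You can inspect the zero-copy claim with np.broadcast_to, which returns the "virtually expanded" array without doing any arithmetic. The broadcast axis gets a stride of 0 bytes, so every row reads the same memory as the original vector. The result of an actual operation such as X + b is, as usual, a newly allocated array.

```python
import numpy as np

b = np.array([1.0, 2.0, 3.0])                 # float64: 8 bytes per element
B = np.broadcast_to(b, (1000, 3))             # read-only view, no data copied

print(B.shape)                  # (1000, 3)
print(B.strides)                # (0, 8): moving down a row advances 0 bytes
print(np.shares_memory(B, b))   # True
print(B.flags.writeable)        # False: broadcast views are read-only

# An arithmetic op still allocates its output, but never an expanded copy of b
X = np.ones((1000, 3))
Y = X + b
print(np.shares_memory(Y, b))   # False: Y is fresh memory
```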

Exercises

Section A: Multiple Choice Questions

What is the shape of the result of np.dot(A, B) where A has shape (64, 784) and B has shape (784, 128)?
(a) (64, 784) (b) (784, 128) (c) (64, 128) (d) (784, 784)
Answer: (c) — Matrix multiply (m,n)@(n,k) → (m,k)
Which of the following creates a column vector from a 1D array a of shape (5,)?
(a) a.T (b) a.reshape(1, -1) (c) a.flatten() (d) a.reshape(-1, 1)
Answer: (d) — reshape(-1, 1) gives shape (5, 1). Note: .T on a 1D array does nothing!
What does np.sum(X, axis=0) do for X of shape (100, 5)?
(a) Sums all elements into a scalar (b) Sums each row → shape (100,) (c) Sums each column → shape (5,) (d) Raises an error
Answer: (c) — axis=0 collapses rows, giving one sum per column
What is the output of np.maximum(0, np.array([-3, -1, 0, 2, 5]))?
(a) 5 (b) [-3, -1, 0, 2, 5] (c) [0, 0, 0, 2, 5] (d) [0, 0, 0, 0, 0]
Answer: (c) — np.maximum is element-wise max, implementing ReLU
Broadcasting: What is the shape of A + B where A has shape (3, 1) and B has shape (1, 4)?
(a) (3, 1) (b) (1, 4) (c) Error (d) (3, 4)
Answer: (d) — (3,1) and (1,4) broadcast to (3,4)
Why is 1e-8 added inside np.log() in cross-entropy loss?
(a) To speed up computation (b) To prevent log(0) = −∞ (c) For better accuracy (d) It's a learning rate
Answer: (b). np.log(0) returns −∞ (with a RuntimeWarning), and multiplying it by a zero label inside the loss gives NaN; adding epsilon keeps every log finite.
Which initialisation is recommended for layers using ReLU activation?
(a) All zeros (b) Xavier/Glorot (c) He initialisation (d) Uniform [0, 1]
Answer: (c). He init draws weights with standard deviation √(2/fan_in); the factor 2 compensates for ReLU zeroing out roughly half of its inputs.
After b = a, where a is a NumPy array, modifying b[0] will:
(a) Only modify b (b) Modify both a and b (c) Raise an error (d) Create a copy
Answer: (b). b = a does not copy anything: b becomes a second name for the same array object. Use a.copy() for independence.
What is the purpose of keepdims=True in np.sum(X, axis=1, keepdims=True)?
(a) Faster computation (b) Prevents data loss (c) Preserves the reduced dimension as size 1 for broadcasting (d) No practical effect
Answer: (c) — Without keepdims, shape drops from (m,n) to (m,). With keepdims, it becomes (m,1), enabling correct broadcasting.
A for-loop computing dot product of two 1M-element vectors takes ~300ms. The NumPy np.dot equivalent takes ~1ms. What explains the ~300× speedup?
(a) NumPy uses a faster algorithm (b) NumPy runs compiled C code with SIMD, avoids Python object overhead (c) NumPy uses GPU (d) Python for-loops have a bug
Answer: (b) — NumPy's C backend uses contiguous memory, SIMD vectorisation, and avoids per-element Python type checking

Section B: Short Answer Questions

Explain the three broadcasting rules in your own words. Give one DL example for each rule.
Why does .T (transpose) have no effect on a 1D NumPy array of shape (5,)? How would you create a proper column vector?
Compare np.maximum(A, B) vs np.max(A). When would you use each in deep learning?
Explain why normalising features (zero mean, unit variance) before training a neural network is important. Write the vectorised NumPy code for normalisation.
Describe the difference between Xavier and He weight initialisation. Which activation function is each designed for, and why?

Section C: Long Answer Questions

Derive the normal equation for linear regression: w = (XᵀX)⁻¹Xᵀy. Then implement it in NumPy on a synthetic dataset of 1000 Bengaluru house prices with 3 features (area, bedrooms, floor). Report the RMSE and plot predicted vs actual prices.
Write a complete softmax function in NumPy that is numerically stable (handles large logits without overflow). Explain each line. Then verify that your function: (a) outputs values between 0 and 1, (b) outputs sum to 1 per row, (c) handles logits of magnitude 1000 without NaN.

Section D: Programming Exercises

D1: Vectorised Sigmoid & Its Derivative

Implement sigmoid(z) and sigmoid_derivative(z) using only NumPy (no loops). Verify that sigmoid_derivative(z) equals sigmoid(z) * (1 - sigmoid(z)) for z in [-5, 5]. Plot both functions on the same graph.

D2: Mini-Batch Generator

Write a function get_mini_batches(X, y, batch_size=64, shuffle=True) that:

Shuffles the data if shuffle=True
Yields tuples of (X_batch, y_batch) of the specified batch size
Handles the last batch (which may be smaller than batch_size)
Uses NumPy indexing (no Python list slicing)

Test it on the Zomato delivery dataset from the worked example.

D3: IPL Dataset Exploratory Data Analysis

Using the IPL ball-by-ball dataset:

Compute the economy rate (runs conceded per over) for the top 20 bowlers by total overs bowled
Create a bar chart showing the top 10 highest individual scores in IPL history
Compute and plot the win percentage by batting first vs chasing for each venue
Build a function that takes a match_id and draws a Manhattan plot (runs per over for each innings, side by side), computing the per-over totals with NumPy and drawing the bars with matplotlib

Section E: Mini-Project

🏏 IPL Score Predictor (Data Pipeline)

Build a complete data pipeline for predicting first-innings total score from powerplay data:

Data extraction: From the ball-by-ball CSV, compute per-match features: powerplay score, powerplay wickets, run rate in overs 1-3, run rate in overs 4-6, number of boundaries in powerplay
Target: Total first-innings score
Preprocessing: Normalise features, handle missing matches, train/test split (80/20)
Baseline model: Use the normal equation (from the worked example) to fit a linear regression. Report RMSE in runs.
Visualisation: Plot predicted vs actual scores, residual distribution, and feature correlations

Deliverable: A single Colab notebook with all code, plots, and a 200-word analysis of which powerplay features most strongly predict final score. Save the notebook as ipl_score_predictor.ipynb.

Chapter Summary

Key Takeaways

NumPy arrays are the foundation of all numerical computing in Python — contiguous, typed, and 50–200× faster than Python lists for numerical operations.
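Broadcasting allows operations between arrays of different shapes by virtually stretching the smaller array. There are three rules. (1) If the arrays have different numbers of dimensions, pad the smaller shape with 1s on the left. (2) Compare shapes from the trailing dimension, and stretch any dimension of size 1 to match the other array. (3) If two sizes differ and neither is 1, NumPy raises an error.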
Vectorisation replaces Python for-loops with C-level NumPy operations. Computing sigmoid over 1M values: for-loop = 2.8s, vectorised = 5.6ms (500× speedup).
Six function families power all of deep learning: np.dot (forward pass), np.sum (reductions), np.exp/np.log (activations/losses), np.maximum (ReLU), np.random (initialisation), and utilities (np.argmax, np.where, np.clip).
Matplotlib provides three essential plots: loss curves, histograms, and decision boundaries.
Pandas is the bridge from raw CSV data to clean NumPy arrays. You need: read_csv, .head(), .describe(), .values, get_dummies(), fillna().
Google Colab provides free GPU access — sufficient for all exercises in this textbook. Enable T4 GPU via Runtime → Change runtime type.
Common bugs: shape mismatches, axis confusion, view vs copy, and silent broadcasting errors. Always print(x.shape).
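You can check the broadcasting rules without creating any arrays by using np.broadcast_shapes (available in NumPy 1.20 and later). It returns the broadcast result shape or raises a ValueError. The first case below mirrors adding a per-feature bias of shape (n_features,) to a batch of activations.

```python
import numpy as np

# Rule 1: the shorter shape is padded with 1s on the LEFT, so (4,) acts as (1, 4)
print(np.broadcast_shapes((5, 4), (4,)))       # (5, 4)

# Rule 2: a dimension of size 1 is stretched to match the other array
print(np.broadcast_shapes((3, 1), (1, 4)))     # (3, 4)

# Rule 3: sizes that differ where neither is 1 are incompatible
try:
    np.broadcast_shapes((3, 4), (3,))          # trailing dims 4 vs 3
except ValueError as err:
    print("ValueError:", err)
```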

Formula Quick Reference

Operation	Formula / Code	DL Usage
Matrix Multiply	`Z = X @ W + b`	Forward pass (every layer)
Sigmoid	σ(z) = 1 / (1 + e^−z)	Binary classification output
ReLU	`np.maximum(0, z)`	Hidden layer activation
Softmax	e^zᵢ / Σe^zⱼ	Multi-class output
Binary Cross-Entropy	−(1/m)Σ[y log ŷ + (1−y)log(1−ŷ)]	Binary classification loss
Z-score Normalisation	`(X - μ) / σ`	Feature preprocessing
Xavier Init	W ~ N(0, 2/(n_in + n_out))	Sigmoid/Tanh layers
He Init	W ~ N(0, 2/n_in)	ReLU layers
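Three rows of the table can be translated into NumPy as follows. This is a sketch: the function names are our own, and the random generator comes from np.random.default_rng. Binary cross-entropy clips predictions away from 0 and 1 so that np.log never sees 0. The initialisers follow the Xavier and He rows. In the table, the second argument of N(0, ·) is the variance, whereas rng.normal takes the standard deviation, hence the square roots.

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-8):
    """Mean binary cross-entropy; clipping keeps log() finite when y_hat is 0 or 1."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def xavier_init(n_in, n_out, rng):
    """W ~ N(0, 2/(n_in + n_out)); rng.normal expects the standard deviation."""
    return rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), size=(n_in, n_out))

def he_init(n_in, n_out, rng):
    """W ~ N(0, 2/n_in), intended for ReLU layers."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

rng = np.random.default_rng(42)
W1 = he_init(784, 128, rng)
print(W1.shape, W1.std())       # std close to sqrt(2/784) ≈ 0.0505

y = np.array([1.0, 0.0, 1.0, 0.0])
print(binary_cross_entropy(y, np.array([1.0, 0.0, 1.0, 0.0])))  # ~0, and no NaN
print(binary_cross_entropy(y, np.array([0.9, 0.1, 0.8, 0.3])))
```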

What's Next?

In Chapter 4: The Perceptron & Single Neuron, we'll put these tools to work. You'll implement a single neuron that computes z = X @ w + b, applies sigmoid, and learns by gradient descent — all using the vectorised NumPy operations you mastered in this chapter. Every function you learned here — np.dot, np.exp, np.log, np.maximum — will be used in that implementation.

References & Further Reading

Official Documentation

NumPy Documentation (2024). NumPy User Guide. numpy.org/doc
Matplotlib Documentation (2024). Tutorials. matplotlib.org
Pandas Documentation (2024). Getting Started. pandas.pydata.org
Google Colab (2024). Welcome to Colab. colab.research.google.com

Textbooks

VanderPlas, J. (2016). Python Data Science Handbook, Chapter 2 (NumPy). O'Reilly. Available free at jakevdp.github.io
McKinney, W. (2022). Python for Data Analysis, 3rd Edition. O'Reilly.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning, Chapter 2 (Linear Algebra). MIT Press. deeplearningbook.org

Video Resources

Ng, A. (2017). Vectorization (Coursera Deep Learning Specialization, Course 1, Week 2). Clear explanation of why vectorisation matters for neural networks.
Corey Schafer — NumPy Tutorial (YouTube). Excellent 1-hour tutorial covering all basics.
sentdex — Matplotlib Tutorial Series (YouTube). Comprehensive plotting guide.

Indian Context

NPTEL — Python for Data Science by IIT Madras. Free video lectures with certification.
NPTEL — Deep Learning by Prof. Mitesh Khapra, IIT Madras. NumPy-based implementations in Weeks 1-4.
IPL Complete Dataset — Kaggle. kaggle.com (IPL)
Analytics Vidhya — NumPy Tutorial for Beginners. India-focused data science blog with practical examples.