Part X — Specialized Domains

Time Series Analysis
& Forecasting

Master the art and science of predicting the future from historical temporal data — from classical ARIMA to deep learning LSTMs.

📖 Chapter 27 ⏱️ 4 Hours 🔗 Prereq: Ch 19 📊 24 Sections

Learning Objectives

By the end of this chapter, you will be able to:

Decompose a time series into its core components: trend, seasonality, cyclical variation, and noise.
Test for stationarity using the ADF (Augmented Dickey-Fuller) and KPSS tests, and apply differencing/transformations to achieve stationarity.
Interpret ACF and PACF plots to identify appropriate model orders for AR, MA, and ARIMA models.
Build and tune ARIMA(p,d,q) models including seasonal variants (SARIMA) using the Box-Jenkins methodology.
Apply Exponential Smoothing methods: Simple, Holt's Linear Trend, and Holt-Winters Seasonal models.
Use Facebook Prophet for time series forecasting with trend changepoints, holiday effects, and custom seasonality.
Design LSTM networks using TensorFlow/Keras for sequence-to-one and sequence-to-sequence forecasting.
Understand multivariate time series analysis using Vector Autoregression (VAR).
Detect anomalies in time series data using statistical and ML-based approaches.
Evaluate forecasts with MAPE, RMSE, SMAPE, and other metrics, knowing their tradeoffs.
Apply time series methods to real-world Indian and global case studies, including Nifty50 forecasting, monsoon prediction, and energy demand.
Build end-to-end mini projects — Stock Price Forecaster and Weather Predictor.

🎯 Exam Tip

University exams heavily test stationarity concepts, ARIMA order selection from ACF/PACF plots, and deriving forecast equations. Be ready to interpret plots and perform hand calculations for small series.

Introduction

Time series data is everywhere — from the minute-by-minute stock prices on the Bombay Stock Exchange to daily temperatures recorded by the India Meteorological Department, monthly GST revenues, and the second-by-second sensor readings in a smart factory. Unlike cross-sectional data where observations are independent, time series data has temporal ordering, and that ordering contains crucial information.

Time series analysis is the discipline of extracting meaningful statistics and characteristics from time-ordered data, while time series forecasting uses historical observations to predict future values. The applications span virtually every industry:

Finance: Stock prices, exchange rates, portfolio risk (VaR)
Weather: Temperature, rainfall, cyclone paths
Retail: Sales demand, inventory optimization
Healthcare: Patient vitals monitoring, epidemic forecasting
Energy: Electricity demand, solar/wind output prediction
Telecom: Network traffic, call volume
Manufacturing: Sensor data for predictive maintenance

🇮🇳 India Spotlight

India's UPI (Unified Payments Interface) processes over 10 billion transactions per month. Time series forecasting helps NPCI and banks predict transaction volumes, detect anomalies during festivals (Diwali spike!), and plan server capacity. The Reserve Bank of India (RBI) uses time series models to forecast inflation (CPI) and GDP growth for monetary policy decisions.

This chapter walks through the entire landscape — from classical decomposition and ARIMA to modern deep learning (LSTM) and Meta's Prophet. We adopt a practitioner's approach: understand the theory, implement from scratch, then use production-ready libraries.

🎓 Professor's Insight

The most common mistake students make is applying machine learning models directly to time series without checking for stationarity. A non-stationary series violates the fundamental assumptions of most models. Always start with visualization, stationarity tests, and decomposition before modeling.

Historical Background

The study of time series has deep roots in astronomy, economics, and meteorology.

Timeline of Key Developments

Year	Milestone	Contributor
1927	Yule introduces autoregressive (AR) models for sunspot data	G. Udny Yule
1937	Slutzky formalizes the moving average (MA) process	Eugen Slutzky
1938	Wold's decomposition theorem	Herman Wold
1957	Exponential smoothing methods proposed	C.C. Holt, R.G. Brown
1960	Seasonal Holt-Winters method	Peter Winters
1970	Box-Jenkins methodology for ARIMA	George Box & Gwilym Jenkins
1979	Dickey-Fuller test for unit roots	David Dickey & Wayne Fuller
1980	Vector Autoregression (VAR) models	Christopher Sims
1982	ARCH models for volatility	Robert Engle
1986	GARCH models generalize ARCH	Tim Bollerslev
1992	KPSS stationarity test	Kwiatkowski et al.
1997	Long Short-Term Memory (LSTM) architecture	Hochreiter & Schmidhuber
2017	Facebook Prophet for scalable forecasting	Sean Taylor & Ben Letham
2019	Temporal Fusion Transformers	Lim et al. (Google)
2020	N-BEATS: Neural basis expansion for time series	Oreshkin et al.
2023	TimeGPT & foundation models for time series	Nixtla & others

The Box-Jenkins methodology (1970) was the watershed moment. It provided a systematic approach — identify, estimate, diagnose — that remains the gold standard for classical time series modeling. George Box famously said, "All models are wrong, but some are useful," which perfectly captures the forecasting philosophy.

🇮🇳 India Spotlight

The India Meteorological Department (IMD), established in 1875, is one of the oldest meteorological organizations. Indian statistician P.C. Mahalanobis (of Mahalanobis distance fame) at the Indian Statistical Institute pioneered statistical approaches to monsoon forecasting in the 1930s. Today, IMD uses ensemble models combining statistical and dynamical methods to issue Long Range Forecasts (LRF) for the Indian monsoon season.

Conceptual Explanation

4.1 Time Series Components

A time series Y(t) is conventionally described in terms of four components:

Trend (T): The long-term upward or downward movement. Example: India's GDP growth over decades.
Seasonality (S): Regular, periodic patterns repeating at fixed intervals. Example: Ice cream sales peak every summer, Diwali shopping spikes every October/November.
Cyclical (C): Longer-term fluctuations without fixed period, often linked to business cycles. Example: Economic booms and recessions lasting 5-10 years.
Noise/Residual (ε): Random, unpredictable variation after removing other components.

Two decomposition models are common:

Additive Decomposition

Y(t) = T(t) + S(t) + C(t) + ε(t)

Multiplicative Decomposition

Y(t) = T(t) × S(t) × C(t) × ε(t)

Use additive when seasonal fluctuations are roughly constant regardless of the level. Use multiplicative when seasonal variation grows proportionally with the level (e.g., airline passengers — the seasonal swing is larger when overall traffic is higher).
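As a rough illustration of the additive model, the sketch below builds a synthetic quarterly series from a known linear trend and a known seasonal pattern, then recovers both with a centered moving average. The data, the period m = 4, and the helper names are assumptions made up for this example; the cyclical and noise terms are left out so the arithmetic stays exact.

```python
def centered_moving_average(series, m):
    """2 x m centered moving average for an even period m (None at the edges)."""
    half = m // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]  # m + 1 points
        # End points get half weight so the window spans exactly one period
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / m
    return trend


def additive_decompose(series, m):
    """Classical additive decomposition: Y = T + S + residual."""
    trend = centered_moving_average(series, m)
    # Average the detrended values for each position in the seasonal cycle
    buckets = [[] for _ in range(m)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            buckets[t % m].append(y - tr)
    seasonal_means = [sum(b) / len(b) for b in buckets]
    # Center the seasonal effects so they sum to zero over one period
    offset = sum(seasonal_means) / m
    pattern = [s - offset for s in seasonal_means]
    residual = [
        None if tr is None else y - tr - pattern[t % m]
        for t, (y, tr) in enumerate(zip(series, trend))
    ]
    return trend, pattern, residual


# Hypothetical quarterly data: linear trend plus a fixed seasonal pattern
true_pattern = [5.0, -2.0, -4.0, 1.0]  # sums to zero over one year
series = [100 + 2 * t + true_pattern[t % 4] for t in range(24)]

trend, pattern, residual = additive_decompose(series, m=4)
print("Recovered seasonal pattern:", [round(s, 6) for s in pattern])
print("Trend at t = 10:", round(trend[10], 6))
```

With real data the residual would not vanish, and a multiplicative series would first be log-transformed (or divided by the trend instead of subtracting it).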

4.2 Stationarity

A time series is stationary when its statistical properties (mean, variance, autocorrelation) do not change over time. Classical models such as AR, MA, and ARMA assume stationarity; ARIMA and SARIMA reach it by differencing the series first.

Strict vs. Weak Stationarity

Strict stationarity: The joint distribution of (Y(t₁), Y(t₂), ..., Y(tₖ)) is identical to (Y(t₁+τ), Y(t₂+τ), ..., Y(tₖ+τ)) for all time shifts τ.
Weak (covariance) stationarity: Mean is constant, variance is finite and constant, covariance depends only on the lag (not time). This is the practical requirement.

Testing for Stationarity

Augmented Dickey-Fuller (ADF) Test:

H₀: Unit root exists (series is non-stationary)
H₁: No unit root (series is stationary)
If p-value < 0.05 → reject H₀ → series IS stationary

KPSS Test:

H₀: Series is stationary
H₁: Series has a unit root (non-stationary)
If p-value < 0.05 → reject H₀ → series is NOT stationary

🎓 Professor's Insight

ADF and KPSS have opposite null hypotheses! Use both as a confirmatory pair. If ADF says stationary AND KPSS says stationary, you're confident. If they disagree, the series might be "near unit root" — consider differencing or fractional integration.

4.3 Differencing and Transformations

To make a non-stationary series stationary:

First Differencing: Y'(t) = Y(t) - Y(t-1) removes linear trends.
Second Differencing: Y''(t) = Y'(t) - Y'(t-1) removes quadratic trends.
Seasonal Differencing: Y'(t) = Y(t) - Y(t-m) where m is the seasonal period.
Log Transform: Stabilizes variance when it grows with the level.
Box-Cox Transform: Generalized power transform Y(λ) = (Y^λ - 1)/λ for λ ≠ 0, and ln(Y) for λ = 0. It requires strictly positive data.
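The differencing operations above take only a few lines each. The sketch below uses plain Python lists and two made-up series; the function names are illustrative and not taken from any library.

```python
def difference(series, lag=1):
    """Return y(t) - y(t - lag); the result is `lag` values shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undo_difference(diffed, initial_values, lag=1):
    """Invert `difference`, given the first `lag` values of the original series."""
    restored = list(initial_values)
    for d in diffed:
        restored.append(d + restored[-lag])
    return restored


# Quadratic trend: the second difference is constant
quadratic = [t * t for t in range(1, 8)]
first = difference(quadratic)
second = difference(first)
print("First difference: ", first)
print("Second difference:", second)

# Period-4 seasonality plus a unit slope: one seasonal difference removes the pattern
pattern = [10, 20, 15, 5]
seasonal = [50 + t + pattern[t % 4] for t in range(12)]
seasonal_diff = difference(seasonal, lag=4)
print("Seasonal difference (m = 4):", seasonal_diff)

# Forecasts made on the differenced scale must be integrated back
print("Restored:", undo_difference(first, quadratic[:1]))
```

Each differencing pass shortens the series, so the initial values have to be kept if forecasts are later mapped back to the original scale.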

4.4 ACF and PACF

Autocorrelation Function (ACF) measures the correlation between Y(t) and Y(t-k) for lag k. It includes indirect correlations through intermediate lags.

Partial Autocorrelation Function (PACF) measures the direct correlation between Y(t) and Y(t-k), removing the effect of all intermediate lags.

Model	ACF Behavior	PACF Behavior
AR(p)	Decays gradually (exponential/oscillating)	Cuts off after lag p
MA(q)	Cuts off after lag q	Decays gradually
ARMA(p,q)	Decays gradually	Decays gradually
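To make the table concrete, the sketch below computes a sample ACF and derives the PACF from an ACF with the Durbin-Levinson recursion. It is a minimal illustration in plain Python: the function names and the demo series are made up, and the theoretical ACFs for AR(1) and MA(1) are the ones derived in Section 6.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations rho(0..max_lag), using the usual 1/n scaling."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    c0 = sum(d * d for d in dev) / n
    acf = [1.0]
    for k in range(1, max_lag + 1):
        ck = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        acf.append(ck / c0)
    return acf


def pacf_from_acf(acf):
    """Durbin-Levinson: PACF at lag k is the last coefficient of the best AR(k) fit."""
    pacf = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit
    for k in range(1, len(acf)):
        num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_curr.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_curr
    return pacf


# Theoretical ACFs: rho(k) = phi^k for AR(1); a single spike at lag 1 for MA(1)
phi, theta = 0.7, 0.6
acf_ar1 = [phi ** k for k in range(6)]
acf_ma1 = [1.0, theta / (1 + theta ** 2), 0.0, 0.0, 0.0, 0.0]

pacf_ar1 = pacf_from_acf(acf_ar1)
pacf_ma1 = pacf_from_acf(acf_ma1)
print("AR(1) PACF:", [round(v, 4) for v in pacf_ar1])  # cuts off after lag 1
print("MA(1) PACF:", [round(v, 4) for v in pacf_ma1])  # decays with alternating sign

# Sample ACF of a short, made-up trending series
demo = [12, 15, 14, 18, 21, 19, 24, 27, 25, 30]
demo_acf = sample_acf(demo, 3)
print("Sample ACF:", [round(v, 4) for v in demo_acf])
```

The AR(1) case shows the "PACF cuts off" row of the table, and the MA(1) case shows the mirror image: the ACF cuts off while the PACF tails off.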

4.5 ARIMA Models

AR(p) — Autoregressive: Current value depends on p past values.

MA(q) — Moving Average: Current value depends on q past error terms.

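ARIMA(p,d,q): Combines AR, differencing (d times), and MA. The "I" stands for "Integrated": the model is fitted to the d-times differenced series, and its forecasts are summed (integrated) to return to the original scale.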

4.6 SARIMA

SARIMA extends ARIMA for seasonal data: SARIMA(p,d,q)(P,D,Q,m) where uppercase letters are seasonal orders and m is the seasonal period (e.g., 12 for monthly data with annual seasonality).

4.7 Exponential Smoothing

SES (Simple Exponential Smoothing): For level-only data — no trend, no seasonality.
Holt's Linear: Adds a trend component.
Holt-Winters: Adds both trend and seasonality. Comes in additive and multiplicative variants.

4.8 Prophet

Developed by Meta (Facebook), Prophet uses a decomposable additive model: y(t) = g(t) + s(t) + h(t) + ε(t) where g(t) is a piecewise-linear or logistic growth trend, s(t) is seasonality modeled with Fourier terms, and h(t) captures holiday effects. Prophet is designed to handle missing data, changepoints, and multiple seasonalities automatically.

4.9 LSTM for Time Series

Long Short-Term Memory networks are a type of recurrent neural network (RNN) that can learn long-range dependencies. They use a gating mechanism (forget, input, output gates) to selectively remember or forget information. For time series, the input is a sliding window of past observations, and the output is the forecast.

4.10 VAR — Vector Autoregression

When you have multiple interrelated time series (e.g., Nifty50 + USD/INR + gold prices), VAR models capture the linear interdependencies. Each variable is a linear function of past lags of itself AND all other variables.
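The sketch below iterates a two-variable VAR(1), y(t) = c + A·y(t-1), forward in time with future shocks set to zero. The intercepts and the coefficient matrix are hand-picked, hypothetical numbers; in practice they would be estimated from data, for example with a library such as statsmodels.

```python
def var1_forecast(last_obs, intercept, coef, steps):
    """Iterate y(t) = c + A y(t-1) forward, setting future shocks to zero."""
    forecasts = []
    current = list(last_obs)
    k = len(current)
    for _ in range(steps):
        nxt = [
            intercept[i] + sum(coef[i][j] * current[j] for j in range(k))
            for i in range(k)
        ]
        forecasts.append(nxt)
        current = nxt
    return forecasts


# Hypothetical coefficients for two interrelated series
c = [1.0, 0.5]
A = [[0.5, 0.1],   # series 1 depends on its own lag and on series 2
     [0.2, 0.3]]   # series 2 depends on series 1 and on its own lag

path = var1_forecast([10.0, 4.0], c, A, steps=3)
for h, values in enumerate(path, start=1):
    print(f"h = {h}:", [round(v, 4) for v in values])
```

The off-diagonal entries of A are what distinguish a VAR from two separate AR(1) models: setting them to zero removes the cross-series dependence.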

4.11 Anomaly Detection in Time Series

Anomalies are observations that deviate significantly from expected patterns. Methods include: statistical thresholds (z-score, IQR), forecast-based (flag points with large residuals), isolation forests, autoencoders, and Prophet's built-in uncertainty intervals.

🏭 Industry Alert

In production systems, the choice between classical (ARIMA) and ML (LSTM, Prophet) depends on: (1) data volume — LSTM needs thousands of data points, ARIMA works with fewer, (2) interpretability — ARIMA provides coefficient-level insight, (3) automation — Prophet is designed for "analyst in the loop" at scale, and (4) latency — ARIMA is orders of magnitude faster to train than LSTM.

Mathematical Foundation

5.1 AR(p) Process

Autoregressive Model of Order p

Y(t) = c + φ₁Y(t-1) + φ₂Y(t-2) + ... + φₚY(t-p) + ε(t)

where c is a constant, φ₁...φₚ are AR coefficients, ε(t) ~ WN(0, σ²)

The process is stationary if all roots of the characteristic polynomial 1 - φ₁z - φ₂z² - ... - φₚzᵖ = 0 lie outside the unit circle.
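For AR(2) the root condition can be checked directly with the quadratic formula. The sketch below is a small illustration using only the standard library; the function names and the coefficient pairs are made up for the example.

```python
import cmath


def ar2_roots(phi1, phi2):
    """Roots of 1 - phi1*z - phi2*z**2 = 0 (requires phi2 != 0)."""
    if phi2 == 0:
        raise ValueError("phi2 must be non-zero for an AR(2) polynomial")
    # Rewrite as phi2*z**2 + phi1*z - 1 = 0 and apply the quadratic formula
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return ((-phi1 + disc) / (2 * phi2), (-phi1 - disc) / (2 * phi2))


def is_stationary_ar2(phi1, phi2):
    """Stationary when every root lies strictly outside the unit circle."""
    return all(abs(z) > 1 for z in ar2_roots(phi1, phi2))


for phi1, phi2 in [(0.5, 0.3), (0.5, 0.6), (1.0, -0.5)]:
    moduli = [round(abs(z), 4) for z in ar2_roots(phi1, phi2)]
    print(f"phi = ({phi1}, {phi2}): |roots| = {moduli}, "
          f"stationary = {is_stationary_ar2(phi1, phi2)}")
```

The third pair has complex roots, which is the case that produces the oscillating ACF decay mentioned in Section 4.4.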

5.2 MA(q) Process

Moving Average Model of Order q

Y(t) = μ + ε(t) + θ₁ε(t-1) + θ₂ε(t-2) + ... + θ_q ε(t-q)

where μ is the mean, θ₁...θ_q are MA coefficients, ε(t) ~ WN(0, σ²)

An MA(q) process is always stationary (finite linear combination of white noise). It is invertible if all roots of 1 + θ₁z + θ₂z² + ... + θ_q z^q = 0 lie outside the unit circle.

5.3 ARIMA(p,d,q)

ARIMA Model

φ(B)(1-B)ᵈ Y(t) = c + θ(B) ε(t)

B is the backshift operator: B·Y(t) = Y(t-1)
φ(B) = 1 - φ₁B - ... - φₚBᵖ (AR polynomial)
θ(B) = 1 + θ₁B + ... + θ_q B^q (MA polynomial)
(1-B)ᵈ is the differencing operator applied d times

5.4 SARIMA(p,d,q)(P,D,Q)ₘ

Seasonal ARIMA

φ(B) Φ(Bᵐ) (1-B)ᵈ (1-Bᵐ)ᴰ Y(t) = c + θ(B) Θ(Bᵐ) ε(t)

Φ(Bᵐ) = 1 - Φ₁Bᵐ - ... - Φ_P B^(Pm) (seasonal AR)
Θ(Bᵐ) = 1 + Θ₁Bᵐ + ... + Θ_Q B^(Qm) (seasonal MA)
m = seasonal period

5.5 Exponential Smoothing — State Space Form

Simple Exponential Smoothing (SES)

ŷ(t+1) = αy(t) + (1-α)ŷ(t), where 0 < α ≤ 1

Holt's Linear Trend

Level: ℓ(t) = αy(t) + (1-α)(ℓ(t-1) + b(t-1))
Trend: b(t) = β(ℓ(t) - ℓ(t-1)) + (1-β)b(t-1)
Forecast: ŷ(t+h) = ℓ(t) + h·b(t)
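The level and trend recursions for Holt's method translate almost line for line into code. The sketch below is one possible implementation, not a reference one: the initialization (level = first observation, trend = first difference) is an assumption, since the equations above do not specify starting values.

```python
def holt_linear(series, alpha, beta, horizon):
    """Holt's linear trend method; returns forecasts for h = 1..horizon."""
    # Assumed initialization: level from the first point, trend from the first step
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


# On an exactly linear series the method should simply continue the line
linear = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
linear_forecast = holt_linear(linear, alpha=0.5, beta=0.5, horizon=3)
print("Linear series forecast:", linear_forecast)

# A made-up series with an upward drift and some irregularity
sales = [100.0, 104.0, 103.0, 109.0, 112.0, 111.0, 117.0]
print("Sales forecast:", [round(v, 2) for v in holt_linear(sales, 0.4, 0.2, 3)])
```

Unlike SES, the forecasts are not flat: each extra step ahead adds one more copy of the final trend estimate.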

Holt-Winters (Additive Seasonality)

Level: ℓ(t) = α(y(t) - s(t-m)) + (1-α)(ℓ(t-1) + b(t-1))
Trend: b(t) = β(ℓ(t) - ℓ(t-1)) + (1-β)b(t-1)
Season: s(t) = γ(y(t) - ℓ(t-1) - b(t-1)) + (1-γ)s(t-m)
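Forecast: ŷ(t+h) = ℓ(t) + h·b(t) + s(t+h-m), valid for 1 ≤ h ≤ m. For longer horizons, reuse the latest estimate for the same season: s(t+h-m(k+1)) with k = ⌊(h-1)/m⌋.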

5.6 Evaluation Metrics

Mean Absolute Percentage Error (MAPE)

MAPE = (100/n) Σ |y(t) - ŷ(t)| / |y(t)|

Root Mean Squared Error (RMSE)

RMSE = √[(1/n) Σ (y(t) - ŷ(t))²]

Symmetric MAPE (SMAPE)

SMAPE = (100/n) Σ |y(t) - ŷ(t)| / ((|y(t)| + |ŷ(t)|)/2)

🎯 Exam Tip

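MAPE is undefined when an actual value is zero and becomes unstable when actual values are near zero. SMAPE is bounded in [0%, 200%] and is less sensitive to this, although a term is still undefined when the actual and the forecast are both zero. RMSE penalizes large errors more heavily due to squaring and is expressed in the units of the data. In exams, know which metric is appropriate: RMSE for point accuracy, MAPE for relative comparison, SMAPE for symmetry.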

5.7 ADF Test Statistic

Augmented Dickey-Fuller Regression

ΔY(t) = α + βt + γY(t-1) + Σᵢ δᵢΔY(t-i) + ε(t)

Test statistic: τ = γ̂ / SE(γ̂)
H₀: γ = 0 (unit root → non-stationary)

Formula Derivations

6.1 Deriving the AR(1) Autocorrelation Function

For AR(1): Y(t) = φY(t-1) + ε(t), with |φ| < 1 for stationarity.

Step 1: Variance. Because ε(t) is uncorrelated with Y(t-1), and stationarity gives Var(Y(t)) = Var(Y(t-1)) = γ(0): γ(0) = Var(Y(t)) = φ²Var(Y(t-1)) + σ² = φ²γ(0) + σ²

Therefore: γ(0) = σ² / (1 - φ²)

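Step 2: Autocovariance at lag k ≥ 1. Multiply Y(t) = φY(t-1) + ε(t) by Y(t-k) and take expectations. Since Y(t-k) depends only on noise up to time t-k, it is uncorrelated with ε(t), so the last term vanishes: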

γ(k) = E[Y(t)·Y(t-k)] = φ·E[Y(t-1)·Y(t-k)] + E[ε(t)·Y(t-k)] = φ·γ(k-1) + 0 = φ·γ(k-1)

By recursion: γ(k) = φᵏ · γ(0)

Step 3: Autocorrelation: ρ(k) = γ(k)/γ(0) = φᵏ

AR(1) Autocorrelation

ρ(k) = φᵏ — exponential decay for 0 < φ < 1, alternating decay for -1 < φ < 0

6.2 Deriving MA(1) Autocorrelation

For MA(1): Y(t) = ε(t) + θε(t-1)

Step 1: γ(0) = Var(ε(t) + θε(t-1)) = (1 + θ²)σ²

Step 2: γ(1) = Cov(ε(t) + θε(t-1), ε(t-1) + θε(t-2)) = θσ²

Step 3: γ(k) = 0 for k ≥ 2 (non-overlapping noise terms)

MA(1) Autocorrelation

ρ(1) = θ / (1 + θ²), ρ(k) = 0 for k ≥ 2

6.3 Deriving Simple Exponential Smoothing

The SES forecast ŷ(t+1) = αy(t) + (1-α)ŷ(t) can be expanded recursively:

ŷ(t+1) = αy(t) + (1-α)[αy(t-1) + (1-α)ŷ(t-1)]

= αy(t) + α(1-α)y(t-1) + (1-α)²ŷ(t-1)

Continuing infinitely:

SES as Weighted Average

ŷ(t+1) = α Σᵢ₌₀^∞ (1-α)ⁱ y(t-i)

Weights decay geometrically — recent data gets more weight. The weights sum to 1 since α Σ(1-α)ⁱ = α · 1/(1-(1-α)) = 1
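With a finite history the expansion stops at the initial forecast, which keeps a leftover weight of (1-α)ⁿ. The sketch below checks numerically that the recursive form and the weighted-average form agree; the function names and the five sample values are illustrative.

```python
def ses_recursive(series, alpha, initial_forecast):
    """Apply the SES update once per observation; returns the next forecast."""
    forecast = initial_forecast
    for y in series:
        forecast = alpha * y + (1 - alpha) * forecast
    return forecast


def ses_weighted(series, alpha, initial_forecast):
    """Same forecast written as an explicit weighted average of the history."""
    n = len(series)
    weighted = sum(
        alpha * (1 - alpha) ** i * series[n - 1 - i] for i in range(n)
    )
    # The initial forecast keeps the leftover weight (1 - alpha)^n
    return weighted + (1 - alpha) ** n * initial_forecast


sample = [100, 120, 110, 130, 125]
alpha = 0.3

recursive = ses_recursive(sample, alpha, initial_forecast=100)
weighted = ses_weighted(sample, alpha, initial_forecast=100)
print("Recursive:", round(recursive, 3))
print("Weighted: ", round(weighted, 3))

weights = [alpha * (1 - alpha) ** i for i in range(len(sample))]
leftover = (1 - alpha) ** len(sample)
print("Weights (most recent first):", [round(w, 4) for w in weights])
print("Total weight:", round(sum(weights) + leftover, 6))
```

As the history grows, the leftover weight shrinks geometrically, which is why the choice of initial forecast matters mainly for short series.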

6.4 Box-Cox Transformation

Box-Cox Transform

y(λ) = { (y^λ - 1)/λ if λ ≠ 0
{ ln(y) if λ = 0

λ = 1: No transform | λ = 0.5: Square root | λ = 0: Log | λ = -1: Inverse
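A hand-rolled version of the transform and its inverse is sketched below, mainly to show the round trip back to the original scale. This is not scipy's implementation, and the function names are illustrative.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of a single positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def inv_boxcox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1) ** (1 / lam)


value = 4.0
for lam in (1, 0.5, 0, -1):
    z = boxcox(value, lam)
    back = inv_boxcox(z, lam)
    print(f"lambda = {lam:>4}: transformed = {z:.4f}, restored = {back:.4f}")
```

Note that λ = 1 returns y - 1 rather than y itself: the series is only shifted, so its shape is unchanged, which is why it is described as "no transform".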

🎓 Professor's Insight

The optimal λ can be found by maximizing the log-likelihood of the transformed data. In practice, scipy's boxcox function handles this automatically. Always remember to inverse transform your forecasts back to the original scale!

Worked Numerical Examples

Example 1: AR(1) Forecast by Hand

Given: AR(1) model Y(t) = 5 + 0.7·Y(t-1) + ε(t), with Y(100) = 20.

Task: Forecast Y(101), Y(102), Y(103).

Solution:

ŷ(101) = 5 + 0.7 × 20 = 5 + 14 = 19.0
ŷ(102) = 5 + 0.7 × 19.0 = 5 + 13.3 = 18.3
ŷ(103) = 5 + 0.7 × 18.3 = 5 + 12.81 = 17.81

Note: Forecasts converge toward the long-run mean μ = c/(1-φ) = 5/(1-0.7) = 16.67.
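The convergence noted above follows from the closed form ŷ(t+h) = μ + φʰ(Y(t) - μ), which is obtained by unrolling the recursion. The sketch below checks the closed form against the step-by-step forecasts for this example; the function names are illustrative.

```python
def ar1_forecast_recursive(c, phi, last_value, steps):
    """Iterate y_hat = c + phi * y_hat, starting from the last observation."""
    forecasts = []
    current = last_value
    for _ in range(steps):
        current = c + phi * current
        forecasts.append(current)
    return forecasts


def ar1_forecast_closed_form(c, phi, last_value, h):
    """h-step forecast: mu + phi^h * (last_value - mu), with mu = c / (1 - phi)."""
    mu = c / (1 - phi)
    return mu + phi ** h * (last_value - mu)


c, phi, y100 = 5.0, 0.7, 20.0
recursive = ar1_forecast_recursive(c, phi, y100, steps=50)

for h in (1, 2, 3, 10, 50):
    closed = ar1_forecast_closed_form(c, phi, y100, h)
    print(f"h = {h:>2}: recursive = {recursive[h - 1]:.4f}, closed form = {closed:.4f}")

print("Long-run mean:", round(c / (1 - phi), 4))
```

Since |φ| < 1, the factor φʰ shrinks geometrically, so the influence of the last observation fades and the forecast settles at μ.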

Example 2: Simple Exponential Smoothing

Given: Sales data: [100, 120, 110, 130, 125]. α = 0.3. Initial forecast ŷ(1) = 100.

Solution:

ŷ(2) = 0.3×100 + 0.7×100 = 30 + 70 = 100.0
ŷ(3) = 0.3×120 + 0.7×100 = 36 + 70 = 106.0
ŷ(4) = 0.3×110 + 0.7×106 = 33 + 74.2 = 107.2
ŷ(5) = 0.3×130 + 0.7×107.2 = 39 + 75.04 = 114.04
ŷ(6) = 0.3×125 + 0.7×114.04 = 37.5 + 79.83 = 117.33

Example 3: MAPE Calculation

Given: Actual = [100, 150, 200], Forecast = [110, 140, 180].

Solution:

|100-110|/100 = 10/100 = 0.10
|150-140|/150 = 10/150 = 0.0667
|200-180|/200 = 20/200 = 0.10

MAPE = (100/3)(0.10 + 0.0667 + 0.10) = (100/3)(0.2667) = 8.89%
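The same numbers can be run through all three metrics from Section 5.6. The functions below are a straightforward transcription of those formulas in plain Python; the names are illustrative.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100 / n * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))


def rmse(actual, forecast):
    """Root mean squared error, in the units of the data."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def smape(actual, forecast):
    """Symmetric MAPE, in percent (bounded between 0 and 200)."""
    n = len(actual)
    return 100 / n * sum(
        abs(a - f) / ((abs(a) + abs(f)) / 2) for a, f in zip(actual, forecast)
    )


actual = [100, 150, 200]
forecast = [110, 140, 180]

print(f"MAPE:  {mape(actual, forecast):.2f}%")
print(f"RMSE:  {rmse(actual, forecast):.2f}")
print(f"SMAPE: {smape(actual, forecast):.2f}%")
```

RMSE is dominated by the 20-unit miss on the third point, while the two percentage metrics treat the three errors as roughly comparable.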

Example 4: Differencing by Hand

Given: Y = [10, 13, 18, 25, 34, 45]

First Difference: ΔY = [3, 5, 7, 9, 11] — still shows a trend

Second Difference: Δ²Y = [2, 2, 2, 2] — constant! Series is now stationary.

Since d=2 differences were needed, the original series has a quadratic trend (here exactly Y(t) = t² + 9 for t = 1, ..., 6).

Example 5: ACF for MA(1)

Given: MA(1) model Y(t) = ε(t) + 0.6ε(t-1), σ² = 4.

Solution:

γ(0) = (1 + 0.6²) × 4 = 1.36 × 4 = 5.44
γ(1) = 0.6 × 4 = 2.4
γ(k) = 0 for k ≥ 2
ρ(1) = 2.4/5.44 = 0.441
ρ(k) = 0 for k ≥ 2 — ACF cuts off after lag 1 ✓
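These theoretical values can be sanity-checked by simulation. The sketch below generates a long MA(1) path with θ = 0.6 and σ = 2 using the standard library's random generator and compares sample moments with the hand calculation. The sample size and the seed are arbitrary choices, and the sample values will only approximate the theoretical ones.

```python
import random


def simulate_ma1(theta, sigma, n, seed=0):
    """Simulate Y(t) = eps(t) + theta * eps(t-1) with Gaussian white noise."""
    rng = random.Random(seed)
    eps = [rng.gauss(0, sigma) for _ in range(n + 1)]
    return [eps[t] + theta * eps[t - 1] for t in range(1, n + 1)]


def autocov(series, k):
    """Sample autocovariance at lag k (1/n scaling)."""
    n = len(series)
    mean = sum(series) / n
    return sum(
        (series[t] - mean) * (series[t - k] - mean) for t in range(k, n)
    ) / n


theta, sigma = 0.6, 2.0
y = simulate_ma1(theta, sigma, n=20000)

gamma0 = autocov(y, 0)
rho = [autocov(y, k) / gamma0 for k in range(4)]

print("Sample gamma(0):", round(gamma0, 3), "| theory:", (1 + theta ** 2) * sigma ** 2)
print("Sample rho(1):  ", round(rho[1], 3), "| theory:", round(theta / (1 + theta ** 2), 3))
print("Sample rho(2):  ", round(rho[2], 3), "| theory: 0")
print("Sample rho(3):  ", round(rho[3], 3), "| theory: 0")
```

The lag-2 and lag-3 estimates are not exactly zero; in a real ACF plot they would be judged against the confidence band rather than against zero itself.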

🎯 Exam Tip

For hand calculations, always show: (1) formula substitution, (2) intermediate arithmetic, (3) final answer with units/interpretation. For model identification, draw rough ACF/PACF sketches and match them to the table in Section 4.4.

Visual Diagrams

Time Series Decomposition

 Original Series Y(t)            Trend T(t)                 Seasonality S(t)           Residual ε(t)
 ┌──────────────────┐            ┌──────────────────┐       ┌──────────────────┐       ┌──────────────────┐
 │    ╭╮  ╭╮  ╭╮   │            │             ╱╱   │       │  ╭╮╭╮╭╮╭╮╭╮╭╮  │       │ ╷  ╷╵ ╷╵╷╵ ╷   │
 │  ╭╯ ╰╮╯ ╰╮╯ ╰╮ │            │         ╱╱╱     │       │ ╯╰╯╰╯╰╯╰╯╰╯╰╯ │       │╵ ╵╷  ╷  ╵  ╵╷  │
 │╭╯    ╰   ╰    ╰╮│     =      │     ╱╱╱         │   +   │  (repeating)    │   +   │  ╷ ╵╷ ╵╷  ╷ ╵  │
 │╯                ╰│            │ ╱╱╱             │       │  ╭╮╭╮╭╮╭╮╭╮╭╮  │       │╵  ╷  ╵  ╵╷  ╷╵ │
 │                  │            │╱                 │       │ ╯╰╯╰╯╰╯╰╯╰╯╰╯ │       │   ╵  ╵╷ ╷ ╵╷   │
 └──────────────────┘            └──────────────────┘       └──────────────────┘       └──────────────────┘
        Time →                         Time →                     Time →                    Time →

ACF and PACF Patterns for Model Identification

   AR(2) Model                          MA(2) Model                         ARMA(1,1) Model
   ═══════════                          ═══════════                         ═══════════════
   ACF: Gradual Decay                   ACF: Cuts off at lag 2              ACF: Gradual Decay
   ┌─────────────┐                      ┌─────────────┐                     ┌─────────────┐
   │█████████    │ lag 1                │████████████ │ lag 1               │██████████   │ lag 1
   │███████      │ lag 2                │████████     │ lag 2               │███████      │ lag 2
   │████         │ lag 3                │─ ─ ─ ─ ─ ─ │ lag 3 (≈0)          │█████        │ lag 3
   │██           │ lag 4                │─ ─ ─ ─ ─ ─ │ lag 4 (≈0)          │███          │ lag 4
   │█            │ lag 5                │─ ─ ─ ─ ─ ─ │ lag 5 (≈0)          │██           │ lag 5
   └─────────────┘                      └─────────────┘                     └─────────────┘

   PACF: Cuts off at lag 2             PACF: Gradual Decay                 PACF: Gradual Decay
   ┌─────────────┐                      ┌─────────────┐                     ┌─────────────┐
   │████████████ │ lag 1                │████████████ │ lag 1               │██████████   │ lag 1
   │███████      │ lag 2                │███████      │ lag 2               │█████        │ lag 2
   │─ ─ ─ ─ ─ ─ │ lag 3 (≈0)          │████         │ lag 3               │███          │ lag 3
   │─ ─ ─ ─ ─ ─ │ lag 4 (≈0)          │██           │ lag 4               │██           │ lag 4
   │─ ─ ─ ─ ─ ─ │ lag 5 (≈0)          │█            │ lag 5               │█            │ lag 5
   └─────────────┘                      └─────────────┘                     └─────────────┘

LSTM Cell Architecture

                              ┌─────────────────────────────────────────────┐
                              │              LSTM Cell                      │
                              │                                             │
    c(t-1) ───────────────────┤───[×]───────────[+]─────────────────────── c(t)
                              │    │              │                          │
                              │    │              │                          │
                              │  ┌─┴─┐         ┌─┴─┐                       │
                              │  │ fₜ │         │ iₜ │×[c̃ₜ]                 │
                              │  │for-│         │inp-│                       │
                              │  │get │         │ut  │                       │
                              │  │gate│         │gate│                       │
                              │  └─┬─┘         └─┬─┘                       │
    h(t-1) ──────┬────────────┤    │              │                          │
                 │            │    │              │         ┌───┐            │
                 │            │    ├──────────────┤    tanh─┤   │──[×]───── h(t)
                 │            │    │              │         └───┘   │        │
                 │            │  ┌─┴──────────────┴─┐             ┌┴─┐      │
                 │            │  │   σ    σ   tanh   │            │oₜ│      │
                 │            │  │  [fₜ] [iₜ] [c̃ₜ]  │            │out│      │
                 │            │  └───────┬──────────┘            │put│      │
                 │            │          │                        │gte│      │
                 └────────────┼──────────┘                        └───┘      │
                              │                                             │
    x(t) ─────────────────────┘                                             │
                              └─────────────────────────────────────────────┘

    fₜ = σ(Wf·[h(t-1), x(t)] + bf)     ← Forget gate
    iₜ = σ(Wi·[h(t-1), x(t)] + bi)     ← Input gate
    c̃ₜ = tanh(Wc·[h(t-1), x(t)] + bc)  ← Candidate cell state
    c(t) = fₜ × c(t-1) + iₜ × c̃ₜ        ← New cell state
    oₜ = σ(Wo·[h(t-1), x(t)] + bo)     ← Output gate
    h(t) = oₜ × tanh(c(t))              ← Hidden state output
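To trace the six gate equations numerically, the sketch below runs one LSTM step with a scalar input and a scalar hidden state, so every weight matrix collapses to a pair of numbers. The parameter layout (one `(w_h, w_x, b)` tuple per gate) and all weight values are assumptions made for this illustration; a real layer would use matrices and learned weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step for scalar x, h, c. params[gate] = (w_h, w_x, b)."""
    def pre_activation(gate):
        w_h, w_x, b = params[gate]
        return w_h * h_prev + w_x * x + b

    f = sigmoid(pre_activation("f"))          # forget gate
    i = sigmoid(pre_activation("i"))          # input gate
    c_tilde = math.tanh(pre_activation("c"))  # candidate cell state
    c = f * c_prev + i * c_tilde              # new cell state
    o = sigmoid(pre_activation("o"))          # output gate
    h = o * math.tanh(c)                      # new hidden state
    return h, c


# With all-zero parameters every gate is 0.5 and the candidate is 0,
# so the cell state is simply halved.
zero_params = {gate: (0.0, 0.0, 0.0) for gate in ("f", "i", "c", "o")}
h_zero, c_zero = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
print("Zero weights: h =", round(h_zero, 4), "c =", round(c_zero, 4))

# Hypothetical weights, run over a short input sequence
params = {
    "f": (0.5, 0.1, 1.0),
    "i": (0.4, 0.6, 0.0),
    "c": (0.3, 0.8, 0.0),
    "o": (0.2, 0.5, 0.0),
}
h, c = 0.0, 0.0
for x in [0.1, 0.2, 0.3, 0.4]:
    h, c = lstm_step(x, h, c, params)
    print(f"x = {x}: h = {h:.4f}, c = {c:.4f}")
```

The hidden state always stays inside (-1, 1) because it is a sigmoid output multiplied by a tanh output, whereas the cell state c(t) is not bounded in the same way.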

Sliding Window for Time Series → Supervised Learning

    Original Series: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    Window Size = 3, Horizon = 1

    ┌──────────────────────┬─────────┐
    │   Features (X)       │ Target  │
    ├──────────────────────┼─────────┤
    │  [10,  20,  30]      │   40    │
    │  [20,  30,  40]      │   50    │
    │  [30,  40,  50]      │   60    │
    │  [40,  50,  60]      │   70    │
    │  [50,  60,  70]      │   80    │
    │  [60,  70,  80]      │   90    │
    │  [70,  80,  90]      │  100    │
    └──────────────────────┴─────────┘

    → Converts time series into supervised learning problem
    → For LSTM: X reshaped to (samples, timesteps, features) = (7, 3, 1)
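The windowing shown in the table can be reproduced with a short helper. The function name and the nested-list stand-in for the (samples, timesteps, features) array are illustrative; here `horizon` is assumed to mean a single target that many steps after the window.

```python
def make_windows(series, window, horizon=1):
    """Split a series into (features, target) pairs using a sliding window."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window + horizon - 1])
    return X, y


series = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
X, y = make_windows(series, window=3, horizon=1)

for features, target in zip(X, y):
    print(features, "->", target)

# Nested lists mimicking the (samples, timesteps, features) layout for an LSTM
X_3d = [[[value] for value in row] for row in X]
shape = (len(X_3d), len(X_3d[0]), len(X_3d[0][0]))
print("Shape:", shape)
```

Because each row only uses values that precede its target, the rows must stay in time order when the data is split into training and test sets.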

Flowcharts

Box-Jenkins Methodology Flowchart

    ┌──────────────────────────────────────────────────────────────────────┐
    │                    START: Raw Time Series Data                       │
    └──────────────────────────────┬───────────────────────────────────────┘
                                   │
                                   ▼
    ┌──────────────────────────────────────────────────────────────────────┐
    │  Step 1: VISUALIZATION                                              │
    │  • Plot the series  • Look for trend, seasonality, outliers         │
    └──────────────────────────────┬───────────────────────────────────────┘
                                   │
                                   ▼
    ┌──────────────────────────────────────────────────────────────────────┐
    │  Step 2: STATIONARITY CHECK                                         │
    │  • ADF Test  • KPSS Test  • Visual inspection of mean/variance      │
    └──────────────────────────────┬───────────────────────────────────────┘
                                   │
                          ┌────────┴────────┐
                          │  Stationary?    │
                          └────────┬────────┘
                         NO ╱      │       ╲ YES
                           ╱       │        ╲
                          ▼        │         ▼
    ┌─────────────────────────┐    │    ┌────────────────────────────────┐
    │ Apply Transformations   │    │    │  Step 3: MODEL IDENTIFICATION  │
    │ • Differencing (d)      │    │    │  • Plot ACF → determine q      │
    │ • Log Transform         │────┘    │  • Plot PACF → determine p     │
    │ • Seasonal diff (D)     │         │  • d = number of diffs needed  │
    └─────────────────────────┘         └──────────────┬─────────────────┘
                                                       │
                                                       ▼
    ┌──────────────────────────────────────────────────────────────────────┐
    │  Step 4: PARAMETER ESTIMATION                                       │
    │  • Fit ARIMA(p,d,q) or SARIMA(p,d,q)(P,D,Q,m)                     │
    │  • Use MLE (Maximum Likelihood Estimation)                         │
    │  • Compare AIC/BIC for candidate models                            │
    └──────────────────────────────┬───────────────────────────────────────┘
                                   │
                                   ▼
    ┌──────────────────────────────────────────────────────────────────────┐
    │  Step 5: DIAGNOSTIC CHECKING                                        │
    │  • Ljung-Box test on residuals (should be white noise)             │
    │  • Residual ACF plot (no significant lags)                         │
    │  • Q-Q plot (normality check)                                      │
    └──────────────────────────────┬───────────────────────────────────────┘
                                   │
                          ┌────────┴────────┐
                          │ Residuals OK?   │
                          └────────┬────────┘
                         NO ╱      │       ╲ YES
                           ╱       │        ╲
                          ▼        │         ▼
    ┌─────────────────────────┐    │    ┌────────────────────────────────┐
    │ Revise model orders     │    │    │  Step 6: FORECASTING           │
    │ Try different (p,d,q)   │────┘    │  • Generate point forecasts    │
    │ Consider SARIMA         │         │  • Compute confidence intervals│
    └─────────────────────────┘         │  • Monitor performance         │
                                        └────────────────────────────────┘

Model Selection Decision Tree

                          ┌──────────────────────┐
                          │  Time Series Data    │
                          └──────────┬───────────┘
                                     │
                          ┌──────────┴───────────┐
                          │  Univariate or       │
                          │  Multivariate?       │
                          └──────────┬───────────┘
                         Uni ╱               ╲ Multi
                            ╱                 ╲
                           ▼                   ▼
               ┌───────────────┐       ┌──────────────┐
               │ Has            │       │   VAR Model  │
               │ Seasonality?  │       │   or VECM    │
               └───────┬───────┘       └──────────────┘
              YES ╱         ╲ NO
                 ╱           ╲
                ▼             ▼
     ┌────────────────┐  ┌────────────────┐
     │ Data Size?     │  │ Data Size?     │
     └───────┬────────┘  └───────┬────────┘
      <500╱     ╲>500     <500╱     ╲>500
          ╱       ╲          ╱       ╲
         ▼         ▼        ▼         ▼
    ┌─────────┐ ┌────────┐ ┌──────┐ ┌──────────┐
    │ SARIMA  │ │ LSTM / │ │ARIMA │ │ LSTM /   │
    │ Holt-   │ │Prophet │ │ SES  │ │ Prophet  │
    │ Winters │ │ SARIMA │ │ Holt │ │ ARIMA    │
    └─────────┘ └────────┘ └──────┘ └──────────┘

Time Series Anomaly Detection Pipeline

    Raw Time Series
         │
         ▼
    ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
    │  Decompose  │────▶│  Model the  │────▶│  Compute    │
    │  (STL /     │     │  Expected   │     │  Residuals  │
    │  seasonal_  │     │  (ARIMA /   │     │  r(t) =     │
    │  decompose) │     │  Prophet)   │     │  y(t) - ŷ(t)│
    └─────────────┘     └─────────────┘     └──────┬──────┘
                                                   │
                                                   ▼
                                            ┌─────────────┐
                                            │  Flag where │
                                            │  |r(t)| >   │
                                            │  k × σ      │
                                            │  (k=2 or 3) │
                                            └──────┬──────┘
                                                   │
                                          ┌────────┴────────┐
                                          │ Anomaly?        │
                                          └────────┬────────┘
                                        YES ╱           ╲ NO
                                           ╱             ╲
                                          ▼               ▼
                                  ┌──────────────┐  ┌──────────┐
                                  │ Alert /      │  │ Normal   │
                                  │ Investigate  │  │ Continue │
                                  └──────────────┘  └──────────┘
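The last two stages of the pipeline (compute residuals, then flag where |r(t)| > k × σ) can be sketched in a few lines. The expected values below stand in for the output of whichever model is used, and all numbers are made up for the illustration.

```python
import statistics


def flag_anomalies(actual, expected, k=3.0):
    """Return indices where the residual magnitude exceeds k standard deviations."""
    residuals = [a - e for a, e in zip(actual, expected)]
    sigma = statistics.pstdev(residuals)
    return [t for t, r in enumerate(residuals) if abs(r) > k * sigma]


# Hypothetical model output (flat at 100) and observed values with one spike
expected = [100.0] * 20
noise = [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, -0.05, 0.2, -0.15, 0.1,
         8.0, -0.1, 0.0, 0.1, -0.2, 0.05, 0.1, -0.05, 0.0, 0.15]
actual = [e + n for e, n in zip(expected, noise)]

flagged = flag_anomalies(actual, expected, k=3.0)
print("Anomalous indices:", flagged)
```

One caveat of this simple rule is that the spike itself inflates σ, so in short series a large outlier can partly mask itself; a robust scale estimate such as the median absolute deviation is a common alternative.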

Python Implementation (From Scratch)

10.1 Simple Exponential Smoothing from Scratch

PYTHON
import numpy as np

class SimpleExponentialSmoothing:
    """Simple Exponential Smoothing from scratch."""
    
    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.fitted_values = []
    
    def fit(self, series):
        """Fit the SES model to a time series."""
        self.series = np.array(series, dtype=float)
        n = len(self.series)
        self.fitted_values = np.zeros(n)
        
        # Initialize: first forecast = first observation
        self.fitted_values[0] = self.series[0]
        
        for t in range(1, n):
            self.fitted_values[t] = (
                self.alpha * self.series[t - 1] +
                (1 - self.alpha) * self.fitted_values[t - 1]
            )
        
        self.level = self.fitted_values[-1]
        return self
    
    def forecast(self, steps=1):
        """Forecast future values (SES gives flat forecast)."""
        last_smoothed = (
            self.alpha * self.series[-1] +
            (1 - self.alpha) * self.level
        )
        return np.full(steps, last_smoothed)
    
    def mape(self):
        """Calculate Mean Absolute Percentage Error."""
        actual = self.series[1:]
        forecast = self.fitted_values[1:]
        return np.mean(np.abs((actual - forecast) / actual)) * 100


# Example usage
data = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
model = SimpleExponentialSmoothing(alpha=0.3)
model.fit(data)
print("Fitted:", np.round(model.fitted_values, 2))
print("Forecast (next 3):", np.round(model.forecast(3), 2))
print(f"MAPE: {model.mape():.2f}%")

10.2 AR(1) Model from Scratch

PYTHON
import numpy as np

class AR1Model:
    """AR(1) model: Y(t) = c + phi * Y(t-1) + epsilon."""
    
    def __init__(self):
        self.phi = None
        self.c = None
        self.sigma2 = None
    
    def fit(self, series):
        """Estimate AR(1) parameters using OLS."""
        y = np.array(series, dtype=float)
        n = len(y)
        
        # Set up regression: y(t) = c + phi * y(t-1)
        Y = y[1:]      # dependent variable
        X = y[:-1]      # lagged variable
        
        # Add intercept
        X_design = np.column_stack([np.ones(n - 1), X])
        
        # OLS: beta = (X'X)^{-1} X'Y
        beta = np.linalg.solve(X_design.T @ X_design, X_design.T @ Y)
        self.c = beta[0]
        self.phi = beta[1]
        
        # Residuals
        residuals = Y - X_design @ beta
        self.sigma2 = np.var(residuals, ddof=2)
        self.last_value = y[-1]
        
        return self
    
    def forecast(self, steps=5):
        """Generate multi-step ahead forecasts."""
        predictions = []
        current = self.last_value
        for _ in range(steps):
            next_val = self.c + self.phi * current
            predictions.append(next_val)
            current = next_val
        return np.array(predictions)
    
    def long_run_mean(self):
        """E[Y] = c / (1 - phi) for stationary process."""
        if abs(self.phi) >= 1:
            return float('inf')
        return self.c / (1 - self.phi)
    
    def is_stationary(self):
        """Check stationarity condition: |phi| < 1."""
        return abs(self.phi) < 1


# Example: Fit AR(1) to synthetic data
np.random.seed(42)
n = 200
y = np.zeros(n)
y[0] = 10
for t in range(1, n):
    y[t] = 3 + 0.7 * y[t-1] + np.random.normal(0, 1)

model = AR1Model()
model.fit(y)
print(f"Estimated: c={model.c:.4f}, phi={model.phi:.4f}")
print(f"True:      c=3.0000, phi=0.7000")
print(f"Stationary: {model.is_stationary()}")
print(f"Long-run mean: {model.long_run_mean():.4f} (true: {3/(1-0.7):.4f})")
print(f"Forecast: {np.round(model.forecast(5), 2)}")

10.3 Augmented Dickey-Fuller Test from Scratch

PYTHON
import numpy as np

def adf_test_manual(series, max_lags=1):
    """
    Manual ADF test implementation.
    Tests H0: unit root (non-stationary) vs H1: stationary.
    """
    y = np.array(series, dtype=float)
    n = len(y)
    
    # Compute first differences
    dy = np.diff(y)
    
    # Lagged level
    y_lag = y[:-1]
    
    # Trim to account for lags
    if max_lags > 0:
        # Include lagged differences
        dy_lags = np.column_stack([
            dy[max_lags - i - 1: n - 1 - i - 1]
            for i in range(max_lags)
        ])
        dy_trimmed = dy[max_lags:]
        y_lag_trimmed = y_lag[max_lags:]
        
        X = np.column_stack([
            np.ones(len(dy_trimmed)),   # intercept
            y_lag_trimmed,               # gamma * y(t-1)
            dy_lags                      # lagged differences
        ])
    else:
        dy_trimmed = dy
        y_lag_trimmed = y_lag
        X = np.column_stack([np.ones(len(dy_trimmed)), y_lag_trimmed])
    
    # OLS estimation
    beta = np.linalg.solve(X.T @ X, X.T @ dy_trimmed)
    residuals = dy_trimmed - X @ beta
    
    # Standard error of gamma
    sigma2 = np.sum(residuals**2) / (len(dy_trimmed) - len(beta))
    cov_matrix = sigma2 * np.linalg.inv(X.T @ X)
    se_gamma = np.sqrt(cov_matrix[1, 1])
    
    # Test statistic
    gamma = beta[1]
    t_stat = gamma / se_gamma
    
    # Critical values (approximate, for n > 100 with intercept)
    critical_values = {
        '1%':  -3.43,
        '5%':  -2.86,
        '10%': -2.57
    }
    
    print(f"ADF Test Statistic: {t_stat:.4f}")
    print(f"Estimated gamma:    {gamma:.6f}")
    for level, cv in critical_values.items():
        reject = "REJECT H0 (Stationary)" if t_stat < cv else "Fail to reject H0"
        print(f"  {level} critical value: {cv:.2f} → {reject}")
    
    return t_stat, gamma


# Test with a non-stationary series (random walk)
np.random.seed(42)
random_walk = np.random.normal(0, 1, 200).cumsum()
print("=== Random Walk (Non-Stationary) ===")
adf_test_manual(random_walk)

print("\n=== White Noise (Stationary) ===")
white_noise = np.random.normal(0, 1, 200)
adf_test_manual(white_noise)

10.4 ACF/PACF Computation

PYTHON
import numpy as np

def compute_acf(series, max_lag=20):
    """Compute the Autocorrelation Function."""
    y = np.array(series, dtype=float)
    n = len(y)
    mean = np.mean(y)
    var = np.sum((y - mean) ** 2) / n
    
    acf_values = []
    for k in range(max_lag + 1):
        if k == 0:
            acf_values.append(1.0)
        else:
            cov = np.sum((y[k:] - mean) * (y[:-k] - mean)) / n
            acf_values.append(cov / var)
    
    return np.array(acf_values)


def compute_pacf(series, max_lag=20):
    """Compute PACF using the Durbin-Levinson algorithm."""
    acf = compute_acf(series, max_lag)
    n_lags = max_lag
    pacf_values = np.zeros(n_lags + 1)
    pacf_values[0] = 1.0
    pacf_values[1] = acf[1]
    
    phi = np.zeros((n_lags + 1, n_lags + 1))
    phi[1, 1] = acf[1]
    
    for k in range(2, n_lags + 1):
        # Numerator
        num = acf[k] - sum(phi[k-1, j] * acf[k-j] for j in range(1, k))
        # Denominator
        den = 1.0 - sum(phi[k-1, j] * acf[j] for j in range(1, k))
        
        phi[k, k] = num / den
        
        for j in range(1, k):
            phi[k, j] = phi[k-1, j] - phi[k, k] * phi[k-1, k-j]
        
        pacf_values[k] = phi[k, k]
    
    return pacf_values


# Example
np.random.seed(42)
ar1_data = np.zeros(500)
for t in range(1, 500):
    ar1_data[t] = 0.8 * ar1_data[t-1] + np.random.normal(0, 1)

acf = compute_acf(ar1_data, 10)
pacf = compute_pacf(ar1_data, 10)

print("Lag  |  ACF    |  PACF")
print("-" * 30)
for k in range(11):
    print(f"  {k:2d}  | {acf[k]:6.3f}  | {pacf[k]:6.3f}")
print("\n→ ACF decays exponentially, PACF cuts off after lag 1 → AR(1) ✓")

💻 Code Challenge

Extend the AR(1) model to AR(p). Implement an AR class that accepts a parameter p, fits using OLS with p lagged columns, and forecasts iteratively. Test with p=3 on synthetic data generated from a known AR(3) process.

TensorFlow Implementation

11.1 LSTM Time Series Forecaster

TENSORFLOW / KERAS
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import MinMaxScaler

# =========================================================
# 1. Generate / Load Data
# =========================================================
# Synthetic: trend + seasonality + noise
np.random.seed(42)
t = np.arange(0, 500)
series = 0.05 * t + 10 * np.sin(2 * np.pi * t / 50) + np.random.normal(0, 2, 500)
series = series.reshape(-1, 1)

# =========================================================
# 2. Scale Data
# =========================================================
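# Fit the scaler on the first 80% of the series only, so that
# test-period values do not leak into the scaling parameters
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(series[:int(0.8 * len(series))])
scaled = scaler.transform(series)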

# =========================================================
# 3. Create Sliding Window Dataset
# =========================================================
def create_sequences(data, window_size=30):
    X, y = [], []
    for i in range(window_size, len(data)):
        X.append(data[i - window_size:i, 0])
        y.append(data[i, 0])
    return np.array(X), np.array(y)

WINDOW = 30
X, y = create_sequences(scaled, WINDOW)

# Train-test split (80/20)
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Reshape for LSTM: (samples, timesteps, features)
X_train = X_train.reshape(-1, WINDOW, 1)
X_test = X_test.reshape(-1, WINDOW, 1)

# =========================================================
# 4. Build LSTM Model
# =========================================================
model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(WINDOW, 1)),
    Dropout(0.2),
    LSTM(32, return_sequences=False),
    Dropout(0.2),
    Dense(16, activation='relu'),
    Dense(1)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='mse',
    metrics=['mae']
)

model.summary()

# =========================================================
# 5. Train
# =========================================================
early_stop = EarlyStopping(
    monitor='val_loss', patience=10, restore_best_weights=True
)

history = model.fit(
    X_train, y_train,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stop],
    verbose=1
)

# =========================================================
# 6. Evaluate
# =========================================================
y_pred_scaled = model.predict(X_test)
y_pred = scaler.inverse_transform(y_pred_scaled)
y_actual = scaler.inverse_transform(y_test.reshape(-1, 1))

rmse = np.sqrt(np.mean((y_actual - y_pred) ** 2))
mape = np.mean(np.abs((y_actual - y_pred) / y_actual)) * 100
print(f"\nRMSE: {rmse:.4f}")
print(f"MAPE: {mape:.2f}%")

# =========================================================
# 7. Multi-Step Forecast
# =========================================================
def forecast_multistep(model, last_window, steps, scaler):
    """Iteratively forecast multiple steps ahead."""
    predictions = []
    current = last_window.copy()
    
    for _ in range(steps):
        pred = model.predict(current.reshape(1, -1, 1), verbose=0)
        predictions.append(pred[0, 0])
        current = np.append(current[1:], pred[0, 0])
    
    predictions = np.array(predictions).reshape(-1, 1)
    return scaler.inverse_transform(predictions)

future = forecast_multistep(model, scaled[-WINDOW:, 0], 10, scaler)
print(f"\nNext 10 forecasts: {future.flatten().round(2)}")

11.2 Bidirectional LSTM with Attention

TENSORFLOW / KERAS
import tensorflow as tf
from tensorflow.keras.layers import (
    Bidirectional, LSTM, Dense, Dropout, Input, Flatten
)
from tensorflow.keras.models import Model

def build_attention_lstm(window_size, n_features=1):
    """LSTM with simple attention mechanism for time series."""
    
    inp = Input(shape=(window_size, n_features))
    
    # Bidirectional LSTM
    lstm_out = Bidirectional(
        LSTM(64, return_sequences=True)
    )(inp)
    lstm_out = Dropout(0.2)(lstm_out)
    
    # Second LSTM layer
    lstm_out2 = Bidirectional(
        LSTM(32, return_sequences=True)
    )(lstm_out)
    
    # Simple attention: learn which timesteps matter most
    attention = Dense(1, activation='tanh')(lstm_out2)
    attention = Flatten()(attention)
    attention = Dense(window_size, activation='softmax')(attention)
    
    # Apply attention weights
    # Repeat each timestep's weight across the 64 output features of
    # lstm_out2 (2 directions x 32 units) for element-wise multiplication
    from tensorflow.keras.layers import RepeatVector, Permute, Multiply
    attention = RepeatVector(64)(attention)
    attention = Permute([2, 1])(attention)
    
    context = Multiply()([lstm_out2, attention])
    context = tf.keras.layers.GlobalAveragePooling1D()(context)
    
    # Output
    out = Dense(32, activation='relu')(context)
    out = Dropout(0.2)(out)
    out = Dense(1)(out)
    
    model = Model(inputs=inp, outputs=out)
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    
    return model

# Build and display
attn_model = build_attention_lstm(window_size=30)
attn_model.summary()

🎓 Professor's Insight

LSTMs are powerful but come with caveats for time series: (1) They need thousands of data points — don't use them on monthly data with only 60 observations. (2) They're a black box — if interpretability matters (finance regulation), prefer ARIMA or Prophet. (3) Modern alternatives like Temporal Fusion Transformers and N-BEATS often outperform LSTMs. Always benchmark LSTM against a simple baseline like ARIMA first.
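The advice to benchmark first can be made concrete with even simpler baselines than ARIMA. The sketch below is a minimal illustration, not part of the chapter's pipeline: the function names and the synthetic monthly series are assumptions made for the example. If an LSTM cannot beat the seasonal naive RMSE on the same test window, its extra complexity is hard to justify.

```python
import numpy as np

def naive_forecast(train, horizon):
    """Repeat the last observed value for every future step."""
    return np.full(horizon, train[-1], dtype=float)

def seasonal_naive_forecast(train, horizon, period):
    """Repeat the last full seasonal cycle as many times as needed."""
    last_cycle = np.asarray(train[-period:], dtype=float)
    reps = int(np.ceil(horizon / period))
    return np.tile(last_cycle, reps)[:horizon]

def rmse(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical monthly series: linear trend + yearly cycle + noise
rng = np.random.default_rng(0)
t = np.arange(120)
series = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 120)
train, test = series[:108], series[108:]

baselines = {
    "naive": rmse(test, naive_forecast(train, len(test))),
    "seasonal_naive": rmse(test, seasonal_naive_forecast(train, len(test), period=12)),
}
for name, score in baselines.items():
    print(f"{name:>15s} RMSE: {score:.3f}")
```

Any candidate model's test RMSE can then be compared against the values in `baselines` before investing in tuning.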

Scikit-Learn Pipeline

12.1 ARIMA with statsmodels + Prophet Pipeline

PYTHON
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import warnings
warnings.filterwarnings('ignore')

# =========================================================
# 1. Load and Prepare Data
# =========================================================
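# Using the weekly Mauna Loa CO2 dataset (bundled with statsmodels) as example
from statsmodels.datasets import co2
data = co2.load().data
# Aggregate to monthly means (on pandas >= 2.2 the month-end alias is 'ME')
data = data.resample('M').mean().ffill()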
data.columns = ['co2']
print(f"Data shape: {data.shape}")
print(data.head())

# =========================================================
# 2. Stationarity Tests
# =========================================================
def stationarity_report(series, name="Series"):
    """Run ADF and KPSS tests and print results."""
    print(f"\n=== Stationarity Report: {name} ===")
    
    # ADF Test
    adf_result = adfuller(series.dropna(), autolag='AIC')
    print(f"ADF Statistic: {adf_result[0]:.4f}")
    print(f"ADF p-value:   {adf_result[1]:.6f}")
    print(f"ADF → {'Stationary' if adf_result[1] < 0.05 else 'Non-Stationary'}")
    
    # KPSS Test (regression='c' tests level stationarity, matching the ADF default)
    kpss_result = kpss(series.dropna(), regression='c')
    print(f"KPSS Statistic: {kpss_result[0]:.4f}")
    print(f"KPSS p-value:   {kpss_result[1]:.4f}")
    print(f"KPSS → {'Non-Stationary' if kpss_result[1] < 0.05 else 'Stationary'}")

stationarity_report(data['co2'], "CO2 Levels")
stationarity_report(data['co2'].diff().dropna(), "CO2 First Difference")

# =========================================================
# 3. Decomposition
# =========================================================
decomp = seasonal_decompose(data['co2'], model='additive', period=12)
# decomp.plot()  # Uncomment in Jupyter

# =========================================================
# 4. Auto ARIMA using pmdarima
# =========================================================
# pip install pmdarima
import pmdarima as pm

auto_model = pm.auto_arima(
    data['co2'],
    seasonal=True,
    m=12,
    stepwise=True,
    suppress_warnings=True,
    trace=True,
    error_action='ignore',
    information_criterion='aic'
)
print(f"\nBest model: {auto_model.summary()}")

# =========================================================
# 5. Manual ARIMA Fit
# =========================================================
train = data['co2'][:'2000']
test = data['co2']['2001':]

model = ARIMA(train, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
fitted = model.fit()
print(fitted.summary())

# Forecast
forecast = fitted.forecast(steps=len(test))
rmse = np.sqrt(np.mean((test.values - forecast.values) ** 2))
print(f"\nSARIMA RMSE: {rmse:.4f}")

# =========================================================
# 6. Exponential Smoothing
# =========================================================
from statsmodels.tsa.holtwinters import ExponentialSmoothing

hw_model = ExponentialSmoothing(
    train,
    trend='add',
    seasonal='add',
    seasonal_periods=12
).fit()

hw_forecast = hw_model.forecast(steps=len(test))
hw_rmse = np.sqrt(np.mean((test.values - hw_forecast.values) ** 2))
print(f"Holt-Winters RMSE: {hw_rmse:.4f}")

12.2 Prophet Pipeline

PYTHON
# pip install prophet
from prophet import Prophet
import pandas as pd
import numpy as np

# =========================================================
# Prophet requires columns 'ds' (date) and 'y' (value)
# =========================================================
# Prepare data
df = data.reset_index()
df.columns = ['ds', 'y']

# Train-test split
train_df = df[df['ds'] < '2001-01-01']
test_df = df[df['ds'] >= '2001-01-01']

# =========================================================
# Build Prophet Model
# =========================================================
prophet_model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=False,
    daily_seasonality=False,
    changepoint_prior_scale=0.05,    # flexibility of trend
    seasonality_prior_scale=10.0,    # flexibility of seasonality
    interval_width=0.95              # 95% prediction interval
)

# Add Indian holidays (example)
# prophet_model.add_country_holidays(country_name='IN')

prophet_model.fit(train_df)

# =========================================================
# Forecast
# =========================================================
future = prophet_model.make_future_dataframe(periods=len(test_df), freq='M')
forecast = prophet_model.predict(future)

# Evaluate
pred = forecast[forecast['ds'] >= '2001-01-01']['yhat'].values
actual = test_df['y'].values
rmse = np.sqrt(np.mean((actual - pred[:len(actual)]) ** 2))
mape = np.mean(np.abs((actual - pred[:len(actual)]) / actual)) * 100
print(f"Prophet RMSE: {rmse:.4f}")
print(f"Prophet MAPE: {mape:.2f}%")

# =========================================================
# Components
# =========================================================
# prophet_model.plot_components(forecast)  # Uncomment in Jupyter

# =========================================================
# Anomaly Detection with Prophet
# =========================================================
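# Join actuals to the prediction interval on date rather than by row position
merged = df.merge(forecast[['ds', 'yhat_lower', 'yhat_upper']], on='ds')
anomalies = merged[
    (merged['y'] < merged['yhat_lower']) |
    (merged['y'] > merged['yhat_upper'])
]
print(f"\nDetected {len(anomalies)} anomalies")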

12.3 VAR Model for Multivariate Forecasting

PYTHON
from statsmodels.tsa.api import VAR
import pandas as pd
import numpy as np

# Simulated multivariate data: Nifty50, USD/INR, and Gold
np.random.seed(42)
n = 300
nifty = np.cumsum(np.random.normal(0.5, 50, n)) + 15000
usd_inr = np.cumsum(np.random.normal(0.01, 0.5, n)) + 75
gold = np.cumsum(np.random.normal(0.1, 20, n)) + 50000

df = pd.DataFrame({
    'Nifty50': nifty,
    'USD_INR': usd_inr,
    'Gold': gold
})

# Differencing for stationarity
df_diff = df.diff().dropna()

# Fit VAR
var_model = VAR(df_diff)
# Select optimal lag order
lag_order = var_model.select_order(maxlags=15)
print("Optimal lag orders:")
print(lag_order.summary())

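# Fit with the AIC-selected lag order (at least 1, so there are lags to forecast from)
best_lag = max(int(lag_order.aic), 1)
fitted_var = var_model.fit(best_lag)
print(fitted_var.summary())

# Forecast from the last k_ar observations
forecast_diff = fitted_var.forecast(df_diff.values[-fitted_var.k_ar:], steps=10)
forecast_df = pd.DataFrame(forecast_diff, columns=df.columns)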

# Invert differencing
last_values = df.iloc[-1]
forecast_levels = forecast_df.cumsum() + last_values.values
print("\nVAR Forecast (levels):")
print(forecast_levels.round(2))

# Granger Causality Test
from statsmodels.tsa.stattools import grangercausalitytests
print("\n=== Granger Causality: USD/INR → Nifty50 ===")
gc_result = grangercausalitytests(
    df_diff[['Nifty50', 'USD_INR']].dropna(),
    maxlag=5, verbose=True
)

🚀 Career Path

Quantitative Analyst (Quant): At firms like Citadel, DE Shaw, or Indian hedge funds like Edelweiss, quants build time series models for algorithmic trading. Skills needed: ARIMA/GARCH for volatility, LSTM for price prediction, and VAR for cross-asset analysis. Compensation in India: ₹25-80 LPA for experienced roles.

Indian Case Studies

Case Study 1: Nifty50 Stock Index Forecasting

🇮🇳 India Spotlight

Context: The Nifty50 index represents the top 50 companies on the National Stock Exchange (NSE). The number of demat accounts in India grew from about 40 million in 2020 to over 130 million by 2024.

Challenge: Build a daily Nifty50 closing price forecaster for short-term (5-day) predictions.

Approach:

Data: 10 years of daily Nifty50 closing prices from NSE India (nse-india.com).
Stationarity: ADF test on log-returns (stationary). Raw prices are non-stationary (unit root).
Models tested: ARIMA(2,1,2), GARCH(1,1) for volatility, LSTM with 60-day window.
Feature engineering: Add moving averages (20, 50, 200-day), RSI, Bollinger Bands.
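The stationarity step above rests on the difference between prices and log-returns. As a rough illustration, the sketch below simulates a price path (synthetic data, not real Nifty50 prices) and compares lag-1 autocorrelation of the two series. The helper name `lag1_autocorr` is an assumption for this example, and the check is only an informal diagnostic, not a substitute for the ADF test.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d ** 2))

# Synthetic geometric random walk standing in for an index price series
rng = np.random.default_rng(42)
daily_returns = rng.normal(0.0004, 0.01, 2000)
prices = 10000 * np.exp(np.cumsum(daily_returns))

# Log-returns: r(t) = ln P(t) - ln P(t-1)
log_returns = np.diff(np.log(prices))

print(f"Lag-1 autocorrelation, prices:      {lag1_autocorr(prices):.3f}")
print(f"Lag-1 autocorrelation, log-returns: {lag1_autocorr(log_returns):.3f}")
```

A value close to 1 for prices signals the persistence typical of a unit root, while the log-returns show almost no memory, which is why the models are fitted on returns or on differenced log prices.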

Results:

ARIMA MAPE: 1.8% (5-day forecast)
LSTM MAPE: 1.2% (same horizon)
Key insight: Stock prices follow a random walk — even small improvements matter. LSTM captured regime changes (budget announcements, RBI rate decisions) better than ARIMA.

Lesson: Stock price prediction is extremely difficult (Efficient Market Hypothesis). Models are more useful for volatility estimation and risk management than price direction.

Case Study 2: IMD Monsoon Rainfall Prediction

Context: The Indian monsoon delivers ~70% of annual rainfall between June-September, critical for agriculture (which employs ~42% of the workforce). IMD issues Long Range Forecasts (LRF) in April.

Approach:

Data: 150+ years of All-India monsoon rainfall (IITM Pune dataset).
Predictors: ENSO (El Niño), IOD (Indian Ocean Dipole), snow cover, SST.
Models: SARIMA(1,0,1)(1,1,1,12), Multiple regression with lagged climate indices, Neural network ensemble.
Evaluation: Categorical accuracy (normal/excess/deficit).

Results: IMD's statistical model correctly categorizes monsoon in ~70% of years. The 2023 El Niño year was correctly predicted as "below normal" with a 3-month lead time.

Case Study 3: COVID-19 India Curve Forecasting

Context: During the 2020-2021 COVID-19 waves, ICMR and IIT research teams built forecasting models to predict daily cases and guide lockdown decisions.

Approach:

Data: Daily confirmed cases, deaths, recoveries from covid19india.org API.
Models: SIR/SEIR epidemiological models, ARIMA on log-transformed daily cases, LSTM with mobility data features, Prophet with custom changepoints for lockdown dates.

Key findings: Prophet with manual changepoints at lockdown start/end dates outperformed pure ARIMA. LSTM required careful windowing — the second wave (Delta variant, April 2021) had fundamentally different dynamics from the first wave, causing models trained on Wave 1 to severely underestimate Wave 2 peaks.

Lesson: Time series models assume the future resembles the past. Black swan events (new variants, policy changes) break this assumption. Ensemble methods with scenario analysis work best for crisis forecasting.

🎓 Professor's Insight

Indian time series often have unique challenges: (1) Missing data during holidays/hartals, (2) Structural breaks from demonetization (Nov 2016) and GST rollout (Jul 2017), (3) Festival-driven seasonality that follows the lunar calendar (Diwali date varies by 15+ days). Always account for these in your models.
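One common way to handle festival seasonality that moves with the lunar calendar is an explicit event-window indicator built from a table of dates. The sketch below is one possible approach: the function name and window lengths are hypothetical choices, and the Diwali dates should be verified against an official calendar before use.

```python
import pandas as pd

# Diwali dates on the Gregorian calendar (verify before production use)
diwali_dates = pd.to_datetime([
    "2019-10-27", "2020-11-14", "2021-11-04", "2022-10-24", "2023-11-12",
])

def festival_window_flag(index, event_dates, days_before=7, days_after=2):
    """Return a 0/1 Series marking days inside a window around each event."""
    flag = pd.Series(0, index=index, name="festival_window")
    for d in event_dates:
        start = d - pd.Timedelta(days=days_before)
        end = d + pd.Timedelta(days=days_after)
        flag[(index >= start) & (index <= end)] = 1
    return flag

idx = pd.date_range("2019-01-01", "2023-12-31", freq="D")
flag = festival_window_flag(idx, diwali_dates)

# Flagged days per year: the window moves with the festival date
print(flag.groupby(flag.index.year).sum())
```

The resulting column can be passed as an exogenous regressor to SARIMA-type models, or the same date table can be supplied to Prophet as a holidays frame.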

Global Case Studies

Case Study 1: Weather Forecasting (NOAA / ECMWF)

Context: Modern weather forecasting combines physics-based Numerical Weather Prediction (NWP) with statistical/ML post-processing. The ECMWF (European Centre for Medium-Range Weather Forecasts) produces the world's best 10-day forecasts.

ML Role:

Post-processing NWP output using gradient boosting and neural networks.
Google DeepMind's GraphCast (2023) uses GNNs to produce 10-day forecasts in under a minute (vs. hours for NWP), matching ECMWF accuracy.
Time series methods (ARIMA, ETS) still used for localized short-term temperature/wind speed forecasting.

Metrics: RMSE for temperature (typically 1-2°C for 24-hour forecasts), skill scores relative to climatological baselines.
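A skill score expresses how much a forecast improves on a reference such as climatology. A minimal sketch of the MSE-based version follows. The temperatures are made-up illustration values, and using a constant climatological mean as the reference is an assumption for the example.

```python
import numpy as np

def mse_skill_score(actual, forecast, reference):
    """1 = perfect, 0 = no better than the reference, < 0 = worse."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mse_forecast = np.mean((actual - forecast) ** 2)
    mse_reference = np.mean((actual - reference) ** 2)
    return float(1.0 - mse_forecast / mse_reference)

# Hypothetical daily maximum temperatures in degrees Celsius
actual = np.array([31.0, 33.5, 35.0, 34.0, 30.5])
forecast = np.array([30.0, 33.0, 36.0, 33.0, 31.0])
climatology = np.full(5, 32.0)  # long-run average for these dates (assumed)

skill = mse_skill_score(actual, forecast, climatology)
print(f"MSE skill score vs climatology: {skill:.3f}")
```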

Case Study 2: Energy Demand Forecasting (Electricity Grids)

Context: Power grid operators (like ISO New England, or POSOCO in India) must forecast electricity demand hourly to dispatch generation units efficiently. Over-forecasting wastes fuel; under-forecasting causes blackouts.

Approach:

Features: Historical load, temperature, humidity, day of week, holidays, special events.
Models: SARIMA for baseline, gradient boosting for complex patterns, LSTM for intra-day load shape.
Scale: Forecasts generated for 15-minute, hourly, daily, and weekly horizons.

Results: Modern ensembles achieve MAPE of 1-3% for day-ahead forecasting. The challenge grows with renewable energy integration — solar and wind output depends on weather, adding another layer of uncertainty.

Case Study 3: Retail Sales Forecasting (Walmart / Amazon)

Context: Walmart manages inventory for 4,700+ stores across the US. Accurate demand forecasting reduces waste, prevents stockouts, and saves billions in supply chain costs.

Approach:

Walmart's M5 competition (2020) on Kaggle: 42,840 hierarchical time series for sales at item-store-state levels.
Winning solutions used LightGBM with extensive feature engineering: lag features, rolling means, price changes, calendar events, SNAP benefits.
Prophet used for interpretable decomposition of trends and holiday effects.

Results: Top solutions achieved WRMSSE (Weighted Root Mean Scaled Squared Error) of ~0.52. Key insight: simple lag features + gradient boosting beat deep learning in this competition.
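To make the metric concrete, the sketch below computes RMSSE for single series and a weighted average across series. It is a simplified illustration: the competition derived its weights from dollar sales and evaluated many aggregation levels, whereas the weights and data here are hypothetical.

```python
import numpy as np

def rmsse(train, actual, forecast):
    """Root Mean Squared Scaled Error: forecast MSE scaled by the
    in-sample MSE of the one-step naive forecast."""
    train = np.asarray(train, dtype=float)
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    scale = np.mean(np.diff(train) ** 2)
    return float(np.sqrt(np.mean((actual - forecast) ** 2) / scale))

def weighted_rmsse(scores, weights):
    """Weighted average of per-series RMSSE values (weights are normalised)."""
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(np.asarray(scores) * weights / weights.sum()))

# Two hypothetical item-level series
score_a = rmsse(train=[2, 4, 3, 5, 4], actual=[5, 6], forecast=[4, 6])
score_b = rmsse(train=[10, 10, 12, 11], actual=[12, 13], forecast=[12, 13])

print(f"RMSSE A: {score_a:.4f}, RMSSE B: {score_b:.4f}")
print(f"Weighted: {weighted_rmsse([score_a, score_b], [0.75, 0.25]):.4f}")
```

Because each series is scaled by its own naive error, a value below 1 means the forecast beat the in-sample naive benchmark for that series.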

🏭 Industry Alert

Amazon's demand forecasting system (used in 2023) runs a combination of DeepAR (autoregressive RNN) and classical methods on millions of SKUs. They published the DeepAR paper showing that training a single global model across all related time series outperforms fitting individual ARIMA models to each series — a paradigm shift called "cross-learning."

Startup Applications

15.1 FinTech — Cash Flow Forecasting

Startups: Razorpay, CashFlo, Recko (India)

Fintech startups help SMEs forecast cash flows using transaction history. Prophet models decompose revenue into trend + weekly + monthly seasonality. Alert systems flag anomalies (sudden drops in UPI collections) using residual-based detection.
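Residual-based detection can be sketched without any forecasting library: take the gap between actual and expected values and flag points whose robust z-score is extreme. In the example below the "expected" series is a stand-in for a fitted model's predictions (for instance Prophet's `yhat`). The data, the function name, and the 3.5 threshold are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def flag_residual_anomalies(actual, fitted, threshold=3.5):
    """Flag points whose residual has a large modified z-score (median/MAD based)."""
    resid = np.asarray(actual, dtype=float) - np.asarray(fitted, dtype=float)
    med = np.median(resid)
    mad = np.median(np.abs(resid - med))
    if mad == 0:
        return np.zeros(len(resid), dtype=bool)
    robust_z = 0.6745 * (resid - med) / mad
    return np.abs(robust_z) > threshold

# Hypothetical daily UPI collections with a weekly pattern
rng = np.random.default_rng(7)
days = pd.date_range("2024-01-01", periods=90, freq="D")
weekly = np.where(days.dayofweek >= 5, 0.7, 1.0)   # lower on weekends
expected = 100000 * weekly                          # stand-in for model output
collections = expected * (1 + rng.normal(0, 0.03, 90))
collections[60] *= 0.4                              # injected sudden drop

flags = flag_residual_anomalies(collections, expected)
print("Anomalous dates:", list(days[flags].strftime("%Y-%m-%d")))
```

Using the median and MAD instead of the mean and standard deviation keeps the anomaly itself from inflating the scale it is judged against.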

15.2 AgriTech — Crop Yield Prediction

Startups: CropIn, Fasal, DeHaat (India)

These startups use satellite imagery time series (NDVI index over crop growth cycles) + weather time series to forecast crop yields. Models include: SARIMA for rainfall patterns, LSTM for multi-source sensor data fusion, and Prophet for price forecasting to advise farmers on optimal selling times.

15.3 HealthTech — Patient Volume Forecasting

Startups: Practo, 1mg

Hospital bed occupancy, emergency room visits, and appointment demand follow time series patterns with weekly seasonality (higher on Mondays), annual seasonality (flu season), and pandemic-driven anomalies. Accurate forecasting enables efficient staff scheduling.

15.4 E-Commerce — Demand Sensing

Startups: Meesho, Udaan, Flipkart

Indian e-commerce faces extreme demand spikes during sales events (Big Billion Days, Great Indian Festival). Time series models augmented with event indicators and real-time search trend data enable "demand sensing" — adjusting forecasts hours/days ahead of traditional weekly cycles.

🚀 Career Path

Data Scientist — Forecasting: Specialized roles at companies like Amazon, Flipkart, Uber, and OLA focus exclusively on time series forecasting. Required skills: statsmodels, Prophet, deep learning, and strong statistical foundations. These roles are among the highest-paying DS positions due to direct revenue impact.

Government Applications

16.1 RBI — Inflation Forecasting

The Reserve Bank of India uses time series models to forecast CPI inflation for its monetary policy decisions. The RBI's Quarterly Projection Model combines ARIMA with structural economic models. Accurate inflation forecasts determine repo rate changes that affect every Indian borrower.

16.2 NITI Aayog — GDP Forecasting

India's GDP forecasting uses VAR models with high-frequency indicators (IIP, PMI, GST collections, electricity consumption) as predictors. The "nowcasting" approach uses real-time data to estimate current-quarter GDP before official statistics are released.

16.3 Ministry of Health — Epidemic Surveillance

IDSP (Integrated Disease Surveillance Programme) monitors weekly disease incidence across India. Time series anomaly detection flags unusual spikes in dengue, malaria, and respiratory illness cases at the district level, triggering rapid response.

16.4 Smart Cities Mission — Traffic Forecasting

Cities like Pune, Hyderabad, and Bengaluru use traffic sensor time series data to forecast congestion, optimize signal timings, and plan infrastructure. SARIMA models capture daily and weekly patterns, while LSTM models incorporate weather and event data.

🇮🇳 India Spotlight

India's GST (Goods and Services Tax) collections are a key economic time series. Monthly collections crossed ₹2 lakh crore in April 2024. The Ministry of Finance uses time series analysis to forecast revenue and set fiscal targets for the Union Budget.

Industry Applications

17.1 Manufacturing — Predictive Maintenance

Sensor time series (vibration, temperature, pressure) from machines are monitored using anomaly detection. LSTM autoencoders learn "normal" operating patterns and flag deviations. Companies like Tata Steel and Reliance Industries use this to prevent unplanned downtime worth crores per hour.

17.2 Telecom — Network Traffic Forecasting

Jio, Airtel, and Vi forecast network traffic to plan capacity. Time series at the cell tower level show strong daily patterns (peaks at 8-10 PM) and weekly patterns (weekends vs. weekdays). SARIMA and Prophet handle this well, while LSTM captures special events like cricket match streaming.

17.3 Banking — ATM Cash Demand

Banks forecast cash withdrawal patterns at each ATM to optimize replenishment schedules. The series shows strong patterns: salary days (1st and 15th), weekends, festivals, and even election days. Post-demonetization (Nov 2016), models had to rapidly adapt to fundamentally changed withdrawal patterns.

17.4 Pharma — Drug Demand Forecasting

Pharmaceutical companies forecast drug demand considering: seasonal illness patterns (flu season), patent expiry impacts (generic entry), and supply chain lead times. Cipla and Sun Pharma use hierarchical time series models (bottom-up from SKU level to national).
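Bottom-up reconciliation simply sums the lowest-level forecasts up the hierarchy, which guarantees that regional and national numbers are consistent with the SKU forecasts. The sketch below uses a summing matrix for a tiny two-region hierarchy. All numbers and the hierarchy itself are hypothetical.

```python
import numpy as np

# SKU-level forecasts: rows = SKUs, columns = next three months (hypothetical)
sku_forecasts = np.array([
    [120, 130, 125],   # SKU A (North)
    [ 80,  85,  90],   # SKU B (North)
    [200, 210, 205],   # SKU C (South)
    [ 50,  55,  60],   # SKU D (South)
], dtype=float)

# Summing matrix: each row defines one node as a sum of SKUs
S = np.array([
    [1, 1, 1, 1],      # National
    [1, 1, 0, 0],      # North region
    [0, 0, 1, 1],      # South region
    [1, 0, 0, 0],      # SKU A
    [0, 1, 0, 0],      # SKU B
    [0, 0, 1, 0],      # SKU C
    [0, 0, 0, 1],      # SKU D
], dtype=float)

labels = ["National", "North", "South", "SKU A", "SKU B", "SKU C", "SKU D"]
all_levels = S @ sku_forecasts

for label, row in zip(labels, all_levels):
    print(f"{label:>8s}: {row}")
```

The trade-off is that noise in individual SKU forecasts is carried upward, so aggregate accuracy should still be checked against a model fitted directly at the national level.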

17.5 Aviation — Ticket Price Optimization

IndiGo, Air India, and SpiceJet use time series models for dynamic pricing. Historical booking curves (cumulative bookings vs. days before departure) are modeled per route-class combination. Forecasting residual demand enables revenue optimization (yield management).

Industry	Typical Series	Preferred Model	Forecast Horizon
Finance	Stock prices, volatility	ARIMA + GARCH	1-5 days
Retail	SKU-level sales	Prophet / LightGBM	1-13 weeks
Energy	Electricity load	SARIMA + LSTM	1-24 hours
Weather	Temperature, rainfall	NWP + ML post-processing	1-10 days
Telecom	Network traffic	SARIMA + Prophet	Hours to days
Healthcare	Patient admissions	Holt-Winters	1-4 weeks
Manufacturing	Sensor readings	LSTM Autoencoder	Real-time

Mini Projects

🏗️ Mini Project 1: Stock Price Forecaster

Objective: Build an end-to-end system that forecasts the next 5 trading days of Nifty50/RELIANCE stock prices.

Steps:

Data Collection: Use yfinance library to download 5 years of daily OHLCV data for RELIANCE.NS
EDA: Plot closing prices, compute rolling statistics, check stationarity (ADF, KPSS)
Feature Engineering: Log returns, 20/50/200-day moving averages, RSI, MACD, Bollinger Bands
Model 1 — ARIMA: Fit ARIMA on log prices, use auto_arima for order selection
Model 2 — Prophet: Fit Prophet with custom holiday for Indian market holidays (Republic Day, Diwali, etc.)
Model 3 — LSTM: Window=60 days, 2-layer LSTM (64, 32 units), train on 80% data
Comparison: Evaluate all models on test set using RMSE, MAPE, directional accuracy
Deployment: Create a simple Streamlit app showing forecasts with confidence intervals

PYTHON
# Starter code for Mini Project 1
import yfinance as yf
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from prophet import Prophet

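# Download data
ticker = "RELIANCE.NS"
df = yf.download(ticker, start="2019-01-01", end="2024-12-31")
# Recent yfinance versions may return MultiIndex columns (field, ticker);
# flatten them so that df['Close'] is a plain Series
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)
print(f"Downloaded {len(df)} rows for {ticker}")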

# Basic analysis
df['Log_Return'] = np.log(df['Close'] / df['Close'].shift(1))
df['MA_20'] = df['Close'].rolling(20).mean()
df['MA_50'] = df['Close'].rolling(50).mean()

# Train-test split
train = df[:'2024-06-30']
test = df['2024-07-01':]
print(f"Train: {len(train)}, Test: {len(test)}")

# ARIMA on log prices
log_price = np.log(train['Close'])
model = ARIMA(log_price, order=(2, 1, 2))
fitted = model.fit()
forecast_log = fitted.forecast(steps=len(test))
forecast_price = np.exp(forecast_log)

rmse = np.sqrt(np.mean((test['Close'].values - forecast_price.values) ** 2))
print(f"ARIMA RMSE: ₹{rmse:.2f}")

# TODO: Add Prophet and LSTM models, compare, build Streamlit app

Expected Deliverables: Jupyter notebook with visualizations, model comparison table, Streamlit dashboard, and a 2-page report discussing why stock prices are fundamentally hard to predict (EMH).
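For the comparison step, the three metrics can be sketched as below. Directional accuracy is defined here as the share of days on which the predicted move (relative to the previous actual close) has the same sign as the realised move. Other definitions exist, so treat this as one reasonable choice. The prices are made-up illustration values.

```python
import numpy as np

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted):
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

def directional_accuracy(actual, predicted):
    """Fraction of steps where predicted and realised moves share a sign."""
    realised_move = np.sign(actual[1:] - actual[:-1])
    predicted_move = np.sign(predicted[1:] - actual[:-1])
    return float(np.mean(realised_move == predicted_move))

# Hypothetical closing prices and one-step-ahead forecasts
actual = np.array([100.0, 102.0, 101.0, 105.0, 107.0])
predicted = np.array([100.0, 101.0, 103.0, 104.0, 106.0])

print(f"RMSE: {rmse(actual, predicted):.4f}")
print(f"MAPE: {mape(actual, predicted):.4f}%")
print(f"Directional accuracy: {directional_accuracy(actual, predicted):.2f}")
```

A model can have a low RMSE and still call the direction wrong most of the time, which is why the project asks for both kinds of metric.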

🏗️ Mini Project 2: Weather Predictor (Temperature Forecasting)

Objective: Forecast daily maximum temperature for an Indian city (Delhi/Mumbai/Bangalore) for the next 7 days.

Steps:

Data: Download historical weather data from Open-Meteo API or Visual Crossing for 10+ years
EDA: Annual seasonality (summer/winter), decompose into trend + season + residual
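Stationarity: Apply seasonal differencing (lag=365), then check the result with ADF and KPSS
Model 1 — SARIMA: SARIMA(p,d,q)(P,D,Q,365) is slow and memory-hungry because of the long seasonal period. Daily temperature has no meaningful weekly cycle, so a period of 7 is not a substitute; consider resampling to weekly means (seasonal period 52) or keeping daily data and modeling the annual cycle with Fourier terms as exogenous regressors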
Model 2 — Holt-Winters: Additive seasonality with period=365
Model 3 — Prophet: Captures yearly seasonality automatically; the weekly component it also fits is usually negligible for temperature
Model 4 — LSTM: Window=90 days, include humidity and wind speed as extra features
Ensemble: Simple average of top 2 models' forecasts

PYTHON
# Starter code for Mini Project 2
import pandas as pd
import numpy as np
import requests

# Fetch data from Open-Meteo API
url = "https://archive-api.open-meteo.com/v1/archive"
params = {
    "latitude": 28.6139,  # Delhi
    "longitude": 77.209,
    "start_date": "2014-01-01",
    "end_date": "2024-12-31",
    "daily": "temperature_2m_max,temperature_2m_min,precipitation_sum",
    "timezone": "Asia/Kolkata"
}
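response = requests.get(url, params=params, timeout=30)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
data = response.json()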

df = pd.DataFrame({
    'date': pd.to_datetime(data['daily']['time']),
    'temp_max': data['daily']['temperature_2m_max'],
    'temp_min': data['daily']['temperature_2m_min'],
    'precip': data['daily']['precipitation_sum']
})
df.set_index('date', inplace=True)
print(f"Delhi weather data: {len(df)} days")
print(df.describe())

# Prophet model
from prophet import Prophet
prophet_df = df[['temp_max']].reset_index()
prophet_df.columns = ['ds', 'y']

model = Prophet(yearly_seasonality=True, weekly_seasonality=True)
model.fit(prophet_df)

future = model.make_future_dataframe(periods=7)
forecast = model.predict(future)
print("\nNext 7-day Temperature Forecast:")
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(7))
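
The ensemble step (a simple average of the top two models) can be sketched as follows. The validation RMSE values and the 7-day forecast arrays are made-up placeholders standing in for the outputs of the models listed in the steps; only the selection and averaging logic is the point.

```python
import numpy as np

# Hypothetical validation RMSE (°C) per model; lower is better
val_rmse = {"sarima": 2.4, "holt_winters": 2.9, "prophet": 2.1, "lstm": 2.6}

# Hypothetical 7-day forecasts of daily maximum temperature (°C)
forecasts = {
    "sarima":       np.array([38.0, 38.5, 39.0, 39.2, 39.5, 40.0, 40.2]),
    "holt_winters": np.array([37.0, 37.4, 37.9, 38.3, 38.8, 39.1, 39.5]),
    "prophet":      np.array([38.6, 38.9, 39.4, 39.8, 40.1, 40.4, 40.8]),
    "lstm":         np.array([39.5, 39.0, 40.2, 40.9, 40.0, 41.3, 41.0]),
}

# Pick the two models with the lowest validation error
top2 = sorted(val_rmse, key=val_rmse.get)[:2]

# Equal-weight average of their forecasts, day by day
ensemble = np.mean([forecasts[name] for name in top2], axis=0)

print("Top 2 models:", top2)
print("Ensemble forecast:", np.round(ensemble, 2))
```

Ranking the models on a validation period that is separate from the final test period avoids choosing the ensemble members with knowledge of the test data.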

Expected Deliverables: Complete notebook, model comparison with RMSE/MAPE, visualization of forecast vs actual for test period, and discussion of which model handles Delhi's extreme summer (45°C+) best.

🏗️ Mini Project 3: Anomaly Detection in Server Metrics

Objective: Build a real-time anomaly detection system for server CPU/memory time series.

Steps:

Generate synthetic server metrics with injected anomalies (spikes, level shifts, trend changes)
Implement statistical method: Z-score on rolling window residuals
Implement Prophet-based method: Flag points outside prediction intervals
Implement LSTM Autoencoder: Reconstruction error-based anomaly scoring
Compare precision/recall/F1 on known anomalies
Build a dashboard showing real-time metric with anomaly highlights
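
As a starting point for the first two steps, here is a minimal sketch that generates a synthetic CPU series with injected spikes and scores each point with a z-score against a trailing rolling window. The series shape, window length (30), spike size, and threshold (4) are all assumptions chosen for illustration; level shifts and trend changes would need additional handling.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500
t = np.arange(n)

# Synthetic CPU utilisation (%): slow cycle plus Gaussian noise
cpu = 50 + 10 * np.sin(2 * np.pi * t / 500) + rng.normal(0, 1.0, n)

# Inject three spike anomalies at known positions
anomaly_idx = [120, 250, 400]
cpu[anomaly_idx] += 25

s = pd.Series(cpu)
window = 30

# Baseline statistics come from the previous `window` points only (shift(1)),
# so the point being scored cannot inflate its own mean or standard deviation
roll_mean = s.shift(1).rolling(window).mean()
roll_std = s.shift(1).rolling(window).std()

residual = s - roll_mean        # deviation from the local baseline
z = residual / roll_std         # rolling z-score

flagged = z.index[z.abs() > 4].tolist()
print("Flagged indices:", flagged)
```

Because the positions of the injected anomalies are known, precision, recall, and F1 can be computed directly by comparing `flagged` with `anomaly_idx`.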

Exercises

Exercise 1 Easy

Given the series [10, 12, 15, 19, 24, 30], compute the first and second differences by hand. Is the first difference stationary? Is the second?

Exercise 2 Easy

For an AR(1) model Y(t) = 2 + 0.5Y(t-1) + ε(t), compute: (a) the long-run mean, (b) ρ(1), ρ(2), ρ(3), (c) Y(t+1) if Y(t) = 8.

Exercise 3 Easy

Explain the difference between ACF and PACF. How do you use them together to identify ARIMA orders?

Exercise 4 Medium

For an MA(1) model Y(t) = 10 + ε(t) + 0.4ε(t-1) with σ²=9, compute: (a) E[Y(t)], (b) Var(Y(t)), (c) ρ(1), (d) ρ(2).

Exercise 5 Medium

Apply SES with α=0.2 to [200, 180, 190, 210, 195, 205, 220]. Start with ŷ(1)=200. Compute all forecasts and the final forecast for period 8.

Exercise 6 Medium

A series has ADF p-value = 0.34 and KPSS p-value = 0.01. What do you conclude? What action should you take?

Exercise 7 Medium

You see ACF that decays gradually and PACF that cuts off sharply at lag 3. What model would you identify? Write the model equation.

Exercise 8 Medium

Explain why a random walk Y(t) = Y(t-1) + ε(t) is non-stationary. What is its variance at time t?

Exercise 9 Medium

Compute MAPE, RMSE, and SMAPE for: Actual = [50, 60, 70, 80, 90], Forecast = [52, 58, 73, 76, 95].

Exercise 10 Medium

What is the difference between additive and multiplicative decomposition? Give a real-world Indian example where multiplicative is preferred.

Exercise 11 Medium

Write Python code to perform seasonal decomposition on monthly airline passenger data using statsmodels. Plot and interpret each component.

Exercise 12 Medium

For SARIMA(1,1,1)(1,1,1,12), how many parameters need to be estimated? Write out the full model equation using backshift notation.

Exercise 13 Hard

Prove that the variance of an AR(1) process Y(t) = φY(t-1) + ε(t) is σ²/(1-φ²). What happens as |φ| → 1?

Exercise 14 Hard

Implement Holt-Winters (additive) from scratch in Python. Test on a series with clear trend and seasonality. Compare your output with statsmodels.

Exercise 15 Hard

Build an LSTM model in TensorFlow for multi-step forecasting (predict 7 days ahead simultaneously). Compare seq2one (iterate) vs seq2seq (direct) approaches.

Exercise 16 Medium

Use Prophet to forecast Indian GST monthly collections. Add custom holidays for Diwali, Eid, and Republic Day. How do holidays affect GST?

Exercise 17 Hard

Implement a VAR(2) model for Nifty50 and USD/INR exchange rate. Perform Granger causality tests. Does one "cause" the other?

Exercise 18 Medium

Create a time series anomaly detector using z-scores on rolling window residuals. Test it on synthetic data with 5 injected anomalies.

Exercise 19 Hard

Compare AIC and BIC for model selection. Fit ARIMA(1,1,0), ARIMA(0,1,1), ARIMA(1,1,1), and ARIMA(2,1,1) on a dataset. Which does AIC prefer? Which does BIC prefer? Why might they differ?

Exercise 20 Easy

What is the Box-Cox transformation? If the optimal λ=0.5, what transformation is applied? How do you inverse-transform forecasts?

Exercise 21 Hard

Build an LSTM autoencoder for time series anomaly detection. Train on normal data only, then use reconstruction error to flag anomalies in a test set containing anomalies.

Exercise 22 Medium

Explain cross-validation for time series. Why can't you use standard k-fold? Implement expanding window and sliding window CV schemes.

Multiple Choice Questions

Q1. In the ADF test, a p-value of 0.02 indicates:

A) The series is non-stationary
B) The series is stationary (reject H₀ of unit root)
C) The test is inconclusive
D) The series has seasonal patterns

✅ B) The series is stationary. ADF H₀ = unit root exists. p-value < 0.05 → reject H₀ → series is stationary.

Q2. If the ACF of a stationary series cuts off after lag 2 and the PACF decays gradually, the appropriate model is:

A) AR(2)
B) MA(2)
C) ARMA(2,2)
D) ARIMA(0,1,2)

✅ B) MA(2). ACF cuts off → MA order q=2. PACF decays → confirms MA, not AR.
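
The cut-off pattern can be checked numerically. The sketch below simulates an MA(2) process with assumed coefficients (θ₁ = 0.6, θ₂ = 0.3) and computes the sample ACF with plain NumPy; the helper `sample_acf` is written here for illustration and is not a library function.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    acf = [1.0]
    for k in range(1, max_lag + 1):
        acf.append(np.dot(x[:-k], x[k:]) / denom)
    return np.array(acf)

rng = np.random.default_rng(0)
n = 20000
theta1, theta2 = 0.6, 0.3          # assumed MA coefficients

# MA(2): y_t = e_t + theta1 * e_{t-1} + theta2 * e_{t-2}
eps = rng.normal(size=n + 2)
y = eps[2:] + theta1 * eps[1:-1] + theta2 * eps[:-2]

acf = sample_acf(y, max_lag=5)

# Theoretical values: non-zero at lags 1 and 2, exactly zero beyond
gamma0 = 1 + theta1**2 + theta2**2
rho1_theory = (theta1 + theta1 * theta2) / gamma0
rho2_theory = theta2 / gamma0

print("Sample ACF (lags 0-5):", np.round(acf, 3))
print(f"Theory: rho1={rho1_theory:.3f}, rho2={rho2_theory:.3f}, rho_k=0 for k>2")
```

With a long simulated series, the sample ACF at lags 3 and beyond sits inside the approximate ±1.96/√n band, which is the "cut-off" read from an ACF plot.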

Q3. In ARIMA(1,2,1), the value d=2 means:

A) Two autoregressive lags are used
B) The series was differenced twice to achieve stationarity
C) Two moving average terms are included
D) The seasonal period is 2

✅ B) The series was differenced twice. The 'd' in ARIMA(p,d,q) = order of differencing.

Q4. Simple Exponential Smoothing with α close to 1 gives:

A) More weight to recent observations
B) More weight to older observations
C) Equal weight to all observations
D) More weight to the initial observation only

✅ A) α close to 1 means the forecast is dominated by the most recent observation. The weights α(1-α)^i decay very fast, so old data is essentially forgotten.
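
To see the decay concretely, the weight that Simple Exponential Smoothing places on the observation i steps back is α(1-α)^i. The short sketch below compares a high and a low α; the helper name `ses_weights` is an illustrative assumption.

```python
import numpy as np

def ses_weights(alpha, n_lags):
    """Weight on the observation i steps back, for i = 0..n_lags-1."""
    i = np.arange(n_lags)
    return alpha * (1 - alpha) ** i

w_fast = ses_weights(0.9, 5)   # alpha close to 1
w_slow = ses_weights(0.1, 5)   # alpha close to 0

print("alpha=0.9:", np.round(w_fast, 5))
print("alpha=0.1:", np.round(w_slow, 5))
print("Share of total weight in the last 5 observations:")
print(f"  alpha=0.9: {w_fast.sum():.5f}")
print(f"  alpha=0.1: {w_slow.sum():.5f}")
```

With α = 0.9 nearly all of the weight falls on the latest few observations, while with α = 0.1 the last five observations together carry well under half of it.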

Q5. Which metric has a problem when actual values are zero or near zero?

A) RMSE
B) MAE
C) MAPE
D) MSE

✅ C) MAPE divides by |actual|, so it becomes undefined or infinite when actual = 0. Use SMAPE or RMSE instead.
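
A small numerical illustration of the failure: the functions below are a minimal sketch of MAPE and one common SMAPE definition (mean of |A-F| divided by the average of |A| and |F|), applied to made-up data containing a zero actual.

```python
import numpy as np

def mape(actual, forecast):
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Suppress the divide-by-zero warning so the infinite result is visible
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

def smape(actual, forecast):
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2
    return float(np.mean(np.abs(actual - forecast) / denom) * 100)

# Hypothetical demand series with one zero-sales day
actual = np.array([0.0, 10.0, 20.0])
forecast = np.array([1.0, 12.0, 18.0])

mape_val = mape(actual, forecast)
smape_val = smape(actual, forecast)
print(f"MAPE:  {mape_val}")
print(f"SMAPE: {smape_val:.2f}%")
```

Note that this SMAPE definition is itself undefined when an actual value and its forecast are both zero, so intermittent series may still need a scale-based metric such as RMSE or MAE.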

Q6. Prophet by Meta models the trend component as:

A) An ARIMA process
B) A piecewise-linear or logistic growth function
C) A polynomial regression
D) A random walk

✅ B) Prophet models trend as either piecewise-linear (default) or logistic growth (for saturating trends). It places candidate changepoints automatically and estimates how much the trend slope changes at each one.
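
The piecewise-linear form can be written out directly. The sketch below follows the formulation in Taylor & Letham (2018), g(t) = (k + a(t)ᵀδ)·t + (m + a(t)ᵀγ) with γⱼ = -sⱼδⱼ, as a plain NumPy illustration. It is not Prophet's internal code, and the base rate, offset, changepoint, and slope change are made-up values.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Continuous piecewise-linear trend.

    k: base growth rate, m: base offset,
    changepoints: times s_j where the slope may change,
    deltas: slope adjustment delta_j applied from s_j onward.
    """
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    delta = np.asarray(deltas, dtype=float)

    # A[i, j] = 1 if time t[i] is at or after changepoint s[j]
    A = (t[:, None] >= s[None, :]).astype(float)

    # Offset corrections that keep the line continuous at each changepoint
    gamma = -s * delta

    return (k + A @ delta) * t + (m + A @ gamma)

t = np.arange(0, 10)
g = piecewise_linear_trend(t, k=1.0, m=5.0, changepoints=[4.0], deltas=[-0.5])
print(g)
```

Before the changepoint at t = 4 the trend rises by 1 per step; afterwards it rises by 0.5 per step, with no jump at the changepoint itself.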

Q7. In an LSTM cell, the forget gate determines:

A) What new information to add to the cell state
B) What information to discard from the cell state
C) What part of the cell state to output
D) The learning rate

✅ B) The forget gate (f_t = σ(W_f·[h_{t-1}, x_t] + b_f)) outputs values in [0,1] that multiply the previous cell state, determining what to forget (0 = forget, 1 = keep).
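
The forget-gate equation can be evaluated by hand in a few lines of NumPy. This is a sketch of that single gate only, with randomly initialised weights and arbitrary sizes (4 hidden units, 3 input features) chosen for illustration; a full LSTM cell also has input and output gates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
hidden, n_in = 4, 3                      # assumed sizes

W_f = rng.normal(size=(hidden, hidden + n_in))   # forget-gate weights
b_f = np.zeros(hidden)                           # forget-gate bias
h_prev = rng.normal(size=hidden)                 # previous hidden state h_{t-1}
x_t = rng.normal(size=n_in)                      # current input x_t
c_prev = rng.normal(size=hidden)                 # previous cell state c_{t-1}

# f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f)
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)

# Element-wise scaling of the old cell state: values near 0 forget, near 1 keep
c_kept = f_t * c_prev

print("f_t:   ", np.round(f_t, 3))
print("c_prev:", np.round(c_prev, 3))
print("c_kept:", np.round(c_kept, 3))
```

Each component of the old cell state is shrunk by its own gate value, so the network can keep some memory dimensions while discarding others.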

Q8. A VAR(1) model with 3 variables has how many parameters (excluding intercepts)?

A) 3
B) 6
C) 9
D) 12

✅ C) 9. Each of the 3 equations has 3 coefficients (one for each variable's lag 1). Total = 3 × 3 = 9 AR parameters. (Plus 3 intercepts if counted.)

Q9. What is the KPSS test's null hypothesis?

A) The series has a unit root
B) The series is stationary
C) The series has no autocorrelation
D) The residuals are normally distributed

✅ B) KPSS tests H₀: series is stationary vs H₁: unit root (non-stationary). This is the opposite of ADF! If KPSS rejects H₀, the series is non-stationary.

Q10. Which of the following is NOT a component of the Prophet model?

A) Trend g(t)
B) Seasonality s(t)
C) Autoregressive term φY(t-1)
D) Holiday effects h(t)

✅ C) Prophet does NOT include autoregressive terms. It uses y(t) = g(t) + s(t) + h(t) + ε(t). This is a key difference from ARIMA — Prophet is regression-based, not autoregressive.

Q11. When using time series cross-validation, which approach is correct?

A) Standard k-fold with random splits
B) Expanding/sliding window maintaining temporal order
C) Leave-one-out with random selection
D) Stratified k-fold based on target values

✅ B) Time series must preserve temporal order. Expanding window (train on all data up to cutoff, test on next h steps) or sliding window (fixed train window moves forward) are correct. Random splits cause data leakage!
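
Both schemes reduce to generating index ranges in which every training index precedes every test index. The generators below are a minimal pure-Python sketch; the function names and parameters are illustrative assumptions. scikit-learn's `TimeSeriesSplit` offers a ready-made expanding-window splitter.

```python
def expanding_window_splits(n_samples, initial_train, horizon, step=None):
    """Train on [0, cutoff), test on the next `horizon` points, then grow the train set."""
    step = step or horizon
    train_end = initial_train
    while train_end + horizon <= n_samples:
        train_idx = list(range(0, train_end))
        test_idx = list(range(train_end, train_end + horizon))
        yield train_idx, test_idx
        train_end += step

def sliding_window_splits(n_samples, train_size, horizon, step=None):
    """Train on a fixed-size window that moves forward, test on the next `horizon` points."""
    step = step or horizon
    start = 0
    while start + train_size + horizon <= n_samples:
        train_end = start + train_size
        train_idx = list(range(start, train_end))
        test_idx = list(range(train_end, train_end + horizon))
        yield train_idx, test_idx
        start += step

expanding = list(expanding_window_splits(10, initial_train=4, horizon=2))
sliding = list(sliding_window_splits(10, train_size=4, horizon=2))

for name, splits in [("expanding", expanding), ("sliding", sliding)]:
    print(name)
    for train_idx, test_idx in splits:
        print(f"  train {train_idx[0]}-{train_idx[-1]}  test {test_idx}")
```

The expanding scheme uses all history available at each cutoff, while the sliding scheme discards old data, which can help when the series has structural breaks.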

Interview Questions

Interview Q1

What is stationarity and why is it important for time series modeling?

Model Answer: A (weakly) stationary time series has a constant mean, a constant variance, and an autocovariance that depends only on the lag, not on time. Most classical models (ARIMA, VAR) assume stationarity because their parameters are estimated from the entire series; if statistical properties change over time, the estimated parameters are meaningless averages. Regressing one non-stationary series on another also risks spurious regression (two unrelated trending series appear correlated). We achieve stationarity through differencing, log transforms, or detrending.

Interview Q2

Explain the difference between AR, MA, and ARIMA models.

Model Answer: AR(p) models the current value as a linear function of p past values, capturing momentum/persistence. MA(q) models the current value as a function of q past forecast errors, capturing the short-lived effect of shocks. ARIMA(p,d,q) combines both with d rounds of differencing to handle non-stationarity. AR captures gradual effects (autocorrelation), MA captures sudden effects (error corrections), and the I (Integrated) part handles trends.

Interview Q3

How would you handle a time series with both trend and seasonality?

Model Answer: Options include: (1) SARIMA with seasonal differencing and seasonal AR/MA terms, (2) Holt-Winters with explicit trend and seasonal components, (3) Prophet which models trend (piecewise-linear) and seasonality (Fourier terms) separately, (4) STL decomposition to remove trend/season, then model residuals with ARIMA. For large datasets, LSTM with calendar features works well. I'd start with decomposition to understand the components, then choose the model based on data size and forecasting horizon.

Interview Q4

When would you choose Prophet over ARIMA? When would you choose ARIMA?

Model Answer: Choose Prophet when: you have multiple seasonalities (daily + weekly + yearly), many missing values, you need to add holiday effects easily, or when you're forecasting at scale (thousands of series with minimal tuning). Choose ARIMA when: you have few data points (<100), you need a purely statistical approach with confidence intervals grounded in theory, or when the series is well-modeled by a low-order ARIMA (interpretable coefficients). ARIMA is also faster for single-series inference.

Interview Q5

How do you handle missing values in time series?

Model Answer: (1) Forward fill (LOCF): carry the last observation forward, (2) Linear interpolation: works for gradual changes, (3) Seasonal interpolation: fill with the value from the same season last year, (4) Model-based imputation: fit ARIMA on available data, fill gaps with predictions, (5) Prophet handles missing values natively. The key is understanding whether missingness is random (MCAR) or systematic (sensor failure = extended gaps). Never drop rows in a time series: doing so breaks the regular spacing that lags and seasonal periods rely on.
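
The first two options map directly onto pandas methods. The sketch below uses small made-up series to show forward fill, linear interpolation, and how `asfreq` restores a missing calendar row so that the gap becomes an explicit NaN rather than a silent jump in time.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([10.0, np.nan, np.nan, 16.0, np.nan, 20.0], index=idx)

locf = s.ffill()                          # (1) carry last observation forward
linear = s.interpolate(method="linear")   # (2) straight line between known points

print(pd.DataFrame({"raw": s, "locf": locf, "linear": linear}))

# A series where a whole row (2024-01-03) is absent from the index
gappy = pd.Series(
    [1.0, 2.0, 4.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
)

# Restore the regular daily grid first, then fill the exposed gap
regular = gappy.asfreq("D")
filled = regular.interpolate(method="linear")
print(filled)
```

Forward fill suits step-like quantities (a policy rate, a price that only changes on trade), while interpolation suits smoothly varying ones such as temperature.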

Interview Q6

Explain the concept of Granger Causality. Does it imply true causation?

Model Answer: Granger Causality tests whether past values of series X improve the forecast of series Y beyond what past values of Y alone provide. If including lagged X significantly reduces the forecast error (F-test p-value < 0.05), we say X "Granger-causes" Y. However, it does NOT imply true causation — it's about predictive information, not causal mechanism. Two series could be Granger-causal because of a confounding third variable. It's a statistical concept, not a causal one.
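
The comparison behind the test (does adding lagged X reduce the error of an autoregression of Y?) can be sketched with ordinary least squares. The code below computes only the lag-1 F-statistic on simulated data in which X drives Y by construction; the function name, coefficients, and sample size are assumptions for illustration. In practice, `grangercausalitytests` in `statsmodels.tsa.stattools` runs the test for several lags and reports p-values.

```python
import numpy as np

def lag1_granger_f(cause, effect):
    """F-statistic for H0: lag 1 of `cause` adds nothing to an AR(1) model of `effect`."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)

    y = effect[1:]
    ones = np.ones_like(y)
    X_restricted = np.column_stack([ones, effect[:-1]])            # own lag only
    X_full = np.column_stack([ones, effect[:-1], cause[:-1]])      # plus lagged cause

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(X_restricted), rss(X_full)
    q = 1                                   # number of restrictions tested
    dof = len(y) - X_full.shape[1]          # residual degrees of freedom
    return ((rss_r - rss_u) / q) / (rss_u / dof)

# Simulated data: x is white noise, y depends on its own lag and on lagged x
rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

f_x_to_y = lag1_granger_f(x, y)
f_y_to_x = lag1_granger_f(y, x)
print(f"F (x -> y): {f_x_to_y:.1f}")
print(f"F (y -> x): {f_y_to_x:.2f}")
```

A large F-statistic in one direction and a small one in the other is the pattern described as "X Granger-causes Y"; it still says nothing about whether a third variable drives both.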

Interview Q7

How do you evaluate a time series model? Why can't you use standard cross-validation?

Model Answer: Standard k-fold CV ignores time order (and is often shuffled), so models end up trained on observations that come after the test fold: future data leaks into predictions of the past. Instead, use: (1) Train-test split preserving order (last 20% as test), (2) Expanding window CV: train on data up to time t, test on t+1 to t+h, then expand, (3) Sliding window CV: a fixed-size training window moves forward. Metrics: RMSE for magnitude, MAPE for relative error, SMAPE for a bounded relative error, and directional accuracy for trading applications.

Interview Q8

What are the limitations of LSTM for time series forecasting?

Model Answer: (1) Data hungry — needs thousands of points, unsuitable for short series, (2) Black box — no interpretable coefficients, problematic for regulated industries, (3) Training time and compute cost — much slower than ARIMA, (4) Sensitive to hyperparameters — window size, layers, units, learning rate, (5) Multi-step forecasting error compounds — iterative forecasting accumulates errors, (6) Doesn't inherently model seasonality — must be engineered as features, (7) Modern alternatives (Transformers, N-BEATS) often outperform on benchmarks.

Interview Q9

How would you detect anomalies in a time series at scale (millions of series)?

Model Answer: At scale, I'd use a multi-tier approach: (1) Statistical baselines: rolling z-score (fast, no training), flag |z| > 3, (2) Seasonal-aware: STL decomposition to remove the seasonal component, then z-score on residuals, (3) For important series: Prophet with uncertainty intervals, or Isolation Forest on a feature-engineered representation, (4) Deep learning: train a single LSTM autoencoder on all "normal" series and score points by reconstruction error, (5) Streaming approaches: online algorithms (ADWIN) for concept drift detection. The key is balancing precision (avoiding false alarms) with recall (catching real anomalies).

Interview Q10

A time series shows high ADF p-value (0.6) and low KPSS p-value (0.01). What does this mean?

Model Answer: ADF p-value 0.6 → fail to reject H₀ (unit root), so ADF says non-stationary. KPSS p-value 0.01 → reject H₀ (stationary), so KPSS also says non-stationary. Both tests agree: the series is non-stationary. Action: apply differencing (d=1), then retest. When the two tests disagree, the usual reading is: ADF non-stationary but KPSS stationary suggests a trend-stationary series (remove the trend rather than difference), while ADF stationary but KPSS non-stationary suggests a difference-stationary series (apply differencing). Fractional differencing is an option when the series sits close to a unit root.
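
Because the two tests have opposite null hypotheses, it helps to encode the four possible outcomes once. The helper below is a sketch of one common reading of those combinations; the function name, the 0.05 threshold, and the wording of the verdicts are assumptions. The p-values would typically come from `adfuller` and `kpss` in `statsmodels.tsa.stattools`, each of which returns the p-value as the second element of its result tuple.

```python
def stationarity_verdict(adf_p, kpss_p, alpha=0.05):
    """Combine ADF (H0: unit root) and KPSS (H0: stationary) p-values."""
    adf_says_stationary = adf_p < alpha       # ADF rejects the unit root
    kpss_says_stationary = kpss_p >= alpha    # KPSS fails to reject stationarity

    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary: difference and retest"
    if not adf_says_stationary and kpss_says_stationary:
        return "trend-stationary: remove the trend and retest"
    return "difference-stationary: apply differencing and retest"

# The case from the question, followed by the other three combinations
for adf_p, kpss_p in [(0.60, 0.01), (0.01, 0.10), (0.30, 0.10), (0.01, 0.01)]:
    print(f"ADF p={adf_p:.2f}, KPSS p={kpss_p:.2f} -> {stationarity_verdict(adf_p, kpss_p)}")
```

The verdict is a guide, not a proof: both tests have low power on short series, so a plot of the series and its ACF should back up whatever the p-values suggest.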

Interview Q11

How does the Holt-Winters method differ from ARIMA? When would you prefer one over the other?

Model Answer: Holt-Winters directly models level, trend, and seasonality with smoothing equations — it's intuitive and computationally light. ARIMA models the differenced series through AR and MA terms — more flexible, with formal diagnostic tools (Ljung-Box, AIC/BIC). Prefer Holt-Winters when: the series has clear trend + seasonality, you need quick results, or for production systems needing fast updates. Prefer ARIMA when: the series is complex (multiple differencing needed), you need formal statistical inference, or for academic rigor.

Research Problems

🔬 Research Problem 1: Foundation Models for Indian Time Series

Problem: Transformer-based foundation models, inspired by large language models, have been developed for time series (TimeGPT, Chronos). Can a foundation model pre-trained on diverse Indian time series (weather, stocks, agriculture, telecom) transfer knowledge across domains? How does cross-domain pre-training compare to domain-specific models?

Approach: Collect a large corpus of Indian time series data across 5+ domains. Pre-train a Transformer-based model. Evaluate zero-shot and few-shot performance on held-out series from each domain. Compare with domain-specific ARIMA and LSTM models.

References: Ansari et al. (2024) "Chronos: Learning the Language of Time Series"; Garza & Mergenthaler-Canseco (2023) "TimeGPT-1".

🔬 Research Problem 2: Causal Discovery in Multivariate Indian Economic Time Series

Problem: India's economic indicators (CPI, WPI, repo rate, GDP, Nifty50, USD/INR, crude oil) form a complex web of causal relationships. Can we move beyond Granger causality to discover true causal structure using modern causal inference methods (PCMCI, Convergent Cross-Mapping)?

Approach: Apply PCMCI (Runge et al., 2019) to monthly data from RBI, MOSPI, and NSE. Compare causal graphs with Granger causality results. Validate against known economic mechanisms (e.g., RBI rate → lending rates → GDP).

🔬 Research Problem 3: Forecasting Indian Monsoon Extremes with Deep Learning

Problem: Climate change is making Indian monsoon patterns more erratic. Can attention-based deep learning models (Temporal Fusion Transformers) improve prediction of extreme rainfall events (>100mm/day) at the district level, with lead times of 1-7 days?

Approach: Use IMD gridded rainfall data (0.25° resolution), ERA5 reanalysis data, and sea surface temperature as inputs. Train a TFT with custom loss function penalizing missed extremes. Compare with IMD's current NWP-based warnings.

🔬 Research Problem 4: Explainable Time Series Anomaly Detection for Smart Cities

Problem: Current anomaly detection in urban sensor networks (traffic, pollution, water quality) produces alerts but rarely explains WHY an anomaly occurred. Can we build interpretable anomaly detectors that provide natural language explanations linking anomalies to root causes?

Approach: Combine SHAP-based feature attribution on gradient boosting models with LLM-generated explanations. Train on labeled historical anomalies from Pune Smart City data where root causes are known (water main break, festival traffic, industrial pollution event). Evaluate explanation quality with domain experts.

Key Takeaways

📊

Decomposition First: A time series can be viewed as a combination of trend, seasonal, cyclical, and noise components, though not every series shows all four. Always decompose and visualize before modeling; understanding the components guides model selection.

📈

Stationarity is Non-Negotiable: Most classical models require stationarity. Use ADF + KPSS together for robust testing. Apply differencing (d for trend, D for seasonal) and transforms (log, Box-Cox) as needed.
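
The effect of these transforms is easiest to see on a series built to need them. The sketch below constructs a hypothetical monthly series with exponential growth and a multiplicative 12-month pattern, then applies a log transform and seasonal differencing (D=1) with pandas; the growth rate and seasonal amplitude are made-up values.

```python
import numpy as np
import pandas as pd

t = np.arange(48)   # four years of monthly data

# Hypothetical series: 2% monthly growth times a 12-month seasonal factor
seasonal_factor = 1 + 0.2 * np.sin(2 * np.pi * t / 12)
y = pd.Series(100 * 1.02**t * seasonal_factor)

log_y = np.log(y)                 # turns multiplicative structure into additive
seasonal_diff = log_y.diff(12)    # D=1: subtract the value 12 months earlier
both_diff = seasonal_diff.diff()  # d=1 on top: removes the remaining constant drift

print("Seasonal difference of log series (first 3 non-missing values):")
print(seasonal_diff.dropna().head(3).round(5).tolist())
print("12 * log(1.02) =", round(12 * np.log(1.02), 5))
```

After the log transform, seasonal differencing cancels the repeating pattern and leaves only the constant annual growth in log terms, which is why the differenced series is flat in this noise-free example.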

📉

ACF/PACF are Your Friends: ACF decays + PACF cuts off → AR(p). ACF cuts off + PACF decays → MA(q). Both decay → ARMA(p,q). These plots are the foundation of model identification.

🔧

Box-Jenkins is Systematic: Identify → Estimate → Diagnose → Forecast. Check residuals with Ljung-Box test. If residuals aren't white noise, revise the model. Use AIC/BIC for model comparison.

⚡

Match Model to Context: ARIMA for small data + interpretability. SARIMA/Holt-Winters for clear seasonal patterns. Prophet for multiple seasonalities + holidays at scale. LSTM for large data + complex nonlinear patterns. VAR for multivariate interdependencies.

📐

Metrics Matter: RMSE penalizes large errors. MAPE gives relative accuracy but fails near zero. SMAPE is bounded (0 to 200%) but, despite its name, is not perfectly symmetric between over- and under-forecasts. Always use multiple metrics and a naive baseline (persistence model) for context.

⚠️

No Time Travel in Validation: Never use standard k-fold CV on time series. Use expanding window or sliding window cross-validation that preserves temporal order. Future data must never leak into training.

🧠

LSTMs Need Care: They require large datasets (1000+ points), careful windowing, proper scaling (MinMaxScaler), and they're black boxes. Modern alternatives (TFT, N-BEATS, Chronos) often outperform. Always benchmark against ARIMA.

🚨

Anomaly Detection is Dual-Use: The same forecast residual-based approach serves both forecasting (is my model accurate?) and anomaly detection (is this data point unusual?). Prophet's prediction intervals make this easy.

🇮🇳

Indian Data Has Unique Challenges: Lunar calendar holidays, demonetization/GST structural breaks, monsoon-dependent agriculture, high festival seasonality. Domain knowledge is crucial — no model can replace understanding of Indian economic and cultural patterns.

References

Foundational Texts

Box, G.E.P., Jenkins, G.M., Reinsel, G.C., & Ljung, G.M. (2015). Time Series Analysis: Forecasting and Control, 5th Edition. Wiley. — The classic reference.
Hyndman, R.J. & Athanasopoulos, G. (2021). Forecasting: Principles and Practice, 3rd Edition. OTexts. Free online: otexts.com/fpp3 — The best modern textbook.
Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press. — Graduate-level econometric treatment.
Shumway, R.H. & Stoffer, D.S. (2017). Time Series Analysis and Its Applications, 4th Edition. Springer.

Key Research Papers

Dickey, D.A. & Fuller, W.A. (1979). "Distribution of the Estimators for Autoregressive Time Series with a Unit Root." JASA, 74(366), 427-431.
Kwiatkowski, D. et al. (1992). "Testing the null hypothesis of stationarity against the alternative of a unit root." J. of Econometrics, 54(1-3), 159-178.
Hochreiter, S. & Schmidhuber, J. (1997). "Long Short-Term Memory." Neural Computation, 9(8), 1735-1780.
Taylor, S.J. & Letham, B. (2018). "Forecasting at Scale." The American Statistician, 72(1), 37-45. — The Prophet paper.
Salinas, D. et al. (2020). "DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks." International J. of Forecasting, 36(3), 1181-1191.
Lim, B. et al. (2021). "Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting." International J. of Forecasting, 37(4), 1748-1764.
Oreshkin, B.N. et al. (2020). "N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting." ICLR 2020.
Lam, R. et al. (2023). "Learning Skillful Medium-Range Global Weather Forecasting." Science, 382(6677), 1416-1421. — GraphCast.
Ansari, A.F. et al. (2024). "Chronos: Learning the Language of Time Series." arXiv:2403.07815.

Indian Context References

Rajeevan, M. et al. (2008). "Analysis of variability and trends of extreme rainfall events over India using 104 years of gridded daily rainfall data." Geophysical Research Letters, 35(18), L18707.
Reserve Bank of India (2023). "Report on Currency and Finance 2022-23." RBI Publications. — Contains RBI's forecasting methodology.
India Meteorological Department (2024). "End of Season Report — Southwest Monsoon 2023." IMD, Ministry of Earth Sciences.
Sahoo, B.B. et al. (2019). "Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting." Acta Geophysica, 67, 1471-1481.

Software & Libraries

statsmodels: statsmodels.org — Python statistical models including ARIMA, VAR.
Prophet: facebook.github.io/prophet — Meta's forecasting library.
pmdarima: alkaline-ml.com/pmdarima — Auto-ARIMA for Python.
TensorFlow/Keras: tensorflow.org — LSTM implementation.
Nixtla: github.com/Nixtla — TimeGPT and StatsForecast libraries.
Darts: unit8co.github.io/darts — Unified time series library supporting many models.

Online Resources

Kaggle — M5 Forecasting Competition: kaggle.com/c/m5-forecasting-accuracy
IITM Pune — Indian Rainfall Data: tropmet.res.in
NSE India Historical Data: nseindia.com

🎓 Professor's Insight — Final Words

Time series forecasting is where theory meets humility. George Box's famous quote — "All models are wrong, but some are useful" — is most true here. No model can predict a pandemic, a demonetization, or a war. The best forecasters combine statistical rigor with domain knowledge, scenario planning, and honest uncertainty quantification. As you build your career, remember: a simple model with good uncertainty intervals is more useful than a complex model that's overconfident.