{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "title",
   "metadata": {},
   "source": [
    "# PyTorch Cheat Sheet\n",
    "\n",
    "A comprehensive quick-reference covering tensor operations, autograd, model building,\n",
    "training loops, data loading, and debugging patterns. Designed to complement\n",
    "Chapters 29--31 of this course.\n",
    "\n",
    "```{admonition} How to Use This Page\n",
    ":class: tip\n",
    "This is a pure reference document -- no executable code, just patterns you can\n",
    "copy and adapt. Use `Ctrl+F` / `Cmd+F` to search for specific topics.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec1-tensors",
   "metadata": {},
   "source": [
    "## 1. Tensor Basics\n",
    "\n",
    "### Creation\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "# From Python data\n",
    "x = torch.tensor([1, 2, 3])                # from list\n",
    "x = torch.tensor([[1, 2], [3, 4]])          # 2D from nested list\n",
    "\n",
    "# Standard constructors\n",
    "x = torch.zeros(3, 4)                       # all zeros\n",
    "x = torch.ones(2, 3)                        # all ones\n",
    "x = torch.full((2, 3), 7.0)                 # filled with 7.0\n",
    "x = torch.empty(3, 4)                       # uninitialized (fast)\n",
    "x = torch.eye(3)                            # 3x3 identity matrix\n",
    "\n",
    "# Random tensors\n",
    "x = torch.randn(3, 4)                       # standard normal N(0,1)\n",
    "x = torch.rand(3, 4)                        # uniform [0, 1)\n",
    "x = torch.randint(0, 10, (3, 4))            # random integers in [0, 10)\n",
    "\n",
    "# Sequences\n",
    "x = torch.arange(0, 10, 2)                  # [0, 2, 4, 6, 8]\n",
    "x = torch.linspace(0, 1, 100)               # 100 points in [0, 1]\n",
    "\n",
    "# Like-constructors (match shape/dtype/device of existing tensor)\n",
    "y = torch.zeros_like(x)\n",
    "y = torch.randn_like(x)\n",
    "```\n",
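    "\n",
    "Default dtypes follow the input: Python integers produce `int64`, while Python\n",
    "floats and the standard constructors produce `float32`. A quick check, assuming\n",
    "the global default dtype has not been changed with `torch.set_default_dtype`:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "ints = torch.tensor([1, 2, 3])\n",
    "floats = torch.tensor([1.0, 2.0])\n",
    "zeros = torch.zeros(2, 2)\n",
    "steps = torch.arange(0, 10, 2)\n",
    "\n",
    "print(ints.dtype)      # torch.int64\n",
    "print(floats.dtype)    # torch.float32\n",
    "print(zeros.dtype)     # torch.float32\n",
    "print(steps.tolist())  # [0, 2, 4, 6, 8]\n",
    "\n",
    "# *_like constructors copy shape and dtype from their argument\n",
    "ones = torch.ones_like(steps)\n",
    "print(ones.dtype, ones.shape)  # torch.int64 torch.Size([5])\n",
    "```\n",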
    "\n",
    "### NumPy Interop\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "np_array = np.arange(6, dtype=np.float32)   # example array\n",
    "\n",
    "# NumPy -> PyTorch (shared memory -- no copy!)\n",
    "x = torch.from_numpy(np_array)\n",
    "\n",
    "# PyTorch -> NumPy (shared memory on CPU)\n",
    "np_array = x.numpy()              # CPU tensor only\n",
    "np_array = x.cpu().numpy()        # safe for GPU tensors\n",
    "np_array = x.detach().numpy()     # safe if requires_grad=True\n",
    "np_array = x.detach().cpu().numpy()  # safest -- works in all cases\n",
    "```\n",
    "\n",
    "```{admonition} Shared Memory Warning\n",
    ":class: warning\n",
    "`torch.from_numpy()` and `.numpy()` share the underlying memory buffer, so\n",
    "modifying one modifies the other. For an independent copy, use `torch.tensor(np_array)`\n",
    "or `.clone()` on the PyTorch side, or `.copy()` on the NumPy side.\n",
    "```\n",
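    "\n",
    "You can see the sharing directly. This is a minimal sketch that assumes NumPy\n",
    "is installed alongside PyTorch:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import torch\n",
    "\n",
    "arr = np.zeros(3, dtype=np.float32)\n",
    "shared = torch.from_numpy(arr)      # same buffer as arr\n",
    "copied = torch.tensor(arr)          # torch.tensor() always copies\n",
    "\n",
    "arr[0] = 5.0                        # mutate the NumPy array in place\n",
    "print(shared[0].item())             # 5.0 (sees the change)\n",
    "print(copied[0].item())             # 0.0 (independent copy)\n",
    "\n",
    "independent = shared.clone()        # break the link from the tensor side\n",
    "arr[1] = 7.0\n",
    "print(shared[1].item(), independent[1].item())   # 7.0 0.0\n",
    "```\n",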
    "\n",
    "### Properties\n",
    "\n",
    "```python\n",
    "x.shape          # torch.Size([3, 4]) -- dimensions\n",
    "x.dtype          # torch.float32 -- data type\n",
    "x.device         # device(type='cpu') or device(type='cuda', index=0)\n",
    "x.requires_grad  # True/False -- gradient tracking\n",
    "x.ndim           # number of dimensions (same as len(x.shape))\n",
    "x.numel()        # total number of elements\n",
    "x.is_contiguous()  # memory layout check\n",
    "```\n",
    "\n",
    "### Data Types\n",
    "\n",
    "| PyTorch dtype | Alias | Notes |\n",
    "|:---|:---|:---|\n",
    "| `torch.float32` | `torch.float` | Default for floats. Use this for training. |\n",
    "| `torch.float64` | `torch.double` | Double precision. Rarely needed. |\n",
    "| `torch.float16` | `torch.half` | Half precision. Used for mixed-precision training. |\n",
    "| `torch.bfloat16` | -- | Brain floating point. Same exponent range as float32 (unlike float16), but fewer precision bits. |\n",
    "| `torch.int64` | `torch.long` | Default for integers. Required for class labels. |\n",
    "| `torch.int32` | `torch.int` | 32-bit integer. |\n",
    "| `torch.bool` | -- | Boolean tensor. |\n",
    "\n",
    "```python\n",
    "# Type casting\n",
    "x = x.float()                 # -> float32\n",
    "x = x.long()                  # -> int64\n",
    "x = x.to(torch.float16)       # explicit dtype\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec2-operations",
   "metadata": {},
   "source": [
    "## 2. Tensor Operations\n",
    "\n",
    "### NumPy vs. PyTorch Equivalents\n",
    "\n",
    "| Operation | NumPy | PyTorch | Notes |\n",
    "|:---|:---|:---|:---|\n",
    "| Reshape | `np.reshape(x, shape)` | `x.view(shape)` or `x.reshape(shape)` | `view` requires contiguous memory |\n",
    "| Flatten | `x.flatten()` | `x.view(-1)` or `x.flatten()` | |\n",
    "| Concatenate | `np.concatenate([a,b])` | `torch.cat([a,b], dim=0)` | Along existing dim |\n",
    "| Stack | `np.stack([a,b])` | `torch.stack([a,b], dim=0)` | Creates new dim |\n",
    "| Split | `np.split(x, n)` | `torch.chunk(x, n, dim=0)` | `torch.split` takes a chunk *size*, not a count |\n",
    "| Transpose | `x.T` | `x.T` or `x.permute(...)` | `x.T` is meant for 2D; use `x.transpose(d0, d1)` or `x.permute(...)` for N-D |\n",
    "| Squeeze | `np.squeeze(x)` | `x.squeeze()` | Remove dims of size 1 |\n",
    "| Unsqueeze | `np.expand_dims(x, 0)` | `x.unsqueeze(0)` | Add dim of size 1 |\n",
    "| Matrix multiply | `a @ b` | `a @ b` or `torch.mm(a, b)` | `torch.mm` is 2D only |\n",
    "| Batch matmul | `np.matmul(a, b)` | `torch.bmm(a, b)` | `bmm` needs 3D inputs; `torch.matmul` broadcasts like `np.matmul` |\n",
    "| Element-wise | `a * b, a + b` | `a * b, a + b` | Same syntax |\n",
    "| Sum | `np.sum(x, axis=0)` | `x.sum(dim=0)` | `axis` vs `dim` |\n",
    "| Mean | `np.mean(x, axis=0)` | `x.mean(dim=0)` | |\n",
    "| Argmax | `np.argmax(x, axis=0)` | `x.argmax(dim=0)` | |\n",
    "| Clamp | `np.clip(x, a, b)` | `x.clamp(min=a, max=b)` | |\n",
    "| Where | `np.where(cond, a, b)` | `torch.where(cond, a, b)` | |\n",
    "\n",
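    "The contiguity caveat on `view` comes up most often after a transpose, which\n",
    "changes the strides without moving any data. A short sketch:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]\n",
    "xt = x.T                            # shape (3, 2), a non-contiguous view\n",
    "print(xt.is_contiguous())           # False\n",
    "\n",
    "view_ok = True\n",
    "try:\n",
    "    xt.view(6)                      # view cannot reinterpret these strides\n",
    "except RuntimeError:\n",
    "    view_ok = False\n",
    "print(view_ok)                      # False\n",
    "\n",
    "flat = xt.reshape(6)                # reshape copies when it has to\n",
    "print(flat.tolist())                # [0, 3, 1, 4, 2, 5]\n",
    "same = xt.contiguous().view(6)      # or make the copy explicit\n",
    "```\n",
    "\n",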
    "### Indexing and Slicing\n",
    "\n",
    "```python\n",
    "# Same syntax as NumPy\n",
    "x[0]              # first row\n",
    "x[:, 1]           # second column\n",
    "x[0:3, :]         # first three rows\n",
    "x[x > 0]          # boolean indexing\n",
    "x[[0, 2, 4]]      # fancy indexing\n",
    "\n",
    "# Useful for batches\n",
    "x[..., -1]        # last element along final dim (Ellipsis)\n",
    "```\n",
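    "\n",
    "A boolean mask returns a flattened 1D copy, and you can use the same kind of\n",
    "mask for in-place assignment. A small sketch with made-up values:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "x = torch.tensor([[1, -2], [-3, 4]])\n",
    "pos = x[x > 0]               # boolean mask returns a flattened 1D copy\n",
    "print(pos.tolist())          # [1, 4]\n",
    "\n",
    "x[x < 0] = 0                 # the same kind of mask works for assignment\n",
    "print(x.tolist())            # [[1, 0], [0, 4]]\n",
    "\n",
    "batch = torch.zeros(8, 5, 3)\n",
    "last = batch[..., -1]        # Ellipsis keeps all leading dims\n",
    "print(last.shape)            # torch.Size([8, 5])\n",
    "```\n",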
    "\n",
    "### Broadcasting Rules\n",
    "\n",
    "Same rules as NumPy:\n",
    "\n",
    "1. Dimensions are compared from the **right** (trailing dimensions).\n",
    "2. Two dimensions are compatible if they are **equal** or one of them is **1**.\n",
    "3. Missing dimensions on the left are treated as size 1.\n",
    "\n",
    "```python\n",
    "# Example: (4, 3) + (3,) -> (4, 3)\n",
    "# Example: (4, 1) + (1, 3) -> (4, 3)\n",
    "# Example: (2, 1, 3) + (4, 1) -> (2, 4, 3)\n",
    "```\n",
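    "\n",
    "`torch.broadcast_shapes` (available in recent PyTorch versions) applies these\n",
    "rules without allocating any data. That makes it a convenient way to check the\n",
    "examples above:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "print(torch.broadcast_shapes((4, 3), (3,)))       # torch.Size([4, 3])\n",
    "print(torch.broadcast_shapes((4, 1), (1, 3)))     # torch.Size([4, 3])\n",
    "print(torch.broadcast_shapes((2, 1, 3), (4, 1)))  # torch.Size([2, 4, 3])\n",
    "\n",
    "# Actual arithmetic follows the same rules\n",
    "result = torch.ones(2, 1, 3) + torch.ones(4, 1)\n",
    "print(result.shape)                               # torch.Size([2, 4, 3])\n",
    "\n",
    "# Trailing dims 3 and 4 are neither equal nor 1, so this fails\n",
    "try:\n",
    "    torch.ones(4, 3) + torch.ones(4)\n",
    "    compatible = True\n",
    "except RuntimeError:\n",
    "    compatible = False\n",
    "print(compatible)                                 # False\n",
    "```\n",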
    "\n",
    "### In-Place Operations\n",
    "\n",
    "```python\n",
    "# Trailing underscore = in-place\n",
    "x.add_(1)          # x = x + 1\n",
    "x.mul_(2)          # x = x * 2\n",
    "x.zero_()          # x = 0\n",
    "x.fill_(5)         # x = 5\n",
    "x.clamp_(0, 1)     # clamp in-place\n",
    "```\n",
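    "\n",
    "A few lines are enough to reproduce both failure modes that the warning below\n",
    "describes. The sketch relies on the fact that `sigmoid` saves its output for\n",
    "the backward pass:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "w = torch.tensor([1.0, 2.0], requires_grad=True)\n",
    "\n",
    "# 1. In-place op on a leaf that requires grad: rejected immediately\n",
    "leaf_error = False\n",
    "try:\n",
    "    w.add_(1)\n",
    "except RuntimeError:\n",
    "    leaf_error = True\n",
    "\n",
    "# 2. sigmoid saves its output; overwriting it breaks the backward pass\n",
    "s = w.sigmoid()\n",
    "s.mul_(2)\n",
    "saved_error = False\n",
    "try:\n",
    "    s.sum().backward()\n",
    "except RuntimeError:\n",
    "    saved_error = True\n",
    "\n",
    "# Out-of-place version: a new tensor, so the saved output stays intact\n",
    "s = w.sigmoid() * 2\n",
    "s.sum().backward()\n",
    "print(leaf_error, saved_error, w.grad is not None)  # True True True\n",
    "```\n",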
    "\n",
    "```{admonition} Avoid In-Place on Grad Tensors\n",
    ":class: warning\n",
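    "An in-place operation on a leaf tensor that requires gradients raises an error\n",
    "immediately. An in-place change to a tensor that autograd saved for the backward\n",
    "pass raises an error when `.backward()` runs, because the gradient formula needs\n",
    "the original values and the in-place op overwrote them.\n",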
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec3-device",
   "metadata": {},
   "source": [
    "## 3. Device Management\n",
    "\n",
    "```python\n",
    "# Detect available device\n",
    "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
    "\n",
    "# Move tensors\n",
    "x = x.to(device)                  # generic\n",
    "x = x.cuda()                      # explicit GPU\n",
    "x = x.cpu()                       # explicit CPU\n",
    "\n",
    "# Move model (moves ALL parameters and buffers)\n",
    "model = model.to(device)\n",
    "\n",
    "# Create tensor directly on device\n",
    "x = torch.randn(3, 4, device=device)\n",
    "\n",
    "# Check device\n",
    "x.device                           # device(type='cuda', index=0)\n",
    "x.is_cuda                          # True / False\n",
    "```\n",
    "\n",
    "```{admonition} Common Pitfall\n",
    ":class: warning\n",
    "All tensors in an operation must be on the **same device**. You cannot\n",
    "add a CPU tensor to a CUDA tensor. If you get a \"RuntimeError: Expected all\n",
    "tensors to be on the same device\" error, check that both your data and model\n",
    "are on the same device.\n",
    "```\n",
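    "\n",
    "The helper below picks the best available backend in one place. `pick_device`\n",
    "is a hypothetical name, and the CUDA, then MPS, then CPU preference order is\n",
    "just a choice made for this sketch:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "def pick_device():\n",
    "    # Hypothetical helper; preference order: CUDA, then Apple MPS, then CPU\n",
    "    if torch.cuda.is_available():\n",
    "        return torch.device('cuda')\n",
    "    if torch.backends.mps.is_available():\n",
    "        return torch.device('mps')\n",
    "    return torch.device('cpu')\n",
    "\n",
    "device = pick_device()\n",
    "x = torch.randn(3, 4, device=device)   # created directly on the device\n",
    "w = torch.randn(4, 2).to(device)       # moved there before combining\n",
    "y = x @ w\n",
    "print(y.device.type == device.type)    # True\n",
    "```\n",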
    "\n",
    "```python\n",
    "# Apple Silicon (MPS backend)\n",
    "device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')\n",
    "\n",
    "# Multi-GPU: specify GPU index\n",
    "device = torch.device('cuda:0')    # first GPU\n",
    "device = torch.device('cuda:1')    # second GPU\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec4-autograd",
   "metadata": {},
   "source": [
    "## 4. Autograd\n",
    "\n",
    "PyTorch's automatic differentiation engine. Every operation on tensors with\n",
    "`requires_grad=True` is recorded on a dynamic computation graph. Calling\n",
    "`.backward()` traverses this graph in reverse to compute gradients.\n",
    "\n",
    "### Basic Gradient Computation\n",
    "\n",
    "```python\n",
    "x = torch.tensor(3.0, requires_grad=True)\n",
    "y = x**2 + 2*x + 1\n",
    "y.backward()            # compute dy/dx\n",
    "print(x.grad)           # tensor(8.)  -- dy/dx = 2x + 2 = 8\n",
    "```\n",
    "\n",
    "### Gradient Control\n",
    "\n",
    "```python\n",
    "# Disable gradient tracking (inference, evaluation)\n",
    "with torch.no_grad():\n",
    "    pred = model(x)           # no graph built, saves memory\n",
    "\n",
    "# Alternative: inference mode (lower overhead than no_grad; outputs can't be used in autograd later)\n",
    "with torch.inference_mode():\n",
    "    pred = model(x)\n",
    "\n",
    "# Detach tensor from computation graph\n",
    "x_detached = x.detach()       # shares data, no grad tracking\n",
    "\n",
    "# Prevent gradient for specific parameters\n",
    "for param in model.encoder.parameters():\n",
    "    param.requires_grad = False   # freeze encoder\n",
    "```\n",
    "\n",
    "### Zeroing Gradients\n",
    "\n",
    "```python\n",
    "# CRITICAL: PyTorch accumulates gradients by default!\n",
    "optimizer.zero_grad()          # preferred; sets .grad to None by default (PyTorch 2.0+)\n",
    "\n",
    "# Manual alternatives\n",
    "param.grad = None              # what zero_grad() does by default; frees memory\n",
    "param.grad.zero_()             # in-place zeroing (fails if .grad is None)\n",
    "model.zero_grad()              # same as the optimizer call, for all model params\n",
    "```\n",
    "\n",
    "```{admonition} Why Gradients Accumulate\n",
    ":class: note\n",
    "Gradient accumulation is by design -- it enables computing effective gradients\n",
    "over multiple mini-batches when GPU memory is too small for a single large batch.\n",
    "But if you forget `optimizer.zero_grad()`, gradients from previous iterations\n",
    "contaminate the current update. This is one of the most common PyTorch bugs.\n",
    "```\n",
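    "\n",
    "A scalar makes the accumulation easy to see: calling `.backward()` twice adds\n",
    "the two gradients together.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "x = torch.tensor(3.0, requires_grad=True)\n",
    "\n",
    "for _ in range(2):\n",
    "    (x ** 2).backward()     # d(x^2)/dx = 2x = 6 on each call\n",
    "accumulated = x.grad.item()\n",
    "print(accumulated)          # 12.0: the two gradients were summed\n",
    "\n",
    "x.grad = None               # reset, as zero_grad() does by default\n",
    "(x ** 2).backward()\n",
    "print(x.grad.item())        # 6.0\n",
    "```\n",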
    "\n",
    "### Inspecting the Computation Graph\n",
    "\n",
    "```python\n",
    "x = torch.tensor(2.0, requires_grad=True)\n",
    "y = x * 3\n",
    "z = y + 1\n",
    "\n",
    "z.grad_fn                      # <AddBackward0> -- last operation\n",
    "z.grad_fn.next_functions       # links to previous ops\n",
    "y.is_leaf                      # False (created by an op)\n",
    "x.is_leaf                      # True (user-created)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec5-module",
   "metadata": {},
   "source": [
    "## 5. nn.Module Pattern\n",
    "\n",
    "The fundamental building block for all neural networks in PyTorch.\n",
    "\n",
    "### Custom Module Template\n",
    "\n",
    "```python\n",
    "import torch.nn as nn\n",
    "\n",
    "class MyModel(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()             # ALWAYS call super().__init__()\n",
    "        self.layer1 = nn.Linear(784, 128)\n",
    "        self.relu = nn.ReLU()\n",
    "        self.dropout = nn.Dropout(0.2)\n",
    "        self.layer2 = nn.Linear(128, 10)\n",
    "    \n",
    "    def forward(self, x):\n",
    "        x = self.relu(self.layer1(x))\n",
    "        x = self.dropout(x)\n",
    "        return self.layer2(x)\n",
    "```\n",
    "\n",
    "### Inspecting Parameters\n",
    "\n",
    "```python\n",
    "model = MyModel()\n",
    "\n",
    "# Iterate over all parameters\n",
    "model.parameters()                          # iterator\n",
    "list(model.parameters())                    # list of Parameter tensors\n",
    "\n",
    "# Named parameters (for debugging, freezing)\n",
    "for name, param in model.named_parameters():\n",
    "    print(f\"{name}: {param.shape}\")\n",
    "\n",
    "# Total parameter count\n",
    "total = sum(p.numel() for p in model.parameters())\n",
    "trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
    "print(f\"Total: {total}, Trainable: {trainable}\")\n",
    "\n",
    "# List sub-modules\n",
    "model.children()                            # immediate children\n",
    "model.modules()                             # all modules recursively\n",
    "model.named_modules()                       # with names\n",
    "```\n",
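    "\n",
    "As a quick sanity check on the `MyModel` template from earlier, push a dummy\n",
    "batch through it and confirm the output shape and parameter count. The class is\n",
    "repeated here so the snippet runs on its own:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "class MyModel(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.layer1 = nn.Linear(784, 128)\n",
    "        self.relu = nn.ReLU()\n",
    "        self.dropout = nn.Dropout(0.2)\n",
    "        self.layer2 = nn.Linear(128, 10)\n",
    "\n",
    "    def forward(self, x):\n",
    "        return self.layer2(self.dropout(self.relu(self.layer1(x))))\n",
    "\n",
    "model = MyModel()\n",
    "out = model(torch.randn(64, 784))   # dummy batch of 64 flattened 28x28 images\n",
    "print(out.shape)                    # torch.Size([64, 10])\n",
    "\n",
    "total = sum(p.numel() for p in model.parameters())\n",
    "print(total)                        # 101770 = (784*128 + 128) + (128*10 + 10)\n",
    "\n",
    "children = [name for name, _ in model.named_children()]\n",
    "print(children)                     # ['layer1', 'relu', 'dropout', 'layer2']\n",
    "```\n",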
    "\n",
    "### Module State\n",
    "\n",
    "```python\n",
    "model.train()    # training mode: dropout active, batchnorm uses batch stats\n",
    "model.eval()     # evaluation mode: dropout off, batchnorm uses running stats\n",
    "\n",
    "model.training   # True / False -- check current mode\n",
    "```\n",
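    "\n",
    "A bare `nn.Dropout` layer shows the effect of the mode switch. With `p=0.5`,\n",
    "training mode zeroes roughly half the inputs and scales the survivors by\n",
    "`1/(1-p)`. Evaluation mode passes inputs through unchanged:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "torch.manual_seed(0)\n",
    "drop = nn.Dropout(p=0.5)\n",
    "x = torch.ones(1000)\n",
    "\n",
    "drop.train()\n",
    "y_train = drop(x)\n",
    "print(sorted(set(y_train.tolist())))   # [0.0, 2.0]\n",
    "\n",
    "drop.eval()\n",
    "y_eval = drop(x)\n",
    "print(torch.equal(y_eval, x))          # True\n",
    "```\n",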
    "\n",
    "```{admonition} Always Set the Mode\n",
    ":class: important\n",
    "Forgetting `model.eval()` before inference leads to non-deterministic predictions\n",
    "(dropout still drops) and incorrect batch normalization (it normalizes with the\n",
    "current batch's statistics instead of the running averages accumulated during\n",
    "training). Forgetting `model.train()` before training leaves dropout disabled and\n",
    "stops batch norm from updating its running statistics.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec6-sequential",
   "metadata": {},
   "source": [
    "## 6. nn.Sequential Shortcut\n",
    "\n",
    "For simple feed-forward architectures where data flows linearly through layers.\n",
    "\n",
    "```python\n",
    "model = nn.Sequential(\n",
    "    nn.Linear(784, 256),\n",
    "    nn.ReLU(),\n",
    "    nn.Dropout(0.2),\n",
    "    nn.Linear(256, 128),\n",
    "    nn.ReLU(),\n",
    "    nn.Dropout(0.2),\n",
    "    nn.Linear(128, 10),\n",
    ")\n",
    "```\n",
    "\n",
    "### With Named Layers\n",
    "\n",
    "```python\n",
    "from collections import OrderedDict\n",
    "\n",
    "model = nn.Sequential(OrderedDict([\n",
    "    ('fc1', nn.Linear(784, 256)),\n",
    "    ('relu1', nn.ReLU()),\n",
    "    ('fc2', nn.Linear(256, 10)),\n",
    "]))\n",
    "\n",
    "# Access by name\n",
    "model.fc1.weight.shape   # torch.Size([256, 784])\n",
    "```\n",
    "\n",
    "```{admonition} When NOT to Use Sequential\n",
    ":class: tip\n",
    "Write a custom `nn.Module` when you need skip connections (ResNet),\n",
    "multiple inputs/outputs, conditional logic, or shared weights.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec7-layers",
   "metadata": {},
   "source": [
    "## 7. Common Layers\n",
    "\n",
    "### Core Layers\n",
    "\n",
    "| Layer | Code | Parameters | Input Shape | Output Shape |\n",
    "|:---|:---|:---|:---|:---|\n",
    "| Fully connected | `nn.Linear(in_f, out_f)` | $W$: `(out, in)`, $b$: `(out,)` | `(*, in_f)` | `(*, out_f)` |\n",
    "| 1D convolution | `nn.Conv1d(in_ch, out_ch, k)` | $W$: `(out, in, k)` | `(N, C_in, L)` | `(N, C_out, L')` |\n",
    "| 2D convolution | `nn.Conv2d(in_ch, out_ch, k)` | $W$: `(out, in, k, k)` | `(N, C_in, H, W)` | `(N, C_out, H', W')` |\n",
    "| Max pool | `nn.MaxPool2d(k)` | None | `(N, C, H, W)` | `(N, C, H//k, W//k)` |\n",
    "| Avg pool | `nn.AvgPool2d(k)` | None | `(N, C, H, W)` | `(N, C, H//k, W//k)` |\n",
    "| Adaptive avg pool | `nn.AdaptiveAvgPool2d((1,1))` | None | `(N, C, H, W)` | `(N, C, 1, 1)` |\n",
    "| Batch norm (1D) | `nn.BatchNorm1d(features)` | $\\gamma$, $\\beta$ | `(N, features)` | `(N, features)` |\n",
    "| Batch norm (2D) | `nn.BatchNorm2d(channels)` | $\\gamma$, $\\beta$ | `(N, C, H, W)` | `(N, C, H, W)` |\n",
    "| Layer norm | `nn.LayerNorm(shape)` | $\\gamma$, $\\beta$ | `(*, shape)` | `(*, shape)` |\n",
    "| Dropout | `nn.Dropout(p=0.5)` | None | any | same |\n",
    "| Embedding | `nn.Embedding(vocab, dim)` | `(vocab, dim)` | `(*)` int | `(*, dim)` |\n",
    "\n",
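    "The primed sizes above follow the output-size formula given in the PyTorch\n",
    "`Conv2d` documentation: `floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1`.\n",
    "Here is a small helper (`conv_out` is a hypothetical name), checked against real layers:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "def conv_out(size, kernel, stride=1, padding=0, dilation=1):\n",
    "    # Hypothetical helper: output length along one spatial dimension\n",
    "    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1\n",
    "\n",
    "conv = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)\n",
    "y = conv(torch.randn(8, 3, 32, 32))\n",
    "print(y.shape)                                # torch.Size([8, 16, 16, 16])\n",
    "print(conv_out(32, 3, stride=2, padding=1))   # 16\n",
    "\n",
    "pool = nn.MaxPool2d(2)                        # stride defaults to kernel size\n",
    "pooled = pool(torch.randn(1, 1, 7, 7))\n",
    "print(pooled.shape)                           # torch.Size([1, 1, 3, 3])\n",
    "```\n",
    "\n",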
    "### Recurrent Layers\n",
    "\n",
    "| Layer | Code | Notes |\n",
    "|:---|:---|:---|\n",
    "| Simple RNN | `nn.RNN(input_size, hidden_size, num_layers=1)` | Vanilla recurrence |\n",
    "| LSTM | `nn.LSTM(input_size, hidden_size, num_layers=1)` | Long short-term memory |\n",
    "| GRU | `nn.GRU(input_size, hidden_size, num_layers=1)` | Gated recurrent unit |\n",
    "\n",
    "```python\n",
    "# RNN usage pattern\n",
    "rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)\n",
    "x = torch.randn(32, 50, 10)    # (batch, seq_len, features)\n",
    "output, (h_n, c_n) = rnn(x)    # output: (32, 50, 20), h_n: (2, 32, 20)\n",
    "```\n",
    "\n",
    "### Activation Functions\n",
    "\n",
    "| Activation | Module | Functional | Formula |\n",
    "|:---|:---|:---|:---|\n",
    "| ReLU | `nn.ReLU()` | `F.relu(x)` | $\\max(0, x)$ |\n",
    "| LeakyReLU | `nn.LeakyReLU(0.01)` | `F.leaky_relu(x)` | $\\max(0.01x, x)$ |\n",
    "| Sigmoid | `nn.Sigmoid()` | `torch.sigmoid(x)` | $\\frac{1}{1+e^{-x}}$ |\n",
    "| Tanh | `nn.Tanh()` | `torch.tanh(x)` | $\\frac{e^x - e^{-x}}{e^x + e^{-x}}$ |\n",
    "| GELU | `nn.GELU()` | `F.gelu(x)` | $x \\cdot \\Phi(x)$ |\n",
    "| Softmax | `nn.Softmax(dim=-1)` | `F.softmax(x, dim=-1)` | $\\frac{e^{x_i}}{\\sum_j e^{x_j}}$ |\n",
    "| LogSoftmax | `nn.LogSoftmax(dim=-1)` | `F.log_softmax(x, dim=-1)` | $\\log\\text{softmax}(x)$ |\n",
    "\n",
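    "For stateless activations, the module and functional forms compute the same\n",
    "values. A quick numerical comparison on random input:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "\n",
    "x = torch.randn(4, 5)\n",
    "relu_same = torch.equal(nn.ReLU()(x), F.relu(x))\n",
    "softmax_same = torch.allclose(nn.Softmax(dim=-1)(x), F.softmax(x, dim=-1))\n",
    "rows_sum_to_one = torch.allclose(F.softmax(x, dim=-1).sum(dim=-1), torch.ones(4))\n",
    "log_matches = torch.allclose(F.log_softmax(x, dim=-1), F.softmax(x, dim=-1).log(), atol=1e-6)\n",
    "print(relu_same, softmax_same, rows_sum_to_one, log_matches)  # True True True True\n",
    "```\n",
    "\n",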
    "```{admonition} Module vs. Functional\n",
    ":class: tip\n",
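    "Use `nn.ReLU()` as a module attribute when you want it visible in `print(model)`.\n",
    "Use `F.relu(x)` (from `torch.nn.functional`) inside `forward()` for a lighter\n",
    "touch. For stateless activations like these, both produce identical results. Layers\n",
    "with train/eval behavior differ: `F.dropout(x, p)` stays active unless you pass\n",
    "`training=self.training`, whereas `nn.Dropout` follows the module's mode.\n",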
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec8-loss",
   "metadata": {},
   "source": [
    "## 8. Loss Functions\n",
    "\n",
    "| Loss | Code | Use Case | Input | Target |\n",
    "|:---|:---|:---|:---|:---|\n",
    "| Mean squared error | `nn.MSELoss()` | Regression | any shape | same shape |\n",
    "| Mean absolute error | `nn.L1Loss()` | Robust regression | any shape | same shape |\n",
    "| Cross-entropy | `nn.CrossEntropyLoss()` | Multi-class classification | `(N, C)` raw logits | `(N,)` class indices |\n",
    "| Binary cross-entropy | `nn.BCEWithLogitsLoss()` | Binary / multi-label | `(N, *)` raw logits | `(N, *)` floats in [0,1] |\n",
    "| Negative log-likelihood | `nn.NLLLoss()` | After `log_softmax` | `(N, C)` log-probs | `(N,)` class indices |\n",
    "| Huber (smooth L1) | `nn.SmoothL1Loss()` | Robust regression | any shape | same shape |\n",
    "| KL divergence | `nn.KLDivLoss(reduction='batchmean')` | Distribution matching | log-probs | probs |\n",
    "| Cosine embedding | `nn.CosineEmbeddingLoss()` | Similarity learning | two `(N, D)` tensors | `(N,)` in {-1, 1} |\n",
    "\n",
    "```{admonition} CrossEntropyLoss Includes Softmax\n",
    ":class: important\n",
    "`nn.CrossEntropyLoss()` internally applies `log_softmax` before `NLLLoss`.\n",
    "Do **not** apply softmax to your model output when using this loss -- you would\n",
    "be applying softmax twice, which is a common bug that leads to poor training.\n",
    "```\n",
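    "\n",
    "You can check this equivalence numerically. `CrossEntropyLoss` on raw logits\n",
    "matches `NLLLoss` on `log_softmax` output, while feeding it already-softmaxed\n",
    "values gives a different (wrong) loss. The logits and labels are random placeholders:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "\n",
    "torch.manual_seed(0)\n",
    "logits = torch.randn(4, 3)                 # raw model outputs\n",
    "labels = torch.tensor([0, 2, 1, 2])        # int64 class indices\n",
    "\n",
    "ce = nn.CrossEntropyLoss()(logits, labels)\n",
    "nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)\n",
    "print(torch.allclose(ce, nll))             # True\n",
    "\n",
    "# The double-softmax bug: the loss sees squashed values instead of logits\n",
    "wrong = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), labels)\n",
    "print(torch.allclose(ce, wrong))           # False\n",
    "```\n",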
    "\n",
    "```python\n",
    "# Classification example\n",
    "criterion = nn.CrossEntropyLoss()\n",
    "logits = model(x)                    # shape: (batch, num_classes) -- RAW\n",
    "loss = criterion(logits, labels)     # labels: (batch,) of ints\n",
    "\n",
    "# With class weights (for imbalanced data)\n",
    "weights = torch.tensor([1.0, 2.0, 0.5])  # one per class\n",
    "criterion = nn.CrossEntropyLoss(weight=weights)\n",
    "\n",
    "# Ignoring padding tokens (NLP)\n",
    "criterion = nn.CrossEntropyLoss(ignore_index=-100)  # -100 is the default; targets equal to it are skipped\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec9-optimizers",
   "metadata": {},
   "source": [
    "## 9. Optimizers\n",
    "\n",
    "### Common Optimizers\n",
    "\n",
    "```python\n",
    "import torch.optim as optim\n",
    "\n",
    "# Stochastic gradient descent\n",
    "optimizer = optim.SGD(model.parameters(), lr=0.01)\n",
    "optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)\n",
    "optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)\n",
    "\n",
    "# Adam (adaptive learning rate)\n",
    "optimizer = optim.Adam(model.parameters(), lr=0.001)\n",
    "optimizer = optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))\n",
    "\n",
    "# AdamW (decoupled weight decay -- generally preferred over Adam)\n",
    "optimizer = optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)\n",
    "\n",
    "# RMSprop\n",
    "optimizer = optim.RMSprop(model.parameters(), lr=0.01, alpha=0.99)\n",
    "```\n",
    "\n",
    "### Per-Parameter Options\n",
    "\n",
    "```python\n",
    "# Different learning rates for different parts of the model\n",
    "optimizer = optim.Adam([\n",
    "    {'params': model.encoder.parameters(), 'lr': 1e-5},   # fine-tune slowly\n",
    "    {'params': model.decoder.parameters(), 'lr': 1e-3},   # train faster\n",
    "])\n",
    "```\n",
    "\n",
    "### Learning Rate Schedulers\n",
    "\n",
    "```python\n",
    "from torch.optim.lr_scheduler import (\n",
    "    StepLR, ExponentialLR, CosineAnnealingLR, ReduceLROnPlateau, OneCycleLR\n",
    ")\n",
    "\n",
    "# Step decay: multiply lr by gamma every step_size epochs\n",
    "scheduler = StepLR(optimizer, step_size=10, gamma=0.1)\n",
    "\n",
    "# Exponential decay\n",
    "scheduler = ExponentialLR(optimizer, gamma=0.95)\n",
    "\n",
    "# Cosine annealing\n",
    "scheduler = CosineAnnealingLR(optimizer, T_max=100)\n",
    "\n",
    "# Reduce on plateau (watches a metric)\n",
    "scheduler = ReduceLROnPlateau(optimizer, mode='min', patience=5)\n",
    "\n",
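    "# One-cycle policy (designed for super-convergence; stepped once per BATCH)\n",
    "scheduler = OneCycleLR(optimizer, max_lr=0.01, total_steps=1000)\n",
    "\n",
    "# Usage in training loop\n",
    "for epoch in range(num_epochs):\n",
    "    train(...)\n",
    "    scheduler.step()                    # most schedulers: once per epoch\n",
    "    # scheduler.step(val_loss)          # for ReduceLROnPlateau\n",
    "\n",
    "# OneCycleLR instead: call scheduler.step() after every optimizer.step(),\n",
    "# with total_steps = num_epochs * len(train_loader)\n",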
    "```\n",
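    "\n",
    "To see what a scheduler actually does, record the learning rate each epoch.\n",
    "Here is a minimal sketch with `StepLR` that uses a single dummy parameter in\n",
    "place of a real model:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "from torch.optim import SGD\n",
    "from torch.optim.lr_scheduler import StepLR\n",
    "\n",
    "param = torch.nn.Parameter(torch.zeros(1))   # dummy stand-in for model.parameters()\n",
    "optimizer = SGD([param], lr=0.1)\n",
    "scheduler = StepLR(optimizer, step_size=10, gamma=0.1)\n",
    "\n",
    "lrs = []\n",
    "for epoch in range(25):\n",
    "    lrs.append(optimizer.param_groups[0]['lr'])   # lr used during this epoch\n",
    "    optimizer.step()                              # training would go here\n",
    "    scheduler.step()                              # once per epoch\n",
    "\n",
    "# epochs 0-9: 0.1, epochs 10-19: 0.01, epochs 20-24: 0.001 (up to float rounding)\n",
    "print(lrs[0], lrs[10], lrs[20])\n",
    "```\n",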
    "\n",
    "| Optimizer | When to Use |\n",
    "|:---|:---|\n",
    "| SGD + Momentum | Classic choice; often best final accuracy with tuning |\n",
    "| Adam | Good default; fast convergence; less sensitive to lr |\n",
    "| AdamW | Preferred over Adam when using weight decay (most modern work) |\n",
    "| RMSprop | RNNs, reinforcement learning |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec10-training",
   "metadata": {},
   "source": [
    "## 10. Training Loop Template\n",
    "\n",
    "### Standard Training Loop\n",
    "\n",
    "```python\n",
    "model = MyModel().to(device)\n",
    "criterion = nn.CrossEntropyLoss()\n",
    "optimizer = optim.Adam(model.parameters(), lr=1e-3)\n",
    "\n",
    "for epoch in range(num_epochs):\n",
    "    # --- Training phase ---\n",
    "    model.train()\n",
    "    train_loss = 0.0\n",
    "    correct = 0\n",
    "    total = 0\n",
    "    \n",
    "    for X_batch, y_batch in train_loader:\n",
    "        X_batch = X_batch.to(device)\n",
    "        y_batch = y_batch.to(device)\n",
    "        \n",
    "        pred = model(X_batch)             # 1. Forward pass\n",
    "        loss = criterion(pred, y_batch)   # 2. Compute loss\n",
    "        \n",
    "        optimizer.zero_grad()             # 3. Zero gradients\n",
    "        loss.backward()                   # 4. Backward pass\n",
    "        optimizer.step()                  # 5. Update weights\n",
    "        \n",
    "        train_loss += loss.item() * X_batch.size(0)\n",
    "        correct += (pred.argmax(dim=1) == y_batch).sum().item()\n",
    "        total += y_batch.size(0)\n",
    "    \n",
    "    train_loss /= total\n",
    "    train_acc = correct / total\n",
    "    \n",
    "    # --- Validation phase ---\n",
    "    model.eval()\n",
    "    val_loss = 0.0\n",
    "    val_correct = 0\n",
    "    val_total = 0\n",
    "    \n",
    "    with torch.no_grad():\n",
    "        for X_batch, y_batch in val_loader:\n",
    "            X_batch = X_batch.to(device)\n",
    "            y_batch = y_batch.to(device)\n",
    "            \n",
    "            pred = model(X_batch)\n",
    "            loss = criterion(pred, y_batch)\n",
    "            \n",
    "            val_loss += loss.item() * X_batch.size(0)\n",
    "            val_correct += (pred.argmax(dim=1) == y_batch).sum().item()\n",
    "            val_total += y_batch.size(0)\n",
    "    \n",
    "    val_loss /= val_total\n",
    "    val_acc = val_correct / val_total\n",
    "    \n",
    "    print(f\"Epoch {epoch+1}/{num_epochs}: \"\n",
    "          f\"train_loss={train_loss:.4f}, train_acc={train_acc:.4f}, \"\n",
    "          f\"val_loss={val_loss:.4f}, val_acc={val_acc:.4f}\")\n",
    "```\n",
    "\n",
    "```{admonition} The 5 Sacred Steps\n",
    ":class: important\n",
    "Every training iteration follows the same 5-step pattern:\n",
    "\n",
    "1. **Forward** -- compute predictions\n",
    "2. **Loss** -- measure error\n",
    "3. **Zero** -- clear old gradients\n",
    "4. **Backward** -- compute new gradients\n",
    "5. **Step** -- update parameters\n",
    "\n",
    "Step 3 can move (for example, calling `zero_grad()` at the top of the iteration,\n",
    "before the forward pass), but the ordering constraints stay fixed: zero before\n",
    "backward, backward before step.\n",
    "```\n",
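    "\n",
    "You can run the template end to end on a toy problem. This sketch uses\n",
    "synthetic, linearly separable 2D data and a single `nn.Linear` layer in place\n",
    "of a real dataset and model. It skips validation and device handling for brevity:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "from torch.utils.data import DataLoader, TensorDataset\n",
    "\n",
    "torch.manual_seed(0)\n",
    "X = torch.randn(512, 2)\n",
    "y = (X[:, 0] + X[:, 1] > 0).long()          # synthetic, linearly separable labels\n",
    "loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)\n",
    "\n",
    "model = nn.Linear(2, 2)\n",
    "criterion = nn.CrossEntropyLoss()\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=0.05)\n",
    "\n",
    "def full_loss():\n",
    "    with torch.no_grad():\n",
    "        return criterion(model(X), y).item()\n",
    "\n",
    "start = full_loss()\n",
    "for epoch in range(10):\n",
    "    model.train()\n",
    "    for X_batch, y_batch in loader:\n",
    "        pred = model(X_batch)                  # 1. forward\n",
    "        loss = criterion(pred, y_batch)        # 2. loss\n",
    "        optimizer.zero_grad()                  # 3. zero\n",
    "        loss.backward()                        # 4. backward\n",
    "        optimizer.step()                       # 5. step\n",
    "end = full_loss()\n",
    "\n",
    "with torch.no_grad():\n",
    "    acc = (model(X).argmax(dim=1) == y).float().mean().item()\n",
    "print(f'loss {start:.3f} -> {end:.3f}, accuracy {acc:.2%}')\n",
    "```\n",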
    "\n",
    "### Gradient Clipping (for RNNs)\n",
    "\n",
    "```python\n",
    "loss.backward()\n",
    "torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)\n",
    "optimizer.step()\n",
    "```\n",
    "\n",
    "### Gradient Accumulation (simulate larger batch)\n",
    "\n",
    "```python\n",
    "accumulation_steps = 4\n",
    "optimizer.zero_grad()\n",
    "\n",
    "for i, (X_batch, y_batch) in enumerate(train_loader):\n",
    "    loss = criterion(model(X_batch.to(device)), y_batch.to(device))\n",
    "    loss = loss / accumulation_steps    # normalize\n",
    "    loss.backward()                     # accumulate gradients\n",
    "    \n",
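    "    # step every accumulation_steps batches, and flush leftovers at the end\n",
    "    if (i + 1) % accumulation_steps == 0 or (i + 1) == len(train_loader):\n",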
    "        optimizer.step()\n",
    "        optimizer.zero_grad()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec11-data",
   "metadata": {},
   "source": [
    "## 11. Data Loading\n",
    "\n",
    "### Custom Dataset\n",
    "\n",
    "```python\n",
    "from torch.utils.data import Dataset, DataLoader\n",
    "\n",
    "class MyDataset(Dataset):\n",
    "    def __init__(self, X, y, transform=None):\n",
    "        self.X = torch.tensor(X, dtype=torch.float32)\n",
    "        self.y = torch.tensor(y, dtype=torch.long)\n",
    "        self.transform = transform\n",
    "    \n",
    "    def __len__(self):\n",
    "        return len(self.X)\n",
    "    \n",
    "    def __getitem__(self, idx):\n",
    "        sample = self.X[idx]\n",
    "        if self.transform:\n",
    "            sample = self.transform(sample)\n",
    "        return sample, self.y[idx]\n",
    "```\n",
    "\n",
    "### DataLoader\n",
    "\n",
    "```python\n",
    "dataset = MyDataset(X_train, y_train)\n",
    "loader = DataLoader(\n",
    "    dataset,\n",
    "    batch_size=64,           # samples per batch\n",
    "    shuffle=True,            # randomize order each epoch\n",
    "    num_workers=4,           # parallel data loading\n",
    "    pin_memory=True,         # faster host-to-GPU transfer (CUDA only)\n",
    "    drop_last=True,          # drop incomplete final batch\n",
    ")\n",
    "\n",
    "# Iterate\n",
    "for X_batch, y_batch in loader:\n",
    "    print(X_batch.shape, y_batch.shape)\n",
    "    break\n",
    "```\n",
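    "\n",
    "A small dataset makes the effect of `batch_size` and `drop_last` easy to check.\n",
    "Here, `TensorDataset` stands in for `MyDataset`:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "from torch.utils.data import DataLoader, TensorDataset\n",
    "\n",
    "dataset = TensorDataset(torch.randn(100, 3), torch.arange(100))\n",
    "\n",
    "keep = DataLoader(dataset, batch_size=32)                  # batches: 32, 32, 32, 4\n",
    "drop = DataLoader(dataset, batch_size=32, drop_last=True)  # batches: 32, 32, 32\n",
    "\n",
    "print(len(keep), len(drop))                     # 4 3\n",
    "sizes = [yb.shape[0] for _, yb in keep]\n",
    "print(sizes)                                    # [32, 32, 32, 4]\n",
    "\n",
    "xb, yb = next(iter(keep))                       # default collate stacks samples\n",
    "print(xb.shape, yb.shape)                       # torch.Size([32, 3]) torch.Size([32])\n",
    "```\n",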
    "\n",
    "### Built-in Datasets (torchvision)\n",
    "\n",
    "```python\n",
    "from torchvision import datasets, transforms\n",
    "\n",
    "# Standard transform pipeline\n",
    "transform = transforms.Compose([\n",
    "    transforms.Resize(32),\n",
    "    transforms.ToTensor(),                      # PIL -> tensor, scale to [0,1]\n",
    "    transforms.Normalize((0.5,), (0.5,)),       # normalize to [-1, 1]\n",
    "])\n",
    "\n",
    "# MNIST\n",
    "train_data = datasets.MNIST(\n",
    "    root='data/', train=True, download=True, transform=transform\n",
    ")\n",
    "test_data = datasets.MNIST(\n",
    "    root='data/', train=False, download=True, transform=transform\n",
    ")\n",
    "\n",
    "train_loader = DataLoader(train_data, batch_size=64, shuffle=True)\n",
    "test_loader = DataLoader(test_data, batch_size=256, shuffle=False)\n",
    "```\n",
    "\n",
    "### Common Datasets\n",
    "\n",
    "| Dataset | Code | Shape | Classes |\n",
    "|:---|:---|:---|:---|\n",
    "| MNIST | `datasets.MNIST(...)` | 1x28x28 | 10 digits |\n",
    "| FashionMNIST | `datasets.FashionMNIST(...)` | 1x28x28 | 10 clothing |\n",
    "| CIFAR-10 | `datasets.CIFAR10(...)` | 3x32x32 | 10 objects |\n",
    "| CIFAR-100 | `datasets.CIFAR100(...)` | 3x32x32 | 100 objects |\n",
    "| ImageNet | `datasets.ImageNet(...)` | variable size (typically cropped to 3x224x224) | 1000 objects |\n",
    "\n",
    "### Train/Validation Split\n",
    "\n",
    "```python\n",
    "from torch.utils.data import random_split\n",
    "\n",
    "full_dataset = MyDataset(X, y)\n",
    "train_size = int(0.8 * len(full_dataset))\n",
    "val_size = len(full_dataset) - train_size\n",
    "train_set, val_set = random_split(full_dataset, [train_size, val_size])\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec12-save-load",
   "metadata": {},
   "source": [
    "## 12. Save and Load\n",
    "\n",
    "### Model Weights (Recommended)\n",
    "\n",
    "```python\n",
    "# Save weights only\n",
    "torch.save(model.state_dict(), 'model_weights.pth')\n",
    "\n",
    "# Load weights\n",
    "model = MyModel()                                     # create model first\n",
    "model.load_state_dict(torch.load('model_weights.pth', weights_only=True))\n",
    "model.eval()                                          # set to eval mode\n",
    "```\n",
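    "\n",
    "This round-trip check confirms that saving and loading a `state_dict` reproduces\n",
    "the same outputs. An in-memory `io.BytesIO` buffer stands in for a file on disk,\n",
    "and `nn.Linear(4, 2)` stands in for `MyModel`. It assumes a PyTorch version that\n",
    "accepts `weights_only`, as the snippet above already does:\n",
    "\n",
    "```python\n",
    "import io\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "model = nn.Linear(4, 2)\n",
    "buffer = io.BytesIO()                      # in-memory stand-in for a .pth file\n",
    "torch.save(model.state_dict(), buffer)\n",
    "buffer.seek(0)\n",
    "\n",
    "restored = nn.Linear(4, 2)                 # same architecture, fresh weights\n",
    "restored.load_state_dict(torch.load(buffer, weights_only=True))\n",
    "restored.eval()\n",
    "\n",
    "x = torch.randn(3, 4)\n",
    "print(torch.equal(model(x), restored(x)))  # True\n",
    "print(list(model.state_dict().keys()))     # ['weight', 'bias']\n",
    "```\n",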
    "\n",
    "### Full Checkpoint (model + optimizer + epoch)\n",
    "\n",
    "```python\n",
    "# Save checkpoint\n",
    "torch.save({\n",
    "    'epoch': epoch,\n",
    "    'model_state_dict': model.state_dict(),\n",
    "    'optimizer_state_dict': optimizer.state_dict(),\n",
    "    'loss': loss.item(),\n",
    "}, 'checkpoint.pth')\n",
    "\n",
    "# Load checkpoint\n",
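    "checkpoint = torch.load('checkpoint.pth', map_location=device, weights_only=True)\n",
    "model.load_state_dict(checkpoint['model_state_dict'])\n",
    "optimizer.load_state_dict(checkpoint['optimizer_state_dict'])\n",
    "start_epoch = checkpoint['epoch'] + 1   # resume with the epoch after the saved one\n",
    "model.train()                          # restore training mode before resuming\n",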
    "```\n",
    "\n",
    "```{admonition} Avoid Saving the Entire Model\n",
    ":class: warning\n",
    "`torch.save(model, 'model.pth')` uses Python pickle, which ties the saved file\n",
    "to the exact class definition and directory structure. Saving `state_dict()` is\n",
    "portable and robust.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec13-patterns",
   "metadata": {},
   "source": [
    "## 13. Common Patterns and Idioms\n",
    "\n",
    "### Flattening for Fully-Connected Layers\n",
    "\n",
    "```python\n",
    "# After conv layers, before FC layers\n",
    "x = x.view(x.size(0), -1)        # flatten keeping batch dim (needs contiguous memory)\n",
    "x = x.flatten(1)                  # same result; also works on non-contiguous tensors\n",
    "x = nn.Flatten()(x)               # as a module (use in Sequential)\n",
    "```\n",
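    "\n",
    "A quick check of where these differ, assuming an arbitrary `(8, 16, 4, 4)` feature\n",
    "map: all three give `(8, 256)` on a freshly allocated tensor, but after a `permute`\n",
    "the memory is no longer contiguous, so `view` raises while `flatten` (which may\n",
    "copy, like `reshape`) still works.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "x = torch.randn(8, 16, 4, 4)                      # arbitrary conv-style output\n",
    "a = x.view(x.size(0), -1)\n",
    "b = x.flatten(1)\n",
    "c = nn.Flatten()(x)\n",
    "print(a.shape, torch.equal(a, b), torch.equal(b, c))  # torch.Size([8, 256]) True True\n",
    "\n",
    "xt = x.permute(0, 2, 3, 1)                        # channels-last, non-contiguous\n",
    "try:\n",
    "    xt.view(xt.size(0), -1)\n",
    "    view_ok = True\n",
    "except RuntimeError:\n",
    "    view_ok = False\n",
    "print(view_ok, xt.flatten(1).shape)               # False torch.Size([8, 256])\n",
    "```\n",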
    "\n",
    "### Weight Initialization\n",
    "\n",
    "```python\n",
    "def init_weights(m):\n",
    "    if isinstance(m, nn.Linear):\n",
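    "        nn.init.xavier_uniform_(m.weight)\n",
    "        if m.bias is not None:            # layers built with bias=False have no bias\n",
    "            nn.init.zeros_(m.bias)\n",
    "    elif isinstance(m, nn.Conv2d):\n",
    "        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n",
    "        if m.bias is not None:\n",
    "            nn.init.zeros_(m.bias)\n",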
    "\n",
    "model.apply(init_weights)         # applies recursively to all modules\n",
    "```\n",
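    "\n",
    "To see what `model.apply` does, the sketch below uses a hypothetical three-layer\n",
    "model with one layer deliberately built with `bias=False`. It records the order in\n",
    "which `apply` visits modules, runs the initializer, and checks the result; the\n",
    "`m.bias is not None` guard is what keeps the bias-free layer from raising.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "def init_weights(m):\n",
    "    if isinstance(m, nn.Linear):\n",
    "        nn.init.xavier_uniform_(m.weight)\n",
    "        if m.bias is not None:\n",
    "            nn.init.zeros_(m.bias)\n",
    "    elif isinstance(m, nn.Conv2d):\n",
    "        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n",
    "        if m.bias is not None:\n",
    "            nn.init.zeros_(m.bias)\n",
    "\n",
    "model = nn.Sequential(                    # hypothetical example model\n",
    "    nn.Conv2d(1, 8, 3, bias=False),\n",
    "    nn.ReLU(),\n",
    "    nn.Flatten(),\n",
    "    nn.Linear(8 * 26 * 26, 10),\n",
    ")\n",
    "visited = []\n",
    "model.apply(lambda m: visited.append(type(m).__name__))\n",
    "print(visited)  # ['Conv2d', 'ReLU', 'Flatten', 'Linear', 'Sequential']\n",
    "\n",
    "model.apply(init_weights)\n",
    "print(model[3].bias.abs().sum().item())  # 0.0\n",
    "```\n",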
    "\n",
    "### Freezing Layers (Transfer Learning)\n",
    "\n",
    "```python\n",
    "# Freeze all layers\n",
    "for param in model.parameters():\n",
    "    param.requires_grad = False\n",
    "\n",
    "# Unfreeze only the classifier head\n",
    "for param in model.classifier.parameters():\n",
    "    param.requires_grad = True\n",
    "\n",
    "# Only pass trainable params to optimizer\n",
    "optimizer = optim.Adam(\n",
    "    filter(lambda p: p.requires_grad, model.parameters()),\n",
    "    lr=1e-3\n",
    ")\n",
    "```\n",
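    "\n",
    "A sketch on a hypothetical `TinyNet` whose head is named `classifier` (as in the\n",
    "snippet above). It uses `nn.Module.requires_grad_`, an in-place shortcut for the\n",
    "two loops, and counts parameters to confirm that only the head will be trained:\n",
    "\n",
    "```python\n",
    "import torch.nn as nn\n",
    "import torch.optim as optim\n",
    "\n",
    "class TinyNet(nn.Module):                 # hypothetical model with a 'classifier' head\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.features = nn.Sequential(nn.Linear(10, 32), nn.ReLU())\n",
    "        self.classifier = nn.Linear(32, 3)\n",
    "\n",
    "    def forward(self, x):\n",
    "        return self.classifier(self.features(x))\n",
    "\n",
    "model = TinyNet()\n",
    "model.requires_grad_(False)               # same as the 'freeze all' loop\n",
    "model.classifier.requires_grad_(True)     # unfreeze the head\n",
    "\n",
    "trainable = [p for p in model.parameters() if p.requires_grad]\n",
    "optimizer = optim.Adam(trainable, lr=1e-3)\n",
    "\n",
    "n_trainable = sum(p.numel() for p in trainable)\n",
    "n_total = sum(p.numel() for p in model.parameters())\n",
    "print(f'{n_trainable} / {n_total} parameters trainable')  # 99 / 451 parameters trainable\n",
    "```\n",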
    "\n",
    "### Extracting Scalar from Tensor\n",
    "\n",
    "```python\n",
    "loss_value = loss.item()          # Python float from 0-dim tensor\n",
    "count = correct.item()            # Python int\n",
    "# .item() only works on tensors with exactly one element\n",
    "```\n",
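    "\n",
    "A minimal illustration with made-up values: a float tensor gives a Python `float`,\n",
    "an integer count gives a Python `int`, and a multi-element tensor raises.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "loss = torch.tensor(0.25)                                   # made-up loss value\n",
    "correct = (torch.tensor([1, 2, 3]) == torch.tensor([1, 0, 3])).sum()\n",
    "print(loss.item(), correct.item())                          # 0.25 2\n",
    "print(type(loss.item()).__name__, type(correct.item()).__name__)  # float int\n",
    "\n",
    "failed = False\n",
    "try:\n",
    "    torch.tensor([1.0, 2.0]).item()                         # two elements\n",
    "except RuntimeError as err:\n",
    "    failed = True\n",
    "    print('item() failed:', err)\n",
    "```\n",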
    "\n",
    "### Reproducibility\n",
    "\n",
    "```python\n",
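    "import random\n",
    "import numpy as np\n",
    "import torch\n",
    "\n",
    "random.seed(42)                   # Python's built-in RNG\n",
    "np.random.seed(42)                # NumPy RNG\n",
    "torch.manual_seed(42)             # CPU and all CUDA devices\n",
    "torch.cuda.manual_seed_all(42)    # redundant with manual_seed, but harmless\n",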
    "\n",
    "# For full determinism (may slow things down)\n",
    "torch.backends.cudnn.deterministic = True\n",
    "torch.backends.cudnn.benchmark = False\n",
    "```\n",
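    "\n",
    "The seeding calls are often wrapped in one helper. A sketch (the name\n",
    "`seed_everything` is our own, not a PyTorch API) that also confirms reseeding\n",
    "reproduces the same random draws:\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "import numpy as np\n",
    "import torch\n",
    "\n",
    "def seed_everything(seed: int) -> None:   # hypothetical helper name\n",
    "    random.seed(seed)\n",
    "    np.random.seed(seed)\n",
    "    torch.manual_seed(seed)               # seeds CPU and all CUDA devices\n",
    "\n",
    "seed_everything(42)\n",
    "first = (random.random(), np.random.rand(), torch.randn(3))\n",
    "seed_everything(42)\n",
    "second = (random.random(), np.random.rand(), torch.randn(3))\n",
    "\n",
    "print(first[0] == second[0], first[1] == second[1], torch.equal(first[2], second[2]))  # True True True\n",
    "```\n",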
    "\n",
    "### Mixed Precision Training\n",
    "\n",
    "```python\n",
    "from torch.amp import autocast, GradScaler\n",
    "\n",
    "scaler = GradScaler('cuda')\n",
    "\n",
    "for X_batch, y_batch in train_loader:\n",
    "    X_batch, y_batch = X_batch.to(device), y_batch.to(device)\n",
    "    optimizer.zero_grad()\n",
    "    \n",
    "    with autocast(device_type='cuda'):      # eligible ops run in float16\n",
    "        pred = model(X_batch)\n",
    "        loss = criterion(pred, y_batch)\n",
    "    \n",
    "    scaler.scale(loss).backward()           # backward on the scaled-up loss\n",
    "    scaler.step(optimizer)                  # unscale grads; skip step on inf/NaN\n",
    "    scaler.update()                         # adjust the scale factor\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec14-cnn-patterns",
   "metadata": {},
   "source": [
    "## 14. CNN Architecture Patterns\n",
    "\n",
    "### Basic CNN for Image Classification\n",
    "\n",
    "```python\n",
    "class SimpleCNN(nn.Module):\n",
    "    def __init__(self, num_classes=10):\n",
    "        super().__init__()\n",
    "        self.features = nn.Sequential(\n",
    "            nn.Conv2d(1, 32, 3, padding=1),      # 28x28 -> 28x28\n",
    "            nn.ReLU(),\n",
    "            nn.MaxPool2d(2),                      # 28x28 -> 14x14\n",
    "            nn.Conv2d(32, 64, 3, padding=1),      # 14x14 -> 14x14\n",
    "            nn.ReLU(),\n",
    "            nn.MaxPool2d(2),                      # 14x14 -> 7x7\n",
    "        )\n",
    "        self.classifier = nn.Sequential(\n",
    "            nn.Flatten(),\n",
    "            nn.Linear(64 * 7 * 7, 128),\n",
    "            nn.ReLU(),\n",
    "            nn.Dropout(0.5),\n",
    "            nn.Linear(128, num_classes),\n",
    "        )\n",
    "    \n",
    "    def forward(self, x):\n",
    "        x = self.features(x)\n",
    "        return self.classifier(x)\n",
    "```\n",
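    "\n",
    "Before trusting the `64 * 7 * 7` in the classifier, a dummy batch can be pushed\n",
    "through the same feature layers (rebuilt here so the snippet stands alone) to print\n",
    "each intermediate shape:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "features = nn.Sequential(                 # same layers as SimpleCNN.features\n",
    "    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),\n",
    "    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),\n",
    ")\n",
    "\n",
    "x = torch.zeros(2, 1, 28, 28)             # dummy batch: two 1x28x28 images\n",
    "for layer in features:\n",
    "    x = layer(x)\n",
    "    print(f'{type(layer).__name__:>10}: {tuple(x.shape)}')\n",
    "\n",
    "print(x.flatten(1).shape[1])              # 3136 == 64 * 7 * 7\n",
    "```\n",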
    "\n",
    "### Conv2d Output Size Formula\n",
    "\n",
    "$$H_{\\text{out}} = \\left\\lfloor \\frac{H_{\\text{in}} + 2 \\times \\text{padding} - \\text{kernel\\_size}}{\\text{stride}} \\right\\rfloor + 1$$\n",
    "\n",
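    "The formula assumes the default `dilation=1`; `MaxPool2d(2)` follows it with\n",
    "`kernel_size=2` and `stride=2`. A small helper (the name `conv_out` is our own)\n",
    "encodes it and is cross-checked against a real layer:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "def conv_out(size, kernel_size, stride=1, padding=0):   # hypothetical helper, dilation=1\n",
    "    return (size + 2 * padding - kernel_size) // stride + 1\n",
    "\n",
    "print(conv_out(28, 3, padding=1), conv_out(28, 3), conv_out(28, 2, stride=2))  # 28 26 14\n",
    "print(conv_out(32, 3, stride=2, padding=1))                                  # 16\n",
    "\n",
    "layer = nn.Conv2d(1, 1, 3, stride=2, padding=1)\n",
    "print(layer(torch.zeros(1, 1, 32, 32)).shape[-1])                            # 16\n",
    "```\n",
    "\n",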
    "| Layer | Output (28x28 input) | Output (32x32 input) |\n",
    "|:---|:---|:---|\n",
    "| `Conv2d(*, *, 3, padding=1)` | 28x28 | 32x32 |\n",
    "| `Conv2d(*, *, 3, padding=0)` | 26x26 | 30x30 |\n",
    "| `Conv2d(*, *, 5, padding=2)` | 28x28 | 32x32 |\n",
    "| `MaxPool2d(2)` | 14x14 | 16x16 |\n",
    "| `Conv2d(*, *, 3, stride=2, padding=1)` | 14x14 | 16x16 |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec15-debugging",
   "metadata": {},
   "source": [
    "## 15. Debugging Tips\n",
    "\n",
    "### Shape Debugging\n",
    "\n",
    "```python\n",
    "# Print shapes at each step in forward()\n",
    "def forward(self, x):\n",
    "    print(f\"Input:      {x.shape}\")\n",
    "    x = self.conv1(x)\n",
    "    print(f\"After conv1: {x.shape}\")\n",
    "    x = self.pool(x)\n",
    "    print(f\"After pool:  {x.shape}\")\n",
    "    x = x.flatten(1)\n",
    "    print(f\"Flattened:   {x.shape}\")\n",
    "    return self.fc(x)\n",
    "```\n",
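    "\n",
    "Editing `forward()` is not always convenient, for example with `nn.Sequential` or a\n",
    "pretrained model. Forward hooks give the same printout without touching the code;\n",
    "a sketch on a hypothetical small model:\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "model = nn.Sequential(                    # hypothetical model for illustration\n",
    "    nn.Conv2d(1, 8, 3),                   # 28x28 -> 26x26\n",
    "    nn.MaxPool2d(2),                      # 26x26 -> 13x13\n",
    "    nn.Flatten(),\n",
    "    nn.Linear(8 * 13 * 13, 10),\n",
    ")\n",
    "\n",
    "shapes = []\n",
    "def log_shape(module, inputs, output):    # signature expected by register_forward_hook\n",
    "    shapes.append((type(module).__name__, tuple(output.shape)))\n",
    "    print(f'{type(module).__name__:>10}: {tuple(output.shape)}')\n",
    "\n",
    "handles = [m.register_forward_hook(log_shape) for m in model]\n",
    "model(torch.zeros(4, 1, 28, 28))\n",
    "for h in handles:\n",
    "    h.remove()                            # detach hooks when done\n",
    "```\n",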
    "\n",
    "### Common Errors and Fixes\n",
    "\n",
    "| Error | Likely Cause | Fix |\n",
    "|:---|:---|:---|\n",
    "| `RuntimeError: mat1 and mat2 shapes cannot be multiplied` | Wrong Linear input size | Print shape before the Linear layer |\n",
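    "| `RuntimeError: Expected all tensors to be on the same device` | Mixed CPU/CUDA tensors | Call `.to(device)` on the model and on every input batch |\n",
    "| `RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn` | Loss is not connected to any trainable parameter (computed under `torch.no_grad()`, after `.detach()`, or with every parameter frozen) | Run the forward pass with gradients enabled and keep some parameter at `requires_grad=True` |\n",
    "| `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation` | In-place op on a tensor autograd still needs | Replace `x += 1` with `x = x + 1`; drop `inplace=True` on activations |\n",
    "| `ValueError: Expected target size (...), got (...)` | Logits have an extra dimension, e.g. `(N, 1, C)` | `nn.CrossEntropyLoss` expects logits `(N, C)` and class-index targets `(N,)` of dtype `torch.long` |\n",
    "| Loss is `nan` | Learning rate too high, exploding gradients, or invalid math such as `log(0)` | Reduce lr; clip gradients with `torch.nn.utils.clip_grad_norm_` |\n",
    "| Loss stuck / not decreasing | Learning rate too low, or a bug | Overfit a single batch first to verify the model can learn |\n",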
    "\n",
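    "Gradient clipping, the fix suggested for a `nan` loss, goes between `backward()` and\n",
    "`step()`. A sketch on a toy model with deliberately large inputs (the data and\n",
    "`max_norm=1.0` are arbitrary choices for illustration):\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "import torch.optim as optim\n",
    "\n",
    "torch.manual_seed(0)\n",
    "model = nn.Linear(10, 1)\n",
    "optimizer = optim.SGD(model.parameters(), lr=0.1)\n",
    "X, y = torch.randn(32, 10) * 100, torch.randn(32, 1)   # large inputs -> large gradients\n",
    "\n",
    "optimizer.zero_grad()\n",
    "loss = F.mse_loss(model(X), y)\n",
    "loss.backward()\n",
    "# Rescale gradients in place so their global L2 norm is at most max_norm;\n",
    "# the return value is the norm measured before clipping\n",
    "norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)\n",
    "norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))\n",
    "optimizer.step()\n",
    "print(f'grad norm before: {norm_before.item():.1f}, after: {norm_after.item():.4f}')\n",
    "```\n",
    "\n",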
    "### NaN and Gradient Checks\n",
    "\n",
    "```python\n",
    "# Check for NaN (or inf) in loss\n",
    "assert torch.isfinite(loss).all(), \"Loss is NaN or inf!\"\n",
    "\n",
    "# Check for NaN in gradients\n",
    "for name, param in model.named_parameters():\n",
    "    if param.grad is not None and torch.isnan(param.grad).any():\n",
    "        print(f\"NaN gradient in {name}\")\n",
    "\n",
    "# Numerical gradient verification (inputs: float64 tensors with requires_grad=True)\n",
    "torch.autograd.gradcheck(func, inputs, eps=1e-6, atol=1e-4)\n",
    "\n",
    "# Check gradient magnitudes\n",
    "for name, param in model.named_parameters():\n",
    "    if param.grad is not None:\n",
    "        print(f\"{name}: grad_norm={param.grad.norm():.4f}\")\n",
    "```\n",
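    "\n",
    "A runnable version of the gradient check, with a small made-up function standing in\n",
    "for `func`. `gradcheck` compares autograd's gradients with finite differences and is\n",
    "meant to be used in double precision, so the inputs are `float64` with\n",
    "`requires_grad=True`; it returns `True` on success and raises otherwise.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "def func(x, w):                            # made-up differentiable function\n",
    "    return torch.tanh(x @ w).sum()\n",
    "\n",
    "torch.manual_seed(0)\n",
    "x = torch.randn(3, 4, dtype=torch.float64, requires_grad=True)\n",
    "w = torch.randn(4, 2, dtype=torch.float64, requires_grad=True)\n",
    "ok = torch.autograd.gradcheck(func, (x, w), eps=1e-6, atol=1e-4)\n",
    "print(ok)  # True\n",
    "\n",
    "loss = func(x, w)\n",
    "assert torch.isfinite(loss).all(), 'loss is NaN or inf'\n",
    "```\n",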
    "\n",
    "### The \"Overfit One Batch\" Test\n",
    "\n",
    "```python\n",
    "# Sanity check: can the model memorize a single batch?\n",
    "X_batch, y_batch = next(iter(train_loader))\n",
    "X_batch, y_batch = X_batch.to(device), y_batch.to(device)\n",
    "\n",
    "model.train()\n",
    "for i in range(200):\n",
    "    pred = model(X_batch)\n",
    "    loss = criterion(pred, y_batch)\n",
    "    optimizer.zero_grad()\n",
    "    loss.backward()\n",
    "    optimizer.step()\n",
    "    if i % 50 == 0:\n",
    "        print(f\"Step {i}: loss={loss.item():.4f}\")\n",
    "# Loss should drop to ~0. If not, your model or data pipeline has a bug.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec16-quick-ref",
   "metadata": {},
   "source": [
    "## 16. Quick Reference Card\n",
    "\n",
    "Twenty-five frequently used PyTorch functions and patterns at a glance.\n",
    "\n",
    "| # | Function / Pattern | What It Does |\n",
    "|:--|:---|:---|\n",
    "| 1 | `torch.tensor(data)` | Create a tensor from Python data |\n",
    "| 2 | `torch.randn(shape)` | Random tensor from standard normal |\n",
    "| 3 | `x.to(device)` | Move tensor to CPU/GPU |\n",
    "| 4 | `x.shape` | Tensor dimensions |\n",
    "| 5 | `x.view(shape)` / `x.reshape(shape)` | Reshape tensor (`view` needs contiguous memory) |\n",
    "| 6 | `x.requires_grad_(True)` | Enable gradient tracking (in-place) |\n",
    "| 7 | `loss.backward()` | Compute all gradients via backprop |\n",
    "| 8 | `x.grad` | Access computed gradient |\n",
    "| 9 | `torch.no_grad()` | Context manager to disable gradients |\n",
    "| 10 | `x.detach()` | Detach tensor from computation graph |\n",
    "| 11 | `nn.Linear(in, out)` | Fully connected layer |\n",
    "| 12 | `nn.Conv2d(in, out, k)` | 2D convolution layer |\n",
    "| 13 | `nn.ReLU()` | ReLU activation |\n",
    "| 14 | `nn.Sequential(...)` | Chain layers into a model |\n",
    "| 15 | `nn.CrossEntropyLoss()` | Classification loss on raw logits (applies log-softmax internally) |\n",
    "| 16 | `nn.MSELoss()` | Regression loss |\n",
    "| 17 | `optim.Adam(params, lr)` | Adam optimizer |\n",
    "| 18 | `optimizer.zero_grad()` | Zero all parameter gradients |\n",
    "| 19 | `optimizer.step()` | Update parameters using gradients |\n",
    "| 20 | `model.train()` | Set model to training mode |\n",
    "| 21 | `model.eval()` | Set model to evaluation mode |\n",
    "| 22 | `model.parameters()` | Iterator over model parameters |\n",
    "| 23 | `DataLoader(dataset, batch_size)` | Batched data iterator |\n",
    "| 24 | `torch.save(state_dict, path)` | Save model weights |\n",
    "| 25 | `loss.item()` | Extract Python scalar from loss tensor |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sec17-imports",
   "metadata": {},
   "source": [
    "## 17. Import Cheat Sheet\n",
    "\n",
    "```python\n",
    "import torch                              # core library\n",
    "import torch.nn as nn                     # neural network modules\n",
    "import torch.nn.functional as F           # functional API (relu, softmax, etc.)\n",
    "import torch.optim as optim               # optimizers\n",
    "from torch.utils.data import Dataset, DataLoader  # data utilities\n",
    "\n",
    "import torchvision                        # vision datasets and transforms\n",
    "from torchvision import datasets, transforms\n",
    "\n",
    "import numpy as np                        # interop\n",
    "import matplotlib.pyplot as plt           # plotting\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "footer",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "**References.**\n",
    "\n",
    "- Paszke, A., Gross, S., Massa, F., et al. (2019). \"PyTorch: An Imperative Style, High-Performance Deep Learning Library.\" *Advances in Neural Information Processing Systems 32*.\n",
    "- [PyTorch Documentation](https://pytorch.org/docs/stable/) -- official API reference.\n",
    "- [PyTorch Tutorials](https://pytorch.org/tutorials/) -- official learning resources."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}