{
"cells":[{
"cell_type":"markdown","metadata":{
"id":"mqfVkflEwVMJ"},"source":["# **1 Deep Learning**"]},{
"cell_type":"markdown","metadata":{
"id":"DmllWyc2wo4j"},"source":["## Starting Example"]},{
"cell_type":"markdown","metadata":{
"id":"ggvBvMbmkij1"},"source":["- The MNIST handwritten digit recognition example is a good starting point for deep learning because it lets you quickly see the basics of building a neural network.\n","\n","- Working through this example gives you hands-on experience building a neural network model that classifies handwritten digits."]},{
"cell_type":"markdown","metadata":{
"id":"8qZpg4Nww7zB"},"source":["### Load MNIST Dataset"]},{
"cell_type":"markdown","metadata":{
"id":"nqPTavH0kmzn"},"source":["Keras is available in two forms: `tensorflow.keras` and `keras`.\n","\n","+ The `tensorflow.keras` module is part of TensorFlow and is the recommended option for most users. It implements the Keras API with tight integration into TensorFlow. In contrast, `keras` is a standalone library that predates TensorFlow's own implementation.\n","\n","+ Although the two share a similar API, there are subtle differences. We use `tensorflow.keras` here for its better compatibility with TensorFlow. Code written for `keras` generally works with `tensorflow.keras`, but some behavior may differ slightly and some features exist only in `tensorflow.keras`."]},{
"cell_type":"code","execution_count":17,"metadata":{
"colab":{
"base_uri":"https://localhost:8080/"},"id":"cwFJW8ABI5qC","outputId":"173b64ad-3d93-4f5a-bd3b-8afbf0da056b"},"outputs":[{
"name":"stdout","output_type":"stream","text":["x_train shape: (48000, 28, 28), y_train shape: (48000,)\n","x_val shape: (12000, 28, 28), y_val shape: (12000,)\n","x_test shape: (10000, 28, 28), y_test shape: (10000,)\n"]}],"source":["import tensorflow as tf\n","from sklearn.model_selection import train_test_split\n","\n","# Load the MNIST dataset\n","(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()\n","\n","# Preprocess the data by normalizing pixel values to the range [0, 1]\n","x_train = x_train / 255.0\n","x_test = x_test / 255.0\n","\n","# Split training data into training and validation sets\n","x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=42)\n","\n","# Print dataset shapes to confirm the split\n","print(f\"x_train shape: {x_train.shape}, y_train shape: {y_train.shape}\")\n","print(f\"x_val shape: {x_val.shape}, y_val shape: {y_val.shape}\")\n","print(f\"x_test shape: {x_test.shape}, y_test shape: {y_test.shape}\")\n"]},{
"cell_type":"code","execution_count":18,"metadata":{
"colab":{
"base_uri":"https://localhost:8080/","height":360},"id":"dMSEsmxowYdS","outputId":"d33ebdbe-35a3-4743-fc0b-16d18b7a56d8"},"outputs":[{
"data":{
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAxsAAAFXCAYAAADK21P3AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYX
"text/plain":["<Figure size 1000x400 with 10 Axes>"]},"metadata":{
},"output_type":"display_data"}],"source":["import matplotlib.pyplot as plt\n","\n","# Visualize the first 10 images from the training set to better understand the input data\n","plt.figure(figsize=(10, 4))\n","for i in range(10):\n","    plt.subplot(2, 5, i + 1)\n","    plt.imshow(x_train[i], cmap='gray')\n","    plt.axis('off')\n","    plt.title(str(y_train[i]))\n","plt.show()\n"]},{
"cell_type":"markdown","metadata":{
"id":"UmvGibB9kqoz"},"source":["### Train a Simple Neural Network"]},{
"cell_type":"markdown","metadata":{
"id":"v4KXlBZrx3Xj"},"source":["**1. Flatten Layer**: Reshapes each 2D input image of size `(28, 28)` into a 1D array of size `784`.\n","\n","**2. Hidden Layer**\n"," - Number of hidden units: `50`. The hidden units extract features from the input data. You can increase the number to capture more features or decrease it to reduce the risk of overfitting.\n"," - Activation function: ReLU (Rectified Linear Unit). It introduces non-linearity, enabling the network to model complex patterns.\n"," - Weight initialization: RandomNormal (mean=0.0, stddev=0.05). Initialization sets the starting point for training. A sensible choice helps keep gradients stable, speeds up convergence, and reduces the risk of vanishing or exploding gradients.\n","\n","**3. Output Layer**\n"," - Number of units: `10` (one for each digit class).\n"," - Activation function: softmax. It converts the raw outputs into a probability distribution over the 10 classes.\n","\n","**4. Loss Function**\n","- Sparse categorical cross-entropy. Use it when the labels are integers (e.g., `[0, 1, 2, ..., 9]`) rather than one-hot encoded vectors.\n","- It measures the difference between the predicted probability distribution (from softmax) and the true class labels.\n","- If labels are one-hot encoded (e.g., `[[1, 0, 0, ..., 0], [0, 1, 0, ..., 0]]`), use **categorical cross-entropy**."]},{
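"cell_type":"markdown","metadata":{
},"source":["A small worked example may make the roles of softmax and the loss more concrete. The symbols $z$ (raw outputs), $p$ (predicted probabilities), and $y$ (integer label), as well as the numbers below, are illustrative assumptions and not values produced by this notebook.\n","\n","For raw outputs $z_0, \\dots, z_9$, softmax gives\n","\n","$$p_k = \\frac{e^{z_k}}{\\sum_{j=0}^{9} e^{z_j}},$$\n","\n","and the sparse categorical cross-entropy for one sample with integer label $y$ is\n","\n","$$L = -\\log p_y.$$\n","\n","To keep the arithmetic short, take only three classes with $z = (2, 1, 0)$. Then $e^{2} \\approx 7.389$, $e^{1} \\approx 2.718$, $e^{0} = 1$, and the sum is about $11.107$, so\n","\n","$$p \\approx (0.665,\\ 0.245,\\ 0.090).$$\n","\n","If the true label is $y = 0$, the loss is $L = -\\log 0.665 \\approx 0.408$. If the true label is $y = 2$, the loss is $L = -\\log 0.090 \\approx 2.408$, so a confident wrong prediction is penalized more heavily.\n","\n","With a one-hot label vector $y_k$, categorical cross-entropy $-\\sum_k y_k \\log p_k$ reduces to the same value $-\\log p_y$, which is why the two losses differ only in the label format they expect."]},{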
"cell_type":"code","execution_count":19,"metadata":{
"colab":{
"base_uri":"https://localhost:8080/","height":287},"id":"sOyttYDOyI24","outputId":"4a854f94-639b-452d-df3e-032cc66d9b74"},"outputs":[{
"data":{
"text/html":["<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">Model: \"sequential\"</span>\n","</pre>\n"],"text/plain":["\u001b[1mModel: \"sequential\"\u001b[0m\n"]},"metadata":{
},"output_type":"display_data"},{
"data":{
"text/html":["<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓\n"]},"metadata":{
},"output_type":"display_data"},{
"data":{
"text/html":["<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\"> Total params: </span><span style=\"color: #00af00; text-decoration-color: #00af00\">39,760</span> (155.31 KB)\n","</pre>\n"],"text/plain":["\u001b[1m Total params: \u001b[0m\u001b[38;5;34m39,760\u001b[0m (155.31 KB)\n"]},"metadata":{
},"output_type":"display_data"},{
"data":{
"text/html":["<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\"> Trainable params: </span><span style=\"color: #00af00; text-decoration-color: #00af00\">39,760</span> (155.31 KB)\n","</pre>\n"],"text/plain":["\u001b[1m Trainable params: \u001b[0m\u001b[38;5;34m39,760\u001b[0m (155.31 KB)\n"]},"metadata":{
},"output_type":"display_data"},{
"data":{
"text/html":["<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\"> Non-trainable params: </span><span style=\"color: #00af00; text-decoration-color: #00af00\">0</span> (0.00 B)\n","</pre>\n"],"text/plain":["\u001b[1m Non-trainable params: \u001b[0m\u001b[38;5;34m0\u001b[0m (0.00 B)\n"]},"metadata":{
},"output_type":"display_data"}],"source":["from tensorflow.keras.models import Sequential\n","from tensorflow.keras.layers import Dense, Flatten, Input\n","from tensorflow.keras.optimizers import Adam\n","from tensorflow.keras.initializers import RandomNormal\n","\n","# Build the model\n","model = Sequential()\n","\n","# 1. Declare the input shape, then flatten each 2D image into a 1D array\n","#    (an explicit Input layer replaces the `input_shape` argument on Flatten)\n","model.add(Input(shape=(28, 28)))\n","model.add(Flatten())\n","\n","# 2. Add a fully connected (dense) hidden layer\n","model.add(Dense(50, activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05)))\n","\n","# 3. Add the output layer with 10 units (one for each digit) and softmax activation\n","model.add(Dense(10, activation='softmax'))\n","\n","# Print the model summary\n","model.summary()"]},{
"cell_type":"markdown","metadata":{
"id":"3ImvmXaPMfMS"},"source":["**How Are the Parameters Calculated in the Model?**\n","\n","1. **Flatten Layer**:\n"," - Reshapes the 2D input `(28, 28)` into a 1D vector of size 784.\n"," - **Parameters**: 0 (no trainable parameters).\n","\n","2. **Hidden Layer**:\n"," - **Operation**: Computes $z = \\alpha \\cdot x + \\beta$, where:\n"," - $\\alpha$: Weight matrix, size 784 $\\times$ 50.\n"," - $\\beta$: Bias vector, size 50.\n"," - **Parameters**: 784 $\\t