
Rust implementation of a neural network.

Introduction

For my SAP project, I chose to build a neural network in Rust.

The idea comes from the video "building a neural network from scratch", which explains the steps to train a neural network "from scratch", i.e. without machine learning libraries like PyTorch or TensorFlow. This implementation is a solution submitted to the Digit Recognizer competition. The goal of this competition is to correctly identify digits from a dataset of tens of thousands of handwritten images (the MNIST dataset).

(figure: sample handwritten digits)

I used the video and its accompanying notebook as a reference implementation. I also used the notebook to generate the expected results for the tests.

Multi Layer Perceptron

A Multi Layer Perceptron (MLP) is a type of neural network composed of multiple layers of neurons. There are at least 3 layers: one input layer (input values), one output layer (output values), and one or more hidden layers. The output y of a perceptron is a combination of all the outputs x_{i} of the previous layer, each multiplied by a weight w_{i}, plus a bias b.

y = \sum_{i} x_{i} w_{i} + b
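As a quick illustration of this formula (all values here are made up for the example), the weighted sum can be computed directly with numpy:

```python
import numpy as np

# Hypothetical example: 3 outputs x_i from the previous layer
x = np.array([0.5, -1.0, 2.0])   # previous layer outputs x_i
w = np.array([0.1, 0.4, -0.2])   # weights w_i of this perceptron
b = 0.3                          # bias

y = x @ w + b                    # y = sum_i(x_i * w_i) + b
print(y)                         # ≈ -0.45
```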

Each perceptron output is passed through an activation function before being used by the perceptrons of the next layer. This introduces non-linearity into the model, allowing the network to learn and represent complex patterns in the data. The activation functions used are the Rectified Linear Unit (ReLU) for the input and hidden layers and Softmax for the output layer.

$$ReLU(x) = \begin{cases} x &\text{if } x > 0 \\ 0 &\text{otherwise} \end{cases}$$

import numpy as np

def ReLU(X):
    return np.maximum(X, 0)

$$softmax(x_{i}) = \frac{e^{x_{i}}}{\sum_{j=1}^K e^{x_{j}}}$$

def softmax(X):
    exp = np.exp(X - np.max(X))
    return exp / exp.sum(axis=0)
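A quick self-contained check of these two activation functions on a small vector of example values:

```python
import numpy as np

def ReLU(X):
    return np.maximum(X, 0)

def softmax(X):
    exp = np.exp(X - np.max(X))
    return exp / exp.sum(axis=0)

logits = np.array([-1.0, 0.0, 2.0])
print(ReLU(logits))    # [0. 0. 2.] : negative values are clipped to 0
probs = softmax(logits)
print(probs)           # larger logits get larger probabilities
print(probs.sum())     # the softmax outputs sum to 1 (a probability distribution)
```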

Training

Multi-Layer Perceptron training iteratively fine-tunes the model. An iteration is composed of:

  1. Forward propagation: the input data is fed through the network. The output of the last layer gives us the model prediction, which we can compare against the actual value to measure the error.
  2. Backward propagation: starting from the output layer, the network propagates this error backwards. The goal is to evaluate, for each weight, how much it contributed to the final error and how to adjust it to improve the final prediction. This value is called the gradient.
  3. Update each weight with its computed gradient (scaled by the learning rate).
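The three steps above can be sketched in numpy for a single hidden layer. The layer sizes, learning rate, and random data below are arbitrary placeholders, not the project's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 784 input pixels, 10 hidden units, 10 classes
n_in, n_hidden, n_out, lr = 784, 10, 10, 0.1
W1, b1 = rng.normal(0, 0.01, (n_hidden, n_in)), np.zeros((n_hidden, 1))
W2, b2 = rng.normal(0, 0.01, (n_out, n_hidden)), np.zeros((n_out, 1))

X = rng.random((n_in, 32))                        # batch of 32 fake images
Y = np.eye(n_out)[:, rng.integers(0, n_out, 32)]  # one-hot fake labels

# 1. forward propagation
Z1 = W1 @ X + b1
A1 = np.maximum(Z1, 0)                # ReLU
Z2 = W2 @ A1 + b2
exp = np.exp(Z2 - Z2.max(axis=0))
A2 = exp / exp.sum(axis=0)            # softmax -> predicted probabilities

# 2. backward propagation (gradient of cross-entropy loss with softmax)
m = X.shape[1]
dZ2 = A2 - Y                          # error at the output layer
dW2 = dZ2 @ A1.T / m
db2 = dZ2.sum(axis=1, keepdims=True) / m
dZ1 = (W2.T @ dZ2) * (Z1 > 0)         # propagate back through the ReLU
dW1 = dZ1 @ X.T / m
db1 = dZ1.sum(axis=1, keepdims=True) / m

# 3. update each weight with its gradient, scaled by the learning rate
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
```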

Rust implementation

Architecture

    ─ src
        ├─ model            // multilayer perceptron struct
        ├─ train            // model training functions (forward, backward...)  
        ├─ dataset          // dataset struct
        ├─ arguments        // parse and manage script arguments   
        ├─ factory          // model / dataset factories (csv / npy parser )
        └─ main.rs
    ─ tests
        ├─ python_scripts   // to generate expected values 
        └─ tests 

Crates

Rust libraries are called crates. This is a non-exhaustive list of the crates I used:

  • nalgebra : for data structures such as matrices and vectors, and functions to manipulate them
  • clap : to parse command-line arguments (e.g. iteration count, output destination...)
  • egui and eframe : for the demo GUI
  • csv : to parse CSV files
  • ndarray-npy : to read and write npy files (NumPy storage format)

Evolution

  • parallelize operations: matrix multiplication, weight updates, etc.
  • implement convolutional layers (example). A convolutional layer applies filters to the input image to extract features. The filter values are learned during training in the same way as the perceptron weights.
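The filtering a convolutional layer performs can be sketched as a naive "valid" 2D convolution (as in most ML frameworks, the kernel is not flipped, so this is technically a cross-correlation). The image and kernel values are made up for the example:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D convolution: slide the kernel over the image
    and take the elementwise product-sum at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])   # a made-up 2x2 filter
print(conv2d(image, kernel))                   # 3x3 feature map
```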

Run the project

Setup

  1. Install Rust
  2. Extract the dataset and the pretrained model
unzip ./input/train.zip -d ./input/
unzip ./output/example_model_pretrained.zip -d ./output/example_model_pretrained

Run

Build and run the project (model training + demo):

cargo run 

By default, the project:

  1. load the input dataset
  2. shuffle the dataset data
  3. split the dataset into a training set and a test set
  4. train the model on the train set
  5. evaluate the model on the test set
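Steps 2-3 (shuffle, then split) can be sketched in numpy. The 80/20 ratio below is an assumed example, not necessarily the project's default:

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(100).reshape(100, 1)   # 100 fake samples

perm = rng.permutation(len(data))       # 2. shuffle the dataset
data = data[perm]

split = int(len(data) * 0.8)            # 3. split: 80% train / 20% test (assumed ratio)
train_set, test_set = data[:split], data[split:]
print(len(train_set), len(test_set))    # 80 20
```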

Alternative use

The following steps can be enabled independently:

  • load a pretrained model
  • skip the training
  • skip the evaluation

Examples:

Run only the demo with a pretrained model:

cargo run -- --skip-training --test-rate 0 --input_model ./output/example_model_pretrained

Only evaluate a pretrained model:

cargo run -- --skip-training --test-rate 100 --input_model ./output/example_model_pretrained

Documentation:

cargo run -- --help

Test

Run the test suite:

cargo test