
Rust implementation of a neural network.

Introduction

For my SAP project, I chose to build a neural network in Rust.

The idea comes from the video "building a neural network from scratch", which explains the steps to train a neural network "from scratch", i.e. without machine learning libraries like PyTorch or TensorFlow. This implementation is a solution submitted to the Digit Recognizer competition. The goal of this competition is to correctly identify digits from a dataset of tens of thousands of handwritten images (the MNIST dataset).

(figure: sample handwritten digits)

I used the video and its accompanying notebook as a reference implementation. I also used the notebook to generate the expected results for the tests.

Multi Layer Perceptron

A Multi Layer Perceptron (MLP) is a type of neural network composed of multiple layers of neurons. There are at least 3 layers: one input layer (input values), one output layer (output values), and one or more hidden layers. The output y of a perceptron is a combination of all the outputs x_{i} of the previous layer, each multiplied by a weight w_{i}, plus a bias b.

y = \sum_{i} x_{i} w_{i} + b
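As a quick illustration of this formula (all values here are made up for the example), the weighted sum can be computed directly with numpy:

```python
import numpy as np

# Hypothetical example: 3 outputs x_i from the previous layer
x = np.array([0.5, -1.0, 2.0])   # previous layer outputs x_i
w = np.array([0.1, 0.4, -0.2])   # weights w_i of this perceptron
b = 0.3                          # bias

y = x @ w + b                    # y = sum_i(x_i * w_i) + b
print(y)                         # ≈ -0.45
```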

Each perceptron output is passed through an activation function before being used by the perceptrons of the next layer. This introduces non-linearity into the model, allowing the network to learn and represent complex patterns in the data. The activation functions used are the Rectified Linear Unit (ReLU) for the input and hidden layers and Softmax for the output layer.

$$ReLU(x) = \begin{cases} x &\text{if } x > 0 \\ 0 &\text{otherwise} \end{cases}$$

import numpy as np

def ReLU(X):
    return np.maximum(X, 0)

$$softmax(x_{i}) = \frac{e^{x_{i}}}{\sum_{j=1}^K e^{x_{j}}}$$

def softmax(X):
    exp = np.exp(X - np.max(X))
    return exp / exp.sum(axis=0)
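A quick self-contained check of these two activation functions on a small vector of example values:

```python
import numpy as np

def ReLU(X):
    return np.maximum(X, 0)

def softmax(X):
    exp = np.exp(X - np.max(X))
    return exp / exp.sum(axis=0)

logits = np.array([-1.0, 0.0, 2.0])
print(ReLU(logits))    # [0. 0. 2.] : negative values are clipped to 0
probs = softmax(logits)
print(probs)           # larger logits get larger probabilities
print(probs.sum())     # the softmax outputs sum to 1 (a probability distribution)
```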

Training

Multi-Layer Perceptron training iteratively fine-tunes the model. An iteration is composed of:

  1. Forward propagation: the input data is fed through the network. The output of the last layer gives us the model prediction, which we can compare against the actual value to measure the error.
  2. Backward propagation: starting from the output layer, the network propagates this error backwards. The goal is to evaluate, for each weight, how much it contributed to the final error and how to adjust it to improve the final prediction. This value is called the gradient.
  3. Update each weight with its computed gradient (scaled by the learning rate).
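The three steps above can be sketched in numpy for a single hidden layer. The layer sizes, learning rate, and random data below are arbitrary placeholders, not the project's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 784 input pixels, 10 hidden units, 10 classes
n_in, n_hidden, n_out, lr = 784, 10, 10, 0.1
W1, b1 = rng.normal(0, 0.01, (n_hidden, n_in)), np.zeros((n_hidden, 1))
W2, b2 = rng.normal(0, 0.01, (n_out, n_hidden)), np.zeros((n_out, 1))

X = rng.random((n_in, 32))                        # batch of 32 fake images
Y = np.eye(n_out)[:, rng.integers(0, n_out, 32)]  # one-hot fake labels

# 1. forward propagation
Z1 = W1 @ X + b1
A1 = np.maximum(Z1, 0)                # ReLU
Z2 = W2 @ A1 + b2
exp = np.exp(Z2 - Z2.max(axis=0))
A2 = exp / exp.sum(axis=0)            # softmax -> predicted probabilities

# 2. backward propagation (gradient of cross-entropy loss with softmax)
m = X.shape[1]
dZ2 = A2 - Y                          # error at the output layer
dW2 = dZ2 @ A1.T / m
db2 = dZ2.sum(axis=1, keepdims=True) / m
dZ1 = (W2.T @ dZ2) * (Z1 > 0)         # propagate back through the ReLU
dW1 = dZ1 @ X.T / m
db1 = dZ1.sum(axis=1, keepdims=True) / m

# 3. update each weight with its gradient, scaled by the learning rate
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
```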

Rust implementation

Architecture

    ─ src
        ├─ model            // multilayer perceptron struct
        ├─ train            // model training functions (forward, backward...)  
        ├─ dataset          // dataset struct
        ├─ arguments        // parse and manage script arguments   
        ├─ factory          // model / dataset factories (csv / npy parser )
        └─ main.rs
    ─ tests
        ├─ python_scripts   // to generate expected values 
        └─ tests 

Crates

Rust libraries are called crates. This is a non-exhaustive list of the crates I used:

  • nalgebra : for data structures such as matrices and vectors, and functions to manipulate them
  • clap : to parse command-line arguments (e.g. iteration count, output destination...)
  • egui and eframe : for the demo GUI
  • csv : to parse CSV files
  • ndarray-npy : to read and write npy files (NumPy storage format)

Evolution

  • parallelize operations: matrix multiplication, weight updates, etc.
  • implement convolutional layers (example). A convolutional layer applies filters to the input image to extract features. The filter values are learned during training in the same way as the perceptron weights.
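The filtering a convolutional layer performs can be sketched as a naive "valid" 2D convolution (as in most ML frameworks, the kernel is not flipped, so this is technically a cross-correlation). The image and kernel values are made up for the example:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D convolution: slide the kernel over the image
    and take the elementwise product-sum at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])   # a made-up 2x2 filter
print(conv2d(image, kernel))                   # 3x3 feature map
```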

Run the project

Setup

  1. Install Rust
  2. Extract the dataset and the pretrained model
unzip ./input/train.zip -d ./input/
unzip ./output/example_model_pretrained.zip -d ./output/example_model_pretrained

Run

Build and run the project (model training + demo):

cargo run 

By default, the project:

  1. load the input dataset
  2. shuffle the dataset data
  3. split the dataset into a training set and a test set
  4. train the model on the train set
  5. evaluate the model on the test set
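Steps 2-3 (shuffle, then split) can be sketched in numpy. The 80/20 ratio below is an assumed example, not necessarily the project's default:

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(100).reshape(100, 1)   # 100 fake samples

perm = rng.permutation(len(data))       # 2. shuffle the dataset
data = data[perm]

split = int(len(data) * 0.8)            # 3. split: 80% train / 20% test (assumed ratio)
train_set, test_set = data[:split], data[split:]
print(len(train_set), len(test_set))    # 80 20
```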

Alternative use

The following steps can be enabled independently:

  • load a pretrained model
  • skip the training
  • skip the evaluation

Examples:

Run only the demo with a pretrained model:

cargo run -- --skip-training --test-rate 0 --input_model ./output/example_model_pretrained

Only evaluate a pretrained model:

cargo run -- --skip-training --test-rate 100 --input_model ./output/example_model_pretrained

Documentation:

cargo run -- --help

Test

Run the test suite:

cargo test