From: Shapley-based interpretation of deep learning models for wildfire spread rate prediction
Sr. No. | Hyperparameter | Description
---|---|---
1 | Number of layers (encoder) | Number of stacked encoder layers in the transformer |
2 | Model dimension (d_model) | Dimension of the model’s input and output embeddings |
3 | Number of heads | Number of attention heads for multi-head attention |
4 | Feed-forward dimension | Dimension of the feed-forward network’s inner layer |
5 | Activation function | Activation function used in the feed-forward network (e.g., ReLU, GELU) |
6 | Dropout rate | Dropout rate applied to various parts of the model (e.g., attention, FFN) |
7 | Attention mechanism | Type of attention mechanism adapted for non-sequential data (e.g., scaled dot-product attention) |
8 | Learning rate | Learning rate used during optimization |
9 | Learning rate scheduler | Scheduler used to adjust the learning rate during training (e.g., cosine annealing)
10 | Weight initialization | Method to initialize weights (e.g., Xavier, He) |
11 | Batch size | Number of data points processed in a single batch |
12 | Number of epochs | Total number of times the model sees the entire dataset during training |
13 | Optimization algorithm | Optimization algorithm used (e.g., Adam, SGD) |
14 | Warm-up steps | Number of initial steps over which the learning rate is gradually increased (used by some schedulers)
15 | Regularization (L2 penalty) | Weight decay or L2 penalty, if applied |
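
The sketch below shows how the hyperparameters in the table map onto a transformer encoder configuration. It is a minimal illustration assuming a PyTorch implementation; the specific values (d_model = 128, 500 warm-up steps, etc.) are placeholder assumptions, not the tuned values reported in the paper.

```python
# Minimal sketch of the hyperparameters listed above, assuming PyTorch.
# All numeric values are illustrative placeholders, not the paper's tuned settings.
import torch
import torch.nn as nn

# (1)-(7) Architecture hyperparameters
num_layers = 4        # number of stacked encoder layers
d_model = 128         # input/output embedding dimension
num_heads = 8         # attention heads (scaled dot-product attention)
ff_dim = 512          # feed-forward inner dimension
activation = "gelu"   # feed-forward activation (ReLU or GELU)
dropout = 0.1         # dropout applied to attention and FFN sub-layers

encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model,
    nhead=num_heads,
    dim_feedforward=ff_dim,
    dropout=dropout,
    activation=activation,
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

# (10) Weight initialization: Xavier-uniform for all weight matrices
for p in encoder.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

# (8), (13), (15) Optimizer: Adam with an L2 penalty via weight decay
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4, weight_decay=1e-5)

# (9), (14) Learning-rate schedule: linear warm-up followed by cosine annealing
warmup_steps, total_steps = 500, 10_000
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=warmup_steps
)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps - warmup_steps
)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_steps]
)

# (11), (12) Batch size and number of epochs are set in the training loop, e.g.
# loader = DataLoader(dataset, batch_size=64, shuffle=True)
# for epoch in range(num_epochs): ...  # one pass over the dataset per epoch
```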