Table 4 List of hyperparameters used for transformer model implementation

From: Shapley-based interpretation of deep learning models for wildfire spread rate prediction

| Sr # | Hyperparameters | Description |
|------|-----------------|-------------|
| 1 | Number of layers (encoder) | Number of stacked encoder layers in the transformer |
| 2 | Model dimension (d_model) | Dimension of the model’s input and output embeddings |
| 3 | Number of heads | Number of attention heads for multi-head attention |
| 4 | Feed-forward dimension | Dimension of the feed-forward network’s inner layer |
| 5 | Activation function | Activation function used in the feed-forward network (e.g., ReLU, GELU) |
| 6 | Dropout rate | Dropout rate applied to various parts of the model (e.g., attention, FFN) |
| 7 | Attention mechanism | Type of attention mechanism adapted for non-sequential data (e.g., scaled dot-product attention) |
| 8 | Learning rate | Learning rate used during optimization |
| 9 | Learning rate scheduler | Scheduler used to change the learning rate during training (e.g., cosine annealing) |
| 10 | Weight initialization | Method used to initialize weights (e.g., Xavier, He) |
| 11 | Batch size | Number of data points processed in a single batch |
| 12 | Number of epochs | Total number of passes over the entire training dataset |
| 13 | Optimization algorithm | Optimization algorithm used (e.g., Adam, SGD) |
| 14 | Warm-up steps | Number of warm-up steps for the learning rate in some schedulers |
| 15 | Regularization (L2 penalty) | Weight decay or L2 penalty, if applied |
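
The sketch below illustrates how the hyperparameters in Table 4 could be wired together in PyTorch for a transformer encoder that attends over non-sequential (tabular) input features. It is not the authors' released implementation: all numeric values (d_model, heads, layers, learning rate, batch size, weight decay, T_max, number of features) are illustrative placeholders rather than values reported in the paper, and the feature-embedding and pooling choices are assumptions.

```python
# Minimal sketch, assuming a PyTorch TransformerEncoder over tabular features.
# Every numeric value below is a placeholder, not a value from the paper.
import torch
import torch.nn as nn

d_model = 64      # model dimension (row 2) -- placeholder
n_heads = 4       # attention heads (row 3) -- placeholder
ff_dim = 256      # feed-forward inner dimension (row 4) -- placeholder
n_layers = 2      # stacked encoder layers (row 1) -- placeholder
dropout = 0.1     # dropout rate (row 6) -- placeholder
n_features = 10   # number of input variables -- placeholder

encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model,
    nhead=n_heads,            # multi-head scaled dot-product attention (row 7)
    dim_feedforward=ff_dim,
    dropout=dropout,
    activation="gelu",        # activation function (row 5)
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

# Assumed input handling: project each scalar feature to d_model so the feature
# axis plays the role of the "sequence" axis, then regress a single spread rate.
embed = nn.Linear(1, d_model)
head = nn.Linear(d_model, 1)

def init_weights(m):
    # Xavier initialization for linear layers (row 10)
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

for module in (embed, encoder, head):
    module.apply(init_weights)

# Adam with weight decay (rows 8, 13, 15) and a cosine-annealing learning-rate
# schedule (row 9); warm-up steps (row 14) could be added via a custom scheduler.
params = list(embed.parameters()) + list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

# Forward pass on a dummy batch: (batch, n_features) -> predicted spread rate per sample.
x = torch.randn(32, n_features)            # batch size (row 11) -- placeholder
z = encoder(embed(x.unsqueeze(-1)))        # attention over the feature axis
y_hat = head(z.mean(dim=1)).squeeze(-1)    # pooled regression output
```

In this layout, each input variable becomes one "token", so the attention weights describe interactions between fire-driver variables rather than between time steps; the Shapley analysis in the paper would then be applied on top of such a trained model.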