Table 4 List of hyperparameters used for transformer model implementation

From: Shapley-based interpretation of deep learning models for wildfire spread rate prediction

| Sr # | Hyperparameters | Description |
|------|-----------------|-------------|
| 1 | Number of layers (encoder) | Number of stacked encoder layers in the transformer |
| 2 | Model dimension (d_model) | Dimension of the model’s input and output embeddings |
| 3 | Number of heads | Number of attention heads for multi-head attention |
| 4 | Feed-forward dimension | Dimension of the feed-forward network’s inner layer |
| 5 | Activation function | Activation function used in the feed-forward network (e.g., ReLU, GELU) |
| 6 | Dropout rate | Dropout rate applied to various parts of the model (e.g., attention, FFN) |
| 7 | Attention mechanism | Type of attention mechanism adapted for non-sequential data (e.g., scaled dot-product attention) |
| 8 | Learning rate | Learning rate used during optimization |
| 9 | Learning rate scheduler | Scheduler used to change the learning rate during training (e.g., cosine annealing) |
| 10 | Weight initialization | Method used to initialize weights (e.g., Xavier, He) |
| 11 | Batch size | Number of data points processed in a single batch |
| 12 | Number of epochs | Total number of passes over the entire training dataset |
| 13 | Optimization algorithm | Optimization algorithm used (e.g., Adam, SGD) |
| 14 | Warm-up steps | Number of warm-up steps for the learning rate in some schedulers |
| 15 | Regularization (L2 penalty) | Weight decay or L2 penalty, if applied |
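
The sketch below illustrates how the hyperparameters in Table 4 could be wired together in PyTorch for a transformer encoder that attends over non-sequential (tabular) input features. It is not the authors' released implementation: all numeric values (d_model, heads, layers, learning rate, batch size, weight decay, T_max, number of features) are illustrative placeholders rather than values reported in the paper, and the feature-embedding and pooling choices are assumptions.

```python
# Minimal sketch, assuming a PyTorch TransformerEncoder over tabular features.
# Every numeric value below is a placeholder, not a value from the paper.
import torch
import torch.nn as nn

d_model = 64      # model dimension (row 2) -- placeholder
n_heads = 4       # attention heads (row 3) -- placeholder
ff_dim = 256      # feed-forward inner dimension (row 4) -- placeholder
n_layers = 2      # stacked encoder layers (row 1) -- placeholder
dropout = 0.1     # dropout rate (row 6) -- placeholder
n_features = 10   # number of input variables -- placeholder

encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model,
    nhead=n_heads,            # multi-head scaled dot-product attention (row 7)
    dim_feedforward=ff_dim,
    dropout=dropout,
    activation="gelu",        # activation function (row 5)
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

# Assumed input handling: project each scalar feature to d_model so the feature
# axis plays the role of the "sequence" axis, then regress a single spread rate.
embed = nn.Linear(1, d_model)
head = nn.Linear(d_model, 1)

def init_weights(m):
    # Xavier initialization for linear layers (row 10)
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

for module in (embed, encoder, head):
    module.apply(init_weights)

# Adam with weight decay (rows 8, 13, 15) and a cosine-annealing learning-rate
# schedule (row 9); warm-up steps (row 14) could be added via a custom scheduler.
params = list(embed.parameters()) + list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

# Forward pass on a dummy batch: (batch, n_features) -> predicted spread rate per sample.
x = torch.randn(32, n_features)            # batch size (row 11) -- placeholder
z = encoder(embed(x.unsqueeze(-1)))        # attention over the feature axis
y_hat = head(z.mean(dim=1)).squeeze(-1)    # pooled regression output
```

In this layout, each input variable becomes one "token", so the attention weights describe interactions between fire-driver variables rather than between time steps; the Shapley analysis in the paper would then be applied on top of such a trained model.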