  • Original research
  • Open access
  • Published:

Ultra-lightweight convolution-transformer network for early fire smoke detection

Abstract

Background

Forests are invaluable resources, and fire is a natural process that is considered an integral part of the forest ecosystem. Although fire offers several ecological benefits, its frequent occurrence in different parts of the world has raised concerns in recent years. Covering millions of hectares of forest land, these fire incidents have caused the loss of human lives, wildlife habitats, and civil infrastructure, as well as severe damage to the environment. Around 90% of wildland fires have been caused by humans, intentionally or unintentionally. Early detection of fire close to human settlements and wildlife sanctuaries can help mitigate fire hazards. Numerous artificial intelligence-based solutions proposed in the past decade prioritize the detection of fire smoke, as smoke can be captured through remote sensing and provides an early sign of wildland fire. However, most of these methods are either computationally intensive or suffer from a high false alarm rate. In this paper, a lightweight deep neural network model is proposed for fire smoke detection in images captured by satellites or other remote sensing sources.

Results

With only 0.6 million parameters and 0.4 billion floating point operations (0.4 GFLOPs), the hybrid network of convolutional and vision transformer blocks efficiently detects smoke in normal and foggy environmental conditions. It outperforms seven state-of-the-art methods on four datasets, including a self-collected dataset of Moderate Resolution Imaging Spectroradiometer (MODIS) satellite imagery. The model achieves an accuracy of more than 99% on three datasets and 93.90% on the fourth. A t-distributed stochastic neighbor embedding (t-SNE) of the features extracted by the proposed model demonstrates its superior feature learning capability. Notably, the model efficiently detects even a small occurrence of smoke covering just 2% of the satellite image area.

Conclusions

With its low memory and computational demands, the proposed model performs very well, making it suitable for deployment on resource-constrained devices for forest surveillance and early fire smoke detection.

Summary

Background

Forests are invaluable resources, and fire is a natural process considered an integral part of forest ecosystems. Although fire offers many ecological benefits, its frequent occurrence in many parts of the world has raised concern in the recent past. Covering millions of hectares of forest, these fire incidents have resulted in the loss of human lives, wild areas, and infrastructure, and in severe damage to the environment. Around 90% of vegetation fires have been caused by humans, either intentionally or accidentally. Early detection of fires near human settlements and wildlife refuges can mitigate the effects of fire. Numerous artificial intelligence-based solutions proposed in the past decade prioritize the detection of vegetation fire smoke, which can be captured by remote sensors and provides an early signal of a fire event. However, most of these detection methods are computationally intensive and can suffer from high false alarm rates. In this work, a lightweight deep neural network model is proposed for smoke detection in images captured by satellites or other remote sensing sources.

Results

With only 0.6 million parameters and 0.4 billion floating point operations, the hybrid network of convolutional and vision transformer blocks efficiently detects smoke under normal and foggy environmental conditions. It outperforms seven state-of-the-art methods on four datasets, including a self-collected dataset derived from Moderate Resolution Imaging Spectroradiometer satellite imagery. The model achieves an accuracy of more than 99% on three datasets and 93.90% on the fourth. A t-distributed stochastic neighbor embedding of the features extracted by the proposed model demonstrates its superior learning capabilities. Notably, even a small presence of smoke covering only 2% of the satellite image area is efficiently detected by the model.

Conclusions

With very low memory and computational demands, the proposed model performs extremely well, making it well suited for deployment on resource-constrained devices for forest surveillance and early smoke detection.

Introduction

Forests are essential natural resources that cover about 31% of the Earth's land area (Keenan et al. 2015). Wildland fire (wildfire) often destroys forest landscapes, which is a particular concern for human settlements near forest regions (Gajendiran et al. 2023). Yet wildfires are integral to the forest ecosystem and contribute to soil enrichment, which supports the growth of flora and fauna (Garcês and Pires 2023). Interestingly, some less dominant plant species find regions affected by wildfire more suitable for their growth (Parkins et al. 2018). Wildfires also change the landscape and can reduce the availability of fuel for future combustion (Balch et al. 2008). Vegetation cover and soil moisture levels change over time, which can increase wildfire risk.

Recent wildfire incidents have caused severe damage to vast forest areas and wildlife habitats and have forced the displacement of communities (Haque et al. 2021). The 2019–2020 Australian bushfires and the fires that continued afterwards destroyed millions of hectares of forest land and thousands of civil properties and claimed human and animal lives (Filkov et al. 2020). In 2023, approximately 10.6 million hectares of Brazilian rainforest were also destroyed by fire, an increase of 35.4% compared to previous years (RF 2023). Indian forests witnessed 52,785 wildfire incidents between November 2020 and June 2021, according to a report by the Forest Survey of India (FSI 2021). The 2023 wildfires in Canada caused the loss of 7.8 million hectares of tree cover and released 3 billion tons of carbon dioxide, four times the global carbon emissions of the aviation sector in 2022 (MacCarthy et al. 2024). Early detection and suppression of such fire events are therefore necessary in forest regions close to rural and urban areas. A major cause of the increase in wildfires is climate change, which affects forest ecosystems (Giorgis et al. 2021). However, 90% of wildfire incidents in the last decade have been caused by humans, intentionally or unintentionally (Pacificbio 2024). A recent study by Trancoso et al. (2022) suggests that changes in precipitation, temperature, relative humidity, and related variables caused by the conversion of tropical forests to agricultural land have increased wildfire susceptibility in Borneo's ecosystem in Southeast Asia. A hot and dry climate also dries out vegetation in forest regions, which increases the probability of wildfires (Charizanos and Demirhan 2023).

Artificial intelligence (AI)-based computer vision techniques play an important role in the detection of wildfires. Since fire smoke can be seen from a distance, its early detection can help control fire (Chaturvedi et al. 2022b). Accordingly, numerous methods have been proposed in recent years for fire smoke detection. These methods classify images captured by satellites or other remote cameras into two categories: clear and smoke present (Liu et al. 2019; Yin et al. 2017; Filonenko et al. 2017). Most current methods are based on deep learning, which employs different types of artificial neural networks to develop smoke detection models (Almeida et al. 2023). Among these networks, convolutional neural networks (CNNs), which dominate AI tasks in computer vision, are extensively applied for smoke detection (Khan et al. 2019; Muhammad et al. 2019; Khan et al. 2021; Almeida et al. 2022; Ahmad et al. 2023; Almeida et al. 2022; Sathishkumar et al. 2023). CNNs automatically learn smoke patterns (features) such as color, density, and irregular shape, which otherwise make smoke detection a difficult task for traditional machine learning methods (Kim and Muminov 2023). Apart from attaining high accuracy in smoke detection, CNN-based models have proven efficient at classifying smoke in normal and challenging environments, such as in the presence of fog, clouds, and similar-looking backgrounds (Khan et al. 2019; Muhammad et al. 2019; Khan et al. 2021; Almeida et al. 2022). However, most state-of-the-art AI models achieve the expected accuracy at the cost of high computation and memory requirements (Khan et al. 2019; Ba et al. 2019; Liu et al. 2019). Furthermore, such heavyweight models cannot be deployed on resource-constrained devices in Internet of Things (IoT)-based forest surveillance systems (Jadon et al. 2019).

IoT-based systems are in high demand in the cyber-physical world and can play a useful role in preserving forests and related natural resources (Almeida et al. 2023). IoT-based forest monitoring systems use sensors, drones, closed-circuit television (CCTV) cameras, and smartphones for data collection and communication (Myagmar-Ochir and Kim 2023). The collected data are stored and processed in real time or near real time on cloud-based platforms equipped with smoke detection models (Kaur et al. 2020; Almeida et al. 2022; Wang et al. 2023; Almeida et al. 2023; Giannakidou et al. 2024). IoT systems running on cloud platforms often integrate battery-operated devices with limited computing power and memory for storing and processing data (Alam 2021). In view of this, several lightweight models that can run on low-power computing devices have been suggested in recent years (Muhammad et al. 2019; Almeida et al. 2022; Chaturvedi et al. 2022a). However, lightweight smoke detection models may suffer from high false alarm rates. False alarms may cause unnecessary mobilization of people (Chaturvedi et al. 2022a), and repeated false alarms may also make fire managers and the general public unresponsive, with potentially disastrous consequences (Barnes et al. 2007).

The generalization capability of deep learning-based AI models is one of the most important research topics. AI models are expected to perform equally well on unseen data, especially when the data come from a different distribution. In the context of smoke detection, the input data may come from another satellite or a different remote sensing device. Current AI models are good at memorizing the patterns in the data on which they are trained, but they often fail to detect patterns not presented to them during training. Most existing smoke detection models have not been tested on images from a different data distribution, so their suitability for real-world systems is questionable. These limitations impede the deployment of fire smoke AI models for forest surveillance and early smoke detection in real-world scenarios (El-Madafri et al. 2023; Jadon et al. 2019; Hu et al. 2024; Ahmad et al. 2023).

Recently, vision transformer (ViT) networks have emerged as powerful neural networks for vision tasks such as classification, segmentation, and object detection (Dosovitskiy et al. 2020). The self-attention mechanism of ViT is highly effective at extracting global features from its input. Since CNNs are known for learning local features, combining the strengths of the two networks can help address some of the existing problems, provided that both networks learn image features simultaneously and complement each other in obtaining refined features for identifying smoke in images. Although some studies have explored ViT-based networks for fire smoke detection (Khan et al. 2023; Cheng et al. 2023), false alarm rates, generalizability, and early detection of fire smoke remain challenging. To address some of these challenges, this paper proposes a lightweight, dual-path deep learning model for detecting fire smoke in images. Before the model is presented, the next section reviews existing work on fire smoke detection.

Related works

There has been significant growth in the applications of AI in fire ecology. Effects of fire on ecological systems have been studied using AI approaches to assess the fire risks and the severity of fire propagation and to recommend fire management strategies. In a study by Mishra et al. (2023), trends and growth patterns of wildfire incidents in Nepal have been investigated based on the available data of the last two decades. Deep neural networks and maximum entropy are used by the authors to analyze the fire vulnerability risks. Renard et al. (2012) have also used the maximum entropy algorithm to analyze the spatial distribution of fire in the Western Ghats of India. Furthermore, Kim et al. (2019) have predicted wildfire probability using the random forest algorithm and maximum entropy algorithm while considering both environmental and socioeconomic factors.

Several studies have targeted fire-prone regions in Nepal, South America, California, and India to understand the factors influencing wildfire and related issues and to estimate fire severity (Miller et al. 2009; Giorgis et al. 2021; Keeley and Syphard 2021; Mishra et al. 2023; Wasserman and Mueller 2023; Jodhani et al. 2024). Miller et al. (2009) studied the accuracy of fire severity prediction using different characteristics of the geographical regions, including vegetation. Research on socioeconomic vulnerabilities is another important area for improving preparedness against the risks of wildfire hazards (Prior and Eriksen 2013; Chuvieco et al. 2014; Palaiologou et al. 2019; Paveglio et al. 2016), and here, too, AI approaches play a useful role (Mishra et al. 2023; Saha et al. 2023). Kim et al. (2019) used a random forest classifier to categorize wildfire risk into four levels, from very high to very low, considering both environmental and socioeconomic factors. Jodhani et al. (2024) presented a study predicting wildfire susceptibility in the Indian state of Gujarat. They studied landscape changes caused by wildfire using satellite image data and then analyzed important environmental variables such as slope orientation, elevation, drainage density, normalized difference vegetation index, and temperature to predict wildfire susceptibility with random forest classification. Beyond these areas, studies on wildfire detection for mitigating fire hazards have grown tremendously, and computer vision-based AI approaches have been extensively studied to improve the accuracy of fire detection and reduce false alarms. Since fire smoke in deep forest areas can appear in satellite images or remote camera footage earlier than flames, there has been a trend toward AI-based solutions for automatic fire smoke detection and alert generation within an IoT framework (Almeida et al. 2022, 2023; Giannakidou et al. 2024).

Initial studies on recognizing fire smoke in outdoor environments with AI models focused on data collection and on building models that identify smoke in images or videos using well-known CNN architectures. These models were developed to detect smoke under normal weather conditions (Yin et al. 2017; Namozov and Im Cho 2018; Yin and Wei 2019; Yin et al. 2020). Later, challenging scenarios such as the presence of fog, haze, and clouds were also considered (Khan et al. 2019; Muhammad et al. 2019; Almeida et al. 2022). With the rapid growth of IoT-based applications in recent years, attention has shifted toward lightweight smoke detection models that support automatic monitoring and fire smoke detection.

In one of the early works, Tao et al. (2016) proposed a neural network architecture similar to AlexNet (Krizhevsky et al. 2017) for detecting smoke in normal environmental conditions. The model performed well, with an accuracy of 99.56% and a false alarm rate of 0.44%, but with a large number of parameters (46.7 million). Several important contributions in the field were based on fine-tuning deep learning models pretrained for general image classification, using standard networks such as VGG16 (Simonyan and Zisserman 2014), EfficientNet (Tan and Le 2019), and MobileNetV2 (Sandler et al. 2018). Fine-tuning helped these models converge quickly and achieve high accuracy in smoke detection. Khan et al. (2019) developed a fine-tuned VGG16 (Simonyan and Zisserman 2014) network that detected smoke efficiently in normal and foggy environments, but at the cost of 134 million parameters and a correspondingly high computational requirement. Recently, Sathishkumar et al. (2023) compared the performance of smoke detection models built on pretrained classification networks using transfer learning. They compared three benchmark networks, VGG16, InceptionV3 (Szegedy et al. 2016), and Xception (Chollet 2017), and concluded that the Xception network with learning without forgetting gave the best results, with 98.72% accuracy.

Muhammad et al. (2019) developed a lightweight network based on a fine-tuned MobileNetV2 to detect fire smoke in images captured under normal and challenging environmental conditions. Their model achieved a classification accuracy of 98.17% and a false alarm rate of 1.18%, the latter being higher than that of earlier models. Khan et al. (2021) also developed a lightweight smoke detection model for foggy environments, using a fine-tuned EfficientNet for classification and DeepLabv3+ (Chen et al. 2018) for segmentation. They tested their segmentation model on images and videos downloaded from Google Images and YouTube, where its predictions were visually convincing. However, the model struggled to detect smoke in foggy conditions (Khan et al. 2021).

Almeida et al. (2022) proposed a lightweight model with a shallow CNN for wildfire smoke detection. Their model identifies smoke in images taken by ordinary cameras and unmanned aerial vehicles. In daylight scenes, it achieves 98.97% detection accuracy, but it requires artificial light at night (Almeida et al. 2022). Another recent CNN-based model by Ahmad et al. (2023) detects smoke with 98.42% accuracy under normal environmental conditions; however, the authors did not test their model in challenging weather conditions. Some ViT-based models have also emerged in recent years. Although these models achieve high accuracy, important challenges such as generalizability and early smoke detection have not been explored in these works (Khan et al. 2023; Cheng et al. 2023; Chen et al. 2022).

Detecting wildfire smoke in remote sensing data such as satellite images poses several challenges. Objects of similar appearance, such as clouds, haze, dense fog, and other background objects, make automatic extraction of smoke features difficult (Ba et al. 2019; Chen et al. 2021). Because labeled data are scarce, it remains a challenge to develop efficient AI models that address three important issues: high false alarm rates, high memory and computational needs, and poor generalizability of prediction performance. Data collection and preprocessing are therefore also an important component of AI research.

In this paper, a lightweight network architecture is proposed that addresses some of the aforementioned challenges. With only 0.6 million parameters and 0.4 billion floating point operations (0.4 GFLOPs), the model performed better than seven recently introduced smoke detection models. The proposed model also detects small smoke plumes covering only 2% of an image's area. Additionally, in a cross-dataset validation of generalizability, the model achieves better results than the competing models.

Material and methods

In this section, the details of the network architecture of the proposed wildfire smoke detection model are presented. For the development and evaluation of the deep learning model, four datasets are used that contain images of outdoor environments captured by satellite cameras or remotely installed cameras on other devices. We begin with a description of the datasets used in the study.

Datasets

Of the four datasets used in the present study, IIITDMJ_Smoke, USTC_SmokeRS, Khan et al. (2019), and He et al. (2021), the last two contain images from CCTV footage, while IIITDMJ_Smoke and USTC_SmokeRS contain satellite images. Sample images from each dataset are displayed in Fig. 1. For model building, each dataset was split into 80% training, 10% validation, and 10% testing, a common strategy for training machine learning and deep learning models. The training set exposes the model to enough data to learn the patterns of the different sample categories (Bishop 2006). The validation set is used to monitor the model's learning behavior and evaluate its performance after each training epoch, which helps avoid overfitting. The test set is reserved for evaluating the model's final performance after training is complete and is never shown to the model during training.
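A minimal sketch of such a split follows. It assumes the split is stratified per class (the text gives only the 80/10/10 ratio) and uses a hypothetical `split_80_10_10` helper over (path, label) pairs with made-up file names.

```python
import random
from collections import defaultdict


def split_80_10_10(samples, seed=42):
    """Split (path, label) pairs into train/validation/test sets, 80/10/10 per class.

    Per-class stratification is an assumption made for this illustration.
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        n_train = len(items) * 8 // 10
        n_val = len(items) // 10
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])  # remainder (about 10%) is held out
    return train, val, test


# Hypothetical file names: 100 images for each of the four classes.
classes = ["clear", "foggy", "smokey", "foggy and smokey"]
data = [(f"{c}_{i}.png", c) for c in classes for i in range(100)]
train, val, test = split_80_10_10(data)
print(len(train), len(val), len(test))  # 320 40 40
```

Splitting within each class keeps the class proportions of the three subsets equal, which matters for imbalanced datasets such as IIITDMJ_Smoke.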

Fig. 1

Sample images of datasets used in the work. a IIITDMJ_Smoke, b USTC_SmokeRS, c Khan et al. (2019), and d He et al. (2021) datasets

IIITDMJ_Smoke dataset

Most available datasets for developing smoke detection models were collected with CCTV or drone cameras, and only a few publicly available labeled datasets contain images captured by satellite cameras; these datasets are relatively small. Because satellite images cover large landscapes, they can be extremely beneficial for developing models for forest monitoring and early wildfire detection. For these reasons, the IIITDMJ_Smoke dataset of satellite images was collected from MODIS imagery, both to build the proposed model and to support the research community working in this field.

The IIITDMJ_Smoke dataset contains 23,644 images with and without wildfire smoke. To prepare the dataset, 11,604 images were collected from MODIS satellite imagery (NASA 2021), namely 4784 images containing smoke and 6820 clear images (without smoke), along with 564 hazy images without fog. The data were augmented by adding synthetic fog to the images, producing a final dataset with four categories: (a) images without smoke (“clear”), (b) images with fog but without smoke (“foggy”), (c) images containing smoke (“smokey”), and (d) images with both smoke and fog (“foggy and smokey”). Synthetic fog has also been used in the existing literature (Khan et al. 2019; He et al. 2021) to create “foggy” and “foggy and smokey” images. Synthetic fog was not added to 992 clear images for which fog made no visible difference; these images mostly show snow, clouds, and water bodies.
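The text does not state how the synthetic fog was generated. As an illustration only, the sketch below uses one common choice, the atmospheric scattering model \(I = J\,t + A\,(1 - t)\) with a uniform transmission \(t = e^{-\beta d}\). The function name `add_synthetic_fog` and the values of the airlight `A`, scattering coefficient `beta`, and depth `d` are hypothetical, not the settings used for IIITDMJ_Smoke.

```python
import math


def add_synthetic_fog(image, airlight=0.9, beta=1.0, depth=0.6):
    """Apply homogeneous synthetic fog with the atmospheric scattering model.

    image: list of rows, each row a list of (r, g, b) floats in [0, 1].
    Every channel becomes  c * t + airlight * (1 - t),  t = exp(-beta * depth).
    Parameter values are illustrative assumptions, not the dataset's settings.
    """
    t = math.exp(-beta * depth)  # one transmission value: uniform fog density
    veil = airlight * (1.0 - t)  # additive airlight contribution
    return [[tuple(c * t + veil for c in pixel) for pixel in row] for row in image]


# A 2 x 2 toy image: vegetation, bright cloud, vegetation, water.
img = [[(0.1, 0.3, 0.1), (0.9, 0.9, 0.9)],
       [(0.1, 0.3, 0.1), (0.2, 0.2, 0.5)]]
foggy = add_synthetic_fog(img)
print([round(c, 3) for c in foggy[0][0]])
```

Every pixel moves toward the airlight value and differences between pixels shrink by the factor \(t\), which is why fog makes smoke harder to separate from clouds and background.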

The collected images have resolutions ranging from \(210\times 211\) to \(1094\times 916\) pixels, and each pixel covers an area of 1 square kilometer. The images come from diverse geographical locations, including Canada, Australia, Russia, and the United States of America (USA). Images from the USA cover locations such as Washington, Idaho, San Francisco, California, Helena, Georgia, South Carolina, Florida, and Alabama. Vegetation in California consists largely of grasslands and woodlands. From Australia, the states of Victoria, New South Wales, and Queensland are covered; the coastal region of eastern Australia is prone to forest fires, especially in its eucalyptus forests. The collected images show land areas, water bodies, or both, and the land areas include both barren land and vegetation. To produce a dataset with a wide range of characteristics and challenging scenarios, images containing both clouds and smoke were also collected.

The IIITDMJ_Smoke dataset focuses on twenty-first-century wildfires. Data were collected for major wildfire events such as the Russian wildfires of 2003 and 2015, the Australian bushfires of 2002–2003, 2006–2007, 2011–2012, and 2019–2020, the Canadian wildfires of 2014 and 2017, the 2020 California fires, and the 2010 Bolivian fire. Because vast areas of the Amazon rainforest burn frequently, wildfires that occurred in Brazil, Bolivia, Paraguay, and Peru in 2019 are also covered. MODIS imagery was chosen for its spatial properties, which make it well suited to a variety of Earth observation applications.

All images in the datasets are resized to 224 \(\times\) 224 and preprocessed with the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm to enhance image quality. The dataset is available at https://github.com/Image-and-Vision-Engineering-Group/IIITDMJ_Smoke-Dataset.
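The source names CLAHE but gives no parameters. The stdlib-only sketch below shows the step that distinguishes CLAHE from plain histogram equalization: clipping the histogram at a limit and redistributing the excess before building the intensity mapping. It processes a single region only; full CLAHE applies this per tile and interpolates bilinearly between neighboring tile mappings. The helper name `clip_limited_equalize` and the default `clip_limit` are assumptions; in practice a library implementation such as OpenCV's `cv2.createCLAHE` would normally be used.

```python
def clip_limited_equalize(pixels, clip_limit=2.0, levels=256):
    """Contrast-limited histogram equalization of one region (one CLAHE tile).

    pixels: flat list of ints in [0, levels - 1].
    clip_limit: maximum bin height as a multiple of the mean bin count (assumed convention).
    """
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip every bin at the limit and collect the excess counts.
    limit = max(1, int(clip_limit * n / levels))
    excess = sum(max(0, h - limit) for h in hist)
    hist = [min(h, limit) for h in hist]

    # Spread the excess evenly over all bins (leftover counts go to the lowest bins).
    bonus, leftover = divmod(excess, levels)
    hist = [h + bonus + (1 if i < leftover else 0) for i, h in enumerate(hist)]

    # Build the cumulative distribution and map it to the output intensity range.
    lut, running = [], 0
    for h in hist:
        running += h
        lut.append(round((levels - 1) * running / n))
    return [lut[p] for p in pixels]


# A low-contrast region: four adjacent grey levels, 50 pixels each.
region = [100] * 50 + [101] * 50 + [102] * 50 + [103] * 50
for clip in (2.0, 40.0):
    out = clip_limited_equalize(region, clip_limit=clip)
    print(clip, min(out), max(out))
```

A lower `clip_limit` caps how much any single intensity bin can stretch the mapping, which limits noise amplification in flat regions such as open water or uniform haze.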

USTC_SmokeRS (Ba et al. 2019) dataset

The dataset described by Ba et al. (2019) consists of 6225 images of resolution \(256\times 256\) captured by the MODIS satellite. The images fall into six categories: (a) smoke, (b) dust, (c) clouds, (d) haze, (e) seaside, and (f) land. For the present work, the dataset was adapted to the four categories described earlier. The original 1016 “smokey” images were used, and synthetic fog was added to copies of them to create an equal number of “foggy and smokey” images. For the “clear” category, 1025 images were randomly selected from the dust, cloud, seaside, and land categories of the original dataset. For the “foggy” category, the 1002 images of the haze category were used. To increase the number of images, six data augmentation techniques were applied: rotation, zoom, vertical flip, horizontal flip, height shift, and width shift.
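The relabeling described above can be expressed as a small mapping step. In this sketch, the helper `remap_ustc`, the lowercase category strings, and the `needs_fog` flag are hypothetical; the fog itself would be applied in a separate step.

```python
import random

# Original USTC_SmokeRS categories (lowercase strings are an assumption) that
# are pooled and sampled to form the "clear" class.
CLEAR_SOURCES = {"dust", "cloud", "seaside", "land"}


def remap_ustc(records, n_clear=1025, seed=0):
    """Map (path, original_category) records to the four-class scheme.

    Returns (path, new_label, needs_fog) triples; needs_fog marks copies of
    smoke images that still need synthetic fog to become "foggy and smokey".
    """
    rng = random.Random(seed)
    remapped, clear_pool = [], []
    for path, category in records:
        if category == "smoke":
            remapped.append((path, "smokey", False))
            remapped.append((path, "foggy and smokey", True))
        elif category == "haze":
            remapped.append((path, "foggy", False))  # haze stands in for "foggy"
        elif category in CLEAR_SOURCES:
            clear_pool.append(path)
        # any other category is ignored
    for path in rng.sample(clear_pool, min(n_clear, len(clear_pool))):
        remapped.append((path, "clear", False))
    return remapped


demo = [("a.png", "smoke"), ("b.png", "haze"), ("c.png", "land"), ("d.png", "cloud")]
for row in remap_ustc(demo, n_clear=1):
    print(row)
```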

The two satellite image datasets, IIITDMJ_Smoke and USTC_SmokeRS, each have distinct properties. With 23,644 images in total, IIITDMJ_Smoke is larger than USTC_SmokeRS, which makes it more suitable for training such models. The IIITDMJ_Smoke dataset was created specifically to detect smoke in normal and challenging environmental conditions such as fog, heavy cloud cover, or haze, and it includes other challenging scenarios such as small smoke instances, water bodies, and varied landscapes. The USTC_SmokeRS dataset used in the present work is a subset of the original USTC_SmokeRS dataset, whose images are categorized by land cover type. It is also challenging because of the presence of clouds and other smoke-like objects. This dataset is relatively small but more balanced than IIITDMJ_Smoke.

Khan et al. (2019) dataset

The dataset consists of 72,012 outdoor images extracted from videos captured with CCTV cameras, with resolutions ranging from \(292\times 240\) to \(704\times 576\). The images are classified into the same four categories as the IIITDMJ_Smoke dataset.

He et al. (2021) dataset

The images of this dataset were also captured with CCTV cameras in outdoor environments. The dataset consists of 33,710 images with resolutions ranging from \(241\times 193\) to \(1280\times 720\), captured under normal and foggy conditions and categorized into the four categories mentioned above. The image backgrounds contain challenging objects such as clouds, water vapor, and other smoke look-alikes. The original training and test images were merged and then split in an 80:10:10 ratio into training, validation, and test sets.

The Khan et al. (2019) and He et al. (2021) datasets are large-scale, and their characteristics differ from those of the satellite image datasets. Their images contain background objects such as houses, gardens, mountains, parking lots, and traffic. In essence, there are two types of datasets: one contains satellite images, and the other contains images captured by CCTV, smartphone, or handheld cameras. Table 1 presents the number of images in each category of the four datasets.

Table 1 Summary of datasets

Training and evaluation of the proposed model, as well as the comparative analysis, were performed on an Nvidia DGX A100 machine with four A100 GPUs, each with 40 GB of memory. The system runs Ubuntu 18.04 LTS and has 512 GB of RAM and an AMD 7742 processor operating at 2.25 to 3.4 GHz. Both the proposed and competing models were implemented in the Keras framework with NVIDIA CUDA 11.5 and the cuDNN library 8.3.

Model architecture

The architecture of the proposed AI model for early smoke detection is shown in Fig. 2. After a backbone network, the model splits into two parallel paths: one built from ViT blocks and the other from CNN blocks. CNN and ViT each have their own advantages in extracting salient features from an image. When an input image passes through the backbone, its features are extracted along both paths, which output tensors of different dimensions. Both outputs are flattened and further processed by three fully connected (FC, or dense) layers for feature refinement. The features from the two branches are then added and passed through two more FC layers and, finally, a softmax classification layer. This output layer assigns the input image to one of four categories: “smokey,” “foggy and smokey,” “clear,” and “foggy.” The components of the architecture are detailed below.
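The fusion step can be illustrated with a stdlib-only forward pass through the classification head. This is a sketch under stated assumptions: the branch outputs are 16-dimensional (the last FC width of each path), the two extra FC layers have hypothetical widths of 16 and 8 with ReLU activations, and all weights are random and untrained, so only the data flow (element-wise addition, FC layers, softmax) is meaningful. Helper names are hypothetical.

```python
import math
import random

# Output classes in the order listed in the text.
CLASSES = ["smokey", "foggy and smokey", "clear", "foggy"]


def dense(x, weights, bias, relu=True):
    """Fully connected layer: y[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    y = [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
         for j in range(len(bias))]
    return [max(0.0, v) for v in y] if relu else y


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def fuse_and_classify(vit_feat, cnn_feat, head):
    """Element-wise sum of the two 16-d branch outputs, then the FC head."""
    h = [a + b for a, b in zip(vit_feat, cnn_feat)]
    for weights, bias in head[:-1]:
        h = dense(h, weights, bias)  # hidden FC layers (ReLU assumed)
    weights, bias = head[-1]
    probs = softmax(dense(h, weights, bias, relu=False))  # classification layer
    return CLASSES[probs.index(max(probs))], probs


def random_layer(n_in, n_out, rng):
    """Untrained, randomly initialized weights, for shape illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


rng = random.Random(0)
# Hypothetical widths: two FC layers of 16 and 8 units, then 4 output classes.
head = [random_layer(16, 16, rng), random_layer(16, 8, rng), random_layer(8, 4, rng)]
vit_out = [rng.random() for _ in range(16)]  # stand-in for the ViT path's output
cnn_out = [rng.random() for _ in range(16)]  # stand-in for the CNN path's output
label, probs = fuse_and_classify(vit_out, cnn_out, head)
print(label, [round(p, 3) for p in probs])
```

Fusing by addition requires both branches to end in layers of the same width, which is consistent with both paths ending in a 16-neuron FC layer.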

Fig. 2

Architectural design of the dual-path model for early smoke detection

The input image to the backbone network has three channels of 224 \(\times\) 224 pixels each. The backbone of the proposed model consists of a convolution layer followed by three bottleneck convolution blocks of MobileNetV2, pretrained on the ImageNet dataset. The first two bottleneck blocks are used as in the original MobileNetV2, whereas only the expansion layer of the third bottleneck block is used. The backbone network produces an output tensor of size \(56\times 56\times 144\). The pretrained MobileNetV2 blocks are fine-tuned to extract features from the input data, which are then refined by the two parallel paths.
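The \(56\times 56\times 144\) backbone output can be checked by tracing tensor shapes layer by layer. The sketch assumes the standard MobileNetV2 settings of Sandler et al. (2018): a 3 \(\times\) 3 stride-2 stem convolution with 32 filters, then bottlenecks with expansion factor t, c output channels, and stride s, all with "same" padding. The function names are hypothetical.

```python
import math


def same_pad_out(size, stride):
    """Spatial size after a 'same'-padded layer with the given stride."""
    return math.ceil(size / stride)


def backbone_shapes(size=224):
    """Trace (H, W, C) through the backbone, assuming standard MobileNetV2 settings."""
    shapes = []
    h = same_pad_out(size, 2)  # 3x3 stem conv, 32 filters, stride 2
    shapes.append(("stem conv", (h, h, 32)))
    shapes.append(("bottleneck t=1 c=16 s=1", (h, h, 16)))
    h = same_pad_out(h, 2)  # this bottleneck's depthwise conv has stride 2
    shapes.append(("bottleneck t=6 c=24 s=2", (h, h, 24)))
    # Third block (t=6, c=24, s=1): only its 1x1 expansion layer is kept,
    # giving 24 * 6 = 144 channels at unchanged resolution.
    shapes.append(("expansion of bottleneck t=6", (h, h, 24 * 6)))
    return shapes


for name, shape in backbone_shapes():
    print(f"{name:30s} -> {shape}")
```

Keeping only the expansion layer stops the backbone at its widest point (144 channels), handing the richest intermediate representation to the two parallel paths.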

The output features of the backbone network serve as the input to both parallel paths (see Fig. 2). In the ViT path, the \(56\times 56\) feature map is split into non-overlapping patches of \(7\times 7\) pixels, which yields \(8\times 8 = 64\) patches. Following the standard ViT procedure, the patches are flattened and linearly projected to 32 dimensions, producing a \(64\times 32\) feature map that is processed by 4 successive transformer blocks. Each transformer block combines self-attention with 4 Multi-Head Attention (MHA) units and a Multilayer Perceptron (MLP) with 64 and 32 neurons in its two hidden layers. The output tensor of the last transformer block is flattened and passed through three successive FC layers with 64, 32, and 16 neurons, respectively.
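To make the tensor bookkeeping of the ViT path concrete, the following NumPy sketch splits a 56 × 56 × 144 feature map into non-overlapping 7 × 7 patches, projects each flattened patch to 32 dimensions, and applies one multi-head self-attention step. It is illustrative only: the random weights (with arbitrary scales) stand in for learned ones, reading the "4 MHA units" as four attention heads of width 8 is an assumption, and the position embeddings, layer normalization, residual connections, and MLP of a complete transformer block are omitted.

```python
import numpy as np


def extract_patches(feature_map: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping patch x patch tiles and
    flatten each tile into a vector of length patch * patch * C."""
    h, w, c = feature_map.shape
    if h % patch or w % patch:
        raise ValueError("patch size must tile the feature map exactly")
    gh, gw = h // patch, w // patch
    tiles = feature_map.reshape(gh, patch, gw, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return tiles.reshape(gh * gw, patch * patch * c)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over the rows (tokens) of x."""
    n, d = x.shape
    if d % num_heads:
        raise ValueError("embedding size must be divisible by num_heads")
    dh = d // num_heads

    def split_heads(t):  # (n, d) -> (num_heads, n, dh)
        return t.reshape(n, num_heads, dh).transpose(1, 0, 2)

    q, k, v = (split_heads(x @ w) for w in (w_q, w_k, w_v))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)     # (num_heads, n, n)
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    heads = weights @ v                                  # (num_heads, n, dh)
    return heads.transpose(1, 0, 2).reshape(n, d) @ w_o  # concatenate heads


rng = np.random.default_rng(0)
features = rng.standard_normal((56, 56, 144)).astype(np.float32)  # backbone output
patches = extract_patches(features, patch=7)                      # (64, 7056)
w_proj = rng.standard_normal((patches.shape[1], 32)) * 0.01       # linear projection
tokens = patches @ w_proj                                         # (64, 32)
w_q, w_k, w_v, w_o = (rng.standard_normal((32, 32)) * 0.1 for _ in range(4))
attended = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads=4)
print(patches.shape, tokens.shape, attended.shape)
```

The reshape and transpose pair keeps each 7 × 7 × 144 tile contiguous, so every row of `patches` is exactly one spatial patch, ordered left to right and top to bottom.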

As shown in Fig. 2, the CNN path of the proposed model consists of four residual blocks: two with conventional convolution layers and two with depthwise convolution layers. The output of the backbone network is also fed to the first residual block with convolution (RBC) layers. This block consists of two sequential modules, each with a batch normalization (BN) layer and a rectified linear unit (ReLU) activation followed by a convolution layer. Each convolution layer in this block has 64 filters of size \(3\times 3\). The block input is projected to the output volume by a \(1\times 1\) convolution layer followed by a BN layer, and the output of the second convolution layer is added to this projection. The output of the first RBC is fed to the first residual block with depthwise convolution (RBDC) layers. This block consists of two consecutive modules, each with a depthwise convolution layer with \(3\times 3\) filters, BN, and a ReLU activation. A \(1\times 1\) convolution layer and a BN layer project the block input to match the output volume, and the output of the second depthwise convolution layer is added to this projection. The result is then passed through a second RBC followed by a second RBDC. The output of the second RBDC is fed to a global max pooling layer followed by three consecutive dense layers with 64, 32, and 16 neurons, respectively.
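A rough parameter count helps explain why the depthwise blocks keep the CNN path light. The sketch below is a hypothetical accounting, not taken from the paper: it assumes Keras-style layers with a bias on every convolution, a depth multiplier of 1 in the depthwise layers, 144 input channels for the first RBC (the backbone output), and 64 channels in the first RBDC, and it counts all four batch-normalization values per channel (two trainable, two moving statistics).

```python
def conv2d_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Standard k x k convolution: one k*k*c_in kernel per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_params(k: int, channels: int, bias: bool = True) -> int:
    """Depthwise k x k convolution (depth multiplier 1): one k*k kernel per channel."""
    return k * k * channels + (channels if bias else 0)


def batchnorm_params(channels: int) -> int:
    """Gamma, beta, moving mean, and moving variance per channel."""
    return 4 * channels


def rbc_params(c_in: int, filters: int = 64) -> int:
    """RBC: [BN -> ReLU -> 3x3 conv] x 2, plus a 1x1 conv + BN projection shortcut."""
    main = (batchnorm_params(c_in) + conv2d_params(3, c_in, filters)
            + batchnorm_params(filters) + conv2d_params(3, filters, filters))
    shortcut = conv2d_params(1, c_in, filters) + batchnorm_params(filters)
    return main + shortcut


def rbdc_params(channels: int = 64) -> int:
    """RBDC: [3x3 depthwise conv -> BN -> ReLU] x 2, plus a 1x1 conv + BN projection."""
    main = 2 * (depthwise_params(3, channels) + batchnorm_params(channels))
    shortcut = conv2d_params(1, channels, channels) + batchnorm_params(channels)
    return main + shortcut


print("standard 3x3, 64 -> 64:", conv2d_params(3, 64, 64))
print("depthwise 3x3, 64:     ", depthwise_params(3, 64))
print("first RBC (144 -> 64): ", rbc_params(144))
print("first RBDC (64):       ", rbdc_params(64))
```

Under these assumptions a 3 × 3 depthwise layer on 64 channels has 640 parameters versus 36,928 for a standard 3 × 3 convolution with 64 filters, roughly 58 times fewer, because each depthwise kernel sees a single input channel.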

The outputs of the two parallel paths (ViT and CNN) are added to produce a feature vector that is processed by two FC layers with 16 and 8 neurons, respectively, followed by a classification layer with 4 neurons. The proposed model was trained, validated, and tested on four datasets, including the self-collected IIITDMJ_Smoke dataset. Each dataset contains “smokey” and “clear” images under normal and foggy conditions. The model was trained with a batch size of 16 and the categorical cross-entropy loss, using a learning rate of 0.0001 for the IIITDMJ_Smoke and USTC_SmokeRS (Ba et al. 2019) datasets and 0.00001 for the Khan et al. (2019) and He et al. (2021) datasets. The learning rate for each dataset was chosen based on the convergence of the loss function on that dataset. All neuron weights and filter values were initialized randomly, except for the filters of the backbone network, which were initialized with the pretrained MobileNetV2 weights as mentioned before. During training, the weights were updated with the Adam optimizer.
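The fusion and classification stage can be sketched in NumPy as an illustrative forward pass with random weights, not the trained model. It assumes ReLU activations in the hidden FC layers (the text does not name them), uses random stand-ins for the two 16-unit path outputs, and shows the categorical cross-entropy that training minimizes. Element-wise addition only works because both paths end in 16-neuron FC layers. The order of the classes in `CLASSES` is an arbitrary choice for illustration.

```python
import numpy as np

CLASSES = ("smokey", "foggy and smokey", "clear", "foggy")  # order is illustrative


def dense(x, w, b, relu=True):
    """Fully connected layer; the ReLU activation is an assumption."""
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Batch mean of -sum_k y_k * log(p_k), with probabilities clipped away from 0 and 1."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=-1)))


rng = np.random.default_rng(0)
batch = 16
vit_out = rng.standard_normal((batch, 16))  # stand-in for the ViT path's 16-unit FC output
cnn_out = rng.standard_normal((batch, 16))  # stand-in for the CNN path's 16-unit FC output
fused = vit_out + cnn_out                   # element-wise addition of the two paths

h = fused
for n_in, n_out in [(16, 16), (16, 8)]:     # the two FC layers after fusion
    h = dense(h, rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out))
probs = softmax(dense(h, rng.standard_normal((8, 4)) * 0.1, np.zeros(4), relu=False))

labels = rng.integers(0, len(CLASSES), size=batch)
y_true = np.eye(len(CLASSES))[labels]       # one-hot targets
loss = categorical_cross_entropy(y_true, probs)
print(probs.shape, loss)
```

In Keras, the corresponding training setup would typically be written as `model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss="categorical_crossentropy")`, with `batch_size=16` passed to `model.fit`.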

The experiments for performance evaluation and comparison with state-of-the-art models are discussed in the next section.

Experimental results

The proposed model was evaluated on four datasets using well-known classification performance measures, and its performance was compared with seven state-of-the-art methods. This section presents the comparative results together with an ablation study on the necessity and suitability of the different model components.

Standard classification measures such as accuracy (Acc), precision (P), recall (R), F1 score, and false alarm rate (FAR) are used to evaluate and compare the performance of the proposed smoke detection model and the state-of-the-art models. The evaluation measures are defined using true positive, false positive, false negative, and true negative values. True positives represent instances that belong to the positive class and are accurately categorized as positive. False positives are the negative class samples identified as positive whereas false negatives are the positive class samples identified as negative. True negatives denote instances from the negative class that have been accurately recognized as negative.

Let TP, FP, FN, and TN denote the numbers of true positives, false positives, false negatives, and true negatives. Accuracy (Acc) is the ratio of correct predictions to all predictions, \((TP+TN)/(TP+FP+FN+TN)\). Precision (P) is the ratio of samples correctly classified as positive to all samples predicted as positive, \(TP/(TP+FP)\). Recall (R) is the ratio of samples correctly classified as positive to all samples that actually belong to the positive class, \(TP/(TP+FN)\); it measures how completely the model identifies the samples of that class. The F1 score is the harmonic mean of precision and recall, \(2PR/(P+R)\); because it combines both, it reflects the trade-off between them. FAR is the ratio of incorrect positive predictions to all negative-class samples, \(FP/(FP+TN)\).
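The definitions above can be turned into a small evaluation helper. This is a minimal sketch, not the authors' evaluation code. Because the model predicts four classes, the sketch assumes (the text does not say how the classes are grouped for these binary measures) that “smokey” and “foggy and smokey” form the positive class and “clear” and “foggy” the negative class; under this grouping, confusing the two smoke classes still counts as a true positive.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed grouping of the four labels into smoke (positive) and no smoke (negative).
SMOKE_CLASSES = {"smokey", "foggy and smokey"}


@dataclass
class Confusion:
    tp: int = 0
    fp: int = 0
    fn: int = 0
    tn: int = 0


def binary_confusion(y_true: Iterable[str], y_pred: Iterable[str]) -> Confusion:
    """Count TP/FP/FN/TN after mapping each label to smoke / no smoke."""
    c = Confusion()
    for t, p in zip(y_true, y_pred):
        actual, predicted = t in SMOKE_CLASSES, p in SMOKE_CLASSES
        if actual and predicted:
            c.tp += 1
        elif predicted:       # predicted smoke, but no smoke present
            c.fp += 1
        elif actual:          # smoke present, but predicted no smoke
            c.fn += 1
        else:
            c.tn += 1
    return c


def safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def scores(c: Confusion) -> dict:
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    return {
        "Acc": safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "P": precision,
        "R": recall,
        "F1": safe_div(2 * precision * recall, precision + recall),
        "FAR": safe_div(c.fp, c.fp + c.tn),  # false positives / all negatives
    }


print(scores(Confusion(tp=8, fp=2, fn=1, tn=9)))
```

With TP = 8, FP = 2, FN = 1, and TN = 9, for instance, this gives Acc = 0.85, P = 0.80, R ≈ 0.889, F1 ≈ 0.842 (equivalently 2TP/(2TP + FP + FN) = 16/19), and FAR ≈ 0.182.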

The architecture of the proposed model was selected after a number of experiments with different components. The structure of the ViT path was finalized by experimenting with different patch sizes, where a patch size of \(x\) means that patches of \(x\times x\) pixels form the input to the ViT blocks. Patch sizes of 5, 7, and 9 were compared to select the best one. Because the objective was a lightweight architecture, MobileNetV2 (Sandler et al. 2018) blocks were preferred over other architectures for the backbone of the network. The results of the experiments evaluating the model are presented below.
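One practical consequence of the patch-size choice can be checked with simple arithmetic. Assuming the ViT path receives the 56 × 56 × 144 backbone output and extracts non-overlapping patches without padding (an assumption; the text does not say how borders are handled), only a patch size of 7 tiles the map exactly:

```python
def patch_layout(size: int, patch: int, channels: int):
    """Return (number of patches, rows/columns left over along each axis,
    flattened patch length) for non-overlapping patches without padding."""
    per_side = size // patch
    left_over = size - per_side * patch
    return per_side * per_side, left_over, patch * patch * channels


for p in (5, 7, 9):
    n, left, dim = patch_layout(56, p, 144)
    print(f"patch {p}x{p}: {n} patches, {left} rows/cols left over, {dim} values per patch")
```

Smaller patches give more tokens, and the attention matrix grows quadratically with the token count, while larger patches give fewer but longer tokens; a patch size that divides 56 avoids discarding or padding border pixels.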

The ablation study evaluated the ViT path alone, the CNN path (RBC + RBDC) alone, and combinations of both paths. Table 2 reports the results of the ablation study on the four datasets. With only the ViT path, accuracies of 99.03%, 90.98%, 99.97%, and 99.91% were achieved on the IIITDMJ_Smoke, USTC_SmokeRS (Ba et al. 2019), Khan et al. (2019), and He et al. (2021) datasets, respectively.

Table 2 Results of ablation study

The contribution of the CNN path alone was also analyzed, with the network consisting of RBC + RBDC blocks (Fig. 2). This architecture achieved accuracies of 97.21%, 93.17%, 99.90%, and 99.88% on the IIITDMJ_Smoke, USTC_SmokeRS (Ba et al. 2019), Khan et al. (2019), and He et al. (2021) datasets, respectively. To analyze the significance of the modules used in the CNN path, further experiments paired the ViT path with a parallel CNN path consisting of only 2 RBC blocks or only 2 RBDC blocks. The last set of ablation experiments used the parallel combination of the ViT path and the CNN path with both RBC and RBDC blocks, as shown in Fig. 2. This combination achieved the best results on the IIITDMJ_Smoke, USTC_SmokeRS (Ba et al. 2019), and He et al. (2021) datasets, with accuracies of 99.62%, 93.90%, and 99.91%, respectively. The ViT path with 2 RBC blocks achieved the second-best accuracy on IIITDMJ_Smoke (99.41%), whereas the ViT path with 2 RBDC blocks achieved the second-best accuracy on USTC_SmokeRS (Ba et al. 2019) (93.66%). The model architecture in Fig. 2 was finalized based on this ablation study.

Table 3 Performance comparison of proposed model with state-of-the-art methods

The proposed model’s performance is compared with seven state-of-the-art models (Tao et al. 2016; Khan et al. 2019; Muhammad et al. 2019; Khan et al. 2021; Majid et al. 2022; Almeida et al. 2022; Ahmad et al. 2023). Table 3 compares the quantitative results of these methods with those of the proposed method. The proposed model outperforms the state-of-the-art methods, achieving accuracies of 99.62%, 93.90%, 99.94%, and 99.91% on the aforementioned four datasets, respectively. Notably, it also achieves the lowest FAR on all four datasets. The combination of ViT and CNN identifies smoke patterns in both CCTV and satellite images. The self-attention mechanism in the MHA blocks of the ViT enhances the ability to capture smoke characteristics, complementing the feature extraction strengths of the CNN; this is a probable reason for the improved performance of the proposed model.

Another closely performing model, by Khan et al. (2019), is based on VGG16 (Simonyan and Zisserman 2014) and attains accuracies of 99.41% on IIITDMJ_Smoke, 90.49% on USTC_SmokeRS (Ba et al. 2019), 99.35% on Khan et al. (2019), and 99.17% on the He et al. (2021) dataset. However, this model is heavy in terms of the number of parameters and GFLOPs (see Table 6). The model by Muhammad et al. (2019) also performs closely, with accuracies of 98.82%, 87.07%, 98.90%, and 96.65% on the IIITDMJ_Smoke, USTC_SmokeRS (Ba et al. 2019), Khan et al. (2019), and He et al. (2021) datasets, respectively, but it suffers from a relatively high FAR, as can be seen in Table 3. Fig. 3 presents the results of Table 3 as charts.

Fig. 3

Performance comparison graph of (a) Tao et al. (2016), (b) Khan et al. (2019), (c) Muhammad et al. (2019), (d) Khan et al. (2021), (e) Majid et al. (2022), (f) Almeida et al. (2022), (g) Ahmad et al. (2023), and (h) the proposed model when trained on the (1) IIITDMJ_Smoke, (2) USTC_SmokeRS (Ba et al. 2019), (3) Khan et al. (2019), and (4) He et al. (2021) datasets

Discussion

The results presented in the previous section demonstrate the efficacy of the proposed smoke detection model on four datasets of different natures. However, the model’s generalizability also needs to be analyzed. Furthermore, for the model to be applicable in resource-constrained IoT systems, its computational and memory demands need to be examined. Deployed on IoT devices such as drones, the model could help monitor environmental conditions in real time for wildfire management. Such devices can also support the assessment of the impact of fire hazards on the ecosystems of fire-affected regions, and the data they collect and process can be used for further analysis and rehabilitation programs. Fire-affected zones include various kinds of vegetation cover, such as grasslands and savannas. Within these regions, some areas burn regularly, while others experience fires of varying intensity. Factors such as climatic change, geography, and human activity determine the limits and boundaries of fire-vegetation interactions. Efficient fire management solutions aim to balance ecological well-being with the mitigation of fire hazards, which requires recognizing fire as both a hazard to and a regenerative force in ecosystems.

Another important consideration for deploying an AI model in real-world scenarios is its reliability in detecting fire at an early stage, that is, when the smoke appears very small relative to the image size. A further point of analysis is the model’s capability to learn input patterns and to distinguish the patterns of different image categories. This section analyzes the model’s abilities and limitations in these respects.

Early smoke detection

The model’s capability for early smoke detection was tested on satellite data, using the two datasets that contain satellite images, IIITDMJ_Smoke and USTC_SmokeRS (Ba et al. 2019). From the test sets of both datasets, smoke images in which the smoke occupies only 2% of the total image area were selected, together with smoke-free images, to analyze the model’s performance. From the IIITDMJ_Smoke test set, 135 smoke images were picked (62 “smokey” and 73 “foggy and smokey”), along with 151 images without smoke (74 “clear” and 77 “foggy”). From the USTC_SmokeRS (Ba et al. 2019) test set, 34 smoke images were taken (17 “smokey” and 17 “foggy and smokey”), along with 34 images without smoke (17 “clear” and 17 “foggy”). On both datasets, the proposed model identified these images more accurately than the state-of-the-art models, achieving an accuracy of 99.30% on IIITDMJ_Smoke and 95.59% on USTC_SmokeRS (Ba et al. 2019). Table 4 lists the early smoke detection results of the competing models and the proposed model, and Fig. 4 presents them as charts.
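As a consistency check, the reported accuracies correspond to whole numbers of correctly classified images. Assuming accuracy is computed over all selected images of each dataset (135 + 151 = 286 for IIITDMJ_Smoke and 34 + 34 = 68 for USTC_SmokeRS), the arithmetic below recovers the implied counts; it uses only the figures stated in the text.

```python
def implied_correct(accuracy_pct: float, n_images: int) -> int:
    """Nearest whole number of correct predictions for a reported accuracy."""
    return round(accuracy_pct / 100 * n_images)


subsets = {
    # reported accuracy, total images (smokey + foggy and smokey + clear + foggy)
    "IIITDMJ_Smoke": (99.30, 62 + 73 + 74 + 77),
    "USTC_SmokeRS": (95.59, 17 + 17 + 17 + 17),
}
for name, (acc, n) in subsets.items():
    correct = implied_correct(acc, n)
    # Recompute the accuracy from the integer count to confirm it rounds back.
    print(f"{name}: {correct}/{n} correct, {n - correct} errors, "
          f"{100 * correct / n:.2f}%")
```

Under this assumption, 99.30% corresponds to 284 of 286 images (2 errors) and 95.59% to 65 of 68 images (3 errors), so each percentage reflects only a handful of misclassified images.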

Table 4 Small-sized smoke detection performance on datasets containing satellite images
Fig. 4

Small-sized smoke detection performance comparison on datasets containing satellite images: (a) Tao et al. (2016), (b) Khan et al. (2019), (c) Muhammad et al. (2019), (d) Khan et al. (2021), (e) Majid et al. (2022), (f) Almeida et al. (2022), (g) Ahmad et al. (2023), and (h) the proposed model on the (1) IIITDMJ_Smoke and (2) USTC_SmokeRS (Ba et al. 2019) datasets

Generalizability

Vegetation cover varies significantly across regions, which makes it challenging for an AI model to generalize its predictions. Generalizability is most commonly analyzed by cross-data validation, that is, testing a model trained on one dataset, say D1, on another unseen dataset, say D2. To analyze robustness on unseen data, the proposed model and all competing models were trained on IIITDMJ_Smoke and tested on the USTC_SmokeRS (Ba et al. 2019), Khan et al. (2019), and He et al. (2021) datasets. Table 5 shows the results on these three datasets, and Fig. 5 presents them as charts. On USTC_SmokeRS (Ba et al. 2019), the proposed model and the Tao et al. (2016) model both achieve an accuracy of 49.51%. The Tao et al. (2016) model obtains the lowest FAR on this dataset (16.82%), but it has a much larger number of parameters (46.7 million), which may help it learn complex features and generalize to unseen data. Although none of the models attains the desired level of precision or accuracy in this setting, the proposed model records the highest accuracy (49.63% and 58.55%) and the lowest FAR (16.83% and 13.82%) on the Khan et al. (2019) and He et al. (2021) datasets, respectively. Some state-of-the-art models perform competitively close. There is generally a trade-off between a model’s generalization capability and its size, and further investigation is needed to balance model size, computational requirements, and generalizability before such AI models can be accepted in real-world applications.
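The cross-data protocol itself is simple to express. In the sketch below, `predict` is a hypothetical stand-in for a model trained only on IIITDMJ_Smoke (D1); it is applied unchanged, without fine-tuning, to each unseen dataset (D2), and accuracy is computed per dataset. The toy predictor and data are placeholders for illustration, not the paper's models or datasets.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[object, str]  # (image, ground-truth label)


def cross_data_accuracy(predict: Callable[[object], str],
                        unseen: Dict[str, List[Sample]]) -> Dict[str, float]:
    """Evaluate a frozen model (trained on D1) on each unseen dataset D2."""
    results = {}
    for name, samples in unseen.items():
        correct = sum(predict(image) == label for image, label in samples)
        results[name] = correct / len(samples)
    return results


def always_clear(image: object) -> str:
    """Hypothetical placeholder predictor that ignores its input."""
    return "clear"


toy_unseen = {
    "D2_a": [(None, "clear"), (None, "smokey")],
    "D2_b": [(None, "foggy"), (None, "clear"), (None, "clear"), (None, "smokey")],
}
print(cross_data_accuracy(always_clear, toy_unseen))
```

The key property of the protocol is that nothing in `predict` depends on the unseen datasets; any preprocessing or label mapping must be fixed from D1 alone.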

Table 5 Cross-data validation of models trained on the IIITDMJ_Smoke dataset
Fig. 5

Cross-data validation of (a) Tao et al. (2016), (b) Khan et al. (2019), (c) Muhammad et al. (2019), (d) Khan et al. (2021), (e) Majid et al. (2022), (f) Almeida et al. (2022), (g) Ahmad et al. (2023), and (h) the proposed model when trained on the IIITDMJ_Smoke dataset and tested on the (1) USTC_SmokeRS (Ba et al. 2019), (2) Khan et al. (2019), and (3) He et al. (2021) datasets

Memory and computational requirement

The number of parameters and the number of FLOPs are essential criteria for assessing a model’s computational complexity. Table 6 compares the number of parameters (in millions, M) and the FLOP counts (in GFLOPs) of earlier methods and the proposed model. With only 0.6 M parameters and 0.4 GFLOPs, the proposed model achieves more than 99% accuracy on three datasets and 93.90% on the fourth, USTC_SmokeRS (Ba et al. 2019), which contains more challenging scenarios. The model also records the lowest FAR among the compared methods on the satellite image datasets, at \(\le 2\%\).
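Part of the 0.6 M parameter budget can be reconstructed from the fully connected layers alone, since their widths are given in the text. The sketch below counts weights plus biases for each dense stack. Two inputs are assumptions: the ViT path flattens a 64 × 32 token matrix (2,048 values), and the CNN path's global max pooling sees 64 channels, that is, the second RBC is assumed to keep 64 filters. Convolutional, attention, and backbone parameters are not included. Note also that FLOP conventions differ between tools: some count one multiply-accumulate (MAC) as one FLOP and others as two, so GFLOP figures are only comparable when computed the same way.

```python
def dense_stack_params(n_in: int, widths) -> int:
    """Weights + biases of consecutive fully connected layers."""
    total = 0
    for n_out in widths:
        total += n_in * n_out + n_out
        n_in = n_out
    return total


vit_head = dense_stack_params(64 * 32, [64, 32, 16])  # flattened ViT tokens (assumed 64 x 32)
cnn_head = dense_stack_params(64, [64, 32, 16])       # global max pooling over 64 channels (assumed)
fusion_head = dense_stack_params(16, [16, 8, 4])      # after adding the two 16-d vectors
print(vit_head, cnn_head, fusion_head, vit_head + cnn_head + fusion_head)
```

Under these assumptions the first dense layer after the ViT flattening (2,048 × 64 weights plus 64 biases, about 0.13 M) accounts for most of the FC parameters.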

Table 6 Comparison of number of parameters (M) and GFLOPs

Effectiveness of feature learning

The similarity of features extracted from samples of the same and of different classes can be visualized with t-distributed stochastic neighbor embedding (t-SNE). t-SNE is a dimensionality reduction method that maps high-dimensional features to a low-dimensional space (here, a 2D plane) while keeping samples with similar features close together, so that such samples appear as clusters. Figure 6 shows the t-SNE plots of the features extracted by the existing models and the proposed model on the (1) IIITDMJ_Smoke, (2) USTC_SmokeRS (Ba et al. 2019), (3) Khan et al. (2019), and (4) He et al. (2021) datasets. The features extracted by the proposed model show a clearer separation between classes than those of the other models.
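For reference, the standard t-SNE formulation is summarized below. It describes the general algorithm; the perplexity and other settings used to produce Fig. 6 are not stated in the text. For \(N\) samples with high-dimensional features \(x_i\) and 2D embeddings \(y_i\):

```latex
\[
p_{j\mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2N}
\]
\[
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
\]
```

Each bandwidth \(\sigma_i\) is set so that the conditional distribution around \(x_i\) matches a user-chosen perplexity, and the embeddings \(y_i\) are found by gradient descent on \(C\). Because the heavy-tailed Student-t kernel in \(q_{ij}\) preserves local neighborhoods rather than global distances, the separation within and between groups in a t-SNE plot is more informative than the absolute distances between well-separated groups.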

Fig. 6

t-distributed stochastic neighbor embedding (t-SNE) plots for (1) IIITDMJ_Smoke, (2) USTC_SmokeRS (Ba et al. 2019), (3) Khan et al. (2019), and (4) He et al. (2021) on (a) Tao et al. (2016), (b) Khan et al. (2019), (c) Muhammad et al. (2019), (d) Khan et al. (2021), (e) Majid et al. (2022), (f) Almeida et al. (2022), (g) Ahmad et al. (2023), and (h) the proposed model

Conclusion

This work presents an ultra-lightweight dual-path model architecture with parallel ViT and CNN paths for early smoke detection. The study aims to reduce wildfire risk through early detection in normal and challenging environmental conditions, using both CCTV and satellite images. The model was developed on a self-collected satellite image dataset and also trained and tested on three other smoke classification datasets. With only 0.6 million parameters and 0.4 GFLOPs, the model not only demonstrates high accuracy and reduced false alarms but also shows significant improvement in detecting smoke at early stages. To the best of our knowledge, this is the first study on early-stage smoke detection in which small smoke regions covering less than 2% of the total image area are detected with high accuracy, indicating the model’s potential for early alarm generation. The model’s effectiveness in feature extraction is also demonstrated using t-SNE plots. Overall, the proposed model shows significant performance improvement at a low computational cost and with the lowest memory demands among the compared models.

Being lightweight, the proposed model is suitable for deployment on resource-constrained devices in IoT-based systems for forest monitoring and fire smoke detection. Early alerts generated by the model can support preventive measures to protect humans and wild animals from wildfire. A high-precision fire smoke detection model can also be used for forest monitoring and for managing, preventing, and mitigating fire hazards. Additionally, AI models with better generalizability can be used across geographical locations without compromising the accuracy of fire smoke detection. Although the proposed model shows limitations in generalizing its predictive power to unseen datasets, it performs better in this respect than the existing AI models. More effort is needed to analyze different AI models and improve their generalization performance; this aspect has received little attention so far and is a useful direction for future research.

Availability of data and materials

The IIITDMJ_Smoke dataset is available at https://github.com/Image-and-Vision-Engineering-Group/IIITDMJ_Smoke-Dataset, and USTC_SmokeRS is available at https://pan.baidu.com/s/1GBOE6xRVzEBV92TrRMtfWg. The Khan et al. (2019) dataset used in this work is available via DOI 10.1109/JIOT.2019.2896120, and the He et al. (2021) dataset is available at https://drive.google.com/drive/folders/1l0l7QH5lS8z8L MD-p6GX6kgZjzvhSYF.

References

  • Ahmad, K., M. S. Khan, F. Ahmed, M. Driss, W. Boulila, A. Alazeb, M. Alsulami, M. S. Alshehri, Y. Y. Ghadi, and J. Ahmad. 2023. FireXnet: an explainable AI-based tailored deep learning model for wildfire detection on resource-constrained devices. Fire Ecology 19 (1): 54.

  • Alam, T. 2021. Cloud-based IoT applications and their roles in smart cities. Smart Cities 4 (3): 1196–1219.

  • Almeida, J. S., C. Huang, F. G. Nogueira, S. Bhatia, and V. H. C. de Albuquerque. 2022. Edgefiresmoke: a novel lightweight CNN model for real-time video fire-smoke detection. IEEE Transactions on Industrial Informatics 18 (11): 7889–7898.

  • Almeida, J. S., S. K. Jagatheesaperumal, F. G. Nogueira, and V. H. C. de Albuquerque. 2023. Edgefiresmoke++: a novel lightweight algorithm for real-time forest fire detection and visualization using internet of things-human machine interface. Expert Systems with Applications 221:119747.

  • Ba, R., C. Chen, J. Yuan, W. Song, and S. Lo. 2019. SmokeNet: satellite smoke scene detection using convolutional neural network with spatial and channel-wise attention. Remote Sensing 11 (14): 1702.

  • Balch, J. K., D. C. Nepstad, P. M. Brando, L. M. Curran, O. Portela, O. de Carvalho Jr, and P. Lefebvre. 2008. Negative fire feedback in a transitional forest of southeastern Amazonia. Global Change Biology 14 (10): 2276–2287.

  • Barnes, L. R., E. C. Gruntfest, M. H. Hayden, D. M. Schultz, and C. Benight. 2007. False alarms and close calls: a conceptual model of warning accuracy. Weather and Forecasting 22 (5): 1140–1147.

  • Bishop, C. M. 2006. Pattern recognition and machine learning. New York: Springer.

  • Charizanos, G., and H. Demirhan. 2023. Bayesian prediction of wildfire event probability using normalized difference vegetation index data from an Australian forest. Ecological Informatics 73:101899.

  • Chaturvedi, S., P. Khanna, and A. Ojha. 2022a. An efficient residual convolutional neural network with attention mechanism for smoke detection in outdoor environment. In International Conference on Computer Vision and Image Processing, 1–14. Springer Cham: Springer Nature Switzerland.

  • Chaturvedi, S., P. Khanna, and A. Ojha. 2022b. A survey on vision-based outdoor smoke detection techniques for environmental safety. ISPRS Journal of Photogrammetry and Remote Sensing 185:158–187.

  • Chen, S., Y. Cao, X. Feng, and X. Lu. 2021. Global2Salient: self-adaptive feature aggregation for remote sensing smoke detection. Neurocomputing 466:202–220.

  • Chen, S., W. Li, Y. Cao, and X. Lu. 2022. Combining the convolution and transformer for classification of smoke-like scenes in remote sensing images. IEEE Transactions on Geoscience and Remote Sensing 60:1–19.

  • Chen, L. C., Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (ECCV), 801–818. Switzerland: Springer Nature.

  • Cheng, G., Y. Zhou, S. Gao, Y. Li, and H. Yu. 2023. Convolution-enhanced vision transformer network for smoke recognition. Fire Technology 59 (2): 925–948.

  • Chollet, F. 2017. Xception: deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, 1251–1258. USA: IEEE.

  • Chuvieco, E., S. Martínez, M. V. Román, S. Hantson, and M. L. Pettinari. 2014. Integration of ecological and socio-economic factors to assess global vulnerability to wildfire. Global Ecology and Biogeography 23 (2): 245–258.

  • Dosovitskiy, A., L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, et al. 2020. An image is worth 16x16 words: transformers for image recognition at scale. In International Conference on Learning Representations.

  • El-Madafri, I., M. Peña, and N. Olmedo-Torre. 2023. The wildfire dataset: enhancing deep learning-based forest fire detection with a diverse evolving open-source dataset focused on data representativeness and a novel multi-task learning approach. Forests 14 (9): 1697.

  • Filkov, A. I., T. Ngo, S. Matthews, S. Telfer, and T. D. Penman. 2020. Impact of Australia’s catastrophic 2019/20 bushfire season on communities and environment. retrospective analysis and current trends. Journal of Safety Science and Resilience 1 (1): 44–56.

  • Filonenko, A., L. Kurnianggoro, and K. H. Jo. 2017. Comparative study of modern convolutional neural networks for smoke detection on image data. In 2017 10th international conference on human system interactions (HSI), 64–68. USA: IEEE.

  • FSI. 2021. Forest fire activities. https://fsi.nic.in/forest-fire-activities. Accessed 6 Apr 2024.

  • Gajendiran, K., S. Kandasamy, and M. Narayanan. 2023. Influences of wildfire on the forest ecosystem and climate change: a comprehensive study. Environmental Research 240:117537.

  • Garcês, A., and I. Pires. 2023. The hell of wildfires: the impact on wildlife and its conservation and the role of the veterinarian. Conservation 3 (1): 96–108.

  • Giannakidou, S., P. Radoglou-Grammatikis, T. Lagkas, V. Argyriou, S. Goudos, E. K. Markakis, and P. Sarigiannidis. 2024. Leveraging the power of Internet of Things and artificial intelligence in forest fire prevention, detection, and restoration: a comprehensive survey. Internet of Things 26:101171.

  • Giorgis, M. A., S. R. Zeballos, L. Carbone, H. Zimmermann, H. von Wehrden, R. Aguilar, A. E. Ferreras, P. A. Tecco, E. Kowaljow, F. Barri, et al. 2021. A review of fire effects across South American ecosystems: the role of climate and time since fire. Fire Ecology 17:1–20.

  • Haque, M. K., M. A. K. Azad, M. Y. Hossain, T. Ahmed, M. Uddin, and M. M. Hossain. 2021. Wildfire in Australia during 2019–2020, its impact on health, biodiversity and environment with some proposals for risk management: a review. Journal of Environmental Protection 12 (6): 391–414.

  • He, L., X. Gong, S. Zhang, L. Wang, and F. Li. 2021. Efficient attention based deep fusion CNN for smoke detection in fog environment. Neurocomputing 434:224–238.

  • Hu, P., R. Tanchak, and Q. Wang. 2024. Developing risk assessment framework for wildfire in the United States–a deep learning approach to safety and sustainability. Journal of Safety and Sustainability 1 (1): 26–41.

  • Jadon, A., M. Omama, A. Varshney, M. S. Ansari, and R. Sharma. 2019. FireNet: a specialized lightweight fire & smoke detection model for real-time IoT applications. arXiv preprint arXiv:1905.11922.

  • Jodhani, K. H., H. Patel, U. Soni, R. Patel, B. Valodara, N. Gupta, A. Patel, and P. J. Omar. 2024. Assessment of forest fire severity and land surface temperature using Google Earth Engine: a case study of Gujarat State, India. Fire Ecology 20 (1): 23.

  • Kaur, H., S. K. Sood, and M. Bhatia. 2020. Cloud-assisted green IoT-enabled comprehensive framework for wildfire monitoring. Cluster Computing 23 (2): 1149–1162.

  • Keeley, J. E., and A. D. Syphard. 2021. Large California wildfires: 2020 fires in historical context. Fire Ecology 17:1–11.

  • Keenan, R. J., G. A. Reams, F. Achard, J. V. de Freitas, A. Grainger, and E. Lindquist. 2015. Dynamics of global forest area: results from the FAO Global Forest Resources Assessment 2015. Forest Ecology and Management 352:9–20.

  • Khan, R. A., A. Hussain, U. I. Bajwa, R. H. Raza, and M. W. Anwar. 2023. Fire and smoke detection using capsule network. Fire Technology 59 (2): 581–594.

  • Khan, S., K. Muhammad, T. Hussain, J. Del Ser, F. Cuzzolin, S. Bhattacharyya, Z. Akhtar, and V. H. C. de Albuquerque. 2021. Deepsmoke: deep learning model for smoke detection and segmentation in outdoor environments. Expert Systems with Applications 182:115125.

  • Khan, S., K. Muhammad, S. Mumtaz, S. W. Baik, and V. H. C. de Albuquerque. 2019. Energy-efficient deep CNN for smoke detection in foggy IoT environment. IEEE Internet of Things Journal 6 (6): 9237–9245.

  • Kim, S. J., C. H. Lim, G. S. Kim, J. Lee, T. Geiger, O. Rahmati, Y. Son, and W. K. Lee. 2019. Multi-temporal analysis of forest fire probability using socio-economic and environmental variables. Remote Sensing 11 (1): 86.

  • Kim, S. Y., and A. Muminov. 2023. Forest fire smoke detection based on deep learning approaches and unmanned aerial vehicle images. Sensors 23 (12): 5702.

  • Krizhevsky, A., I. Sutskever, and G. E. Hinton. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM 60 (6): 84–90.

  • Liu, Y., W. Qin, K. Liu, F. Zhang, and Z. Xiao. 2019. A dual convolution network using dark channel prior for image smoke classification. IEEE Access 7:60697–60706.

  • MacCarthy, J., A. Tyukavina, M. Weisse, and N. Harris. 2024. World Resources Institute. https://www.wri.org/insights/canada-wildfire-emissions. Accessed 7 Apr 2024.

  • Majid, S., F. Alenezi, S. Masood, M. Ahmad, E. S. Gündüz, and K. Polat. 2022. Attention based CNN model for fire detection and localization in real-world images. Expert Systems with Applications 189:116114.

  • Miller, J. D., E. E. Knapp, C. H. Key, C. N. Skinner, C. J. Isbell, R. M. Creasy, and J. W. Sherlock. 2009. Calibration and validation of the relative differenced Normalized Burn Ratio (RdNBR) to three measures of fire severity in the Sierra Nevada and Klamath Mountains, California, USA. Remote Sensing of Environment 113 (3): 645–656.

  • Mishra, B., S. Panthi, S. Poudel, and B. R. Ghimire. 2023. Forest fire pattern and vulnerability mapping using deep learning in Nepal. Fire Ecology 19 (1): 3.

  • Muhammad, K., S. Khan, V. Palade, I. Mehmood, and V. H. C. De Albuquerque. 2019. Edge intelligence-assisted smoke detection in foggy surveillance environments. IEEE Transactions on Industrial Informatics 16 (2): 1067–1075.

  • Myagmar-Ochir, Y., and W. Kim. 2023. A survey of video surveillance systems in smart city. Electronics 12 (17): 3567.

  • Namozov, A., and Y. Im Cho. 2018. An efficient deep learning algorithm for fire and smoke detection with limited data. Advances in Electrical and Computer Engineering 18 (4): 121–128.

  • NASA. 2021. Worldview Earth Data. https://worldview.earthdata.nasa.gov. Accessed 15 Mar 2021.

  • Pacificbio, P. B. I. 2024. Fire Ecology. https://www.pacificbio.org/initiatives/fire/fire_ecology.html. Accessed 9 Apr 2024.

  • Palaiologou, P., A. A. Ager, M. Nielsen-Pincus, C. R. Evers, and M. A. Day. 2019. Social vulnerability to large wildfires in the Western USA. Landscape and Urban Planning 189:99–116.

  • Parkins, K., A. York, and J. Di Stefano. 2018. Edge effects in fire-prone landscapes: ecological importance and implications for fauna. Ecology and Evolution 8 (11): 5937–5948.

  • Paveglio, T. B., T. Prato, C. Edgeley, and D. Nalle. 2016. Evaluating the characteristics of social vulnerability to wildfire: demographics, perceptions, and parcel characteristics. Environmental Management 58:534–548.

  • Prior, T., and C. Eriksen. 2013. Wildfire preparedness, community cohesion and social-ecological systems. Global Environmental Change 23 (6): 1575–1586.

  • Renard, Q., R. Pélissier, B. Ramesh, and N. Kodandapani. 2012. Environmental susceptibility model for predicting forest fire occurrence in the Western Ghats of India. International Journal of Wildland Fire 21 (4): 368–379.

  • RF. 2023. Amazon rainforest fires. https://rainforestfoundation.org/engage/brazil-amazon-fires/. Accessed 7 Apr 2024.

  • Saha, S., B. Bera, P. K. Shit, S. Bhattacharjee, and N. Sengupta. 2023. Prediction of forest fire susceptibility applying machine and deep learning algorithms for conservation priorities of forest resources. Remote Sensing Applications: Society and Environment 29:100917.

  • Sandler, M., A. Howard, M. Zhu, A. Zhmoginov, and L. C. Chen. 2018. MobileNetV2: inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, 4510–4520. USA: IEEE.

  • Sathishkumar, V. E., J. Cho, M. Subramanian, and O. S. Naren. 2023. Forest fire and smoke detection using deep learning-based learning without forgetting. Fire Ecology 19 (1): 9.

  • Simonyan, K., and A. Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations (ICLR 2015).

  • Szegedy, C., V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2818–2826. USA: IEEE.

  • Tan, M., and Q. Le. 2019. EfficientNet: rethinking model scaling for convolutional neural networks. In International conference on machine learning, 6105–6114. California: PMLR.

  • Tao, C., J. Zhang, and P. Wang. 2016. Smoke detection based on deep convolutional neural networks. In 2016 International conference on industrial informatics-computing technology, intelligent technology, industrial information integration (ICIICII), 150–153. USA: IEEE.

  • Trancoso, R., J. Syktus, A. Salazar, M. Thatcher, N. Toombs, K. K. H. Wong, E. Meijaard, D. Sheil, and C. A. McAlpine. 2022. Converting tropical forests to agriculture increases fire risk by fourfold. Environmental Research Letters 17 (10): 104019.

  • Wang, K., Y. Fu, S. Zhou, R. Zhou, G. Wen, F. Zhou, L. Li, and G. Qi. 2023. Cloud-fog-based approach for smart wildfire monitoring. Simulation Modelling Practice and Theory 127:102791.

  • Wasserman, T. N., and S. E. Mueller. 2023. Climate influences on future fire severity: a synthesis of climate-fire interactions and impacts on fire regimes, high-severity fire, and forests in the western United States. Fire Ecology 19 (1): 43.

  • Yin, Z., B. Wan, F. Yuan, X. Xia, and J. Shi. 2017. A deep normalization and convolutional neural network for image smoke detection. IEEE Access 5:18429–18438.

  • Yin, H., and Y. Wei. 2019. An improved algorithm based on convolutional neural network for smoke detection. In 2019 IEEE International Conferences on Ubiquitous Computing & Communications (IUCC) and Data Science and Computational Intelligence (DSCI) and Smart Computing, Networking and Services (SmartCNS), 207–211. USA: IEEE.

  • Yin, H., Y. Wei, H. Liu, S. Liu, C. Liu, and Y. Gao. 2020. Deep convolutional generative adversarial network and convolutional neural network for smoke detection. Complexity 1:6843869.

Acknowledgements

Not applicable.

Funding

This research received no specific grant from any funding agency.

Author information

Contributions

Shubhangi Chaturvedi: model development, methodology and experimentation, writing―original draft. Chandravanshi Shubham Arun: data collection. Poornima Singh Thakur: editing help. Pritee Khanna: supervision, review and editing of the manuscript. Aparajita Ojha: supervision, review and editing of the manuscript.

Corresponding author

Correspondence to Aparajita Ojha.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Chaturvedi, S., Shubham Arun, C., Singh Thakur, P. et al. Ultra-lightweight convolution-transformer network for early fire smoke detection. fire ecol 20, 83 (2024). https://doi.org/10.1186/s42408-024-00304-9

  • DOI: https://doi.org/10.1186/s42408-024-00304-9

Keywords