Abstract
This work proposes a remote sensing image scene classification method based on a stackable attention structure. The structure represents the feature information in a remote sensing image as a linear function, which changes how global and local information is processed and avoids the need for memory-intensive attention maps. Experiments show that the proposed method achieves higher accuracy with a smaller model size than ResNet152. This provides a new approach to constructing remote sensing image scene classification models, especially for processing large remote sensing images.
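As a rough illustration of why a linear-function representation can avoid an attention map, the following minimal sketch compares two ways of computing the same unnormalized interaction between n positions. It is not the stackable attention structure proposed in this work: the softmax normalization is omitted so that the two paths are exactly equal, and the function names, matrix sizes, and toy values are all assumptions chosen for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def attention_map_path(q, k, v):
    """Build the full n x n interaction map first: (Q K^T) V."""
    scores = matmul(q, transpose(k))  # n x n, grows quadratically with n
    return matmul(scores, v), len(scores) * len(scores[0])


def linear_function_path(q, k, v):
    """Summarize the context once as a d_k x d_v matrix: Q (K^T V)."""
    summary = matmul(transpose(k), v)  # d_k x d_v, independent of n
    return matmul(q, summary), len(summary) * len(summary[0])


# Toy inputs (hypothetical): n positions, d_k query/key width, d_v value width.
n, d_k, d_v = 6, 2, 3
q = [[float((i + j) % 3) for j in range(d_k)] for i in range(n)]
k = [[float((i * j + 1) % 4) for j in range(d_k)] for i in range(n)]
v = [[float((i + 2 * j) % 5) for j in range(d_v)] for i in range(n)]

out_map, map_size = attention_map_path(q, k, v)
out_lin, lin_size = linear_function_path(q, k, v)
print(out_map == out_lin, map_size, lin_size)
```

In this toy setting the intermediate map has n x n entries, while the context summary has d_k x d_v entries regardless of n, which is the kind of saving that matters for large images.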