Abstract:Facial expression recognition has received increasing attention in computer vision tasks. In real-world scenarios, facial expressions often contain a significant amount of noise introduced by factors such as pose, age, image quality, and annotation, which have greatly increased intra-class variation and have posed significant challenges for facial expression classification tasks. The existing researches addressing this problem often focus on the data itself, improving recognition capabilities by filtering or expanding the types of data accepted by the models, without considering the limitations of the convolutional networks in attending to image features. To address this issue, this paper proposed a convolutional neural network(CNN) based on local spatial feature guidance. It emphasizes certain pixels in the feature maps, enabling deep layers of the convolutional network to attend to multiple local facial regions that are effective for classification. Additionally, a re-labeling approach was employed to suppress noise caused by label errors. The proposed method was tested on multiple publicly available facial expression recognition datasets and has achieved better recognition performance compared to several existing methods.