Abstract: Because Internet of Things (IoT) edge devices have limited computing power and memory, traditional deep learning (DL) models are difficult to deploy on them for automatic modulation recognition (AMR). To combine the advantages of convolutional neural networks (CNNs) and visual Transformer networks, a hybrid convolutional and Transformer model called ICTNet (Internet of Things CNN Transformer network) is introduced for IoT edge devices. ICTNet retains the Transformer's ability to capture long-range feature dependencies while using CNNs to extract local feature information, which reduces model size while increasing modulation recognition accuracy. The average recognition accuracies of ICTNet on the RadioML2016.10a, RadioML2016.10b and RadioML2016.04c datasets are 61.51%, 64.18% and 71.96%, respectively. Moreover, ICTNet processes each signal sample on a typical edge device in nearly 0.01 ms and, with only 29 455 parameters, is much smaller than existing DL-AMR models.
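To put the reported model size and latency in edge-device terms, the short Python sketch below converts the two figures quoted in the abstract (29 455 parameters, nearly 0.01 ms per sample) into approximate weight storage and sequential throughput. The storage precisions (float32, float16, int8) are assumptions made for illustration, since the abstract does not state how the weights are stored, and the helper names `weight_storage_kib` and `samples_per_second` are hypothetical. The latency is treated as exactly 0.01 ms, so the throughput is only a rough estimate.

```python
# Figures quoted in the abstract.
NUM_PARAMETERS = 29_455
LATENCY_MS_PER_SAMPLE = 0.01  # "nearly 0.01 ms", treated as exact here

# Assumption: bytes per stored weight at each precision (not stated in the abstract).
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_storage_kib(num_parameters: int, bytes_per_weight: int) -> float:
    """Return the storage needed for the weights alone, in KiB."""
    return num_parameters * bytes_per_weight / 1024


def samples_per_second(latency_ms: float) -> float:
    """Return the sequential throughput implied by a per-sample latency."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    for precision, width in BYTES_PER_WEIGHT.items():
        size = weight_storage_kib(NUM_PARAMETERS, width)
        print(f"{precision}: {size:.1f} KiB of weights")
    rate = samples_per_second(LATENCY_MS_PER_SAMPLE)
    print(f"throughput: about {rate:,.0f} samples per second")
```

Under the float32 assumption the weights occupy roughly 115 KiB, which counts only the parameters and excludes activations, buffers, and runtime overhead.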