一种基于篡改训练数据的词级文本后门攻击方法

doi:10.12399/j.issn.2097-163x.2022.01.008

首页 > 过刊浏览>2022年第卷第1期 >81-89. DOI:10.12399/j.issn.2097-163x.2022.01.008

一种基于篡改训练数据的词级文本后门攻击方法
DOI:
                        10.12399/j.issn.2097-163x.2022.01.008
                    
作者:
                        
                        
                    
作者单位:
作者简介:邵堃(1994—),男,博士研究生,研究方向为信息对抗;
通讯作者:
中图分类号:TP311
基金项目:

A word-level textual backdoor attack method based on tampering with training data

Author:

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献

相似文献

引证文献

资源附件

文章评论

摘要:

后门攻击是针对深度神经网络模型的一种隐蔽安全威胁,在智能信息系统安全性测试等方面具有重要的研究价值。现有的字符级后门攻击存在两方面的问题:当被毒化的训练样本的源标签与目标标签一致时,后门攻击的效果不佳;插入的触发器与上下文相关性不强,会破坏原始输入的语义和流畅性。为了解决上述问题,提出了一种基于篡改训练数据的词级文本后门攻击方法。通过对抗扰动技术或隐藏重要词技术篡改少部分训练数据,使目标模型更容易学习到后门特征;在触发器的生成和添加部分,利用义原库向被攻击句子中添加相关性强的触发器。在标签一致的条件下,通过在2个基准模型上的大量实验,证明了所提出的攻击可以达到90%以上的成功率,并能生成更高质量的后门示例,其性能明显优于基线方法。

Abstract:

As a kind of insidious security threat against deep neural network models, research on backdoor attacks has great values in the security testing of intelligent information systems. The existing word-level backdoor attacks have two problems: Backdoor attacks do not work well when the source labels of the poisoned training samples are consistent with the target labels; The inserted triggers are context-free, so that the semantics and fluency of the original inputs may be destroyed. To solve the above problems, a word-level text backdoor attack method was proposed through tampering with training data. Firstly, a few training samples were tampered by the adversarial perturbation (AD) technique or hiding important words (HIW) technique to make the target model learn the backdoor features more easily; Secondly, the sememe library was used to add highly relevant triggers to the attacked sentences. Through extensive experiments on two benchmarks under the label-consistent condition, the proposed attack achieved more than 90% attack success rate, and generated backdoor examples with higher quality, which were obviously better than the baselines approach.

参考文献

相似文献

引证文献

引用本文

邵堃,杨俊安.一种基于篡改训练数据的词级文本后门攻击方法[J].信息对抗技术,2022,(1):81-89.

复制

文章指标

点击次数:
下载次数:
HTML阅读次数:
引用次数:

历史

收稿日期:
最后修改日期:
录用日期:
在线发布日期: 2022-07-11
出版日期:

首页

期刊介绍

作者指南

编委会

出版道德声明

学术诚信承诺

开放获取声明

联系我们

Rss

AI检索

引用本文

分享

文章指标

历史

文章二维码