一种基于篡改训练数据的词级文本后门攻击方法
作者:
作者单位:

作者简介:

邵堃(1994—),男,博士研究生,研究方向为信息对抗;

通讯作者:

中图分类号:

TP311

基金项目:


A word-level textual backdoor attack method based on tampering with training data
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    后门攻击是针对深度神经网络模型的一种隐蔽安全威胁,在智能信息系统安全性测试等方面具有重要的研究价值。现有的字符级后门攻击存在两方面的问题:当被毒化的训练样本的源标签与目标标签一致时,后门攻击的效果不佳;插入的触发器与上下文相关性不强,会破坏原始输入的语义和流畅性。为了解决上述问题,提出了一种基于篡改训练数据的词级文本后门攻击方法。通过对抗扰动技术或隐藏重要词技术篡改少部分训练数据,使目标模型更容易学习到后门特征;在触发器的生成和添加部分,利用义原库向被攻击句子中添加相关性强的触发器。在标签一致的条件下,通过在2个基准模型上的大量实验,证明了所提出的攻击可以达到90%以上的成功率,并能生成更高质量的后门示例,其性能明显优于基线方法。

    Abstract:

    As a kind of insidious security threat against deep neural network models, research on backdoor attacks has great values in the security testing of intelligent information systems. The existing word-level backdoor attacks have two problems: Backdoor attacks do not work well when the source labels of the poisoned training samples are consistent with the target labels; The inserted triggers are context-free, so that the semantics and fluency of the original inputs may be destroyed. To solve the above problems, a word-level text backdoor attack method was proposed through tampering with training data. Firstly, a few training samples were tampered by the adversarial perturbation (AD) technique or hiding important words (HIW) technique to make the target model learn the backdoor features more easily; Secondly, the sememe library was used to add highly relevant triggers to the attacked sentences. Through extensive experiments on two benchmarks under the label-consistent condition, the proposed attack achieved more than 90% attack success rate, and generated backdoor examples with higher quality, which were obviously better than the baselines approach.

    参考文献
    相似文献
    引证文献
引用本文
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:
  • 最后修改日期:
  • 录用日期:
  • 在线发布日期: 2022-07-11
  • 出版日期: