Textual Adversarial Attack Tool: Trying Out OpenAttack

aipwn

Oct 07, 2018

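Yesterday, the Natural Language Processing and Computational Social Science Lab at Tsinghua University (THUNLP) open-sourced a toolkit for textual adversarial attacks called OpenAttack.

Official introduction: OpenAttack: A Textual Adversarial Attack Toolkit

Project URL: https://github.com/thunlp/OpenAttack

I tried it out today; here are some notes on how to use it.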

Installation

Installing from source is recommended:

git clone https://github.com/thunlp/OpenAttack.git
cd OpenAttack
python setup.py install

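Running the attack demo

The project ships a demo file so that everyone can try out an attack.

Running it shows that it depends on NLTK and NLTK's add-on data packages, which are notoriously large and hard to download. What can be done about that?

Besides going through a proxy, you can download the data packages manually as follows.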

import nltk
nltk.download()

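Note the Download Directory shown at the bottom of the downloader window, then go to

https://github.com/nltk/nltk_data/tree/gh-pages/packages

and download the packages folder to that location on your machine, renaming it to nltk_data. You can download only the packages you need (this demo needs vader_lexicon), as long as the directory structure stays unchanged.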
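Before running the demo, it can help to confirm that the manually downloaded files landed where NLTK looks for them. The helper below is only a sketch using the standard library: the function name missing_nltk_packages and the REQUIRED list are my own, and the relative path assumes vader_lexicon sits in the sentiment category, as it does in the nltk_data repository.

```python
from pathlib import Path
import tempfile


def missing_nltk_packages(nltk_data_dir, required):
    """Return the entries of `required` that are absent under nltk_data_dir.

    Each entry is a path relative to the nltk_data directory,
    for example "sentiment/vader_lexicon.zip".
    """
    root = Path(nltk_data_dir)
    return [rel for rel in required if not (root / rel).exists()]


# Assumption: the demo only needs vader_lexicon, stored under "sentiment".
REQUIRED = ["sentiment/vader_lexicon.zip"]

if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real NLTK install.
    with tempfile.TemporaryDirectory() as tmp:
        print(missing_nltk_packages(tmp, REQUIRED))  # nothing downloaded yet
        (Path(tmp) / "sentiment").mkdir()
        (Path(tmp) / "sentiment" / "vader_lexicon.zip").touch()
        print(missing_nltk_packages(tmp, REQUIRED))  # now complete
```

In practice you would pass the Download Directory noted earlier as nltk_data_dir; an empty list means the demo should find everything it needs.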

With the data downloaded, let's look at the demo code.

import OpenAttack
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import numpy as np
from tqdm import tqdm

def make_model():
    class MyClassifier(OpenAttack.Classifier):
        def __init__(self):
            try:
                self.model = SentimentIntensityAnalyzer()
            except LookupError:
                nltk.download('vader_lexicon')
                self.model = SentimentIntensityAnalyzer()

        def get_prob(self, input_):
            ret = []
            for sent in input_:
                res = self.model.polarity_scores(sent)
                prob = (res["pos"] + 1e-6) / (res["neg"] + res["pos"] + 1e-6)
                ret.append(np.array([1 - prob, prob]))
            return np.array(ret)
    return MyClassifier()

def main():
    print("New Attacker")
    attacker = OpenAttack.attackers.PWWSAttacker()
    print("Build model")
    clsf = make_model()
    dataset = OpenAttack.DataManager.loadDataset("SST.sample")[:1000]
    print("Start attack")
    options = {
        "success_rate": True,
        "fluency": False,
        "mistake": False,
        "semantic": False,
        "levenstein": True,
        "word_distance": False,
        "modification_rate": True,
        "running_time": True,
        "invoke_limit": 500,
        "average_invoke": True
    }
    attack_eval = OpenAttack.attack_evals.InvokeLimitedAttackEval(attacker, clsf, **options)
    attack_eval.eval(dataset, visualize=True)

if __name__ == "__main__":
    main()
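To see what get_prob in the demo actually hands to the attacker, here is the same formula pulled out into a standalone function and fed a few hand-picked scores. The function name to_two_class and the sample inputs are mine; only the arithmetic is taken from the demo.

```python
def to_two_class(pos, neg, eps=1e-6):
    """Mirror the demo's formula and return [P(negative), P(positive)]."""
    prob = (pos + eps) / (neg + pos + eps)
    return [1 - prob, prob]


# Hand-picked scores standing in for VADER's "pos" and "neg" outputs.
print(to_two_class(0.5, 0.0))  # only positive words found
print(to_two_class(0.0, 0.5))  # only negative words found
print(to_two_class(0.0, 0.0))  # no sentiment words found at all
```

One detail worth noticing: when both scores are zero, the numerator and denominator are both eps, so prob comes out as 1 and a sentence with no sentiment words is labeled positive.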

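First, change the 10 on line 34 of the demo file to 1000 so that the attack runs on 1,000 samples (the listing above already includes this change), then run the test: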

dataset = OpenAttack.DataManager.loadDataset("SST.sample")[:1000]

Result:

+===========================================+
|                  Summary                  |
+===========================================+
| Total Attacked Instances:       | 1000    |
| Successful Instances:           | 744     |
| Attack Success Rate:            | 0.744   |
| Avg. Levenshtein Edit Distance: | 14.816  |
| Avg. Word Modif. Rate:          | 0.16719 |
| Avg. Victim Model Queries:      | 139.27  |
| Avg. Running Time:              | 0.10569 |
+===========================================+

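The attack success rate is 74.4%. Looking at the code, the victim is a sentiment classifier built on NLTK's VADER analyzer, which is lexicon- and rule-based: it loads vader_lexicon rather than being trained on it. The demo then attacks this classifier with the PWWS attack model.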
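For readers who want to sanity-check numbers like the ones in the summary, the sketch below computes three of the metrics by hand. The definitions here are my assumptions, not taken from OpenAttack's source: in particular, I treat the word modification rate as token-level edit distance divided by the original token count, and the library may tokenize or normalize differently.

```python
def success_rate(successes, total):
    """Fraction of attacked instances whose label was flipped."""
    return successes / total


def levenshtein(a, b):
    """Edit distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute x with y
        prev = cur
    return prev[-1]


def word_modification_rate(original, adversarial):
    """Assumed definition: token-level edit distance / original length."""
    orig_tokens = original.split()
    adv_tokens = adversarial.split()
    return levenshtein(orig_tokens, adv_tokens) / max(len(orig_tokens), 1)


print(success_rate(744, 1000))
print(levenshtein("kitten", "sitting"))
print(word_modification_rate("a truly great movie", "a truly fine movie"))
```

The first call reproduces the 0.744 from the run: 744 successful instances out of 1000 attacked.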

OpenAttack supports 13 attack models.

Of these, PWWS is mainly based on greedy word substitution.
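To make "greedy word substitution" concrete, here is a toy version of the idea. It is deliberately much simpler than PWWS itself, which also weights candidate words by their saliency; the classifier, the word scores, and the synonym table below are all invented for illustration and have nothing to do with OpenAttack's internals.

```python
# Toy lexicon and synonym table, invented purely for this example.
POSITIVE = {"great": 0.9, "good": 0.6, "fine": 0.2, "decent": 0.1}
NEGATIVE = {"bad": 0.7, "dull": 0.5}
SYNONYMS = {"great": ["good", "fine"], "good": ["fine", "decent"]}


def positive_prob(tokens):
    """Toy victim model: share of positive weight among sentiment words."""
    pos = sum(POSITIVE.get(t, 0.0) for t in tokens)
    neg = sum(NEGATIVE.get(t, 0.0) for t in tokens)
    return (pos + 1e-6) / (pos + neg + 2e-6)


def greedy_attack(sentence, max_steps=10):
    """Repeatedly apply the one substitution that lowers P(positive) most.

    Returns the adversarial sentence, or None if the label never flips.
    """
    tokens = sentence.split()
    for _ in range(max_steps):
        current = positive_prob(tokens)
        if current < 0.5:
            return " ".join(tokens)  # prediction is now negative
        best = None
        for i, tok in enumerate(tokens):
            for cand in SYNONYMS.get(tok, []):
                trial = tokens[:i] + [cand] + tokens[i + 1:]
                p = positive_prob(trial)
                if best is None or p < best[0]:
                    best = (p, trial)
        if best is None or best[0] >= current:
            return None  # no substitution helps any further
        tokens = best[1]
    return " ".join(tokens) if positive_prob(tokens) < 0.5 else None


print(greedy_attack("a great but dull movie"))
print(greedy_attack("a great movie"))
```

The first sentence flips once "great" is swapped for a weaker synonym; the second has no negative words to tip the balance, so the search runs out of substitutions and gives up. Every call to positive_prob here corresponds to one victim-model query, which is what the query count in an evaluation summary measures.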

Now let's look at a few examples of successful attacks:

Synonym substitution

An example of a failed attack:

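In my earlier post, Competition Analysis (1): Adversarial Attacks on Text Classification, I mentioned that attacks on the original samples are almost all based on character- and word-level perturbations. OpenAttack's architecture reflects this: its TextProcessor module provides text preprocessing functions such as tokenization, lemmatization, word sense disambiguation, and named entity recognition, while its Substitute module collects the various word- and character-level substitution methods.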