Privacy preservation experiment based on random masking and adversarial training for text representation

WU Zhouting; LUO Senlin

doi:10.16791/j.cnki.sjg.2023.08.011

2023, Vol. 40, No. 08, pp. 72-76

1. School of Information and Electronics, Beijing Institute of Technology

Foundation: National 242 Information Security Program (2019A021, 2020A065)

Received: 2023-03-31

Final review: 2023-04-12

Published: 2023-09-15

Published online: 2023-09-15

Abstract:

Privacy protection for deep-learning-based text representations struggles to balance utility against privacy. To address this, the paper proposes RMAT, a privacy-preservation algorithm for text representations based on random masking and adversarial training. The algorithm first randomly masks the original input text sequence, then injects differential-privacy noise, and finally uses adversarial training between a simulated attacker and the task classifier to desensitize the deep-learning text representation. A theoretical derivation proves that the algorithm satisfies differential privacy, and experimental results on five public datasets verify that it improves the utility of the desensitized text while providing a complete privacy guarantee. Through this experiment, students gain a clearer understanding of the security risks faced by deep-learning text representation models and improve their ability to analyze and solve security problems with deep learning methods.

Keywords: privacy security; text representation; differential privacy; adversarial training
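The first two steps named in the abstract, random masking and differential-privacy noise injection, can be illustrated with a minimal sketch. This is not the authors' RMAT implementation: the mask token, the function names, the clipping bound, and the choice of a Laplace mechanism calibrated to the L1 sensitivity of a clipped vector are all assumptions made for illustration, and the adversarial training stage is omitted.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def random_mask(tokens, mask_prob, rng):
    """Replace each token with MASK independently with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else t for t in tokens]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(vector, epsilon, clip, rng):
    """Clip each coordinate to [-clip, clip], then add Laplace noise.

    After clipping, any two vectors differ by at most 2 * clip * d in L1
    norm, so a per-coordinate noise scale of (2 * clip * d) / epsilon
    gives the standard epsilon-DP Laplace mechanism (assumed here; the
    paper's own noise calibration is not stated in the abstract).
    """
    clipped = [max(-clip, min(clip, x)) for x in vector]
    sensitivity = 2.0 * clip * len(clipped)
    scale = sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in clipped]


rng = random.Random(0)
tokens = "the patient was treated in beijing last week".split()
masked = random_mask(tokens, mask_prob=0.3, rng=rng)
noisy = privatize([0.2, -1.7, 0.9], epsilon=1.0, clip=1.0, rng=rng)
```

In a full pipeline the masked sequence would be fed to a text encoder, and the noise would be added to the encoder output rather than to a hand-written vector. A smaller `epsilon` gives a larger noise scale, which is the privacy-utility trade-off the abstract refers to.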

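Basic information:

CLC number: TP391.1

Citation:

WU Zhouting, LUO Senlin. Privacy preservation experiment based on random masking and adversarial training for text representation[J]. Experimental Technology and Management, 2023, 40(08): 72-76. DOI: 10.16791/j.cnki.sjg.2023.08.011.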
