Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness
Knowledge sharing about emerging threats is crucial in the rapidly advancing field of cybersecurity and forms the foundation of Cyber Threat Intelligence (CTI). In this context, Large Language Models are becoming increasingly significant in cybersecurity, presenting a wide range of opportunities. This study surveys the performance of the ChatGPT, GPT4all, Dolly, Stanford Alpaca, Alpaca-LoRA, Falcon, and Vicuna chatbots in binary classification and Named Entity Recognition (NER) tasks performed using Open Source INTelligence (OSINT). We utilize well-established data collected from Twitter in previous research to assess the competitiveness of these chatbots compared to specialized models trained for those tasks. In the binary classification experiments, the commercial GPT-4 chatbot achieved an acceptable F1 score of 0.94, and the open-source GPT4all model achieved an F1 score of 0.90. For cybersecurity entity recognition, however, all evaluated chatbots show limitations and are less effective than specialized models. This study demonstrates the capability of chatbots for OSINT binary classification and shows that they require further improvement in NER before they can effectively replace specially trained models. Our results shed light on the limitations of LLM chatbots compared to specialized models and can help researchers improve chatbot technology, with the objective of reducing the effort required to integrate machine learning into OSINT-based CTI tools.
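To make the evaluation setup concrete, the sketch below shows how a tweet can be cast as a yes/no classification prompt and how predictions can be scored with F1. The prompt wording and the query_chatbot callable are illustrative assumptions, not the exact prompts or APIs used in the study.

# Minimal sketch of OSINT binary classification with a chatbot (illustrative only).
# query_chatbot is a hypothetical callable standing in for whichever API or local model is used.
def build_prompt(tweet: str) -> str:
    return ("You are a cyber threat intelligence analyst. Answer with exactly one word, YES or NO.\n"
            f"Does the following tweet describe a cybersecurity threat?\nTweet: {tweet}")

def classify(tweet: str, query_chatbot) -> int:
    answer = query_chatbot(build_prompt(tweet)).strip().upper()
    return 1 if answer.startswith("YES") else 0

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0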
Updated: 2024-03-13 23:51:13
Categories: cs.CR,cs.CL,cs.LG
Efficiently Computing Similarities to Private Datasets
Many methods in differentially private model training rely on computing the similarity between a query point (such as public or synthetic data) and private data. We abstract out this common subroutine and study the following fundamental algorithmic problem: Given a similarity function $f$ and a large high-dimensional private dataset $X \subset \mathbb{R}^d$, output a differentially private (DP) data structure which approximates $\sum_{x \in X} f(x,y)$ for any query $y$. We consider the cases where $f$ is a kernel function, such as $f(x,y) = e^{-\|x-y\|_2^2/\sigma^2}$ (also known as DP kernel density estimation), or a distance function such as $f(x,y) = \|x-y\|_2$, among others. Our theoretical results improve upon prior work and give better privacy-utility trade-offs as well as faster query times for a wide range of kernels and distance functions. The unifying approach behind our results is leveraging `low-dimensional structures' present in the specific functions $f$ that we study, using tools such as provable dimensionality reduction, approximation theory, and one-dimensional decomposition of the functions. Our algorithms empirically exhibit improved query times and accuracy over prior state of the art. We also present an application to DP classification. Our experiments demonstrate that the simple methodology of classifying based on average similarity is orders of magnitude faster than prior DP-SGD based approaches for comparable accuracy.
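As a point of reference, the minimal sketch below shows the naive baseline for the Gaussian-kernel case: because each point contributes at most 1 to the sum, a single query can be answered with Laplace noise of scale 1/epsilon for pure epsilon-DP (composition applies per query). The paper's contribution is a data structure that improves on this kind of baseline; the code only illustrates the problem setup.

import numpy as np

def dp_kernel_sum(X, y, sigma=1.0, eps=0.5, rng=None):
    """Naive eps-DP estimate of sum_x exp(-||x - y||^2 / sigma^2).
    The kernel is bounded by 1, so the add/remove sensitivity is 1 and
    Laplace(1/eps) noise suffices for one query. This is a baseline,
    not the data structure proposed in the paper."""
    rng = np.random.default_rng(rng)
    d2 = np.sum((X - y) ** 2, axis=1)
    exact = float(np.exp(-d2 / sigma ** 2).sum())
    return exact + rng.laplace(scale=1.0 / eps)

X = np.random.randn(10_000, 32)        # stand-in for the private dataset
print(dp_kernel_sum(X, np.zeros(32)))  # noisy kernel sum for query y = 0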
Updated: 2024-03-13 19:19:19
Categories: cs.CR,cs.DS,cs.LG
Acoustic Side Channel Attack on Keyboards Based on Typing Patterns
Acoustic side-channel attacks on keyboards can bypass security measures in many systems that use keyboards as one of their input devices. These attacks aim to reveal users' sensitive information by targeting the sounds their keyboards make as they type. Most existing approaches in this field ignore the negative impact of typing patterns and environmental noise on their results. This paper addresses these shortcomings by proposing a practical method that takes the user's typing pattern into account in a realistic environment. Our method achieved an average success rate of 43% across all our case studies under real-world conditions.
Updated: 2024-03-13 17:44:15
Categories: cs.CR
Randomized Kaczmarz in Adversarial Distributed Setting
Developing large-scale distributed methods that are robust to the presence of adversarial or corrupted workers is an important part of making such methods practical for real-world problems. In this paper, we propose an adversary-tolerant iterative approach for convex optimization problems. By leveraging simple statistics, our method ensures convergence and is capable of adapting to adversarial distributions. Through simulations with adversaries present, we demonstrate the efficiency of our approach for solving convex problems, its ability to identify adversarial workers with high accuracy, and its tolerance of varying adversary rates.
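As a rough illustration of the setting, the sketch below runs randomized Kaczmarz on a consistent linear system where each worker proposes one row-projection step and a few workers are adversarial; the server aggregates proposals with a coordinate-wise median. The median is only a stand-in for the "simple statistics" mentioned above, and the paper's actual aggregation may differ.

import numpy as np

def robust_kaczmarz(A, b, n_workers=10, n_byzantine=2, iters=2000, seed=0):
    """Randomized Kaczmarz where each worker proposes one row update and the
    server aggregates with a coordinate-wise median (illustrative robustness
    statistic only)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        proposals = []
        for w in range(n_workers):
            i = rng.integers(m)
            a_i = A[i]
            step = (b[i] - a_i @ x) / (a_i @ a_i) * a_i   # classic Kaczmarz projection
            if w < n_byzantine:                           # adversarial worker corrupts its update
                step = rng.normal(scale=10.0, size=n)
            proposals.append(x + step)
        x = np.median(np.stack(proposals), axis=0)
    return x

A = np.random.default_rng(1).normal(size=(200, 20))
x_true = np.random.default_rng(2).normal(size=20)
x_hat = robust_kaczmarz(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true))   # residual error despite adversarial workers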
Updated: 2024-03-13 17:11:20
Categories: math.OC,cs.CR,cs.LG,cs.NA,math.NA,65F20, 65F10, 65K10
Physical Memory Attacks and a Memory Safe Management System for Memory Defense
Programming errors, defective hardware components (such as hard disk spindle defects), and environmental hazards can lead to invalid memory operations. In addition, less predictable forms of environmental stress, such as radiation, thermal influence, and energy fluctuations, can induce hardware faults. Sometimes a soft error, such as a bit-flip, occurs instead of a complete failure. The 'natural' factors that can cause bit-flips are replicable through targeted attacks that result in significant compromises, including full privileged system access. Existing physical defense solutions have consistently been circumvented shortly after deployment. We explore the concept of a novel software-based low-level layer that can protect vulnerable memory targeted by physical attack vectors related to bit-flip vulnerabilities.
Updated: 2024-03-13 16:10:04
Categories: cs.CR,cs.OS
Tight Group-Level DP Guarantees for DP-SGD with Sampling via Mixture of Gaussians Mechanisms
We give a procedure for computing group-level $(\epsilon, \delta)$-DP guarantees for DP-SGD, when using Poisson sampling or fixed batch size sampling. Up to discretization errors in the implementation, the DP guarantees computed by this procedure are tight (assuming we release every intermediate iterate).
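For intuition, the sketch below numerically evaluates a per-step delta(epsilon) as the hockey-stick divergence between a standard Gaussian and a binomial mixture of shifted Gaussians, which is one way to model removing a group of k examples under Poisson sampling with rate q and noise sigma. This reading of the mechanism, the integration bounds, and the omission of composition across steps and of the add/remove direction are simplifying assumptions of ours, not the paper's procedure.

import numpy as np
from scipy.stats import norm, binom
from scipy.integrate import quad

def group_delta(eps, sigma=1.0, q=0.01, k=4):
    """delta(eps) for one step, computed as the hockey-stick divergence between
    P = N(0, sigma^2) and Q = sum_i Binom(k, q)(i) * N(i, sigma^2)."""
    weights = binom.pmf(np.arange(k + 1), k, q)
    def p(x):
        return norm.pdf(x, 0.0, sigma)
    def qdens(x):
        return float(np.dot(weights, norm.pdf(x, np.arange(k + 1), sigma)))
    integrand = lambda x: max(qdens(x) - np.exp(eps) * p(x), 0.0)
    delta, _ = quad(integrand, -10 * sigma, 10 * sigma + k, limit=200)
    return delta

print(group_delta(eps=1.0))   # small delta for a single noisy step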
Updated: 2024-03-13 16:08:01
Categories: cs.CR,cs.LG
Continual Adversarial Defense
In response to adversarial attacks against visual classifiers that evolve on a monthly basis, numerous defenses have been proposed to generalize against as many known attacks as possible. However, designing a defense method that generalizes to all types of attacks is not realistic, because the environment in which defense systems operate is dynamic and comprises various unique attacks that emerge over time. The defense system must gather online few-shot defense feedback to promptly enhance itself while using memory efficiently. Therefore, we propose the first continual adversarial defense (CAD) framework, which adapts to any attack in a dynamic scenario where various attacks emerge stage by stage. In practice, CAD is modeled under four principles: (1) continual adaptation to new attacks without catastrophic forgetting, (2) few-shot adaptation, (3) memory-efficient adaptation, and (4) high accuracy on both clean and adversarial images. We explore and integrate cutting-edge continual learning, few-shot learning, and ensemble learning techniques to satisfy these principles. Experiments conducted on CIFAR-10 and ImageNet-100 validate the effectiveness of our approach against multiple stages of modern adversarial attacks and demonstrate significant improvements over numerous baseline methods. In particular, CAD is capable of adapting quickly with minimal feedback and a low cost of defense failure, while maintaining good performance against previous attacks. Our research sheds light on a brand-new paradigm for continual defense adaptation against dynamic and evolving attacks.
Updated: 2024-03-13 15:24:19
Categories: cs.CV,cs.CR,cs.LG
Dr. Jekyll and Mr. Hyde: Two Faces of LLMs
Only a year ago, we witnessed a rise in the use of Large Language Models (LLMs), especially when combined with applications like chatbot assistants. Safety mechanisms and specialized training procedures are implemented to prevent improper responses from these assistants. In this work, we bypass these measures for ChatGPT and Bard (and, to some extent, Bing chat) by making them impersonate complex personas with characteristics opposite to those of the truthful assistants they are supposed to be. We start by creating elaborate biographies of these personas, which we then use in a new session with the same chatbots. Our conversations follow a role-play style to elicit the responses the assistant is not allowed to provide. By making use of personas, we show that prohibited responses are actually provided, making it possible to obtain unauthorized, illegal, or harmful information. This work shows that by using adversarial personas, one can overcome the safety mechanisms set out by ChatGPT and Bard. We also introduce several ways of activating such adversarial personas, which together show that both chatbots are vulnerable to this kind of attack. Using the same principle, we introduce two defenses that push the model to interpret trustworthy personalities and make it more robust against such attacks.
Updated: 2024-03-13 14:52:47
Categories: cs.CR,cs.LG
A Sophisticated Framework for the Accurate Detection of Phishing Websites
Phishing is an increasingly sophisticated form of cyberattack that inflicts huge financial damage on corporations throughout the globe while also jeopardizing individuals' privacy. Attackers are constantly devising new methods of launching such assaults, and detecting them has become a daunting task. Many different techniques have been suggested, each with its own pros and cons. While machine learning-based techniques have been most successful in identifying such attacks, they continue to fall short in terms of performance and generalizability. This paper proposes a comprehensive methodology for detecting phishing websites. The goal is to design a system that is capable of accurately distinguishing phishing websites from legitimate ones and that provides generalized performance over a broad variety of datasets. A combination of feature selection, a greedy algorithm, cross-validation, and deep learning methods has been utilized to construct a sophisticated stacking ensemble classifier. Extensive experimentation on four different phishing datasets was conducted to evaluate the performance of the proposed technique. The proposed algorithm outperformed the other existing phishing detection models, obtaining accuracies of 97.49%, 98.23%, 97.48%, and 98.20% on dataset-1 (UCI Phishing Websites Dataset), dataset-2 (Phishing Dataset for Machine Learning: Feature Evaluation), dataset-3 (Phishing Websites Dataset), and dataset-4 (Web page phishing detection), respectively. The high accuracy values obtained across all datasets imply the models' generalizability and effectiveness in the accurate identification of phishing websites.
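As a rough illustration of the stacking idea (not the paper's exact pipeline), here is a minimal scikit-learn sketch that combines feature selection with a cross-validated stacking ensemble; the base learners, the number of selected features, and the synthetic data are assumptions for demonstration only.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Stand-in data; in practice the URL/HTML features of the phishing datasets are used.
X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

stack = make_pipeline(
    SelectKBest(mutual_info_classif, k=20),             # feature selection stage
    StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
            ("mlp", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,                                            # cross-validated meta-features
    ),
)
stack.fit(X_tr, y_tr)
print("accuracy:", stack.score(X_te, y_te))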
Updated: 2024-03-13 14:26:25
Categories: cs.CR,cs.AI
An Extended View on Measuring Tor AS-level Adversaries
Tor provides anonymity to millions of users around the globe, which has made it a valuable target for malicious actors. As a low-latency anonymity system, it is vulnerable to traffic correlation attacks from strong passive adversaries such as large autonomous systems (ASes). In preliminary work, we developed a measurement approach utilizing the RIPE Atlas framework -- a network of more than 11,000 probes worldwide -- to infer the risk of deanonymization for IPv4 clients in Germany and the US. In this paper, we apply our methodology to additional scenarios, providing a broader picture of the potential for deanonymization in the Tor network. In particular, we (a) repeat our earlier (2020) measurements in 2022 to observe changes over time, (b) adapt our approach to IPv6 to analyze the risk of deanonymization when using this next-generation Internet protocol, and (c) investigate the current situation in Russia, where censorship has been intensified since the beginning of Russia's full-scale invasion of Ukraine. According to our results, Tor provides user anonymity at consistent quality: while individual numbers vary depending on client and destination, we were able to identify ASes with the potential to conduct deanonymization attacks. For clients in Germany and the US, however, the overall picture has not changed since 2020. In addition, the protocol (IPv4 vs. IPv6) does not significantly impact the risk of deanonymization. Russian users are able to securely evade censorship using Tor. Their general risk of deanonymization is, in fact, lower than in the other investigated countries. Beyond that, the few ASes with the potential to successfully perform deanonymization are operated by Western companies, further reducing the risk for Russian users.
Updated: 2024-03-13 13:27:02
Categories: cs.NI,cs.CR,cs.CY
MobileAtlas: Geographically Decoupled Measurements in Cellular Networks for Security and Privacy Research
Cellular networks are not merely data access networks to the Internet. Their distinct services and ability to form large, complex compounds for roaming purposes make them an attractive research target in their own right. Their promise of providing a consistent service with comparable privacy and security across roaming partners falls apart on close inspection. Thus, there is a need for controlled testbeds and measurement tools for cellular access networks that do justice to the technology's unique structure and global scope. In particular, such measurements suffer from a combinatorial explosion of operators, mobile plans, and services. To cope with these challenges, we built a framework that geographically decouples the SIM from the cellular modem by selectively connecting both remotely. This allows testing any subscriber with any operator at any modem location within minutes, without moving parts. The resulting GSM/UMTS/LTE measurement and testbed platform offers a controlled experimentation environment that is scalable and cost-effective. The platform is extensible and fully open-sourced, allowing other researchers to contribute locations, SIM cards, and measurement scripts. Using this framework, our international experiments in commercial networks revealed exploitable inconsistencies in traffic metering, leading to multiple phreaking opportunities, i.e., fare-dodging. We also expose problematic IPv6 firewall configurations and hidden SIM card communication to the home network, and we fingerprint dial progress tones to track victims across different roaming networks and countries via voice calls.
Updated: 2024-03-13 13:15:13
Categories: cs.NI,cs.CR
SoK: Reducing the Vulnerability of Fine-tuned Language Models to Membership Inference Attacks
Natural language processing models have experienced a significant upsurge in recent years, with numerous applications being built upon them. Many of these applications require fine-tuning generic base models on customized, proprietary datasets. This fine-tuning data is especially likely to contain personal or sensitive information about individuals, resulting in increased privacy risk. Membership inference attacks are the most commonly employed attack to assess the privacy leakage of a machine learning model. However, limited research is available on the factors that affect the vulnerability of language models to this kind of attack, or on the applicability of different defense strategies in the language domain. We provide the first systematic review of the vulnerability of fine-tuned large language models to membership inference attacks, the various factors that come into play, and the effectiveness of different defense strategies. We find that some training methods provide significantly reduced privacy risk, with the combination of differential privacy and low-rank adaptors achieving the best privacy protection against these attacks.
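For readers unfamiliar with the attack, the toy sketch below shows the simplest membership inference baseline in this literature: thresholding the model's per-example loss. The synthetic loss distributions are made up for illustration; the survey covers far stronger attacks and the defenses against them.

import numpy as np

def loss_threshold_mia(member_losses, nonmember_losses, threshold=None):
    """Simplest membership inference baseline: predict 'member' when the model's
    per-example loss falls below a threshold. Fine-tuning on small private sets
    typically lowers losses on training points, which is what the attack exploits."""
    losses = np.concatenate([member_losses, nonmember_losses])
    labels = np.concatenate([np.ones_like(member_losses), np.zeros_like(nonmember_losses)])
    if threshold is None:
        threshold = np.median(losses)
    preds = (losses < threshold).astype(int)
    return threshold, (preds == labels).mean()

# Toy illustration with synthetic loss values (not from a real model).
rng = np.random.default_rng(0)
t, acc = loss_threshold_mia(rng.gamma(2.0, 0.3, 1000), rng.gamma(2.0, 0.6, 1000))
print(f"threshold={t:.3f}, attack accuracy={acc:.3f}")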
Updated: 2024-03-13 12:46:51
Categories: cs.LG,cs.CR
The Philosopher's Stone: Trojaning Plugins of Large Language Models
Open-source Large Language Models (LLMs) have recently gained popularity because of their performance comparable to proprietary LLMs. To efficiently fulfill domain-specialized tasks, open-source LLMs can be refined, without expensive accelerators, using low-rank adapters. However, it is still unknown whether low-rank adapters can be exploited to control LLMs. To address this gap, we demonstrate that an infected adapter can induce, on specific triggers, an LLM to output content defined by an adversary and even to use tools maliciously. To train a Trojan adapter, we propose two novel attacks, POLISHED and FUSION, that improve over prior approaches. POLISHED uses LLM-enhanced paraphrasing to polish poisoned benchmark datasets. In contrast, in the absence of a dataset, FUSION leverages an over-poisoning procedure to transform a benign adapter. In our experiments, we first conduct two case studies to demonstrate that a compromised LLM agent can execute malware to control a system (e.g., an LLM-driven robot) or launch a spear-phishing attack. Then, in terms of targeted misinformation, we show that our attacks provide higher attack effectiveness than the baseline and, for the purpose of attracting downloads, preserve or improve the adapter's utility. Finally, we design and evaluate three potential defenses, yet none proved entirely effective in safeguarding against our attacks.
Updated: 2024-03-13 12:28:20
Categories: cs.CR
A Comparison of SynDiffix Multi-table versus Single-table Synthetic Data
SynDiffix is a new open-source tool for structured data synthesis. It has anonymization features that allow it to generate multiple synthetic tables while maintaining strong anonymity. Compared to the more common single-table approach, multi-table leads to more accurate data, since only the features of interest for a given analysis need be synthesized. This paper compares SynDiffix with 15 other commercial and academic synthetic data techniques using the SDNIST analysis framework, modified by us to accommodate multi-table synthetic data. The results show that SynDiffix is many times more accurate than other approaches for low-dimension tables, but somewhat worse than the best single-table techniques for high-dimension tables.
Updated: 2024-03-13 12:26:50
Categories: cs.CR
Tastle: Distract Large Language Models for Automatic Jailbreak Attack
Large language models (LLMs) have achieved significant advances recently. Extensive efforts have been made before the public release of LLMs to align their behaviors with human values. The primary goal of alignment is to ensure their helpfulness, honesty, and harmlessness. However, even meticulously aligned LLMs remain vulnerable to malicious manipulations such as jailbreaking, leading to unintended behaviors. A jailbreak intentionally develops a malicious prompt that escapes the LLM's security restrictions to produce uncensored, detrimental content. Previous works explore different jailbreak methods for red teaming LLMs, yet they encounter challenges regarding effectiveness and scalability. In this work, we propose Tastle, a novel black-box jailbreak framework for automated red teaming of LLMs. Motivated by research on the distractibility and over-confidence phenomena of LLMs, we design malicious content concealing and memory reframing with an iterative optimization algorithm to jailbreak LLMs. Extensive experiments on jailbreaking both open-source and proprietary LLMs demonstrate the superiority of our framework in terms of effectiveness, scalability, and transferability. We also evaluate the effectiveness of existing jailbreak defense methods against our attack and highlight the crucial need to develop more effective and practical defense strategies.
Updated: 2024-03-13 11:16:43
Categories: cs.CR,cs.AI,cs.CL
DONAPI: Malicious NPM Packages Detector using Behavior Sequence Knowledge Mapping
With the growing popularity of modularity in software development comes the rise of package managers and language ecosystems. Among them, npm stands out as the most extensive package manager, hosting more than 2 million third-party open-source packages that greatly simplify the process of building code. However, this openness also brings security risks, as evidenced by numerous package poisoning incidents. In this paper, we synchronize a local package cache containing more than 3.4 million packages in near real-time, giving us access to more package code details. Further, we perform manual inspection and API call sequence analysis on packages collected from public datasets and security reports to build a hierarchical classification framework and a behavioral knowledge base covering different sensitive behaviors. In addition, we propose DONAPI, an automatic malicious npm package detector that combines static and dynamic analysis. It makes preliminary judgments on the degree of maliciousness of packages using code reconstruction techniques and static analysis, extracts dynamic API call sequences to confirm and identify obfuscated content that static analysis cannot handle alone, and finally tags malicious software packages based on the constructed behavior knowledge base. To date, we have identified and manually confirmed 325 malicious samples and discovered 2 unusual API calls and 246 API call sequences that have not appeared in known samples.
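To illustrate the knowledge-base tagging stage, the toy sketch below matches an extracted API call sequence against a few hand-written sensitive behavior patterns; the patterns, labels, and call names are invented for illustration and are not DONAPI's actual knowledge base.

# Illustrative sketch: compare a package's extracted API call sequence against a
# small behavior knowledge base of sensitive patterns (ordered subsequences).
BEHAVIOR_KB = {
    ("os.homedir", "fs.readFileSync", "https.request"): "exfiltrate local files",
    ("child_process.exec", "https.request"): "remote command execution",
    ("fs.writeFileSync", "child_process.spawn"): "drop and run payload",
}

def contains_subsequence(calls, pattern):
    it = iter(calls)
    return all(any(c == p for c in it) for p in pattern)

def tag_package(api_calls):
    return [label for pattern, label in BEHAVIOR_KB.items()
            if contains_subsequence(api_calls, pattern)]

observed = ["path.join", "os.homedir", "fs.readFileSync", "zlib.gzipSync", "https.request"]
print(tag_package(observed))   # ['exfiltrate local files']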
Updated: 2024-03-13 08:38:21
Categories: cs.CR
Universal Neural-Cracking-Machines: Self-Configurable Password Models from Auxiliary Data
We introduce the concept of a "universal password model" -- a password model that, once pre-trained, can automatically adapt its guessing strategy based on the target system. To achieve this, the model does not need to access any plaintext passwords from the target credentials. Instead, it exploits users' auxiliary information, such as email addresses, as a proxy signal to predict the underlying password distribution. Specifically, the model uses deep learning to capture the correlation between the auxiliary data of a group of users (e.g., users of a web application) and their passwords. It then exploits those patterns to create a tailored password model for the target system at inference time. No further training steps, targeted data collection, or prior knowledge of the community's password distribution is required. Besides improving over current password strength estimation techniques and attacks, the model enables any end-user (e.g., a system administrator) to autonomously generate a tailored password model for their system without the often unworkable requirements of collecting suitable training data and fitting the underlying machine learning model. Ultimately, our framework enables the democratization of well-calibrated password models across the community, addressing a major challenge in deploying password security solutions at scale.
Updated: 2024-03-13 08:02:51
Categories: cs.CR,cs.LG
TSFool: Crafting Highly-Imperceptible Adversarial Time Series through Multi-Objective Attack
Recent years have witnessed the success of recurrent neural network (RNN) models in time series classification (TSC). However, neural networks (NNs) are vulnerable to adversarial samples, which cause real-life adversarial attacks that undermine the robustness of AI models. To date, most existing attacks target feed-forward NNs and image recognition tasks, but they do not perform well on RNN-based TSC. This is due to the cyclical computation of RNNs, which prevents direct model differentiation. In addition, the high visual sensitivity of time series to perturbations also poses challenges to the local objective optimization of adversarial samples. In this paper, we propose an efficient method called TSFool to craft highly imperceptible adversarial time series for RNN-based TSC. The core idea is a new global optimization objective known as the "Camouflage Coefficient", which captures the imperceptibility of adversarial samples from the class distribution. Based on this, we reduce the adversarial attack problem to a multi-objective optimization problem that enhances the perturbation quality. Furthermore, to speed up the optimization process, we propose using a representation model for the RNN to capture deeply embedded vulnerable samples whose features deviate from the latent manifold. Experiments on 11 UCR and UEA datasets show that TSFool significantly outperforms six white-box and three black-box benchmark attacks in terms of effectiveness, efficiency, and imperceptibility from various perspectives, including standard measures, a human study, and real-world defenses.
Updated: 2024-03-13 07:50:44
Categories: cs.LG,cs.CR,I.2.0; I.5.0
SNOW-SCA: ML-assisted Side-Channel Attack on SNOW-V
This paper presents SNOW-SCA, the first power side-channel analysis (SCA) attack on SNOW-V, a 5G mobile communication security standard candidate, running on a 32-bit ARM Cortex-M4 microcontroller. First, we perform a generic known-key correlation (KKC) analysis to identify the leakage points. Next, a correlation power analysis (CPA) attack is performed, which reduces the attack complexity to two key guesses for each key byte. The correct secret key is then uniquely identified using linear discriminant analysis (LDA). The profiled SCA attack with LDA achieves 100% accuracy after training with $<200$ traces, which means the attack succeeds with just a single trace. Overall, using the combined CPA and LDA attack model, the correct secret key byte is recovered with <50 traces collected using the ChipWhisperer platform. The entire 256-bit secret key of SNOW-V can be recovered incrementally using the proposed SCA attack. Finally, we suggest low-overhead countermeasures that can be used to prevent these SCA attacks.
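For context, the sketch below shows the generic first-order CPA loop such attacks build on: correlate a Hamming-weight hypothesis with the measured traces for every key-byte guess. The targeted intermediate for SNOW-V is left as an abstract callable, and the KKC and LDA stages of the paper are not reproduced here.

import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])   # Hamming-weight leakage model

def cpa_rank_key_byte(traces, known_inputs, intermediate):
    """Generic first-order CPA: for each key-byte guess, correlate a hypothetical
    Hamming-weight leakage with every trace sample and rank guesses by the peak
    absolute Pearson correlation. 'intermediate' maps (input_bytes, guess) to the
    targeted intermediate values; for SNOW-V this would be a keystream-related
    intermediate, which is left abstract in this sketch."""
    scores = np.zeros(256)
    t = traces - traces.mean(axis=0)
    for guess in range(256):
        hyp = HW[intermediate(known_inputs, guess)]          # shape (n_traces,)
        h = hyp - hyp.mean()
        corr = (h @ t) / (np.linalg.norm(h) * np.linalg.norm(t, axis=0) + 1e-12)
        scores[guess] = np.max(np.abs(corr))
    return np.argsort(scores)[::-1]                          # best guess first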
Updated: 2024-03-13 05:35:55
Categories: cs.CR,cs.LG,cs.NI,E.3
GPT, Ontology, and CAABAC: A Tripartite Personalized Access Control Model Anchored by Compliance, Context and Attribute
As digital healthcare evolves, the security of electronic health records (EHR) becomes increasingly crucial. This study presents the GPT-Onto-CAABAC framework, integrating Generative Pretrained Transformer (GPT), medical-legal ontologies and Context-Aware Attribute-Based Access Control (CAABAC) to enhance EHR access security. Unlike traditional models, GPT-Onto-CAABAC dynamically interprets policies and adapts to changing healthcare and legal environments, offering customized access control solutions. Through empirical evaluation, this framework is shown to be effective in improving EHR security by accurately aligning access decisions with complex regulatory and situational requirements. The findings suggest its broader applicability in sectors where access control must meet stringent compliance and adaptability standards.
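As a simplified illustration of the access-control side, the toy sketch below evaluates a permit/deny decision from subject attributes, contextual conditions, and a compliance flag; the attribute names, policy format, and the idea of a single flag standing in for the GPT/ontology compliance guidance are our assumptions, not the framework's actual policy language.

# Toy context-aware attribute-based access decision (illustrative only).
POLICY = {
    "read_ehr": {
        "attributes": {"role": {"physician", "nurse"}, "department": {"cardiology"}},
        "context": lambda ctx: ctx["on_shift"] and ctx["location"] == "hospital" and 7 <= ctx["hour"] < 19,
    }
}

def decide(action, subject, context, compliance_ok=True):
    rule = POLICY.get(action)
    if rule is None or not compliance_ok:          # compliance_ok stands in for GPT/ontology guidance
        return "deny"
    attrs_ok = all(subject.get(k) in allowed for k, allowed in rule["attributes"].items())
    return "permit" if attrs_ok and rule["context"](context) else "deny"

subject = {"role": "physician", "department": "cardiology"}
context = {"on_shift": True, "location": "hospital", "hour": 14}
print(decide("read_ehr", subject, context))   # permit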
Updated: 2024-03-13 05:30:30
Categories: cs.CY,cs.AI,cs.CR
Machine Unlearning: Taxonomy, Metrics, Applications, Challenges, and Prospects
Personal digital data is a critical asset, and governments worldwide have enforced laws and regulations to protect data privacy. Data users have been endowed with the right to have their data forgotten. In the course of machine learning (ML), this right requires a model provider, upon user request, to delete the user's data and remove its subsequent impact on ML models. Machine unlearning has emerged to address this and has garnered ever-increasing attention from both industry and academia. While the area has developed rapidly, there is a lack of comprehensive surveys that capture the latest advancements. Recognizing this shortage, we conduct an extensive exploration to map the landscape of machine unlearning, including a (fine-grained) taxonomy of unlearning algorithms under centralized and distributed settings, the debate on approximate unlearning, verification and evaluation metrics, challenges and solutions for unlearning under different applications, and attacks targeting machine unlearning. The survey concludes by outlining potential directions for future research, hoping to serve as a guide for interested scholars.
Updated: 2024-03-13 05:11:24
Categories: cs.LG,cs.CR,cs.CY
Graph Unlearning with Efficient Partial Retraining
Graph Neural Networks (GNNs) have achieved remarkable success in various real-world applications. However, GNNs may be trained on undesirable graph data, which can degrade their performance and reliability. To enable trained GNNs to efficiently unlearn unwanted data, a desirable solution is retraining-based graph unlearning, which partitions the training graph into subgraphs and trains sub-models on them, allowing fast unlearning through partial retraining. However, the graph partition process causes information loss in the training graph, resulting in low model utility for the sub-GNN models. In this paper, we propose GraphRevoker, a novel graph unlearning framework that better maintains the model utility of GNNs with unlearning capability. Specifically, we preserve graph properties with graph property-aware sharding and effectively aggregate the sub-GNN models for prediction with graph contrastive sub-model aggregation. We conduct extensive experiments to demonstrate the superiority of our proposed approach.
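The retraining-based idea this builds on can be summarized in a few lines, as in the hypothetical sketch below: shard the graph, train one sub-model per shard, retrain only the affected shard on a deletion request, and aggregate predictions. The partition_graph and train_subgnn callables are placeholders, and GraphRevoker's property-aware sharding and contrastive aggregation replace the naive partitioning and averaging used here.

# Sketch of retraining-based (shard) graph unlearning with placeholder components.
class ShardedGraphModel:
    def __init__(self, graph, n_shards, partition_graph, train_subgnn):
        self.train_subgnn = train_subgnn
        self.shards = partition_graph(graph, n_shards)           # list of subgraphs
        self.models = [train_subgnn(s) for s in self.shards]

    def unlearn(self, node_id):
        for i, shard in enumerate(self.shards):
            if node_id in shard.nodes:
                shard.remove_node(node_id)
                self.models[i] = self.train_subgnn(shard)        # partial retraining of one shard
                return i

    def predict(self, node_features):
        votes = [m(node_features) for m in self.models]
        return sum(votes) / len(votes)                           # naive averaging across sub-models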
Updated: 2024-03-13 04:43:23
Categories: cs.LG,cs.CR
Learning to Watermark LLM-generated Text via Reinforcement Learning
We study how to watermark LLM outputs, i.e., embed algorithmically detectable signals into LLM-generated text to track misuse. Unlike current mainstream methods that work with a fixed LLM, we expand the watermark design space by including the LLM tuning stage in the watermark pipeline. While prior works focus on token-level watermarks that embed signals into the output, we design a model-level watermark that embeds signals into the LLM weights, and such signals can be detected by a paired detector. We propose a co-training framework based on reinforcement learning that iteratively (1) trains a detector to detect the generated watermarked text and (2) tunes the LLM to generate text easily detectable by the detector while keeping its normal utility. We empirically show that our watermarks are more accurate, robust, and adaptable (to new attacks). The approach also allows the watermarked model to be open-sourced. In addition, if used together with alignment, the extra overhead introduced is low: only an extra reward model (i.e., our detector) needs to be trained. We hope our work can bring more effort into studying a broader watermark design that is not limited to working with a fixed LLM. We open-source the code at https://github.com/xiaojunxu/learning-to-watermark-llm .
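The co-training loop can be sketched at a high level as below. Every function here (generate, fit, score, reinforce, and the kl_penalty utility term) is a hypothetical placeholder for illustration; the authors' open-sourced code defines the actual RL objective and detector.

def cotrain_watermark(llm, detector, prompts, reference_llm, kl_penalty, rounds=10):
    """Alternate between fitting the detector and RL-tuning the model (sketch only)."""
    for _ in range(rounds):
        watermarked = [llm.generate(p) for p in prompts]          # text from the tuned model
        clean = [reference_llm.generate(p) for p in prompts]      # unwatermarked reference text
        detector.fit(watermarked + clean,
                     [1] * len(watermarked) + [0] * len(clean))   # step (1): train the detector
        rewards = [detector.score(t) - kl_penalty(llm, reference_llm, p, t)
                   for p, t in zip(prompts, watermarked)]         # detectability minus utility drift
        llm.reinforce(prompts, watermarked, rewards)              # step (2): RL-tune the LLM
    return llm, detector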
Updated: 2024-03-13 03:43:39
Categories: cs.LG,cs.AI,cs.CR
Advancing Security in AI Systems: A Novel Approach to Detecting Backdoors in Deep Neural Networks
In the rapidly evolving landscape of communication and network security, the increasing reliance on deep neural networks (DNNs) and cloud services for data processing presents a significant vulnerability: the potential for backdoors that can be exploited by malicious actors. Our approach leverages advanced tensor decomposition algorithms, namely Independent Vector Analysis (IVA), Multiset Canonical Correlation Analysis (MCCA), and Parallel Factor Analysis (PARAFAC2), to meticulously analyze the weights of pre-trained DNNs and effectively distinguish between backdoored and clean models. The key strengths of our method lie in its domain independence, its adaptability to various network architectures, and its ability to operate without access to the training data of the scrutinized models. This not only ensures versatility across different application scenarios but also addresses the challenge of identifying backdoors without prior knowledge of the specific triggers employed to alter network behavior. We have applied our detection pipeline to three distinct computer vision datasets, encompassing both image classification and object detection tasks. The results demonstrate a marked improvement in both accuracy and efficiency over existing backdoor detection methods. This advancement enhances the security of deep learning and AI in networked systems, providing essential cybersecurity against evolving threats in emerging technologies.
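To convey the flavor of the weight-analysis approach, the sketch below stacks flattened weights from many models, factorizes them with a plain SVD (a crude stand-in for IVA, MCCA, and PARAFAC2), and flags models whose low-dimensional coordinates are outliers. The synthetic "models" and the z-score rule are illustrative assumptions only.

import numpy as np

def flag_suspicious_models(weight_matrices, n_components=5, z_thresh=3.5):
    """Illustrative stand-in for the decomposition idea: stack flattened weights of
    many pre-trained models, factorize them (plain SVD here), and flag models whose
    low-dimensional coordinates are statistical outliers."""
    W = np.stack([w.ravel() for w in weight_matrices])         # (n_models, n_params)
    W = W - W.mean(axis=0)
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    coords = W @ Vt[:n_components].T                            # (n_models, n_components)
    z = np.abs(coords - coords.mean(axis=0)) / (coords.std(axis=0) + 1e-12)
    return np.where(z.max(axis=1) > z_thresh)[0]                # indices of flagged models

models = [np.random.default_rng(i).normal(size=(64, 10)) for i in range(40)]
models[7] += 0.8                                                 # crude stand-in for a backdoored model
print(flag_suspicious_models(models))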
Updated: 2024-03-13 03:10:11
Categories: cs.CR,cs.CV
Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows
With the rapid development of the Internet, various types of anomalous traffic threaten network security. We consider the problem of anomalous network traffic detection and propose a three-stage anomaly detection framework that uses only normal traffic. Our framework can generate pseudo-anomaly samples without prior knowledge of anomalies to achieve the detection of anomalous data. First, we employ a reconstruction method to learn the deep representation of normal samples. Second, these representations are normalized to a standard normal distribution using a bidirectional flow module. To simulate anomaly samples, we add noise to the normalized representations, which are then passed through the generation direction of the bidirectional flow module. Finally, a simple classifier is trained to differentiate the normal samples and pseudo-anomaly samples in the latent space. During inference, our framework requires only two modules to detect anomalous samples, leading to a considerable reduction in model size. According to our experiments, our method achieves state-of-the-art results on common benchmark datasets for anomalous network traffic detection. The code is available at https://github.com/ZxuanDang/ATD-via-Flows.git
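A toy, runnable version of the three-stage idea is sketched below with crude stand-ins: PCA replaces the reconstruction-based encoder, per-dimension standardization replaces the bidirectional normalizing flow, and an MLP plays the role of the simple classifier. It only illustrates the shape of the pipeline, not the paper's models or data.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
mix = rng.normal(size=(20, 20))
normal_traffic = rng.normal(size=(2000, 20)) @ mix            # correlated "normal" features

encoder = PCA(n_components=8).fit(normal_traffic)             # stage 1: representation learning
z = encoder.transform(normal_traffic)
mu, std = z.mean(axis=0), z.std(axis=0)
z_norm = (z - mu) / std                                       # stage 2: map to ~standard normal

pseudo = z_norm + rng.normal(scale=2.0, size=z_norm.shape)    # stage 3: noisy pseudo-anomalies
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(np.vstack([z_norm, pseudo]),
        np.concatenate([np.zeros(len(z_norm)), np.ones(len(pseudo))]))

shifted = rng.normal(size=(5, 20)) * 3.0 @ mix                 # crude stand-in for anomalous traffic
scores = clf.predict_proba((encoder.transform(shifted) - mu) / std)[:, 1]
print(scores)                                                   # anomaly probabilities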
Updated: 2024-03-13 02:10:32
Categories: cs.LG,cs.AI,cs.CR