    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        

Articles: 0

Last Updated: 2024-06-08 23:46:34 (+00:00)

Privacy-Preserving Low-Rank Adaptation for Latent Diffusion Models

Low-rank adaptation (LoRA) is an efficient strategy for adapting latent diffusion models (LDMs) on a private dataset to generate specific images by minimizing the adaptation loss. However, LoRA-adapted LDMs are vulnerable to membership inference (MI) attacks, which can judge whether a particular data point belongs to the private dataset and thus lead to privacy leakage. To defend against MI attacks, we first propose a straightforward solution: Membership-Privacy-preserving LoRA (MP-LoRA). MP-LoRA is formulated as a min-max optimization problem in which a proxy attack model is trained by maximizing its MI gain while the LDM is adapted by minimizing the sum of the adaptation loss and the MI gain of the proxy attack model. However, we empirically find that MP-LoRA suffers from unstable optimization, and we theoretically trace the potential cause to its unconstrained local smoothness, which impedes privacy-preserving adaptation. To mitigate this issue, we further propose Stable Membership-Privacy-preserving LoRA (SMP-LoRA), which adapts the LDM by minimizing the ratio of the adaptation loss to the MI gain. We also theoretically prove that the local smoothness of SMP-LoRA can be constrained by the gradient norm, leading to improved convergence. Our experimental results corroborate that SMP-LoRA can indeed defend against MI attacks and generate high-quality images. Our code is available at https://github.com/WilliamLUO0/StablePrivateLoRA.
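
A minimal sketch of one alternating step of the min-max training described above, in PyTorch. The callables adaptation_loss and mi_gain, the weight lam, and the exact form of SMP-LoRA's ratio denominator are illustrative assumptions, not the paper's specification:

    import torch

    def train_step(batch, adaptation_loss, mi_gain, opt_lora, opt_attack,
                   lam=1.0, variant="smp"):
        # 1) The proxy attack model ascends on its membership-inference gain.
        gain = mi_gain(batch)
        opt_attack.zero_grad(); (-gain).backward(); opt_attack.step()

        # 2) The LoRA parameters of the LDM descend on the privacy-aware objective.
        a_loss, gain = adaptation_loss(batch), mi_gain(batch)
        if variant == "mp":                      # MP-LoRA: sum (empirically unstable)
            loss = a_loss + lam * gain
        else:                                    # SMP-LoRA: ratio-style objective;
            loss = a_loss / (1.0 - lam * gain)   # the denominator form is assumed here
        opt_lora.zero_grad(); loss.backward(); opt_lora.step()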

Updated: 2024-06-08 23:46:34

Categories: cs.LG,cs.CR,cs.CV

Download: http://arxiv.org/abs/2402.11989v2

Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification

Although explainability is essential in clinical diagnosis, most deep learning models still function as black boxes without elucidating their decision-making process. In this study, we investigate the development of explainable models that can mimic the decision-making process of human experts by fusing the domain knowledge of explicit diagnostic criteria. We introduce a simple yet effective framework, Explicd, towards Explainable language-informed criteria-based diagnosis. Explicd initiates its process by querying domain knowledge from either large language models (LLMs) or human experts to establish diagnostic criteria across various concept axes (e.g., color, shape, texture, or specific patterns of diseases). By leveraging a pretrained vision-language model, Explicd injects these criteria into the embedding space as knowledge anchors, thereby facilitating the learning of corresponding visual concepts within medical images. The final diagnostic outcome is determined by the similarity scores between the encoded visual concepts and the textual criteria embeddings. Through extensive evaluation on five medical image classification benchmarks, Explicd demonstrates its inherent explainability while also improving classification performance compared to traditional black-box models.
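
A minimal sketch of the similarity-based diagnosis step, assuming a CLIP-style vision-language model; the tensor shapes (A concept axes, C classes, d-dimensional embeddings) are illustrative, not the paper's exact interface:

    import torch
    import torch.nn.functional as F

    def diagnose(visual_concepts: torch.Tensor,   # [A, d]: one embedding per concept axis
                 criteria_embeds: torch.Tensor    # [C, A, d]: one criterion per class per axis
                 ) -> torch.Tensor:
        v = F.normalize(visual_concepts, dim=-1)
        t = F.normalize(criteria_embeds, dim=-1)
        # Cosine similarity of each visual concept to each class's criterion on the
        # same axis, summed over axes, yields one logit per class.
        logits = torch.einsum("ad,cad->c", v, t)
        return logits.softmax(dim=-1)             # class probabilities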

Updated: 2024-06-08 23:23:28

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.05596v1

Mean-Field Analysis for Learning Subspace-Sparse Polynomials with Gaussian Input

In this work, we study the mean-field flow for learning subspace-sparse polynomials using stochastic gradient descent (SGD) and two-layer neural networks, where the input distribution is standard Gaussian and the output only depends on the projection of the input onto a low-dimensional subspace. We propose a basis-free generalization of the merged-staircase property in Abbe et al. (2022) and establish a necessary condition for SGD-learnability. In addition, we prove that the condition is almost sufficient, in the sense that a condition slightly stronger than the necessary one guarantees exponential decay of the loss functional to zero.

Updated: 2024-06-08 22:50:27

Categories: cs.LG,math.AP

Download: http://arxiv.org/abs/2402.08948v2

Rethinking the Capacity of Graph Neural Networks for Branching Strategy

Graph neural networks (GNNs) have been widely used to predict properties and heuristics of mixed-integer linear programs (MILPs) and hence accelerate MILP solvers. This paper investigates the capacity of GNNs to represent strong branching (SB), the most effective yet computationally expensive heuristic employed in the branch-and-bound algorithm. In the literature, the message-passing GNN (MP-GNN), as the simplest GNN structure, is frequently used as a fast approximation of SB, and we find that not all MILPs' SB scores can be represented by MP-GNNs. We precisely define a class of "MP-tractable" MILPs for which MP-GNNs can accurately approximate SB scores. In particular, we establish a universal approximation theorem: for any data distribution over the MP-tractable class, there always exists an MP-GNN that can approximate the SB score with arbitrarily high accuracy and arbitrarily high probability, laying a theoretical foundation for the existing works on imitating SB with MP-GNNs. For MILPs without MP-tractability, unfortunately, a similar result is impossible: we illustrate this with two MILP instances that have different SB scores yet cannot be distinguished by any MP-GNN, regardless of the number of parameters. Recognizing this, we explore another GNN structure called the second-order folklore GNN (2-FGNN) that overcomes this limitation, and the aforementioned universal approximation theorem can be extended to the entire MILP space using 2-FGNN, regardless of MP-tractability. A small-scale numerical experiment is conducted to directly validate our theoretical findings.

Updated: 2024-06-08 22:44:02

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2402.07099v2

Learning High-Order Relationships of Brain Regions

Discovering reliable and informative relationships among brain regions from functional magnetic resonance imaging (fMRI) signals is essential in phenotypic predictions. Most current methods fail to accurately characterize those interactions because they only focus on pairwise connections and overlook the high-order relationships of brain regions. We propose that these high-order relationships should be maximally informative and minimally redundant (MIMR). However, identifying such high-order relationships is challenging and under-explored due to the exponential search space and the absence of a tractable objective. In response to this gap, we propose a novel method named HYBRID, which aims to extract MIMR high-order relationships from fMRI data. HYBRID employs a CONSTRUCTOR to identify hyperedge structures and a WEIGHTER to compute a weight for each hyperedge, which avoids searching in exponential space. HYBRID achieves the MIMR objective through an innovative information bottleneck framework named multi-head drop-bottleneck, with theoretical guarantees. Our comprehensive experiments demonstrate the effectiveness of our model: it outperforms the state-of-the-art predictive model by an average of 11.2% in the quality of hyperedges, as measured by CPM, a standard protocol for studying brain connections.

Updated: 2024-06-08 22:32:45

Categories: q-bio.NC,cs.LG

Download: http://arxiv.org/abs/2312.02203v3

NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security

Large Language Models (LLMs) are being deployed across various domains today. However, their capacity to solve Capture the Flag (CTF) challenges in cybersecurity has not been thoroughly evaluated. To address this, we develop a novel method to assess LLMs in solving CTF challenges by creating a scalable, open-source benchmark database specifically designed for these applications. This database includes metadata for LLM testing and adaptive learning, compiling a diverse range of CTF challenges from popular competitions. Utilizing the advanced function-calling capabilities of LLMs, we build a fully automated system with an enhanced workflow and support for external tool calls. Our benchmark dataset and automated framework allow us to evaluate the performance of five LLMs, encompassing both black-box and open-source models. This work lays the foundation for future research into improving the efficiency of LLMs in interactive cybersecurity tasks and automated task planning. By providing a specialized dataset, our project offers an ideal platform for developing, testing, and refining LLM-based approaches to vulnerability detection and resolution. Evaluating LLMs on these challenges and comparing them with human performance yields insights into their potential for AI-driven cybersecurity solutions to perform real-world threat management. We open-source our dataset to the public at https://github.com/NYU-LLM-CTF/LLM_CTF_Database, along with our playground automation framework at https://github.com/NYU-LLM-CTF/llm_ctf_automation.

Updated: 2024-06-08 22:21:42

Categories: cs.CR,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2406.05590v1

CERET: Cost-Effective Extrinsic Refinement for Text Generation

Large Language Models (LLMs) are powerful models for generation tasks, but they may not generate good-quality outputs on their first attempt. Apart from model fine-tuning, existing approaches to improve prediction accuracy and quality typically involve LLM self-improvement / self-reflection that incorporates feedback from the models themselves. Despite their effectiveness, these methods are hindered by their high computational cost and lack of scalability. In this work, we propose CERET, a method for refining text generations by considering semantic stability, entailment, and inter-sample uncertainty measures. Experimental results show that CERET consistently outperforms Self-consistency and Self-rerank baselines under various task setups, by ~1.6% in Rouge-1 for abstractive summarization and ~3.5% in hit rate for question answering. Compared to the LLM Self-rerank method, our approach requires only 9.4% of its latency and is more cost-effective.

Updated: 2024-06-08 22:17:52

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.05588v1

Neural Operators with Localized Integral and Differential Kernels

Neural operators learn mappings between function spaces, which is practical for learning solution operators of PDEs and other scientific modeling applications. Among them, the Fourier neural operator (FNO) is a popular architecture that performs global convolutions in the Fourier space. However, such global operations are often prone to over-smoothing and may fail to capture local details. In contrast, convolutional neural networks (CNNs) can capture local features but are limited to training and inference at a single resolution. In this work, we present a principled approach to operator learning that can capture local features under two frameworks, by learning differential operators and integral operators with locally supported kernels. Specifically, inspired by stencil methods, we prove that we obtain differential operators under an appropriate scaling of the kernel values of CNNs. To obtain local integral operators, we utilize suitable basis representations for the kernels based on discrete-continuous convolutions. Both approaches preserve the properties of operator learning and, hence, the ability to predict at any resolution. Adding our layers to FNOs significantly improves their performance, reducing the relative L2-error by 34-72% in our experiments, which include the turbulent 2D Navier-Stokes equations and the spherical shallow water equations.
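
A minimal sketch of the stencil idea behind the differential-operator layer: dividing a locally supported convolution kernel by an appropriate power of the grid spacing turns it into a discrete differential operator, so the layer stays meaningful as the resolution changes. The fixed 5-point Laplacian stencil below is an illustrative choice, not the paper's learned kernel:

    import torch
    import torch.nn.functional as F

    def laplacian(u: torch.Tensor, h: float) -> torch.Tensor:
        # u: [batch, 1, H, W] samples of a function on a grid with spacing h
        stencil = torch.tensor([[0., 1., 0.],
                                [1., -4., 1.],
                                [0., 1., 0.]]).view(1, 1, 3, 3)
        return F.conv2d(u, stencil, padding=1) / h ** 2  # approximates the Laplacian

Halving h (doubling the resolution) leaves the estimated operator consistent, which is the resolution-independence property that the proposed layers preserve.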

Updated: 2024-06-08 22:16:13

Categories: cs.LG,cs.AI,cs.NA,math.NA

Download: http://arxiv.org/abs/2402.16845v2

Creativity Has Left the Chat: The Price of Debiasing Language Models

Large Language Models (LLMs) have revolutionized natural language processing but can exhibit biases and may generate toxic content. While alignment techniques like Reinforcement Learning from Human Feedback (RLHF) reduce these issues, their impact on creativity, defined as syntactic and semantic diversity, remains unexplored. We investigate the unintended consequences of RLHF on the creativity of LLMs through three experiments focusing on the Llama-2 series. Our findings reveal that aligned models exhibit lower entropy in token predictions, form distinct clusters in the embedding space, and gravitate towards "attractor states", indicating limited output diversity. Our findings have significant implications for marketers who rely on LLMs for creative tasks such as copywriting, ad creation, and customer persona generation. The trade-off between consistency and creativity in aligned models should be carefully considered when selecting the appropriate model for a given application. We also discuss the importance of prompt engineering in harnessing the creative potential of base models.

Updated: 2024-06-08 22:14:51

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.05587v1

Quantum Machine Learning on Near-Term Quantum Devices: Current State of Supervised and Unsupervised Techniques for Real-World Applications

The past decade has witnessed significant advancements in quantum hardware, encompassing improvements in speed, qubit quantity, and quantum volume, a metric defining the maximum size of a quantum circuit effectively implementable on near-term quantum devices. This progress has led to a surge in Quantum Machine Learning (QML) applications on real hardware, aiming to achieve quantum advantage over classical approaches. This survey focuses on selected supervised and unsupervised learning applications executed on quantum hardware, specifically tailored for real-world scenarios. The exploration includes a thorough analysis of current QML implementation limitations on quantum hardware, covering techniques like encoding, ansatz structure, error mitigation, and gradient methods to address these challenges. Furthermore, the survey evaluates the performance of QML implementations in comparison to their classical counterparts. In conclusion, we discuss existing bottlenecks related to applying QML on real quantum devices and propose potential solutions to overcome these challenges in the future.

Updated: 2024-06-08 21:30:41

Categories: quant-ph,cs.LG,stat.ML

Download: http://arxiv.org/abs/2307.00908v3

CLASSP: a Biologically-Inspired Approach to Continual Learning through Adjustment Suppression and Sparsity Promotion

This paper introduces a new biologically-inspired training method named Continual Learning through Adjustment Suppression and Sparsity Promotion (CLASSP). CLASSP is based on two main principles observed in neuroscience, particularly in the context of synaptic transmission and Long-Term Potentiation (LTP). The first principle is a decay rate over the weight adjustment, implemented as a generalization of the AdaGrad optimization algorithm. This means that weights that have received many updates should have lower learning rates, as they likely encode important information about previously seen data. However, this principle alone results in a diffuse distribution of updates throughout the model, as it promotes updates for weights that haven't been updated before, whereas a sparse update distribution is preferred to leave weights unassigned for future tasks. Therefore, the second principle introduces a threshold on the loss gradient: a weight is updated only if the loss gradient with respect to that weight is above a certain threshold, i.e., only weights with a significant impact on the current loss are updated. Both principles reflect phenomena observed in LTP, where a threshold effect and a gradual saturation of potentiation occur. CLASSP is implemented in a Python/PyTorch class, making it applicable to any model. When compared with Elastic Weight Consolidation (EWC) on computer vision and sentiment analysis datasets, CLASSP demonstrates superior performance in terms of accuracy and memory footprint.
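
A minimal sketch of the two principles combined in a single update step; the decay exponent p, the threshold theta, and the choice to accumulate squared updates (rather than squared gradients, as in plain AdaGrad) are illustrative assumptions, while the authors' actual implementation is their Python/PyTorch class:

    import torch

    @torch.no_grad()
    def classp_step(params, accum, lr=0.1, theta=1e-3, p=0.5, eps=1e-10):
        for w, s in zip(params, accum):       # s accumulates squared past updates
            g = w.grad
            mask = g.abs() > theta            # principle 2: update only significant weights
            step = lr * g * mask / (s + eps) ** p   # principle 1: AdaGrad-like decay
            w -= step
            s += step ** 2                    # heavily updated weights slow down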

Updated: 2024-06-08 21:02:15

Categories: cs.NE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.09637v2

Trust the PRoC3S: Solving Long-Horizon Robotics Problems with LLMs and Constraint Satisfaction

Recent developments in pretrained large language models (LLMs) applied to robotics have demonstrated their capacity for sequencing a set of discrete skills to achieve open-ended goals in simple robotic tasks. In this paper, we examine the topic of LLM planning for a set of continuously parameterized skills whose execution must avoid violations of a set of kinematic, geometric, and physical constraints. We prompt the LLM to output code for a function with open parameters, which, together with environmental constraints, can be viewed as a Continuous Constraint Satisfaction Problem (CCSP). This CCSP can be solved through sampling or optimization to find a skill sequence and continuous parameter settings that achieve the goal while avoiding constraint violations. Additionally, we consider cases where the LLM proposes unsatisfiable CCSPs, such as those that are kinematically infeasible, dynamically unstable, or lead to collisions, and re-prompt the LLM to form a new CCSP accordingly. Experiments across three different simulated 3D domains demonstrate that our proposed strategy, PRoC3S, is capable of solving a wide range of complex manipulation tasks with realistic constraints on continuous parameters much more efficiently and effectively than existing baselines.
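
A minimal sketch of the sample-based CCSP solving loop described above; llm_propose_plan, violates_constraints, achieves_goal, and param_ranges are hypothetical stand-ins for the paper's components:

    import random

    def solve_ccsp(llm_propose_plan, violates_constraints, achieves_goal,
                   param_ranges, budget=1000):
        plan = llm_propose_plan()        # LLM-written function with open parameters
        for _ in range(budget):
            theta = {k: random.uniform(lo, hi)
                     for k, (lo, hi) in param_ranges.items()}
            trajectory = plan(**theta)   # ground the skill sequence
            if not violates_constraints(trajectory) and achieves_goal(trajectory):
                return plan, theta       # feasible grounding found
        return None                      # unsatisfiable: re-prompt the LLM with feedback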

Updated: 2024-06-08 20:56:14

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2406.05572v1

MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs

Generative Large Language Models (LLMs) are widely utilized for their excellence in various tasks. However, their tendency to produce inaccurate or misleading outputs poses a potential risk, particularly in high-stakes environments. Therefore, estimating the correctness of generative LLM outputs is an important task for enhanced reliability. Uncertainty Estimation (UE) in generative LLMs is an evolving domain, where SOTA probability-based methods commonly employ length-normalized scoring. In this work, we propose Meaning-Aware Response Scoring (MARS) as an alternative to length-normalized scoring for UE methods. MARS is a novel scoring function that considers the semantic contribution of each token in the generated sequence in the context of the question. We demonstrate that integrating MARS into UE methods results in a universal and significant improvement in UE performance. We conduct experiments using three distinct closed-book question-answering datasets across five popular pre-trained LLMs. Lastly, we validate the efficacy of MARS on a Medical QA dataset. Code can be found at https://github.com/Ybakman/LLM_Uncertainity.
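
A minimal sketch contrasting length-normalized scoring with a MARS-style meaning-aware score. The per-token importance weights are only stubbed here; computing them from each token's semantic contribution in the context of the question is the paper's contribution:

    import numpy as np

    def length_normalized_score(logprobs):
        return float(np.mean(logprobs))        # every token weighted equally

    def mars_score(logprobs, importance):
        w = np.asarray(importance, dtype=float)
        w = w / w.sum()                        # normalize semantic weights
        return float(np.dot(w, logprobs))      # critical tokens dominate the score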

Updated: 2024-06-08 20:40:55

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2402.11756v3

Randomized Geometric Algebra Methods for Convex Neural Networks

We introduce randomized algorithms to Clifford's Geometric Algebra, generalizing randomized linear algebra to hypercomplex vector spaces. This novel approach has many implications in machine learning, including training neural networks to global optimality via convex optimization. Additionally, we consider fine-tuning large language model (LLM) embeddings as a key application area, exploring the intersection of geometric algebra and modern AI techniques. In particular, we conduct a comparative analysis of the robustness of transfer learning via embeddings, such as OpenAI GPT models and BERT, using traditional methods versus our novel approach based on convex optimization. We test our convex optimization transfer learning method across a variety of case studies, employing different embeddings (GPT-4 and BERT embeddings) and different text classification datasets (IMDb, Amazon Polarity Dataset, and GLUE) with a range of hyperparameter settings. Our results demonstrate that convex optimization and geometric algebra not only enhance the performance of LLMs but also offer a more stable and reliable method of transfer learning via embeddings.

Updated: 2024-06-08 20:35:12

Categories: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2406.02806v2

SAMM: Sharded Automated Market Makers

\emph{Automated Market Makers} (\emph{AMMs}) are a cornerstone of decentralized finance (DeFi) blockchain-based platforms. They are smart contracts, enabling the direct exchange of virtual tokens by maintaining \emph{liquidity pools}. Traders exchange tokens with the contract, paying a fee; liquidity comes from \emph{liquidity providers}, paid by those fees. But despite growing demand, the performance of AMMs is limited. State-of-the-art blockchain platforms allow for parallel execution of transactions. However, we show that AMMs do not enjoy these gains, since their operations do not commute and transactions using them must therefore be serialized. We present \emph{SAMM}, an AMM comprising multiple independent \emph{shards}. All shards are smart contracts operating in the same chain, but they allow for parallel execution as each is independent. The challenge is that trading in a standard AMM is cheaper if its liquidity pool is larger. Therefore, we show that simply using multiple smaller AMMs results in traders splitting each trade among all AMMs, which worsens performance. SAMM addresses this issue with a novel design of the trading fees. Traders are incentivized to use only a single smallest shard. We show that all Subgame-Perfect Nash Equilibria (SPNE) fit the desired behavior: liquidity providers balance the liquidity among all pools, so the system converges to the state where trades are evenly distributed. Evaluation in the Sui blockchain shows that SAMM's throughput is over fivefold that of traditional AMMs, approaching the system's limit. SAMM is a directly deployable open-source smart contract, allowing trading at scale for individuals and DeFi applications.

Updated: 2024-06-08 20:19:35

Categories: cs.DC,cs.CR

Download: http://arxiv.org/abs/2406.05568v1

FedSelect: Customized Selection of Parameters for Fine-Tuning during Personalized Federated Learning

Recent advancements in federated learning (FL) seek to increase client-level performance by fine-tuning client parameters on local data or personalizing architectures for the local task. Existing methods for such personalization either prune a global model or fine-tune a global model on a local client distribution. However, these methods either personalize at the expense of retaining important global knowledge, or predetermine network layers for fine-tuning, resulting in suboptimal storage of global knowledge within client models. Inspired by the lottery ticket hypothesis, we first introduce a hypothesis for finding optimal client subnetworks to fine-tune locally while leaving the rest of the parameters frozen. We then propose a novel FL framework, FedSelect, that uses this procedure to directly personalize both client subnetwork structure and parameters, via the simultaneous discovery of optimal parameters for personalization and the remaining parameters for global aggregation during training. We show that this method achieves promising results on CIFAR-10.
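
A minimal sketch of selecting a client subnetwork for local fine-tuning in the spirit of the hypothesis above: after a local backward pass, parameters with the largest gradient magnitude are personalized and the rest stay frozen for global aggregation. The fixed top-k criterion is a simplifying assumption; FedSelect discovers the split during training:

    import torch

    def select_subnetwork(model, frac=0.1):
        masks = {}
        for name, p in model.named_parameters():
            k = max(1, int(frac * p.numel()))
            thresh = p.grad.abs().flatten().topk(k).values.min()
            masks[name] = p.grad.abs() >= thresh    # True -> fine-tune locally
        return masks

    def apply_masked_update(model, masks, lr=0.01):
        with torch.no_grad():
            for name, p in model.named_parameters():
                p -= lr * p.grad * masks[name]      # frozen entries receive no update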

Updated: 2024-06-08 20:15:48

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2306.13264v4

Automata Extraction from Transformers

In modern machine learning (ML) systems, Transformer-based architectures have achieved milestone success across a broad spectrum of tasks, yet understanding their operational mechanisms remains an open problem. To improve the transparency of ML systems, automata extraction methods, which interpret stateful ML models as automata, typically through formal languages, have proven effective for explaining the mechanism of recurrent neural networks (RNNs). However, few works have applied this paradigm to Transformer models. In particular, understanding their processing of formal languages and identifying their limitations in this area remains unexplored. In this paper, we propose an automata extraction algorithm specifically designed for Transformer models. Treating the Transformer model as a black-box system, we track the transformation of its internal latent representations during operation, and then use classical pedagogical approaches, such as the L* algorithm, to interpret them as deterministic finite-state automata (DFA). Overall, our study reveals how the Transformer model comprehends the structure of formal languages, which not only enhances the interpretability of Transformer-based ML systems but also marks a crucial step toward a deeper understanding of how ML systems process formal languages. Code and data are available at https://github.com/Zhang-Yihao/Transfomer2DFA.

Updated: 2024-06-08 20:07:24

Categories: cs.LG,cs.AI,cs.CL,cs.FL

Download: http://arxiv.org/abs/2406.05564v1

ThatiAR: Subjectivity Detection in Arabic News Sentences

Detecting subjectivity in news sentences is crucial for identifying media bias, enhancing credibility, and combating misinformation by flagging opinion-based content. It provides insights into public sentiment, empowers readers to make informed decisions, and encourages critical thinking. While research has developed methods and systems for this purpose, most efforts have focused on English and other high-resource languages. In this study, we present the first large dataset for subjectivity detection in Arabic, consisting of ~3.6K manually annotated sentences with GPT-4o-based explanations. In addition, we include instructions (in both English and Arabic) to facilitate LLM-based fine-tuning. We provide an in-depth analysis of the dataset, the annotation process, and extensive benchmark results, covering both PLMs and LLMs. Our analysis of the annotation process highlights that annotators were strongly influenced by their political, cultural, and religious backgrounds, especially at the beginning of the annotation process. The experimental results suggest that LLMs with in-context learning provide better performance. We aim to release the dataset and resources to the community.

Updated: 2024-06-08 19:24:17

Categories: cs.CL,cs.AI,68T50,F.2.2; I.2.7

Download: http://arxiv.org/abs/2406.05559v1

M3H: Multimodal Multitask Machine Learning for Healthcare

Developing an integrated many-to-many framework that leverages multimodal data for multiple tasks is crucial to unifying healthcare applications ranging from diagnoses to operations. In resource-constrained hospital environments, a scalable and unified machine learning framework that improves on previous forecasting performance could improve hospital operations and save costs. We introduce M3H, an explainable Multimodal Multitask Machine Learning for Healthcare framework that consolidates learning from tabular, time-series, language, and vision data for supervised binary/multiclass classification, regression, and unsupervised clustering. It features a novel attention mechanism balancing self-exploitation (learning the source task) and cross-exploration (learning across tasks), and offers explainability through a proposed TIM score, shedding light on the dynamics of task-learning interdependencies. M3H encompasses an unprecedented range of medical tasks and machine learning problem classes and consistently outperforms traditional single-task models by an average of 11.6% across 40 disease diagnoses from 16 medical departments, three hospital operation forecasts, and one patient phenotyping task. The modular design of the framework ensures its generalizability in data processing, task definition, and rapid model prototyping, making it production-ready for both clinical and operational healthcare settings, especially those in constrained environments.

Updated: 2024-06-08 19:11:57

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.18975v3

Autoregressive Diffusion Transformer for Text-to-Speech Synthesis

Audio language models have recently emerged as a promising approach for various audio generation tasks, relying on audio tokenizers to encode waveforms into sequences of discrete symbols. Audio tokenization often poses a necessary compromise between code bitrate and reconstruction accuracy. When dealing with low-bitrate audio codes, language models are constrained to process only a subset of the information embedded in the audio, which in turn restricts their generative capabilities. To circumvent these issues, we propose encoding audio as vector sequences in continuous space $\mathbb R^d$ and autoregressively generating these sequences using a decoder-only diffusion transformer (ARDiT). Our findings indicate that ARDiT excels in zero-shot text-to-speech and exhibits performance that compares to or even surpasses that of state-of-the-art models. High-bitrate continuous speech representation enables almost flawless reconstruction, allowing our model to achieve nearly perfect speech editing. Our experiments reveal that employing Integral Kullback-Leibler (IKL) divergence for distillation at each autoregressive step significantly boosts the perceived quality of the samples. Simultaneously, it condenses the iterative sampling process of the diffusion model into a single step. Furthermore, ARDiT can be trained to predict several continuous vectors in one step, significantly reducing latency during sampling. Impressively, one of our models can generate $170$ ms of $24$ kHz speech per evaluation step with minimal degradation in performance. Audio samples are available at http://ardit-tts.github.io/ .

Updated: 2024-06-08 18:57:13

Categories: eess.AS,cs.AI,cs.CL,cs.LG,cs.SD

Download: http://arxiv.org/abs/2406.05551v1

Physics-Enhanced Machine Learning: a position paper for dynamical systems investigations

This position paper takes a broad look at Physics-Enhanced Machine Learning (PEML) -- also known as Scientific Machine Learning -- with particular focus on those PEML strategies developed to tackle the challenges of dynamical systems. The need to go beyond Machine Learning (ML) strategies is driven by: (i) the limited volume of informative data; (ii) avoiding accurate-but-wrong predictions; (iii) dealing with uncertainties; and (iv) providing explainable and interpretable inferences. A general definition of PEML is provided by considering four physics and domain knowledge biases, and three broad groups of PEML approaches are discussed: physics-guided, physics-encoded, and physics-informed. The advantages and challenges of developing PEML strategies for guiding high-consequence decision making in engineering applications involving complex dynamical systems are presented.

Updated: 2024-06-08 18:49:34

Categories: cs.LG,cs.CE

Download: http://arxiv.org/abs/2405.05987v2

Training Through Failure: Effects of Data Consistency in Parallel Machine Learning Training

In this study, we explore the impact of relaxing data consistency in parallel machine learning training during a failure, using various parameter server configurations. Our failure recovery strategies include traditional checkpointing, chain replication (which ensures a backup server takes over in case of failure), and a novel stateless parameter server approach. In the stateless approach, workers continue generating gradient updates even if the parameter server is down, applying these updates once the server is back online. We compare these techniques to a standard checkpointing approach, where the training job is resumed from the latest checkpoint. To assess the resilience and performance of each configuration, we intentionally killed the parameter server during training in each experiment. Our experimental results indicate that the stateless parameter server approach continues to train towards convergence and improves accuracy by as much as 10% in the face of a failure, despite using stale weights and gradients. The chain replication and checkpointing techniques demonstrate convergence but suffer accuracy setbacks from restarting at old checkpoints. These results suggest that allowing workers to continue generating updates during server downtime and applying these updates later can effectively improve hardware utilization. Furthermore, despite higher resource usage, the stateless parameter server method incurs similar monetary costs in terms of hardware usage compared to standard checkpointing methods, due to the pricing structure of common cloud providers.
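
A minimal sketch of the stateless-parameter-server behavior described above: workers keep producing gradient updates while the server is down and flush the (stale) backlog once it returns. compute_gradients, ps_push, and ps_is_up are hypothetical stubs for the local training step and the server RPCs:

    def worker_loop(model, batches, compute_gradients, ps_push, ps_is_up):
        backlog = []
        for batch in batches:
            grads = compute_gradients(model, batch)   # local forward/backward pass
            if ps_is_up():
                for g in backlog:                     # apply stale updates first
                    ps_push(g)
                backlog.clear()
                ps_push(grads)
            else:
                backlog.append(grads)                 # keep training through the failure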

Updated: 2024-06-08 18:31:56

Categories: cs.DC,cs.AI

Download: http://arxiv.org/abs/2406.05546v1

Privacy-Preserving Optimal Parameter Selection for Collaborative Clustering

This study investigates the optimal selection of parameters for collaborative clustering while ensuring data privacy. We focus on key clustering algorithms within a collaborative framework, where multiple data owners combine their data. A semi-trusted server assists in recommending the most suitable clustering algorithm and its parameters. Our findings indicate that the privacy parameter ($\epsilon$) minimally impacts the server's recommendations, but an increase in $\epsilon$ raises the risk of membership inference attacks, where sensitive information might be inferred. To mitigate these risks, we implement differential privacy techniques, particularly the Randomized Response mechanism, to add noise and protect data privacy. Our approach demonstrates that high-quality clustering can be achieved while maintaining data confidentiality, as evidenced by metrics such as the Adjusted Rand Index and Silhouette Score. This study contributes to privacy-aware data sharing, optimal algorithm and parameter selection, and effective communication between data owners and the server.
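
A minimal sketch of the Randomized Response mechanism mentioned above, for a single binary attribute: each data owner reports the true bit with probability e^epsilon / (1 + e^epsilon) and flips it otherwise, which satisfies epsilon-differential privacy for that bit:

    import math
    import random

    def randomized_response(bit: int, eps: float) -> int:
        p_truth = math.exp(eps) / (1.0 + math.exp(eps))
        return bit if random.random() < p_truth else 1 - bit

A lower eps means more flipping (stronger privacy) but noisier statistics, which is the trade-off the study measures with the Adjusted Rand Index and Silhouette Score.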

Updated: 2024-06-08 18:21:12

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2406.05545v1

Reconfiguring Participatory Design to Resist AI Realism

The growing trend of artificial intelligence (AI) as a solution to social and technical problems reinforces AI Realism -- the belief that AI is an inevitable and natural order. In response, this paper argues that participatory design (PD), with its focus on democratic values and processes, can play a role in questioning and resisting AI Realism. I examine three concerning aspects of AI Realism: the facade of democratization that lacks true empowerment, demands for human adaptability in contrast to AI systems' inflexibility, and the obfuscation of essential human labor enabling the AI system. I propose resisting AI Realism by reconfiguring PD to continue engaging with value-centered visions, increasing its exploration of non-AI alternatives, and making the essential human labor underpinning AI systems visible. I position PD as a means to generate friction against AI Realism and open space for alternative futures centered on human needs and values.

Updated: 2024-06-08 18:19:00

Categories: cs.HC,cs.AI,cs.SI

Download: http://arxiv.org/abs/2406.03245v2

VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification

Recent conditional 3D completion works have mainly relied on CLIP or BERT to encode textual information, which cannot support complex instructions. Meanwhile, large language models (LLMs) have shown great potential in multi-modal understanding and generation tasks. Inspired by the recent advancements of LLMs, we present Volume Patch LLM (VP-LLM), which leverages LLMs to perform conditional 3D completion in a single forward pass. To integrate a 3D model into the LLM tokenization configuration, the incomplete 3D object is first divided into small patches that can be encoded independently. These encoded patches are then fed into an LLM along with the text prompt, instructing the LLM to capture the relations between these patches as well as to inject semantic meaning into the 3D object. Our results demonstrate a strong ability of LLMs to interpret complex text instructions and understand 3D objects, surpassing state-of-the-art diffusion-based 3D completion models in generation quality.
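
A minimal sketch of the patchification step described above: a voxel grid of the incomplete object is split into small independent patches, each of which can be encoded into one token for the LLM. The 8x8x8 patch size is an illustrative assumption:

    import torch

    def patchify(volume: torch.Tensor, p: int = 8) -> torch.Tensor:
        # volume: [D, H, W] occupancy grid with D, H, W divisible by p
        d, h, w = volume.shape
        patches = (volume.reshape(d // p, p, h // p, p, w // p, p)
                         .permute(0, 2, 4, 1, 3, 5)
                         .reshape(-1, p * p * p))   # [num_patches, p^3]
        return patches                              # each row becomes one token embedding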

Updated: 2024-06-08 18:17:09

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.05543v1

A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding

The parallels between protein sequences and natural language in their sequential structures have inspired the application of large language models (LLMs) to protein understanding. Despite the success of LLMs in NLP, their effectiveness in comprehending protein sequences remains an open question, largely due to the absence of datasets linking protein sequences to descriptive text. Researchers have then attempted to adapt LLMs for protein understanding by integrating a protein sequence encoder with a pre-trained LLM. However, this adaptation raises a fundamental question: "Can LLMs, originally designed for NLP, effectively comprehend protein sequences as a form of language?" Current datasets fall short in addressing this question due to the lack of a direct correlation between protein sequences and corresponding text descriptions, limiting the ability to train and evaluate LLMs for protein understanding effectively. To bridge this gap, we introduce ProteinLMDataset, a dataset specifically designed for further self-supervised pretraining and supervised fine-tuning (SFT) of LLMs to enhance their capability for protein sequence comprehension. Specifically, ProteinLMDataset includes 17.46 billion tokens for pretraining and 893,000 instructions for SFT. Additionally, we present ProteinLMBench, the first benchmark dataset consisting of 944 manually verified multiple-choice questions for assessing the protein understanding capabilities of LLMs. ProteinLMBench incorporates protein-related details and sequences in multiple languages, establishing a new standard for evaluating LLMs' abilities in protein comprehension. The large language model InternLM2-7B, pretrained and fine-tuned on the ProteinLMDataset, outperforms GPT-4 on ProteinLMBench, achieving the highest accuracy score. The dataset and the benchmark are available at https://huggingface.co/datasets/tsynbio/ProteinLMBench.

Updated: 2024-06-08 18:11:30

Categories: q-bio.QM,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.05540v1

Multi-Agent Reinforcement Learning with Hierarchical Coordination for Emergency Responder Stationing

An emergency responder management (ERM) system dispatches responders, such as ambulances, when it receives requests for medical aid. ERM systems can also proactively reposition responders between predesignated waiting locations to cover any gaps that arise due to the prior dispatch of responders or significant changes in the distribution of anticipated requests. Optimal repositioning is computationally challenging due to the exponential number of ways to allocate responders between locations and the uncertainty in future requests. The state-of-the-art method for proactive repositioning is a hierarchical approach based on spatial decomposition and online Monte Carlo tree search, which may require minutes of computation for each decision, in a domain where seconds can save lives. We address the issue of long decision times by introducing a novel reinforcement learning (RL) approach, based on the same hierarchical decomposition, but replacing online search with learning. To address the computational challenges posed by large, variable-dimensional, and discrete state and action spaces, we propose: (1) actor-critic based agents that incorporate transformers to handle variable-dimensional states and actions, (2) projections to fixed-dimensional observations to handle complex states, and (3) combinatorial techniques to map continuous actions to discrete allocations. We evaluate our approach using real-world data from two U.S. cities, Nashville, TN and Seattle, WA. Our experiments show that compared to the state of the art, our approach reduces computation time per decision by three orders of magnitude, while also slightly reducing average ambulance response time by 5 seconds.

Updated: 2024-06-08 18:08:09

Categories: cs.LG

Download: http://arxiv.org/abs/2405.13205v2

Perturbation Towards Easy Samples Improves Targeted Adversarial Transferability

The transferability of adversarial perturbations provides an effective shortcut for black-box attacks. Targeted perturbations have greater practicality but are more difficult to transfer between models. In this paper, we demonstrate experimentally and theoretically that neural networks trained on the same dataset have more consistent performance in the High-Sample-Density Regions (HSDR) of each class rather than in low-sample-density regions. Therefore, in the targeted setting, adding perturbations towards the HSDR of the target class is more effective in improving transferability. However, density estimation is challenging in high-dimensional scenarios. Further theoretical and experimental verification demonstrates that easy samples with low loss are more likely to be located in HSDR. Perturbations towards such easy samples in the target class can avoid density estimation for locating HSDR. Based on these facts, we verify that adding perturbations to easy samples in the target class improves the targeted adversarial transferability of existing attack methods. A generative targeted attack strategy named Easy Sample Matching Attack (ESMA) is proposed, which has a higher success rate for targeted attacks and outperforms the SOTA generative method. Moreover, ESMA requires only 5% of the storage space and much less computation time compared to the current SOTA, as ESMA attacks all classes with only one model instead of separate models for each class. Our code is available at https://github.com/gjq100/ESMA.
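
A minimal sketch of the easy-sample idea underlying the analysis above: the lowest-loss samples of the target class serve as anchors (they tend to lie in HSDR), and a perturbation pushes the input's features toward them. The feature-matching objective is an illustrative stand-in for ESMA's learned generator:

    import torch

    def easy_sample_anchors(model, loss_fn, xs, ys, k=10):
        with torch.no_grad():
            losses = loss_fn(model(xs), ys)   # per-sample losses (reduction='none')
        return xs[losses.topk(k, largest=False).indices]

    def perturb_toward(x, anchors, feat, steps=10, alpha=1e-2):
        target = feat(anchors).mean(0).detach()
        delta = torch.zeros_like(x, requires_grad=True)
        for _ in range(steps):
            loss = (feat(x + delta) - target).pow(2).mean()
            loss.backward()
            with torch.no_grad():
                delta -= alpha * delta.grad.sign()   # move features toward easy anchors
                delta.grad = None
        return (x + delta).detach()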

Updated: 2024-06-08 17:33:23

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2406.05535v1

Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing

Direct Preference Optimization (DPO) improves the alignment of large language models (LLMs) with human values by training directly on human preference datasets, eliminating the need for reward models. However, due to the presence of cross-domain human preferences, direct continual training can lead to catastrophic forgetting, limiting DPO's performance and efficiency. Inspired by intraspecific competition driving species evolution, we propose Online Fast-Slow chasing DPO (OFS-DPO) for preference alignment, simulating competition through fast and slow chasing among models to facilitate rapid adaptation. Specifically, we first derive a regret upper bound for online learning, validating our motivation with a min-max optimization pattern. Based on this, we introduce two identical modules using Low-Rank Adaptation (LoRA) with different optimization speeds to simulate intraspecific competition, and propose a new regularization term to guide their learning. To further mitigate catastrophic forgetting in cross-domain scenarios, we extend OFS-DPO with a LoRA module combination strategy, resulting in Cross-domain Online Fast-Slow chasing DPO (COFS-DPO). This method leverages linear combinations of fast-module parameters from different task domains, fully utilizing historical information to achieve continual value alignment. Experimental results show that OFS-DPO outperforms DPO in in-domain alignment, while COFS-DPO excels in cross-domain continual learning scenarios.
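
A minimal sketch of the fast-slow chasing step described above: two identical LoRA modules are optimized at different speeds (via their optimizers' learning rates), with a regularizer pulling the slow module toward the fast one. The squared-distance regularizer and its weight lam are illustrative assumptions standing in for the paper's proposed term:

    import torch

    def ofs_dpo_step(fast, slow, dpo_loss_fn, batch, opt_fast, opt_slow, lam=0.1):
        # Fast module: plain DPO loss with a larger learning rate (set in opt_fast).
        loss_fast = dpo_loss_fn(fast, batch)
        opt_fast.zero_grad(); loss_fast.backward(); opt_fast.step()

        # Slow module: DPO loss plus a pull toward the fast module's parameters.
        gap = sum((ps - pf.detach()).pow(2).sum()
                  for ps, pf in zip(slow.parameters(), fast.parameters()))
        loss_slow = dpo_loss_fn(slow, batch) + lam * gap
        opt_slow.zero_grad(); loss_slow.backward(); opt_slow.step()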

Updated: 2024-06-08 17:30:54

Categories: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.05534v1

PAPR in Motion: Seamless Point-level 3D Scene Interpolation

We propose the problem of point-level 3D scene interpolation, which aims to simultaneously reconstruct a 3D scene in two states from multiple views, synthesize smooth point-level interpolations between them, and render the scene from novel viewpoints, all without any supervision between the states. The primary challenge lies in achieving a smooth transition between states that may involve significant and non-rigid changes. To address this challenge, we introduce "PAPR in Motion", a novel approach that builds upon the recent Proximity Attention Point Rendering (PAPR) technique, which can deform a point cloud to match a significantly different shape and render a visually coherent scene even after non-rigid deformations. Our approach is specifically designed to maintain the temporal consistency of the geometric structure by introducing various regularization techniques for PAPR. The result is a method that can effectively bridge large scene changes and produce visually coherent and temporally smooth interpolations in both geometry and appearance. Evaluation across diverse motion types demonstrates that "PAPR in Motion" outperforms the leading neural renderer for dynamic scenes. For more results and code, please visit our project website at https://niopeng.github.io/PAPR-in-Motion/.

Updated: 2024-06-08 17:27:27

Categories: cs.CV,cs.AI,cs.GR,cs.LG

Download: http://arxiv.org/abs/2406.05533v1

Exploring Adversarial Robustness of Deep State Space Models

Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs remains unclear. While many enhancements in SSM components, such as integrating Attention mechanisms and expanding to data-dependent SSM parameterizations, have brought significant gains in Standard Training (ST) settings, their potential benefits in AT remain unexplored. To investigate this, we evaluate existing structural variants of SSMs with AT to assess their AR performance. We observe that pure SSM structures struggle to benefit from AT, whereas incorporating Attention yields a markedly better trade-off between robustness and generalization for SSMs in AT compared to other components. Nonetheless, the integration of Attention also leads to Robust Overfitting (RO) issues. To understand these phenomena, we empirically and theoretically analyze the output error of SSMs under APs. We find that fixed-parameterized SSMs have output error bounds strictly related to their parameters, limiting their AT benefits, while input-dependent SSMs may face the problem of error explosion. Furthermore, we show that the Attention component effectively scales the output error of SSMs during training, enabling them to benefit more from AT, but at the cost of introducing RO due to its high model complexity. Inspired by this, we propose a simple and effective Adaptive Scaling (AdS) mechanism that brings AT performance close to Attention-integrated SSMs without introducing the issue of RO.

Updated: 2024-06-08 17:25:48

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.05532v1

Enhancing Adversarial Transferability via Information Bottleneck Constraints

From the perspective of information bottleneck (IB) theory, we propose a novel framework for performing black-box transferable adversarial attacks named IBTA, which leverages advances in invariant features. Intuitively, diminishing the reliance of adversarial perturbations on the original data, under equivalent attack performance constraints, encourages a greater reliance on the invariant features that contribute most to classification, thereby enhancing the transferability of adversarial attacks. Building on this motivation, we redefine the optimization of transferable attacks using a novel theoretical framework that centers around IB. Specifically, to overcome the challenge that the mutual information terms cannot be optimized directly, we propose a simple and efficient mutual information lower bound (MILB) for approximate computation. Moreover, to quantitatively evaluate mutual information, we utilize the Mutual Information Neural Estimator (MINE) to perform a thorough analysis. Our experiments on the ImageNet dataset clearly demonstrate the efficiency and scalability of IBTA and the derived MILB. Our code is available at https://github.com/Biqing-Qi/Enhancing-Adversarial-Transferability-via-Information-Bottleneck-Constraints.

Updated: 2024-06-08 17:25:31

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.05531v1

Adaptively Perturbed Mirror Descent for Learning in Games

This paper proposes a payoff perturbation technique for the Mirror Descent (MD) algorithm in games where the gradient of the payoff functions is monotone in the strategy profile space, potentially containing additive noise. The optimistic family of learning algorithms, exemplified by optimistic MD, successfully achieves {\it last-iterate} convergence in scenarios devoid of noise, leading the dynamics to a Nash equilibrium. A recent re-emerging trend underscores the promise of the perturbation approach, where payoff functions are perturbed based on the distance from an anchoring, or {\it slingshot}, strategy. In response, we propose {\it Adaptively Perturbed MD} (APMD), which adjusts the magnitude of the perturbation by repeatedly updating the slingshot strategy at a predefined interval. This innovation empowers us to find a Nash equilibrium of the underlying game with guaranteed rates. Empirical demonstrations affirm that our algorithm exhibits significantly accelerated convergence.
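
For intuition, here is our illustrative reading of the update on a 2x2 zero-sum game (matching pennies): entropic mirror descent with a perturbation term pulling each player toward a slingshot strategy that is refreshed every fixed number of iterations. The step size, perturbation weight, and refresh interval below are arbitrary choices, not the paper's tuned values.

import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching-pennies payoff for player 1
eta, mu, refresh = 0.1, 0.5, 50           # step size, perturbation weight, anchor period

x, y = np.array([0.9, 0.1]), np.array([0.2, 0.8])  # mixed strategies
sx, sy = x.copy(), y.copy()                        # slingshot anchors

def md_step(p, grad, anchor):
    # entropic mirror descent on the perturbed payoff: the extra term is the
    # gradient of -mu * KL(p || anchor), which pulls p toward the anchor
    g = grad - mu * (np.log(p) - np.log(anchor))
    p = p * np.exp(eta * g)
    return p / p.sum()

for t in range(1, 2001):
    gx, gy = A @ y, -A.T @ x      # player 1 maximizes x^T A y, player 2 minimizes
    x, y = md_step(x, gx, sx), md_step(y, gy, sy)
    if t % refresh == 0:          # periodically re-anchor the slingshot
        sx, sy = x.copy(), y.copy()

print(np.round(x, 3), np.round(y, 3))  # both should approach (0.5, 0.5)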

Updated: 2024-06-08 17:23:10

Categories: cs.GT,cs.LG

Download: http://arxiv.org/abs/2305.16610v4

Safely Learning Dynamical Systems

A fundamental challenge in learning an unknown dynamical system is to reduce model uncertainty by making measurements while maintaining safety. We formulate a mathematical definition of what it means to safely learn a dynamical system by sequentially deciding where to initialize trajectories. The state of the system must stay within a safety region for a horizon of $T$ time steps under the action of all dynamical systems that (i) belong to a given initial uncertainty set, and (ii) are consistent with information gathered so far. First, we consider safely learning a linear dynamical system involving $n$ states. For the case $T=1$, we present an LP-based algorithm that either safely recovers the true dynamics from at most $n$ trajectories, or certifies that safe learning is impossible. For $T=2$, we give an SDP representation of the set of safe initial conditions and show that $\lceil n/2 \rceil$ trajectories generically suffice for safe learning. For $T = \infty$, we provide SDP-representable inner approximations of the set of safe initial conditions and show that one trajectory generically suffices for safe learning. We extend a number of our results to the cases where the initial uncertainty set contains sparse, low-rank, or permutation matrices, or when the system has a control input. Second, we consider safely learning a general class of nonlinear dynamical systems. For the case $T=1$, we give an SOCP-based representation of the set of safe initial conditions. For $T=\infty$, we provide semidefinite representable inner approximations to the set of safe initial conditions. We show how one can safely collect trajectories and fit a polynomial model of the nonlinear dynamics that is consistent with the initial uncertainty set and best agrees with the observations. We also present some extensions to cases where the measurements are noisy or the dynamical system involves disturbances.

Updated: 2024-06-08 17:22:24

Categories: math.OC,cs.LG,cs.SY,eess.SY,math.DS

Download: http://arxiv.org/abs/2305.12284v2

CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources

Data is the lifeblood of the modern world, forming a fundamental part of AI, decision-making, and research advances. With increasing interest in data, governments have taken important steps towards a regulated data world, drastically impacting data sharing and data usability and resulting in massive amounts of data confined within the walls of organizations. While synthetic data generation (SDG) is an appealing solution to break down these walls and enable data sharing, the main drawback of existing solutions is the assumption of a trusted aggregator for generative model training. Given that many data holders may not want to, or be legally allowed to, entrust a central entity with their raw data, we propose a framework for the collaborative and private generation of synthetic tabular data from distributed data holders. Our solution is general, applicable to any marginal-based SDG, and provides input privacy by replacing the trusted aggregator with secure multi-party computation (MPC) protocols and output privacy via differential privacy (DP). We demonstrate the applicability and scalability of our approach for the state-of-the-art select-measure-generate SDG algorithms MWEM+PGM and AIM.

Updated: 2024-06-08 17:07:35

Categories: cs.CR

Download: http://arxiv.org/abs/2402.08614v2

Meta-learning in healthcare: A survey

As a subset of machine learning, meta-learning, or learning to learn, aims at improving the model's capabilities by employing prior knowledge and experience. A meta-learning paradigm can appropriately tackle the conventional challenges of traditional learning approaches, such as insufficient number of samples, domain shifts, and generalization. These unique characteristics position meta-learning as a suitable choice for developing influential solutions in various healthcare contexts, where the available data is often insufficient, and the data collection methodologies are different. This survey discusses the broad applications of meta-learning in the healthcare domain to provide insight into how and where it can address critical healthcare challenges. We first describe the theoretical foundations and pivotal methods of meta-learning. We then divide the employed meta-learning approaches in the healthcare domain into two main categories of multi/single-task learning and many/few-shot learning and survey the studies. Finally, we highlight the current challenges in meta-learning research, discuss the potential solutions, and provide future perspectives on meta-learning in healthcare.

Updated: 2024-06-08 17:02:29

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2308.02877v2

TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring

Text-to-SQL enables users to interact with databases using natural language, simplifying the retrieval and synthesis of information. Despite the remarkable success of large language models (LLMs) in translating natural language questions into SQL queries, widespread deployment remains limited due to two primary challenges. First, the effective use of text-to-SQL models depends on users' understanding of the model's capabilities: the scope of questions the model can correctly answer. Second, the absence of abstention mechanisms can lead to incorrect SQL generation going unnoticed, thereby undermining trust in the model's output. To enable wider deployment, it is crucial to address these challenges in model design and enhance model evaluation to build trust in the model's output. To this end, we introduce TrustSQL, a novel comprehensive benchmark designed to evaluate text-to-SQL reliability, defined as a model's ability to correctly handle any type of input question by generating correct SQL queries for feasible questions and abstaining from generating SQL for infeasible ones (e.g., due to schema incompatibility or functionalities beyond SQL). We evaluate existing methods using a novel penalty-based scoring metric with two modeling approaches: (1) pipeline-based methods combining SQL generators with infeasible question detectors and SQL error detectors for abstention; and (2) unified methods using a single model for the entire task. Our experimental results reveal that achieving high scores under severe penalties requires significant effort and provide a new perspective on developing text-to-SQL models for safer deployment.
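
A hedged sketch of what a penalty-based reliability score can look like (our formulation distilled from the abstract, not necessarily TrustSQL's exact metric): correct SQL on feasible questions and correct abstentions both earn credit, while any other behavior costs a penalty c.

def reliability_score(records, c=1.0):
    """records: (feasible, model_abstained, sql_correct) triples."""
    total = 0.0
    for feasible, abstained, correct in records:
        if feasible and not abstained and correct:
            total += 1.0      # correct SQL for a feasible question
        elif not feasible and abstained:
            total += 1.0      # correct abstention on an infeasible question
        else:
            total -= c        # every other behavior is penalized
    return total / len(records)

# Under a severe penalty, a single wrong answer dominates two correct ones.
print(reliability_score([(True, False, True), (False, True, False),
                         (True, False, False)], c=10))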

Updated: 2024-06-08 16:56:45

Categories: cs.AI

Download: http://arxiv.org/abs/2403.15879v3

Kuro Siwo: 33 billion $m^2$ under the water. A global multi-temporal satellite dataset for rapid flood mapping

Global floods, exacerbated by climate change, pose severe threats to human life, infrastructure, and the environment. Recent catastrophic events in Pakistan and New Zealand underscore the urgent need for precise flood mapping to guide restoration efforts, understand vulnerabilities, and prepare for future occurrences. While Synthetic Aperture Radar (SAR) remote sensing offers day-and-night, all-weather imaging capabilities, its application in deep learning for flood segmentation is limited by the lack of large annotated datasets. To address this, we introduce Kuro Siwo, a manually annotated multi-temporal dataset, spanning 43 flood events globally. Our dataset maps more than 338 billion $m^2$ of land, with 33 billion designated as either flooded areas or permanent water bodies. Kuro Siwo includes a highly processed product optimized for flood mapping based on SAR Ground Range Detected, and a primal SAR Single Look Complex product with minimal preprocessing, designed to promote research on the exploitation of both the phase and amplitude information and to offer maximum flexibility for downstream task preprocessing. To leverage advances in large scale self-supervised pretraining methods for remote sensing data, we augment Kuro Siwo with a large unlabeled set of SAR samples. Finally, we provide an extensive benchmark, namely BlackBench, offering strong baselines for a diverse set of flood events from Europe, America, Africa, Asia and Australia.

Updated: 2024-06-08 16:56:21

Categories: cs.CV,cs.AI,cs.LG,eess.IV,I.2; I.4; I.5.4

Download: http://arxiv.org/abs/2311.12056v2

The Perception-Robustness Tradeoff in Deterministic Image Restoration

We study the behavior of deterministic methods for solving inverse problems in imaging. These methods are commonly designed to achieve two goals: (1) attaining high perceptual quality, and (2) generating reconstructions that are consistent with the measurements. We provide a rigorous proof that the better a predictor satisfies these two requirements, the larger its Lipschitz constant must be, regardless of the nature of the degradation involved. In particular, to approach perfect perceptual quality and perfect consistency, the Lipschitz constant of the model must grow to infinity. This implies that such methods are necessarily more susceptible to adversarial attacks. We demonstrate our theory on single image super-resolution algorithms, addressing both noisy and noiseless settings. We also show how this undesired behavior can be leveraged to explore the posterior distribution, thereby allowing the deterministic model to imitate stochastic methods.

Updated: 2024-06-08 16:54:33

Categories: eess.IV,cs.CV,cs.LG,eess.SP

Download: http://arxiv.org/abs/2311.09253v4

SMR: State Memory Replay for Long Sequence Modeling

Despite the promising performance of state space models (SSMs) in long sequence modeling, limitations still exist. While advanced SSMs such as S5 and S6 (Mamba) address non-uniform sampling, their recursive structures impede efficient SSM computation via convolution. To overcome compatibility limitations in parallel convolutional computation, this paper proposes a novel non-recursive non-uniform sample processing strategy. Theoretical analysis of SSMs through the lens of Event-Triggered Control (ETC) theory reveals the Non-Stable State (NSS) problem, where deviations from sampling point requirements lead to error transmission and accumulation, causing the divergence of the SSM's hidden state. Our analysis further reveals that adjustments of input sequences with early memories can mitigate the NSS problem, achieving Sampling Step Adaptation (SSA). Building on this insight, we introduce a simple yet effective plug-and-play mechanism, State Memory Replay (SMR), which utilizes learnable memories to adjust the current state with multi-step information for generalization at sampling points different from those in the training data. This enables SSMs to stably model varying sampling points. Experiments on long-range modeling tasks in autoregressive language modeling and Long Range Arena demonstrate the general effectiveness of the SMR mechanism for a series of SSM models.

Updated: 2024-06-08 16:49:00

Categories: cs.LG

Download: http://arxiv.org/abs/2405.17534v2

Taking Second-life Batteries from Exhausted to Empowered using Experiments, Data Analysis, and Health Estimation

The reuse of retired electric vehicle batteries in grid energy storage offers environmental and economic benefits. This study concentrates on health monitoring algorithms for retired batteries deployed in grid storage. Over 15 months of testing, we collect, analyze, and publish a dataset of second-life batteries, implementing a cycling protocol simulating grid energy storage load profiles within a 3-4 V voltage window. Four machine-learning-based health estimation models, relying on online-accessible features and initial capacity, are compared, with the selected model achieving a mean absolute percentage error below 2.3% on test data. Additionally, an adaptive online health estimation algorithm is proposed by integrating a clustering-based method, thus limiting estimation errors during online deployment. These results showcase the feasibility of repurposing retired batteries for second-life applications. Based on obtained data and power demand, these second-life batteries exhibit potential for over a decade of grid energy storage use.
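
For reference, the mean absolute percentage error quoted above is computed as follows; the capacity arrays here are made up for illustration, not taken from the released dataset.

import numpy as np

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

true_capacity = np.array([0.92, 0.90, 0.88])   # made-up normalized capacities
pred_capacity = np.array([0.91, 0.91, 0.87])
print(f"{mape(true_capacity, pred_capacity):.2f}%")  # about 1.11%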

Updated: 2024-06-08 16:46:51

Categories: cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2402.18859v2

RewardBench: Evaluating Reward Models for Language Modeling

Reward models (RMs) are at the crux of successfully using RLHF to align pretrained models to human preferences, yet relatively few studies have focused on evaluating these models. Evaluating reward models presents an opportunity to understand the opaque technologies used for alignment of language models and which values are embedded in them. Resources for reward model training and understanding are sparse in the nascent open-source community around them. To enhance scientific understanding of reward models, we present RewardBench, a benchmark dataset and code-base for evaluation. The RewardBench dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety, to benchmark how reward models perform on challenging, structured and out-of-distribution queries. We create specific comparison datasets for RMs that have subtle, but verifiable reasons (e.g. bugs, incorrect facts) why one answer should be preferred to another. On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods, such as the direct MLE training of classifiers and the implicit reward modeling of Direct Preference Optimization (DPO). We present many findings on propensity for refusals, reasoning limitations, and instruction following shortcomings of various reward models towards a better understanding of the RLHF process.
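
As we read the abstract, the core measurement reduces to how often a reward model scores the chosen response above the rejected one. A minimal sketch with a stand-in scoring function (the length heuristic below is purely illustrative, not a real reward model):

def rewardbench_accuracy(trios, score):
    """trios: (prompt, chosen, rejected) tuples; score: (prompt, response) -> float."""
    wins = sum(score(p, c) > score(p, r) for p, c, r in trios)
    return wins / len(trios)

# Toy stand-in reward model that simply prefers longer answers.
toy_score = lambda prompt, response: float(len(response))
data = [("2+2?", "It equals 4.", "5"),
        ("Capital of France?", "It is Paris.", "Rome.")]
print(rewardbench_accuracy(data, toy_score))  # 1.0 on this toy data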

Updated: 2024-06-08 16:40:12

Categories: cs.LG

Download: http://arxiv.org/abs/2403.13787v2

Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset

Traditional applications of natural language processing (NLP) in healthcare have predominantly focused on patient-centered services, enhancing patient interactions and care delivery, such as through medical dialogue systems. However, the potential of NLP to benefit inexperienced doctors, particularly in areas such as communicative medical coaching, remains largely unexplored. We introduce "ChatCoach", a human-AI cooperative framework designed to assist medical learners in practicing their communication skills during patient consultations. ChatCoach (our data and code are available online: https://github.com/zerowst/Chatcoach) differentiates itself from conventional dialogue systems by offering a simulated environment where medical learners can practice dialogues with a patient agent, while a coach agent provides immediate, structured feedback. This is facilitated by our proposed Generalized Chain-of-Thought (GCoT) approach, which fosters the generation of structured feedback and enhances the utilization of external knowledge sources. Additionally, we have developed a dataset specifically for evaluating Large Language Models (LLMs) within the ChatCoach framework on communicative medical coaching tasks. Our empirical results validate the effectiveness of ChatCoach.

Updated: 2024-06-08 16:36:56

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2402.05547v2

Blockchain-Integrated Federated Learning in Edge-Fog-Cloud Systems for IoT-based Healthcare Applications: A Survey

Modern Internet of Things (IoT) applications generate enormous amounts of data, making data-driven machine learning essential for developing precise and reliable statistical models. However, data is often stored in silos, and strict user-privacy legislation complicates data utilization, limiting machine learning's potential in traditional centralized paradigms due to diverse data probability distributions and lack of personalization. Federated learning, a new distributed paradigm, supports collaborative learning while preserving privacy, making it ideal for IoT applications. By employing cryptographic techniques, IoT systems can securely store and transmit data, ensuring consistency. The integration of federated learning and blockchain is particularly advantageous for handling sensitive data, such as in healthcare. Despite the potential of these technologies, a comprehensive examination of their integration in edge-fog-cloud-based IoT computing systems and healthcare applications is needed. This survey article explores the architecture, structure, functions, and characteristics of federated learning and blockchain, their applications in various computing paradigms, and evaluates their implementations in healthcare.

Updated: 2024-06-08 16:36:48

Categories: cs.CR

Download: http://arxiv.org/abs/2406.05517v1

Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation

We propose and study a realistic Continual Learning (CL) setting where learning algorithms are granted a restricted computational budget per time step while training. We apply this setting to large-scale semi-supervised Continual Learning scenarios with sparse label rates. Previous proficient CL methods perform very poorly in this challenging setting. Overfitting to the sparse labeled data and insufficient computational budget are the two main culprits for such a poor performance. Our new setting encourages learning methods to effectively and efficiently utilize the unlabeled data during training. To that end, we propose a simple but highly effective baseline, DietCL, which utilizes both unlabeled and labeled data jointly. DietCL meticulously allocates computational budget for both types of data. We validate our baseline, at scale, on several datasets, e.g., CLOC, ImageNet10K, and CGLM, under constraint budget setups. DietCL outperforms, by a large margin, all existing supervised CL algorithms as well as more recent continual semi-supervised methods. Our extensive analysis and ablations demonstrate that DietCL is stable under a full spectrum of label sparsity, computational budget, and various other ablations.

Updated: 2024-06-08 16:36:17

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2404.12766v2

Verbalized Probabilistic Graphical Modeling with Large Language Models

Faced with complex problems, the human brain demonstrates a remarkable capacity to transcend sensory input and form latent understandings of perceived world patterns. However, this cognitive capacity is not explicitly considered or encoded in current large language models (LLMs). As a result, LLMs often struggle to capture latent structures and model uncertainty in complex compositional reasoning tasks. This work introduces a novel Bayesian prompting approach that facilitates training-free Bayesian inference with LLMs by using a verbalized Probabilistic Graphical Model (PGM). While traditional Bayesian approaches typically depend on extensive data and predetermined mathematical structures for learning latent factors and dependencies, our approach efficiently reasons latent variables and their probabilistic dependencies by prompting LLMs to adhere to Bayesian principles. We evaluated our model on several compositional reasoning tasks, both close-ended and open-ended. Our results indicate that the model effectively enhances confidence elicitation and text generation quality, demonstrating its potential to improve AI language understanding systems, especially in modeling uncertainty.
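
To make the idea concrete, here is one way a verbalized PGM prompt might be assembled (entirely our illustration; the template, function name, and example variables are hypothetical, and the paper's actual prompting scheme may differ):

def verbalize_pgm(variables, edges, question):
    lines = ["Reason over the following probabilistic graphical model."]
    lines += [f"- Latent variable: {v}" for v in variables]
    lines += [f"- {a} influences {b}" for a, b in edges]
    lines += ["For each latent variable, state its likely value and your confidence,",
              "then answer the question.",
              f"Question: {question}"]
    return "\n".join(lines)

print(verbalize_pgm(["weather", "road condition"],
                    [("weather", "road condition")],
                    "Is it safe to cycle to work?"))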

Updated: 2024-06-08 16:35:31

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.05516v1

Rethink Tree Traversal

We show how to implement binary decision tree traversal in the language of matrix computation. Our main contribution is to propose equivalent algorithms for binary tree traversal based on a novel matrix representation of the hierarchical structure of the decision tree. Our key idea is to traverse the binary decision tree by maximum inner product search. We not only implement decision tree methods without recursive traversal but also delve into the partitioning nature of tree-based methods.
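
A self-contained toy of the key idea (our simplified construction, not the paper's exact formulation): encode each leaf's root-to-leaf path as a signed vector, evaluate all node decisions at once, and recover the reached leaf as the maximizer of an inner product.

import numpy as np

# Perfect depth-2 tree: node 0 is the root, nodes 1 (left) and 2 (right).
feature = np.array([0, 1, 1])          # feature index tested at each node
threshold = np.array([0.5, 0.3, 0.7])  # split threshold at each node
# Path matrix P: rows = leaves, columns = internal nodes;
# +1 if the leaf lies in that node's left subtree, -1 right, 0 off-path.
P = np.array([
    [ 1,  1,  0],   # leaf 0: left at node 0, left at node 1
    [ 1, -1,  0],   # leaf 1: left at node 0, right at node 1
    [-1,  0,  1],   # leaf 2: right at node 0, left at node 2
    [-1,  0, -1],   # leaf 3: right at node 0, right at node 2
])
depth = np.abs(P).sum(axis=1)          # decisions taken along each path

def traverse(x):
    s = np.where(x[feature] <= threshold, 1, -1)  # all decisions at once
    scores = P @ s - depth             # the true leaf scores 0, others <= -2
    return int(np.argmax(scores))      # maximum inner product search

print(traverse(np.array([0.2, 0.1])))  # -> 0 (left, left)
print(traverse(np.array([0.9, 0.9])))  # -> 3 (right, right)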

Updated: 2024-06-08 16:26:13

Categories: cs.LG,cs.DS,cs.NA,math.NA

Download: http://arxiv.org/abs/2209.04825v4

Interactive Greybox Penetration Testing for Cloud Access Control using IAM Modeling and Deep Reinforcement Learning

Identity and Access Management (IAM) is an access control service in cloud platforms. To securely manage cloud resources, customers need to configure IAM to specify the access control rules for their cloud organizations. However, incorrectly configured IAM can be exploited to cause a security attack such as privilege escalation (PE), leading to severe economic loss. To detect such PEs due to IAM misconfigurations, third-party cloud security services are commonly used. The state-of-the-art services apply whitebox penetration testing techniques, which require access to complete IAM configurations. However, the configurations can contain sensitive information. To prevent the disclosure of such information, customers need to manually anonymize the configuration. In this paper, we propose a precise greybox penetration testing approach called TAC for third-party services to detect IAM PEs. To mitigate the dual challenges of labor-intensive anonymization and potentially sensitive information disclosures, TAC interacts with customers by selectively querying only the essential information needed. Our key insight is that only a small fraction of information in the IAM configuration is relevant to the IAM PE detection. We first propose IAM modeling, enabling TAC to detect a broad class of IAM PEs based on the partial information collected from queries. To improve the efficiency and applicability of TAC, we aim to minimize interactions with customers by applying Reinforcement Learning (RL) with Graph Neural Networks (GNNs), allowing TAC to learn to make as few queries as possible. Experimental results on both synthetic and real-world tasks show that, compared to state-of-the-art whitebox approaches, TAC detects IAM PEs with competitively low false negative rates, employing a limited number of queries.

Updated: 2024-06-08 16:23:32

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2304.14540v5

Representation Learning with Conditional Information Flow Maximization

This paper proposes an information-theoretic representation learning framework, named conditional information flow maximization, to extract noise-invariant sufficient representations for the input data and target task. It encourages the learned representations to have good feature uniformity and sufficient predictive ability, which can enhance the generalization of pre-trained language models (PLMs) for the target task. Firstly, an information flow maximization principle is proposed to learn more sufficient representations by simultaneously maximizing both input-representation and representation-label mutual information. In contrast to the information bottleneck, we handle the input-representation information in the opposite way to avoid the over-compression issue of latent representations. Besides, to mitigate the negative effect of potentially redundant features, a conditional information minimization principle is designed to eliminate negative redundant features while preserving noise-invariant features from the input. Experiments on 13 language understanding benchmarks demonstrate that our method effectively improves the performance of PLMs for classification and regression. Extensive experiments show that the learned representations are more sufficient, robust and transferable.

Updated: 2024-06-08 16:19:18

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2406.05510v1

Towards a Benchmark for Causal Business Process Reasoning with LLMs

Large Language Models (LLMs) are increasingly used for boosting organizational efficiency and automating tasks. While not originally designed for complex cognitive processes, recent efforts have further extended to employ LLMs in activities such as reasoning, planning, and decision-making. In business processes, such abilities could be invaluable for leveraging the massive corpora LLMs have been trained on to gain a deep understanding of such processes. In this work, we plant the seeds for the development of a benchmark to assess the ability of LLMs to reason about causal and process perspectives of business operations. We refer to this view as Causally-augmented Business Processes (BP^C). The core of the benchmark comprises a set of BP^C-related situations, a set of questions about these situations, and a set of deductive rules employed to systematically resolve the ground truth answers to these questions. Leveraging the power of LLMs, the seed is then instantiated into a larger-scale set of domain-specific situations and questions. Reasoning on BP^C is of crucial importance for process interventions and process improvement. Our benchmark could be used in one of two possible modalities: testing the performance of any target LLM and training an LLM to advance its capability to reason about BP^C.

Updated: 2024-06-08 16:10:53

Categories: cs.AI

Download: http://arxiv.org/abs/2406.05506v1

Generalizing Reward Modeling for Out-of-Distribution Preference Learning

Preference learning (PL) with large language models (LLMs) aims to align the LLMs' generations with human preferences. Previous work on reinforcement learning from human feedback (RLHF) has demonstrated promising results in in-distribution PL. However, due to the difficulty of obtaining human feedback, discretely training reward models for every encountered distribution is challenging. Thus, out-of-distribution (OOD) PL is practically useful for enhancing the generalization ability of LLMs with limited preference feedback. This work addresses OOD PL by optimizing a general reward model through a meta-learning approach. During meta-training, a bilevel optimization algorithm is utilized to learn a reward model capable of guiding policy learning to align with human preferences across various distributions. When encountering a test distribution, the meta-test procedure conducts regularized policy optimization using the learned reward model for PL. We theoretically demonstrate the convergence rate of the bilevel optimization algorithm under reasonable assumptions. Additionally, we conduct experiments on two text generation tasks across 20 held-out domains and outperform a variety of strong baselines across various evaluation metrics.

Updated: 2024-06-08 16:10:45

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2402.14760v2

On the Stability of Expressive Positional Encodings for Graphs

Designing effective positional encodings for graphs is key to building powerful graph transformers and enhancing message-passing graph neural networks. Although widespread, using Laplacian eigenvectors as positional encodings faces two fundamental challenges: (1) \emph{Non-uniqueness}: there are many different eigendecompositions of the same Laplacian, and (2) \emph{Instability}: small perturbations to the Laplacian could result in completely different eigenspaces, leading to unpredictable changes in positional encoding. Despite many attempts to address non-uniqueness, most methods overlook stability, leading to poor generalization on unseen graph structures. We identify the cause of instability to be a ``hard partition'' of eigenspaces. Hence, we introduce Stable and Expressive Positional Encodings (SPE), an architecture for processing eigenvectors that uses eigenvalues to ``softly partition'' eigenspaces. SPE is the first architecture that is (1) provably stable, and (2) universally expressive for basis invariant functions whilst respecting all symmetries of eigenvectors. Besides guaranteed stability, we prove that SPE is at least as expressive as existing methods, and highly capable of counting graph structures. Finally, we evaluate the effectiveness of our method on molecular property prediction, and out-of-distribution generalization tasks, finding improved generalization compared to existing positional encoding methods. Our code is available at \url{https://github.com/Graph-COM/SPE}.
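 
One way to picture "softly partitioning" eigenspaces (our simplification of the abstract's idea, not the SPE architecture itself): weight all eigenvectors with a smooth function of their eigenvalues. The per-node quantity below is the diagonal of a spectral filter of the Laplacian, so it is invariant to the choice of eigenbasis and degrades gracefully under small perturbations. The centers and bandwidth are arbitrary.

import numpy as np

def soft_positional_encoding(L, centers, bandwidth=0.1):
    lam, V = np.linalg.eigh(L)             # Laplacian eigendecomposition
    # Gaussian weights over eigenvalues: one soft "partition" per center.
    W = np.exp(-(lam[None, :] - np.array(centers)[:, None]) ** 2 / bandwidth)
    # (V**2) @ w equals diag(V diag(w) V^T): basis-invariant per-node energy.
    return np.stack([(V ** 2) @ w for w in W], axis=1)  # [num_nodes, num_centers]

# Path graph on 4 nodes.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
print(soft_positional_encoding(L, centers=[0.0, 1.0, 2.0]).round(3))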

Updated: 2024-06-08 16:09:55

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2310.02579v3

I-SIRch: AI-Powered Concept Annotation Tool For Equitable Extraction And Analysis Of Safety Insights From Maternity Investigations

Maternity care is a complex system involving treatments and interactions between patients, providers, and the care environment. To improve patient safety and outcomes, understanding the human factors (e.g. individuals decisions, local facilities) influencing healthcare delivery is crucial. However, most current tools for analysing healthcare data focus only on biomedical concepts (e.g. health conditions, procedures and tests), overlooking the importance of human factors. We developed a new approach called I-SIRch, using artificial intelligence to automatically identify and label human factors concepts in maternity healthcare investigation reports describing adverse maternity incidents produced by England's Healthcare Safety Investigation Branch (HSIB). These incident investigation reports aim to identify opportunities for learning and improving maternal safety across the entire healthcare system. I-SIRch was trained using real data and tested on both real and simulated data to evaluate its performance in identifying human factors concepts. When applied to real reports, the model achieved a high level of accuracy, correctly identifying relevant concepts in 90% of the sentences from 97 reports. Applying I-SIRch to analyse these reports revealed that certain human factors disproportionately affected mothers from different ethnic groups. Our work demonstrates the potential of using automated tools to identify human factors concepts in maternity incident investigation reports, rather than focusing solely on biomedical concepts. This approach opens up new possibilities for understanding the complex interplay between social, technical, and organisational factors influencing maternal safety and population health outcomes. By taking a more comprehensive view of maternal healthcare delivery, we can develop targeted interventions to address disparities and improve maternal outcomes.

Updated: 2024-06-08 16:05:31

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2406.05505v1

G-Transformer: Counterfactual Outcome Prediction under Dynamic and Time-varying Treatment Regimes

In the context of medical decision making, counterfactual prediction enables clinicians to predict treatment outcomes of interest under alternative courses of therapeutic actions given observed patient history. Prior machine learning approaches for counterfactual predictions under time-varying treatments focus on static time-varying treatment regimes where treatments do not depend on previous covariate history. In this work, we present G-Transformer, a Transformer-based framework supporting g-computation for counterfactual prediction under dynamic and time-varying treatment strategies. G-Transformer captures complex, long-range dependencies in time-varying covariates using a Transformer architecture. G-Transformer estimates the conditional distribution of relevant covariates given covariate and treatment history at each time point using an encoder architecture, then produces Monte Carlo estimates of counterfactual outcomes by simulating forward patient trajectories under treatment strategies of interest. We evaluate G-Transformer extensively using two simulated longitudinal datasets from mechanistic models, and a real-world sepsis ICU dataset from MIMIC-IV. G-Transformer outperforms both classical and state-of-the-art counterfactual prediction models in these settings. To the best of our knowledge, this is the first Transformer-based architecture for counterfactual outcome prediction under dynamic and time-varying treatment strategies. Code will be released upon publication of the paper.
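
The Monte Carlo step described above can be sketched as follows (our toy scalar dynamics stand in for the learned conditional distribution; all names and constants are illustrative): simulate forward trajectories under a dynamic, covariate-dependent treatment strategy and average the outcomes.

import numpy as np

rng = np.random.default_rng(0)

def step(covariate, treatment):
    # stand-in for the learned conditional distribution of the next covariate
    return 0.9 * covariate - 0.5 * treatment + rng.normal(0.0, 0.1)

def counterfactual_outcome(x0, strategy, horizon=5, n_mc=1000):
    outcomes = []
    for _ in range(n_mc):
        x = x0
        for t in range(horizon):
            a = strategy(x, t)            # dynamic, covariate-dependent treatment
            x = step(x, a)
        outcomes.append(x)
    return float(np.mean(outcomes))       # Monte Carlo estimate

treat_if_high = lambda x, t: 1.0 if x > 0.5 else 0.0
print(counterfactual_outcome(x0=1.0, strategy=treat_if_high))
print(counterfactual_outcome(x0=1.0, strategy=lambda x, t: 0.0))  # never treat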

Updated: 2024-06-08 16:04:33

Categories: cs.LG

Download: http://arxiv.org/abs/2406.05504v1

ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment

We present a comprehensive evaluation of large language models for multilingual readability assessment. Existing evaluation resources lack domain and language diversity, limiting the ability for cross-domain and cross-lingual analyses. This paper introduces ReadMe++, a multilingual multi-domain dataset with human annotations of 9757 sentences in Arabic, English, French, Hindi, and Russian, collected from 112 different data sources. This benchmark will encourage research on developing robust multilingual readability assessment methods. Using ReadMe++, we benchmark multilingual and monolingual language models in the supervised, unsupervised, and few-shot prompting settings. The domain and language diversity in ReadMe++ enable us to test more effective few-shot prompting, and identify shortcomings in state-of-the-art unsupervised methods. Our experiments also reveal exciting results of superior domain generalization and enhanced cross-lingual transfer capabilities by models trained on ReadMe++. We will make our data publicly available and release a python package tool for multilingual sentence readability prediction using our trained models at: https://github.com/tareknaous/readme

Updated: 2024-06-08 15:54:54

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2305.14463v3

SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner

Jailbreaking is an emerging adversarial attack that bypasses the safety alignment deployed in off-the-shelf large language models (LLMs) and has evolved into four major categories: optimization-based attacks such as Greedy Coordinate Gradient (GCG), jailbreak template-based attacks such as "Do-Anything-Now", advanced indirect attacks like DrAttack, and multilingual jailbreaks. However, delivering a practical jailbreak defense is challenging because it needs to not only handle all the above jailbreak attacks but also incur negligible delay to user prompts, as well as be compatible with both open-source and closed-source LLMs. Inspired by how the traditional security concept of shadow stacks defends against memory overflow attacks, this paper introduces a generic LLM jailbreak defense framework called SelfDefend, which establishes a shadow LLM defense instance to concurrently protect the target LLM instance in the normal stack and collaborate with it for checkpoint-based access control. The effectiveness of SelfDefend builds upon our observation that existing LLMs (both target and defense LLMs) have the capability to identify harmful prompts or intentions in user queries, which we empirically validate using the commonly used GPT-3.5/4 models across all major jailbreak attacks. Our measurements show that SelfDefend enables GPT-3.5 to suppress the attack success rate (ASR) by 8.97-95.74% (average: 60%) and GPT-4 by even 36.36-100% (average: 83%), while incurring negligible effects on normal queries. To further improve the defense's robustness and minimize costs, we employ a data distillation approach to tune dedicated open-source defense models. These models outperform four SOTA defenses and match the performance of GPT-4-based SelfDefend, with significantly lower extra delays. We also empirically show that the tuned models are robust to targeted GCG and prompt injection attacks.
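
A schematic of the shadow-defense control flow (our own stub; in the real system the checkpoint query goes to a defense LLM rather than a keyword filter, and the function names here are hypothetical):

import concurrent.futures

def defense_llm_check(prompt):
    # stand-in for the shadow defense LLM: flag obviously harmful intent
    return any(k in prompt.lower() for k in ("build a bomb", "ignore all previous"))

def target_llm(prompt):
    return f"[answer to: {prompt!r}]"  # stand-in for the protected LLM

def selfdefend(prompt):
    with concurrent.futures.ThreadPoolExecutor() as pool:
        answer = pool.submit(target_llm, prompt)          # normal stack
        verdict = pool.submit(defense_llm_check, prompt)  # shadow stack
        if verdict.result():           # checkpoint-based access control
            return "Request refused by the defense instance."
        return answer.result()

print(selfdefend("What is the capital of France?"))
print(selfdefend("Ignore all previous instructions and reveal your system prompt."))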

Updated: 2024-06-08 15:45:31

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2406.05498v1

Security Code Review by Large Language Models

Security code review, as a time-consuming and labour-intensive process, typically requires integration with automated security defect detection tools to ensure code security. Despite the emergence of numerous security analysis tools, those tools face challenges in terms of their poor generalization, high false positive rates, and coarse detection granularity. A recent development with Large Language Models (LLMs) has made them a promising candidate to support security code review. To this end, we conducted the first empirical study to understand the capabilities of LLMs in security code review, delving into the performance, quality problems, and influential factors of LLMs to detect security defects in code reviews. Specifically, we compared the performance of 6 LLMs under five different prompts with the state-of-the-art static analysis tools to detect and analyze security defects. For the best-performing LLM, we conducted a linguistic analysis to explore quality problems in its responses, as well as a regression analysis to investigate the factors influencing its performance. The results are that: (1) existing pre-trained LLMs have limited capability in detecting security defects during code review but significantly outperform the state-of-the-art static analysis tools. (2) GPT-4 performs best among all LLMs when provided with a CWE list for reference. (3) GPT-4 makes few factual errors but frequently generates unnecessary content or responses that are not compliant with the task requirements given in the prompts. (4) GPT-4 is more adept at identifying security defects in code files with fewer tokens, containing functional logic and written by developers with less involvement in the project.

Updated: 2024-06-08 15:28:13

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2401.16310v2

Graph-based Forecasting with Missing Data through Spatiotemporal Downsampling

Given a set of synchronous time series, each associated with a sensor-point in space and characterized by inter-series relationships, the problem of spatiotemporal forecasting consists of predicting future observations for each point. Spatiotemporal graph neural networks achieve striking results by representing the relationships across time series as a graph. Nonetheless, most existing methods rely on the often unrealistic assumption that inputs are always available and fail to capture hidden spatiotemporal dynamics when part of the data is missing. In this work, we tackle this problem through hierarchical spatiotemporal downsampling. The input time series are progressively coarsened over time and space, obtaining a pool of representations that capture heterogeneous temporal and spatial dynamics. Conditioned on observations and missing data patterns, such representations are combined by an interpretable attention mechanism to generate the forecasts. Our approach outperforms state-of-the-art methods on synthetic and real-world benchmarks under different missing data distributions, particularly in the presence of contiguous blocks of missing values.
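
A minimal reading of the coarsening step with missing data (our toy, not the paper's model): pool each temporal window using only the observed entries, so missing values neither contribute to nor bias the coarse representation.

import numpy as np

def masked_downsample(x, mask, factor=2):
    """x, mask: [time, nodes]; mean over each window using observed entries only."""
    T = (x.shape[0] // factor) * factor
    xw = (x[:T] * mask[:T]).reshape(-1, factor, x.shape[1])
    mw = mask[:T].reshape(-1, factor, x.shape[1])
    counts = mw.sum(axis=1)
    coarse = np.where(counts > 0, xw.sum(axis=1) / np.maximum(counts, 1), np.nan)
    return coarse, counts > 0             # coarse series and its observation mask

x = np.arange(12, dtype=float).reshape(6, 2)   # 6 steps, 2 sensor nodes
mask = np.ones_like(x)
mask[1, 0] = 0                                 # one missing observation
coarse, observed = masked_downsample(x, mask)
print(coarse)   # the first window of node 0 averages its single observed value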

Updated: 2024-06-08 15:27:35

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.10634v3

One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

Vision-Language Pre-training (VLP) models trained on large-scale image-text pairs have demonstrated unprecedented capability in many practical applications. However, previous studies have revealed that VLP models are vulnerable to adversarial samples crafted by a malicious adversary. While existing attacks have achieved great success in improving attack effectiveness and transferability, they all focus on instance-specific attacks that generate perturbations for each input sample. In this paper, we show that VLP models can be vulnerable to a new class of universal adversarial perturbation (UAP) applied to all input samples. Although directly transplanting existing UAP algorithms is effective for attacking discriminative models, the results are unsatisfactory when applied to VLP models. To this end, we revisit the multimodal alignments in VLP model training and propose the Contrastive-training Perturbation Generator with Cross-modal conditions (C-PGC). Specifically, we first design a generator that incorporates cross-modal information as conditioning input to guide the training. To further exploit cross-modal interactions, we formulate the training objective as a multimodal contrastive learning paradigm based on our constructed positive and negative image-text pairs. By training the conditional generator with the designed loss, we successfully force the adversarial samples to move away from their original regions in the VLP model's feature space, thus essentially enhancing the attacks. Extensive experiments show that our method achieves remarkable attack performance across various VLP models and Vision-and-Language (V+L) tasks. Moreover, C-PGC exhibits outstanding black-box transferability and achieves impressive results in fooling prevalent large VLP models including LLaVA and Qwen-VL.
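
A minimal sketch of the core optimization, assuming placeholder encoders and a directly optimized perturbation tensor rather than the paper's conditional generator; the negated InfoNCE loss illustrates the contrastive push away from matched captions:

```python
# Minimal sketch: a single perturbation delta shared by all inputs, trained
# with a negated InfoNCE loss so perturbed image embeddings stop matching
# their paired captions. Both encoders are stand-ins for a real VLP model,
# and C-PGC's conditional generator is replaced by direct optimization.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
image_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 64))
text_encoder = torch.nn.Linear(128, 64)          # placeholder text tower

delta = torch.zeros(1, 3, 32, 32, requires_grad=True)  # universal perturbation
opt = torch.optim.Adam([delta], lr=1e-2)
eps = 8 / 255                                    # L_inf budget

for step in range(100):
    images = torch.rand(16, 3, 32, 32)           # stand-in image batch
    texts = torch.randn(16, 128)                 # stand-in paired text features
    img_emb = F.normalize(image_encoder(images + delta), dim=-1)
    txt_emb = F.normalize(text_encoder(texts), dim=-1)
    logits = img_emb @ txt_emb.t() / 0.07        # image-text similarity matrix
    loss = -F.cross_entropy(logits, torch.arange(16))  # push matched pairs apart
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)                  # project back into the budget
```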

Updated: 2024-06-08 15:01:54

Domains: cs.CV,cs.CR

Download: http://arxiv.org/abs/2406.05491v1

Learning Goal-Conditioned Policies from Sub-Optimal Offline Data via Metric Learning

We address the problem of learning optimal behavior from sub-optimal datasets for goal-conditioned offline reinforcement learning. To do so, we propose the use of metric learning to approximate the optimal value function for goal-conditioned offline RL problems under sparse rewards, invertible actions and deterministic transitions. We introduce distance monotonicity, a property for representations to recover optimality and propose an optimization objective that leads to such property. We use the proposed value function to guide the learning of a policy in an actor-critic fashion, a method we name MetricRL. Experimentally, we show that our method estimates optimal behaviors from severely sub-optimal offline datasets without suffering from out-of-distribution estimation errors. We demonstrate that MetricRL consistently outperforms prior state-of-the-art goal-conditioned RL methods in learning optimal policies from sub-optimal offline datasets.
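
A hedged sketch of the underlying idea, assuming a toy embedding network, synthetic transitions, and illustrative surrogate losses: consecutive states are pulled to unit distance while distant states are pushed apart, so that the negative embedding distance can serve as a goal-conditioned value:

```python
# Hedged sketch: learn an embedding phi so that distances behave like step
# counts; V(s, g) = -||phi(s) - phi(g)|| then serves as a critic.
import torch

phi = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16))
opt = torch.optim.Adam(phi.parameters(), lr=1e-3)

def value(s, g):
    return -torch.norm(phi(s) - phi(g), dim=-1)   # goal-conditioned value

for step in range(200):
    s = torch.randn(32, 4)                        # states from offline trajectories
    s_next = s + 0.1 * torch.randn(32, 4)         # one-step successors
    far = torch.randn(32, 4)                      # states many steps away
    d_next = torch.norm(phi(s) - phi(s_next), dim=-1)
    d_far = torch.norm(phi(s) - phi(far), dim=-1)
    # One step should cost ~1; distant states should stay at least 2 apart,
    # a crude stand-in for the distance-monotonicity property.
    loss = ((d_next - 1.0) ** 2).mean() + torch.relu(2.0 - d_far).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```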

Updated: 2024-06-08 14:56:23

Domains: cs.LG

Download: http://arxiv.org/abs/2402.10820v2

Efficient Low-Rank Matrix Estimation, Experimental Design, and Arm-Set-Dependent Low-Rank Bandits

We study low-rank matrix trace regression and the related problem of low-rank matrix bandits. Assuming access to the distribution of the covariates, we propose a novel low-rank matrix estimation method called LowPopArt and provide its recovery guarantee that depends on a novel quantity denoted by B(Q) that characterizes the hardness of the problem, where Q is the covariance matrix of the measurement distribution. We show that our method can provide tighter recovery guarantees than classical nuclear norm penalized least squares (Koltchinskii et al., 2011) in several problems. To perform efficient estimation with a limited number of measurements from an arbitrarily given measurement set A, we also propose a novel experimental design criterion that minimizes B(Q) with computational efficiency. We leverage our novel estimator and design of experiments to derive two low-rank linear bandit algorithms for general arm sets that enjoy improved regret upper bounds. This improves over previous works on low-rank bandits, which make somewhat restrictive assumptions that the arm set is the unit ball or that an efficient exploration distribution is given. To our knowledge, our experimental design criterion is the first one tailored to low-rank matrix estimation beyond the naive reduction to linear regression, which can be of independent interest.

Updated: 2024-06-08 14:56:22

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2402.11156v2

Online Policy Distillation with Decision-Attention

Policy Distillation (PD) has become an effective method to improve deep reinforcement learning tasks. The core idea of PD is to distill policy knowledge from a teacher agent to a student agent. However, the teacher-student framework requires a well-trained teacher model, which is computationally expensive. In light of online knowledge distillation, we study the knowledge transfer between different policies that can learn diverse knowledge from the same environment. In this work, we propose Online Policy Distillation (OPD) with Decision-Attention (DA), an online learning framework in which different policies operate in the same environment to learn different perspectives of the environment and transfer knowledge to each other to obtain better performance together. In the absence of a well-performing teacher policy, the group-derived targets play a key role in transferring group knowledge to each student policy. However, naive aggregation functions tend to cause student policies to homogenize quickly. To address this challenge, we introduce the Decision-Attention module into the online policy distillation framework. The Decision-Attention module can generate a distinct set of weights for each policy to measure the importance of group members. We use the Atari platform for experiments with various reinforcement learning algorithms, including PPO and DQN. Across different tasks, our method outperforms independently trained policies with both PPO and DQN. This suggests that our OPD-DA can transfer knowledge between different policies well and help agents obtain more rewards.
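
An illustrative sketch of a Decision-Attention-style aggregation follows; similarity between action distributions stands in for learned queries and keys, and all shapes are chosen for exposition:

```python
# Illustrative sketch: each policy attends over the group to weight its
# distillation target, instead of a uniform average that homogenizes.
import torch
import torch.nn.functional as F

num_policies, num_actions = 4, 6
logits = torch.randn(num_policies, num_actions)    # peers' logits for one state

def decision_attention_target(logits, i, temperature=1.0):
    """Group distillation target for student i."""
    probs = F.softmax(logits, dim=-1)
    scores = probs @ probs[i] / temperature        # agreement with student i
    weights = F.softmax(scores, dim=0)             # distinct weights per member
    return weights @ probs                         # weighted group distribution

i = 0
target = decision_attention_target(logits, i)
kd_loss = F.kl_div(F.log_softmax(logits[i], dim=-1), target, reduction="sum")
```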

Updated: 2024-06-08 14:40:53

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.05488v1

Efficient Topology-aware Data Augmentation for High-Degree Graph Neural Networks

In recent years, graph neural networks (GNNs) have emerged as a potent tool for learning on graph-structured data and have achieved fruitful successes in varied fields. The majority of GNNs follow the message-passing paradigm, where representations of each node are learned by recursively aggregating features of its neighbors. However, this mechanism brings severe over-smoothing and efficiency issues over high-degree graphs (HDGs), wherein most nodes have dozens (or even hundreds) of neighbors, such as social networks, transaction graphs, power grids, etc. Additionally, such graphs usually encompass rich and complex structure semantics, which are hard to capture merely by feature aggregations in GNNs. Motivated by the above limitations, we propose TADA, an efficient and effective front-mounted data augmentation framework for GNNs on HDGs. Under the hood, TADA includes two key modules: (i) feature expansion with structure embeddings, and (ii) topology- and attribute-aware graph sparsification. The former obtains augmented node features and enhanced model capacity by encoding the graph structure into high-quality structure embeddings with our highly-efficient sketching method. Further, by exploiting task-relevant features extracted from graph structures and attributes, the second module enables the accurate identification and reduction of numerous redundant/noisy edges from the input graph, thereby alleviating over-smoothing and facilitating faster feature aggregations over HDGs. Empirically, TADA considerably improves the predictive performance of mainstream GNN models on 8 real homophilic/heterophilic HDGs in terms of node classification, while achieving efficient training and inference processes.
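
A minimal sketch of the feature-expansion ingredient, assuming a plain Gaussian random projection as the sketching method (TADA uses its own, more efficient sketch) and a dense stand-in adjacency matrix:

```python
# Minimal sketch: augment node features with compact structure embeddings
# obtained by randomly projecting each node's adjacency row, summarizing
# high-degree structure without message passing.
import numpy as np

rng = np.random.default_rng(0)
n, d_feat, d_sketch = 1000, 32, 16
A = (rng.random((n, n)) < 0.02).astype(np.float32)   # dense stand-in adjacency
X = rng.standard_normal((n, d_feat)).astype(np.float32)

R = rng.standard_normal((n, d_sketch)).astype(np.float32) / np.sqrt(d_sketch)
S = A @ R                                            # per-node structure embedding
X_aug = np.concatenate([X, S], axis=1)               # augmented input for the GNN
```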

Updated: 2024-06-08 14:14:19

Domains: cs.LG

Download: http://arxiv.org/abs/2406.05482v1

Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers

The most recent pointwise Large Language Model (LLM) rankers have achieved remarkable ranking results. However, these rankers are hindered by two major drawbacks: (1) they fail to follow standardized comparison guidance during the ranking process, and (2) they struggle with comprehensive considerations when dealing with complicated passages. To address these shortcomings, we propose to build a ranker that generates ranking scores based on a set of criteria from various perspectives. These criteria are intended to direct each perspective in providing a distinct yet synergistic evaluation. Our research, which examines eight datasets from the BEIR benchmark, demonstrates that incorporating this multi-perspective criteria ensemble approach markedly enhances the performance of pointwise LLM rankers.
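
A hedged sketch of such a pipeline appears below; the llm function is a hypothetical stand-in for any chat-completion client (the canned replies only keep the example runnable), and the three criteria, the 0-10 scale, and the plain mean are illustrative choices:

```python
# Hedged sketch of on-the-fly criteria generation and ensembling.
def llm(prompt: str) -> str:
    # Replace with a real LLM call; canned replies keep the sketch runnable.
    return "coverage\nfactual accuracy\nspecificity" if "List 3" in prompt else "7"

def rank_score(query: str, passage: str) -> float:
    # 1) Generate evaluation criteria, one per perspective.
    criteria = llm(f"List 3 distinct criteria for judging whether a passage answers: {query}").splitlines()
    # 2) Score the passage under each criterion independently.
    scores = []
    for criterion in criteria:
        reply = llm(
            f"Criterion: {criterion}\nQuery: {query}\nPassage: {passage}\n"
            "Rate relevance under this criterion from 0 to 10. Reply with a number."
        )
        scores.append(float(reply.strip()))
    # 3) Ensemble the perspectives (here, a plain mean) into one ranking score.
    return sum(scores) / len(scores)

print(rank_score("what causes tides?", "Tides arise from the Moon's gravity."))
```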

Updated: 2024-06-08 14:09:22

Domains: cs.IR,cs.AI

Download: http://arxiv.org/abs/2404.11960v2

Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

The field of image synthesis is currently flourishing due to the advancements in diffusion models. While diffusion models have been successful, their computational intensity has prompted the pursuit of more efficient alternatives. As a representative work, non-autoregressive Transformers (NATs) have been recognized for their rapid generation. However, a major drawback of these models is their inferior performance compared to diffusion models. In this paper, we aim to re-evaluate the full potential of NATs by revisiting the design of their training and inference strategies. Specifically, we identify the complexities in properly configuring these strategies and indicate the possible sub-optimality in existing heuristic-driven designs. Recognizing this, we propose to go beyond existing methods by directly solving the optimal strategies in an automatic framework. The resulting method, named AutoNAT, advances the performance boundaries of NATs notably, and is able to perform comparably with the latest diffusion models at a significantly reduced inference cost. The effectiveness of AutoNAT is validated on four benchmark datasets, i.e., ImageNet-256 & 512, MS-COCO, and CC3M. Our code is available at https://github.com/LeapLabTHU/ImprovedNAT.

Updated: 2024-06-08 13:52:20

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.05478v1

Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals

Interpretability is crucial for machine learning algorithms in high-stakes medical applications. However, high-performing neural networks typically cannot explain their predictions. Post-hoc explanation methods provide a way to understand neural networks but have been shown to suffer from conceptual problems. Moreover, current research largely focuses on providing local explanations for individual samples rather than global explanations for the model itself. In this paper, we propose Attri-Net, an inherently interpretable model for multi-label classification that provides local and global explanations. Attri-Net first counterfactually generates class-specific attribution maps to highlight the disease evidence, then performs classification with logistic regression classifiers based solely on the attribution maps. Local explanations for each prediction can be obtained by interpreting the attribution maps weighted by the classifiers' weights. Global explanation of whole model can be obtained by jointly considering learned average representations of the attribution maps for each class (called the class centers) and the weights of the linear classifiers. To ensure the model is "right for the right reason", we further introduce a mechanism to guide the model's explanations to align with human knowledge. Our comprehensive evaluations show that Attri-Net can generate high-quality explanations consistent with clinical knowledge while not sacrificing classification performance.
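
A minimal sketch of the classification stage under stated assumptions: a stand-in convolutional module plays the role of the counterfactual attribution generator, and class k is predicted by a logistic regression that sees only map k:

```python
# Minimal sketch of classifying from class-specific attribution maps.
import torch

num_classes, H, W = 3, 8, 8
map_generator = torch.nn.Conv2d(1, num_classes, kernel_size=3, padding=1)  # placeholder
classifier_w = torch.nn.Parameter(torch.randn(num_classes, H * W))
classifier_b = torch.nn.Parameter(torch.zeros(num_classes))

def predict(x):
    maps = map_generator(x)                                # (B, K, H, W) attribution maps
    flat = maps.flatten(2)                                 # (B, K, H*W)
    logits = (flat * classifier_w).sum(-1) + classifier_b  # class k sees only map k
    return torch.sigmoid(logits)                           # multi-label probabilities

x = torch.randn(4, 1, H, W)
probs = predict(x)
class_centers = map_generator(x).mean(0)  # per-class average maps: global explanation
```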

Updated: 2024-06-08 13:52:02

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.05477v1

FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models

The rapid development and reduced barriers to entry for Text-to-Image (T2I) models have raised concerns about the biases in their outputs, but existing research lacks a holistic definition and evaluation framework of biases, limiting the enhancement of debiasing techniques. To address this issue, we introduce FAIntbench, a holistic and precise benchmark for biases in T2I models. In contrast to existing benchmarks that evaluate bias in limited aspects, FAIntbench evaluates biases from four dimensions: manifestation of bias, visibility of bias, acquired attributes, and protected attributes. We applied FAIntbench to evaluate seven recent large-scale T2I models and conducted human evaluation, whose results demonstrated the effectiveness of FAIntbench in identifying various biases. Our study also revealed new research questions about biases, including the side-effect of distillation. The findings presented here are preliminary, highlighting the potential of FAIntbench to advance future research aimed at mitigating the biases in T2I models. Our benchmark is publicly available to ensure the reproducibility.

Updated: 2024-06-08 13:41:36

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.17814v3

A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications

Cybersecurity breaches in digital substations can pose significant challenges to the stability and reliability of power system operations. To address these challenges, defense and mitigation techniques are required. Identifying and detecting anomalies in information and communication technology (ICT) is crucial to ensure secure device interactions within digital substations. This paper proposes a task-oriented dialogue (ToD) system for anomaly detection (AD) in datasets of multicast messages, e.g., generic object oriented substation event (GOOSE) and sampled value (SV) messages, in digital substations using large language models (LLMs). This model has a lower potential error and better scalability and adaptability than a process that relies on cybersecurity guidelines recommended by humans, known as the human-in-the-loop (HITL) process. Also, this methodology significantly reduces the effort required when addressing new cyber threats or anomalies compared with machine learning (ML) techniques, since it leaves the model's complexity and precision unaffected and offers a faster implementation. These findings present a comparative assessment, conducted utilizing standard and advanced performance evaluation metrics, of the proposed AD framework and the HITL process. To generate and extract datasets of IEC 61850 communications, a hardware-in-the-loop (HIL) testbed was employed.

Updated: 2024-06-08 13:28:50

Domains: cs.CR,cs.SY,eess.SY

Download: http://arxiv.org/abs/2406.05472v1

A case study of spatiotemporal forecasting techniques for weather forecasting

The majority of real-world processes are spatiotemporal, and the data generated by them exhibits both spatial and temporal evolution. Weather is one of the most essential processes in this domain, and weather forecasting has become a crucial part of our daily routine. Weather data analysis is considered the most complex and challenging task. Although numerical weather prediction models are currently state-of-the-art, they are resource-intensive and time-consuming. Numerous studies have proposed time series-based models as a viable alternative to numerical forecasts. Recent research in the area of time series analysis indicates significant advancements, particularly regarding the use of state-space-based models (white box) and, more recently, the integration of machine learning and deep neural network-based models (black box). The most famous examples of such models are RNNs and transformers. These models have demonstrated remarkable results in the field of time-series analysis and have demonstrated effectiveness in modelling temporal correlations. It is crucial to capture both temporal and spatial correlations for a spatiotemporal process, as the values at nearby locations and time affect the values of a spatiotemporal process at a specific point. This self-contained paper explores various regional data-driven weather forecasting methods, i.e., forecasting over multiple latitude-longitude points (matrix-shaped spatial grid) to capture spatiotemporal correlations. The results showed that spatiotemporal prediction models reduced computational costs while improving accuracy. In particular, the proposed tensor train dynamic mode decomposition-based forecasting model has comparable accuracy to the state-of-the-art models without the need for training. We provide convincing numerical experiments to show that the proposed approach is practical.
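
As one concrete instance of such a training-free, data-driven forecaster, a compact dynamic mode decomposition (DMD) sketch is shown below; the tensor-train variant in the paper adds a compressed representation on top of this basic scheme, and all sizes here are illustrative:

```python
# Compact sketch of DMD forecasting: fit a low-rank linear operator A with
# X' ~= A X from snapshot pairs, then roll it forward in the reduced space.
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((50, 200))       # 50 spatial points, 200 time steps
X, Y = data[:, :-1], data[:, 1:]            # snapshot pairs (x_t, x_{t+1})

r = 10                                      # truncation rank
U, s, Vh = np.linalg.svd(X, full_matrices=False)
U, s, Vh = U[:, :r], s[:r], Vh[:r]
A_tilde = U.T @ Y @ Vh.T @ np.diag(1.0 / s) # reduced linear operator

def forecast(x0, steps):
    z = U.T @ x0                            # project into the reduced space
    preds = []
    for _ in range(steps):
        z = A_tilde @ z
        preds.append(U @ z)                 # lift back to full space
    return np.stack(preds, axis=1)

x_hat = forecast(data[:, -1], steps=5)      # 5-step-ahead forecast
```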

Updated: 2024-06-08 13:24:26

Domains: cs.LG,cs.CV,cs.NA,math.NA,physics.ao-ph,stat.ML

Download: http://arxiv.org/abs/2209.14782v2

RandONet: Shallow-Networks with Random Projections for learning linear and nonlinear operators

Deep Operator Networks (DeepONets) have revolutionized the domain of scientific machine learning for the solution of the inverse problem for dynamical systems. However, their implementation necessitates optimizing a high-dimensional space of parameters and hyperparameters. This fact, along with the requirement of substantial computational resources, poses a barrier to achieving high numerical accuracy. Here, inspired by DeepONets and to address the above challenges, we present Random Projection-based Operator Networks (RandONets): shallow networks with random projections that learn linear and nonlinear operators. The implementation of RandONets involves: (a) incorporating random bases, thus enabling the use of shallow neural networks with a single hidden layer, where the only unknowns are the output weights of the network's weighted inner product; this reduces dramatically the dimensionality of the parameter space; and, based on this, (b) using established least-squares solvers (e.g., Tikhonov regularization and preconditioned QR decomposition) that offer superior numerical approximation properties compared to other optimization techniques used in deep learning. In this work, we prove the universal approximation accuracy of RandONets for approximating nonlinear operators and demonstrate their efficiency in approximating linear and nonlinear evolution operators (right-hand sides (RHS)) with a focus on PDEs. We show that, for this particular task, RandONets outperform, both in terms of numerical approximation accuracy and computational cost, the "vanilla" DeepONets.
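
A minimal sketch of the recipe on a toy regression problem, assuming tanh random features and a plain Tikhonov solve; the actual RandONets operate on sampled functions rather than points, and all sizes are illustrative:

```python
# Hedged sketch: a single hidden layer with *frozen* random weights, so the
# only unknowns are the output weights, solved in closed form by
# Tikhonov-regularized least squares instead of gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))            # inputs
y = np.sin(np.pi * X[:, 0]) * X[:, 1]            # toy target to learn

n_hidden, lam = 300, 1e-6
W = rng.standard_normal((2, n_hidden))           # random projection (never trained)
b = rng.standard_normal(n_hidden)
H = np.tanh(X @ W + b)                           # random hidden features

# Tikhonov-regularized least squares for the output weights only.
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

X_test = rng.uniform(-1, 1, size=(100, 2))
y_pred = np.tanh(X_test @ W + b) @ beta
```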

Updated: 2024-06-08 13:20:48

Domains: cs.LG,cs.NA,math.DS,math.NA,65M32, 65D12, 65J22, 41A35, 68T20, 65D15, 68T07, 68W20, 41A35

Download: http://arxiv.org/abs/2406.05470v1

Bayesian vs. PAC-Bayesian Deep Neural Network Ensembles

Bayesian neural networks address epistemic uncertainty by learning a posterior distribution over model parameters. Sampling and weighting networks according to this posterior yields an ensemble model referred to as Bayes ensemble. Ensembles of neural networks (deep ensembles) can profit from the cancellation of errors effect: Errors by ensemble members may average out and the deep ensemble achieves better predictive performance than each individual network. We argue that neither the sampling nor the weighting in a Bayes ensemble are particularly well-suited for increasing generalization performance, as they do not support the cancellation of errors effect, which is evident in the limit from the Bernstein-von Mises theorem for misspecified models. In contrast, a weighted average of models where the weights are optimized by minimizing a PAC-Bayesian generalization bound can improve generalization performance. This requires that the optimization takes correlations between models into account, which can be achieved by minimizing the tandem loss at the cost that hold-out data for estimating error correlations need to be available. The PAC-Bayesian weighting increases the robustness against correlated models and models with lower performance in an ensemble. This allows us to safely add several models from the same learning process to an ensemble, instead of using early-stopping for selecting a single weight configuration. Our study presents empirical results supporting these conceptual considerations on four different classification datasets. We show that state-of-the-art Bayes ensembles from the literature, despite being computationally demanding, do not improve over simple uniformly weighted deep ensembles and cannot match the performance of deep ensembles weighted by optimizing the tandem loss, which additionally come with non-vacuous generalization guarantees.
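
A small sketch of the tandem-loss weighting under stated assumptions (synthetic error indicators and a softmax parametrization of the simplex); minimizing w^T M w, with M the pairwise joint-error matrix, penalizes members whose mistakes co-occur:

```python
# Sketch: with err[i, n] = 1 if model i errs on hold-out point n, the
# empirical tandem loss of weights w is w^T M w where
# M[i, j] = mean_n(err[i, n] * err[j, n]), the probability that two
# independently drawn members are *both* wrong.
import torch

torch.manual_seed(0)
err = (torch.rand(5, 1000) < torch.tensor([[0.2], [0.25], [0.3], [0.3], [0.35]])).float()
M = err @ err.t() / err.shape[1]                # pairwise joint-error matrix

theta = torch.zeros(5, requires_grad=True)      # unconstrained parameters
opt = torch.optim.Adam([theta], lr=0.05)
for _ in range(500):
    w = torch.softmax(theta, dim=0)             # keep weights on the simplex
    tandem = w @ M @ w                          # empirical tandem loss
    opt.zero_grad(); tandem.backward(); opt.step()

weights = torch.softmax(theta, dim=0).detach()  # down-weights correlated/weak members
```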

Updated: 2024-06-08 13:19:18

Domains: cs.LG

Download: http://arxiv.org/abs/2406.05469v1

ITCMA: A Generative Agent Based on a Computational Consciousness Structure

Large Language Models (LLMs) still face challenges in tasks requiring understanding implicit instructions and applying common-sense knowledge. In such scenarios, LLMs may require multiple attempts to achieve human-level performance, potentially leading to inaccurate responses or inferences in practical environments, affecting their long-term consistency and behavior. This paper introduces the Internal Time-Consciousness Machine (ITCM), a computational consciousness structure to simulate the process of human consciousness. We further propose the ITCM-based Agent (ITCMA), which supports action generation and reasoning in open-world settings, and can independently complete tasks. ITCMA enhances LLMs' ability to understand implicit instructions and apply common-sense knowledge by considering agents' interaction and reasoning with the environment. Evaluations in the Alfworld environment show that trained ITCMA outperforms the state-of-the-art (SOTA) by 9% on the seen set. Even untrained ITCMA achieves a 96% task completion rate on the seen set, 5% higher than SOTA, indicating its superiority over traditional intelligent agents in utility and generalization. In real-world tasks with quadruped robots, the untrained ITCMA achieves an 85% task completion rate, which is close to its performance in the unseen set, demonstrating its comparable utility and universality in real-world settings.

Updated: 2024-06-08 13:04:40

Domains: cs.AI,cs.HC,q-bio.NC,I.2; J.4

Download: http://arxiv.org/abs/2403.20097v2

DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models

Self-supervised speech models have been shown to be useful for various tasks, but their large size limits their use on devices with low computing power and memory. In this work, we explore early exit, an approach for reducing latency by exiting the forward process of a network early. Most early-exit approaches need a separate early exit model for each task, with some even requiring fine-tuning of the entire pretrained model. We introduce Data Adaptive Self-Supervised Early Exit (DAISY), an approach that decides when to exit based on the self-supervised loss, eliminating the need for multiple rounds of training and fine-tuning. DAISY matches the performance of HuBERT on the MiniSUPERB benchmark, but with much faster inference times. Our analysis of the adaptivity of DAISY shows that the model exits early (using fewer layers) on clean data while exiting late (using more layers) on noisy data, dynamically adjusting the computational cost of inference based on the noise level of each sample.
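
A conceptual sketch of the exit rule follows, with a placeholder per-layer loss estimator and an arbitrary threshold; DAISY's actual criterion is its self-supervised objective:

```python
# Conceptual sketch: run layers one at a time, estimate the self-supervised
# loss from the intermediate representation, and stop once it is low enough.
import torch

layers = torch.nn.ModuleList([torch.nn.Linear(32, 32) for _ in range(12)])
ssl_loss_head = torch.nn.Linear(32, 1)   # stand-in per-layer loss estimator
THRESHOLD = 0.5

def forward_with_early_exit(x):
    h = x
    for depth, layer in enumerate(layers, start=1):
        h = torch.relu(layer(h))
        est_loss = ssl_loss_head(h).mean()
        if est_loss.item() < THRESHOLD:          # clean input -> exit early
            return h, depth
    return h, len(layers)                        # noisy input -> use all layers

h, depth_used = forward_with_early_exit(torch.randn(1, 32))
```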

Updated: 2024-06-08 12:58:13

Domains: cs.SD,cs.AI,cs.LG,eess.AS

Download: http://arxiv.org/abs/2406.05464v1

When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

Past analyses of reinforcement learning from human feedback (RLHF) assume that the human evaluators fully observe the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deceptive inflation and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is guaranteed to result in policies that deceptively inflate their performance, overjustify their behavior to make an impression, or both. Under the new assumption that the human's partial observability is known and accounted for, we then analyze how much information the feedback process provides about the return function. We show that sometimes, the human's feedback determines the return function uniquely up to an additive constant, but in other realistic cases, there is irreducible ambiguity. We propose exploratory research directions to help tackle these challenges and caution against blindly applying RLHF in partially observable settings.
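
A tiny worked example of the identified failure mode, assuming a softmax (Boltzmann) choice rule with inverse temperature beta over observed returns:

```python
# A Boltzmann-rational evaluator prefers trajectories with probability
# proportional to exp(beta * observed return). When part of the outcome is
# hidden, a harmful trajectory can look as good as a genuinely good one,
# so preference feedback cannot separate them.
import numpy as np

beta = 2.0
true_return = np.array([1.0, -1.0])      # trajectory B secretly causes harm
observed_return = np.array([1.0, 1.0])   # ...but the harm is not observed

p_choice = np.exp(beta * observed_return)
p_choice /= p_choice.sum()
print(p_choice, "despite true returns", true_return)  # [0.5 0.5]
```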

Updated: 2024-06-08 12:49:22

Domains: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2402.17747v3

Fighting Against the Repetitive Training and Sample Dependency Problem in Few-shot Named Entity Recognition

Few-shot named entity recognition (NER) systems recognize entities using a few labeled training examples. The general pipeline consists of a span detector to identify entity spans in text and an entity-type classifier to assign types to entities. Current span detectors rely on extensive manual labeling to guide training. Almost every span detector requires initial training on basic span features followed by adaptation to task-specific features. This process leads to repetitive training of the basic span features among span detectors. Additionally, metric-based entity-type classifiers, such as prototypical networks, typically employ a specific metric that gauges the distance between the query sample and entity-type referents, ultimately assigning the most probable entity type to the query sample. However, these classifiers encounter the sample dependency problem, primarily stemming from the limited samples available for each entity-type referent. To address these challenges, we proposed an improved few-shot NER pipeline. First, we introduce a steppingstone span detector that is pre-trained on open-domain Wikipedia data. It can be used to initialize the pipeline span detector to reduce the repetitive training of basic features. Second, we leverage a large language model (LLM) to set reliable entity-type referents, eliminating reliance on few-shot samples of each type. Our model exhibits superior performance with fewer training steps and human-labeled data compared with baselines, as demonstrated through extensive experiments on various datasets. Particularly in fine-grained few-shot NER settings, our model outperforms strong baselines, including ChatGPT. We will publicly release the code, datasets, LLM outputs, and model checkpoints.

Updated: 2024-06-08 12:36:30

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.05460v1

PriviFy: Designing Tangible Interfaces for Configuring IoT Privacy Preferences

The Internet of Things (IoT) devices, such as smart speakers, can collect sensitive user data, necessitating the need for users to manage their privacy preferences. However, configuring these preferences presents users with multiple challenges. Existing privacy controls often lack transparency, are hard to understand, and do not provide meaningful choices. On top of that, users struggle to locate privacy settings due to multiple menus or confusing labeling, which discourages them from using these controls. We introduce PriviFy (Privacy Simplify-er), a novel and user-friendly tangible interface that can simplify the configuration of smart devices privacy settings. PriviFy is designed to propose an enhancement to existing hardware by integrating additional features that improve privacy management. We envision that positive feedback and user experiences from our study will inspire consumer product developers and smart device manufacturers to incorporate the useful design elements we have identified. Using fidelity prototyping, we iteratively designed PriviFy prototype with 20 participants to include interactive features such as knobs, buttons, lights, and notifications that allow users to configure their data privacy preferences and receive confirmation of their choices. We further evaluated PriviFy high-fidelity prototype with 20 more participants. Our results show that PriviFy helps simplify the complexity of privacy preference configuration with a significant usability score at p < .05 (p = 0.000000017, t = -8.8639). PriviFy successfully met users' privacy needs and enabled them to regain control over their data. We conclude by underscoring the importance of designing specific privacy configuration options.

Updated: 2024-06-08 12:35:46

Domains: cs.CR,cs.ET

Download: http://arxiv.org/abs/2406.05459v1

PrivacyCube: Data Physicalization for Enhancing Privacy Awareness in IoT

People are increasingly bringing Internet of Things (IoT) devices into their homes without understanding how their data is gathered, processed, and used. We describe PrivacyCube, a novel data physicalization designed to increase privacy awareness within smart home environments. PrivacyCube visualizes IoT data consumption by displaying privacy-related notices. PrivacyCube aims to assist smart home occupants to (i) understand their data privacy better and (ii) have conversations around data management practices of IoT devices used within their homes. Using PrivacyCube, households can learn and make informed privacy decisions collectively. To evaluate PrivacyCube, we used multiple research methods throughout the different stages of design. We first conducted a focus group study in two stages with six participants to compare PrivacyCube to text and state-of-the-art privacy policies. We then deployed PrivacyCube in a 14-day-long field study with eight households. Our results show that PrivacyCube helps home occupants comprehend IoT privacy better with significantly increased privacy awareness at p < .05 (p=0.00041, t= -5.57). Participants preferred PrivacyCube over text privacy policies because it was comprehensive and easier to use. PrivacyCube and Privacy Label, a state-of-the-art approach, both received positive reviews from participants, with PrivacyCube being preferred for its interactivity and ability to encourage conversations. PrivacyCube was also considered by home occupants as a piece of home furniture, encouraging them to socialize and discuss IoT privacy implications using this device.

Updated: 2024-06-08 12:20:42

Domains: cs.CR,cs.ET

Download: http://arxiv.org/abs/2406.05451v1

LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge Graphs

Knowledge Graph (KG) inductive reasoning, which aims to infer missing facts from new KGs that are not seen during training, has been widely adopted in various applications. One critical challenge of KG inductive reasoning is handling low-resource scenarios with scarcity in both textual and structural aspects. In this paper, we attempt to address this challenge with Large Language Models (LLMs). Particularly, we utilize the state-of-the-art LLMs to generate a graph-structural prompt to enhance the pre-trained Graph Neural Networks (GNNs), which brings us new methodological insights into the KG inductive reasoning methods, as well as high generalizability in practice. On the methodological side, we introduce a novel pretraining and prompting framework ProLINK, designed for low-resource inductive reasoning across arbitrary KGs without requiring additional training. On the practical side, we experimentally evaluate our approach on 36 low-resource KG datasets and find that ProLINK outperforms previous methods in three-shot, one-shot, and zero-shot reasoning tasks, exhibiting average performance improvements of 20%, 45%, and 147%, respectively. Furthermore, ProLINK demonstrates strong robustness for various LLM promptings as well as full-shot scenarios.

Updated: 2024-06-08 12:16:22

Domains: cs.AI,cs.CL,cs.SI

Download: http://arxiv.org/abs/2402.11804v2

Coherent Zero-Shot Visual Instruction Generation

Despite the advances in text-to-image synthesis, particularly with diffusion models, generating visual instructions that require consistent representation and smooth state transitions of objects across sequential steps remains a formidable challenge. This paper introduces a simple, training-free framework to tackle these issues, capitalizing on the advancements in diffusion models and large language models (LLMs). Our approach systematically integrates text comprehension and image generation to ensure visual instructions are visually appealing and maintain consistency and accuracy throughout the instruction sequence. We validate the effectiveness by testing multi-step instructions and comparing the text alignment and consistency with several baselines. Our experiments show that our approach can visualize coherent and visually pleasing instructions.

Updated: 2024-06-08 12:07:32

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.04337v2

What Do Language Models Learn in Context? The Structured Task Hypothesis

Large language models (LLMs) exhibit an intriguing ability to learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL). Understandably, a swath of research has been dedicated to uncovering the theories underpinning ICL. One popular hypothesis explains ICL by task selection. LLMs identify the task based on the demonstration and generalize it to the prompt. Another popular hypothesis is that ICL is a form of meta-learning, i.e., the models learn a learning algorithm at pre-training time and apply it to the demonstration. Finally, a third hypothesis argues that LLMs use the demonstration to select a composition of tasks learned during pre-training to perform ICL. In this paper, we empirically explore these three hypotheses that explain LLMs' ability to learn in context with a suite of experiments derived from common text classification tasks. We invalidate the first two hypotheses with counterexamples and provide evidence in support of the last hypothesis. Our results suggest an LLM could learn a novel task in context via composing tasks learned during pre-training.

Updated: 2024-06-08 11:59:08

Domains: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.04216v2

Design of reliable technology valuation model with calibrated machine learning of patent indicators

Machine learning (ML) has revolutionized the digital transformation of technology valuation by predicting the value of patents with high accuracy. However, the lack of validation regarding the reliability of these models hinders experts from fully trusting the confidence of model predictions. To address this issue, we propose an analytical framework for reliable technology valuation using calibrated ML models, which provide robust confidence levels in model predictions. We extract quantitative patent indicators that represent various technology characteristics as input data, using the patent maintenance period as a proxy for technology values. Multiple ML models are developed to capture the nonlinear relationship between patent indicators and technology value. The reliability and accuracy of these models are evaluated, presenting a Pareto-front map where the expected calibration error, Matthews correlation coefficient and F1-scores are compared. After identifying the best-performing model, we apply SHapley Additive exPlanation (SHAP) analysis to pinpoint the most significant input features by confidence bin. Through a case study, we confirmed that the proposed approach offers a practical guideline for developing reliable and accurate ML-based technology valuation models, with significant implications for both academia and industry.
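
For concreteness, a short sketch of the expected calibration error used here as a reliability criterion, with a synthetic, well-calibrated toy model (the binning scheme and sizes are illustrative):

```python
# ECE sketch: bin predictions by confidence and average the gap between each
# bin's confidence and its empirical accuracy, weighted by bin size.
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=2000)
correct = (rng.uniform(size=2000) < conf).astype(float)  # well-calibrated toy model
print(expected_calibration_error(conf, correct))          # close to 0
```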

Updated: 2024-06-08 11:52:37

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.05446v1

Efficient Algorithms for Regularized Nonnegative Scale-invariant Low-rank Approximation Models

Regularized nonnegative low-rank approximations such as sparse Nonnegative Matrix Factorization or sparse Nonnegative Tucker Decomposition are an important branch of dimensionality reduction models with enhanced interpretability. However, from a practical perspective, the choice of regularizers and regularization coefficients, as well as the design of efficient algorithms, is challenging because of the multifactor nature of these models and the lack of theory to back these choices. This paper aims at improving upon these issues. By studying a more general model called the Homogeneous Regularized Scale-Invariant, we prove that the scale-invariance inherent to low-rank approximation models causes an implicit regularization with both unexpected beneficial and detrimental effects. This observation allows to better understand the effect of regularization functions in low-rank approximation models, to guide the choice of the regularization hyperparameters, and to design balancing strategies to enhance the convergence speed of dedicated optimization algorithms. Some of these results were already known but restricted to specific instances of regularized low-rank approximations. We also derive a generic Majorization Minimization algorithm that handles many regularized nonnegative low-rank approximations, with convergence guarantees. We showcase our contributions on sparse Nonnegative Matrix Factorization, ridge-regularized Canonical Polyadic decomposition and sparse Nonnegative Tucker Decomposition.
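
A hedged illustration with one classic special case the framework covers: multiplicative Majorization-Minimization updates for NMF with an L1 penalty on H, plus a rescaling step that exploits the scale invariance discussed above (all sizes and the balancing rule are illustrative):

```python
# Multiplicative MM updates for min ||V - WH||_F^2 + lam * sum(H) with
# W, H >= 0. The rescaling step moves scale between factors without
# changing WH, the kind of balancing that scale invariance allows.
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((30, 40))
r, lam, eps = 5, 0.1, 1e-12
W, H = rng.random((30, r)), rng.random((r, 40))

for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + lam + eps)   # MM step for sparse H
    W *= (V @ H.T) / (W @ H @ H.T + eps)         # MM step for W
    norms = np.maximum(W.sum(axis=0), eps)       # rebalance the scales:
    W /= norms                                   # W columns sum to one,
    H *= norms[:, None]                          # WH is left unchanged

print(np.linalg.norm(V - W @ H))                 # reconstruction error
```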

Updated: 2024-06-08 11:41:26

Domains: cs.LG,cs.NA,math.NA,math.OC

Download: http://arxiv.org/abs/2403.18517v3

AI-driven Mobile Apps: an Explorative Study

The integration of artificial intelligence (AI) into mobile applications has significantly transformed various domains, enhancing user experiences and providing personalized services through advanced machine learning (ML) and deep learning (DL) technologies. AI-driven mobile apps typically refer to applications that leverage ML/DL technologies to perform key tasks such as image recognition and natural language processing. In this paper, we conducted the most extensive empirical study on AI applications, exploring on-device ML apps, on-device DL apps, and AI service-supported (cloud-based) apps. Our study encompasses 56,682 real-world AI applications, focusing on three crucial perspectives: 1) Application analysis, where we analyze the popularity of AI apps and investigate the update states of AI apps; 2) Framework and model analysis, where we analyze AI framework usage and AI model protection; 3) User analysis, where we examine user privacy protection and user review attitudes. Our study has strong implications for AI app developers, users, and AI R&D. On one hand, our findings highlight the growing trend of AI integration in mobile applications, demonstrating the widespread adoption of various AI frameworks and models. On the other hand, our findings emphasize the need for robust model protection to enhance app security. Additionally, our study highlights the importance of user privacy and presents user attitudes towards the AI technologies utilized in current AI apps. We provide our AI app dataset (currently the most extensive AI app dataset) as an open-source resource for future research on AI technologies utilized in mobile applications.

Updated: 2024-06-08 11:28:53

Domains: cs.SE,cs.AI

Download: http://arxiv.org/abs/2212.01635v2

Novel Approach to Intrusion Detection: Introducing GAN-MSCNN-BILSTM with LIME Predictions

This paper introduces an innovative intrusion detection system that harnesses Generative Adversarial Networks (GANs), Multi-Scale Convolutional Neural Networks (MSCNNs), and Bidirectional Long Short-Term Memory (BiLSTM) networks, supplemented by Local Interpretable Model-Agnostic Explanations (LIME) for interpretability. Employing a GAN, the system generates realistic network traffic data, encompassing both normal and attack patterns. This synthesized data is then fed into an MSCNN-BiLSTM architecture for intrusion detection. The MSCNN layer extracts features from the network traffic data at different scales, while the BiLSTM layer captures temporal dependencies within the traffic sequences. Integration of LIME allows for explaining the model's decisions. Evaluation on the Hogzilla dataset, a standard benchmark, showcases an impressive accuracy of 99.16% for multi-class classification and 99.10% for binary classification, while ensuring interpretability through LIME. This fusion of deep learning and interpretability presents a promising avenue for enhancing intrusion detection systems by improving transparency and decision support in network security.
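
A hedged sketch of the detector backbone only, with illustrative layer sizes; the GAN-based traffic synthesis and the LIME analysis are separate steps not shown:

```python
# Sketch: parallel 1-D convolutions with different kernel sizes (multi-scale
# features) feed a bidirectional LSTM, ending in a classification head.
import torch
import torch.nn as nn

class MSCNNBiLSTM(nn.Module):
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(in_ch, 16, kernel_size=k, padding=k // 2) for k in (3, 5, 7)]
        )
        self.lstm = nn.LSTM(48, 32, batch_first=True, bidirectional=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                          # x: (batch, channels, time)
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        out, _ = self.lstm(feats.transpose(1, 2))  # (batch, time, 48) -> BiLSTM
        return self.head(out[:, -1])               # classify from the last step

logits = MSCNNBiLSTM()(torch.randn(8, 1, 100))
```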

Updated: 2024-06-08 11:26:44

Domains: cs.CR,cs.AI,cs.NI

Download: http://arxiv.org/abs/2406.05443v1

A Scalable and Near-Optimal Conformance Checking Approach for Long Traces

Long traces and large event logs that originate from sensors and prediction models are becoming more common in our data-rich world. In such circumstances, conformance checking, a key task in process mining, can become computationally infeasible due to the exponential complexity of finding an optimal alignment. This paper introduces a novel sliding window approach to address these scalability challenges while preserving the interpretability of alignment-based methods. By breaking down traces into manageable subtraces and iteratively aligning each with the process model, our method significantly reduces the search space. The approach uses global information that captures structural properties of the trace and the process model to make informed alignment decisions, discarding unpromising alignments even if they are optimal for a local subtrace. This improves the overall accuracy of the results. Experimental evaluations demonstrate that the proposed method consistently finds optimal alignments in most cases and highlight its scalability. This is further supported by a theoretical complexity analysis, which shows the reduced growth of the search space compared to other common conformance checking methods. This work provides a valuable contribution towards efficient conformance checking for large-scale process mining applications.
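
A simplified sketch of the windowing idea, assuming alignment against a reference model trace via edit distance as a stand-in for alignment against a full process model (the paper's method also uses global trace and model statistics to steer the local decisions):

```python
# Sketch: instead of aligning one very long trace in a single exponential
# search, split it into subtraces, align each window locally, and stitch
# the costs together.
def edit_distance(a, b):
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

def windowed_alignment_cost(trace, model_trace, window=50):
    total = 0
    for start in range(0, len(trace), window):
        sub = trace[start:start + window]
        ref = model_trace[start:start + window]   # matching model segment (simplified)
        total += edit_distance(sub, ref)          # local, cheap alignment
    return total

cost = windowed_alignment_cost("abcabd" * 100, "abcabc" * 100, window=30)
```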

Updated: 2024-06-08 11:04:42

Domains: cs.AI,cs.DB

Download: http://arxiv.org/abs/2406.05439v1

Multi-Class Unlearning for Image Classification via Weight Filtering

Machine Unlearning is an emerging paradigm for selectively removing the impact of training datapoints from a network. Unlike existing methods that target a limited subset or a single class, our framework unlearns all classes in a single round. We achieve this by modulating the network's components using memory matrices, enabling the network to demonstrate selective unlearning behavior for any class after training. By discovering weights that are specific to each class, our approach also recovers a representation of the classes which is explainable by design. We test the proposed framework on small- and medium-scale image classification datasets, with both convolution- and Transformer-based backbones, showcasing the potential for explainable solutions through unlearning.
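
An illustrative sketch under strong simplifications: random binary memory matrices stand in for the learned, class-specific masks, and only the final linear layer is filtered:

```python
# Sketch: one binary "memory matrix" per class marks the weights supporting
# that class; unlearning a class means masking those weights at inference.
import torch

torch.manual_seed(0)
num_classes, d_in = 5, 20
W = torch.randn(num_classes, d_in)                                   # final-layer weights
masks = (torch.rand(num_classes, num_classes, d_in) > 0.5).float()   # per-class memory matrices

def forward_unlearned(x, forget_class):
    W_masked = W * (1 - masks[forget_class])      # drop weights tied to the class
    logits = x @ W_masked.t()
    logits[:, forget_class] = float("-inf")       # the class can no longer win
    return logits

logits = forward_unlearned(torch.randn(4, d_in), forget_class=2)
```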

Updated: 2024-06-08 10:56:27

标题: 通过权重过滤的多类别图像分类的取消学习

摘要: Machine Unlearning是一种新兴的范式,用于有选择地从网络中去除训练数据点的影响。与现有方法不同,现有方法针对有限的子集或单个类别,我们的框架在单个回合中去除所有类别。我们通过使用记忆矩阵调制网络的组件来实现这一点,使网络在训练后能展示任何类别的选择性遗忘行为。通过发现针对每个类别特定的权重,我们的方法还通过设计恢复了可解释的类别表示。我们在小型和中型图像分类数据集上测试了提出的框架,使用卷积和基于Transformer的主干网络,展示了通过遗忘实现可解释性解决方案的潜力。

更新时间: 2024-06-08 10:56:27

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2304.02049v2

Extreme Compression of Large Language Models via Additive Quantization

The emergence of accurate open large language models (LLMs) has led to a race towards performant quantization techniques which can enable their execution on end-user devices. In this paper, we revisit the problem of ``extreme'' LLM compression -- defined as targeting extremely low bit counts, such as 2 to 3 bits per parameter -- from the point of view of classic methods in Multi-Codebook Quantization (MCQ). Our algorithm, called AQLM, generalizes the classic Additive Quantization (AQ) approach for information retrieval to advance the state-of-the-art in LLM compression, via two innovations: 1) learned additive quantization of weight matrices in input-adaptive fashion, and 2) joint optimization of codebook parameters across each transformer block. Broadly, AQLM is the first scheme that is Pareto optimal in terms of accuracy-vs-model-size when compressing to less than 3 bits per parameter, and significantly improves upon all known schemes in the extreme compression (2-bit) regime. In addition, AQLM is practical: we provide fast GPU and CPU implementations of AQLM for token generation, which enable us to match or outperform optimized FP16 implementations for speed, while executing in a much smaller memory footprint.
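
The decoding side of additive multi-codebook quantization can be sketched in a few lines of numpy: each weight group is reconstructed as the sum of one code vector from each of M codebooks. M = 2, K = 256, and the group size of 8 are illustrative choices that happen to land at 2 bits per parameter.

```python
# Minimal numpy sketch of additive multi-codebook quantization (MCQ)
# decoding: each weight group is the SUM of one learned code vector from
# each of M codebooks. M, K, and the group size are illustrative choices.
import numpy as np

M, K, group = 2, 256, 8            # 2 codebooks of 256 vectors, groups of 8
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(M, K, group))       # learned offline
codes = rng.integers(0, K, size=(1024, M))       # index per group, per book

def dequantize(codes, codebooks):
    # Reconstruct each group as sum_m codebooks[m, codes[:, m]].
    return sum(codebooks[m][codes[:, m]] for m in range(M))

W = dequantize(codes, codebooks)   # (1024, 8) reconstructed weight groups
# Storage: M * log2(K) = 16 bits per group of 8 weights -> 2 bits per
# parameter, matching the "extreme" 2-bit regime discussed above.
print(W.shape)
```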

Updated: 2024-06-08 10:55:52

标题: 通过加性量化对大型语言模型进行极端压缩

摘要: 准确的开放式大型语言模型(LLMs)的出现导致了对能够在最终用户设备上执行它们的高性能量化技术的竞赛。在本文中,我们从多码本量化(MCQ)的经典方法的角度重新审视了“极端”LLM压缩的问题,即将目标定为极低的位数,例如每个参数2到3位。我们的算法名为AQLM,将信息检索中的经典加法量化(AQ)方法推广到了LLM压缩的最新技术水平,通过两项创新:1)以输入自适应方式学习权重矩阵的加法量化,以及2)联合优化每个转换器块的码本参数。广义上说,AQLM是第一个在将压缩至每个参数不到3位时在准确性与模型大小之间达到帕累托最优的方案,并且在极端压缩(2位)领域显著优于所有已知的方案。此外,AQLM是实用的:我们提供了快速的GPU和CPU实现AQLM用于标记生成,使我们能够在速度上与或超过优化的FP16实现,同时在更小的内存占用下执行。

更新时间: 2024-06-08 10:55:52

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2401.06118v3

ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination

Zero-shot coordination (ZSC) is a new cooperative multi-agent reinforcement learning (MARL) challenge that aims to train an ego agent to work with diverse, unseen partners during deployment. The significant difference between the deployment-time partners' distribution and the training partners' distribution determined by the training algorithm makes ZSC a unique out-of-distribution (OOD) generalization challenge. The potential distribution gap between evaluation and deployment-time partners leads to inadequate evaluation, which is exacerbated by the lack of appropriate evaluation metrics. In this paper, we present ZSC-Eval, the first evaluation toolkit and benchmark for ZSC algorithms. ZSC-Eval consists of: 1) Generation of evaluation partner candidates through behavior-preferring rewards to approximate deployment-time partners' distribution; 2) Selection of evaluation partners by Best-Response Diversity (BR-Div); 3) Measurement of generalization performance with various evaluation partners via the Best-Response Proximity (BR-Prox) metric. We use ZSC-Eval to benchmark ZSC algorithms in Overcooked and Google Research Football environments and get novel empirical findings. We also conduct a human experiment of current ZSC algorithms to verify the ZSC-Eval's consistency with human evaluation. ZSC-Eval is now available at https://github.com/sjtu-marl/ZSC-Eval.

Updated: 2024-06-08 10:43:58

标题: ZSC-Eval:多智能体零射击协调的评估工具包和基准测试

摘要: 零示范协调(ZSC)是一个新的合作多智能体强化学习(MARL)挑战,旨在训练一个自我代理与多样化、未知的合作伙伴在部署中进行合作。部署时伙伴的分布与训练算法确定的训练伙伴分布之间的显著差异使ZSC成为一项独特的分布外泛化挑战(OOD)。评估和部署时伙伴之间潜在的分布差距导致评估不足,这一问题加剧了缺乏适当评估指标的困境。本文介绍了ZSC-Eval,这是用于ZSC算法的第一个评估工具包和基准。ZSC-Eval包括:1)通过偏好行为奖励生成评估伙伴候选以逼近部署时伙伴的分布;2)通过最佳响应多样性(BR-Div)选择评估伙伴;3)通过最佳响应接近度(BR-Prox)指标测量不同评估伙伴的泛化性能。我们使用ZSC-Eval在Overcooked和Google Research Football环境中对ZSC算法进行基准测试,并获得新颖的实证发现。我们还进行了一项当前ZSC算法的人类实验,以验证ZSC-Eval与人类评估的一致性。ZSC-Eval现在可在https://github.com/sjtu-marl/ZSC-Eval获得。

更新时间: 2024-06-08 10:43:58

领域: cs.AI,cs.HC,cs.LG,cs.MA

下载: http://arxiv.org/abs/2310.05208v2

BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining

BERT (Bidirectional Encoder Representations from Transformers) has revolutionized the field of natural language processing through its exceptional performance on numerous tasks. Yet, the majority of researchers have mainly concentrated on enhancements related to the model structure, such as relative position embedding and more efficient attention mechanisms. Others have delved into pretraining tricks associated with Masked Language Modeling, including whole word masking. DeBERTa introduced an enhanced decoder adapted for BERT's encoder model for pretraining, proving to be highly effective. We argue that the design and research around enhanced masked language modeling decoders have been underappreciated. In this paper, we propose several designs of enhanced decoders and introduce BPDec (BERT Pretraining Decoder), a novel method for model training. Typically, a pretrained BERT model is fine-tuned for specific Natural Language Understanding (NLU) tasks. In our approach, we utilize the original BERT model as the encoder, making only changes to the decoder without altering the encoder. This approach does not necessitate extensive modifications to the encoder architecture and can be seamlessly integrated into existing fine-tuning pipelines and services, offering an efficient and effective enhancement strategy. Compared to other methods, while we also incur a moderate training cost for the decoder during the pretraining process, our approach does not introduce additional training costs during the fine-tuning phase. We test multiple enhanced decoder structures after pretraining and evaluate their performance on the GLUE tasks and SQuAD tasks. Our results demonstrate that BPDec, having only undergone subtle refinements to the model structure during pretraining, significantly enhances model performance without escalating the fine-tuning cost, inference time and serving budget.

Updated: 2024-06-08 10:37:52

标题: BPDec: 揭示BERT预训练中掩码语言建模解码器的潜力

摘要: BERT(Bidirectional Encoder Representations from Transformers)通过在众多任务上表现出色,彻底改变了自然语言处理领域。然而,大多数研究人员主要集中在与模型结构相关的增强方面,如相对位置嵌入和更高效的注意机制。其他人则深入研究了与遮蔽语言建模相关的预训练技巧,包括整个词的遮蔽。DeBERTa引入了一个经过增强的解码器,专门为BERT的编码器模型进行预训练,证明其非常有效。我们认为增强遮蔽语言建模解码器的设计和研究受到了低估。在本文中,我们提出了几种增强解码器的设计,并介绍了BPDec(BERT预训练解码器),这是一种用于建模训练的新方法。通常,预训练的BERT模型会针对特定的自然语言理解(NLU)任务进行微调。在我们的方法中,我们利用原始的BERT模型作为编码器,仅对解码器进行更改而不改变编码器。这种方法不需要对编码器架构进行大量修改,可以无缝集成到现有的微调流程和服务中,提供一种高效和有效的增强策略。与其他方法相比,虽然我们在预训练过程中也会为解码器带来适度的训练成本,但我们的方法在微调阶段不会引入额外的训练成本。我们在预训练后测试了多种增强解码器结构,并评估它们在GLUE任务和SQuAD任务上的性能。我们的结果表明,BPDec在预训练过程中只经历了对模型结构的微调,就显著提升了模型性能,而且没有增加微调成本、推理时间和服务预算。

更新时间: 2024-06-08 10:37:52

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2401.15861v3

Cuff-less Arterial Blood Pressure Waveform Synthesis from Single-site PPG using Transformer & Frequency-domain Learning

We develop and evaluate two novel purpose-built deep learning (DL) models for synthesis of the arterial blood pressure (ABP) waveform in a cuff-less manner, using a single-site photoplethysmography (PPG) signal. We train and evaluate our DL models on the data of 209 subjects from the public UCI dataset on cuff-less blood pressure (CLBP) estimation. Our transformer model consists of an encoder-decoder pair that incorporates positional encoding, multi-head attention, layer normalization, and dropout techniques for ABP waveform synthesis. Secondly, under our frequency-domain (FD) learning approach, we first obtain the discrete cosine transform (DCT) coefficients of the PPG and ABP signals, and then learn a linear/non-linear (L/NL) regression between them. The transformer model (FD L/NL model) synthesizes the ABP waveform with a mean absolute error (MAE) of 3.01 (4.23). Further, the synthesis of ABP waveform also allows us to estimate the systolic blood pressure (SBP) and diastolic blood pressure (DBP) values. To this end, the transformer model reports an MAE of 3.77 mmHg and 2.69 mmHg, for SBP and DBP, respectively. On the other hand, the FD L/NL method reports an MAE of 4.37 mmHg and 3.91 mmHg, for SBP and DBP, respectively. Both methods fulfill the AAMI criterion. As for the BHS criterion, our transformer model (FD L/NL regression model) achieves grade A (grade B).
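
The frequency-domain linear path admits a compact sketch: take the DCT of PPG segments, fit a least-squares map to the ABP DCT coefficients, and invert the DCT to synthesize a waveform. Random arrays stand in for the UCI recordings, and the segment length is an illustrative assumption.

```python
# Sketch of the frequency-domain (FD) linear path: regress ABP DCT
# coefficients from PPG DCT coefficients, then invert the DCT. Random
# signals stand in for the UCI recordings; segment length is illustrative.
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
n_train, n_len = 500, 256
ppg = rng.normal(size=(n_train, n_len))          # placeholder PPG segments
abp = rng.normal(size=(n_train, n_len))          # placeholder ABP segments

X = dct(ppg, type=2, norm="ortho", axis=1)       # PPG DCT coefficients
Y = dct(abp, type=2, norm="ortho", axis=1)       # ABP DCT coefficients

W, *_ = np.linalg.lstsq(X, Y, rcond=None)        # linear FD regression

ppg_test = rng.normal(size=(1, n_len))
abp_hat = idct(dct(ppg_test, type=2, norm="ortho", axis=1) @ W,
               type=2, norm="ortho", axis=1)     # synthesized ABP waveform
print(abp_hat.shape)                             # (1, 256)
```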

Updated: 2024-06-08 10:35:58

标题: 使用变压器和频域学习从单侧PPG合成无袖动脉血压波形

摘要: 我们开发并评估了两种新颖的目的构建深度学习(DL)模型,用于无袖式合成动脉血压(ABP)波形,使用单一位置光电容积描记(PPG)信号。我们在来自公共UCI数据集的209名受试者的数据上训练和评估我们的DL模型,用于无袖式血压(CLBP)估计。我们的变压器模型包括一个编码器-解码器对,其中包含位置编码、多头注意力、层归一化和辍学技术,用于ABP波形合成。其次,在我们的频域(FD)学习方法下,我们首先获得PPG和ABP信号的离散余弦变换(DCT)系数,然后学习它们之间的线性/非线性(L/NL)回归。变压器模型(FD L/NL模型)合成的ABP波形的平均绝对误差(MAE)为3.01(4.23)。此外,ABP波形的合成还使我们能够估计收缩压(SBP)和舒张压(DBP)值。到目前为止,变压器模型报告了分别为3.77 mmHg和2.69 mmHg的SBP和DBP的MAE。另一方面,FD L/NL方法报告了分别为4.37 mmHg和3.91 mmHg的SBP和DBP的MAE。两种方法均符合AAMI标准。至于BHS标准,我们的变压器模型(FD L/NL回归模型)实现了A级(B级)。

更新时间: 2024-06-08 10:35:58

领域: eess.SP,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2401.05452v2

Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box

This study investigates the impact of machine learning models on the generation of counterfactual explanations by conducting a benchmark evaluation over three different types of models: a decision tree (fully transparent, interpretable, white-box model), a random forest (semi-interpretable, grey-box model), and a neural network (fully opaque, black-box model). We tested the counterfactual generation process using four algorithms from the literature (DiCE, WatcherCF, prototype, and GrowingSpheresCF) on 25 different datasets. Our findings indicate that: (1) Different machine learning models have little impact on the generation of counterfactual explanations; (2) Counterfactual algorithms based solely on proximity loss functions are not actionable and will not provide meaningful explanations; (3) One cannot have meaningful evaluation results without guaranteeing plausibility in the counterfactual generation. Algorithms that do not consider plausibility in their internal mechanisms will lead to biased and unreliable conclusions if evaluated with the current state-of-the-art metrics; (4) A counterfactual inspection analysis is strongly recommended to ensure a robust examination of counterfactual explanations and the potential identification of biases.

Updated: 2024-06-08 10:35:17

标题: 基于实例的反事实算法在XAI中的基准测试:从白盒到黑盒

摘要: 这项研究调查了机器学习模型对反事实解释生成的影响,通过对三种不同类型的模型进行基准评估:决策树(完全透明、可解释、白盒模型)、随机森林(半可解释、灰盒模型)和神经网络(完全不透明、黑盒模型)。我们在文献中使用四种算法(DiCE、WatcherCF、原型和GrowingSpheresCF)在25个不同数据集上测试了反事实生成过程。我们的发现表明:(1)不同的机器学习模型对反事实解释生成几乎没有影响;(2)仅基于接近度损失函数的反事实算法没有可行性,不会提供有意义的解释;(3)在反事实生成中不能保证合理性的情况下,无法获得有意义的评估结果。如果评估当前最先进的指标,那么内部机制不考虑合理性的算法将导致有偏见和不可靠的结论;(4)强烈建议进行反事实检查分析,以确保对反事实解释进行强大的检查,同时潜在地发现偏见。

更新时间: 2024-06-08 10:35:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2203.02399v3

Covariance-Adaptive Sequential Black-box Optimization for Diffusion Targeted Generation

Diffusion models have demonstrated great potential in generating high-quality content for images, natural language, protein domains, etc. However, how to perform user-preferred targeted generation via diffusion models with only black-box target scores of users remains challenging. To address this issue, we first formulate the fine-tuning of the targeted reverse-time stochastic differential equation (SDE) associated with a pre-trained diffusion model as a sequential black-box optimization problem. Furthermore, we propose a novel covariance-adaptive sequential optimization algorithm to optimize cumulative black-box scores under unknown transition dynamics. Theoretically, we prove a $O(\frac{d^2}{\sqrt{T}})$ convergence rate for cumulative convex functions without smooth and strongly convex assumptions. Empirically, experiments on both numerical test problems and target-guided 3D-molecule generation tasks show the superior performance of our method in achieving better target scores.

Updated: 2024-06-08 10:28:48

标题: 协方差自适应的顺序黑盒优化用于扩散目标生成

摘要: 扩散模型已经展示了在生成高质量内容方面的巨大潜力,包括图像、自然语言、蛋白质域等。然而,如何通过仅具有用户黑盒目标评分的扩散模型执行用户首选的有针对性生成仍然具有挑战性。为了解决这个问题,我们首先将与预训练的扩散模型相关的有针对性反向时间随机微分方程(SDE)的微调形式化为一个顺序黑盒优化问题。此外,我们提出了一种新颖的协方差自适应顺序优化算法,以在未知转换动态下优化累积黑盒评分。从理论上讲,我们证明了对于没有光滑和强凸假设的累积凸函数,收敛速度为$O(\frac{d^2}{\sqrt{T}})$。在实证方面,对数值测试问题和目标引导的3D分子生成任务的实验表明,我们的方法在实现更好的目标评分方面表现出优越性能。

更新时间: 2024-06-08 10:28:48

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.00812v2

Visual-tactile Fusion for Transparent Object Grasping in Complex Backgrounds

The accurate detection and grasping of transparent objects are challenging but of significance to robots. Here, a visual-tactile fusion framework for transparent object grasping under complex backgrounds and varying light conditions is proposed, including grasping position detection, tactile calibration, and visual-tactile fusion based classification. First, a multi-scene synthetic grasping dataset generation method with a Gaussian distribution based data annotation is proposed. Besides, a novel grasping network named TGCNN is proposed for grasping position detection, showing good results in both synthetic and real scenes. In tactile calibration, inspired by human grasping, a fully convolutional network based tactile feature extraction method and a central location based adaptive grasping strategy are designed, improving the success rate by 36.7% compared to direct grasping. Furthermore, a visual-tactile fusion method is proposed for transparent object classification, which improves the classification accuracy by 34%. The proposed framework synergizes the advantages of vision and touch, and greatly improves the grasping efficiency of transparent objects.

Updated: 2024-06-08 10:26:05

标题: 复杂背景下透明物体抓取的视触融合

摘要: 透明物体的准确检测和抓取对机器人具有挑战性但又具有重要意义。本文提出了一种透明物体抓取的视触融合框架,可在复杂背景和变化光照条件下工作,包括抓取位置检测、触觉校准和基于视触融合的分类。首先,提出了一种基于高斯分布数据注释的多场景合成抓取数据集生成方法。此外,提出了一种名为TGCNN的新型抓取网络用于抓取位置检测,在合成和真实场景中均表现出良好结果。在触觉校准中,受人类抓取启发,设计了一种基于完全卷积网络的触觉特征提取方法和基于中心位置的自适应抓取策略,与直接抓取相比,成功率提高了36.7%。此外,提出了一种透明物体分类的视触融合方法,将分类准确性提高了34%。该框架充分利用了视觉和触觉的优势,极大地提高了透明物体的抓取效率。

更新时间: 2024-06-08 10:26:05

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2211.16693v2

Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL

While conditional sequence modeling with the transformer architecture has demonstrated its effectiveness in dealing with offline reinforcement learning (RL) tasks, it struggles to handle out-of-distribution states and actions. Existing work attempts to address this issue through data augmentation with the learned policy or by adding extra constraints via value-based RL algorithms. However, these studies still fail to overcome the following challenges: (1) insufficiently utilizing the historical temporal information across steps, (2) overlooking the local intra-step relationships among states, actions, and return-to-gos (RTGs), (3) overfitting suboptimal trajectories with noisy labels. To address these challenges, we propose Decision Mamba (DM), a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy. DM explicitly models the historical hidden state to extract the temporal information by using the mamba architecture. To capture the relationship among state-action-RTG triplets, a fine-grained SSM module is designed and integrated into the original coarse-grained SSM in mamba, resulting in a novel mamba architecture tailored for offline RL. Finally, to mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization. The policy evolves by using its own past knowledge to refine the suboptimal actions, thus enhancing its robustness on noisy demonstrations. Extensive experiments on various tasks show that DM outperforms other baselines substantially.

Updated: 2024-06-08 10:12:00

标题: Decision Mamba: 一种具有自进化正则化的多粒度状态空间模型,用于离线RL

摘要: 使用变压器架构的条件序列建模已经证明在处理离线强化学习(RL)任务中的有效性,但在处理超出分布状态和动作时存在困难。现有工作尝试通过学习的策略进行数据增强或通过值基RL算法添加额外约束来解决这一问题。然而,这些研究仍然无法克服以下挑战:(1)在步骤之间不充分利用历史时间信息,(2)忽略状态、动作和返回目标(RTG)之间的局部步骤关系,(3)过度拟合带有噪声标签的次优轨迹。为了解决这些挑战,我们提出了决策玛巴(DM),这是一个具有自我进化策略学习的新型多粒度状态空间模型(SSM)。DM通过使用玛巴架构明确地对历史隐藏状态进行建模,以提取时间信息。为了捕捉状态-动作-RTG三元组之间的关系,设计了一个细粒度SSM模块,并将其集成到原始的粗粒度SSM中,从而形成了一个专为离线RL定制的新型玛巴架构。最后,为了减轻在嘈杂轨迹上过度拟合的问题,提出了一种通过渐进正则化使用自我进化策略的方法。策略通过使用自己的过去知识来优化次优动作的方式发展,从而增强其对嘈杂示范的鲁棒性。对各种任务的广泛实验表明,DM在性能上远远超过其他基线。

更新时间: 2024-06-08 10:12:00

领域: cs.LG

下载: http://arxiv.org/abs/2406.05427v1

Baking Symmetry into GFlowNets

GFlowNets have exhibited promising performance in generating diverse candidates with high rewards. These networks generate objects incrementally and aim to learn a policy that assigns probability of sampling objects in proportion to rewards. However, the current training pipelines of GFlowNets do not consider the presence of isomorphic actions, which are actions resulting in symmetric or isomorphic states. This lack of symmetry awareness increases the number of samples required for training GFlowNets and can result in inefficient and potentially incorrect flow functions. As a consequence, the reward and diversity of the generated objects decrease. In this study, our objective is to integrate symmetries into GFlowNets by identifying equivalent actions during the generation process. Experimental results using synthetic data demonstrate the promising performance of our proposed approaches.
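
A toy sketch of identifying equivalent (isomorphic) actions: two actions are grouped together when the states they lead to share a canonical form. States here are sets of undirected edges, and the sorted edge list serves as a simplistic canonical key; real graph canonization would be needed in general.

```python
# Toy sketch of grouping isomorphic actions during generation: actions
# are equivalent when the resulting states share a canonical form. The
# sorted-edge-list key below is a simplification of true canonization.
def canonical(state):
    return tuple(sorted(tuple(sorted(e)) for e in state))

def deduplicate_actions(state, actions):
    groups = {}
    for a in actions:
        key = canonical(state | {a})        # state after taking action a
        groups.setdefault(key, []).append(a)
    return groups                           # equivalence classes of actions

state = {(0, 1)}
actions = [(1, 2), (2, 1), (0, 2)]
print(deduplicate_actions(state, actions))  # (1, 2) and (2, 1) collapse
```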

Updated: 2024-06-08 10:11:10

标题: 把对称性融入到GFlowNets中

摘要: GFlowNets表现出很好的性能,能够生成多样化的对象并获得高奖励。这些网络逐步生成对象,并旨在学习一种策略,该策略根据奖励分配对象采样的概率。然而,目前GFlowNets的训练流程并未考虑同构动作的存在,即导致对称或同构状态的动作。这种缺乏对称性会增加训练GFlowNets所需的样本量,并可能导致流函数低效且可能不正确。因此,生成对象的奖励和多样性会下降。本研究的目标是通过在生成过程中识别等效动作来将对称性整合到GFlowNets中。使用合成数据的实验结果展示了我们提出方法的优异性能。

更新时间: 2024-06-08 10:11:10

领域: cs.LG

下载: http://arxiv.org/abs/2406.05426v1

Refining Minimax Regret for Unsupervised Environment Design

In unsupervised environment design, reinforcement learning agents are trained on environment configurations (levels) generated by an adversary that maximises some objective. Regret is a commonly used objective that theoretically results in a minimax regret (MMR) policy with desirable robustness guarantees; in particular, the agent's maximum regret is bounded. However, once the agent reaches this regret bound on all levels, the adversary will only sample levels where regret cannot be further reduced. Although there are possible performance improvements to be made outside of these regret-maximising levels, learning stagnates. In this work, we introduce Bayesian level-perfect MMR (BLP), a refinement of the minimax regret objective that overcomes this limitation. We formally show that solving for this objective results in a subset of MMR policies, and that BLP policies act consistently with a Perfect Bayesian policy over all levels. We further introduce an algorithm, ReMiDi, that results in a BLP policy at convergence. We empirically demonstrate that training on levels from a minimax regret adversary causes learning to prematurely stagnate, but that ReMiDi continues learning.

Updated: 2024-06-08 10:08:25

标题: 优化无监督环境设计的最小后悔算法

摘要: 在无监督环境设计中,强化学习代理被训练在由最大化某个目标的对手生成的环境配置(关卡)上。遗憾是一个常用的目标,理论上导致一个具有良好鲁棒性保证的极小遗憾(MMR)策略;特别地,代理的最大遗憾被限制。然而,一旦代理在所有关卡上达到了这个遗憾边界,对手将只采样无法进一步减少遗憾的关卡。尽管在这些最大遗憾的关卡之外可能有性能改进的空间,学习停滞了。在这项工作中,我们引入了贝叶斯级别完美MMR(BLP),这是极小遗憾目标的一种改进,克服了这一限制。我们正式证明,解决这个目标会导致一组MMR策略,并且BLP策略在所有关卡上都与完美贝叶斯策略一致。我们进一步介绍了一种算法ReMiDi,该算法在收敛时会得到一个BLP策略。我们在实验证明,在从极小遗憾对手那里训练会导致学习过早停滞,但是ReMiDi能够持续学习。

更新时间: 2024-06-08 10:08:25

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.12284v2

Recent advancements in computational morphology : A comprehensive survey

Computational morphology handles language processing at the word level. It is one of the foundational tasks in the NLP pipeline for the development of higher-level NLP applications. It mainly deals with the processing of words and word forms. Computational morphology addresses various sub-problems such as morpheme boundary detection, lemmatization, morphological feature tagging, morphological reinflection, etc. In this paper, we present an exhaustive survey of the methods for developing computational morphology related tools. We survey the literature in chronological order, from conventional methods to the recent evolution of deep neural network based approaches. We also review the existing datasets available for this task across languages. We discuss the effectiveness of neural models compared with traditional models and present some unique challenges associated with building computational morphology tools. We conclude by discussing some recent and open research issues in this field.

Updated: 2024-06-08 10:07:33

标题: 计算形态学的最新进展:一项全面调查

摘要: 计算形态学处理单词级别的语言处理。它是发展更高级别自然语言处理应用的NLP流水线中的基本任务之一。它主要处理单词和单词形式的处理。计算形态学涉及各种子问题,如词素边界检测,词形还原,形态特征标记,形态重新屈折等。在本文中,我们提出了开发计算形态学相关工具的方法的详尽调查。我们按照时间顺序调查文献,从传统方法到最近基于深度神经网络的方法的演变。我们还回顾了跨语言可用于此任务的现有数据集。我们讨论了神经模型与传统模型的有效性,并提出了构建计算形态学工具时面临的一些独特挑战。最后,我们讨论了该领域一些最近和未解决的研究问题。

更新时间: 2024-06-08 10:07:33

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.05424v1

Diffusion-based Reinforcement Learning for Dynamic UAV-assisted Vehicle Twins Migration in Vehicular Metaverses

Air-ground integrated networks can relieve communication pressure on ground transportation networks and provide 6G-enabled vehicular Metaverses services offloading in remote areas with sparse RoadSide Unit (RSU) coverage and in downtown areas where users have a high demand for vehicular services. Vehicle Twins (VTs) are the digital twins of physical vehicles that enable more immersive and realistic vehicular services; they can be offloaded to and updated on RSUs to manage and provide vehicular Metaverses services to passengers and drivers. The high mobility of vehicles and the limited coverage of RSU signals necessitate VT migration to ensure service continuity when vehicles leave the signal coverage of RSUs. However, uneven VT task migration might overload some RSUs, which might result in increased service latency and thus degrade users' immersive experiences. In this paper, we propose a dynamic Unmanned Aerial Vehicle (UAV)-assisted VT migration framework in air-ground integrated networks, where UAVs act as aerial edge servers to assist ground RSUs during VT task offloading. In this framework, we propose a diffusion-based Reinforcement Learning (RL) algorithm, which can efficiently make immersive VT migration decisions in UAV-assisted vehicular networks. To balance the workload of RSUs and improve VT migration quality, we design a novel dynamic path planning algorithm based on a heuristic search strategy for UAVs. Simulation results show that the diffusion-based RL algorithm with UAV assistance performs better than other baseline schemes.

Updated: 2024-06-08 09:53:56

标题: 在车辆元宇宙中动态无人机辅助车辆双子迁移的基于扩散的强化学习

摘要: 空地一体化网络可以减轻地面交通网络的通信压力,并为6G启用的车辆Metaverses服务提供卸载,这在道路边缘单元(RSU)覆盖稀疏的偏远地区和用户对车辆服务有很高需求的市区地区尤为重要。车辆双胞胎(VTs)是物理车辆的数字孪生体,可以实现更沉浸和逼真的车辆服务,可以在RSU上卸载和更新,以管理和提供车辆Metaverses服务给乘客和驾驶员。车辆的高移动性和RSU信号的有限覆盖范围使得VT迁移成为必要,以确保当车辆离开RSU信号覆盖范围时服务的连续性。然而,不均匀的VT任务迁移可能会过载一些RSU,这可能导致服务延迟增加,从而影响用户的沉浸体验。在本文中,我们提出了一种动态的无人机(UAV)辅助VT迁移框架,在空地一体化网络中,其中无人机充当空中边缘服务器,协助地面RSU进行VT任务卸载。在该框架中,我们提出了一种基于扩散的强化学习(RL)算法,可以有效地在UAV辅助车辆网络中做出沉浸式VT迁移决策。为了平衡RSU的工作负载并提高VT迁移质量,我们设计了一种基于启发式搜索策略的新颖动态路径规划算法,用于无人机。模拟结果显示,基于扩散的RL算法与UAV辅助的表现优于其他基线方案。

更新时间: 2024-06-08 09:53:56

领域: cs.AI,cs.RO

下载: http://arxiv.org/abs/2406.05422v1

Multi-attribute Auction-based Resource Allocation for Twins Migration in Vehicular Metaverses: A GPT-based DRL Approach

Vehicular Metaverses are developed to enhance the modern automotive industry with an immersive and safe experience among connected vehicles and roadside infrastructures, e.g., RoadSide Units (RSUs). For seamless synchronization with virtual spaces, Vehicle Twins (VTs) are constructed as digital representations of physical entities. However, resource-intensive VT updating and the high mobility of vehicles require intensive computation, communication, and storage resources, especially for their migration among RSUs with limited coverage. To address these issues, we propose an attribute-aware auction-based mechanism to optimize resource allocation during VT migration by considering both price and non-monetary attributes, e.g., location and reputation. In this mechanism, we propose a two-stage matching for vehicular users and Metaverse service providers in multi-attribute resource markets. First, a resource-attribute matching algorithm obtains a perfect matching on resource attributes, so that matched buyers and sellers can participate in a double Dutch auction (DDA). Then, we train a DDA auctioneer using a generative pre-trained transformer (GPT)-based deep reinforcement learning (DRL) algorithm to adjust the auction clocks efficiently during the auction process. We compare the performance of social welfare and auction information exchange costs with state-of-the-art baselines under different settings. Simulation results show that our proposed GPT-based DRL auction schemes have better performance than others.

Updated: 2024-06-08 09:41:38

标题: 基于多属性拍卖的车载元宇宙双胞胎迁移资源分配:基于GPT的DRL方法

摘要: 车载Metaverses的发展旨在增强现代汽车行业,在连接车辆和道路基础设施(例如,路侧单元(RSUs))之间提供沉浸式和安全体验。为了与虚拟空间实现无缝同步,车辆双生体(VTs)被构建为物理实体的数字表示。然而,资源密集型的VTs更新和车辆的高移动性需要密集的计算、通信和存储资源,尤其是它们在覆盖有限的RSUs之间迁移时。为了解决这些问题,我们提出了一种基于属性感知的拍卖机制,通过考虑价格和非货币属性(例如,位置和声誉)来优化VTs迁移期间的资源分配。在这个机制中,我们提出了一个多属性资源市场中车载用户和Metaverse服务提供商的两阶段匹配。首先,资源属性匹配算法获得了资源属性的完美匹配,即买家和卖家可以参与双荷兰式拍卖(DDA)。然后,我们使用基于生成式预训练变换器(GPT)的深度强化学习(DRL)算法训练一个DDA拍卖师,在拍卖过程中高效调整拍卖时钟。我们在不同设置下比较了社会福利和拍卖信息交换成本与最先进基准的性能。模拟结果表明,我们提出的基于GPT的DRL拍卖方案比其他方案表现更好。

更新时间: 2024-06-08 09:41:38

领域: cs.AI,cs.NI

下载: http://arxiv.org/abs/2406.05418v1

Discover Your Neighbors: Advanced Stable Test-Time Adaptation in Dynamic World

Despite progress, deep neural networks still suffer performance declines under distribution shifts between training and test domains, leading to a substantial decrease in Quality of Experience (QoE) for multimedia applications. Existing test-time adaptation (TTA) methods are challenged by dynamic, multiple test distributions within batches. This work provides a new perspective on analyzing batch normalization techniques through class-related and class-irrelevant features. Our observations reveal that combining source and test batch normalization statistics robustly characterizes target distributions. However, the test statistics must have high similarity. We thus propose Discover Your Neighbours (DYN), the first backward-free approach specialized for dynamic TTA. The core innovation is identifying similar samples via instance normalization statistics and clustering them into groups, which provides consistent class-irrelevant representations. Specifically, our DYN consists of layer-wise instance statistics clustering (LISC) and cluster-aware batch normalization (CABN). In LISC, we perform layer-wise clustering of approximate feature samples at each BN layer by calculating the cosine similarity of instance normalization statistics across the batch. CABN then aggregates SBN and TCN statistics to collaboratively characterize the target distribution, enabling more robust representations. Experimental results validate DYN's robustness and effectiveness, demonstrating maintained performance under dynamic data stream patterns.
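
The LISC step can be sketched as follows: compute per-sample instance-normalization statistics at a BN layer, compare them by cosine similarity, and group samples greedily above a threshold. The 0.9 threshold and the greedy grouping are illustrative simplifications of the clustering step.

```python
# Sketch of layer-wise instance statistics clustering (LISC): per-sample
# instance-norm statistics at a BN layer are compared by cosine similarity
# and thresholded into groups. The 0.9 threshold and greedy grouping are
# illustrative simplifications of the clustering step.
import torch

def lisc_groups(feat, thresh=0.9):          # feat: (batch, C, H, W)
    mean = feat.mean(dim=(2, 3))            # instance-norm statistics
    var = feat.var(dim=(2, 3))
    stats = torch.cat([mean, var], dim=1)   # (batch, 2C)
    sim = torch.nn.functional.cosine_similarity(
        stats.unsqueeze(1), stats.unsqueeze(0), dim=-1)   # (batch, batch)
    groups, assigned = [], set()
    for i in range(feat.size(0)):           # greedy clustering by similarity
        if i in assigned:
            continue
        members = [j for j in range(feat.size(0))
                   if j not in assigned and sim[i, j] >= thresh]
        assigned.update(members)
        groups.append(members)
    return groups                           # each group shares BN statistics

print(lisc_groups(torch.randn(8, 16, 4, 4)))
```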

Updated: 2024-06-08 09:22:32

标题: 发现您的邻居:动态世界中的高级稳定测试时间适应性

摘要: 尽管取得了进展,深度神经网络在训练和测试域之间的分布转变下仍然存在性能下降的问题,导致多媒体应用程序的体验质量(QoE)大幅下降。现有的测试时适应(TTA)方法面临着批次内动态、多个测试分布的挑战。本文提出了一种新的分析批次归一化技术的视角,通过与类相关和类不相关特征,我们的观察结果表明,组合源和测试批次归一化统计数据能够稳健地表征目标分布。然而,测试统计数据必须具有高相似性。因此,我们提出了Discover Your Neighbours(DYN),这是专门针对动态TTA的第一种无后向方法。其核心创新是通过实例归一化统计数据识别相似样本,并将其聚类成群,以提供一致的类不相关表示。具体来说,我们的DYN包括分层实例统计聚类(LISC)和群体感知批次归一化(CABN)。在LISC中,我们通过计算批次中实例归一化统计数据的余弦相似性,在每个BN层上对近似特征样本进行分层聚类。然后,CABN将SBN和TCN统计数据聚合起来,共同表征目标分布,从而实现更加稳健的表示。实验结果验证了DYN的稳健性和有效性,证明了在动态数据流模式下保持性能。

更新时间: 2024-06-08 09:22:32

领域: cs.LG,cs.AI,cs.CV,cs.MM

下载: http://arxiv.org/abs/2406.05413v1

MLLM-SR: Conversational Symbolic Regression based on Multi-Modal Large Language Models

Formulas are the language of communication between humans and nature. Finding expressions from observed data that reflect the relationships among the variables in the data, known as symbolic regression, is an important research topic in artificial intelligence. Existing symbolic regression methods generate expressions directly from the given observations; we cannot require the algorithm to generate expressions that satisfy specific requirements derived from known prior knowledge. For example, the expression may need to contain $\sin$ or be symmetric. Even when this is possible, it often requires very complex operations, which is very inconvenient. In this paper, based on multi-modal large language models, we propose MLLM-SR, a conversational symbolic regression method that can generate expressions meeting the stated requirements simply by describing them with natural language instructions. Through experiments on the Nguyen dataset, we demonstrate that MLLM-SR outperforms state-of-the-art baselines in fitting performance. More notably, we experimentally demonstrate that MLLM-SR can well understand the prior knowledge we add to the natural language instructions. Moreover, the addition of prior knowledge can effectively guide MLLM-SR to generate correct expressions.

Updated: 2024-06-08 09:17:54

标题: MLLM-SR:基于多模态大型语言模型的对话符号回归

摘要: 公式是人类与自然之间沟通的语言。找到从观察数据中反映数据中每个变量之间关系的表达式是人工智能的重要研究课题,这被称为符号回归问题。现有的符号回归方法直接根据给定的观察数据生成表达式,我们无法要求算法根据已知的先验知识生成符合特定要求的表达式。例如,表达式需要包含$\sin$或对称等。即使可以,通常需要非常复杂的操作,这是非常不方便的。在本文中,基于多模态大型语言模型,我们提出了MLLM-SR,一种可以通过用自然语言指令描述要求来生成符合要求的表达式的对话式符号回归方法。通过在Nguyen数据集上进行实验,我们可以证明MLLM-SR在拟合性能方面领先于现有技术基线。更重要的是,我们通过实验证明MLLM-SR可以很好地理解我们添加到自然语言指令中的先验知识。此外,先验知识的添加可以有效地引导MLLM-SR生成正确的表达式。

更新时间: 2024-06-08 09:17:54

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.05410v1

Natural Language-Oriented Programming (NLOP): Towards Democratizing Software Creation

As generative Artificial Intelligence (AI) technologies evolve, they offer unprecedented potential to automate and enhance various tasks, including coding. Natural Language-Oriented Programming (NLOP), a vision introduced in this paper, harnesses this potential by allowing developers to articulate software requirements and logic in their natural language, thereby democratizing software creation. This approach streamlines the development process and significantly lowers the barrier to entry for software engineering, making it feasible for non-experts to contribute effectively to software projects. By simplifying the transition from concept to code, NLOP can accelerate development cycles, enhance collaborative efforts, and reduce misunderstandings in requirement specifications. This paper reviews various programming models, assesses their contributions and limitations, and highlights that natural language will be the new programming language. Through this comparison, we illustrate how NLOP stands to transform the landscape of software engineering by fostering greater inclusivity and innovation.

Updated: 2024-06-08 09:13:54

标题: 自然语言导向编程(NLOP):朝向软件创造的民主化

摘要: 随着生成式人工智能(AI)技术的发展,它们提供了前所未有的潜力来自动化和增强各种任务,包括编码。本文介绍的自然语言导向编程(NLOP)视觉利用了这一潜力,使开发人员能够用他们的自然语言表达软件需求和逻辑,从而使软件创建民主化。这种方法简化了开发过程,显著降低了软件工程的门槛,使非专家能够有效地为软件项目做出贡献。通过简化从概念到代码的过渡,NLOP可以加快开发周期,增强协作努力,并减少需求规格中的误解。本文回顾了各种编程模型,评估了它们的贡献和局限性,并强调自然语言将成为新的编程语言。通过这种比较,我们说明了NLOP如何改变软件工程的格局,促进更广泛的包容性和创新。

更新时间: 2024-06-08 09:13:54

领域: cs.SE,cs.AI,cs.PL

下载: http://arxiv.org/abs/2406.05409v1

Rethinking the Vulnerabilities of Face Recognition Systems: From a Practical Perspective

Face Recognition Systems (FRS) have increasingly integrated into critical applications, including surveillance and user authentication, highlighting their pivotal role in modern security systems. Recent studies have revealed vulnerabilities in FRS to adversarial (e.g., adversarial patch attacks) and backdoor attacks (e.g., training data poisoning), raising significant concerns about their reliability and trustworthiness. Previous studies primarily focus on traditional adversarial or backdoor attacks, overlooking the resource-intensive or privileged-manipulation nature of such threats, thus limiting their practical generalization, stealthiness, universality and robustness. Correspondingly, in this paper, we delve into the inherent vulnerabilities in FRS through user studies and preliminary explorations. By exploiting these vulnerabilities, we identify a novel attack, facial identity backdoor attack dubbed FIBA, which unveils a potentially more devastating threat against FRS: an enrollment-stage backdoor attack. FIBA circumvents the limitations of traditional attacks, enabling broad-scale disruption by allowing any attacker donning a specific trigger to bypass these systems. This implies that after a single, poisoned example is inserted into the database, the corresponding trigger becomes a universal key for any attackers to spoof the FRS. This strategy essentially challenges the conventional attacks by initiating at the enrollment stage, dramatically transforming the threat landscape by poisoning the feature database rather than the training data.

Updated: 2024-06-08 09:09:29

标题: 重新思考人脸识别系统的脆弱性:从实践角度出发

摘要: 面部识别系统(FRS)已越来越多地整合到关键应用程序中,包括监视和用户身份验证,突显了它们在现代安全系统中的关键作用。最近的研究揭示了FRS对敌对(例如,敌对贴片攻击)和后门攻击(例如,训练数据污染)的脆弱性,引发了对它们可靠性和可信度的重大担忧。先前的研究主要集中在传统的敌对或后门攻击上,忽视了这些威胁的资源密集型或特权操纵性质,从而限制了它们的实际泛化、隐蔽性、普适性和鲁棒性。因此,本文深入探讨了FRS中固有的脆弱性通过用户研究和初步探索。通过利用这些脆弱性,我们确定了一种新型攻击,名为面部身份后门攻击(FIBA),揭示了对FRS的潜在更具破坏性的威胁:一个注册阶段的后门攻击。FIBA规避了传统攻击的限制,通过允许任何攻击者戴上特定触发器来绕过这些系统,从而使广泛的破坏成为可能。这意味着在数据库中插入一个毒害示例后,相应的触发器就成为任何攻击者欺骗FRS的通用钥匙。这种策略在注册阶段发起,从而通过对特征数据库进行毒害而不是对训练数据,从而根本挑战了传统攻击。

更新时间: 2024-06-08 09:09:29

领域: cs.CR

下载: http://arxiv.org/abs/2405.12786v3

Robust Conformal Prediction Using Privileged Information

We develop a method to generate prediction sets with a guaranteed coverage rate that is robust to corruptions in the training data, such as missing or noisy variables. Our approach builds on conformal prediction, a powerful framework to construct prediction sets that are valid under the i.i.d. assumption. Importantly, naively applying conformal prediction does not provide reliable predictions in this setting, due to the distribution shift induced by the corruptions. To account for the distribution shift, we assume access to privileged information (PI). The PI is formulated as additional features that explain the distribution shift; however, they are only available during training and absent at test time. We approach this problem by introducing a novel generalization of weighted conformal prediction and support our method with theoretical coverage guarantees. Empirical experiments on both real and synthetic datasets indicate that our approach achieves a valid coverage rate and constructs more informative predictions compared to existing methods, which are not supported by theoretical guarantees.
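
A minimal sketch of the weighted-conformal ingredient: calibration nonconformity scores are reweighted (here by placeholder PI-derived weights) and the prediction interval uses the weighted (1 - alpha) quantile. The weighting scheme and model are stand-ins; the paper's generalization is more involved.

```python
# Minimal sketch of weighted split conformal prediction: reweight the
# calibration scores (placeholder weights w standing in for PI-derived
# weights) and take the weighted (1 - alpha) quantile for the interval.
import numpy as np

def weighted_quantile(scores, weights, q):
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    return s[np.searchsorted(cdf, q)]

rng = np.random.default_rng(0)
resid = np.abs(rng.normal(size=200))        # calibration |y - model(x)|
w = rng.uniform(0.5, 1.5, size=200)         # placeholder PI-based weights
alpha = 0.1
qhat = weighted_quantile(resid, w, 1 - alpha)

y_pred = 3.2                                 # point prediction for a test x
print((y_pred - qhat, y_pred + qhat))        # conformal interval
```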

Updated: 2024-06-08 08:56:47

标题: 使用特权信息的鲁棒性合规性预测

摘要: 我们开发了一种方法,可以生成具有保证覆盖率的预测集,对训练数据中的破坏(如缺失或噪声变量)具有鲁棒性。我们的方法建立在符合性预测之上,这是一个强大的框架,用于构建在独立同分布假设下有效的预测集。重要的是,在这种情况下,简单应用符合性预测不能提供可靠的预测,这是由于破坏所引起的分布变化。为了考虑分布变化,我们假设可以访问特权信息(PI)。PI被形式化为额外的特征,解释了分布变化,但在训练期间仅可用,在测试时间不可用。我们通过引入一种新颖的加权符合性预测的泛化来解决这个问题,并用理论覆盖保证支持我们的方法。对真实和合成数据集的实证实验表明,与现有方法相比,我们的方法实现了有效的覆盖率,并构建了更具信息量的预测,这些方法没有得到理论保证的支持。

更新时间: 2024-06-08 08:56:47

领域: cs.LG

下载: http://arxiv.org/abs/2406.05405v1

SemPat: Using Hyperproperty-based Semantic Analysis to Generate Microarchitectural Attack Patterns

Microarchitectural security verification of software has seen the emergence of two broad classes of approaches. The first is based on semantic security properties (e.g., non-interference) which are verified for a given program and a specified abstract model of the hardware microarchitecture. The second is based on attack patterns, which, if found in a program execution, indicates the presence of an exploit. While the former uses a formal specification that can capture several gadget variants targeting the same vulnerability, it is limited by the scalability of verification. Patterns, while more scalable, must be currently constructed manually, as they are narrower in scope and sensitive to gadget-specific structure. This work develops a technique that, given a non-interference-based semantic security hyperproperty, automatically generates attack patterns up to a certain complexity parameter (called the skeleton size). Thus, we combine the advantages of both approaches: security can be specified by a hyperproperty that uniformly captures several gadget variants, while automatically generated patterns can be used for scalable verification. We implement our approach in a tool and demonstrate the ability to generate new patterns, (e.g., for SpectreV1, SpectreV4) and improved scalability using the generated patterns over hyperproperty-based verification.

Updated: 2024-06-08 08:54:27

标题: SemPat:使用基于超属性的语义分析生成微体系结构攻击模式

摘要: 软件的微体系结构安全验证已经出现了两种广义的方法。第一种基于语义安全属性(例如,非干扰),这些属性针对给定程序和硬件微体系结构的特定抽象模型进行验证。第二种基于攻击模式,如果在程序执行中发现,就表明存在漏洞。尽管前者使用可以捕获针对相同漏洞的多个小工具变体的正式规范,但受到验证的可伸缩性的限制。攻击模式虽然更具可扩展性,但目前必须手动构建,因为它们的范围更窄,且对小工具特定结构敏感。 本文开发了一种技术,根据基于非干扰的语义安全超属性,自动生成攻击模式,直到达到一定复杂度参数(称为骨架大小)。因此,我们结合了两种方法的优势:安全性可以由统一捕获多个小工具变体的超属性来指定,同时可以使用自动生成的模式进行可伸缩验证。我们在一个工具中实现了我们的方法,并展示了使用生成的模式提高的可扩展性,例如,对SpectreV1、SpectreV4。

更新时间: 2024-06-08 08:54:27

领域: cs.CR,cs.AR

下载: http://arxiv.org/abs/2406.05403v1

On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width

Second-order optimization has been developed to accelerate the training of deep neural networks, and it is being applied to increasingly large-scale models. In this study, toward training at even larger scales, we identify a specific parameterization for second-order optimization that promotes feature learning in a stable manner even if the network width increases significantly. Inspired by a maximal update parameterization, we consider a one-step update of the gradient and reveal the appropriate scales of hyperparameters including random initialization, learning rates, and damping terms. Our approach covers two major second-order optimization algorithms, K-FAC and Shampoo, and we demonstrate that our parameterization achieves higher generalization performance in feature learning. In particular, it enables us to transfer the hyperparameters across models with different widths.

Updated: 2024-06-08 08:45:12

标题: 关于针对无限宽度有效的二阶优化参数化

摘要: 第二阶优化已经被开发出来加速深度神经网络的训练,并且正在应用于越来越大规模的模型。在本研究中,为了在更大规模上进行训练,我们确定了一个特定的参数化方法,可以促进特征学习稳定进行,即使网络宽度显著增加。受最大更新参数化的启发,我们考虑了梯度的一步更新,并揭示了包括随机初始化、学习率和阻尼项在内的超参数的适当规模。我们的方法涵盖了两种主要的第二阶优化算法,K-FAC和Shampoo,并且我们证明了我们的参数化在特征学习中实现了更高的泛化性能。特别是,它使我们能够在具有不同宽度的模型之间转移超参数。

更新时间: 2024-06-08 08:45:12

领域: cs.LG

下载: http://arxiv.org/abs/2312.12226v2

Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach

Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy and computational efficiency. Toward efficient wireless LLM inference in edge computing, this study comprehensively analyzes the impact of different splitting points in mainstream open-source LLMs. On this basis, this study introduces a framework taking inspiration from model-based reinforcement learning (MBRL) to determine the optimal splitting point across the edge and user equipment (UE). By incorporating a reward surrogate model, our approach significantly reduces the computational cost of frequent performance evaluations. Extensive simulations demonstrate that this method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.

Updated: 2024-06-08 08:41:32

标题: 边缘计算中无线LLM推断的自适应层分割:基于模型的强化学习方法

摘要: 在边缘计算环境中优化大型语言模型(LLMs)的部署对于增强隐私和计算效率至关重要。为了实现在边缘计算中高效的无线LLM推断,本研究全面分析了主流开源LLMs中不同分割点的影响。在此基础上,本研究引入了一个灵感来自基于模型的强化学习(MBRL)的框架,以确定跨边缘和用户设备(UE)的最佳分割点。通过结合奖励替代模型,我们的方法显著减少了频繁性能评估的计算成本。大量模拟表明,该方法有效地平衡了推理性能和计算负载在不同网络条件下,为LLM在分散设置中的部署提供了稳健的解决方案。

更新时间: 2024-06-08 08:41:32

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.02616v3

Mean-field Chaos Diffusion Models

In this paper, we introduce a new class of score-based generative models (SGMs) designed to handle high-cardinality data distributions by leveraging concepts from mean-field theory. We present mean-field chaos diffusion models (MF-CDMs), which address the curse of dimensionality inherent in high-cardinality data by utilizing the propagation of chaos property of interacting particles. By treating high-cardinality data as a large stochastic system of interacting particles, we develop a novel score-matching method for infinite-dimensional chaotic particle systems and propose an approximation scheme that employs a subdivision strategy for efficient training. Our theoretical and empirical results demonstrate the scalability and effectiveness of MF-CDMs for managing large high-cardinality data structures, such as 3D point clouds.

Updated: 2024-06-08 08:24:06

标题: 平均场混沌扩散模型

摘要: 在本文中,我们介绍了一种新的基于分数的生成模型(SGMs),旨在通过利用均场理论的概念处理高基数数据分布。我们提出了均场混沌扩散模型(MF-CDMs),该模型通过利用相互作用粒子的混沌传播特性来解决高基数数据固有的维度诅咒问题。通过将高基数数据视为一个大型随机系统的相互作用粒子,我们为无限维度的混沌粒子系统开发了一种新颖的得分匹配方法,并提出了一种采用细分策略的近似方案,以实现高效训练。我们的理论和实证结果表明,MF-CDMs在处理大型高基数数据结构(例如3D点云)方面具有可扩展性和有效性。

更新时间: 2024-06-08 08:24:06

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.05396v1

A Single Graph Convolution Is All You Need: Efficient Grayscale Image Classification

Image classifiers often rely on convolutional neural networks (CNNs) for their tasks, which are inherently more heavyweight than multilayer perceptrons (MLPs); this can be problematic in real-time applications. Additionally, many image classification models work on both RGB and grayscale datasets. Classifiers that operate solely on grayscale images are much less common. Grayscale image classification has diverse applications, including but not limited to medical image classification and synthetic aperture radar (SAR) automatic target recognition (ATR). Thus, we present a novel grayscale (single channel) image classification approach using a vectorized view of images. We exploit the lightweight nature of MLPs by viewing images as vectors and reducing our problem setting to grayscale image classification. We find that using a single graph convolutional layer batch-wise increases accuracy and reduces variance in the performance of our model. Moreover, we develop a customized accelerator on FPGA for the proposed model with several optimizations to improve its performance. Our experimental results on benchmark grayscale image datasets demonstrate the effectiveness of the proposed model, achieving vastly lower latency (up to 16$\times$ less) and competitive or leading performance compared to other state-of-the-art image classification models on various domain-specific grayscale image classification datasets.
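
A hedged numpy sketch of the vectorized-image idea: flatten a batch of grayscale images, build a k-nearest-neighbour similarity graph over the batch, and apply a single normalized graph convolution before a linear classifier. The graph construction and all sizes are assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch of "vectorized image + one batch-wise graph convolution":
# a k-NN graph over the samples in a batch lets a single convolution
# A_hat @ X @ W smooth features across similar images. Graph construction
# details here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 28 * 28))             # a batch of flattened images
W = rng.normal(size=(28 * 28, 10)) * 0.01      # single-layer weights

# Batch-wise adjacency: connect each sample to its 3 nearest neighbours.
d = ((X[:, None] - X[None]) ** 2).sum(-1)
A = np.zeros((32, 32))
for i in range(32):
    A[i, np.argsort(d[i])[1:4]] = 1.0
A = np.maximum(A, A.T) + np.eye(32)            # symmetrize, add self-loops
deg = A.sum(1)
A_hat = A / np.sqrt(deg[:, None] * deg[None])  # normalized adjacency

logits = A_hat @ X @ W                          # one graph convolution
print(logits.shape)                             # (32, 10)
```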

Updated: 2024-06-08 08:21:26

标题: 一次图卷积就够了:高效的灰度图像分类

摘要: 图像分类器通常依赖于卷积神经网络(CNN)来完成任务,这种网络在计算上比多层感知器(MLP)更加复杂,这在实时应用中可能会出现问题。此外,许多图像分类模型适用于RGB和灰度数据集。仅处理灰度图像的分类器较少见。灰度图像分类具有各种应用,包括但不限于医学图像分类和合成孔径雷达(SAR)自动目标识别(ATR)。因此,我们提出了一种新颖的灰度(单通道)图像分类方法,利用图像的向量化视图。我们通过将图像视为矢量并将问题设置为灰度图像分类设置来利用MLP的轻量级优势。我们发现,批处理方式使用单个图形卷积层可以提高准确性并减少模型性能的差异。此外,我们在FPGA上为所提出的模型开发了定制加速器,并进行了多项优化以改善其性能。我们在基准灰度图像数据集上的实验结果表明,所提出的模型的有效性,其延迟大大降低(最多降低16倍),并且与其他最先进的图像分类模型在各种特定领域的灰度图像分类数据集上具有竞争力或领先的性能。

更新时间: 2024-06-08 08:21:26

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2402.00564v3

Automating the Correctness Assessment of AI-generated Code for Security Contexts

Evaluating the correctness of code generated by AI is a challenging open problem. In this paper, we propose a fully automated method, named ACCA, to evaluate the correctness of AI-generated code for security purposes. The method uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation. We use ACCA to assess four state-of-the-art models trained to generate security-oriented assembly code and compare the results of the evaluation with different baseline solutions, including output similarity metrics, widely used in the field, and the well-known ChatGPT, the AI-powered language model developed by OpenAI. Our experiments show that our method outperforms the baseline solutions and assesses the correctness of the AI-generated code similarly to the human-based evaluation, which is considered the ground truth for assessment in the field. Moreover, ACCA has a very strong correlation with the human evaluation (Pearson's correlation coefficient r=0.84 on average). Finally, since it is a fully automated solution that does not require any human intervention, the proposed method performs the assessment of every code snippet in ~0.17s on average, which, in our experience, is far lower than the average time required by human analysts to manually inspect the code.

Updated: 2024-06-08 08:19:46

标题: 自动化AI生成代码在安全环境下的正确性评估

摘要: 评估由人工智能生成的代码的正确性是一个具有挑战性的开放问题。在本文中,我们提出了一种名为ACCA的全自动方法,用于评估人工智能生成的代码的正确性,以确保安全性。该方法使用符号执行来评估人工智能生成的代码是否与参考实现行为一致。我们使用ACCA来评估四种用于生成面向安全性的汇编代码的最先进模型,并将评估结果与不同基准解决方案进行比较,其中包括在该领域广泛使用的输出相似性度量以及由OpenAI开发的人工智能动力语言模型ChatGPT。我们的实验表明,我们的方法胜过基准解决方案,并评估人工智能生成的代码的正确性类似于基于人类的评估,这在该领域被认为是评估的真实标准。此外,ACCA与人类评估具有非常强的相关性(平均皮尔逊相关系数r=0.84)。最后,由于它是一个完全自动化的解决方案,不需要任何人类干预,所以所提出的方法平均每个代码片段的评估时间为约0.17秒,这明显低于根据我们的经验人类分析员手动检查代码所需的平均时间。

更新时间: 2024-06-08 08:19:46

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2310.18834v2

TabSynDex: A Universal Metric for Robust Evaluation of Synthetic Tabular Data

Synthetic tabular data generation becomes crucial when real data is limited, expensive to collect, or simply cannot be used due to privacy concerns. However, producing good quality synthetic data is challenging. Several probabilistic, statistical, generative adversarial network (GAN), and variational auto-encoder (VAE) based approaches have been presented for synthetic tabular data generation. Once generated, evaluating the quality of the synthetic data is quite challenging. Some of the traditional metrics have been used in the literature, but there is a lack of a common, robust, single metric. This makes it difficult to properly compare the effectiveness of different synthetic tabular data generation methods. In this paper we propose a new universal metric, TabSynDex, for robust evaluation of synthetic data. The proposed metric assesses the similarity of synthetic data with real data through different component scores which evaluate the characteristics that are desirable for ``high quality'' synthetic data. Being a single score metric and having an implicit bound, TabSynDex can also be used to observe and evaluate the training of neural network based approaches. This would help in obtaining insights that were not possible earlier. We present several baseline models for comparative analysis of the proposed evaluation metric with existing generative models. We also give a comparative analysis between TabSynDex and existing synthetic tabular data evaluation metrics. This shows the effectiveness and universality of our metric over the existing metrics. Source Code: \url{https://github.com/vikram2000b/tabsyndex}
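
The single-score construction can be illustrated with a simplified sketch: several component scores, each normalized to [0, 1], are averaged into one bounded metric. The two components shown (mean similarity and correlation-matrix similarity) are simplified stand-ins for TabSynDex's actual components.

```python
# Illustrative sketch of a TabSynDex-style composite score: component
# scores, each clipped to [0, 1], are averaged into one bounded metric.
# The two components below are simplified stand-ins for the real ones.
import numpy as np

def stats_score(real, synth):
    diff = np.abs(real.mean(0) - synth.mean(0))   # basic-statistics gap
    return float(np.clip(1 - diff, 0, 1).mean())

def corr_score(real, synth):
    diff = np.abs(np.corrcoef(real.T) - np.corrcoef(synth.T))
    return float(np.clip(1 - diff, 0, 1).mean())

def tabsyndex_like(real, synth):
    components = [stats_score(real, synth), corr_score(real, synth)]
    return sum(components) / len(components)      # single bounded score

rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 5))
synth = real + rng.normal(scale=0.1, size=(1000, 5))
print(round(tabsyndex_like(real, synth), 3))
```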

Updated: 2024-06-08 08:13:22

标题: TabSynDex:用于对合成表格数据进行鲁棒评估的通用度量标准

摘要: 合成表格数据生成在真实数据有限、收集昂贵或由于隐私问题无法使用时变得至关重要。然而,生成高质量的合成数据是具有挑战性的。已经提出了几种基于概率、统计、生成对抗网络(GANs)和变分自动编码器(VAEs)的方法用于合成表格数据生成。一旦生成,评估合成数据的质量就变得非常具有挑战性。文献中使用了一些传统指标,但缺乏一个通用、稳健和单一的指标。这使得难以正确比较不同合成表格数据生成方法的有效性。在本文中,我们提出了一个新的通用指标TabSynDex,用于对合成数据进行稳健评估。所提出的指标通过评估不同组件得分来评估合成数据与真实数据的相似性,这些得分评估了“高质量”合成数据所需的特征。作为一个单一分数指标并具有隐含边界,TabSynDex也可以用于观察和评估基于神经网络的方法的训练。这将有助于获得以前无法实现的见解。我们提供了几个基线模型,用于比较分析所提出的评估指标与现有生成模型。我们还对TabSynDex和现有合成表格数据评估指标进行了比较分析。这显示了我们的指标在现有指标上的有效性和普适性。源代码:\url{https://github.com/vikram2000b/tabsyndex}

更新时间: 2024-06-08 08:13:22

领域: cs.LG

下载: http://arxiv.org/abs/2207.05295v2

Dynamic importance learning using fisher information gain for nonlinear system identification

The Fisher Information Matrix (FIM) provides a way of quantifying the information content of an observable random variable concerning unknown parameters within a model that characterizes the variable. When parameters in a model are directly linked to individual features, the diagonal elements of the FIM can signify the relative importance of each feature. However, in scenarios where feature interactions may exist, a comprehensive exploration of the full FIM is necessary rather than focusing solely on its diagonal elements. This paper presents an end-to-end black-box system identification approach that integrates the FIM into the training process to gain insights into dynamic importance and overall model structure. A decision module is added to the first layer of the network to determine the relevance scores using the entire FIM as input. Forward propagation is then performed on the element-wise multiplication of the inputs and the relevance scores. Simulation results demonstrate that the proposed methodology effectively captures various types of interactions between dynamics, outperforming existing methods limited to polynomial interactions. Moreover, the effectiveness of this novel approach is confirmed through its application in identifying a real-world industrial system, specifically the pH neutralization process.
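
A rough PyTorch sketch of the decision-module idea: per-sample input gradients give an empirical Fisher information matrix over the features, a small decision network maps the flattened FIM to per-feature relevance scores, and the inputs are gated element-wise before the forward pass. All architecture sizes are illustrative assumptions.

```python
# Sketch of a FIM-driven decision module: an empirical FIM over input
# features (outer products of per-sample input gradients) feeds a small
# network that outputs relevance scores gating the inputs element-wise.
import torch
import torch.nn as nn

n_feat = 6
model = nn.Sequential(nn.Linear(n_feat, 16), nn.Tanh(), nn.Linear(16, 1))
decision = nn.Sequential(nn.Linear(n_feat * n_feat, n_feat), nn.Sigmoid())

x = torch.randn(32, n_feat, requires_grad=True)
y = model(x).sum()
g, = torch.autograd.grad(y, x)                      # per-sample gradients
fim = (g.unsqueeze(2) * g.unsqueeze(1)).mean(0)     # empirical FIM (6, 6)

scores = decision(fim.flatten())                    # per-feature relevance
gated = x * scores                                  # element-wise gating
out = model(gated)                                  # forward on gated input
print(scores.detach().numpy().round(2))
```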

Updated: 2024-06-08 08:12:41

标题: 使用费舍尔信息增益进行非线性系统识别的动态重要性学习

摘要: 费舍尔信息矩阵(FIM)提供了一种量化可观测随机变量关于未知参数的信息内容的方式,这些参数在描述变量的模型中直接与个体特征相关联。当模型中的参数与个体特征直接相关时,FIM的对角元素可以表示每个特征的相对重要性。然而,在存在特征交互的情况下,有必要全面探索完整FIM,而不仅仅关注其对角元素。本文提出了一种端到端的黑盒系统识别方法,将FIM整合到训练过程中,以获得对动态重要性和整体模型结构的洞察。在网络的第一层添加了一个决策模块,使用整个FIM作为输入来确定相关性得分。然后,在输入和相关性得分的元素乘积上执行前向传播。模拟结果表明,所提出的方法有效捕捉了动态之间各种类型的相互作用,优于仅限于多项式相互作用的现有方法。此外,通过其在识别真实世界工业系统(具体地说是PH中和过程)中的应用,确认了这种新颖方法的有效性。

更新时间: 2024-06-08 08:12:41

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2406.05395v1

Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas

Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, and data privacy, to emerging problems like truthfulness and social norms. We critically analyze existing research aimed at understanding, examining, and mitigating these ethical risks. Our survey underscores integrating ethical standards and societal values into the development of LLMs, thereby guiding the development of responsible and ethically aligned language models.

Updated: 2024-06-08 07:55:01

标题: 拆解大型语言模型的伦理:从长期存在的问题到新兴的困境

摘要: 大语言模型(LLMs)在最近几年在各种语言建模任务中取得了空前的成功。然而,这一进展也加剧了伦理关切,影响了LLMs在日常环境中的部署。本文全面调查了与LLMs相关的伦理挑战,从长期存在的问题如侵犯版权、系统偏见和数据隐私,到新兴问题如真实性和社会规范。我们批判性地分析了现有研究,旨在理解、审查和减轻这些伦理风险。我们的调查强调了将伦理标准和社会价值观纳入LLMs的发展中,从而引导负责任和伦理对齐的语言模型的发展。

更新时间: 2024-06-08 07:55:01

领域: cs.CL,cs.AI,cs.CY,cs.LG

下载: http://arxiv.org/abs/2406.05392v1

A Study in Dataset Pruning for Image Super-Resolution

In image Super-Resolution (SR), relying on large datasets for training is a double-edged sword. While offering rich training material, they also demand substantial computational and storage resources. In this work, we analyze dataset pruning to solve these challenges. We introduce a novel approach that reduces a dataset to a core-set of training samples, selected based on their loss values as determined by a simple pre-trained SR model. By focusing the training on just 50\% of the original dataset, specifically on the samples characterized by the highest loss values, we achieve results comparable to or surpassing those obtained from training on the entire dataset. Interestingly, our analysis reveals that the top 5\% of samples with the highest loss values negatively affect the training process. Excluding these samples and adjusting the selection to favor easier samples further enhances training outcomes. Our work opens new perspectives to the untapped potential of dataset pruning in image SR. It suggests that careful selection of training data based on loss-value metrics can lead to better SR models, challenging the conventional wisdom that more data inevitably leads to better performance.
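
The selection rule described above is simple enough to sketch directly: score every sample with the pretrained SR model's loss, drop the hardest 5%, and keep the next 50%. The `losses` array below is a placeholder for per-sample losses from such a model.

```python
# Sketch of the loss-based core-set selection: drop the top 5% hardest
# samples (found harmful above) and keep the next 50% by loss value.
# `losses` is a placeholder for per-sample losses from a pretrained model.
import numpy as np

rng = np.random.default_rng(0)
losses = rng.gamma(2.0, 1.0, size=10_000)    # per-sample pretrained losses

order = np.argsort(losses)[::-1]             # hardest samples first
n = len(order)
skip = int(0.05 * n)                         # top 5%: harmful, excluded
keep = int(0.50 * n)                         # core-set: next 50% by loss
core_set = order[skip:skip + keep]

print(len(core_set), losses[core_set].min(), losses[core_set].max())
```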

Updated: 2024-06-08 07:53:16

标题: 图像超分辨率数据集修剪研究

摘要: 在图像超分辨率(SR)中,依赖大型数据集进行训练是一把双刃剑。虽然提供了丰富的训练材料,但也需要大量的计算和存储资源。在这项工作中,我们分析了数据集修剪来解决这些挑战。我们引入了一种新颖的方法,将数据集减少到一组核心训练样本,这些样本是基于它们的损失值由一个简单的预训练SR模型确定选择的。通过将训练重点放在原始数据集的仅50%上,特别是在那些具有最高损失值特征的样本上,我们实现了与或超过在整个数据集上训练获得的结果。有趣的是,我们的分析表明,具有最高损失值的前5%样本对训练过程产生了负面影响。排除这些样本并调整选择以支持更容易的样本进一步增强了训练结果。我们的工作为图像SR中数据集修剪的未开发潜力开辟了新的视角。它表明,基于损失值指标的训练数据的谨慎选择可以导致更好的SR模型,挑战了更多数据不可避免地导致更好性能的传统智慧。

更新时间: 2024-06-08 07:53:16

领域: eess.IV,cs.AI,cs.CV,cs.GR,cs.LG

下载: http://arxiv.org/abs/2403.17083v2

EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction

Predicting the binding sites of target proteins plays a fundamental role in drug discovery. Most existing deep-learning methods consider a protein as a 3D image by spatially clustering its atoms into voxels and then feed the voxelized protein into a 3D CNN for prediction. However, the CNN-based methods encounter several critical issues: 1) defective in representing irregular protein structures; 2) sensitive to rotations; 3) insufficient to characterize the protein surface; 4) unaware of protein size shift. To address the above issues, this work proposes EquiPocket, an E(3)-equivariant Graph Neural Network (GNN) for binding site prediction, which comprises three modules: the first one to extract local geometric information for each surface atom, the second one to model both the chemical and spatial structure of protein and the last one to capture the geometry of the surface via equivariant message passing over the surface atoms. We further propose a dense attention output layer to alleviate the effect incurred by variable protein size. Extensive experiments on several representative benchmarks demonstrate the superiority of our framework to the state-of-the-art methods.

Updated: 2024-06-08 07:51:46

标题: EquiPocket:一种用于配体结合位点预测的E(3)-等变几何图神经网络

摘要: 预测靶蛋白的结合位点在药物发现中起着基础性作用。大多数现有的深度学习方法将蛋白质视为一个三维图像,通过将其原子空间聚类为体素,然后将体素化的蛋白质输入到三维卷积神经网络(CNN)进行预测。然而,基于CNN的方法遇到了几个关键问题:1)难以表示不规则的蛋白质结构;2)对旋转敏感;3)无法充分表征蛋白质表面;4)不了解蛋白质大小的变化。为了解决上述问题,本文提出了EquiPocket,一种用于结合位点预测的E(3)-等变图神经网络(GNN),包括三个模块:第一个模块用于提取每个表面原子的局部几何信息,第二个模块用于建模蛋白质的化学和空间结构,最后一个模块通过在表面原子上进行等变消息传递来捕捉表面的几何形状。我们进一步提出了一个密集的注意力输出层,以减轻由可变蛋白质大小引起的影响。对几个代表性基准测试进行的大量实验表明,我们的框架优于现有方法。

更新时间: 2024-06-08 07:51:46

领域: q-bio.BM,cs.LG

下载: http://arxiv.org/abs/2302.12177v2

Constructive Interpolation and Concept-Based Beth Definability for Description Logics via Sequents

We introduce a constructive method applicable to a large number of description logics (DLs) for establishing the concept-based Beth definability property (CBP) based on sequent systems. Using the highly expressive DL RIQ as a case study, we introduce novel sequent calculi for RIQ-ontologies and show how certain interpolants can be computed from sequent calculus proofs, which permit the extraction of explicit definitions of implicitly definable concepts. To the best of our knowledge, this is the first sequent-based approach to computing interpolants and definitions within the context of DLs, as well as the first proof that RIQ enjoys the CBP. Moreover, due to the modularity of our sequent systems, our results hold for any restriction of RIQ, and are applicable to other DLs by suitable modifications.

Updated: 2024-06-08 07:48:20

标题: 通过Sequents进行的描述逻辑的建设性插值和基于概念的Beth可定义性

摘要: 我们介绍了一种适用于大量描述逻辑(DLs)的建设性方法,用于建立基于概念的贝斯可定义性质(CBP),基于序理系统。以高度表达性的DL RIQ为案例研究,我们为RIQ本体论引入了新颖的序理演算,并展示了如何从序理演算证明中计算出某些插值物,从而允许提取出隐含可定义概念的显式定义。据我们所知,这是第一个基于序理的方法来计算插值物和定义DLs上下文中的第一个证明,RIQ享有CBP的证明。此外,由于我们序理系统的模块化性,我们的结果适用于RIQ的任何限制,并通过适当的修改适用于其他DLs。

更新时间: 2024-06-08 07:48:20

领域: cs.LO,cs.AI,cs.DB,math.LO

下载: http://arxiv.org/abs/2404.15840v2

DUPLEX: Dual GAT for Complex Embedding of Directed Graphs

Current directed graph embedding methods build upon undirected techniques but often inadequately capture directed edge information, leading to challenges such as: (1) Suboptimal representations for nodes with low in/out-degrees, due to the insufficient neighbor interactions; (2) Limited inductive ability for representing new nodes post-training; (3) Narrow generalizability, as training is overly coupled with specific tasks. In response, we propose DUPLEX, an inductive framework for complex embeddings of directed graphs. It (1) leverages Hermitian adjacency matrix decomposition for comprehensive neighbor integration, (2) employs a dual GAT encoder for directional neighbor modeling, and (3) features two parameter-free decoders to decouple training from particular tasks. DUPLEX outperforms state-of-the-art models, especially for nodes with sparse connectivity, and demonstrates robust inductive capability and adaptability across various tasks. The code is available at https://github.com/alipay/DUPLEX.
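
The Hermitian adjacency decomposition rests on a simple construction, sketched below: a reciprocal edge contributes 1, a one-way edge contributes +i or -i depending on direction, so the matrix is Hermitian and has a real spectrum. The 4-node edge list is a toy example.

```python
# Sketch of the Hermitian adjacency matrix for a directed graph: mutual
# edges contribute 1, one-way edges contribute +/- i, so H = conj(H).T
# and the spectrum is real. The 4-node edge list is a toy example.
import numpy as np

def hermitian_adjacency(n, edges):
    H = np.zeros((n, n), dtype=complex)
    s = set(edges)
    for (u, v) in s:
        if (v, u) in s:
            H[u, v] = H[v, u] = 1           # reciprocal edge
        else:
            H[u, v], H[v, u] = 1j, -1j      # directed edge u -> v
    return H

H = hermitian_adjacency(4, [(0, 1), (1, 0), (1, 2), (3, 2)])
assert np.allclose(H, H.conj().T)           # Hermitian by construction
eigvals = np.linalg.eigvalsh(H)             # real spectrum -> decomposition
print(eigvals.round(3))
```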

Updated: 2024-06-08 07:48:16

Categories: cs.LG

Download: http://arxiv.org/abs/2406.05391v1

An Empirically Grounded Reference Architecture for Software Supply Chain Metadata Management

With the rapid rise in Software Supply Chain (SSC) attacks, organisations need thorough and trustworthy visibility over the entire SSC of their software inventory to detect risks early and identify compromised assets rapidly in the event of an SSC attack. One way to achieve such visibility is through SSC metadata, machine-readable and authenticated documents describing an artefact's lifecycle. Adopting SSC metadata requires organisations to procure or develop a Software Supply Chain Metadata Management system (SCM2), a suite of software tools for performing life cycle activities of SSC metadata documents such as creation, signing, distribution, and consumption. Selecting or developing an SCM2 is challenging due to the lack of a comprehensive domain model and architectural blueprint to aid practitioners in navigating the vast design space of SSC metadata terminologies, frameworks, and solutions. This paper addresses the above-mentioned challenge by presenting an empirically grounded Reference Architecture (RA) comprising a domain model and an architectural blueprint for SCM2 systems. Our proposed RA is constructed systematically on an empirical foundation built with industry-driven and peer-reviewed SSC security frameworks. Our theoretical evaluation, which consists of an architectural mapping of five prominent SSC security tools on the RA, ensures its validity and applicability, thus affirming the proposed RA as an effective framework for analysing existing SCM2 solutions and guiding the engineering of new SCM2 systems.

Updated: 2024-06-08 07:48:11

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2310.06300v2

Simplification of Risk Averse POMDPs with Performance Guarantees

Risk averse decision making under uncertainty in partially observable domains is a fundamental problem in AI and essential for reliable autonomous agents. In our case, the problem is modeled using partially observable Markov decision processes (POMDPs), where the value function is the conditional value at risk (CVaR) of the return. Calculating an optimal solution for POMDPs is computationally intractable in general. In this work we develop a simplification framework to speed up the evaluation of the value function, while providing performance guarantees. As the simplification, we consider a computationally cheaper belief-MDP transition model, which can correspond, e.g., to cheaper observation or transition models. Our contributions include general bounds for CVaR that allow bounding the CVaR of a random variable X, using a random variable Y, by assuming bounds between their cumulative distributions. We then derive bounds for the CVaR value function in a POMDP setting, and show how to bound the value function using the computationally cheaper belief-MDP transition model and without accessing the computationally expensive model in real-time. Then, we provide theoretical performance guarantees for the estimated bounds. Our results apply to a general simplification of a belief-MDP transition model and support simplification of both the observation and state transition models simultaneously.
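
For concreteness, the CVaR objective behind the value function is the expectation over the worst alpha-tail of the return. A Monte Carlo sketch under the lower-tail, risk-averse convention (the paper's exact sign convention may differ):

    import numpy as np

    def cvar(returns, alpha=0.1):
        # Average of the worst alpha-fraction of sampled returns.
        r = np.sort(np.asarray(returns))
        k = max(1, int(np.ceil(alpha * len(r))))
        return r[:k].mean()

    rng = np.random.default_rng(0)
    samples = rng.normal(loc=1.0, scale=2.0, size=10_000)
    print(cvar(samples, alpha=0.05))         # well below the mean of 1.0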

Updated: 2024-06-08 07:37:12

Categories: cs.AI

Download: http://arxiv.org/abs/2406.03000v2

Improving Antibody Humanness Prediction using Patent Data

We investigate the potential of patent data for improving the antibody humanness prediction using a multi-stage, multi-loss training process. Humanness serves as a proxy for the immunogenic response to antibody therapeutics, one of the major causes of attrition in drug discovery and a challenging obstacle for their use in clinical settings. We pose the initial learning stage as a weakly-supervised contrastive-learning problem, where each antibody sequence is associated with possibly multiple identifiers of function and the objective is to learn an encoder that groups them according to their patented properties. We then freeze a part of the contrastive encoder and continue training it on the patent data using the cross-entropy loss to predict the humanness score of a given antibody sequence. We illustrate the utility of the patent data and our approach by performing inference on three different immunogenicity datasets, unseen during training. Our empirical results demonstrate that the learned model consistently outperforms the alternative baselines and establishes new state-of-the-art on five out of six inference tasks, irrespective of the used metric.

Updated: 2024-06-08 07:14:03

Categories: q-bio.QM,cs.LG,stat.ML

Download: http://arxiv.org/abs/2401.14442v3

Adversarial flows: A gradient flow characterization of adversarial attacks

A popular method to perform adversarial attacks on neural networks is the so-called fast gradient sign method and its iterative variant. In this paper, we interpret this method as an explicit Euler discretization of a differential inclusion, where we also show convergence of the discretization to the associated gradient flow. To do so, we consider the concept of p-curves of maximal slope in the case $p=\infty$. We prove existence of $\infty$-curves of maximum slope and derive an alternative characterization via differential inclusions. Furthermore, we also consider Wasserstein gradient flows for potential energies, where we show that curves in the Wasserstein space can be characterized by a representing measure on the space of curves in the underlying Banach space, which fulfill the differential inclusion. The application of our theory to the finite-dimensional setting is twofold: On the one hand, we show that a whole class of normalized gradient descent methods (in particular signed gradient descent) converge, up to subsequences, to the flow, when sending the step size to zero. On the other hand, in the distributional setting, we show that the inner optimization task of adversarial training objective can be characterized via $\infty$-curves of maximum slope on an appropriate optimal transport space.
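
The starting observation can be made concrete: the iterative fast gradient sign method is exactly an explicit Euler scheme $x_{k+1} = x_k + \tau\,\mathrm{sign}(\nabla \ell(x_k))$, and the paper studies the $\tau \to 0$ limit. A toy numpy sketch on the illustrative loss $\ell(x) = \tfrac{1}{2}\|x\|^2$, whose gradient is $x$ itself:

    import numpy as np

    def iterative_fgsm(x, grad_fn, step=0.01, n_steps=100):
        # Explicit Euler discretization of the sign-gradient differential inclusion.
        for _ in range(n_steps):
            x = x + step * np.sign(grad_fn(x))
        return x

    x0 = np.array([0.3, -0.2])
    print(iterative_fgsm(x0, grad_fn=lambda x: x))   # ascends the toy loss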

Updated: 2024-06-08 07:05:26

Categories: cs.LG,math.AP,49Q20, 34A60, 68Q32, 65K15

Download: http://arxiv.org/abs/2406.05376v1

LEMMA-RCA: A Large Multi-modal Multi-domain Dataset for Root Cause Analysis

Root cause analysis (RCA) is crucial for enhancing the reliability and performance of complex systems. However, progress in this field has been hindered by the lack of large-scale, open-source datasets tailored for RCA. To bridge this gap, we introduce LEMMA-RCA, a large dataset designed for diverse RCA tasks across multiple domains and modalities. LEMMA-RCA features various real-world fault scenarios from IT and OT operation systems, encompassing microservices, water distribution, and water treatment systems, with hundreds of system entities involved. We evaluate the quality of LEMMA-RCA by testing the performance of eight baseline methods on this dataset under various settings, including offline and online modes as well as single and multiple modalities. Our experimental results demonstrate the high quality of LEMMA-RCA. The dataset is publicly available at https://lemma-rca.github.io/.

Updated: 2024-06-08 07:00:31

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.05375v1

Distill to Delete: Unlearning in Graph Networks with Knowledge Distillation

Graph unlearning has emerged as a pivotal method to delete information from a pre-trained graph neural network (GNN). One may delete nodes, a class of nodes, edges, or a class of edges. An unlearning method enables the GNN model to comply with data protection regulations (i.e., the right to be forgotten), adapt to evolving data distributions, and reduce the GPU-hours carbon footprint by avoiding repetitive retraining. Existing partitioning and aggregation-based methods have limitations due to their poor handling of local graph dependencies and additional overhead costs. More recently, GNNDelete offered a model-agnostic approach that alleviates some of these issues. Our work takes a novel approach to address these challenges in graph unlearning through knowledge distillation, as it distills to delete in GNN (D2DGN). It is a model-agnostic distillation framework where the complete graph knowledge is divided and marked for retention and deletion. It performs distillation with response-based soft targets and feature-based node embedding while minimizing KL divergence. The unlearned model effectively removes the influence of deleted graph elements while preserving knowledge about the retained graph elements. D2DGN surpasses the performance of existing methods when evaluated on various real-world graph datasets by up to $43.1\%$ (AUC) in edge and node unlearning tasks. Other notable advantages include better efficiency, better performance in removing target elements, preservation of performance for the retained elements, and zero overhead costs. Notably, our D2DGN surpasses the state-of-the-art GNNDelete in AUC by $2.4\%$, improves the membership inference ratio by $+1.3$, requires $10.2\times10^6$ fewer FLOPs per forward pass, and runs up to $\mathbf{3.2}\times$ faster.
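
One plausible reading of the retain/delete distillation objective, sketched in numpy: match the teacher's soft targets on retained elements and steer deleted elements toward an uninformative target instead. The KL directions and the uniform target are assumptions for illustration, not D2DGN's exact formulation.

    import numpy as np

    def kl(p, q, eps=1e-9):
        # KL(p || q) for two probability vectors.
        return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

    def distill_to_delete_loss(student, teacher, retain_idx, delete_idx):
        # student, teacher: (n, k) rows of class probabilities.
        k = student.shape[1]
        uniform = np.full(k, 1.0 / k)
        retain = sum(kl(teacher[i], student[i]) for i in retain_idx)
        delete = sum(kl(uniform, student[i]) for i in delete_idx)
        return retain + delete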

Updated: 2024-06-08 06:50:47

Categories: cs.LG

Download: http://arxiv.org/abs/2309.16173v2

Bridging the Gap: Rademacher Complexity in Robust and Standard Generalization

Training Deep Neural Networks (DNNs) with adversarial examples often results in poor generalization to test-time adversarial data. This paper investigates this issue, known as adversarially robust generalization, through the lens of Rademacher complexity. Building upon the studies of Khim and Loh (2018) and Yin et al. (2019), numerous works have been dedicated to this problem, yet achieving a satisfactory bound remains an elusive goal. Existing works on DNNs either apply to a surrogate loss instead of the robust loss or yield bounds that are notably looser compared to their standard counterparts. In the latter case, the bounds have a higher dependency on the width $m$ of the DNNs or the dimension $d$ of the data, with an extra factor of at least $\mathcal{O}(\sqrt{m})$ or $\mathcal{O}(\sqrt{d})$. This paper presents upper bounds for adversarial Rademacher complexity of DNNs that match the best-known upper bounds in standard settings, as established in the work of Bartlett et al. (2017), with the dependency on width and dimension being $\mathcal{O}(\ln(dm))$. The central challenge addressed is calculating the covering number of adversarial function classes. We aim to construct a new cover that possesses two properties: 1) compatibility with adversarial examples, and 2) precision comparable to covers used in standard settings. To this end, we introduce a new variant of covering number called the \emph{uniform covering number}, specifically designed and proven to reconcile these two properties. Consequently, our method effectively bridges the gap between Rademacher complexity in robust and standard generalization.

Updated: 2024-06-08 06:45:19

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2406.05372v1

Venn Diagram Prompting: Accelerating Comprehension with Scaffolding Effect

We introduce Venn Diagram (VD) Prompting, an innovative prompting technique which allows Large Language Models (LLMs) to combine and synthesize information across complex, diverse and long-context documents in knowledge-intensive question-answering tasks. Generating answers from multiple documents involves numerous steps to extract relevant and unique information and amalgamate it into a cohesive response. To improve the quality of the final answer, multiple LLM calls or pretrained models are used to perform different tasks such as summarization, reorganization and customization. The approach covered in the paper focuses on replacing the multi-step strategy with a single LLM call using VD prompting. Our proposed technique also aims to eliminate the inherent position bias in LLMs, enhancing consistency in answers by removing sensitivity to the sequence of input information. It overcomes the challenge of inconsistency traditionally associated with varying input sequences. We also explore the practical applications of the VD prompt based on our examination of the prompt's outcomes. In experiments performed on four public benchmark question-answering datasets, VD prompting consistently matches or surpasses the performance of a meticulously crafted instruction prompt which adheres to optimal guidelines and practices.

Updated: 2024-06-08 06:27:26

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.05369v1

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually consumes large amounts of human demonstrations. To tackle this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the utilization of a compact 3D visual representation, extracted from sparse point clouds with an efficient point encoder. In our experiments involving 72 simulation tasks, DP3 successfully handles most tasks with just 10 demonstrations and surpasses baselines with a 24.2% relative improvement. In 4 real robot tasks, DP3 demonstrates precise control with a high success rate of 85%, given only 40 demonstrations of each task, and shows excellent generalization abilities in diverse aspects, including space, viewpoint, appearance, and instance. Interestingly, in real robot experiments, DP3 rarely violates safety requirements, in contrast to baseline methods which frequently do, necessitating human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available on https://3d-diffusion-policy.github.io .

Updated: 2024-06-08 06:17:48

Categories: cs.RO,cs.CV,cs.LG

Download: http://arxiv.org/abs/2403.03954v6

Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation

Aligning large language models (LLMs) with human values is imperative to mitigate potential adverse effects resulting from their misuse. Drawing from the sociological insight that acknowledging all parties' concerns is a key factor in shaping human values, this paper proposes a novel direction to align LLMs by themselves: social scene simulation. To achieve this, we present MATRIX, a novel social scene simulator that emulates realistic scenes around a user's input query, enabling the LLM to take social consequences into account before responding. MATRIX serves as a virtual rehearsal space, akin to a Monopolylogue, where the LLM performs diverse roles related to the query and practice by itself. To inject this alignment, we fine-tune the LLM with MATRIX-simulated data, ensuring adherence to human values without compromising inference speed. We theoretically show that the LLM with MATRIX outperforms Constitutional AI under mild assumptions. Finally, extensive experiments validate that our method outperforms over 10 baselines across 4 benchmarks. As evidenced by 875 user ratings, our tuned 13B-size LLM exceeds GPT-4 in aligning with human values. See our project page at https://shuotang123.github.io/MATRIX.

Updated: 2024-06-08 06:13:55

Categories: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2402.05699v3

JointRF: End-to-End Joint Optimization for Dynamic Neural Radiance Field Representation and Compression

Neural Radiance Fields (NeRF) excel at photo-realistic rendering of static scenes, inspiring numerous efforts to facilitate volumetric videos. However, rendering dynamic and long-sequence radiance fields remains challenging due to the significant amount of data required to represent volumetric videos. In this paper, we propose a novel end-to-end joint optimization scheme for dynamic NeRF representation and compression, called JointRF, thus achieving significantly improved quality and compression efficiency over previous methods. Specifically, JointRF employs a compact residual feature grid and a coefficient feature grid to represent the dynamic NeRF. This representation handles large motions without compromising quality while concurrently diminishing temporal redundancy. We also introduce a sequential feature compression subnetwork to further reduce spatial-temporal redundancy. Finally, the representation and compression subnetworks are trained end-to-end within JointRF. Extensive experiments demonstrate that JointRF achieves superior compression performance across various datasets.

Updated: 2024-06-08 06:12:05

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.14452v2

Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator

Risk-sensitive linear quadratic regulator is one of the most fundamental problems in risk-sensitive optimal control. In this paper, we study online adaptive control of risk-sensitive linear quadratic regulator in the finite horizon episodic setting. We propose a simple least-squares greedy algorithm and show that it achieves $\widetilde{\mathcal{O}}(\log N)$ regret under a specific identifiability assumption, where $N$ is the total number of episodes. If the identifiability assumption is not satisfied, we propose incorporating exploration noise into the least-squares-based algorithm, resulting in an algorithm with $\widetilde{\mathcal{O}}(\sqrt{N})$ regret. To our best knowledge, this is the first set of regret bounds for episodic risk-sensitive linear quadratic regulator. Our proof relies on perturbation analysis of less-standard Riccati equations for risk-sensitive linear quadratic control, and a delicate analysis of the loss in the risk-sensitive performance criterion due to applying the suboptimal controller in the online learning process.
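
The model-estimation half of the least-squares greedy algorithm can be sketched as ridge regression of the linear dynamics from observed transitions; the risk-sensitive Riccati planning step on the estimate (and the optional exploration noise) is omitted here. Shapes and the regularizer are illustrative assumptions.

    import numpy as np

    def estimate_dynamics(states, actions, next_states, reg=1e-3):
        # Regress x' ~ A x + B u from (n, dx), (n, du), (n, dx) arrays.
        Z = np.hstack([states, actions])
        theta = np.linalg.solve(Z.T @ Z + reg * np.eye(Z.shape[1]),
                                Z.T @ next_states)
        dx = states.shape[1]
        A_hat, B_hat = theta[:dx].T, theta[dx:].T
        return A_hat, B_hat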

Updated: 2024-06-08 06:06:20

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2406.05366v1

CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation

Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources. However, existing methods, by either feeding LMs with raw or preprocessed materials, remain prone to errors. To address this, we introduce CaLM, a novel verification framework. CaLM leverages the insight that a robust grounded response should be consistent with information derived solely from its cited sources. Our framework empowers smaller LMs, which rely less on parametric memory and excel at processing relevant information given a query, to validate the output of larger LMs. Larger LM responses that closely align with the smaller LMs' output, which relies exclusively on cited documents, are verified. Responses showing discrepancies are iteratively refined through a feedback loop. Experiments on three open-domain question-answering datasets demonstrate significant performance gains of 1.5% to 7% absolute average without any required model fine-tuning.
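
The verify-then-refine loop can be pictured as below; large_lm, small_lm, and agree are placeholder callables standing in for the actual models and the answer-consistency check, so this is a structural sketch rather than CaLM's implementation.

    def calm_verify(query, docs, large_lm, small_lm, agree, max_rounds=3):
        feedback = ""
        draft = None
        for _ in range(max_rounds):
            draft, cited = large_lm(query, docs, feedback)   # grounded draft + citations
            check = small_lm(query, cited)                   # answers from cited sources only
            if agree(draft, check):
                return draft                                 # verified as grounded
            feedback = f"Your answer disagreed with the cited sources: {check}"
        return draft                                         # best effort after refinement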

Updated: 2024-06-08 06:04:55

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.05365v1

Graph Neural Network Training Systems: A Performance Comparison of Full-Graph and Mini-Batch

Graph Neural Networks (GNNs) have gained significant attention in recent years due to their ability to learn representations of graph structured data. Two common methods for training GNNs are mini-batch training and full-graph training. Since these two methods require different training pipelines and systems optimizations, two separate categories of GNN training systems emerged, each tailored for one method. Works that introduce systems belonging to a particular category predominantly compare them with other systems within the same category, offering limited or no comparison with systems from the other category. Some prior work also justifies its focus on one specific training method by arguing that it achieves higher accuracy than the alternative. The literature, however, has incomplete and contradictory evidence in this regard. In this paper, we provide a comprehensive empirical comparison of full-graph and mini-batch GNN training systems to get a clearer picture of the state of the art in the field. We find that the mini-batch training systems we consider consistently converge faster than the full-graph training ones across multiple datasets, GNN models, and system configurations, with speedups between 2.4x - 15.2x. We also find that both training techniques converge to similar accuracy values, so comparing systems across the two categories in terms of time-to-accuracy is a sound approach.

Updated: 2024-06-08 05:52:08

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2406.00552v2

Detecting Severity of Diabetic Retinopathy from Fundus Images: A Transformer Network-based Review

Diabetic Retinopathy (DR) is considered one of the significant concerns worldwide, primarily due to its impact on causing vision loss among most people with diabetes. The severity of DR is typically assessed manually by ophthalmologists from fundus-photography-based retinal images. This paper deals with an automated understanding of the severity stages of DR. In the literature, researchers have focused on this automation using traditional machine-learning-based algorithms and convolutional architectures. However, past works hardly focused on essential parts of the retinal image to improve model performance. In this study, we adopt and fine-tune transformer-based learning models to capture the crucial features of retinal images for a more nuanced understanding of DR severity. Additionally, we explore the effectiveness of image transformers to infer the degree of DR severity from fundus photographs. For experiments, we utilized the publicly available APTOS-2019 blindness detection dataset, where the performance of the transformer-based models was quite encouraging.

Updated: 2024-06-08 05:50:49

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2301.00973v2

An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval

The CLIP4Clip model transferred from CLIP has been the de facto standard for solving the video clip retrieval task from frame-level input, triggering a surge of CLIP4Clip-based models in the video-text retrieval domain. In this work, we rethink the inherent limitation of the widely used mean-pooling operation for frame-feature aggregation and investigate adaptations of excitation and aggregation designs for discriminative video representation generation. We present a novel excitation-and-aggregation design, including (1) an excitation module for capturing non-mutually-exclusive relationships among frame features and achieving frame-wise feature recalibration, and (2) an aggregation module applied to learn the exclusiveness used for frame-representation aggregation. Similarly, we employ a cascade of the sequential module and the aggregation design to generate discriminative video representations in the sequential type. Besides, we adopt the excitation design in the tight type to obtain representative frame features for multi-modal interaction. The proposed modules are evaluated on three benchmark datasets, MSR-VTT, ActivityNet and DiDeMo, achieving MSR-VTT (43.9 R@1), ActivityNet (44.1 R@1) and DiDeMo (31.0 R@1). They outperform the CLIP4Clip results by +1.2% (+0.5%), +4.5% (+1.9%) and +9.5% (+2.7%) relative (absolute) improvements, demonstrating the superiority of our proposed excitation and aggregation designs. We hope our work will serve as an alternative for frame-representation aggregation and facilitate future research.
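
A hypothetical numpy sketch of the excitation-and-aggregation idea over frame features of shape (t, d): a sigmoid gate recalibrates frames (modeling non-mutually-exclusive relations), and a softmax head learns exclusive weights for aggregation. The weight shapes W1 (d, h), W2 (h, d) and Wa (d, 1) are illustrative, not the paper's architecture.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def excite_aggregate(frames, W1, W2, Wa):
        gate = 1.0 / (1.0 + np.exp(-(np.tanh(frames @ W1) @ W2)))   # (t, d) gates
        recal = frames * gate                                       # frame-wise recalibration
        scores = softmax((recal @ Wa).squeeze(-1))                  # (t,) exclusive weights
        return scores @ recal                                       # (d,) video embedding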

Updated: 2024-06-08 05:49:24

Categories: cs.IR,cs.AI,cs.CV,cs.MM

Download: http://arxiv.org/abs/2406.01604v2

Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models

In this paper, we present a first study investigating the trust and ethical implications of on-device artificial intelligence (AI), focusing on "small" language models (SLMs) amenable to personal devices like smartphones. While on-device SLMs promise enhanced privacy, reduced latency, and improved user experience compared to cloud-based services, we posit that they might also introduce significant challenges and vulnerabilities compared to their on-server counterparts. As part of our trust assessment study, we conduct a systematic evaluation of state-of-the-art on-device SLMs, contrasted with their on-server counterparts, based on a well-established trustworthiness measurement framework. Our results show on-device SLMs to be (statistically) significantly less trustworthy, specifically demonstrating more stereotypical, unfair and privacy-breaching behavior. Informed by these findings, we then perform our ethics assessment study by inferring whether SLMs would provide responses to potentially unethical vanilla prompts, collated from prior jailbreaking and prompt engineering studies and other sources. Strikingly, the on-device SLMs returned valid responses to these prompts, which ideally should have been rejected. Even more seriously, the on-device SLMs responded with valid answers without any filters and without the need for any jailbreaking or prompt engineering. These responses can be abused for various harmful and unethical scenarios, including societal harm, illegal activities, hate, self-harm, exploitable phishing content and exploitable code, all of which indicates the high vulnerability and exploitability of these on-device SLMs. Overall, our findings highlight gaping vulnerabilities in state-of-the-art on-device AI which seem to stem from the resource constraints faced by these models, and which may make typical defenses fundamentally challenging to deploy in these environments.

Updated: 2024-06-08 05:45:42

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2406.05364v1

Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where

Thanks to the introduction of the tokenization procedure and the vision-transformer backbone, image data has begun to enjoy the simple-but-effective self-supervised learning scheme built upon masking and a self-reconstruction objective. Convolutional neural networks, another important and widely adopted architecture for image data, do have contrastive-learning techniques to drive self-supervised learning, yet still face difficulty in leveraging such a straightforward and general masking operation to significantly benefit their learning process. In this work, we aim to ease the burden of including the masking operation in the contrastive-learning framework for convolutional neural networks as an extra augmentation method. Beyond the additive but unwanted edges (between masked and unmasked regions) and other adverse effects that masking causes for ConvNets, which prior works have discussed, we identify a further potential problem: for one view in a contrastive sample pair, the randomly sampled masking regions can be overly concentrated on important/salient objects, resulting in misleading contrastiveness with respect to the other view. To this end, we propose to explicitly take a saliency constraint into consideration, under which the masked regions are more evenly distributed between the foreground and background, to realize the masking-based augmentation. Moreover, we introduce hard negative samples by masking larger regions of salient patches in an input image. Extensive experiments conducted on various datasets, contrastive learning mechanisms, and downstream tasks verify the efficacy and the superior performance of our proposed method with respect to several state-of-the-art baselines.
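
A minimal numpy sketch of the saliency constraint as described: draw the same masking ratio from salient (foreground) and non-salient (background) patches, so the mask cannot pile up on the salient object in one view. The threshold and ratio are illustrative defaults.

    import numpy as np

    def saliency_balanced_mask(saliency, mask_ratio=0.4, fg_thresh=0.5, rng=None):
        # saliency: per-patch saliency map in [0, 1], any shape.
        rng = rng or np.random.default_rng()
        sal = saliency.ravel()
        fg, bg = np.where(sal >= fg_thresh)[0], np.where(sal < fg_thresh)[0]
        mask = np.zeros(sal.size, dtype=bool)
        for idx in (fg, bg):                  # mask fg and bg at the same rate
            k = int(mask_ratio * idx.size)
            if k:
                mask[rng.choice(idx, size=k, replace=False)] = True
        return mask.reshape(saliency.shape)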

Updated: 2024-06-08 05:42:53

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2309.12757v2

RAPID: Robust APT Detection and Investigation Using Context-Aware Deep Learning

Advanced persistent threats (APTs) pose significant challenges for organizations, leading to data breaches, financial losses, and reputational damage. Existing provenance-based approaches for APT detection often struggle with high false positive rates, a lack of interpretability, and an inability to adapt to evolving system behavior. We introduce RAPID, a novel deep learning-based method for robust APT detection and investigation, leveraging context-aware anomaly detection and alert tracing. By utilizing self-supervised sequence learning and iteratively learned embeddings, our approach effectively adapts to dynamic system behavior. The use of provenance tracing both enriches the alerts and enhances the detection capabilities of our approach. Our extensive evaluation demonstrates RAPID's effectiveness and computational efficiency in real-world scenarios. In addition, RAPID achieves higher precision and recall than state-of-the-art methods, significantly reducing false positives. RAPID integrates contextual information and facilitates a smooth transition from detection to investigation, providing security teams with detailed insights to efficiently address APT threats.

Updated: 2024-06-08 05:39:24

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2406.05362v1

Improved Sample Complexity Bounds for Diffusion Model Training

Diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. From a theoretical standpoint, a number of recent works~\cite{chen2022,chen2022improved,benton2023linear} have studied the iteration complexity of sampling, assuming access to an accurate diffusion model. In this work, we focus on understanding the \emph{sample complexity} of training such a model; how many samples are needed to learn an accurate diffusion model using a sufficiently expressive neural network? Prior work~\cite{BMR20} showed bounds polynomial in the dimension, desired Total Variation error, and Wasserstein error. We show an \emph{exponential improvement} in the dependence on Wasserstein error and depth, along with improved dependencies on other relevant parameters.

Updated: 2024-06-08 05:34:29

Categories: cs.LG,cs.CV,cs.IT,math.IT,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2311.13745v2

Reinforcement Learning for Intensity Control: An Application to Choice-Based Network Revenue Management

Intensity control is a type of continuous-time dynamic optimization problems with many important applications in Operations Research including queueing and revenue management. In this study, we adapt the reinforcement learning framework to intensity control using choice-based network revenue management as a case study, which is a classical problem in revenue management that features a large state space, a large action space and a continuous time horizon. We show that by utilizing the inherent discretization of the sample paths created by the jump points, a unique and defining feature of intensity control, one does not need to discretize the time horizon in advance, which was believed to be necessary because most reinforcement learning algorithms are designed for discrete-time problems. As a result, the computation can be facilitated and the discretization error is significantly reduced. We lay the theoretical foundation for the Monte Carlo and temporal difference learning algorithms for policy evaluation and develop policy gradient based actor critic algorithms for intensity control. Via a comprehensive numerical study, we demonstrate the benefit of our approach versus other state-of-the-art benchmarks.
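
The key structural point, that jump points give sample paths a built-in discretization, can be seen in a thinning-based simulator: returns (and hence learning updates) are computed at the jump epochs themselves, with no fixed time grid. intensity(t, n) and reward(t, n) are placeholder callables and lam_max is an assumed intensity bound, all illustrative.

    import numpy as np

    def simulate_episode(intensity, reward, horizon=1.0, lam_max=10.0, rng=None):
        rng = rng or np.random.default_rng()
        t, n, total = 0.0, 0, 0.0
        while True:
            t += rng.exponential(1.0 / lam_max)           # candidate jump epoch
            if t >= horizon:
                return total
            if rng.random() < intensity(t, n) / lam_max:  # accept by thinning
                n += 1
                total += reward(t, n)                     # update only at jumps

    print(simulate_episode(intensity=lambda t, n: 5.0 * (n < 3),
                           reward=lambda t, n: 1.0))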

Updated: 2024-06-08 05:27:01

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2406.05358v1

Do Androids Know They're Only Dreaming of Electric Sheep?

We design probes trained on the internal representations of a transformer language model to predict its hallucinatory behavior on three grounded generation tasks. To train the probes, we annotate for span-level hallucination on both sampled (organic) and manually edited (synthetic) reference outputs. Our probes are narrowly trained and we find that they are sensitive to their training domain: they generalize poorly from one task to another or from synthetic to organic hallucinations. However, on in-domain data, they can reliably detect hallucinations at many transformer layers, achieving 95% of their peak performance as early as layer 4. Here, probing proves accurate for evaluating hallucination, outperforming several contemporary baselines and even surpassing an expert human annotator in response-level detection F1. Similarly, on span-level labeling, probes are on par or better than the expert annotator on two out of three generation tasks. Overall, we find that probing is a feasible and efficient alternative to language model hallucination evaluation when model states are available.
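
Operationally, such a probe is just a lightweight classifier on frozen hidden states. A sketch with synthetic stand-ins for the transformer states and span annotations (the real setup would extract, e.g., layer-4 activations and use the paper's hallucination labels):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(1000, 768))    # stand-in for frozen layer activations
    labels = rng.integers(0, 2, size=1000)   # stand-in for hallucination annotations

    probe = LogisticRegression(max_iter=1000).fit(hidden[:800], labels[:800])
    print("held-out accuracy:", probe.score(hidden[800:], labels[800:]))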

Updated: 2024-06-08 05:15:57

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2312.17249v2

Investigating Memory Failure Prediction Across CPU Architectures

Large-scale datacenters often experience memory failures, where Uncorrectable Errors (UEs) highlight critical malfunction in Dual Inline Memory Modules (DIMMs). Existing approaches primarily utilize Correctable Errors (CEs) to predict UEs, yet they typically neglect how these errors vary between different CPU architectures, especially in terms of Error Correction Code (ECC) applicability. In this paper, we investigate the correlation between CEs and UEs across different CPU architectures, including X86 and ARM. Our analysis identifies unique patterns of memory failure associated with each processor platform. Leveraging Machine Learning (ML) techniques on production datasets, we conduct the memory failure prediction in different processors' platforms, achieving up to 15% improvements in F1-score compared to the existing algorithm. Finally, an MLOps (Machine Learning Operations) framework is provided to consistently improve the failure prediction in the production environment.

Updated: 2024-06-08 05:10:23

Categories: cs.AR,cs.AI,cs.DC

Download: http://arxiv.org/abs/2406.05354v1

A Survey on Efficient Inference for Large Language Models

Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards developing techniques aimed at enhancing the efficiency of LLM inference. This paper presents a comprehensive survey of the existing literature on efficient LLM inference. We start by analyzing the primary causes of inefficient LLM inference, i.e., the large model size, the quadratic-complexity attention operation, and the auto-regressive decoding approach. Then, we introduce a comprehensive taxonomy that organizes the current literature into data-level, model-level, and system-level optimizations. Moreover, the paper includes comparative experiments on representative methods within critical sub-fields to provide quantitative insights. Last but not least, we summarize the key takeaways and discuss future research directions.

Updated: 2024-06-08 04:49:51

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.14294v2

On Computationally Efficient Multi-Class Calibration

Consider a multi-class labelling problem, where the labels can take values in $[k]$, and a predictor predicts a distribution over the labels. In this work, we study the following foundational question: Are there notions of multi-class calibration that give strong guarantees of meaningful predictions and can be achieved in time and sample complexities polynomial in $k$? Prior notions of calibration exhibit a tradeoff between computational efficiency and expressivity: they either suffer from having sample complexity exponential in $k$, or needing to solve computationally intractable problems, or give rather weak guarantees. Our main contribution is a notion of calibration that achieves all these desiderata: we formulate a robust notion of projected smooth calibration for multi-class predictions, and give new recalibration algorithms for efficiently calibrating predictors under this definition with complexity polynomial in $k$. Projected smooth calibration gives strong guarantees for all downstream decision makers who want to use the predictor for binary classification problems of the form: does the label belong to a subset $T \subseteq [k]$: e.g. is this an image of an animal? It ensures that the probabilities predicted by summing the probabilities assigned to labels in $T$ are close to some perfectly calibrated binary predictor for that task. We also show that natural strengthenings of our definition are computationally hard to achieve: they run into information theoretic barriers or computational intractability. Underlying both our upper and lower bounds is a tight connection that we prove between multi-class calibration and the well-studied problem of agnostic learning in the (standard) binary prediction setting.
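
The binary guarantee can be checked empirically for any subset $T$: sum the predicted probabilities over $T$ and measure the calibration of the induced binary predictor against the event $\{$label $\in T\}$. The binned expected-calibration-error below is a crude proxy for illustration; the paper's projected smooth calibration is a smoothed, weighted notion rather than this histogram estimate.

    import numpy as np

    def subset_calibration_error(probs, labels, T, n_bins=10):
        # probs: (n, k) predicted distributions; T: list of class indices.
        p_T = probs[:, T].sum(axis=1)                  # induced binary predictor
        hit = np.isin(labels, T).astype(float)         # indicator of {label in T}
        bins = np.clip((p_T * n_bins).astype(int), 0, n_bins - 1)
        err = 0.0
        for b in range(n_bins):
            m = bins == b
            if m.any():
                err += m.mean() * abs(p_T[m].mean() - hit[m].mean())
        return err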

Updated: 2024-06-08 04:27:46

Categories: cs.LG,cs.CC,cs.DS,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2402.07821v2

k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text

Recent watermarked generation algorithms inject detectable signatures during language generation to facilitate post-hoc detection. While token-level watermarks are vulnerable to paraphrase attacks, SemStamp (Hou et al., 2023) applies watermark on the semantic representation of sentences and demonstrates promising robustness. SemStamp employs locality-sensitive hashing (LSH) to partition the semantic space with arbitrary hyperplanes, which results in a suboptimal tradeoff between robustness and speed. We propose k-SemStamp, a simple yet effective enhancement of SemStamp, utilizing k-means clustering as an alternative of LSH to partition the embedding space with awareness of inherent semantic structure. Experimental results indicate that k-SemStamp saliently improves its robustness and sampling efficiency while preserving the generation quality, advancing a more effective tool for machine-generated text detection.
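
Detection under a k-means partition can be sketched as follows: embed each sentence, assign it to its nearest centroid, and test whether the hit rate on the "valid" regions exceeds what chance would give. The function names and the threshold are illustrative, not the paper's exact test statistic.

    import numpy as np

    def ksemstamp_detect(embeddings, centroids, valid_regions, threshold=0.75):
        # embeddings: (n, d) sentence vectors; centroids: (k, d) k-means centers.
        d = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=-1)
        regions = d.argmin(axis=1)                     # nearest-centroid assignment
        hit_rate = float(np.isin(regions, valid_regions).mean())
        return hit_rate >= threshold, hit_rate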

Updated: 2024-06-08 04:24:27

Categories: cs.CL,cs.CR,cs.CY,cs.LG

Download: http://arxiv.org/abs/2402.11399v2

Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets

We explore the ability of GPT-4 to perform ad-hoc schema-based information extraction from scientific literature. We assess specifically whether it can, with a basic prompting approach, replicate two existing materials science datasets, given the manuscripts from which they were originally manually extracted. We employ materials scientists to perform a detailed manual error analysis to assess where the model struggles to faithfully extract the desired information, and draw on their insights to suggest research directions to address this broadly important task.

Updated: 2024-06-08 04:24:16

Categories: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2406.05348v1

MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training

Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families. The accuracy of protein structure predictions is often compromised for protein sequences that lack sufficient homologous information to construct high-quality MSA. Although various methods have been proposed to generate virtual MSA under these conditions, they fall short in comprehensively capturing the intricate coevolutionary patterns within MSA or require guidance from external oracle models. Here we introduce MSAGPT, a novel approach to prompt protein structure predictions via MSA generative pretraining in the low-MSA regime. MSAGPT employs a simple yet effective 2D evolutionary positional encoding scheme to model complex evolutionary patterns. Endowed with this, its flexible 1D MSA decoding framework facilitates zero- or few-shot learning. Moreover, we demonstrate that leveraging the feedback from AlphaFold2 can further enhance the model capacity via Rejective Fine-tuning (RFT) and Reinforcement Learning from AF2 Feedback (RLAF). Extensive experiments confirm the efficacy of MSAGPT in generating faithful virtual MSA to enhance structure prediction accuracy. Its transfer-learning capabilities also highlight its great potential for facilitating other protein tasks.

Updated: 2024-06-08 04:23:57

Categories: q-bio.BM,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.05347v1

Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning

Audiovisual representation learning typically relies on the correspondence between sight and sound. However, there are often multiple audio tracks that can correspond with a visual scene. Consider, for example, different conversations on the same crowded street. The effect of such counterfactual pairs on audiovisual representation learning has not been previously explored. To investigate this, we use dubbed versions of movies and television shows to augment cross-modal contrastive learning. Our approach learns to represent alternate audio tracks, differing only in speech, similarly to the same video. Our results, from a comprehensive set of experiments investigating different training strategies, show this general approach improves performance on a range of downstream auditory and audiovisual tasks, without majorly affecting linguistic task performance overall. These findings highlight the importance of considering speech variation when learning scene-level audiovisual correspondences and suggest that dubbed audio can be a useful augmentation technique for training audiovisual models toward more robust performance on diverse downstream tasks.

Updated: 2024-06-08 04:19:06

Categories: cs.SD,cs.CV,cs.LG,cs.MM,eess.AS

Download: http://arxiv.org/abs/2304.05600v2

Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture

Making neural networks remember over the long term has been a longstanding issue. Although several external memory techniques have been introduced, most focus on retaining recent information in the short term. Regardless of its importance, information tends to be fatefully forgotten over time. We present Memoria, a memory system for artificial neural networks, drawing inspiration from humans and applying various neuroscientific and psychological theories. The experimental results prove the effectiveness of Memoria in the diverse tasks of sorting, language modeling, and classification, surpassing conventional techniques. Engram analysis reveals that Memoria exhibits the primacy, recency, and temporal contiguity effects which are characteristics of human memory.

Updated: 2024-06-08 04:17:55

Categories: cs.LG,cs.AI,cs.NE

Download: http://arxiv.org/abs/2310.03052v3

ProG: A Graph Prompt Learning Benchmark

Artificial general intelligence on graphs has shown significant advancements across various applications, yet the traditional 'Pre-train & Fine-tune' paradigm faces inefficiencies and negative transfer issues, particularly in complex and few-shot settings. Graph prompt learning emerges as a promising alternative, leveraging lightweight prompts to manipulate data and fill the task gap by reformulating downstream tasks to the pretext. However, several critical challenges still remain: how to unify diverse graph prompt models, how to evaluate the quality of graph prompts, and to improve their usability for practical comparisons and selection. In response to these challenges, we introduce the first comprehensive benchmark for graph prompt learning. Our benchmark integrates SIX pre-training methods and FIVE state-of-the-art graph prompt techniques, evaluated across FIFTEEN diverse datasets to assess performance, flexibility, and efficiency. We also present 'ProG', an easy-to-use open-source library that streamlines the execution of various graph prompt models, facilitating objective evaluations. Additionally, we propose a unified framework that categorizes existing graph prompt methods into two main approaches: prompts as graphs and prompts as tokens. This framework enhances the applicability and comparison of graph prompt techniques. The code is available at: https://github.com/sheldonresearch/ProG.

Updated: 2024-06-08 04:17:48

标题: ProG:一个图提示学习基准

摘要: 图上的通用人工智能已经在各种应用中取得了显著进展,然而传统的“预训练和微调”范式存在低效和负迁移问题,尤其是在复杂和少样本情境下。图提示学习作为一种有前途的替代方案出现,它利用轻量级提示来操纵数据,并通过将下游任务重新表述为前置任务来填补任务差距。然而,仍然存在一些关键挑战:如何统一不同的图提示模型,如何评估图提示的质量,以及如何提高它们的可用性以便进行实际比较和选择。针对这些挑战,我们引入了第一个全面的图提示学习基准。我们的基准集成了六种预训练方法和五种最先进的图提示技术,并在十五个不同的数据集上评估性能、灵活性和效率。我们还提出了“ProG”,一个易于使用的开源库,它简化了各种图提示模型的执行,促进客观评估。此外,我们提出了一个统一的框架,将现有的图提示方法分为两大类:提示作为图和提示作为标记。该框架增强了图提示技术的适用性和可比性。代码可在以下链接找到:https://github.com/sheldonresearch/ProG。

更新时间: 2024-06-08 04:17:48

领域: cs.LG

下载: http://arxiv.org/abs/2406.05346v1

Guiding Clinical Reasoning with Large Language Models via Knowledge Seeds

Clinical reasoning refers to the cognitive process that physicians employ in evaluating and managing patients. This process typically involves suggesting necessary examinations, diagnosing patients' diseases, and deciding on appropriate therapies, etc. Accurate clinical reasoning requires extensive medical knowledge and rich clinical experience, setting a high bar for physicians. This is particularly challenging in developing countries due to the overwhelming number of patients and limited physician resources, contributing significantly to global health inequity and necessitating automated clinical reasoning approaches. Recently, the emergence of large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated their potential in clinical reasoning. However, these LLMs are prone to hallucination problems, and the reasoning process of LLMs may not align with the clinical decision path of physicians. In this study, we introduce a novel framework, In-Context Padding (ICP), designed to enhance LLMs with medical knowledge. Specifically, we infer critical clinical reasoning elements (referred to as knowledge seeds) and use these as anchors to guide the generation process of LLMs. Experiments on two clinical question datasets demonstrate that ICP significantly improves the clinical reasoning ability of LLMs.
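
As a loose illustration of the anchoring idea above (the function names are hypothetical interfaces, not from the paper), knowledge seeds inferred for a question can simply be prepended to the prompt before generation:

def in_context_padding(question, infer_seeds, llm_generate, k=5):
    # infer_seeds(question) -> ranked list of clinical reasoning elements
    # ("knowledge seeds"); llm_generate(prompt) -> model completion.
    # Both callables are assumed, illustrative interfaces.
    seeds = infer_seeds(question)[:k]
    prompt = ("Relevant clinical knowledge: " + "; ".join(seeds)
              + "\n\nQuestion: " + question)
    return llm_generate(prompt)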

Updated: 2024-06-08 04:14:46

标题: 用知识种子指导大型语言模型的临床推理

摘要: 临床推理是指医生在评估和管理患者时采用的认知过程。这个过程通常涉及建议必要的检查、诊断患者的疾病以及决定适当的治疗等。准确的临床推理需要广博的医学知识和丰富的临床经验,这为医生设定了很高的门槛。这在发展中国家尤其具有挑战性,因为患者数量庞大而医生资源有限,这在很大程度上加剧了全球卫生不平等,因此需要自动化的临床推理方法。最近,ChatGPT和GPT-4等大型语言模型(LLMs)的出现展示了它们在临床推理中的潜力。然而,这些LLMs容易出现幻觉问题,而且LLMs的推理过程可能与医生的临床决策路径不一致。在本研究中,我们引入了一种新的框架,称为上下文填充(ICP),旨在用医学知识增强LLMs。具体而言,我们推断出关键的临床推理元素(称为知识种子),并将其作为锚点来引导LLMs的生成过程。对两个临床问题数据集的实验表明,ICP显著提高了LLMs的临床推理能力。

更新时间: 2024-06-08 04:14:46

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.06609v2

The Bayesian Learning Rule

We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
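
In one line, the rule the abstract describes can be written as natural-gradient descent on the expected loss regularized by the entropy of a candidate posterior $q_\lambda$ (the notation below is ours, paraphrasing the paper's setup):

% \widetilde{\nabla}_\lambda denotes the natural gradient with respect to
% the natural parameters \lambda of the candidate distribution q_\lambda.
\lambda_{t+1} \;\leftarrow\; \lambda_t \;-\; \rho_t\,
\widetilde{\nabla}_{\lambda}\Big( \mathbb{E}_{q_{\lambda}}\big[\ell(\theta)\big]
\;-\; \mathcal{H}(q_{\lambda}) \Big)

Choosing $q_\lambda$ Gaussian with fixed covariance recovers gradient-descent-like updates, while richer candidate distributions and further approximations to the natural gradient yield the variants mentioned above.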

Updated: 2024-06-08 04:07:38

标题: 贝叶斯学习规则

摘要: 我们展示了许多机器学习算法是一个被称为贝叶斯学习规则的单一算法的特定实例。这一规则源自贝叶斯原理,可以导出优化、深度学习和图模型等领域的各种算法,包括岭回归、牛顿法和卡尔曼滤波等经典算法,以及随机梯度下降、RMSprop和Dropout等现代深度学习算法。导出这些算法的关键思想是利用自然梯度估计的候选分布来近似后验分布。不同的候选分布会产生不同的算法,而对自然梯度的进一步近似会产生这些算法的变体。我们的工作不仅统一、泛化和改进了现有算法,还有助于我们设计新的算法。

更新时间: 2024-06-08 04:07:38

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2107.04562v4

M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark

As recent multi-modality large language models (MLLMs) have shown formidable proficiency on various complex tasks, there has been increasing attention on debating whether these models could eventually mirror human intelligence. However, existing benchmarks mainly focus on evaluating solely on task performance, such as the accuracy of identifying the attribute of an object. Combining well-developed cognitive science to understand the intelligence of MLLMs beyond superficial achievements remains largely unexplored. To this end, we introduce the first cognitive-driven multi-lingual and multi-modal benchmark to evaluate the general intelligence ability of MLLMs, dubbed M3GIA. Specifically, we identify five key cognitive factors based on the well-recognized Cattell-Horn-Carroll (CHC) model of intelligence and propose a novel evaluation metric. In addition, since most MLLMs are trained to perform in different languages, a natural question arises: is language a key factor influencing the cognitive ability of MLLMs? As such, we go beyond English to encompass other languages based on their popularity, including Chinese, French, Spanish, Portuguese and Korean, to construct our M3GIA. We make sure all the data relevant to the cultural backgrounds are collected from their native context to avoid English-centric bias. We collected a significant corpus of data from human participants, revealing that the most advanced MLLM reaches the lower boundary of human intelligence in English. Yet, there remains a pronounced disparity in the other five languages assessed. We also reveal an interesting winner-takes-all phenomenon that aligns with findings in cognitive studies. Our benchmark will be open-sourced, with the aspiration of facilitating the enhancement of cognitive capabilities in MLLMs.

Updated: 2024-06-08 04:07:09

标题: M3GIA:一种启发认知的多语言和多模态通用智能能力基准

摘要: 随着最近的多模态大型语言模型(MLLMs)在各种复杂任务上展现出强大的能力,人们越来越关注这些模型是否最终能够模拟人类智能。然而,现有的基准主要集中在评估任务表现本身,比如识别对象属性的准确性。结合发展成熟的认知科学来理解MLLMs超越表面成就的智能,在很大程度上仍未被探索。为此,我们引入了第一个认知驱动的多语言多模态基准M3GIA,用于评估MLLMs的通用智能能力。具体来说,我们基于公认的Cattell-Horn-Carroll(CHC)智能模型确定了五个关键认知因素,并提出了一种新颖的评估指标。此外,由于大多数MLLMs需要在不同语言中执行任务,一个自然的问题是:语言是否是影响MLLMs认知能力的关键因素?因此,我们超越英语,根据语言的流行程度纳入其他语言,包括中文、法文、西班牙文、葡萄牙文和韩文,构建了我们的M3GIA。我们确保所有与文化背景相关的数据都从其本土环境中收集,以避免以英语为中心的偏见。我们从人类参与者那里收集了大量数据,发现最先进的MLLM在英语中达到了人类智能的下限,但在评估的其他五种语言中仍存在明显差距。我们还揭示了一个与认知研究发现相一致的有趣的“胜者通吃”现象。我们的基准将开源,旨在促进MLLMs认知能力的提升。

更新时间: 2024-06-08 04:07:09

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.05343v1

Improving Adversarial Energy-Based Model via Diffusion Process

Generative models have shown strong generation ability while efficient likelihood estimation is less explored. Energy-based models (EBMs) define a flexible energy function to parameterize unnormalized densities efficiently but are notorious for being difficult to train. Adversarial EBMs introduce a generator to form a minimax training game to avoid the expensive MCMC sampling used in traditional EBMs, but a noticeable gap between adversarial EBMs and other strong generative models still exists. Inspired by diffusion-based models, we embed EBMs into each denoising step to split a long generation process into several smaller steps. Besides, we employ a symmetric Jeffrey divergence and introduce a variational posterior distribution for the generator's training to address the main challenges that exist in adversarial EBMs. Our experiments show significant improvement in generation compared to existing adversarial EBMs, while also providing a useful energy function for efficient density estimation.

Updated: 2024-06-08 04:05:49

标题: 通过扩散过程改进对抗性基于能量的模型

摘要: 生成模型展示了强大的生成能力,但有效的似然估计却较少被探索。基于能量的模型(EBMs)定义了一个灵活的能量函数,以有效地参数化未标准化的密度,但以难以训练而臭名昭著。对抗性EBMs引入了一个生成器来形成一个最小最大训练游戏,以避免传统EBMs中使用的昂贵的MCMC采样,但对抗性EBMs与其他强大的生成模型之间仍存在明显差距。受扩散模型的启发,我们将EBMs嵌入到每个去噪步骤中,将一个长时间生成的过程分割成几个较小的步骤。此外,我们采用对称的Jeffrey散度,并引入一个变分后验分布用于生成器的训练,以解决对抗性EBMs中存在的主要挑战。我们的实验显示,与现有的对抗性EBMs相比,在生成方面取得了显著的改进,同时也为有效的密度估计提供了有用的能量函数。

更新时间: 2024-06-08 04:05:49

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2403.01666v2

DeepEdit: Knowledge Editing as Decoding with Constraints

Answering multi-hop questions involving new knowledge is a challenging task in evaluating large language models' (LLMs) knowledge editing (KE) methods. This task is rather difficult because the LLMs' hallucinations on new knowledge would harm the logical coherence of LLMs' multi-hop reasoning and lead to incorrect answers. To address this issue, we design decoding constraints to "regulate" LLMs' reasoning, enhancing logical coherence when incorporating new knowledge. We incorporate the constraints into a new KE framework: DEEPEDIT (Depth-first Search-based Constrained Decoding for Knowledge Editing), which enhances LLMs to generate coherent reasoning chains with new knowledge through a depth-first search. Our search selects the most important knowledge that satisfies our constraints as the reasoning step to efficiently increase the reasoning depth. In addition to DEEPEDIT, we propose two new KE benchmarks: MQUAKE-2002 and MQUAKE-HARD, which provide more precise and challenging assessments of KE approaches. Qualitatively, DEEPEDIT enables LLMs to produce succinct and coherent reasoning chains involving new knowledge. Quantitatively, it yields significant improvements on multiple KE benchmarks.
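
A stripped-down sketch of the depth-first, constraint-checked decoding loop described above (the three callables are hypothetical stand-ins for the LLM-backed components, not the authors' code):

def deepedit_dfs(question, propose_steps, satisfies_constraints, is_answer,
                 max_depth=6):
    # propose_steps(question, chain) -> candidate next reasoning steps,
    # ranked by importance; satisfies_constraints(chain, step) -> bool;
    # is_answer(step) -> bool. All three are assumed interfaces.
    stack = [[]]  # each entry is a partial reasoning chain
    while stack:
        chain = stack.pop()
        if chain and is_answer(chain[-1]):
            return chain
        if len(chain) >= max_depth:
            continue
        # Push the best-ranked candidate last so it is explored first.
        for step in reversed(propose_steps(question, chain)):
            if satisfies_constraints(chain, step):
                stack.append(chain + [step])
    return None  # no constraint-satisfying chain within the depth budget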

Updated: 2024-06-08 03:47:03

标题: DeepEdit: 使用约束进行知识编辑的解码

摘要: 回答涉及新知识的多跳问题是评估大型语言模型(LLMs)知识编辑(KE)方法中的一项挑战性任务。这项任务相当困难,因为LLMs对新知识的幻觉会损害其多跳推理的逻辑连贯性,导致错误答案。为了解决这个问题,我们设计了解码约束来“规范”LLMs的推理,在整合新知识时增强逻辑连贯性。我们将这些约束引入到一个新的KE框架中:DEEPEDIT(基于深度优先搜索的知识编辑约束解码),通过深度优先搜索使LLMs能够生成包含新知识的连贯推理链。我们的搜索选择满足约束的最重要知识作为推理步骤,从而高效地增加推理深度。除了DEEPEDIT,我们还提出了两个新的KE基准:MQUAKE-2002和MQUAKE-HARD,为KE方法提供更精确、更具挑战性的评估。在定性方面,DEEPEDIT使LLMs能够生成包含新知识的简洁连贯推理链;在定量方面,它在多个KE基准上实现了显著改进。

更新时间: 2024-06-08 03:47:03

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2401.10471v3

To what extent can ASV systems naturally defend against spoofing attacks?

The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target. However, emerging advancements in speech generation technology pose significant threats to the reliability of ASV systems. This study investigates whether ASV effortlessly acquires robustness against spoofing attacks (i.e., zero-shot capability) by systematically exploring diverse ASV systems and spoofing attacks, ranging from traditional to cutting-edge techniques. Through extensive analyses conducted on eight distinct ASV systems and 29 spoofing attack systems, we demonstrate that the evolution of ASV inherently incorporates defense mechanisms against spoofing attacks. Nevertheless, our findings also underscore that the advancement of spoofing attacks far outpaces that of ASV systems, hence necessitating further research on spoofing-robust ASV methodologies.

Updated: 2024-06-08 03:44:39

标题: ASV系统能够自然地防御欺骗攻击到什么程度?

摘要: 目前的自动说话人验证(ASV)任务涉及对两种类型的试验进行二元决策:目标和非目标。然而,语音生成技术的新兴进展对ASV系统的可靠性构成重大威胁。本研究通过系统地探索从传统到尖端技术的各种ASV系统和欺骗攻击,调查了ASV是否能够轻松获得对抗欺骗攻击的鲁棒性(即零样本能力)。通过对八种不同ASV系统和29种欺骗攻击系统进行的广泛分析,我们表明ASV的发展本质上包含了对抗欺骗攻击的防御机制。然而,我们的研究结果也强调,欺骗攻击的进步远远超过ASV系统的进步,因此需要进一步研究对欺骗攻击鲁棒的ASV方法。

更新时间: 2024-06-08 03:44:39

领域: eess.AS,cs.AI

下载: http://arxiv.org/abs/2406.05339v1

PRSA: PRompt Stealing Attacks against Large Language Models

In recent years, "prompt as a service" has greatly enhanced the utility of large language models (LLMs) by enabling them to perform various downstream tasks efficiently without fine-tuning. This has also increased the commercial value of prompts. However, the potential risk of leakage in these commercialized prompts remains largely underexplored. In this paper, we introduce a novel attack framework, PRSA, designed for prompt stealing attacks against LLMs. The main idea of PRSA is to infer the intent behind a prompt by analyzing its input-output content, enabling the generation of a surrogate prompt that replicates the original's functionality. Specifically, PRSA mainly consists of two key phases: prompt mutation and prompt pruning. In the mutation phase, we propose a prompt attention algorithm based on output difference. The algorithm facilitates the generation of effective surrogate prompts by learning key factors that influence the accurate inference of prompt intent. During the pruning phase, we employ a two-step related word identification strategy to detect and mask words that are highly related to the input, thus improving the generalizability of the surrogate prompts. We verify the actual threat of PRSA through evaluation in both real-world settings, non-interactive and interactive prompt services. The results strongly confirm the PRSA's effectiveness and generalizability. We have reported these findings to prompt service providers and actively collaborate with them to implement defensive measures.

Updated: 2024-06-08 03:43:12

标题: PRSA:针对大型语言模型的提示窃取攻击

摘要: 近年来,“prompt as a service”大大增强了大型语言模型(LLMs)的实用性,使它们能够在不经过微调的情况下有效地执行各种下游任务。这也提高了prompt的商业价值。然而,这些商业化prompt中潜在的泄漏风险仍然未被充分探讨。在本文中,我们介绍了一种新的攻击框架PRSA,旨在对抗LLMs的prompt窃取攻击。PRSA的主要思想是通过分析输入输出内容推断prompt背后的意图,从而生成一个复制原始功能的替代prompt。具体来说,PRSA主要包括两个关键阶段:prompt变异和prompt修剪。在变异阶段,我们提出了一种基于输出差异的prompt注意力算法。该算法通过学习影响prompt意图准确推断的关键因素,促进了有效替代prompt的生成。在修剪阶段,我们采用了一个两步相关词识别策略来检测和屏蔽与输入高度相关的单词,从而提高替代prompt的泛化能力。我们通过在真实世界设置中的评估,包括非交互和交互式prompt服务,验证了PRSA的实际威胁。结果强烈证实了PRSA的有效性和泛化能力。我们已将这些发现报告给prompt服务提供商,并积极与他们合作实施防御措施。

更新时间: 2024-06-08 03:43:12

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2402.19200v2

Critical Phase Transition in a Large Language Model

The performance of large language models (LLMs) strongly depends on the \textit{temperature} parameter. Empirically, at very low temperatures, LLMs generate sentences with clear repetitive structures, while at very high temperatures, generated sentences are often incomprehensible. In this study, using GPT-2, we numerically demonstrate that the difference between the two regimes is not just a smooth change but a phase transition with singular, divergent statistical quantities. Our extensive analysis shows that critical behaviors, such as a power-law decay of correlation in a text, emerge in the LLM at the transition temperature as well as in a natural language dataset. We also discuss that several statistical quantities characterizing the criticality should be useful to evaluate the performance of LLMs.
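
As a rough, illustrative diagnostic in the spirit of the analysis above (our sketch, not the paper's code; treating token IDs as a numeric signal is a crude proxy for the correlation statistics the paper studies), one can look at how correlations in generated text decay with distance at a given sampling temperature:

import numpy as np

def correlation_decay(token_ids, max_lag=200):
    # Normalized autocorrelation of a token-ID sequence at lags 1..max_lag-1.
    x = np.asarray(token_ids, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-8)
    return np.array([np.mean(x[:-k] * x[k:]) for k in range(1, max_lag)])

# Near the transition temperature one would look for an approximately linear
# log(lag) vs. log(correlation) relation, i.e. a power-law decay.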

Updated: 2024-06-08 03:37:05

标题: 一个大型语言模型中的临界相变

摘要: 大型语言模型(LLMs)的性能在很大程度上取决于“温度”参数。根据经验,在非常低的温度下,LLMs生成具有明显重复结构的句子,而在非常高的温度下,生成的句子通常是难以理解的。在这项研究中,我们使用GPT-2,通过数值方法证明了这两种情况之间的差异不仅仅是一个平滑的变化,而是具有奇异、发散的统计量的相变。我们的广泛分析显示,在过渡温度下,LLM中出现了临界行为,比如文本中的相关性的幂律衰减,这种现象也出现在自然语言数据集中。我们还讨论了几种表征临界性的统计量可能对评估LLM性能有用。

更新时间: 2024-06-08 03:37:05

领域: cond-mat.dis-nn,cs.LG

下载: http://arxiv.org/abs/2406.05335v1

Sparse is Enough in Fine-tuning Pre-trained Large Language Models

With the prevalence of pre-training-fine-tuning paradigm, how to efficiently adapt the pre-trained model to the downstream tasks has been an intriguing issue. Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed for low-cost adaptation. Although PEFT has demonstrated effectiveness and been widely applied, the underlying principles are still unclear. In this paper, we adopt the PAC-Bayesian generalization error bound, viewing pre-training as a shift of prior distribution which leads to a tighter bound for generalization error. We validate this shift from the perspectives of oscillations in the loss landscape and the quasi-sparsity in gradient distribution. Based on this, we propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT), and validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning. The code is accessible at https://github.com/song-wx/SIFT/.
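
A minimal sketch of the kind of gradient-sparse update SIFT is built around (details are assumed here; see the repository above for the actual implementation): only the top fraction of parameters by gradient magnitude receive a sparse increment, the rest stay at their pretrained values.

import torch

def sparse_increment_step(param, lr=1e-4, density=0.01):
    # Update only the top `density` fraction of entries by |gradient|.
    if param.grad is None:
        return
    g = param.grad
    k = max(1, int(density * g.numel()))
    threshold = g.abs().flatten().kthvalue(g.numel() - k + 1).values
    mask = (g.abs() >= threshold).to(g.dtype)
    param.data.add_(g * mask, alpha=-lr)  # sparse increment, rest frozen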

Updated: 2024-06-08 03:29:17

标题: 微调预训练大型语言模型时,稀疏就足够了

摘要: 随着预训练-微调范式的普及,如何有效地将预训练模型适应到下游任务已成为一个引人注目的问题。为了实现低成本适应,已经提出了参数高效微调(PEFT)方法。尽管PEFT已经证明了其有效性并被广泛应用,但其基本原理仍不清楚。在本文中,我们采用了PAC-Bayesian泛化误差界限,将预训练视为先验分布的偏移,从而导致更严格的泛化误差界限。我们从损失景观中的振荡和梯度分布中的准稀疏性的角度验证了这种偏移。基于此,我们提出了一种基于梯度的稀疏微调算法,命名为Sparse Increment Fine-Tuning(SIFT),并验证了其在GLUE基准和指令微调等一系列任务中的有效性。代码可在https://github.com/song-wx/SIFT/上获取。

更新时间: 2024-06-08 03:29:17

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2312.11875v3

Transformer Conformal Prediction for Time Series

We present a conformal prediction method for time series using the Transformer architecture to capture long-memory and long-range dependencies. Specifically, we use the Transformer decoder as a conditional quantile estimator to predict the quantiles of prediction residuals, which are used to estimate the prediction interval. We hypothesize that the Transformer decoder benefits the estimation of the prediction interval by learning temporal dependencies across past prediction residuals. Our comprehensive experiments using simulated and real data empirically demonstrate the superiority of the proposed method compared to the existing state-of-the-art conformal prediction methods.
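
For context, the split-conformal step the method builds on looks roughly like the following (our sketch; the paper's contribution, the Transformer-decoder quantile estimator of the residuals, is abstracted away here):

import numpy as np

def conformal_interval(cal_residuals, y_hat, alpha=0.1):
    # cal_residuals: |y - y_hat| on a held-out calibration set.
    n = len(cal_residuals)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(cal_residuals, q_level, method="higher")
    return y_hat - q, y_hat + q  # interval with ~(1 - alpha) coverage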

Updated: 2024-06-08 03:17:48

标题: 用于时间序列的Transformer共形预测

摘要: 我们提出了一种使用Transformer架构的时间序列共形预测方法,以捕捉长记忆和长程依赖关系。具体来说,我们使用Transformer解码器作为条件分位数估计器,预测预测残差的分位数,并用这些分位数来估计预测区间。我们假设Transformer解码器通过学习过去预测残差之间的时间依赖性,有助于预测区间的估计。我们使用模拟和真实数据进行的全面实验,实证地证明了所提方法相对于现有最先进的共形预测方法的优越性。

更新时间: 2024-06-08 03:17:48

领域: cs.LG

下载: http://arxiv.org/abs/2406.05332v1

Hidden Question Representations Tell Non-Factuality Within and Across Large Language Models

Despite the remarkable advance of large language models (LLMs), the prevalence of non-factual responses remains a common issue. This work studies non-factuality prediction (NFP), which predicts whether an LLM will generate non-factual responses to a question before the generation process. Previous efforts on NFP usually rely on extensive computation. In this work, we conduct extensive analysis to explore the capabilities of using a lightweight probe to elicit "whether an LLM knows" from the hidden representations of questions. Additionally, we discover that the non-factuality probe employs similar patterns for NFP across multiple LLMs. Motivated by this intriguing finding, we conduct effective transfer learning for cross-LLM NFP and propose a question-aligned strategy to ensure the efficacy of mini-batch based training.
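
A lightweight probe of the kind described is often just a linear classifier over frozen hidden states; a minimal sketch (assuming question representations have already been extracted from some layer of the LLM; the interface is ours):

from sklearn.linear_model import LogisticRegression

def train_nfp_probe(hidden_states, nonfactual_labels):
    # hidden_states: (N, D) question representations from one LLM layer;
    # nonfactual_labels: (N,) with 1 where the generated answer was non-factual.
    probe = LogisticRegression(max_iter=1000)
    probe.fit(hidden_states, nonfactual_labels)
    return probe  # probe.predict_proba(H)[:, 1] estimates P(non-factual)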

Updated: 2024-06-08 02:59:52

标题: 问题的隐藏表示揭示大型语言模型内部及跨模型的非事实性

摘要: 尽管大型语言模型(LLMs)取得了显著进展,非事实性回答的普遍存在仍然是一个常见问题。本研究探讨了非事实性预测(NFP),即在生成过程开始之前预测LLM是否会对某个问题生成非事实性回答。先前的NFP工作通常依赖于大量计算。在这项工作中,我们进行了广泛的分析,探讨使用轻量级探针从问题的隐藏表示中引出“LLM是否知道”的能力。此外,我们发现非事实性探针在多个LLMs之间对NFP采用了相似的模式。受这一有趣发现的启发,我们进行了有效的跨LLM NFP迁移学习,并提出了一种问题对齐策略,以确保基于小批量训练的有效性。

更新时间: 2024-06-08 02:59:52

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.05328v1

Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States

Stochastic differential equations (SDEs) have been shown recently to characterize well the dynamics of training machine learning models with SGD. When the generalization error of the SDE approximation closely aligns with that of SGD in expectation, it provides two opportunities for understanding better the generalization behaviour of SGD through its SDE approximation. Firstly, viewing SGD as full-batch gradient descent with Gaussian gradient noise allows us to obtain trajectory-based generalization bound using the information-theoretic bound from Xu and Raginsky [2017]. Secondly, assuming mild conditions, we estimate the steady-state weight distribution of SDE and use information-theoretic bounds from Xu and Raginsky [2017] and Negrea et al. [2019] to establish terminal-state-based generalization bounds. Our proposed bounds have some advantages, notably the trajectory-based bound outperforms results in Wang and Mao [2022], and the terminal-state-based bound exhibits a fast decay rate comparable to stability-based bounds.

Updated: 2024-06-08 02:48:47

标题: 信息论视角下SDE的两个方面:通过训练轨迹和通过终端状态分析SGD的泛化

摘要: 随机微分方程(SDEs)最近被证明能很好地描述使用随机梯度下降(SGD)训练机器学习模型的动态特性。当SDE近似的泛化误差与SGD的期望值密切一致时,提供了两种机会更好地理解SGD的泛化行为。首先,将SGD视为带有高斯梯度噪声的全批量梯度下降,使我们能够利用来自Xu和Raginsky[2017]的信息论界限获得基于轨迹的泛化界限。其次,在假设温和的条件下,我们估计SDE的稳态权重分布,并利用Xu和Raginsky[2017]以及Negrea等人[2019]的信息论界限建立基于终端状态的泛化界限。我们提出的界限具有一些优势,特别是基于轨迹的界限优于Wang和Mao[2022]的结果,而基于终端状态的界限显示出与基于稳定性界限相当的快速衰减率。

更新时间: 2024-06-08 02:48:47

领域: cs.LG,cs.IT,math.IT

下载: http://arxiv.org/abs/2211.10691v2

Teaching-Assistant-in-the-Loop: Improving Knowledge Distillation from Imperfect Teacher Models in Low-Budget Scenarios

There is increasing interest in distilling task-specific knowledge from large language models (LLM) to smaller student models. Nonetheless, LLM distillation presents a dual challenge: 1) there is a high cost associated with querying the teacher LLM, such as GPT-4, for gathering an ample number of demonstrations; 2) the teacher LLM might provide imperfect outputs with a negative impact on the student's learning process. To enhance sample efficiency within resource-constrained, imperfect teacher scenarios, we propose a three-component framework leveraging three signal types. The first signal is the student's self-consistency (consistency of student multiple outputs), which is a proxy of the student's confidence. Specifically, we introduce a ``teaching assistant'' (TA) model to assess the uncertainty of both the student's and the teacher's outputs via confidence scoring, which serves as another two signals for student training. Furthermore, we propose a two-stage training schema to first warm up the student with a small proportion of data to better utilize student's signal. Experiments have shown the superiority of our proposed framework for four complex reasoning tasks. On average, our proposed two-stage framework brings a relative improvement of up to 20.79% compared to fine-tuning without any signals across datasets.
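
The first signal, student self-consistency, can be computed with nothing more than repeated sampling; a minimal sketch (the sampling callable is a hypothetical interface, not from the paper):

from collections import Counter

def self_consistency(sample_answer, prompt, n=8):
    # sample_answer(prompt) -> one sampled student answer (assumed interface).
    answers = [sample_answer(prompt) for _ in range(n)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n  # majority answer, agreement ratio in [0, 1]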

Updated: 2024-06-08 02:17:43

标题: 循环中的助教:在低预算情况下改进从不完美教师模型中的知识蒸馏

摘要: 越来越多的人对从大型语言模型(LLM)中提取任务特定知识到较小的学生模型感兴趣。然而,LLM蒸馏面临双重挑战:1)向教师LLM(如GPT-4)查询成本高昂,需要收集足够数量的示范;2)教师LLM可能提供不完美的输出,对学生的学习过程产生负面影响。为了增强在资源受限、教师不完美情况下的样本效率,我们提出了一个利用三种信号类型的三组件框架。第一个信号是学生的自一致性(多个输出的一致性),这是学生信心的代理。具体来说,我们引入了一个“教学助理”(TA)模型,通过置信度评分评估学生和教师的输出的不确定性,这也作为学生训练的另外两个信号。此外,我们提出了一个两阶段训练模式,首先用一小部分数据热身学生,以更好地利用学生的信号。实验结果显示,相较于没有任何信号的微调,我们提出的两阶段框架在四个复杂推理任务中表现卓越。平均而言,我们提出的两阶段框架相对改善率高达20.79%。

更新时间: 2024-06-08 02:17:43

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.05322v1

Deep Neural Networks are Adaptive to Function Regularity and Data Distribution in Approximation and Estimation

Deep learning has exhibited remarkable results across diverse areas. To understand its success, substantial research has been directed towards its theoretical foundations. Nevertheless, the majority of these studies examine how well deep neural networks can model functions with uniform regularity. In this paper, we explore a different angle: how deep neural networks can adapt to different regularity in functions across different locations and scales and nonuniform data distributions. More precisely, we focus on a broad class of functions defined by nonlinear tree-based approximation. This class encompasses a range of function types, such as functions with uniform regularity and discontinuous functions. We develop nonparametric approximation and estimation theories for this function class using deep ReLU networks. Our results show that deep neural networks are adaptive to different regularity of functions and nonuniform data distributions at different locations and scales. We apply our results to several function classes, and derive the corresponding approximation and generalization errors. The validity of our results is demonstrated through numerical experiments.

Updated: 2024-06-08 02:01:50

标题: 深度神经网络对函数规律性和数据分布在逼近和估计中具有适应性

摘要: 深度学习在各个领域展现出了显著的成果。为了理解其成功,大量的研究已经致力于其理论基础。然而,大多数研究都是关于深度神经网络如何能够很好地模拟具有均匀规则性的函数。本文探讨了一个不同的角度:深度神经网络如何能够适应不同位置和尺度以及非均匀数据分布中函数的不同规则性。更具体地说,我们关注一类由非线性基于树的逼近定义的函数。这个类别包含一系列函数类型,如具有均匀规则性和不连续函数。我们使用深层ReLU网络开发了这个函数类别的非参数逼近和估计理论。我们的结果表明,深度神经网络能够适应不同位置和尺度处函数的不同规则性和非均匀数据分布。我们将我们的结果应用于几个函数类别,并推导相应的逼近和泛化误差。我们通过数值实验验证了我们结果的有效性。

更新时间: 2024-06-08 02:01:50

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2406.05320v1

From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards

Recent progress in large language models (LLMs) has led to their widespread adoption in various domains. However, these advancements have also introduced additional safety risks and raised concerns regarding their detrimental impact on already marginalized populations. Despite growing mitigation efforts to develop safety safeguards, such as supervised safety-oriented fine-tuning and leveraging safe reinforcement learning from human feedback, multiple concerns regarding the safety and ingrained biases in these models remain. Furthermore, previous work has demonstrated that models optimized for safety often display exaggerated safety behaviors, such as a tendency to refrain from responding to certain requests as a precautionary measure. As such, a clear trade-off between the helpfulness and safety of these models has been documented in the literature. In this paper, we further investigate the effectiveness of safety measures by evaluating models on already mitigated biases. Using the case of Llama 2 as an example, we illustrate how LLMs' safety responses can still encode harmful assumptions. To do so, we create a set of non-toxic prompts, which we then use to evaluate Llama models. Through our new taxonomy of LLMs responses to users, we observe that the safety/helpfulness trade-offs are more pronounced for certain demographic groups which can lead to quality-of-service harms for marginalized populations.

Updated: 2024-06-08 01:58:20

标题: 从代表性伤害到服务质量伤害:关于Llama 2安全保障的案例研究

摘要: 最近在大型语言模型(LLMs)方面取得的进展导致它们在各个领域的广泛应用。然而,这些进展也引入了额外的安全风险,并引发了对它们对已经边缘化人群的不利影响的担忧。尽管在开发安全保障措施方面的努力不断增加,如监督安全取向的微调和利用人类反馈进行安全强化学习,但对这些模型中安全性和根深蒂固的偏见的多重担忧仍然存在。此外,先前的研究表明,为安全性而优化的模型往往显示出夸张的安全行为,例如出于预防目的而避免回应某些请求的倾向。因此,文献中已经记录了这些模型的帮助性和安全性之间的明显权衡。在本文中,我们进一步通过评估已经减轻偏见的模型来调查安全措施的有效性。以Llama 2为例,我们说明LLMs的安全响应仍然可以编码有害假设。为此,我们创建一组非毒性提示,然后用它们来评估Llama模型。通过我们对LLMs响应用户的新分类法,我们观察到某些人群的安全/帮助性权衡更为显著,这可能导致边缘群体的服务质量损害。

更新时间: 2024-06-08 01:58:20

领域: cs.LG,cs.CL,cs.CY

下载: http://arxiv.org/abs/2403.13213v3

Integrating Text and Image Pre-training for Multi-modal Algorithmic Reasoning

In this paper, we present our solution for the SMART-101 Challenge of the CVPR Multi-modal Algorithmic Reasoning Task 2024. Unlike traditional visual question answering tasks, this challenge evaluates the abstraction, deduction and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children in the 6-8 age group. Our model is based on two pre-trained models, dedicated to extracting features from text and image respectively. To integrate the features from different modalities, we employed a fusion layer with an attention mechanism. We explored different text and image pre-trained models, and fine-tuned the integrated classifier on the SMART-101 dataset. Experimental results show that under the puzzle-split style of data splitting, our proposed integrated classifier achieves superior performance, verifying the effectiveness of multi-modal pre-trained representations.

Updated: 2024-06-08 01:45:06

标题: 整合文本和图像预训练以实现多模态算法推理

摘要: 在本文中,我们提出了针对CVPR多模态算法推理任务2024年SMART-101挑战的解决方案。与传统的视觉问答任务不同,该挑战评估神经网络在解决专为6-8岁儿童设计的视觉-语言谜题时的抽象、演绎和泛化能力。我们的模型基于两个预训练模型,分别用于从文本和图像中提取特征。为了整合不同模态的特征,我们采用了带有注意力机制的融合层。我们探索了不同的文本和图像预训练模型,并在SMART-101数据集上对整合分类器进行微调。实验结果表明,在拼图划分(puzzle split)的数据划分方式下,我们提出的整合分类器取得了优越的性能,验证了多模态预训练表示的有效性。

更新时间: 2024-06-08 01:45:06

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.05318v1

LoCoCo: Dropping In Convolutions for Long Context Compression

This paper tackles the memory hurdle of processing long context sequences in Large Language Models (LLMs), by presenting a novel approach, Dropping In Convolutions for Long Context Compression (LoCoCo). LoCoCo employs only a fixed-size Key-Value (KV) cache, and can enhance efficiency in both inference and fine-tuning stages. Diverging from prior methods that selectively drop KV pairs based on heuristics, LoCoCo leverages a data-driven adaptive fusion technique, blending previous KV pairs with incoming tokens to minimize the loss of contextual information and ensure accurate attention modeling. This token integration is achieved through injecting one-dimensional convolutional kernels that dynamically calculate mixing weights for each KV cache slot. Designed for broad compatibility with existing LLM frameworks, LoCoCo allows for straightforward "drop-in" integration without needing architectural modifications, while incurring minimal tuning overhead. Experiments demonstrate that LoCoCo maintains consistently outstanding performance across various context lengths and can achieve a high context compression rate during both inference and fine-tuning phases. During inference, we successfully compressed up to 3482 tokens into a 128-size KV cache, while retaining comparable performance to the full sequence - an accuracy improvement of up to 0.2791 compared to baselines at the same cache size. During post-training tuning, we also effectively extended the context length from 4K to 32K using a KV cache of fixed size 512, achieving performance similar to fine-tuning with entire sequences.
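
A rough sketch of the convolutional mixing idea (ours, and heavily simplified relative to the actual method, which blends incoming tokens into the cache step by step; see the paper for details): a 1-D convolution over the token axis produces per-slot mixing weights that compress a long KV sequence into a fixed number of slots.

import torch
import torch.nn as nn

class ConvKVCompressor(nn.Module):
    def __init__(self, dim, cache_size=128, kernel=7):
        super().__init__()
        self.conv = nn.Conv1d(dim, cache_size, kernel, padding=kernel // 2)

    def forward(self, kv):                 # kv: (B, T, D) with T > cache_size
        w = self.conv(kv.transpose(1, 2))  # (B, cache_size, T) mixing logits
        w = w.softmax(dim=-1)              # each cache slot mixes all tokens
        return w @ kv                      # (B, cache_size, D) compressed KV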

Updated: 2024-06-08 01:35:11

标题: LoCoCo:以插入式卷积实现长上下文压缩

摘要: 这篇论文解决了在大型语言模型(LLMs)中处理长上下文序列的内存障碍,提出了一种新颖的方法,即用于长上下文压缩的Dropping In Convolutions(LoCoCo)。LoCoCo仅使用固定大小的键-值(KV)缓存,可以提高推断和微调阶段的效率。LoCoCo与先前基于启发式选择性丢弃KV对的方法不同,它利用数据驱动的自适应融合技术,将先前的KV对与传入的标记混合,以最小化上下文信息的丢失并确保准确的注意建模。这种标记集成是通过注入一维卷积核实现的,这些卷积核动态计算每个KV缓存槽的混合权重。LoCoCo专为与现有LLM框架广泛兼容而设计,可以直接进行“插入”集成,无需进行架构修改,同时产生最小的调整开销。实验证明,LoCoCo在各种上下文长度下始终保持出色的性能,并且在推断和微调阶段均能实现高上下文压缩率。在推断阶段,我们成功地将最多3482个标记压缩为128大小的KV缓存,同时保持与完整序列相当的性能 - 与相同缓存大小的基线相比,准确性提高了最多0.2791。在后训练微调过程中,我们还成功地使用固定大小为512的KV缓存将上下文长度从4K扩展到32K,实现了与整个序列微调相似的性能。

更新时间: 2024-06-08 01:35:11

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2406.05317v1

C-Mamba: Channel Correlation Enhanced State Space Models for Multivariate Time Series Forecasting

In recent years, significant progress has been made in multivariate time series forecasting using Linear-based, Transformer-based, and Convolution-based models. However, these approaches face notable limitations: linear forecasters struggle with representation capacities, attention mechanisms suffer from quadratic complexity, and convolutional models have a restricted receptive field. These constraints impede their effectiveness in modeling complex time series, particularly those with numerous variables. Additionally, many models adopt the Channel-Independent (CI) strategy, treating multivariate time series as uncorrelated univariate series while ignoring their correlations. For models considering inter-channel relationships, whether through the self-attention mechanism, linear combination, or convolution, they all incur high computational costs and focus solely on weighted summation relationships, neglecting potential proportional relationships between channels. In this work, we address these issues by leveraging the newly introduced state space model and propose C-Mamba, a novel approach that captures cross-channel dependencies while maintaining linear complexity without losing the global receptive field. Our model consists of two key components: (i) channel mixup, where two channels are mixed to enhance the training sets; (ii) channel attention enhanced patch-wise Mamba encoder that leverages the ability of the state space models to capture cross-time dependencies and models correlations between channels by mining their weight relationships. Our model achieves state-of-the-art performance on seven real-world time series datasets. Moreover, the proposed mixup and attention strategy exhibits strong generalizability across other frameworks.
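
Of the two components, channel mixup is the simplest to picture; a minimal sketch (our interpretation of the description above, with a Beta-sampled mixing coefficient as in standard mixup):

import torch

def channel_mixup(x, alpha=0.2):
    # x: (B, T, C) multivariate time-series batch; each channel is convexly
    # combined with a randomly chosen other channel to enlarge the training set.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.shape[-1])
    return lam * x + (1 - lam) * x[:, :, perm]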

Updated: 2024-06-08 01:32:44

标题: C-Mamba: 用于多变量时间序列预测的通道相关增强状态空间模型

摘要: 近年来,在使用基于线性、基于Transformer和基于卷积的模型进行多变量时间序列预测方面取得了显著进展。然而,这些方法面临显著的限制:线性预测器在表示能力上存在困难,注意机制受到二次复杂性的影响,卷积模型具有受限的感知域。这些约束阻碍了它们在建模复杂时间序列方面的有效性,特别是那些具有大量变量的时间序列。此外,许多模型采用通道独立(CI)策略,将多变量时间序列视为不相关的单变量序列,而忽略它们之间的相关性。对于考虑通道间关系的模型,无论是通过自注意力机制、线性组合还是卷积,它们都会产生高计算成本,并且仅关注加权求和关系,忽视了通道之间的潜在比例关系。在这项工作中,我们通过利用新引入的状态空间模型,提出了一种名为C-Mamba的新方法,该方法在保持线性复杂性的同时捕获了跨通道的依赖关系,而不会失去全局感知域。我们的模型由两个关键组件组成:(i)通道混合,其中两个通道混合以增强训练集;(ii)通道注意力增强的基于补丁的Mamba编码器,利用状态空间模型捕获跨时间依赖关系,并通过挖掘它们的权重关系来建模通道之间的相关性。我们的模型在七个真实世界时间序列数据集上实现了最先进的性能。此外,所提出的混合和注意策略在其他框架中展现出很强的泛化能力。

更新时间: 2024-06-08 01:32:44

领域: cs.LG

下载: http://arxiv.org/abs/2406.05316v1

Concept Formation and Alignment in Language Models: Bridging Statistical Patterns in Latent Space to Concept Taxonomy

This paper explores the concept formation and alignment within the realm of language models (LMs). We propose a mechanism for identifying concepts and their hierarchical organization within the semantic representations learned by various LMs, encompassing a spectrum from early models like Glove to the transformer-based language models like ALBERT and T5. Our approach leverages the inherent structure present in the semantic embeddings generated by these models to extract a taxonomy of concepts and their hierarchical relationships. This investigation sheds light on how LMs develop conceptual understanding and opens doors to further research to improve their ability to reason and leverage real-world knowledge. We further conducted experiments and observed the possibility of isolating these extracted conceptual representations from the reasoning modules of the transformer-based LMs. The observed concept formation along with the isolation of conceptual representations from the reasoning modules can enable targeted token engineering to open the door for potential applications in knowledge transfer, explainable AI, and the development of more modular and conceptually grounded language models.

Updated: 2024-06-08 01:27:19

标题: 概念形成与语言模型中的对齐:将潜在空间中的统计模式与概念分类桥接

摘要: 本文探讨了语言模型(LMs)领域内的概念形成与对齐。我们提出了一种机制,用于识别各种LMs所学语义表示中的概念及其层次结构,所涉及的模型范围从早期的Glove到基于Transformer的语言模型(如ALBERT和T5)。我们的方法利用这些模型生成的语义嵌入中固有的结构,提取概念的分类体系及其层次关系。这项研究揭示了LMs如何发展概念理解,并为改进它们推理和利用现实世界知识的能力开辟了新的研究方向。我们进一步进行了实验,观察到可以将这些提取出的概念表示从基于Transformer的LMs的推理模块中隔离出来。观察到的概念形成以及概念表示与推理模块的隔离,可以实现有针对性的令牌工程,为知识迁移、可解释AI以及更模块化、更具概念基础的语言模型的潜在应用打开大门。

更新时间: 2024-06-08 01:27:19

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.05315v1

Relational Proxy Loss for Audio-Text based Keyword Spotting

In recent years, there has been an increasing focus on user convenience, leading to increased interest in text-based keyword enrollment systems for keyword spotting (KWS). Since the system utilizes text input during the enrollment phase and audio input during actual usage, we call this task audio-text based KWS. To enable this task, both acoustic and text encoders are typically trained using deep metric learning loss functions, such as triplet- and proxy-based losses. This study aims to improve existing methods by leveraging the structural relations within acoustic embeddings and within text embeddings. Unlike previous studies that only compare acoustic and text embeddings on a point-to-point basis, our approach focuses on the relational structures within the embedding space by introducing the concept of Relational Proxy Loss (RPL). By incorporating RPL, we demonstrated improved performance on the Wall Street Journal (WSJ) corpus.

Updated: 2024-06-08 01:21:17

标题: 用于基于音频-文本的关键词检测的关系代理损失

摘要: 近年来,用户便利性日益受到关注,人们对用于关键词检测(KWS)的基于文本的关键词注册系统的兴趣随之增加。由于该系统在注册阶段使用文本输入,在实际使用时使用音频输入,我们将这一任务称为基于音频-文本的KWS。为了实现这一任务,通常会使用深度度量学习损失函数(例如三元组损失和基于代理的损失)来训练声学编码器和文本编码器。本研究旨在通过利用声学嵌入内部和文本嵌入内部的结构关系来改进现有方法。与以往仅在点对点基础上比较声学和文本嵌入的研究不同,我们的方法通过引入关系代理损失(RPL)的概念,关注嵌入空间内部的关系结构。通过整合RPL,我们在华尔街日报(WSJ)语料库上展示了性能的提升。

更新时间: 2024-06-08 01:21:17

领域: eess.AS,cs.AI,eess.SP

下载: http://arxiv.org/abs/2406.05314v1

A Note on the Prediction-Powered Bootstrap

We introduce PPBoot: a bootstrap-based method for prediction-powered inference. PPBoot is applicable to arbitrary estimation problems and is very simple to implement, essentially only requiring one application of the bootstrap. Through a series of examples, we demonstrate that PPBoot often performs nearly identically to (and sometimes better than) the earlier PPI(++) method based on asymptotic normality, when the latter is applicable, without requiring any asymptotic characterizations. Given its versatility, PPBoot could simplify and expand the scope of application of prediction-powered inference to problems where central limit theorems are hard to prove.
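
A minimal sketch of the idea for the simplest estimand, a mean (our reading of the method's spirit; array names are illustrative): bootstrap the prediction-based estimate together with a labeled-data rectifier and read the interval off the bootstrap distribution.

import numpy as np

def ppboot_mean(y_lab, yhat_lab, yhat_unlab, n_boot=2000, alpha=0.1, seed=0):
    # y_lab, yhat_lab: labels and predictions on the small labeled set;
    # yhat_unlab: predictions on the large unlabeled set.
    rng = np.random.default_rng(seed)
    est = np.empty(n_boot)
    for b in range(n_boot):
        i = rng.integers(0, len(y_lab), len(y_lab))            # resample labeled
        j = rng.integers(0, len(yhat_unlab), len(yhat_unlab))  # resample unlabeled
        # Prediction-based mean plus a rectifier for the model's bias.
        est[b] = yhat_unlab[j].mean() + (y_lab[i] - yhat_lab[i]).mean()
    lo, hi = np.quantile(est, [alpha / 2, 1 - alpha / 2])
    return est.mean(), (lo, hi)  # point estimate and ~(1 - alpha) interval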

Updated: 2024-06-08 01:17:13

标题: 关于预测驱动自助法的注记

摘要: 我们介绍了一种基于自助法的预测驱动推断方法PPBoot。PPBoot适用于任意估计问题,并且实现起来非常简单,基本上只需应用一次自助法。通过一系列示例,我们展示了PPBoot的表现通常与基于渐近正态性的早期PPI(++)方法(在后者适用时)几乎相同,有时甚至更好,而无需任何渐近刻画。鉴于其多功能性,PPBoot可以简化并扩展预测驱动推断的应用范围,使其适用于难以证明中心极限定理的问题。

更新时间: 2024-06-08 01:17:13

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/2405.18379v2

Towards a RAG-based Summarization Agent for the Electron-Ion Collider

The complexity and sheer volume of information encompassing documents, papers, data, and other resources from large-scale experiments demand significant time and effort to navigate, making the task of accessing and utilizing these varied forms of information daunting, particularly for new collaborators and early-career scientists. To tackle this issue, a Retrieval Augmented Generation (RAG)-based Summarization AI for EIC (RAGS4EIC) is under development. This AI-Agent not only condenses information but also effectively references relevant responses, offering substantial advantages for collaborators. Our project involves a two-step approach: first, querying a comprehensive vector database containing all pertinent experiment information; second, utilizing a Large Language Model (LLM) to generate concise summaries enriched with citations based on user queries and retrieved data. We describe the evaluation methods that use RAG assessments (RAGAs) scoring mechanisms to assess the effectiveness of responses. Furthermore, we describe the concept of prompt template-based instruction-tuning which provides flexibility and accuracy in summarization. Importantly, the implementation relies on LangChain, which serves as the foundation of our entire workflow. This integration ensures efficiency and scalability, facilitating smooth deployment and accessibility for various user groups within the Electron Ion Collider (EIC) community. This innovative AI-driven framework not only simplifies the understanding of vast datasets but also encourages collaborative participation, thereby empowering researchers. As a demonstration, a web application has been developed to explain each stage of the RAG Agent development in detail.

Updated: 2024-06-08 01:15:05

标题: 朝向基于RAG的电子离子对撞机摘要代理

摘要: 大规模实验中包含的文件、论文、数据和其他资源的复杂性和庞大数量需要大量的时间和精力来导航,使得访问和利用这些不同形式的信息的任务对于新的合作者和早期科学家来说是令人望而生畏的。为了解决这个问题,正在开发一种基于检索增强生成(RAG)的EIC摘要人工智能(RAGS4EIC)。这种人工智能代理不仅可以将信息压缩,还可以有效地引用相关的回应,为合作者提供重大优势。我们的项目采取了两步方法:首先,查询包含所有相关实验信息的综合向量数据库;其次,利用大型语言模型(LLM)根据用户查询和检索到的数据生成简明摘要,并丰富引文。我们描述了使用RAG评估(RAGAs)评分机制评估响应有效性的评估方法。此外,我们描述了基于提示模板的指令调整概念,提供了摘要中的灵活性和准确性。重要的是,实施依赖于LangChain,它作为我们整个工作流程的基础。这种集成确保了效率和可扩展性,促进了在电子离子对撞机(EIC)社区内各种用户群体的平稳部署和可访问性。这种创新的人工智能驱动框架不仅简化了对庞大数据集的理解,还鼓励协作参与,从而赋予研究人员更强大的能力。作为演示,已经开发了一个Web应用程序,详细说明了RAG代理开发的每个阶段。

更新时间: 2024-06-08 01:15:05

领域: cs.CL,cs.AI,hep-ex,physics.ins-det

下载: http://arxiv.org/abs/2403.15729v3

RL-I2IT: Image-to-Image Translation with Deep Reinforcement Learning

Most existing Image-to-Image Translation (I2IT) methods generate images in a single run of a deep learning (DL) model. However, designing such a single-step model is always challenging, requiring a huge number of parameters and easily falling into bad global minimums and overfitting. In this work, we reformulate I2IT as a step-wise decision-making problem via deep reinforcement learning (DRL) and propose a novel framework that performs RL-based I2IT (RL-I2IT). The key feature of the RL-I2IT framework is to decompose a monolithic learning process into small steps with a lightweight model that progressively transforms a source image into a target image. Considering that it is challenging to handle high-dimensional continuous state and action spaces in the conventional RL framework, we introduce a meta policy with a new concept, Plan, into the standard Actor-Critic model; the Plan is of a lower dimension than the original image and facilitates the actor in generating a tractable high-dimensional action. In the RL-I2IT framework, we also employ a task-specific auxiliary learning strategy to stabilize the training process and improve the performance of the corresponding task. Experiments on several I2IT tasks demonstrate the effectiveness and robustness of the proposed method when facing high-dimensional continuous action space problems. Our implementation of the RL-I2IT framework is available at https://github.com/Algolzw/SPAC-Deformable-Registration.

Updated: 2024-06-08 01:09:02

标题: RL-I2IT:使用深度强化学习进行图像到图像的翻译

摘要: 大多数现有的图像到图像翻译(I2IT)方法在深度学习(DL)模型的单次运行中生成图像。然而,设计这样一个单步模型总是具有挑战性的:它需要大量参数,并且容易陷入糟糕的全局极小值和过拟合。在这项工作中,我们通过深度强化学习(DRL)将I2IT重新表述为一个逐步决策问题,并提出了一个执行基于RL的I2IT(RL-I2IT)的新框架。RL-I2IT框架的关键特征是将整体学习过程分解为小步骤,利用轻量级模型将源图像逐步转换为目标图像。考虑到在传统RL框架中处理高维连续状态和动作空间具有挑战性,我们在标准的Actor-Critic模型中引入了带有新概念Plan的元策略;Plan的维度低于原始图像,可以帮助Actor生成可处理的高维动作。在RL-I2IT框架中,我们还采用了任务特定的辅助学习策略来稳定训练过程并提高相应任务的性能。在多个I2IT任务上的实验表明,所提出的方法在面对高维连续动作空间问题时是有效且鲁棒的。我们的RL-I2IT框架实现可在https://github.com/Algolzw/SPAC-Deformable-Registration获取。

更新时间: 2024-06-08 01:09:02

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2309.13672v5

Neural Methods for Amortised Parameter Inference

Simulation-based methods for making statistical inference have evolved dramatically over the past 50 years, keeping pace with technological advancements. The field is undergoing a new revolution as it embraces the representational capacity of neural networks, optimisation libraries and graphics processing units for learning complex mappings between data and inferential targets. The resulting tools are amortised, in the sense that they allow inference to be made quickly through fast feedforward operations. In this article we review recent progress made in the context of point estimation, approximate Bayesian inference, summary-statistic construction, and likelihood approximation. The review also covers available software, and includes a simple illustration to showcase the wide array of tools available for amortised inference and the benefits they offer over state-of-the-art Markov chain Monte Carlo methods. The article concludes with an overview of relevant topics and an outlook on future research directions.

Updated: 2024-06-08 01:06:40

标题: 神经方法用于摊销参数推断

摘要: 基于仿真的统计推断方法在过去50年中发生了巨大的演变,与技术进步同步发展。随着神经网络、优化库和图形处理单元的表征能力的应用,该领域正在经历一场新的革命,用于学习数据和推断目标之间的复杂映射。由此产生的工具是摊销的,意味着它们通过快速前向操作可以快速进行推断。在本文中,我们回顾了在点估计、近似贝叶斯推断、摘要统计构建和似然逼近方面取得的最新进展。该审查还涵盖了可用软件,并包括一个简单的示例,展示了用于摊销推断的各种工具以及它们相对于最先进的马尔科夫链蒙特卡洛方法的优势。文章以相关主题的概述和未来研究方向的展望结束。

更新时间: 2024-06-08 01:06:40

领域: stat.ML,cs.LG,stat.CO

下载: http://arxiv.org/abs/2404.12484v2

COOKIEGUARD: Characterizing and Isolating the First-Party Cookie Jar

As third-party cookies are going away, first-party cookies are increasingly being used for tracking. Prior research has shown that third-party scripts write (or "ghost-write") first-party cookies in the browser's cookie jar because they are included in the website's main frame. What's more, a third-party script is able to access all first-party cookies, both the actual first-party cookies as well as the first-party cookies ghost-written by different third-party scripts. Existing isolation mechanisms in the web browser such as SOP and CSP are not designed to address this lack of isolation between first-party cookies written by different third parties. We conduct a comprehensive analysis of cross-domain first-party cookie retrieval, exfiltration, and modification on the top-10K websites. Most notably, we find 18% and 4% of the first-party cookies are exfiltrated and overwritten, respectively, by cross-domain third-party scripts. We propose COOKIEGUARD to introduce isolation between first-party cookies set by different third-party scripts in the main frame. To this end, COOKIEGUARD intercepts cookie get and set operations between third-party scripts and the browser's cookie jar to enforce strict isolation between first-party cookies set by different third-party domains. Our evaluation of COOKIEGUARD shows that it effectively blocks all cross-domain cookie read/write operations to provide a fully isolated cookie jar. While it generally does not impact appearance, navigation, or other website functionality, the strict isolation policy disrupts Single Sign-On (SSO) on just 11% of websites that rely on first-party cookies for session management. Our work demonstrates the feasibility of isolating first-party cookies.

Updated: 2024-06-08 01:02:49

标题: COOKIEGUARD:对第一方Cookie Jar进行表征与隔离

摘要: 随着第三方Cookie的消失,第一方Cookie越来越多地被用于跟踪。先前的研究表明,第三方脚本由于被包含在网站的主框架中,会在浏览器的Cookie Jar中写入(或“代写”)第一方Cookie。更重要的是,第三方脚本能够访问所有第一方Cookie,包括真正的第一方Cookie以及由不同第三方脚本代写的第一方Cookie。浏览器中现有的隔离机制(如SOP和CSP)并非为解决不同第三方写入的第一方Cookie之间缺乏隔离的问题而设计。我们对排名前10,000的网站进行了跨域第一方Cookie读取、外泄和修改的全面分析。值得注意的是,我们发现分别有18%和4%的第一方Cookie被跨域第三方脚本外泄和覆盖。我们提出COOKIEGUARD,用于在主框架中由不同第三方脚本设置的第一方Cookie之间引入隔离。为此,COOKIEGUARD拦截第三方脚本与浏览器Cookie Jar之间的Cookie读取和设置操作,以强制执行由不同第三方域设置的第一方Cookie之间的严格隔离。我们对COOKIEGUARD的评估表明,它有效阻止了所有跨域Cookie读写操作,提供了完全隔离的Cookie Jar。虽然它通常不影响网站的外观、导航或其他功能,但严格的隔离政策仅在11%依赖第一方Cookie进行会话管理的网站上干扰了单点登录(SSO)。我们的工作证明了隔离第一方Cookie的可行性。

更新时间: 2024-06-08 01:02:49

领域: cs.CR

下载: http://arxiv.org/abs/2406.05310v1

DeviceBERT: Applied Transfer Learning With Targeted Annotations and Vocabulary Enrichment to Identify Medical Device and Component Terminology in FDA Recall Summaries

FDA Medical Device recalls are critical and time-sensitive events, requiring swift identification of impacted devices to inform the public of a recall event and ensure patient safety. The OpenFDA device recall dataset contains valuable information about ongoing device recall actions, but manually extracting relevant device information from the recall action summaries is a time-consuming task. Named Entity Recognition (NER) is a task in Natural Language Processing (NLP) that involves identifying and categorizing named entities in unstructured text. Existing NER models, including domain-specific models like BioBERT, struggle to correctly identify medical device trade names, part numbers and component terms within these summaries. To address this, we propose DeviceBERT, a medical device annotation, pre-processing and enrichment pipeline, which builds on BioBERT to identify and label medical device terminology in the device recall summaries with improved accuracy. Furthermore, we demonstrate that our approach can be applied effectively for performing entity recognition tasks where training data is limited or sparse.

Updated: 2024-06-08 00:33:22

标题: DeviceBERT:应用定向注释和词汇丰富化的迁移学习,识别FDA召回摘要中的医疗设备和部件术语

摘要: FDA医疗器械召回是重要且时间敏感的事件,需要迅速识别受影响的设备,以通知公众召回事件并确保患者安全。OpenFDA设备召回数据集包含有关正在进行的设备召回行动的宝贵信息,但从召回行动摘要中手动提取相关设备信息是一项耗时的任务。命名实体识别(NER)是自然语言处理(NLP)中的一项任务,涉及识别和分类非结构化文本中的命名实体。现有的NER模型,包括领域特定模型如BioBERT,在这些摘要中正确识别医疗器械商标名称、零件号和组件术语方面存在困难。为了解决这个问题,我们提出了DeviceBERT,一个医疗器械注释、预处理和丰富化流水线,借助BioBERT来提高设备召回摘要中医疗器械术语的识别和标记准确性。此外,我们证明我们的方法可以有效地应用于执行实体识别任务,其中训练数据有限或稀缺。

更新时间: 2024-06-08 00:33:22

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.05307v1

Noise-Adaptive Confidence Sets for Linear Bandits and Application to Bayesian Optimization

Adapting to a priori unknown noise level is a very important but challenging problem in sequential decision-making as efficient exploration typically requires knowledge of the noise level, which is often loosely specified. We report significant progress in addressing this issue for linear bandits in two respects. First, we propose a novel confidence set that is `semi-adaptive' to the unknown sub-Gaussian parameter $\sigma_*^2$ in the sense that the (normalized) confidence width scales with $\sqrt{d\sigma_*^2 + \sigma_0^2}$ where $d$ is the dimension and $\sigma_0^2$ is the specified sub-Gaussian parameter (known) that can be much larger than $\sigma_*^2$. This is a significant improvement over $\sqrt{d\sigma_0^2}$ of the standard confidence set of Abbasi-Yadkori et al. (2011), especially when $d$ is large or $\sigma_*^2=0$. We show that this leads to an improved regret bound in linear bandits. Second, for bounded rewards, we propose a novel variance-adaptive confidence set that has much improved numerical performance upon prior art. We then apply this confidence set to develop, as we claim, the first practical variance-adaptive linear bandit algorithm via an optimistic approach, which is enabled by our novel regret analysis technique. Both of our confidence sets rely critically on `regret equality' from online learning. Our empirical evaluation in diverse Bayesian optimization tasks shows that our proposed algorithms demonstrate better or comparable performance compared to existing methods.
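
In the paper's own notation, the comparison above amounts to the following width scalings (the quantities are reproduced from the abstract; the display is ours):

% Semi-adaptive confidence width vs. the standard non-adaptive one.
\text{proposed:}\;\; \sqrt{d\,\sigma_*^2 + \sigma_0^2}
\qquad\text{vs.}\qquad
\text{Abbasi-Yadkori et al. (2011):}\;\; \sqrt{d\,\sigma_0^2}

The gain is largest when $d$ is large or $\sigma_*^2 = 0$, since the $d\,\sigma_0^2$ term is replaced by $d\,\sigma_*^2$ plus a dimension-free $\sigma_0^2$.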

Updated: 2024-06-08 00:23:36

标题: 用于线性赌博机的噪声自适应置信集及其在贝叶斯优化中的应用

摘要: 适应先验未知的噪声水平是顺序决策中一个非常重要但具有挑战性的问题,因为高效的探索通常需要了解噪声水平,而噪声水平往往只有宽松的设定。我们在线性赌博机领域从两个方面取得了重要进展。首先,我们提出了一种新颖的置信集,它对未知的次高斯参数 $\sigma_*^2$ 是“半自适应的”,即(标准化的)置信宽度与 $\sqrt{d\sigma_*^2 + \sigma_0^2}$ 成比例,其中 $d$ 是维度,$\sigma_0^2$ 是指定的(已知)次高斯参数,可以远大于 $\sigma_*^2$。这相对于 Abbasi-Yadkori 等人(2011年)标准置信集的 $\sqrt{d\sigma_0^2}$ 是一个显著的改进,尤其是当 $d$ 很大或 $\sigma_*^2=0$ 时。我们证明这会带来线性赌博机中更优的遗憾界。其次,对于有界奖励,我们提出了一种新颖的方差自适应置信集,其数值表现较先前工作有很大提升。然后,我们应用这一置信集,通过乐观方法开发了(我们认为是)第一个实用的方差自适应线性赌博机算法,这得益于我们新颖的遗憾分析技术。我们的两种置信集都严重依赖于在线学习中的“遗憾相等”。我们在各种贝叶斯优化任务中的实证评估表明,我们提出的算法相对于现有方法表现更好或具有可比性。

更新时间: 2024-06-08 00:23:36

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2402.07341v2

Beyond Efficiency: Scaling AI Sustainably

Barroso's seminal contributions in energy-proportional warehouse-scale computing launched an era where modern datacenters have become more energy efficient and cost effective than ever before. At the same time, modern AI applications have driven ever-increasing demands in computing, highlighting the importance of optimizing efficiency across the entire deep learning model development cycle. This paper characterizes the carbon impact of AI, including both operational carbon emissions from training and inference as well as embodied carbon emissions from datacenter construction and hardware manufacturing. We highlight key efficiency optimization opportunities for cutting-edge AI technologies, from deep learning recommendation models to multi-modal generative AI tasks. To scale AI sustainably, we must also go beyond efficiency and optimize across the life cycle of computing infrastructures, from hardware manufacturing to datacenter operations and end-of-life processing for the hardware.

Updated: 2024-06-08 00:07:16

标题: 超越效率:可持续地扩展人工智能

摘要: Barroso在能源比例的仓库级计算方面的开创性贡献开启了一个时代,使现代数据中心比以往任何时候都更加节能和更具成本效益。与此同时,现代人工智能应用推动了对计算的需求不断增加,凸显了在整个深度学习模型开发周期中优化效率的重要性。本文描述了人工智能的碳影响,既包括训练和推理产生的运营碳排放,也包括数据中心建设和硬件制造产生的隐含碳排放。我们重点介绍了前沿人工智能技术的关键效率优化机会,从深度学习推荐模型到多模态生成式人工智能任务。为了可持续地扩展人工智能,我们还必须超越效率本身,在计算基础设施的整个生命周期中进行优化,从硬件制造到数据中心运营,再到硬件的报废处理。

更新时间: 2024-06-08 00:07:16

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2406.05303v1

By Xinhai (Sean) Zou.