Arxiv Day: Article

Chain and Causal Attention for Efficient Entity Tracking

This paper investigates the limitations of transformers for entity-tracking tasks in large language models. We identify a theoretical constraint, showing that transformers require at least $\log_2 (n+1)$ layers to handle entity tracking with $n$ state changes. To address this issue, we propose an efficient and frugal enhancement to the standard attention mechanism, enabling it to manage long-term dependencies more efficiently. By considering attention as an adjacency matrix, our model can track entity states with a single layer. Empirical results demonstrate significant improvements in entity tracking datasets while keeping competitive performance on standard natural language modeling. Our modified attention allows us to achieve the same performance with drastically fewer layers. Additionally, our enhanced mechanism reveals structured internal representations of attention. Extensive experiments on both toy and complex datasets validate our approach. Our contributions include theoretical insights, an improved attention mechanism, and empirical validation.

Updated: 2024-10-07 23:54:10

标题: 链式和因果关注用于高效实体跟踪

摘要: 本文研究了大型语言模型中的实体跟踪任务对变压器的局限性。我们确定了一个理论约束，表明变压器需要至少 $\log_2 (n+1)$ 层来处理具有 $n$ 个状态变化的实体跟踪。为了解决这个问题，我们提出了对标准注意力机制的有效和节俭增强，使其能够更有效地处理长期依赖关系。通过将注意力视为邻接矩阵，我们的模型可以通过单个层来跟踪实体状态。实证结果表明，在保持标准自然语言建模的竞争性能的同时，实体跟踪数据集有显着改进。我们修改后的注意力机制使我们能够以极少的层数达到相同的性能。此外，我们的增强机制揭示了注意力的结构化内部表示。对玩具和复杂数据集的大量实验证实了我们的方法。我们的贡献包括理论洞见、改进的注意力机制和实证验证。

更新时间: 2024-10-07 23:54:10

领域: cs.LG,cs.CL,I.2.7

下载: http://arxiv.org/abs/2410.05565v1

Unsupervised Representation Learning from Sparse Transformation Analysis

There is a vast literature on representation learning based on principles such as coding efficiency, statistical independence, causality, controllability, or symmetry. In this paper we propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components. Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model, before being decoded to predict a future input state. The flow model is decomposed into a number of rotational (divergence-free) vector fields and a number of potential flow (curl-free) fields. Our sparsity prior encourages only a small number of these fields to be active at any instant and infers the speed with which the probability flows along these fields. Training this model is completely unsupervised using a standard variational objective and results in a new form of disentangled representations where the input is not only represented by a combination of independent factors, but also by a combination of independent transformation primitives given by the learned flow fields. When viewing the transformations as symmetries one may interpret this as learning approximately equivariant representations. Empirically we demonstrate that this model achieves state of the art in terms of both data likelihood and unsupervised approximate equivariance errors on datasets composed of sequence transformations.

Updated: 2024-10-07 23:53:25

标题: 无监督稀疏变换分析中的表示学习

摘要: 关于基于编码效率、统计独立性、因果性、可控性或对称性等原则的表示学习有大量文献。在本文中，我们提出通过将潜变量的转换因子分解为稀疏分量来从序列数据中学习表示。输入数据首先被编码为潜在激活的分布，然后使用概率流模型进行转换，然后被解码以预测未来的输入状态。流模型被分解为多个旋转（无散度）矢量场和多个势流（无旋）场。我们的稀疏先验鼓励在任何时刻只有少量这些场是活跃的，并推断概率沿着这些场流动的速度。通过使用标准变分目标完全无监督地训练该模型，并产生一种新形式的解耦表示，其中输入不仅由独立因子的组合表示，还由学习的流场给出的独立转换原语的组合表示。将转换视为对称性时，我们可以将其解释为学习近似等变表示。从经验上我们证明，该模型在由序列转换组成的数据集上，在数据似然性和无监督近似等变错误方面达到了最先进水平。

更新时间: 2024-10-07 23:53:25

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2410.05564v1

Rational Metareasoning for Large Language Models

Being prompted to engage in reasoning has emerged as a core technique for using large language models (LLMs), deploying additional inference-time compute to improve task performance. However, as LLMs increase in both size and adoption, inference costs are correspondingly becoming increasingly burdensome. How, then, might we optimize reasoning's cost-performance tradeoff? This work introduces a novel approach based on computational models of metareasoning used in cognitive science, training LLMs to selectively use intermediate reasoning steps only when necessary. We first develop a reward function that incorporates the Value of Computation by penalizing unnecessary reasoning, then use this reward function with Expert Iteration to train the LLM. Compared to few-shot chain-of-thought prompting and STaR, our method significantly reduces inference costs (20-37\% fewer tokens generated across three models) while maintaining task performance across diverse datasets.

Updated: 2024-10-07 23:48:52

标题: 大型语言模型的合理元推理

摘要: 促使参与推理已经成为使用大型语言模型（LLMs）的核心技术，通过使用额外的推理时间计算来提高任务性能。然而，随着LLMs在规模和采用率上的增加，推理成本相应地变得越来越繁重。那么，我们如何优化推理的成本-性能权衡？这项工作引入了一种基于认知科学中使用的元推理的计算模型的新方法，训练LLMs仅在必要时选择性地使用中间推理步骤。我们首先开发了一个奖励函数，通过惩罚不必要的推理来融入计算的价值，然后使用这个奖励函数与专家迭代一起训练LLMs。与少量链式思维提示和STaR相比，我们的方法显著减少了推理成本（在三种模型中生成的令牌减少了20-37％），同时在各种数据集上保持了任务性能。

更新时间: 2024-10-07 23:48:52

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05563v1

Practical, Private Assurance of the Value of Collaboration via Fully Homomorphic Encryption

Two parties wish to collaborate on their datasets. However, before they reveal their datasets to each other, the parties want to have the guarantee that the collaboration would be fruitful. We look at this problem from the point of view of machine learning, where one party is promised an improvement on its prediction model by incorporating data from the other party. The parties would only wish to collaborate further if the updated model shows an improvement in accuracy. Before this is ascertained, the two parties would not want to disclose their models and datasets. In this work, we construct an interactive protocol for this problem based on the fully homomorphic encryption scheme over the Torus (TFHE) and label differential privacy, where the underlying machine learning model is a neural network. Label differential privacy is used to ensure that computations are not done entirely in the encrypted domain, which is a significant bottleneck for neural network training according to the current state-of-the-art FHE implementations. We formally prove the security of our scheme assuming honest-but-curious parties, but where one party may not have any expertise in labelling its initial dataset. Experiments show that we can obtain the output, i.e., the accuracy of the updated model, with time many orders of magnitude faster than a protocol using entirely FHE operations.

Updated: 2024-10-07 23:44:05

标题: 通过完全同态加密实现实用的、私密的协作价值保证

摘要: 两个方当希望合作处理他们的数据集。然而，在彼此透露数据集之前，双方希望得到合作将是富有成效的保证。我们从机器学习的角度看待这个问题，其中一方被承诺通过整合另一方的数据来改进其预测模型。只有在更新的模型显示准确性有所提高时，双方才愿意进一步合作。在这一点得到确认之前，两方都不愿透露其模型和数据集。在这项工作中，我们基于Torus上的全同态加密方案（TFHE）和标签差分隐私构建了这个问题的交互式协议，其中基础机器学习模型是神经网络。标签差分隐私用于确保计算不完全在加密域中进行，这是当前最先进的全同态加密实现中神经网络训练的一个重要瓶颈。我们正式证明了我们的方案的安全性，假设方当是诚实但好奇的，但其中一方可能没有任何在标记其初始数据集方面的专业知识。实验表明，我们可以以比完全使用全同态加密操作的方案快得多的时间获得输出，即更新模型的准确性。

更新时间: 2024-10-07 23:44:05

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2310.02563v3

Cyber Threats to Canadian Federal Election: Emerging Threats, Assessment, and Mitigation Strategies

As Canada prepares for the 2025 federal election, ensuring the integrity and security of the electoral process against cyber threats is crucial. Recent foreign interference in elections globally highlight the increasing sophistication of adversaries in exploiting technical and human vulnerabilities. Such vulnerabilities also exist in Canada's electoral system that relies on a complex network of IT systems, vendors, and personnel. To mitigate these vulnerabilities, a threat assessment is crucial to identify emerging threats, develop incident response capabilities, and build public trust and resilience against cyber threats. Therefore, this paper presents a comprehensive national cyber threat assessment, following the NIST Special Publication 800-30 framework, focusing on identifying and mitigating cybersecurity risks to the upcoming 2025 Canadian federal election. The research identifies three major threats: misinformation, disinformation, and malinformation (MDM) campaigns; attacks on critical infrastructure and election support systems; and espionage by malicious actors. Through detailed analysis, the assessment offers insights into the capabilities, intent, and potential impact of these threats. The paper also discusses emerging technologies and their influence on election security and proposes a multi-faceted approach to risk mitigation ahead of the election.

Updated: 2024-10-07 23:40:40

标题: 加拿大联邦选举的网络威胁：新兴威胁、评估和缓解策略

摘要: 随着加拿大为2025年联邦选举做准备，确保选举过程的完整性和安全性免受网络威胁的影响至关重要。最近全球范围内选举中外国干预的事件突显了对手方在利用技术和人为漏洞方面日益增长的复杂性。加拿大的选举系统也存在类似漏洞，其依赖于一个复杂的IT系统网络、供应商和人员。为了减轻这些漏洞，威胁评估至关重要，以识别新兴威胁、开发事件响应能力，建立公众对网络威胁的信任和韧性。因此，本文提出了一项全面的国家网络威胁评估，遵循NIST特别出版物800-30框架，重点是识别和减轻对即将到来的2025年加拿大联邦选举的网络安全风险。研究确定了三个主要威胁：误导、虚假信息和错误信息（MDM）攻击；对关键基础设施和选举支持系统的攻击；以及恶意行动者的间谍活动。通过详细分析，该评估提供了对这些威胁的能力、意图和潜在影响的见解。本文还讨论了新兴技术对选举安全的影响，并提出了在选举前采取多方面方法来减轻风险。

更新时间: 2024-10-07 23:40:40

领域: cs.CR

下载: http://arxiv.org/abs/2410.05560v1

Narrative-of-Thought: Improving Temporal Reasoning of Large Language Models via Recounted Narratives

Reasoning about time and temporal relations is an integral aspect of human cognition, essential for perceiving the world and navigating our experiences. Though large language models (LLMs) have demonstrated impressive performance in many reasoning tasks, temporal reasoning remains challenging due to its intrinsic complexity. In this work, we first study an essential task of temporal reasoning -- temporal graph generation, to unveil LLMs' inherent, global reasoning capabilities. We show that this task presents great challenges even for the most powerful LLMs, such as GPT-3.5/4. We also notice a significant performance gap by small models (<10B) that lag behind LLMs by 50%. Next, we study how to close this gap with a budget constraint, e.g., not using model finetuning. We propose a new prompting technique tailored for temporal reasoning, Narrative-of-Thought (NoT), that first converts the events set to a Python class, then prompts a small model to generate a temporally grounded narrative, guiding the final generation of a temporal graph. Extensive experiments showcase the efficacy of NoT in improving various metrics. Notably, NoT attains the highest F1 on the Schema-11 evaluation set, while securing an overall F1 on par with GPT-3.5. NoT also achieves the best structural similarity across the board, even compared with GPT-3.5/4. Our code is available at https://github.com/launchnlp/NoT.

Updated: 2024-10-07 23:36:05

标题: 思维叙事：通过叙述性叙事改进大型语言模型的时间推理

摘要: 关于时间和时间关系的推理是人类认知的一个重要方面，对于感知世界和导航我们的经验至关重要。虽然大型语言模型（LLMs）在许多推理任务中表现出色，但由于其固有复杂性，时间推理仍然具有挑战性。在这项工作中，我们首先研究了时间推理的一个重要任务--时间图生成，以揭示LLMs固有的全局推理能力。我们展示了这项任务即使对于最强大的LLMs，如GPT-3.5/4，也存在巨大挑战。我们还注意到小模型（<10B）的表现明显落后于LLMs约50%。接下来，我们研究如何在预算约束的情况下弥合这一差距，例如，不使用模型微调。我们提出了一种专为时间推理量身定制的新提示技术，即思维叙事（NoT），它首先将事件集转换为Python类，然后提示一个小模型生成一个时间上有根据的叙事，引导最终生成一个时间图。大量实验证明了NoT在改进各种指标上的有效性。值得注意的是，NoT在Schema-11评估集上获得了最高的F1，同时保持了与GPT-3.5相当的整体F1。NoT在整体上也实现了最佳的结构相似性，甚至与GPT-3.5/4相比。我们的代码可以在https://github.com/launchnlp/NoT找到。

更新时间: 2024-10-07 23:36:05

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.05558v1

Generated Contents Enrichment

In this paper, we investigate a novel artificial intelligence generation task termed Generated Contents Enrichment (GCE). Conventional AI content generation produces visually realistic content by implicitly enriching the given textual description based on limited semantic descriptions. Unlike this traditional task, our proposed GCE strives to perform content enrichment explicitly in both the visual and textual domains. The goal is to generate content that is visually realistic, structurally coherent, and semantically abundant. To tackle GCE, we propose a deep end-to-end adversarial method that explicitly explores semantics and inter-semantic relationships during the enrichment process. Our approach first models the input description as a scene graph, where nodes represent objects and edges capture inter-object relationships. We then adopt Graph Convolutional Networks on top of the input scene description to predict additional enriching objects and their relationships with the existing ones. Finally, the enriched description is passed to an image synthesis model to generate the corresponding visual content. Experiments conducted on the Visual Genome dataset demonstrate the effectiveness of our method, producing promising and visually plausible results.

Updated: 2024-10-07 23:28:42

标题: 生成内容丰富化

摘要: 在本文中，我们研究了一项新颖的人工智能生成任务，称为生成内容丰富化（GCE）。传统的AI内容生成通过隐式丰富给定的文本描述以产生视觉上逼真的内容。与这个传统任务不同，我们提出的GCE力求在视觉和文本领域明确执行内容丰富化。其目标是生成视觉逼真、结构连贯、语义丰富的内容。为了解决GCE，我们提出了一种深度端到端的对抗方法，明确在丰富过程中探索语义和语义间关系。我们的方法首先将输入描述建模为一个场景图，其中节点表示对象，边捕获对象之间的关系。然后我们在输入场景描述之上采用图卷积网络来预测额外的丰富化对象以及它们与现有对象之间的关系。最后，丰富化的描述传递给图像合成模型来生成相应的视觉内容。在Visual Genome数据集上进行的实验证明了我们方法的有效性，产生了有前途且视觉上可信的结果。

更新时间: 2024-10-07 23:28:42

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.03650v3

On Instruction-Finetuning Neural Machine Translation Models

In this work, we introduce instruction finetuning for Neural Machine Translation (NMT) models, which distills instruction following capabilities from Large Language Models (LLMs) into orders-of-magnitude smaller NMT models. Our instruction-finetuning recipe for NMT models enables customization of translations for a limited but disparate set of translation-specific tasks. We show that NMT models are capable of following multiple instructions simultaneously and demonstrate capabilities of zero-shot composition of instructions. We also show that through instruction finetuning, traditionally disparate tasks such as formality-controlled machine translation, multi-domain adaptation as well as multi-modal translations can be tackled jointly by a single instruction finetuned NMT model, at a performance level comparable to LLMs such as GPT-3.5-Turbo. To the best of our knowledge, our work is among the first to demonstrate the instruction-following capabilities of traditional NMT models, which allows for faster, cheaper and more efficient serving of customized translations.

Updated: 2024-10-07 23:26:13

标题: 关于对神经机器翻译模型进行指导微调

摘要: 在这项工作中，我们介绍了用于神经机器翻译（NMT）模型的指令微调，该方法将大型语言模型（LLMs）的指令遵循能力提炼成数量级较小的NMT模型。我们的NMT模型指令微调配方使得可以为有限但不同的翻译特定任务定制翻译。我们展示了NMT模型能够同时遵循多个指令，并展示了零-shot指令组合的能力。我们还表明，通过指令微调，传统上不同的任务，如形式控制机器翻译、多领域适应以及多模式翻译，可以由单个指令微调的NMT模型共同处理，并且在性能水平上与GPT-3.5-Turbo等LLMs相当。据我们所知，我们的工作是首次展示传统NMT模型的指令遵循能力，这样可以更快、更便宜、更高效地提供定制翻译。

更新时间: 2024-10-07 23:26:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.05553v1

OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code

Open-ended and AI-generating algorithms aim to continuously generate and solve increasingly complex tasks indefinitely, offering a promising path toward more general intelligence. To accomplish this grand vision, learning must occur within a vast array of potential tasks. Existing approaches to automatically generating environments are constrained within manually predefined, often narrow distributions of environment, limiting their ability to create any learning environment. To address this limitation, we introduce a novel framework, OMNI-EPIC, that augments previous work in Open-endedness via Models of human Notions of Interestingness (OMNI) with Environments Programmed in Code (EPIC). OMNI-EPIC leverages foundation models to autonomously generate code specifying the next learnable (i.e., not too easy or difficult for the agent's current skill set) and interesting (e.g., worthwhile and novel) tasks. OMNI-EPIC generates both environments (e.g., an obstacle course) and reward functions (e.g., progress through the obstacle course quickly without touching red objects), enabling it, in principle, to create any simulatable learning task. We showcase the explosive creativity of OMNI-EPIC, which continuously innovates to suggest new, interesting learning challenges. We also highlight how OMNI-EPIC can adapt to reinforcement learning agents' learning progress, generating tasks that are of suitable difficulty. Overall, OMNI-EPIC can endlessly create learnable and interesting environments, further propelling the development of self-improving AI systems and AI-Generating Algorithms. Project website with videos: https://dub.sh/omniepic

Updated: 2024-10-07 23:21:20

标题: OMNI-EPIC：通过人类有趣程度概念模型实现开放性的编程环境

摘要: 开放式和人工智能生成算法旨在持续生成和解决越来越复杂的任务，从而无限期地提供通往更普遍智能的有希望的途径。为了实现这一宏伟愿景，学习必须发生在大量潜在任务中。现有的自动生成环境方法受到手动预定义的环境分布的限制，这往往限制了它们创建任何学习环境的能力。为了解决这一限制，我们引入了一个新的框架，OMNI-EPIC，通过在人类有趣性概念模型（OMNI）和编程代码环境（EPIC）中增加先前的开放性工作。OMNI-EPIC利用基础模型自主生成代码，指定下一个可学习的（即对于代理当前的技能集不太容易或困难）和有趣的（例如有价值和新颖的）任务。OMNI-EPIC生成环境（例如障碍赛道）和奖励函数（例如快速通过障碍物赛道而不触碰红色物体），原则上使其能够创建任何可模拟的学习任务。我们展示了OMNI-EPIC的爆炸式创造力，它不断创新提出新的有趣的学习挑战。我们还强调了OMNI-EPIC如何适应强化学习代理的学习进度，生成适当难度的任务。总的来说，OMNI-EPIC可以无限地创建可学习和有趣的环境，进一步推动自我改进人工智能系统和人工智能生成算法的发展。项目网站链接：https://dub.sh/omniepic

更新时间: 2024-10-07 23:21:20

领域: cs.AI

下载: http://arxiv.org/abs/2405.15568v2

Aiding Global Convergence in Federated Learning via Local Perturbation and Mutual Similarity Information

Federated learning has emerged in the last decade as a distributed optimization paradigm due to the rapidly increasing number of portable devices able to support the heavy computational needs related to the training of machine learning models. Federated learning utilizes gradient-based optimization to minimize a loss objective shared across participating agents. To the best of our knowledge, the literature mostly lacks elegant solutions that naturally harness the reciprocal statistical similarity between clients to redesign the optimization procedure. To address this gap, by conceiving the federated network as a similarity graph, we propose a novel modified framework wherein each client locally performs a perturbed gradient step leveraging prior information about other statistically affine clients. We theoretically prove that our procedure, due to a suitably introduced adaptation in the update rule, achieves a quantifiable speedup concerning the exponential contraction factor in the strongly convex case compared with popular algorithms FedAvg and FedProx, here analyzed as baselines. Lastly, we legitimize our conclusions through experimental results on the CIFAR10 and FEMNIST datasets, where we show that our algorithm speeds convergence up to a margin of 30 global rounds compared with FedAvg while modestly improving generalization on unseen data in heterogeneous settings.

Updated: 2024-10-07 23:14:05

标题: 通过本地扰动和相似性信息帮助全球收敛的联邦学习

摘要: 在过去的十年中，联邦学习作为一种分布式优化范式应运而生，这是由于能够支持与机器学习模型训练相关的重计算需求的便携设备数量迅速增加。联邦学习利用基于梯度的优化来最小化在参与代理之间共享的损失目标。据我们所知，文献中大多数缺乏自然地利用客户端之间相互统计相似性重新设计优化过程的优雅解决方案。为了弥补这一差距，我们将联邦网络构想为一个相似性图，提出了一种新颖的修改框架，在其中每个客户端在本地执行一个扰动梯度步骤，利用先前关于其他统计亲和客户端的信息。我们在理论上证明，由于在更新规则中引入适当的调整，我们的流程在强凸情况下达到了一个可量化的加速度，相比于流行的FedAvg和FedProx算法（在此作为基线进行分析），具有指数收缩因子。最后，我们通过对CIFAR10和FEMNIST数据集的实验结果来证实我们的结论，我们展示了我们的算法在异构环境中加速收敛速度，与FedAvg相比提高了对未见数据的泛化能力。

更新时间: 2024-10-07 23:14:05

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2410.05545v1

netFound: Foundation Model for Network Security

Developing generalizable ML-based solutions for disparate learning problems in network security is highly desired. However, despite a rich history of applying ML to network security, most existing solutions lack generalizability. This lack of progress can be attributed to an overreliance on supervised learning techniques and the associated challenges of curating well-specified labeled training data. This paper addresses a fundamental gap by introducing a novel transformer-based network foundation model, netFound. We employ self-supervised learning techniques on abundant, unlabeled network telemetry data for pre-training. This pretrained model can subsequently be fine-tuned to create generalizable learning artifacts for disparate learning tasks, even when using commonly available but challenging labeled datasets that are sparse, noisy, and skewed. To realize this goal, netFound leverages various domain-specific attributes and constraints unique to network data (packet traces) by developing multi-modal embeddings, protocol-aware tokenization, data-driven token composition, and hierarchical transformers. Our results demonstrate that netFound's domain-specific design choices ensure that it (1) effectively captures the hidden networking context in production settings, (2) outperforms four different SOTA methods on five different learning tasks, and (3) is robust to both noisy labels and learning shortcuts -- critical for developing generalizable ML models in practical settings.

Updated: 2024-10-07 23:07:07

标题: netFound：网络安全基础模型

摘要: 开发通用的基于机器学习的解决方案，用于网络安全中的不同学习问题是非常有必要的。然而，尽管在网络安全领域应用机器学习具有丰富的历史，大多数现有解决方案缺乏通用性。这种缺乏进展可以归因于对监督学习技术的过度依赖，以及筹备规范标记的训练数据所带来的挑战。本文通过引入一种新颖的基于transformer的网络基础模型netFound来解决这一基本缺口。我们利用自监督学习技术对丰富的未标记网络遥测数据进行预训练。这种预训练的模型随后可以进行微调，以创建用于不同学习任务的通用学习工件，即使使用常见但具有挑战性的标记数据集，这些数据集稀疏、嘈杂且倾斜。为了实现这一目标，netFound利用了网络数据（数据包跟踪）的各种领域特定属性和约束，通过开发多模态嵌入、协议感知标记化、数据驱动标记组合和分层transformers。我们的结果表明，netFound的领域特定设计选择确保了它（1）有效捕捉生产环境中隐藏的网络上下文，（2）在五种不同的学习任务上优于四种不同的SOTA方法，（3）对于嘈杂的标签和学习捷径具有鲁棒性，这对于在实际环境中开发通用的ML模型至关重要。

更新时间: 2024-10-07 23:07:07

领域: cs.NI,cs.AI

下载: http://arxiv.org/abs/2310.17025v3

Fill In The Gaps: Model Calibration and Generalization with Synthetic Data

As machine learning models continue to swiftly advance, calibrating their performance has become a major concern prior to practical and widespread implementation. Most existing calibration methods often negatively impact model accuracy due to the lack of diversity of validation data, resulting in reduced generalizability. To address this, we propose a calibration method that incorporates synthetic data without compromising accuracy. We derive the expected calibration error (ECE) bound using the Probably Approximately Correct (PAC) learning framework. Large language models (LLMs), known for their ability to mimic real data and generate text with mixed class labels, are utilized as a synthetic data generation strategy to lower the ECE bound and improve model accuracy on real test data. Additionally, we propose data generation mechanisms for efficient calibration. Testing our method on four different natural language processing tasks, we observed an average up to 34\% increase in accuracy and 33\% decrease in ECE.

Updated: 2024-10-07 23:06:42

标题: 填补空白：使用合成数据进行模型校准和泛化

摘要: 随着机器学习模型的快速发展，校准它们的性能在实际和广泛的实施之前已经成为一个主要关注点。大多数现有的校准方法往往会对模型的准确性产生负面影响，因为验证数据的缺乏多样性导致了泛化能力的降低。为了解决这个问题，我们提出了一种在不影响准确性的情况下整合合成数据的校准方法。我们利用可能近似正确（PAC）学习框架推导出了期望的校准误差（ECE）界限。大型语言模型（LLMs）以其模仿真实数据和生成带有混合类标签文本的能力而闻名，被用作降低ECE界限并提高模型在真实测试数据上的准确性的合成数据生成策略。此外，我们提出了有效校准的数据生成机制。在四个不同的自然语言处理任务上测试我们的方法，我们观察到平均准确性提高了34％，ECE降低了33％。

更新时间: 2024-10-07 23:06:42

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.10864v1

Improving Molecule Generation and Drug Discovery with a Knowledge-enhanced Generative Model

Recent advancements in generative models have established state-of-the-art benchmarks in the generation of molecules and novel drug candidates. Despite these successes, a significant gap persists between generative models and the utilization of extensive biomedical knowledge, often systematized within knowledge graphs, whose potential to inform and enhance generative processes has not been realized. In this paper, we present a novel approach that bridges this divide by developing a framework for knowledge-enhanced generative models called KARL. We develop a scalable methodology to extend the functionality of knowledge graphs while preserving semantic integrity, and incorporate this contextual information into a generative framework to guide a diffusion-based model. The integration of knowledge graph embeddings with our generative model furnishes a robust mechanism for producing novel drug candidates possessing specific characteristics while ensuring validity and synthesizability. KARL outperforms state-of-the-art generative models on both unconditional and targeted generation tasks.

Updated: 2024-10-07 22:56:17

标题: 使用知识增强生成模型改进分子生成和药物发现

摘要: 最近发展的生成模型在分子和新型药物候选物生成方面建立了最先进的基准。尽管取得了这些成功，生成模型与广泛生物医学知识之间仍存在显著差距，这些知识通常被系统化整合在知识图中，而这些知识图未能充分发挥其为生成过程提供信息和增强的潜力。在本文中，我们提出了一种新颖的方法，通过开发一个名为KARL的知识增强生成模型框架来弥合这一鸿沟。我们开发了一种可扩展的方法论，以扩展知识图的功能，同时保留语义完整性，并将这些上下文信息整合到生成框架中，以引导基于扩散的模型。知识图嵌入与我们的生成模型的集成为产生具有特定特征的新型药物候选物提供了强大的机制，同时确保其有效性和可合成性。KARL在无条件和有目标的生成任务中均优于最先进的生成模型。

更新时间: 2024-10-07 22:56:17

领域: cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2402.08790v2

When does compositional structure yield compositional generalization? A kernel theory

Compositional generalization (the ability to respond correctly to novel combinations of familiar components) is thought to be a cornerstone of intelligent behavior. Compositionally structured (e.g. disentangled) representations are essential for this; however, the conditions under which they yield compositional generalization remain unclear. To address this gap, we present a general theory of compositional generalization in kernel models with fixed representations, a tractable framework for characterizing the impact of dataset statistics on generalization. We find that kernel models are constrained to adding up values assigned to each combination of components seen during training ("conjunction-wise additivity"). This imposes fundamental restrictions on the set of tasks these models can learn, in particular preventing them from transitively generalizing equivalence relations. Even for compositional tasks that kernel models can in principle learn, we identify novel failure modes in compositional generalization that arise from biases in the training data and affect important compositional building blocks such as symbolic addition and context dependence (memorization leak and shortcut bias). Finally, we empirically validate our theory, showing that it captures the behavior of deep neural networks (convolutional networks, residual networks, and Vision Transformers) trained on a set of compositional tasks with similarly structured data. Ultimately, this work provides a theoretical perspective on how statistical structure in the training data can affect compositional generalization, with implications for how to identify and remedy failure modes in deep learning models.

Updated: 2024-10-07 22:55:53

标题: 构成结构何时产生构成概括？一个核心理论

摘要: 组成概括（对熟悉组件的新颖组合做出正确响应的能力）被认为是智能行为的基石。组合结构化（例如解缠）表示对此至关重要；然而，它们产生组成概括的条件仍不清楚。为了填补这一空白，我们提出了一种在具有固定表示的核模型中的组成概括的一般理论，这是一种便于表征数据集统计对概括的影响的框架。我们发现核模型被限制为将分配给训练中看到的每个组件组合的值相加（"与操作添加性"）。这对这些模型可以学习的任务集施加了基本限制，特别是阻止它们在传递性地概括等价关系。即使对于核模型原则上可以学习的组成任务，我们也确定了组成概括中出现的新型失败模式，这些模式源于训练数据中的偏见，并影响重要的组合构件，例如符号加法和上下文依赖性（记忆泄漏和捷径偏见）。最后，我们在实证上验证了我们的理论，展示了它如何捕捉在一组结构类似的数据上训练的深度神经网络（卷积网络、残余网络和视觉Transformer）的行为。最终，这项工作提供了关于训练数据中的统计结构如何影响组成概括的理论观点，从而影响如何识别和纠正深度学习模型的失败模式。

更新时间: 2024-10-07 22:55:53

领域: cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2405.16391v2

Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective

Vision-language models (VLMs) pre-trained on extensive datasets can inadvertently learn biases by correlating gender information with specific objects or scenarios. Current methods, which focus on modifying inputs and monitoring changes in the model's output probability scores, often struggle to comprehensively understand bias from the perspective of model components. We propose a framework that incorporates causal mediation analysis to measure and map the pathways of bias generation and propagation within VLMs. This approach allows us to identify the direct effects of interventions on model bias and the indirect effects of interventions on bias mediated through different model components. Our results show that image features are the primary contributors to bias, with significantly higher impacts than text features, specifically accounting for 32.57% and 12.63% of the bias in the MSCOCO and PASCAL-SENTENCE datasets, respectively. Notably, the image encoder's contribution surpasses that of the text encoder and the deep fusion encoder. Further experimentation confirms that contributions from both language and vision modalities are aligned and non-conflicting. Consequently, focusing on blurring gender representations within the image encoder, which contributes most to the model bias, reduces bias efficiently by 22.03% and 9.04% in the MSCOCO and PASCAL-SENTENCE datasets, respectively, with minimal performance loss or increased computational demands.

Updated: 2024-10-07 22:38:25

标题: 图片胜过言辞：从因果中介的角度理解和减轻视觉-语言模型中的偏见

摘要: 视觉语言模型（VLMs）在广泛数据集上预训练时，可能会通过将性别信息与特定对象或情境相关联而无意中学习偏见。目前的方法主要集中在修改输入并监测模型输出概率分数的变化，往往难以全面从模型组件的角度理解偏见。我们提出了一个框架，该框架结合因果中介分析，以测量和映射VLMs内部偏见生成和传播的路径。这种方法允许我们识别干预对模型偏见的直接影响以及通过不同模型组件介导的偏见的间接影响。我们的结果表明，图像特征是偏见的主要贡献者，其影响显著高于文本特征，分别在MSCOCO和PASCAL-SENTENCE数据集中占偏见的32.57%和12.63%。值得注意的是，图像编码器的贡献超过了文本编码器和深度融合编码器。进一步的实验证实，语言和视觉模态的贡献是一致且不冲突的。因此，集中于模糊图像编码器中的性别表示，这是对模型偏见贡献最大的部分，可以在MSCOCO和PASCAL-SENTENCE数据集中分别有效地减少22.03%和9.04%的偏见，而且性能损失或增加的计算需求很小。

更新时间: 2024-10-07 22:38:25

领域: cs.AI,cs.CL,cs.CV,I.2.7

下载: http://arxiv.org/abs/2407.02814v2

Online Dynamic Pricing for Electric Vehicle Charging Stations with Reservations

The transition to electric vehicles (EVs), coupled with the rise of renewable energy sources, will significantly impact the electric grid. Unlike conventional fuel sources, electricity for EVs is constrained by grid capacity, price fluctuations, and long EV charging times, requiring new pricing solutions to manage demand and supply. This paper proposes a model for online dynamic pricing of reserved EV charging services, including reservation, parking, and charging as a bundled service priced as a whole. Our approach focuses on the individual charging station operator, employing a stochastic demand model and online dynamic pricing based on expected demand. The proposed model uses a Markov Decision Process (MDP) formulation to optimize sequential pricing decisions for charging session requests. A key contribution is the novel definition and quantification of discretization error introduced by the discretization of the Poisson process for use in the MDP. The model's viability is demonstrated with a heuristic solution method based on Monte-Carlo tree search, offering a viable path for real-world application.

Updated: 2024-10-07 22:36:40

标题: 电动汽车充电站的在线动态定价与预约

摘要: 电动汽车（EVs）的过渡，加上可再生能源的兴起，将显著影响电网。与传统燃料来源不同，EV的电力受限于电网容量、价格波动和长时间充电，需要新的定价解决方案来管理需求和供应。本文提出了一种用于在线动态定价保留EV充电服务的模型，包括预订、停车和充电作为整体定价的捆绑服务。我们的方法专注于个别充电站运营商，采用随机需求模型和基于预期需求的在线动态定价。所提出的模型使用马尔可夫决策过程（MDP）的形式来优化充电会话请求的顺序定价决策。一个关键贡献是对离散化误差的新定义和量化，该误差由用于MDP中的泊松过程的离散化引入。该模型的可行性通过基于蒙特卡洛树搜索的启发式解决方法来展示，为实际应用提供了可行的途径。

更新时间: 2024-10-07 22:36:40

领域: cs.MA,cs.AI

下载: http://arxiv.org/abs/2410.05538v1

SWAG: Storytelling With Action Guidance

Automated long-form story generation typically employs long-context large language models (LLMs) for one-shot creation, which can produce cohesive but not necessarily engaging content. We introduce Storytelling With Action Guidance (SWAG), a novel approach to storytelling with LLMs. Our approach frames story writing as a search problem through a two-model feedback loop: one LLM generates story content, and another auxiliary LLM is used to choose the next best "action" to steer the story's future direction. Our results show that SWAG can substantially outperform previous end-to-end story generation techniques when evaluated by GPT-4 and through human evaluation. Our SWAG pipeline using only small open-source models surpasses GPT-3.5-Turbo.

Updated: 2024-10-07 22:36:38

标题: SWAG: 带动作指导的讲故事

摘要: 自动生成长篇故事通常使用长上下文大语言模型（LLMs）进行一次性创作，这可以生成连贯但不一定引人入胜的内容。我们引入了带有行动指导的叙事（SWAG），这是一种创作故事的新方法，采用LLMs。我们的方法将叙事写作视为一个搜索问题，通过两个模型的反馈循环：一个LLM生成故事内容，另一个辅助LLM用于选择下一个最佳“行动”，以引导故事的未来发展方向。我们的结果表明，SWAG在通过GPT-4和人类评估进行评估时可以显著优于先前的端到端故事生成技术。我们的SWAG管道仅使用小型开源模型就超过了GPT-3.5-Turbo。

更新时间: 2024-10-07 22:36:38

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.03483v2

Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search

The real-world effectiveness of deep neural networks often depends on their latency, thereby necessitating optimization techniques that can reduce a model's inference time while preserving its performance. One popular approach is to sequentially rewrite the input computation graph into an equivalent but faster one by replacing individual subgraphs. This approach gives rise to the so-called phase-ordering problem in which the application of one rewrite rule can eliminate the possibility to apply an even better one later on. Recent work has shown that equality saturation, a technique from compiler optimization, can mitigate this issue by first building an intermediate representation (IR) that efficiently stores multiple optimized versions of the input program before extracting the best solution in a second step. In practice, however, memory constraints prevent the IR from capturing all optimized versions and thus reintroduce the phase-ordering problem in the construction phase. In this paper, we present a tensor graph rewriting approach that uses Monte Carlo tree search to build superior IRs by identifying the most promising rewrite rules. We also introduce a novel extraction algorithm that can provide fast and accurate runtime estimates of tensor programs represented in an IR. Our approach improves the inference speedup of neural networks by up to 11% compared to existing methods.

Updated: 2024-10-07 22:22:02

标题: 使用等式饱和和蒙特卡洛树搜索优化张量计算图

摘要: 深度神经网络在现实世界中的有效性往往取决于它们的延迟，因此需要优化技术来减少模型的推理时间，同时保持其性能。一种流行的方法是将输入计算图顺序重写为一个等效但更快的计算图，通过替换单个子图。这种方法引发了所谓的相位排序问题，其中一个重写规则的应用可能会消除后续应用更好规则的可能性。最近的研究表明，来自编译优化的一种技术，即相等饱和，可以通过首先构建一个有效存储多个优化版本的输入程序的中间表示（IR），然后在第二步中提取最佳解决方案，来缓解这个问题。然而，在实践中，内存限制阻止了IR捕获所有优化版本，因此在构建阶段重新引入了相位排序问题。在本文中，我们提出了一种张量图重写方法，使用蒙特卡罗树搜索来通过识别最有前途的重写规则来构建优越的IR。我们还引入了一种新颖的提取算法，可以为在IR中表示的张量程序提供快速而准确的运行时估计。相比现有方法，我们的方法将神经网络的推理加速提高了高达11%。

更新时间: 2024-10-07 22:22:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.05534v1

VisDiff: SDF-Guided Polygon Generation for Visibility Reconstruction and Recognition

The capability to learn latent representations plays a key role in the effectiveness of recent machine learning methods. An active frontier in representation learning is understanding representations for combinatorial structures which may not admit well-behaved local neighborhoods or distance functions. For example, for polygons, slightly perturbing vertex locations might lead to significant changes in their combinatorial structure and may even lead to invalid polygons. In this paper, we investigate representations to capture the underlying combinatorial structures of polygons. Specifically, we study the open problem of Visibility Reconstruction: Given a visibility graph G, construct a polygon P whose visibility graph is G. We introduce VisDiff, a novel diffusion-based approach to reconstruct a polygon from its given visibility graph G. Our method first estimates the signed distance function (SDF) of P from G. Afterwards, it extracts ordered vertex locations that have the pairwise visibility relationship given by the edges of G. Our main insight is that going through the SDF significantly improves learning for reconstruction. In order to train VisDiff, we make two main contributions: (1) We design novel loss components for computing the visibility in a differentiable manner and (2) create a carefully curated dataset. We use this dataset to benchmark our method and achieve 21% improvement in F1-Score over standard methods. We also demonstrate effective generalization to out-of-distribution polygon types and show that learning a generative model allows us to sample the set of polygons with a given visibility graph. Finally, we extend our method to the related combinatorial problem of reconstruction from a triangulation. We achieve 95% classification accuracy of triangulation edges and a 4% improvement in Chamfer distance compared to current architectures.

Updated: 2024-10-07 22:17:11

标题: VisDiff：SDF引导的多边形生成用于可见性重建和识别

摘要: 学习潜在表示的能力在最近的机器学习方法的有效性中起着关键作用。在表示学习中的一个活跃前沿是理解组合结构的表示，这些结构可能不符合良好的局部邻域或距离函数。例如，对于多边形，稍微扰动顶点位置可能导致它们的组合结构发生显著变化，甚至可能导致无效的多边形。在本文中，我们研究了捕捉多边形潜在组合结构的表示。具体而言，我们研究了可见性重建的开放问题：给定一个可见性图G，构造一个可见性图为G的多边形P。我们介绍了VisDiff，一种基于扩散的新颖方法，用于从给定的可见性图G重建多边形。我们的方法首先从G估计P的带符号距离函数（SDF）。然后，它提取具有由G的边给出的成对可见性关系的有序顶点位置。我们的主要见解是通过SDF显着改进了重建的学习。为了训练VisDiff，我们做出了两个主要贡献：（1）我们设计了用于以可微方式计算可见性的新颖损失组件，（2）创建了一个精心策划的数据集。我们使用这个数据集来评估我们的方法，并在F1-Score上实现了21％的改进。我们还展示了对于分布外的多边形类型的有效泛化，并表明学习一个生成模型可以让我们对具有给定可见性图的多边形进行采样。最后，我们将我们的方法扩展到从三角剖分的相关组合问题的重建。与当前架构相比，我们实现了95％的三角剖分边分类准确率和Chamfer距离的4％改进。

更新时间: 2024-10-07 22:17:11

领域: cs.CG,cs.LG

下载: http://arxiv.org/abs/2410.05530v1

DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback

Restless multi-armed bandits (RMAB) has been widely used to model constrained sequential decision making problems, where the state of each restless arm evolves according to a Markov chain and each state transition generates a scalar reward. However, the success of RMAB crucially relies on the availability and quality of reward signals. Unfortunately, specifying an exact reward function in practice can be challenging and even infeasible. In this paper, we introduce Pref-RMAB, a new RMAB model in the presence of preference signals, where the decision maker only observes pairwise preference feedback rather than scalar reward from the activated arms at each decision epoch. Preference feedback, however, arguably contains less information than the scalar reward, which makes Pref-RMAB seemingly more difficult. To address this challenge, we present a direct online preference learning (DOPL) algorithm for Pref-RMAB to efficiently explore the unknown environments, adaptively collect preference data in an online manner, and directly leverage the preference feedback for decision-makings. We prove that DOPL yields a sublinear regret. To our best knowledge, this is the first algorithm to ensure $\tilde{\mathcal{O}}(\sqrt{T\ln T})$ regret for RMAB with preference feedback. Experimental results further demonstrate the effectiveness of DOPL.

Updated: 2024-10-07 22:14:20

标题: DOPL：具有偏好反馈的不定期赌博机的直接在线偏好学习

摘要: 不安的多臂赌博机（RMAB）已被广泛用于建模受限的顺序决策问题，其中每个不安的臂的状态根据马尔可夫链演化，每个状态转移产生一个标量奖励。然而，RMAB的成功关键取决于奖励信号的可用性和质量。不幸的是，在实践中确定一个精确的奖励函数可能具有挑战性，甚至是不可行的。在本文中，我们引入Pref-RMAB，一种新的RMAB模型，其中决策者仅在每个决策时期从激活的臂处观察到成对的偏好反馈，而不是标量奖励。然而，偏好反馈据称包含的信息比标量奖励少，这使得Pref-RMAB看起来更加困难。为了解决这一挑战，我们提出了一种用于Pref-RMAB的直接在线偏好学习（DOPL）算法，以有效地探索未知环境，在线自适应地收集偏好数据，并直接利用偏好反馈进行决策。我们证明DOPL产生次线性遗憾。据我们所知，这是第一个确保RMAB在偏好反馈下具有$\tilde{\mathcal{O}}(\sqrt{T\ln T})$遗憾的算法。实验结果进一步证明了DOPL的有效性。

更新时间: 2024-10-07 22:14:20

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2410.05527v1

Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem

This work investigates the in-context learning abilities of pretrained large language models (LLMs) when instructed to translate text from a low-resource language into a high-resource language as part of an automated machine translation pipeline. We conduct a set of experiments translating Southern Quechua to Spanish and examine the informativity of various types of context retrieved from a constrained database of digitized pedagogical materials (dictionaries and grammar lessons) and parallel corpora. Using both automatic and human evaluation of model output, we conduct ablation studies that manipulate (1) context type (morpheme translations, grammar descriptions, and corpus examples), (2) retrieval methods (automated vs. manual), and (3) model type. Our results suggest that even relatively small LLMs are capable of utilizing prompt context for zero-shot low-resource translation when provided a minimally sufficient amount of relevant linguistic information. However, the variable effects of context type, retrieval method, model type, and language-specific factors highlight the limitations of using even the best LLMs as translation systems for the majority of the world's 7,000+ languages and their speakers.

Updated: 2024-10-07 22:11:14

标题: 低资源翻译中LLM的缺陷：检索和理解都是问题

摘要: 这项工作研究了预训练的大型语言模型（LLMs）在作为自动化机器翻译流程的一部分被指示将文本从低资源语言翻译为高资源语言时的上下文学习能力。我们进行了一系列实验，将南凯丘亚语翻译成西班牙语，并研究了从受限制的数字化教材数据库（词典和语法课程）和平行语料库中检索的各种上下文的信息量。通过对模型输出的自动和人工评估，我们进行了消融研究，操纵（1）上下文类型（语素翻译、语法描述和语料示例），（2）检索方法（自动 vs. 手动），以及（3）模型类型。我们的结果表明，即使是相对较小的LLMs在提供了足够的相关语言信息的最低限度的情况下，也能够利用提示上下文进行零翻译低资源语言。然而，上下文类型、检索方法、模型类型和语言特定因素的可变效果突显了即使是最好的LLMs也存在一定限制，不能作为世界上7000多种语言和其使用者的翻译系统。

更新时间: 2024-10-07 22:11:14

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.15625v2

Scalar Field Prediction on Meshes Using Interpolated Multi-Resolution Convolutional Neural Networks

Scalar fields, such as stress or temperature fields, are often calculated in shape optimization and design problems in engineering. For complex problems where shapes have varying topology and cannot be parametrized, data-driven scalar field prediction can be faster than traditional finite element methods. However, current data-driven techniques to predict scalar fields are limited to a fixed grid domain, instead of arbitrary mesh structures. In this work, we propose a method to predict scalar fields on arbitrary meshes. It uses a convolutional neural network whose feature maps at multiple resolutions are interpolated to node positions before being fed into a multilayer perceptron to predict solutions to partial differential equations at mesh nodes. The model is trained on finite element von Mises stress fields, and once trained it can estimate stress values at each node on any input mesh. Two shape datasets are investigated, and the model has strong performance on both, with a median R-squared value of 0.91. We also demonstrate the model on a temperature field in a heat conduction problem, where its predictions have a median R-squared value of 0.99. Our method provides a potential flexible alternative to finite element analysis in engineering design contexts. Code and datasets are available online.

Updated: 2024-10-07 21:59:34

标题: 使用插值多分辨率卷积神经网络在网格上对标量场进行预测

摘要: 标量场，如应力或温度场，经常在工程中的形状优化和设计问题中进行计算。对于形状拓扑不同且无法参数化的复杂问题，数据驱动的标量场预测可能比传统的有限元方法更快。然而，目前用于预测标量场的数据驱动技术仅限于固定网格域，而不是任意网格结构。在这项工作中，我们提出了一种在任意网格上预测标量场的方法。它使用一个卷积神经网络，其多个分辨率的特征图在被馈送到多层感知器之前插值到节点位置，以预测网格节点处的偏微分方程的解。该模型在有限元von Mises应力场上进行训练，一旦训练完成，它可以在任何输入网格上估计每个节点的应力值。我们调查了两个形状数据集，并且该模型在两者上表现出良好性能，中位数R平方值为0.91。我们还在热传导问题中的温度场上展示了该模型，在此预测的中位数R平方值为0.99。我们的方法为工程设计环境中有限元分析提供了潜在的灵活替代方案。代码和数据集可在线获得。

更新时间: 2024-10-07 21:59:34

领域: cs.LG

下载: http://arxiv.org/abs/2410.05522v1

A Unified Theory of Quantum Neural Network Loss Landscapes

Classical neural networks with random initialization famously behave as Gaussian processes in the limit of many neurons, which allows one to completely characterize their training and generalization behavior. No such general understanding exists for quantum neural networks (QNNs), which -- outside of certain special cases -- are known to not behave as Gaussian processes when randomly initialized. We here prove that QNNs and their first two derivatives instead generally form what we call "Wishart processes," where certain algebraic properties of the network determine the hyperparameters of the process. This Wishart process description allows us to, for the first time: give necessary and sufficient conditions for a QNN architecture to have a Gaussian process limit; calculate the full gradient distribution, generalizing previously known barren plateau results; and calculate the local minima distribution of algebraically constrained QNNs. Our unified framework suggests a certain simple operational definition for the "trainability" of a given QNN model using a newly introduced, experimentally accessible quantity we call the "degrees of freedom" of the network architecture.

Updated: 2024-10-07 21:58:50

标题: 一个统一的量子神经网络损失景观理论

摘要: 经典神经网络在许多神经元的极限情况下，以随机初始化的方式表现为高斯过程，这使得我们能够完全描述它们的训练和泛化行为。对于量子神经网络（QNNs），没有类似的普遍理解存在，因为除了某些特殊情况外，它们在随机初始化时并不表现为高斯过程。我们在这里证明，QNNs及其前两个导数通常形成我们所称的“Wishart过程”，其中网络的某些代数属性决定了过程的超参数。这种Wishart过程描述使我们能够首次提出：给出QNN架构具有高斯过程极限的必要和充分条件；计算完整的梯度分布，推广先前已知的贫瘠高原结果；以及计算代数约束QNNs的局部极小值分布。我们的统一框架提出了一个针对给定QNN模型的“可训练性”的某种简单操作定义，使用我们称之为网络架构的“自由度”的新引入的实验可达量。

更新时间: 2024-10-07 21:58:50

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2408.11901v3

Using Deep Learning to Identify Initial Error Sensitivity for Interpretable ENSO Forecasts

We introduce an interpretable-by-design method, optimized model-analog, that integrates deep learning with model-analog forecasting which generates forecasts from similar initial climate states in a repository of model simulations. This hybrid framework employs a convolutional neural network to estimate state-dependent weights to identify initial analog states that lead to shadowing target trajectories. The advantage of our method lies in its inherent interpretability, offering insights into initial-error-sensitive regions through estimated weights and the ability to trace the physically-based evolution of the system through analog forecasting. We evaluate our approach using the Community Earth System Model Version 2 Large Ensemble to forecast the El Ni\~no-Southern Oscillation (ENSO) on a seasonal-to-annual time scale. Results show a 10% improvement in forecasting equatorial Pacific sea surface temperature anomalies at 9-12 months leads compared to the unweighted model-analog technique. Furthermore, our model demonstrates improvements in boreal winter and spring initialization when evaluated against a reanalysis dataset. Our approach reveals state-dependent regional sensitivity linked to various seasonally varying physical processes, including the Pacific Meridional Modes, equatorial recharge oscillator, and stochastic wind forcing. Additionally, forecasts of El Ni\~no and La Ni\~na are sensitive to different initial states: El Ni\~no forecasts are more sensitive to initial error in tropical Pacific sea surface temperature in boreal winter, while La Ni\~na forecasts are more sensitive to initial error in tropical Pacific zonal wind stress in boreal summer. This approach has broad implications for forecasting diverse climate phenomena, including regional temperature and precipitation, which are challenging for the model-analog approach alone.

Updated: 2024-10-07 21:47:33

标题: 利用深度学习识别初始误差敏感性，以解释ENSO预测

摘要: 我们引入了一种易于解释的设计方法，优化的模型模拟，将深度学习与模型模拟预测相结合，从模型模拟的存储库中生成类似初始气候状态的预测。这种混合框架采用卷积神经网络来估计状态依赖权重，以识别导致目标轨迹阴影的初始模拟状态。我们方法的优势在于其固有的可解释性，通过估计的权重提供对初始误差敏感区域的见解，并通过模拟预测追踪系统的基于物理的演变。我们使用Community Earth System Model Version 2 Large Ensemble评估我们的方法，以在季节至年度时间尺度上预测厄尔尼诺-南方涛动（ENSO）。结果显示，在预测赤道太平洋海表温度异常在9-12个月领先方面，与未加权模型模拟技术相比，预测改善了10%。此外，我们的模型在与再分析数据集对比时，在北方冬季和春季初始化方面表现出改善。我们的方法揭示了与各种季节变化的物理过程相关的状态依赖区域敏感性，包括太平洋经向模态、赤道充电振荡器和随机风力驱动。此外，厄尔尼诺和拉尼娜的预测对不同的初始状态敏感：厄尔尼诺的预测在北方冬季对热带太平洋海表温度的初始误差更敏感，而拉尼娜的预测在北方夏季对热带太平洋经向风应力的初始误差更敏感。这种方法对预测各种气候现象具有广泛的影响，包括区域温度和降水，这对于仅仅使用模型模拟方法来说是具有挑战性的。

更新时间: 2024-10-07 21:47:33

领域: physics.ao-ph,cs.LG

下载: http://arxiv.org/abs/2404.15419v4

Measuring Feature Dependency of Neural Networks by Collapsing Feature Dimensions in the Data Manifold

This paper introduces a new technique to measure the feature dependency of neural network models. The motivation is to better understand a model by querying whether it is using information from human-understandable features, e.g., anatomical shape, volume, or image texture. Our method is based on the principle that if a model is dependent on a feature, then removal of that feature should significantly harm its performance. A targeted feature is "removed" by collapsing the dimension in the data distribution that corresponds to that feature. We perform this by moving data points along the feature dimension to a baseline feature value while staying on the data manifold, as estimated by a deep generative model. Then we observe how the model's performance changes on the modified test data set, with the target feature dimension removed. We test our method on deep neural network models trained on synthetic image data with known ground truth, an Alzheimer's disease prediction task using MRI and hippocampus segmentations from the OASIS-3 dataset, and a cell nuclei classification task using the Lizard dataset.

Updated: 2024-10-07 21:43:23

标题: 通过在数据流形中折叠特征维度来衡量神经网络的特征依赖性

摘要: 本文介绍了一种新的技术，用于测量神经网络模型的特征依赖性。动机是通过查询模型是否使用来自人类可理解的特征信息，例如解剖形状、体积或图像纹理，来更好地理解模型。我们的方法基于这样一个原则：如果一个模型依赖于某个特征，那么删除该特征应该显著损害其性能。通过将数据分布中与该特征对应的维度折叠起来，来"移除"目标特征。我们通过将数据点沿着特征维度移动到基线特征值，同时保持在数据流形上，如通过深度生成模型估计的那样来执行此操作。然后我们观察模型在修改后的测试数据集上的性能如何改变，其中目标特征维度已被移除。我们在经过训练的深度神经网络模型上测试我们的方法，这些模型是基于已知地面真实数据的合成图像数据、使用来自OASIS-3数据集的MRI和海马分割的阿尔茨海默病预测任务，以及使用蜥蜴数据集进行细胞核分类任务。

更新时间: 2024-10-07 21:43:23

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2404.12341v2

Geometric Signatures of Compositionality Across a Language Model's Lifetime

Compositionality, the notion that the meaning of an expression is constructed from the meaning of its parts and syntactic rules, permits the infinite productivity of human language. For the first time, artificial language models (LMs) are able to match human performance in a number of compositional generalization tasks. However, much remains to be understood about the representational mechanisms underlying these abilities. We take a high-level geometric approach to this problem by relating the degree of compositionality in a dataset to the intrinsic dimensionality of its representations under an LM, a measure of feature complexity. We find not only that the degree of dataset compositionality is reflected in representations' intrinsic dimensionality, but that the relationship between compositionality and geometric complexity arises due to learned linguistic features over training. Finally, our analyses reveal a striking contrast between linear and nonlinear dimensionality, showing that they respectively encode formal and semantic aspects of linguistic composition.

Updated: 2024-10-07 21:35:13

标题: 跨语言模型生命周期中组合性的几何特征

摘要: 组合性是指表达式的含义是从其部分和句法规则的含义构建而成，这使得人类语言具有无限的生产力。首次，人工语言模型（LM）能够在许多组合泛化任务中匹配人类表现。然而，关于支撑这些能力的表征机制仍有许多待解。我们通过将数据集的组合性程度与LM下的表征的固有维度相关联，采取了一个高层几何方法来解决这个问题，这是特征复杂性的一个度量。我们发现不仅数据集的组合性程度反映在表征的固有维度中，而且组合性和几何复杂性之间的关系是由于在训练过程中学习的语言特征。最后，我们的分析揭示了线性和非线性维度之间的鲜明对比，显示它们分别编码语言组合的形式和语义方面。

更新时间: 2024-10-07 21:35:13

领域: cs.CL,cs.AI,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2410.01444v2

Toward General Object-level Mapping from Sparse Views with 3D Diffusion Priors

Object-level mapping builds a 3D map of objects in a scene with detailed shapes and poses from multi-view sensor observations. Conventional methods struggle to build complete shapes and estimate accurate poses due to partial occlusions and sensor noise. They require dense observations to cover all objects, which is challenging to achieve in robotics trajectories. Recent work introduces generative shape priors for object-level mapping from sparse views, but is limited to single-category objects. In this work, we propose a General Object-level Mapping system, GOM, which leverages a 3D diffusion model as shape prior with multi-category support and outputs Neural Radiance Fields (NeRFs) for both texture and geometry for all objects in a scene. GOM includes an effective formulation to guide a pre-trained diffusion model with extra nonlinear constraints from sensor measurements without finetuning. We also develop a probabilistic optimization formulation to fuse multi-view sensor observations and diffusion priors for joint 3D object pose and shape estimation. Our GOM system demonstrates superior multi-category mapping performance from sparse views, and achieves more accurate mapping results compared to state-of-the-art methods on the real-world benchmarks. We will release our code: https://github.com/TRAILab/GeneralObjectMapping.

Updated: 2024-10-07 21:33:30

标题: 朝向使用3D扩散先验信息从稀疏视角实现对物体级别的通用映射

摘要: Object-level mapping建立了一个物体的3D地图，其中包含了来自多视角传感器观测的详细形状和姿势。传统方法在建立完整形状和估计准确姿势方面存在困难，部分遮挡和传感器噪音是原因。它们需要密集观测来覆盖所有物体，这在机器人轨迹中是具有挑战性的。最近的工作引入了生成形状先验用于从稀疏视图进行物体级别的映射，但仅限于单一类别的物体。在这项工作中，我们提出了一个General Object-level Mapping系统，GOM，它利用3D扩散模型作为形状先验，支持多类别，并输出所有物体的纹理和几何的神经辐射场(NeRFs)。GOM包括一个有效的公式，用来指导一个预训练的扩散模型，通过传感器测量的额外非线性约束而无需微调。我们还开发了一个概率优化公式，用于融合多视图传感器观测和扩散先验，以进行联合的3D物体姿势和形状估计。我们的GOM系统展示了在稀疏视图下优越的多类别映射性能，并与实际基准测试中的最先进方法相比，实现了更准确的映射结果。我们将发布我们的代码：https://github.com/TRAILab/GeneralObjectMapping。

更新时间: 2024-10-07 21:33:30

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2410.05514v1

A Recurrent Neural Network Approach to the Answering Machine Detection Problem

In the field of telecommunications and cloud communications, accurately and in real-time detecting whether a human or an answering machine has answered an outbound call is of paramount importance. This problem is of particular significance during campaigns as it enhances service quality, efficiency and cost reduction through precise caller identification. Despite the significance of the field, it remains inadequately explored in the existing literature. This paper presents an innovative approach to answering machine detection that leverages transfer learning through the YAMNet model for feature extraction. The YAMNet architecture facilitates the training of a recurrent-based classifier, enabling real-time processing of audio streams, as opposed to fixed-length recordings. The results demonstrate an accuracy of over 96% on the test set. Furthermore, we conduct an in-depth analysis of misclassified samples and reveal that an accuracy exceeding 98% can be achieved with the integration of a silence detection algorithm, such as the one provided by FFmpeg.

Updated: 2024-10-07 21:28:09

标题: 一种用于应答机检测问题的循环神经网络方法

摘要: 在电信和云通信领域，准确实时地检测出站电话的接听方是人类还是自动接听机器至关重要。这个问题在营销活动中尤为重要，因为通过准确的呼叫者识别可以提高服务质量，提高效率并降低成本。尽管该领域的重要性，但现有文献对此尚未充分探讨。本文提出了一种创新的自动接听机器检测方法，利用YAMNet模型进行特征提取的迁移学习。YAMNet架构有助于训练基于循环的分类器，实现对音频流的实时处理，而不是固定长度的录音。结果表明，在测试集上的准确率超过96%。此外，我们对误分类样本进行了深入分析，并发现通过集成诸如FFmpeg提供的静音检测算法，可以实现超过98%的准确率。

更新时间: 2024-10-07 21:28:09

领域: cs.SD,cs.LG,cs.MM,eess.AS

下载: http://arxiv.org/abs/2410.08235v1

Structural Constraints for Physics-augmented Learning

When the physics is wrong, physics-informed machine learning becomes physics-misinformed machine learning. A powerful black-box model should not be able to conceal misconceived physics. We propose two criteria that can be used to assert integrity that a hybrid (physics plus black-box) model: 0) the black-box model should be unable to replicate the physical model, and 1) any best-fit hybrid model has the same physical parameter as a best-fit standalone physics model. We demonstrate them for a sample nonlinear mechanical system approximated by its small-signal linearization.

Updated: 2024-10-07 21:25:49

标题: 物理学增强学习的结构约束

摘要: 当物理学错误时，基于物理学的机器学习就变成了基于物理学错误的机器学习。一个强大的黑盒模型不应该能够掩盖误解的物理学。我们提出了两个标准，可以用来确保一个混合（物理加黑盒）模型的完整性：0）黑盒模型应该无法复制物理模型，1）任何最佳拟合的混合模型都具有与最佳拟合的独立物理模型相同的物理参数。我们以一个通过其小信号线性化近似的样本非线性机械系统来展示它们。

更新时间: 2024-10-07 21:25:49

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2410.05507v1

Privacy Vulnerabilities in Marginals-based Synthetic Data

When acting as a privacy-enhancing technology, synthetic data generation (SDG) aims to maintain a resemblance to the real data while excluding personally-identifiable information. Many SDG algorithms provide robust differential privacy (DP) guarantees to this end. However, we show that the strongest class of SDG algorithms--those that preserve \textit{marginal probabilities}, or similar statistics, from the underlying data--leak information about individuals that can be recovered more efficiently than previously understood. We demonstrate this by presenting a novel membership inference attack, MAMA-MIA, and evaluate it against three seminal DP SDG algorithms: MST, PrivBayes, and Private-GSD. MAMA-MIA leverages knowledge of which SDG algorithm was used, allowing it to learn information about the hidden data more accurately, and orders-of-magnitude faster, than other leading attacks. We use MAMA-MIA to lend insight into existing SDG vulnerabilities. Our approach went on to win the first SNAKE (SaNitization Algorithm under attacK ... $\varepsilon$) competition.

Updated: 2024-10-07 21:24:22

标题: 基于边际的合成数据中的隐私漏洞

摘要: 在作为增强隐私技术时，合成数据生成（SDG）旨在在排除个人可识别信息的同时保持与真实数据的相似性。许多SDG算法为此提供强大的差分隐私（DP）保证。然而，我们展示了保留边际概率或类似统计数据的最强SDG算法类别泄露有关个体的信息，可以比先前理解的更有效地恢复。我们通过提出一种新颖的成员推理攻击MAMA-MIA来证明这一点，并对三种开创性的DP SDG算法进行评估：MST、PrivBayes和Private-GSD。MAMA-MIA利用对使用的SDG算法的了解，使其能够更准确地学习有关隐藏数据的信息，比其他领先的攻击快数个数量级。我们使用MAMA-MIA来深入了解现有SDG的漏洞。我们的方法赢得了第一届SNAKE（受攻击下的消毒算法… $\varepsilon$）比赛。

更新时间: 2024-10-07 21:24:22

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2410.05506v1

Predicting the Geolocation of Tweets Using transformer models on Customized Data

This research is aimed to solve the tweet/user geolocation prediction task and provide a flexible methodology for the geotagging of textual big data. The suggested approach implements neural networks for natural language processing (NLP) to estimate the location as coordinate pairs (longitude, latitude) and two-dimensional Gaussian Mixture Models (GMMs). The scope of proposed models has been finetuned on a Twitter dataset using pretrained Bidirectional Encoder Representations from Transformers (BERT) as base models. Performance metrics show a median error of fewer than 30 km on a worldwide-level, and fewer than 15 km on the US-level datasets for the models trained and evaluated on text features of tweets' content and metadata context. Our source code and data are available at https://github.com/K4TEL/geo-twitter.git

Updated: 2024-10-07 21:24:15

标题: 使用转换器模型在定制数据上预测推文的地理位置

摘要: 这项研究旨在解决推文/用户地理位置预测任务，并为文本大数据的地理标记提供灵活的方法论。建议的方法实现了神经网络用于自然语言处理（NLP）以估计位置为坐标对（经度，纬度）和二维高斯混合模型（GMM）。拟议模型的范围已经在Twitter数据集上进行了微调，使用预训练的双向编码器表示转换器（BERT）作为基础模型。性能指标显示，在全球范围内，经过训练和评估推文内容和元数据上的文本特征的模型的中位误差少于30公里，在美国范围内的数据集上少于15公里。我们的源代码和数据可在https://github.com/K4TEL/geo-twitter.git上找到。

更新时间: 2024-10-07 21:24:15

领域: cs.CL,cs.AI,cs.IR,cs.LG,68T50,I.2.7

下载: http://arxiv.org/abs/2303.07865v5

FusionDTI: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction

Predicting drug-target interaction (DTI) is critical in the drug discovery process. Despite remarkable advances in recent DTI models through the integration of representations from diverse drug and target encoders, such models often struggle to capture the fine-grained interactions between drugs and protein, i.e. the binding of specific drug atoms (or substructures) and key amino acids of proteins, which is crucial for understanding the binding mechanisms and optimising drug design. To address this issue, this paper introduces a novel model, called FusionDTI, which uses a token-level Fusion module to effectively learn fine-grained information for Drug-Target Interaction. In particular, our FusionDTI model uses the SELFIES representation of drugs to mitigate sequence fragment invalidation and incorporates the structure-aware (SA) vocabulary of target proteins to address the limitation of amino acid sequences in structural information, additionally leveraging pre-trained language models extensively trained on large-scale biomedical datasets as encoders to capture the complex information of drugs and targets. Experiments on three well-known benchmark datasets show that our proposed FusionDTI model achieves the best performance in DTI prediction compared with seven existing state-of-the-art baselines. Furthermore, our case study indicates that FusionDTI could highlight the potential binding sites, enhancing the explainability of the DTI prediction.

Updated: 2024-10-07 21:22:58

标题: FusionDTI：利用令牌级融合进行药物靶标相互作用的细粒度结合发现

摘要: 预测药物-靶标相互作用（DTI）在药物发现过程中至关重要。尽管最近DTI模型在整合来自不同药物和靶标编码器的表示方面取得了显著进展，但这些模型通常很难捕捉药物和蛋白质之间的细粒度相互作用，即特定药物原子（或亚结构）与蛋白质的关键氨基酸的结合，这对于理解结合机制和优化药物设计至关重要。为解决这一问题，本文引入了一种名为FusionDTI的新模型，该模型使用令牌级融合模块有效地学习药物-靶标相互作用的细粒度信息。特别是，我们的FusionDTI模型使用SELFIES表示药物以减轻序列片段的无效化，并结合了靶蛋白的结构感知（SA）词汇以解决氨基酸序列在结构信息方面的限制，此外，还广泛利用在大规模生物医学数据集上进行了充分训练的预训练语言模型作为编码器，以捕获药物和靶标的复杂信息。对三个知名基准数据集的实验证明，我们提出的FusionDTI模型在DTI预测中表现出比七种现有的最先进基线模型更好的性能。此外，我们的案例研究表明，FusionDTI可以突出显示潜在的结合位点，增强了DTI预测的可解释性。

更新时间: 2024-10-07 21:22:58

领域: q-bio.QM,cs.AI,cs.LG,q-bio.BM

下载: http://arxiv.org/abs/2406.01651v3

Cybersecurity Threat Hunting and Vulnerability Analysis Using a Neo4j Graph Database of Open Source Intelligence

Open source intelligence is a powerful tool for cybersecurity analysts to gather information both for analysis of discovered vulnerabilities and for detecting novel cybersecurity threats and exploits. However the scale of information that is relevant for information security on the internet is always increasing, and is intractable for analysts to parse comprehensively. Therefore methods of condensing the available open source intelligence, and automatically developing connections between disparate sources of information, is incredibly valuable. In this research, we present a system which constructs a Neo4j graph database formed by shared connections between open source intelligence text including blogs, cybersecurity bulletins, news sites, antivirus scans, social media posts (e.g., Reddit and Twitter), and threat reports. These connections are comprised of possible indicators of compromise (e.g., IP addresses, domains, hashes, email addresses, phone numbers), information on known exploits and techniques (e.g., CVEs and MITRE ATT&CK Technique ID's), and potential sources of information on cybersecurity exploits such as twitter usernames. The construction of the database of potential IoCs is detailed, including the addition of machine learning and metadata which can be used for filtering of the data for a specific domain (for example a specific natural language) when needed. Examples of utilizing the graph database for querying connections between known malicious IoCs and open source intelligence documents, including threat reports, are shown. We show three specific examples of interesting connections found in the graph database; the connections to a known exploited CVE, a known malicious IP address, and a malware hash signature.

Updated: 2024-10-07 21:15:40

标题: 使用Neo4j图数据库的开源情报进行网络安全威胁追踪和漏洞分析

摘要: 开源情报是网络安全分析人员收集信息的强大工具，用于分析发现的漏洞以及检测新型网络安全威胁和利用方式。然而，互联网上与信息安全相关的信息规模不断增加，对分析人员来说很难全面解析。因此，压缩可用的开源情报，并自动建立不同信息源之间的连接的方法具有极大价值。在本研究中，我们提出了一个系统，该系统构建了一个Neo4j图数据库，其中包括博客、网络安全公告、新闻网站、杀毒软件扫描、社交媒体帖子（例如Reddit和Twitter）以及威胁报告等开源情报文本之间的共享连接。这些连接由可能的妥协指标（例如IP地址、域名、哈希值、电子邮件地址、电话号码）、已知利用和技术信息（例如CVE和MITRE ATT&CK技术ID）以及关于网络安全利用的潜在信息源（例如Twitter用户名）组成。详细介绍了可能的IoC数据库的构建，包括添加可用于在需要时针对特定领域（例如特定自然语言）进行数据过滤的机器学习和元数据。展示了利用图数据库查询已知恶意IoC与开源情报文档之间的连接的示例，包括威胁报告。展示了在图数据库中找到的三个感兴趣连接的具体示例；与已知被利用的CVE、已知恶意IP地址和恶意软件哈希签名的连接。

更新时间: 2024-10-07 21:15:40

领域: cs.CR

下载: http://arxiv.org/abs/2301.12013v2

Probabilistic Kolmogorov-Arnold Network

The Kolmogorov-Arnold network (KAN) is a regression model that is based on a representation of an arbitrary continuous multivariate function by a composition of functions of a single variable. Experimentally-obtained datasets for regression models typically include uncertainties, which in some cases, cannot be neglected. The conventional way to account for the latter is to model confidence intervals of the systems' outputs in addition to the expected values of the outputs. However, such information may be insufficient, and in some cases, researchers aim to obtain probability distributions of the outputs. The present paper proposes a method for estimating probability distributions of the outputs in the case of aleatoric uncertainty (i.e. for systems that produce different outputs each time an experiment is executed with the same inputs). The suggested approach covers input-dependent probability distributions of the outputs and is capable of capturing the multi-modality, as well as the variation of the distribution type with the inputs. Although the method is applicable to any regression model, the present paper combines it with KANs, since the specific structure of KANs leads to computationally-efficient models' construction. The source code is available online.

Updated: 2024-10-07 21:14:43

标题: 概率科尔莫哥洛夫-阿诺德网络

摘要: Kolmogorov-Arnold网络（KAN）是一种基于任意连续多变量函数表示的回归模型，通过单变量函数的组合来表示。实验获得的回归模型数据集通常包含不确定性，在某些情况下，这种不确定性是不可忽略的。传统的解决方法是在期望输出值之外建模系统输出的置信区间。然而，这样的信息可能不足，有些情况下，研究人员希望获得输出的概率分布。本文提出了一种方法，用于估计输出的概率分布，在随机不确定性的情况下（即对于每次使用相同输入执行实验时产生不同输出的系统）。建议的方法涵盖了输入相关的输出概率分布，并能够捕捉多模态性，以及随输入变化的分布类型变化。虽然该方法适用于任何回归模型，但本文将其与KAN结合，因为KAN的特定结构导致了计算效率高的模型构建。源代码可在网上获得。

更新时间: 2024-10-07 21:14:43

领域: cs.LG

下载: http://arxiv.org/abs/2104.01714v7

Residual Kolmogorov-Arnold Network for Enhanced Deep Learning

Despite the strong performance in many computer vision tasks, Convolutional Neural Networks (CNNs) can sometimes struggle to efficiently capture long-range, complex non-linear dependencies in deeper layers of the network. We address this limitation by introducing Residual KAN, which incorporates the Kolmogorov-Arnold Network (KAN) within the CNN framework as a residual component. Our approach uses Chebyshev polynomials as the basis for KAN convolutions that enables more expressive and adaptive feature representations while maintaining computational efficiency. The proposed RKAN blocks, when integrated into established architectures such as ResNet and DenseNet, offer consistent improvements over the baseline models on various well-known benchmarks. Our results demonstrate the potential of RKAN to enhance the capabilities of deep CNNs in visual data.

Updated: 2024-10-07 21:12:32

标题: 残余科尔莫戈洛夫-阿诺德网络用于增强深度学习

摘要: 尽管卷积神经网络（CNNs）在许多计算机视觉任务中表现出色，但有时可能会在网络的深层中难以高效地捕捉长程、复杂的非线性依赖关系。我们通过引入残差KAN（Residual KAN）来解决这一限制，将科尔莫戈洛夫-阿诺尔德网络（KAN）作为残差组件融入CNN框架中。我们的方法使用切比雪夫多项式作为KAN卷积的基础，可以实现更具表现力和适应性的特征表示，同时保持计算效率。提出的RKAN块，当集成到已建立的架构如ResNet和DenseNet中时，在各种知名基准上均能持续改进基准模型。我们的结果表明，RKAN有望提升深度CNN在视觉数据中的能力。

更新时间: 2024-10-07 21:12:32

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05500v1

Unitary convolutions for learning on graphs and groups

Data with geometric structure is ubiquitous in machine learning often arising from fundamental symmetries in a domain, such as permutation-invariance in graphs and translation-invariance in images. Group-convolutional architectures, which encode symmetries as inductive bias, have shown great success in applications, but can suffer from instabilities as their depth increases and often struggle to learn long range dependencies in data. For instance, graph neural networks experience instability due to the convergence of node representations (over-smoothing), which can occur after only a few iterations of message-passing, reducing their effectiveness in downstream tasks. Here, we propose and study unitary group convolutions, which allow for deeper networks that are more stable during training. The main focus of the paper are graph neural networks, where we show that unitary graph convolutions provably avoid over-smoothing. Our experimental results confirm that unitary graph convolutional networks achieve competitive performance on benchmark datasets compared to state-of-the-art graph neural networks. We complement our analysis of the graph domain with the study of general unitary convolutions and analyze their role in enhancing stability in general group convolutional architectures.

Updated: 2024-10-07 21:09:14

标题: 图和群上的单射卷积学习

摘要: 几何结构的数据在机器学习中是普遍存在的，通常源于领域中的基本对称性，比如图中的置换不变性和图像中的平移不变性。将对称性编码为归纳偏见的群卷积架构在应用中取得了巨大成功，但随着深度增加，可能会出现不稳定性，并且往往难以学习数据中的长程依赖关系。例如，图神经网络由于节点表示的收敛（过度平滑）而出现不稳定性，这可能仅在几次迭代后就会发生，从而降低它们在下游任务中的效果。在这里，我们提出并研究了酉群卷积，它允许更深层次的网络在训练过程中更加稳定。本文的主要重点是图神经网络，我们展示了酉图卷积可以明显避免过度平滑。我们的实验结果证实，与最先进的图神经网络相比，酉图卷积网络在基准数据集上取得了竞争性能。我们对图领域的分析补充了对一般酉卷积的研究，并分析了它们在增强一般群卷积架构的稳定性中的作用。

更新时间: 2024-10-07 21:09:14

领域: cs.LG

下载: http://arxiv.org/abs/2410.05499v1

Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation

Adaptive brain stimulation can treat neurological conditions such as Parkinson's disease and post-stroke motor deficits by influencing abnormal neural activity. Because of patient heterogeneity, each patient requires a unique stimulation policy to achieve optimal neural responses. Model-free reinforcement learning (MFRL) holds promise in learning effective policies for a variety of similar control tasks, but is limited in domains like brain stimulation by a need for numerous costly environment interactions. In this work we introduce Coprocessor Actor Critic, a novel, model-based reinforcement learning (MBRL) approach for learning neural coprocessor policies for brain stimulation. Our key insight is that coprocessor policy learning is a combination of learning how to act optimally in the world and learning how to induce optimal actions in the world through stimulation of an injured brain. We show that our approach overcomes the limitations of traditional MFRL methods in terms of sample efficiency and task success and outperforms baseline MBRL approaches in a neurologically realistic model of an injured brain.

Updated: 2024-10-07 21:07:33

标题: 协处理器演员评论家：一种基于模型的适应性大脑刺激强化学习方法

摘要: 自适应脑刺激可以通过影响异常神经活动来治疗帕金森病和中风后的运动缺陷等神经疾病。由于患者的异质性，每位患者都需要一个独特的刺激策略才能实现最佳的神经反应。无模型强化学习（MFRL）在学习类似控制任务的有效策略方面具有潜力，但在脑刺激等领域受到需要大量昂贵的环境交互的限制。在这项工作中，我们引入了Coprocessor Actor Critic，一种新颖的、基于模型的强化学习（MBRL）方法，用于学习神经协处理器策略以进行脑刺激。我们的关键洞察是，协处理器策略学习是学习如何在世界中行动最佳和学习如何通过刺激受伤的大脑诱导出最佳行动的组合。我们展示了我们的方法在样本效率和任务成功方面克服了传统MFRL方法的限制，并在受损大脑的神经学实际模型中表现优于基线MBRL方法。

更新时间: 2024-10-07 21:07:33

领域: cs.LG,cs.AI,cs.HC

下载: http://arxiv.org/abs/2406.06714v2

Intuitions of Compromise: Utilitarianism vs. Contractualism

What is the best compromise in a situation where different people value different things? The most commonly accepted method for answering this question -- in fields across the behavioral and social sciences, decision theory, philosophy, and artificial intelligence development -- is simply to add up utilities associated with the different options and pick the solution with the largest sum. This ``utilitarian'' approach seems like the obvious, theory-neutral way of approaching the problem. But there is an important, though often-ignored, alternative: a ``contractualist'' approach, which advocates for an agreement-driven method of deciding. Remarkably, no research has presented empirical evidence directly comparing the intuitive plausibility of these two approaches. In this paper, we systematically explore the proposals suggested by each algorithm (the ``Utilitarian Sum'' and the contractualist ''Nash Product''), using a paradigm that applies those algorithms to aggregating preferences across groups in a social decision-making context. While the dominant approach to value aggregation up to now has been utilitarian, we find that people strongly prefer the aggregations recommended by the contractualist algorithm. Finally, we compare the judgments of large language models (LLMs) to that of our (human) participants, finding important misalignment between model and human preferences.

Updated: 2024-10-07 21:05:57

标题: 折衷的直觉：功利主义 vs. 契约主义

摘要: 在不同人价值不同事物的情况下，最好的妥协是什么？回答这个问题最常被接受的方法是将与不同选项相关联的效用相加，选择总和最大的解决方案。这种“功利主义”方法似乎是处理问题的明显、理论中立的方式。但有一个重要的、经常被忽视的替代方法：一个“契约主义”方法，提倡一种基于协议的决策方法。值得注意的是，没有研究直接比较这两种方法的直观合理性的经验证据。在本文中，我们系统地探讨了每种算法提出的建议（“功利主义总和”和契约主义“纳什乘积”），使用一种将这些算法应用于社会决策情境中的群体偏好聚合的范式。尽管迄今为止价值聚合的主导方法一直是功利主义，我们发现人们强烈偏好契约主义算法推荐的聚合。最后，我们将大型语言模型（LLMs）的判断与我们（人类）参与者的判断进行比较，发现模型与人类偏好之间存在重要的不一致。

更新时间: 2024-10-07 21:05:57

领域: cs.AI,cs.GT

下载: http://arxiv.org/abs/2410.05496v1

Transformers learn variable-order Markov chains in-context

Large language models have demonstrated impressive in-context learning (ICL) capability. However, it is still unclear how the underlying transformers accomplish it, especially in more complex scenarios. Toward this goal, several recent works studied how transformers learn fixed-order Markov chains (FOMC) in context, yet natural languages are more suitably modeled by variable-order Markov chains (VOMC), i.e., context trees (CTs). In this work, we study the ICL of VOMC by viewing language modeling as a form of data compression and focus on small alphabets and low-order VOMCs. This perspective allows us to leverage mature compression algorithms, such as context-tree weighting (CTW) and prediction by partial matching (PPM) algorithms as baselines, the former of which is Bayesian optimal for a class of CTW priors. We empirically observe a few phenomena: 1) Transformers can indeed learn to compress VOMC in-context, while PPM suffers significantly; 2) The performance of transformers is not very sensitive to the number of layers, and even a two-layer transformer can learn in-context quite well; and 3) Transformers trained and tested on non-CTW priors can significantly outperform the CTW algorithm. To explain these phenomena, we analyze the attention map of the transformers and extract two mechanisms, on which we provide two transformer constructions: 1) A construction with $D+2$ layers that can mimic the CTW algorithm accurately for CTs of maximum order $D$, 2) A 2-layer transformer that utilizes the feed-forward network for probability blending. One distinction from the FOMC setting is that a counting mechanism appears to play an important role. We implement these synthetic transformer layers and show that such hybrid transformers can match the ICL performance of transformers, and more interestingly, some of them can perform even better despite the much-reduced parameter sets.

Updated: 2024-10-07 21:04:53

标题: 变压器在上下文中学习可变阶马尔可夫链

摘要: 大型语言模型展示了令人印象深刻的上下文学习（ICL）能力。然而，尚不清楚基础变压器是如何实现这一点的，尤其是在更复杂的情境下。为实现这一目标，最近有几项研究探讨了变压器如何学习固定顺序马尔可夫链（FOMC）在上下文中的应用，但自然语言更适合由可变顺序马尔可夫链（VOMC），即上下文树（CT）来建模。在这项工作中，我们将语言建模视为一种数据压缩形式，重点关注小字母表和低阶VOMC。这种观点使我们能够利用成熟的压缩算法，如上下文树加权（CTW）和部分匹配预测（PPM）算法作为基线，前者是一类CTW先验的贝叶斯最优算法。我们经验观察到几个现象：1）变压器确实可以学习在上下文中压缩VOMC，而PPM则遭受显著影响；2）变压器的性能对层数并不十分敏感，甚至两层变压器也可以很好地学习上下文；3）在非CTW先验训练和测试的变压器可以显著胜过CTW算法。为解释这些现象，我们分析了变压器的注意力图，并提取了两种机制，我们提供了两种变压器构造：1）一种具有$D+2$层的构造，可以准确模拟最大阶$D$的CTW算法，2）一个利用前馈网络进行概率融合的2层变压器。与FOMC设置的区别在于计数机制似乎起着重要作用。我们实现了这些合成变压器层，并展示了这种混合变压器可以达到变压器的ICL性能，并更有趣的是，尽管参数集大大减少，其中一些层甚至可以表现得更好。

更新时间: 2024-10-07 21:04:53

领域: cs.LG,cs.IT,math.IT

下载: http://arxiv.org/abs/2410.05493v1

Pre-Ictal Seizure Prediction Using Personalized Deep Learning

Introduction: Approximately 23 million or 30% of epilepsy patients worldwide suffer from drug-resistant epilepsy (DRE). The unpredictability of seizure occurrences, which causes safety issues as well as social concerns, restrict the lifestyles of DRE patients. Surgical solutions and EEG-based solutions are very expensive, unreliable, invasive or impractical. The goal of this research was to employ improved technologies and methods to epilepsy patient physiological data and predict seizures up to two hours before onset, enabling non-invasive, affordable seizure prediction for DRE patients. Methods: This research used a 1D Convolutional Neural Network-Based Bidirectional Long Short-Term Memory network that was trained on a diverse set of epileptic patient physiological data to predict seizures. Transfer learning was further utilized to personalize and optimize predictions for specific patients. Clinical data was retrospectively obtained for nine epilepsy patients via wearable devices over a period of about three to five days from a prospectively maintained database. The physiological data included 54 seizure occurrences and included heart rate, blood volume pulse, accelerometry, body temperature, and electrodermal activity. Results and Conclusion: A general deep-learning model trained on the physiological data with randomly sampled test data achieved an accuracy of 91.94%. However, such a generalized deep learning model had varied performances on data from unseen patients. When the general model was personalized (further trained) with patient-specific data, the personalized model achieved significantly improved performance with accuracies as high as 97%. This preliminary research shows that patient-specific personalization may be a viable approach to achieve affordable, non-invasive seizure prediction that can improve the quality of life for DRE patients.

Updated: 2024-10-07 21:04:41

标题: 使用个性化深度学习预测癫痫前发作

摘要: 简介：全球约有2,300万或30%的癫痫患者患有难治性癫痫(DRE)。癫痫发作的不可预测性导致安全问题和社会关注，限制了难治性癫痫患者的生活方式。手术解决方案和基于脑电图的解决方案非常昂贵、不可靠、侵入性或不切实际。本研究旨在利用改进的技术和方法对癫痫患者的生理数据进行分析，并在发作前最多两小时预测癫痫发作，实现对难治性癫痫患者的非侵入式、经济实惠的癫痫预测。方法：本研究使用基于1D卷积神经网络的双向长短时记忆网络，对多样化的癫痫患者生理数据进行训练以预测癫痫发作。进一步利用迁移学习对预测结果进行个性化和优化，以适应特定患者。通过可穿戴设备在一个约三到五天的时间里回顾性地获取了九名癫痫患者的临床数据，并从一个预先维护的数据库中获取。生理数据包括54次癫痫发作，包括心率、血容量脉搏、加速度计、体温和皮肤电活动。结果和结论：一个通用的深度学习模型在生理数据上进行训练，随机抽样测试数据达到了91.94%的准确率。然而，这样一个通用的深度学习模型在未见过的患者数据上表现不同。当通用模型个性化（进一步训练）特定患者的数据时，个性化模型取得了明显改善的性能，准确率高达97%。这项初步研究显示，患者特定的个性化可能是实现经济实惠、非侵入式癫痫预测的可行途径，可以改善难治性癫痫患者的生活质量。

更新时间: 2024-10-07 21:04:41

领域: cs.LG,I.2.6

下载: http://arxiv.org/abs/2410.05491v1

What makes your model a low-empathy or warmth person: Exploring the Origins of Personality in LLMs

Large language models (LLMs) have demonstrated remarkable capabilities in generating human-like text and exhibiting personality traits similar to those in humans. However, the mechanisms by which LLMs encode and express traits such as agreeableness and impulsiveness remain poorly understood. Drawing on the theory of social determinism, we investigate how long-term background factors, such as family environment and cultural norms, interact with short-term pressures like external instructions, shaping and influencing LLMs' personality traits. By steering the output of LLMs through the utilization of interpretable features within the model, we explore how these background and pressure factors lead to changes in the model's traits without the need for further fine-tuning. Additionally, we suggest the potential impact of these factors on model safety from the perspective of personality.

Updated: 2024-10-07 21:02:34

标题: 什么造就了您的模型是一个缺乏同理心或温暖的人：探究LLMs中个性的起源

摘要: 大型语言模型（LLMs）已经展示出在生成类似人类文本和展现类似于人类的个性特征方面具有显著能力。然而，LLMs编码和表达诸如宜人性和冲动性等特质的机制仍然知之甚少。基于社会决定论的理论，我们研究了长期背景因素（如家庭环境和文化规范）如何与短期压力（如外部指令）相互作用，塑造和影响LLMs的个性特征。通过利用模型内部可解释特征引导LLMs的输出，我们探讨了这些背景和压力因素如何导致模型特质的变化，而无需进一步的微调。此外，我们从个性的角度提出了这些因素对模型安全性的潜在影响。

更新时间: 2024-10-07 21:02:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.10863v1

A Bias-Accuracy-Privacy Trilemma for Statistical Estimation

Differential privacy (DP) is a rigorous notion of data privacy, used for private statistics. The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. This tradeoff is inherent: we prove that no algorithm can simultaneously have low bias, low error, and low privacy loss for arbitrary distributions. Additionally, we show that under strong notions of DP (i.e., pure or concentrated DP), unbiased mean estimation is impossible, even if we assume that the data is sampled from a Gaussian. On the positive side, we show that unbiased mean estimation is possible under a more permissive notion of differential privacy (approximate DP) if we assume that the distribution is symmetric.

Updated: 2024-10-07 20:59:15

标题: 一种统计估计的偏差-准确性-隐私三难问题

摘要: 差分隐私(DP)是一种严格的数据隐私概念，用于私密统计。不同隐私均值估计的经典算法是先将样本剪切到有界范围，然后向它们的经验均值添加噪声。剪切控制了灵敏度，因此控制了我们为了隐私而添加的噪声的方差。但剪切也会引入统计偏差。这种权衡是固有的：我们证明没有算法可以同时具有低偏差、低误差和低隐私损失，适用于任意分布。此外，我们展示了在强隐私概念(即纯净或集中的DP)下，即使假设数据来自高斯分布，无偏均值估计也是不可能的。积极的一面是，我们展示了在更宽松的差分隐私概念(近似DP)下，如果假设分布是对称的，则可以实现无偏均值估计。

更新时间: 2024-10-07 20:59:15

领域: math.ST,cs.CR,cs.DS,stat.ML,stat.TH

下载: http://arxiv.org/abs/2301.13334v3

Unification of popular artificial neural network activation functions

We present a unified representation of the most popular neural network activation functions. Adopting Mittag-Leffler functions of fractional calculus, we propose a flexible and compact functional form that is able to interpolate between various activation functions and mitigate common problems in training neural networks such as vanishing and exploding gradients. The presented gated representation extends the scope of fixed-shape activation functions to their adaptive counterparts whose shape can be learnt from the training data. The derivatives of the proposed functional form can also be expressed in terms of Mittag-Leffler functions making it a suitable candidate for gradient-based backpropagation algorithms. By training multiple neural networks of different complexities on various datasets with different sizes, we demonstrate that adopting a unified gated representation of activation functions offers a promising and affordable alternative to individual built-in implementations of activation functions in conventional machine learning frameworks.

Updated: 2024-10-07 20:45:38

标题: 流行人工神经网络激活函数的统一化

摘要: 我们提出了最流行的神经网络激活函数的统一表示。采用分数微积分的Mittag-Leffler函数，我们提出了一种灵活而紧凑的功能形式，能够在各种激活函数之间插值，并减轻训练神经网络中常见的问题，如消失和爆炸梯度。所提出的门控表示将固定形状激活函数的范围扩展到其自适应对应物，其形状可以从训练数据中学习。所提出的功能形式的导数也可以用Mittag-Leffler函数的形式表示，使其成为基于梯度的反向传播算法的合适候选。通过在不同大小的各种数据集上训练多个不同复杂度的神经网络，我们证明采用统一的门控激活函数表示提供了一个有前途且经济实惠的选择，可以替代传统机器学习框架中激活函数的单独内置实现。

更新时间: 2024-10-07 20:45:38

领域: cs.LG,cs.AI,cs.NE,math.FA,I.1.1; I.2.10; I.4.9

下载: http://arxiv.org/abs/2302.11007v3

Neural Networks Decoded: Targeted and Robust Analysis of Neural Network Decisions via Causal Explanations and Reasoning

Despite their success and widespread adoption, the opaque nature of deep neural networks (DNNs) continues to hinder trust, especially in critical applications. Current interpretability solutions often yield inconsistent or oversimplified explanations, or require model changes that compromise performance. In this work, we introduce TRACER, a novel method grounded in causal inference theory designed to estimate the causal dynamics underpinning DNN decisions without altering their architecture or compromising their performance. Our approach systematically intervenes on input features to observe how specific changes propagate through the network, affecting internal activations and final outputs. Based on this analysis, we determine the importance of individual features, and construct a high-level causal map by grouping functionally similar layers into cohesive causal nodes, providing a structured and interpretable view of how different parts of the network influence the decisions. TRACER further enhances explainability by generating counterfactuals that reveal possible model biases and offer contrastive explanations for misclassifications. Through comprehensive evaluations across diverse datasets, we demonstrate TRACER's effectiveness over existing methods and show its potential for creating highly compressed yet accurate models, illustrating its dual versatility in both understanding and optimizing DNNs.

Updated: 2024-10-07 20:44:53

标题: 神经网络解码：通过因果解释和推理对神经网络决策进行有针对性和稳健性分析

摘要: 尽管深度神经网络（DNN）取得了成功并被广泛采用，但其不透明的特性仍然阻碍了信任，特别是在关键应用中。当前的可解释性解决方案通常产生不一致或过于简化的解释，或需要改变模型以牺牲性能。在这项工作中，我们介绍了一种新颖的方法TRACER，该方法基于因果推断理论，旨在估计支持DNN决策的因果动态，而无需改变其架构或牺牲其性能。我们的方法系统地对输入特征进行干预，观察特定变化如何传播到网络中，影响内部激活和最终输出。根据这一分析，我们确定了个别特征的重要性，并通过将功能类似的层组合成连贯的因果节点构建高级因果图，提供了一个结构化和可解释的视角，展示了网络不同部分如何影响决策。TRACER通过生成反事实来增强可解释性，揭示可能的模型偏见，并为错误分类提供对比解释。通过对多个数据集的全面评估，我们展示了TRACER相对于现有方法的有效性，并展示了其在理解和优化DNN方面的潜力，说明了其在创建高度压缩但准确的模型方面的双重灵活性。

更新时间: 2024-10-07 20:44:53

领域: cs.LG,cs.AI,stat.ME

下载: http://arxiv.org/abs/2410.05484v1

Auxiliary Classifiers Improve Stability and Efficiency in Continual Learning

Continual learning is crucial for applications in dynamic environments, where machine learning models must adapt to changing data distributions while retaining knowledge of previous tasks. Despite significant advancements, catastrophic forgetting - where performance on earlier tasks degrades as new information is learned - remains a key challenge. In this work, we investigate the stability of intermediate neural network layers during continual learning and explore how auxiliary classifiers (ACs) can leverage this stability to improve performance. We show that early network layers remain more stable during learning, particularly for older tasks, and that ACs applied to these layers can outperform standard classifiers on past tasks. By integrating ACs into several continual learning algorithms, we demonstrate consistent and significant performance improvements on standard benchmarks. Additionally, we explore dynamic inference, showing that AC-augmented continual learning methods can reduce computational costs by up to 60\% while maintaining or exceeding the accuracy of standard methods. Our findings suggest that ACs offer a promising avenue for enhancing continual learning models, providing both improved performance and the ability to adapt the network computation in environments where such flexibility might be required.

Updated: 2024-10-07 20:41:47

标题: 辅助分类器在持续学习中提高稳定性和效率

摘要: 持续学习对于应用于动态环境中至关重要，其中机器学习模型必须适应不断变化的数据分布，同时保留对先前任务的知识。尽管取得了重大进展，但灾难性遗忘 - 即在学习新信息时早期任务的表现下降 - 仍然是一个关键挑战。在这项工作中，我们研究了持续学习过程中中间神经网络层的稳定性，并探讨了辅助分类器（ACs）如何利用这种稳定性来提高性能。我们发现，在学习过程中，早期网络层保持更稳定，特别是对于较旧的任务，而应用于这些层的ACs可以胜过过去任务上的标准分类器。通过将ACs整合到几种持续学习算法中，我们在标准基准测试中展示了一致且显著的性能改进。此外，我们探索了动态推理，结果显示AC增强的持续学习方法可以将计算成本降低高达60％，同时保持或超过标准方法的准确性。我们的研究结果表明，ACs为增强持续学习模型提供了一个有前途的途径，既提高了性能，又为在可能需要该灵活性的环境中调整网络计算提供了能力。

更新时间: 2024-10-07 20:41:47

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.07404v2

Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality

A novel Policy Gradient (PG) algorithm, called $\textit{Matryoshka Policy Gradient}$ (MPG), is introduced and studied, in the context of fixed-horizon max-entropy reinforcement learning, where an agent aims at maximizing entropy bonuses additional to its cumulative rewards. In the linear function approximation setting with softmax policies, we prove uniqueness and characterize the optimal policy of the entropy regularized objective, together with global convergence of MPG. These results are proved in the case of continuous state and action space. MPG is intuitive, theoretically sound and we furthermore show that the optimal policy of the infinite horizon max-entropy objective can be approximated arbitrarily well by the optimal policy of the MPG framework. Finally, we provide a criterion for global optimality when the policy is parametrized by a neural network in terms of the neural tangent kernel at convergence. As a proof of concept, we evaluate numerically MPG on standard test benchmarks.

Updated: 2024-10-07 20:41:29

标题: 母子策略梯度用于熵正则化强化学习：收敛性和全局最优性

摘要: 介绍并研究了一种新颖的策略梯度（PG）算法，称为$\textit{Matryoshka Policy Gradient}$（MPG），在固定时间段最大熵强化学习的背景下，代理人旨在最大化除累积奖励外的熵奖励。在具有softmax策略的线性函数逼近设置中，我们证明了熵正则化目标的最优策略的唯一性并对其进行了表征，同时证明了MPG的全局收敛性。这些结果在连续状态和动作空间的情况下得到证明。MPG是直观的，理论上是可靠的，我们进一步展示了无限时间段最大熵目标的最优策略可以通过MPG框架的最优策略任意接近。最后，我们提供了一个关于全局最优性的标准，当策略由神经网络参数化时，该标准基于在收敛时的神经切向核。作为概念验证，我们在标准测试基准上对MPG进行了数值评估。

更新时间: 2024-10-07 20:41:29

领域: cs.LG,cs.AI,68T07,I.2.0; I.2.6

下载: http://arxiv.org/abs/2303.12785v3

Error Reduction from Stacked Regressions

Stacking regressions is an ensemble technique that forms linear combinations of different regression estimators to enhance predictive accuracy. The conventional approach uses cross-validation data to generate predictions from the constituent estimators, and least-squares with nonnegativity constraints to learn the combination weights. In this paper, we learn these weights analogously by minimizing a regularized version of the empirical risk subject to a nonnegativity constraint. When the constituent estimators are linear least-squares projections onto nested subspaces separated by at least three dimensions, we show that thanks to an adaptive shrinkage effect, the resulting stacked estimator has strictly smaller population risk than best single estimator among them, with more significant gains when the signal-to-noise ratio is small. Here "best" refers to an estimator that minimizes a model selection criterion such as AIC or BIC. In other words, in this setting, the best single estimator is inadmissible. Because the optimization problem can be reformulated as isotonic regression, the stacked estimator requires the same order of computation as the best single estimator, making it an attractive alternative in terms of both performance and implementation.

Updated: 2024-10-07 20:39:48

标题: 堆叠回归模型中的错误减少

摘要: Stacking回归是一种集成技术，通过形成不同回归估计器的线性组合来增强预测准确性。传统方法使用交叉验证数据生成来自组成估计器的预测，并使用带非负约束的最小二乘法来学习组合权重。在本文中，我们通过最小化正则化版本的经验风险来学习这些权重，同时受到非负约束的限制。当组成估计器是至少相距三个维度的嵌套子空间上的线性最小二乘投影时，由于自适应收缩效应，我们表明由此产生的堆叠估计器的总体风险严格小于它们中最佳单个估计器的总体风险，当信噪比较小时，收益更为显著。这里的“最佳”是指最小化AIC或BIC等模型选择标准的估计器。换句话说，在这种情况下，最佳单个估计器是不可接受的。由于优化问题可以重构为等增函数回归，因此堆叠估计器需要与最佳单个估计器相同数量的计算，从性能和实现方面来看，这使其成为一种有吸引力的替代选择。

更新时间: 2024-10-07 20:39:48

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2309.09880v3

Key Expansion Based on Internet X.509 Public Key Infrastructure for Anonymous Voting

This document focuses on developing a key expansion method based on the internet X.509 public key infrastructure and elliptic curve cryptography, which is applied in the context of anonymous voting. The method enables end entities to maintain anonymity from other end entities, the registration authority, and the certificate authority, while still allowing the validity of end entity certificates to be verified, thereby facilitating anonymous voting services.

Updated: 2024-10-07 20:35:24

标题: 基于互联网X.509公钥基础设施的匿名投票的关键扩展

摘要: 本文集中讨论基于互联网X.509公钥基础设施和椭圆曲线密码学的密钥扩展方法的开发，该方法应用于匿名投票的背景下。该方法使终端实体能够保持与其他终端实体、注册机构和证书颁发机构的匿名性，同时仍然允许验证终端实体证书的有效性，从而促进匿名投票服务。

更新时间: 2024-10-07 20:35:24

领域: cs.CR,cs.NI

下载: http://arxiv.org/abs/2410.17274v1

Transformers are Efficient Compilers, Provably

Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks, including programming language understanding and generation. In this paper, we take the first steps towards a formal investigation of using transformers as compilers from an expressive power perspective. To this end, we introduce a representative programming language, Mini-Husky, which encapsulates key features of modern C-like languages. We show that if the input code sequence has a bounded depth in both the Abstract Syntax Tree (AST) and type inference (reasonable assumptions based on the clean code principle), then the number of parameters required by transformers depends only on the logarithm of the input sequence length to handle compilation tasks, such as AST construction, symbol resolution, and type analysis. A significant technical challenge stems from the fact that transformers operate at a low level, where each layer processes the input sequence as raw vectors without explicitly associating them with predefined structure or meaning. In contrast, high-level compiler tasks necessitate managing intricate relationships and structured program information. Our primary technical contribution is the development of a domain-specific language, Cybertron, which generates formal proofs of the transformer's expressive power, scaling to address compiler tasks. We further establish that recurrent neural networks (RNNs) require at least a linear number of parameters relative to the input sequence, leading to an exponential separation between transformers and RNNs. Finally, we empirically validate our theoretical results by comparing transformers and RNNs on compiler tasks within Mini-Husky.

Updated: 2024-10-07 20:31:13

标题: 变压器是高效的编译器，可以证明

摘要: 基于Transformer的大型语言模型(LLMs)已经在各种语言相关任务中展现出了令人惊讶的稳健性能，包括编程语言的理解和生成。在本文中，我们首次从表达能力的角度正式探讨了使用Transformer作为编译器的可能性。为此，我们引入了一种代表性的编程语言Mini-Husky，它包含了现代类C语言的关键特性。我们展示，如果输入代码序列在抽象语法树(AST)和类型推断中的深度有界(基于干净代码原则的合理假设)，那么用于处理编译任务的Transformer所需的参数数量仅取决于输入序列长度的对数，如AST构建、符号解析和类型分析等。一个重要的技术挑战源于Transformer在低层级操作，其中每一层都以原始向量形式处理输入序列，而没有明确将其与预定义结构或含义关联起来。相比之下，高层编译器任务需要管理复杂关系和结构化程序信息。我们的主要技术贡献是开发了一种领域特定语言Cybertron，它生成了Transformer表达能力的形式化证明，能够扩展到解决编译器任务。我们进一步确定，循环神经网络(RNNs)相对于输入序列至少需要线性数量的参数，导致了Transformer和RNNs之间的指数差距。最后，我们通过在Mini-Husky上比较Transformer和RNNs在编译任务上的表现来经验性地验证我们的理论结果。

更新时间: 2024-10-07 20:31:13

领域: cs.PL,cs.LG

下载: http://arxiv.org/abs/2410.14706v1

Optimizing Parking Space Classification: Distilling Ensembles into Lightweight Classifiers

When deploying large-scale machine learning models for smart city applications, such as image-based parking lot monitoring, data often must be sent to a central server to perform classification tasks. This is challenging for the city's infrastructure, where image-based applications require transmitting large volumes of data, necessitating complex network and hardware infrastructures to process the data. To address this issue in image-based parking space classification, we propose creating a robust ensemble of classifiers to serve as Teacher models. These Teacher models are distilled into lightweight and specialized Student models that can be deployed directly on edge devices. The knowledge is distilled to the Student models through pseudo-labeled samples generated by the Teacher model, which are utilized to fine-tune the Student models on the target scenario. Our results show that the Student models, with 26 times fewer parameters than the Teacher models, achieved an average accuracy of 96.6% on the target test datasets, surpassing the Teacher models, which attained an average accuracy of 95.3%.

Updated: 2024-10-07 20:29:42

标题: 优化停车位分类：将集成模型精炼为轻量级分类器

摘要: 在为智慧城市应用部署大规模机器学习模型时，比如基于图像的停车场监控，数据通常必须发送到中央服务器执行分类任务。这对于城市基础设施来说是具有挑战性的，因为基于图像的应用需要传输大量数据，需要复杂的网络和硬件基础设施来处理数据。为了解决图像停车位分类中的这一问题，我们提出创建一个强大的分类器集成作为教师模型。这些教师模型被提炼为轻量级和专业化的学生模型，可以直接部署在边缘设备上。通过教师模型生成的伪标记样本将知识提炼到学生模型中，这些样本被用来在目标场景上对学生模型进行微调。我们的结果显示，学生模型的参数比教师模型少26倍，在目标测试数据集上取得了平均准确率96.6%，超过了教师模型的平均准确率95.3%。

更新时间: 2024-10-07 20:29:42

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.14705v1

LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid

Large language models (LLMs) have shown immense potential across various domains, but their high memory requirements and inference costs remain critical challenges for deployment. Post-training quantization (PTQ) has emerged as a promising technique to reduce memory requirements and decoding latency. However, recent accurate quantization methods often depend on specialized computations or custom data formats to achieve better model quality, which limits their compatibility with popular frameworks, as they require dedicated inference kernels tailored to specific hardware and software platforms, hindering wider adoption. Furthermore, many competitive methods have high resource requirements and computational overhead, making it challenging to scale them to hundreds of billions of parameters. In response to these challenges, we propose LeanQuant (Loss-error-aware Network Quantization), a novel quantization method that is accurate, versatile, and scalable. In the existing popular iterative loss-error-based quantization framework, we identify a critical limitation in prior methods: the min-max affine quantization grid fails to preserve model quality due to outliers in inverse Hessian diagonals. To overcome this fundamental issue, we propose learning loss-error-aware grids, instead of using non-adaptive min-max affine grids. Our approach not only produces quantized models that are more accurate but also generalizes to a wider range of quantization types, including affine and non-uniform quantization, enhancing compatibility with more frameworks. Extensive empirical evaluations on recent LLMs demonstrate that LeanQuant is highly accurate, comparing favorably against recent competitive baselines in model quality, and scalable, achieving very accurate quantization of Llama-3.1 405B, one of the largest open-source LLMs to date, using two Quadro RTX 8000-48GB GPUs in 21 hours.

Updated: 2024-10-07 20:29:05

标题: LeanQuant: 准确且可扩展的大型语言模型量化，具有损失-误差感知网格

摘要: 大型语言模型（LLMs）在各个领域显示出巨大的潜力，但它们高内存需求和推理成本仍然是部署的关键挑战。后训练量化（PTQ）已经成为一种有望减少内存需求和解码延迟的技术。然而，最近准确的量化方法通常依赖于专门的计算或自定义数据格式以实现更好的模型质量，这限制了它们与流行框架的兼容性，因为它们需要针对特定硬件和软件平台定制的推理核心，阻碍了更广泛的采用。此外，许多竞争方法具有高资源要求和计算开销，使其难以将它们扩展到数千亿个参数。针对这些挑战，我们提出了LeanQuant（Loss-error-aware Network Quantization），这是一种准确、多功能和可扩展的量化方法。在现有流行的迭代损失-错误量化框架中，我们确定了之前方法的一个关键限制：最小-最大仿射量化网格由于逆Hessian对角线上的异常值而无法保持模型质量。为了克服这个根本问题，我们提出学习损失-错误感知网格，而不是使用非自适应的最小-最大仿射网格。我们的方法不仅产生更准确的量化模型，而且推广到更广泛的量化类型，包括仿射和非均匀量化，增强与更多框架的兼容性。对最近的LLMs进行广泛的实证评估表明，LeanQuant非常准确，在模型质量方面与最近的竞争基线相比表现优异，并且可扩展，在21小时内使用两个Quadro RTX 8000-48GB GPU非常准确地量化了迄今为止最大的开源LLMs之一Llama-3.1 405B。

更新时间: 2024-10-07 20:29:05

领域: cs.LG

下载: http://arxiv.org/abs/2407.10032v2

fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models

Humans have the ability to learn new tasks by inferring high-level concepts from existing solution, then manipulating these concepts in lieu of the raw data. Can we automate this process by deriving latent semantic structures in a document collection using foundation models? We introduce fPLSA, a foundation-model-based Probabilistic Latent Semantic Analysis (PLSA) method that iteratively clusters and tags document segments based on document-level contexts. These tags can be used to model the structure of given documents and for hierarchical sampling of new texts. Our experiments on story writing, math, and multi-step reasoning datasets demonstrate that fPLSA tags help reconstruct the original texts better than existing tagging methods. Moreover, when used for hierarchical sampling, fPLSA produces more diverse outputs with a higher likelihood of hitting the correct answer than direct sampling and hierarchical sampling with existing tagging methods.

Updated: 2024-10-07 20:25:52

标题: fPLSA：使用基础模型学习文档集合中的语义结构

摘要: 人类能够通过推断现有解决方案中的高级概念来学习新任务，然后操作这些概念而不是原始数据。我们能否通过使用基础模型推导文档集合中的潜在语义结构来自动化这一过程？我们引入了fPLSA，一种基于基础模型的概率潜在语义分析（PLSA）方法，该方法通过根据文档级别上下文迭代地对文档段进行聚类和标记。这些标记可用于建模给定文档的结构并用于新文本的分层抽样。我们在故事写作、数学和多步推理数据集上的实验表明，fPLSA标记有助于比现有标记方法更好地重建原始文本。此外，当用于分层抽样时，fPLSA生成的输出更加多样化，且比直接抽样和使用现有标记方法的分层抽样更有可能命中正确答案。

更新时间: 2024-10-07 20:25:52

领域: cs.LG

下载: http://arxiv.org/abs/2410.05481v1

Kick Bad Guys Out! Conditionally Activated Anomaly Detection in Federated Learning with Zero-Knowledge Proof Verification

Federated Learning (FL) systems are susceptible to adversarial attacks, where malicious clients submit poisoned models to disrupt the convergence or plant backdoors that cause the global model to misclassify some samples. Current defense methods are often impractical for real-world FL systems, as they either rely on unrealistic prior knowledge or cause accuracy loss even in the absence of attacks. Furthermore, these methods lack a protocol for verifying execution, leaving participants uncertain about the correct execution of the mechanism. To address these challenges, we propose a novel anomaly detection strategy that is designed for real-world FL systems. Our approach activates the defense only when potential attacks are detected, and enables the removal of malicious models without affecting the benign ones. Additionally, we incorporate zero-knowledge proofs to ensure the integrity of the proposed defense mechanism. Experimental results demonstrate the effectiveness of our approach in enhancing FL system security against a comprehensive set of adversarial attacks in various ML tasks.

Updated: 2024-10-07 20:22:43

标题: 将坏人踢出去！在零知识证明验证下的联邦学习中有条件激活的异常检测

摘要: 联邦学习（FL）系统容易受到对抗性攻击的影响，恶意客户端提交有毒模型以破坏收敛或植入后门，导致全局模型错误分类一些样本。当前的防御方法通常对于真实世界的FL系统不切实际，因为它们要么依赖于不切实际的先验知识，要么即使在没有攻击的情况下也会导致准确性损失。此外，这些方法缺乏验证执行的协议，使参与者对机制的正确执行感到不确定。为了解决这些挑战，我们提出了一种专为真实世界FL系统设计的新型异常检测策略。我们的方法仅在检测到潜在攻击时激活防御，并使恶意模型能够被移除而不影响良性模型。此外，我们还结合了零知识证明来确保所提出的防御机制的完整性。实验结果证明了我们的方法在增强FL系统安全性方面的有效性，可以抵御各种机器学习任务中的广泛对抗攻击。

更新时间: 2024-10-07 20:22:43

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2310.04055v4

Ensured: Explanations for Decreasing the Epistemic Uncertainty in Predictions

This paper addresses a significant gap in explainable AI: the necessity of interpreting epistemic uncertainty in model explanations. Although current methods mainly focus on explaining predictions, with some including uncertainty, they fail to provide guidance on how to reduce the inherent uncertainty in these predictions. To overcome this challenge, we introduce new types of explanations that specifically target epistemic uncertainty. These include ensured explanations, which highlight feature modifications that can reduce uncertainty, and categorisation of uncertain explanations counter-potential, semi-potential, and super-potential which explore alternative scenarios. Our work emphasises that epistemic uncertainty adds a crucial dimension to explanation quality, demanding evaluation based not only on prediction probability but also on uncertainty reduction. We introduce a new metric, ensured ranking, designed to help users identify the most reliable explanations by balancing trade-offs between uncertainty, probability, and competing alternative explanations. Furthermore, we extend the Calibrated Explanations method, incorporating tools that visualise how changes in feature values impact epistemic uncertainty. This enhancement provides deeper insights into model behaviour, promoting increased interpretability and appropriate trust in scenarios involving uncertain predictions.

Updated: 2024-10-07 20:21:51

标题: 确保：减少预测中的认识不确定性的解释

摘要: 这篇论文解决了可解释人工智能中一个重要的空白：解释模型解释中解释认知不确定性的必要性。尽管当前方法主要集中在解释预测，有些方法包括不确定性，但它们未能提供关于如何减少这些预测中固有不确定性的指导。为了克服这一挑战，我们引入了针对认知不确定性的新类型解释。这些解释包括确保解释，突出显示可以减少不确定性的特征修改，以及对不确定解释进行分类，包括反潜在、半潜在和超潜在，探索替代情景。我们的工作强调认知不确定性给解释质量增添了一个关键维度，要求基于不仅仅是预测概率，还要基于不确定性减少进行评估。我们引入了一个新的度量标准，确保排名，旨在帮助用户通过平衡不确定性、概率和竞争性替代解释之间的权衡来识别最可靠的解释。此外，我们扩展了校准解释方法，加入了工具，可视化特征值变化如何影响认知不确定性。这种增强提供了对模型行为的更深入洞察，促进了更高的可解释性和在涉及不确定预测的情景中适当的信任。

更新时间: 2024-10-07 20:21:51

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05479v1

Training quantum machine learning models on cloud without uploading the data

Based on the linearity of quantum unitary operations, we propose a method that runs the parameterized quantum circuits before encoding the input data. This enables a dataset owner to train machine learning models on quantum cloud computation platforms, without the risk of leaking the information about the data. It is also capable of encoding a vast amount of data effectively at a later time using classical computations, thus saving runtime on quantum computation devices. The trained quantum machine learning models can be run completely on classical computers, meaning the dataset owner does not need to have any quantum hardware, nor even quantum simulators. Moreover, our method mitigates the encoding bottleneck by reducing the required circuit depth from $O(2^{n})$ to $O(n)$, and relax the tolerance on the precision of the quantum gates for the encoding. These results demonstrate yet another advantage of quantum and quantum-inspired machine learning models over existing classical neural networks, and broaden the approaches to data security.

Updated: 2024-10-07 20:19:38

标题: 在云端训练量子机器学习模型而无需上传数据

摘要: 基于量子幺正操作的线性性质，我们提出了一种方法，在对输入数据进行编码之前运行参数化量子电路。这使得数据集所有者可以在量子云计算平台上训练机器学习模型，而无需担心泄露数据信息。此外，它还能够在稍后使用经典计算有效地编码大量数据，从而节省量子计算设备的运行时间。训练的量子机器学习模型可以完全在经典计算机上运行，这意味着数据集所有者不需要拥有任何量子硬件，甚至量子模拟器。此外，我们的方法通过将所需量子电路深度从$O(2^{n})$降低到$O(n)$，并放宽对编码中量子门精度的容忍度，从而减轻了编码瓶颈。这些结果展示了量子和受量子启发的机器学习模型相对于现有的经典神经网络的另一个优势，并拓宽了数据安全的途径。

更新时间: 2024-10-07 20:19:38

领域: quant-ph,cs.AI,cs.CR,cs.LG

下载: http://arxiv.org/abs/2409.04602v2

DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents

Automated scientific discovery promises to accelerate progress across scientific domains. However, developing and evaluating an AI agent's capacity for end-to-end scientific reasoning is challenging as running real-world experiments is often prohibitively expensive or infeasible. In this work we introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery. DISCOVERYWORLD contains a variety of different challenges, covering topics as diverse as radioisotope dating, rocket science, and proteomics, to encourage development of general discovery skills rather than task-specific solutions. DISCOVERYWORLD itself is an inexpensive, simulated, text-based environment (with optional 2D visual overlay). It includes 120 different challenge tasks, spanning eight topics each with three levels of difficulty and several parametric variations. Each task requires an agent to form hypotheses, design and run experiments, analyze results, and act on conclusions. DISCOVERYWORLD further provides three automatic metrics for evaluating performance, based on (a) task completion, (b) task-relevant actions taken, and (c) the discovered explanatory knowledge. We find that strong baseline agents, that perform well in prior published environments, struggle on most DISCOVERYWORLD tasks, suggesting that DISCOVERYWORLD captures some of the novel challenges of discovery, and thus that DISCOVERYWORLD may help accelerate near-term development and assessment of scientific discovery competency in agents. Code available at: www.github.com/allenai/discoveryworld

Updated: 2024-10-07 20:19:15

标题: 《DISCOVERYWORLD：用于开发和评估自动科学发现代理的虚拟环境》

摘要: 自动化科学发现承诺加速各科学领域的进展。然而，开发和评估AI代理的端到端科学推理能力具有挑战性，因为进行真实世界实验往往成本昂贵或不可行。在这项工作中，我们介绍了DISCOVERYWORLD，这是第一个用于开发和基准测试代理执行完整周期的新科学发现能力的虚拟环境。DISCOVERYWORLD包含各种不同的挑战，涵盖了广泛的主题，如放射性同位素定年、火箭科学和蛋白质组学，以鼓励开发通用的发现技能而不是特定任务的解决方案。DISCOVERYWORLD本身是一个廉价的、模拟的、基于文本的环境（可选2D视觉叠加）。它包括120个不同的挑战任务，涵盖了八个主题，每个主题有三个难度级别和几个参数变化。每个任务都需要代理提出假设、设计和运行实验、分析结果并根据结论行动。DISCOVERYWORLD还提供三个自动评估性能的指标，基于（a）任务完成情况，（b）采取的与任务相关的行动，以及（c）发现的解释性知识。我们发现，表现良好的基线代理在大多数DISCOVERYWORLD任务上都表现不佳，在之前发布的环境中表现良好，这表明DISCOVERYWORLD捕捉了一些新的发现挑战，并且DISCOVERYWORLD可能有助于加快对代理的科学发现能力的近期发展和评估。代码可在以下网址找到：www.github.com/allenai/discoveryworld

更新时间: 2024-10-07 20:19:15

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.06769v2

S-JEPA: towards seamless cross-dataset transfer through dynamic spatial attention

Motivated by the challenge of seamless cross-dataset transfer in EEG signal processing, this article presents an exploratory study on the use of Joint Embedding Predictive Architectures (JEPAs). In recent years, self-supervised learning has emerged as a promising approach for transfer learning in various domains. However, its application to EEG signals remains largely unexplored. In this article, we introduce Signal-JEPA for representing EEG recordings which includes a novel domain-specific spatial block masking strategy and three novel architectures for downstream classification. The study is conducted on a 54 subjects dataset and the downstream performance of the models is evaluated on three different BCI paradigms: motor imagery, ERP and SSVEP. Our study provides preliminary evidence for the potential of JEPAs in EEG signal encoding. Notably, our results highlight the importance of spatial filtering for accurate downstream classification and reveal an influence of the length of the pre-training examples but not of the mask size on the downstream performance.

Updated: 2024-10-07 20:07:53

标题: S-JEPA：通过动态空间注意力实现无缝跨数据集转移

摘要: 受无缝跨数据集转移在脑电信号处理中的挑战启发，本文介绍了关于使用联合嵌入预测架构（JEPAs）的探索性研究。近年来，自监督学习已经成为各个领域转移学习的一种有前途的方法。然而，其在脑电信号中的应用仍然很少被探索。在本文中，我们介绍了用于表示脑电记录的Signal-JEPA，其中包括一种新颖的领域特定空间块遮罩策略和三种新颖的用于下游分类的架构。该研究在一个包含54名受试者的数据集上进行，评估了模型在三种不同的脑机接口范式（MI、ERP和SSVEP）上的下游性能。我们的研究为JEPAs在脑电信号编码中的潜力提供了初步证据。值得注意的是，我们的结果突出了空间滤波对准确的下游分类的重要性，并揭示了预训练示例的长度对下游性能的影响，但遮罩大小并没有影响。

更新时间: 2024-10-07 20:07:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.11772v2

Image Watermarks are Removable Using Controllable Regeneration from Clean Noise

Image watermark techniques provide an effective way to assert ownership, deter misuse, and trace content sources, which has become increasingly essential in the era of large generative models. A critical attribute of watermark techniques is their robustness against various manipulations. In this paper, we introduce a watermark removal approach capable of effectively nullifying the state of the art watermarking techniques. Our primary insight involves regenerating the watermarked image starting from a clean Gaussian noise via a controllable diffusion model, utilizing the extracted semantic and spatial features from the watermarked image. The semantic control adapter and the spatial control network are specifically trained to control the denoising process towards ensuring image quality and enhancing consistency between the cleaned image and the original watermarked image. To achieve a smooth trade-off between watermark removal performance and image consistency, we further propose an adjustable and controllable regeneration scheme. This scheme adds varying numbers of noise steps to the latent representation of the watermarked image, followed by a controlled denoising process starting from this noisy latent representation. As the number of noise steps increases, the latent representation progressively approaches clean Gaussian noise, facilitating the desired trade-off. We apply our watermark removal methods across various watermarking techniques, and the results demonstrate that our methods offer superior visual consistency/quality and enhanced watermark removal performance compared to existing regeneration approaches.

Updated: 2024-10-07 20:04:29

标题: 图像水印可通过可控的干净噪声再生来去除

摘要: 图像水印技术提供了一种有效的方式来确认所有权，阻止滥用，并追踪内容来源，这在大型生成模型时代变得越来越重要。水印技术的一个关键属性是它们对各种篡改的鲁棒性。在本文中，我们介绍了一种能够有效消除当前最先进水印技术的水印去除方法。我们的主要思路涉及通过可控扩散模型从干净的高斯噪声开始重新生成带水印的图像，利用从水印图像中提取的语义和空间特征。语义控制适配器和空间控制网络经过专门训练，以控制去噪过程，以确保图像质量和增强清洁图像与原始水印图像之间的一致性。为了实现水印去除性能和图像一致性之间的平滑权衡，我们进一步提出了一个可调节和可控的再生方案。该方案在水印图像的潜在表示中添加不同数量的噪声步骤，然后从这个嘈杂的潜在表示开始进行控制去噪处理。随着噪声步骤的增加，潜在表示逐渐接近干净的高斯噪声，促进所需的权衡。我们将我们的水印去除方法应用于各种水印技术，结果表明，与现有的再生方法相比，我们的方法提供了更优越的视觉一致性/质量和增强的水印去除性能。

更新时间: 2024-10-07 20:04:29

领域: cs.CR,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.05470v1

Superficial Safety Alignment Hypothesis

As large language models (LLMs) are overwhelmingly more and more integrated into various applications, ensuring they generate safe and aligned responses is a pressing need. Previous research on alignment has largely focused on general instruction-following but has often overlooked the unique properties and challenges of safety alignment, such as the brittleness of safety mechanisms. To bridge the gap, we propose the Superficial Safety Alignment Hypothesis (SSAH), which posits that safety alignment should teach an otherwise unsafe model to choose the correct reasoning direction - interpreted as a specialized binary classification task - and incorporate a refusal mechanism with multiple reserved fallback options. Furthermore, through SSAH, we hypothesize that safety guardrails in LLMs can be established by just a small number of essential components. To verify this, we conduct an ablation study and successfully identify four types of attribute-critical components in safety-aligned LLMs: Exclusive Safety Unit (ESU), Exclusive Utility Unit (EUU), Complex Unit (CU), and Redundant Unit (RU). Our findings show that freezing certain safety-critical components 7.5\% during fine-tuning allows the model to retain its safety attributes while adapting to new tasks. Additionally, we show that leveraging redundant units 20\% in the pre-trained model as an ``alignment budget'' can effectively minimize the alignment tax while achieving the alignment goal. All considered, this paper concludes that the atomic functional unit for safety in LLMs is at the neuron level and underscores that safety alignment should not be complicated. We believe this work contributes to the foundation of efficient and scalable safety alignment for future LLMs.

Updated: 2024-10-07 19:53:35

标题: 表层安全对齐假设

摘要: 随着大型语言模型（LLMs）被越来越广泛地整合到各种应用程序中，确保它们生成安全和对齐的响应成为迫切需要。先前在对齐方面的研究主要集中在一般指令遵循，但通常忽视了安全对齐的独特属性和挑战，例如安全机制的脆弱性。为了弥合这一差距，我们提出了表面安全对齐假设（SSAH），该假设认为安全对齐应该教导一种否则不安全的模型选择正确的推理方向 - 解释为一种专门的二元分类任务 - 并结合一个拒绝机制，其中包含多个保留的备用选项。此外，通过SSAH，我们假设LLMs中的安全防护栏可以仅通过少量基本组件来建立。为了验证这一点，我们进行了消融研究，并成功地确定了安全对齐LLMs中四种类型的属性关键组件：独占安全单元（ESU）、独占实用单元（EUU）、复杂单元（CU）和冗余单元（RU）。我们的研究结果表明，在微调过程中冻结某些安全关键组件7.5％可以使模型保留其安全属性，同时适应新任务。此外，我们还表明，在预训练模型中利用冗余单元20％作为“对齐预算”可以有效地减少对齐税，同时实现对齐目标。总之，本文得出结论，LLMs中安全的原子功能单元在神经元水平上，并强调安全对齐不应复杂。我们相信这项工作为未来LLMs的高效和可扩展的安全对齐奠定了基础。

更新时间: 2024-10-07 19:53:35

领域: cs.CL,cs.AI,cs.CR,cs.CY,cs.LG

下载: http://arxiv.org/abs/2410.10862v1

PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining

Differential Privacy (DP) image data synthesis, which leverages the DP technique to generate synthetic data to replace the sensitive data, allowing organizations to share and utilize synthetic images without privacy concerns. Previous methods incorporate the advanced techniques of generative models and pre-training on a public dataset to produce exceptional DP image data, but suffer from problems of unstable training and massive computational resource demands. This paper proposes a novel DP image synthesis method, termed PRIVIMAGE, which meticulously selects pre-training data, promoting the efficient creation of DP datasets with high fidelity and utility. PRIVIMAGE first establishes a semantic query function using a public dataset. Then, this function assists in querying the semantic distribution of the sensitive dataset, facilitating the selection of data from the public dataset with analogous semantics for pre-training. Finally, we pre-train an image generative model using the selected data and then fine-tune this model on the sensitive dataset using Differentially Private Stochastic Gradient Descent (DP-SGD). PRIVIMAGE allows us to train a lightly parameterized generative model, reducing the noise in the gradient during DP-SGD training and enhancing training stability. Extensive experiments demonstrate that PRIVIMAGE uses only 1% of the public dataset for pre-training and 7.6% of the parameters in the generative model compared to the state-of-the-art method, whereas achieves superior synthetic performance and conserves more computational resources. On average, PRIVIMAGE achieves 30.1% lower FID and 12.6% higher Classification Accuracy than the state-of-the-art method. The replication package and datasets can be accessed online.

Updated: 2024-10-07 19:51:47

标题: PrivImage：使用具有语义感知预训练的扩散模型生成差分隐私合成图像

摘要: 隐私保护图像数据综合（DP）利用DP技术生成合成数据以替代敏感数据，使组织能够共享和利用合成图像而无需担心隐私问题。先前的方法结合生成模型的先进技术和在公共数据集上的预训练，以生成出色的DP图像数据，但存在训练不稳定和大量计算资源需求的问题。本文提出了一种新颖的DP图像综合方法，称为PRIVIMAGE，它精心选择预训练数据，促进高保真度和实用性的DP数据集的高效创建。PRIVIMAGE首先利用公共数据集建立语义查询函数。然后，该函数辅助查询敏感数据集的语义分布，便于从具有类似语义的公共数据集中选择数据进行预训练。最后，我们使用选择的数据预训练图像生成模型，然后使用差分隐私随机梯度下降（DP-SGD）在敏感数据集上对该模型进行微调。PRIVIMAGE允许我们训练一个轻量参数化的生成模型，在DP-SGD训练期间减少梯度中的噪音，增强训练稳定性。广泛的实验表明，与最先进的方法相比，PRIVIMAGE仅使用公共数据集的1%进行预训练，并且生成模型中的参数仅为现有方法的7.6%，同时实现了更优越的合成性能并节省了更多的计算资源。平均而言，PRIVIMAGE的FID低于30.1％，分类准确度高于12.6％。复制包和数据集可在线访问。

更新时间: 2024-10-07 19:51:47

领域: cs.CV,cs.CR,cs.LG

下载: http://arxiv.org/abs/2311.12850v4

Herd Mentality in Augmentation -- Not a Good Idea! A Robust Multi-stage Approach towards Deepfake Detection

The rapid increase in deepfake technology has raised significant concerns about digital media integrity. Detecting deepfakes is crucial for safeguarding digital media. However, most standard image classifiers fail to distinguish between fake and real faces. Our analysis reveals that this failure is due to the model's inability to explicitly focus on the artefacts typically in deepfakes. We propose an enhanced architecture based on the GenConViT model, which incorporates weighted loss and update augmentation techniques and includes masked eye pretraining. This proposed model improves the F1 score by 1.71% and the accuracy by 4.34% on the Celeb-DF v2 dataset. The source code for our model is available at https://github.com/Monu-Khicher-1/multi-stage-learning

Updated: 2024-10-07 19:51:46

标题: 群体心态在增强中的作用——不是一个好主意！一种稳健的多阶段方法来检测深度伪造

摘要: 深度伪造技术的快速增长引起了关于数字媒体完整性的重大关注。检测深度伪造对于保护数字媒体至关重要。然而，大多数标准图像分类器无法区分假和真实面孔。我们的分析表明，这种失败是由于模型无法明确关注深度伪造中通常存在的工件。我们提出了一种基于GenConViT模型的增强架构，该架构结合了加权损失和更新增强技术，并包括面部遮蔽预训练。该提出的模型在Celeb-DF v2数据集上将F1分数提高了1.71％，准确性提高了4.34％。我们模型的源代码可在https://github.com/Monu-Khicher-1/multi-stage-learning 上找到。

更新时间: 2024-10-07 19:51:46

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.05466v1

On the Expressive Power of Tree-Structured Probabilistic Circuits

Probabilistic circuits (PCs) have emerged as a powerful framework to compactly represent probability distributions for efficient and exact probabilistic inference. It has been shown that PCs with a general directed acyclic graph (DAG) structure can be understood as a mixture of exponentially (in its height) many components, each of which is a product distribution over univariate marginals. However, existing structure learning algorithms for PCs often generate tree-structured circuits or use tree-structured circuits as intermediate steps to compress them into DAG-structured circuits. This leads to the intriguing question of whether there exists an exponential gap between DAGs and trees for the PC structure. In this paper, we provide a negative answer to this conjecture by proving that, for $n$ variables, there exists a sub-exponential upper bound $n^{O(\log n)}$ on the size of an equivalent tree computing the same probability distribution. On the other hand, we also show that given a depth restriction on the tree, there is a super-polynomial separation between tree and DAG-structured PCs. Our work takes an important step towards understanding the expressive power of tree-structured PCs, and our techniques may be of independent interest in the study of structure learning algorithms for PCs.

Updated: 2024-10-07 19:51:30

标题: 关于树形结构概率电路的表达能力

摘要: 概率电路（PCs）已经成为一个强大的框架，用于紧凑地表示概率分布，以进行高效和精确的概率推断。已经表明，具有一般有向无环图（DAG）结构的PC可以理解为指数数量（在其高度方面）的许多组件的混合体，其中每个组件都是在单变量边缘上的乘积分布。然而，现有的PC结构学习算法通常生成树结构电路，或者将树结构电路用作将它们压缩为DAG结构电路的中间步骤。这引出了一个有趣的问题，即PC结构的DAG和树之间是否存在指数差距。在本文中，我们通过证明，对于$n$个变量，存在一个等效树的大小的次指数上限$n^{O(\log n)}$来否定这个猜想，该树计算相同的概率分布。另一方面，我们还表明，在树上给定深度限制的情况下，树和DAG结构的PC之间存在超多项式分离。我们的工作对于理解树结构PC的表达能力迈出了重要的一步，我们的技术在研究PC的结构学习算法中可能具有独立的兴趣。

更新时间: 2024-10-07 19:51:30

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05465v1

Progressive distillation induces an implicit curriculum

Knowledge distillation leverages a teacher model to improve the training of a student model. A persistent challenge is that a better teacher does not always yield a better student, to which a common mitigation is to use additional supervision from several ``intermediate'' teachers. One empirically validated variant of this principle is progressive distillation, where the student learns from successive intermediate checkpoints of the teacher. Using sparse parity as a sandbox, we identify an implicit curriculum as one mechanism through which progressive distillation accelerates the student's learning. This curriculum is available only through the intermediate checkpoints but not the final converged one, and imparts both empirical acceleration and a provable sample complexity benefit to the student. We then extend our investigation to Transformers trained on probabilistic context-free grammars (PCFGs) and real-world pre-training datasets (Wikipedia and Books). Through probing the teacher model, we identify an analogous implicit curriculum where the model progressively learns features that capture longer context. Our theoretical and empirical findings on sparse parity, complemented by empirical observations on more complex tasks, highlight the benefit of progressive distillation via implicit curriculum across setups.

Updated: 2024-10-07 19:49:24

标题: 渐进蒸馏引发隐性课程

摘要: 知识蒸馏利用教师模型来改善学生模型的训练。一个持久的挑战是更好的教师并不总是会产生更好的学生，对此常见的缓解方法是利用来自几个“中间”教师的额外监督。这一原则的一个经验证的变体是渐进蒸馏，其中学生从教师的连续中间检查点中学习。通过使用稀疏奇偶校验作为一个沙盒，我们确定了一个隐含课程作为渐进蒸馏加速学生学习的机制之一。这个课程只能通过中间检查点获得，而不是最终的收敛点，给学生带来了经验加速和可证明的样本复杂度益处。然后，我们将调查扩展到在概率上下文无关语法（PCFGs）和现实世界的预训练数据集（维基百科和图书）上训练的变压器。通过探究教师模型，我们确定了一个类似的隐含课程，在这个课程中，模型逐渐学习捕捉更长上下文的特征。我们在稀疏奇偶校验上的理论和经验发现，再加上更复杂任务上的经验观察，突显了通过隐含课程在各种设置中实现渐进蒸馏的益处。

更新时间: 2024-10-07 19:49:24

领域: cs.LG

下载: http://arxiv.org/abs/2410.05464v1

LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions

A central problem related to transformers can be stated as follows: given two $n \times d$ matrices $Q$ and $K$, and a non-negative function $f$, define the matrix $A$ as follows: (1) apply the function $f$ to each entry of the $n \times n$ matrix $Q K^T$, and then (2) normalize each of the row sums of $A$ to be equal to $1$. The matrix $A$ can be computed in $O(n^2 d)$ time assuming $f$ can be applied to a number in constant time, but the quadratic dependence on $n$ is prohibitive in applications where it corresponds to long context lengths. For a large class of functions $f$, we show how to find all the ``large attention scores", i.e., entries of $A$ which are at least a positive value $\varepsilon$, in time with linear dependence on $n$ (i.e., $n \cdot \textrm{poly}(d/\varepsilon)$) for a positive parameter $\varepsilon > 0$. Our class of functions include all functions $f$ of the form $f(x) = |x|^p$, as explored recently in transformer models. Using recently developed tools from randomized numerical linear algebra, we prove that for any $K$, there is a ``universal set" $U \subset [n]$ of size independent of $n$, such that for any $Q$ and any row $i$, the large attention scores $A_{i,j}$ in row $i$ of $A$ all have $j \in U$. We also find $U$ in $n \cdot \textrm{poly}(d/\varepsilon)$ time. Notably, we (1) make no assumptions on the data, (2) our workspace does not grow with $n$, and (3) our algorithms can be computed in streaming and parallel settings. We call the attention mechanism that uses only the subset of keys in the universal set as LevAttention since our algorithm to identify the universal set $U$ is based on leverage scores. We empirically show the benefits of our scheme for vision transformers, showing how to train new models that use our universal set while training as well, showing that our model is able to consistently select ``important keys'' during training.

Updated: 2024-10-07 19:47:13

标题: LevAttention: 用于重要关注的时间、空间和流式高效算法

摘要: 与变压器相关的一个中心问题可以阐述如下：给定两个$n \times d$矩阵$Q$和$K$，以及一个非负函数$f$，定义矩阵$A$如下：(1)将函数$f$应用于$n \times n$矩阵$QK^T$的每个元素，然后(2)将$A$的每一行总和归一化为1。假设$f$可以在常数时间内应用于一个数字，可以在$O(n^2 d)$时间内计算矩阵$A$，但是$n$的二次依赖对于对应于长上下文长度的应用来说是禁锢的。对于一大类函数$f$，我们展示了如何在具有线性依赖的时间内找到所有的“大注意力分数”，即矩阵$A$中至少为正值$\varepsilon$的元素，其中$\varepsilon > 0$是一个正参数（即$n \cdot \textrm{poly}(d/\varepsilon)$）。我们的函数类包括最近在变压器模型中探索的所有形式为$f(x) = |x|^p$的函数$f$。利用最近开发的随机数值线性代数工具，我们证明对于任何$K$，都存在一个与$n$无关的“通用集合”$U \subset [n]$，使得对于任何$Q$和任何行$i$，$A$的第$i$行中的大注意力分数$A_{i,j}$都满足$j \in U$。我们还在$n \cdot \textrm{poly}(d/\varepsilon)$时间内找到了$U$。值得注意的是，我们(1)对数据不做任何假设，(2)我们的工作空间不随$n$增长，(3)我们的算法可以在流式和并行设置中计算。我们将只使用通用集合中的密钥的注意机制称为LevAttention，因为我们用于识别通用集合$U$的算法是基于杠杆得分的。我们在视觉变压器中经验性地展示了我们方案的好处，展示了如何在训练过程中使用我们的通用集合训练新模型，表明我们的模型能够在训练过程中始终选择“重要密钥”。

更新时间: 2024-10-07 19:47:13

领域: cs.LG,cs.DS

下载: http://arxiv.org/abs/2410.05462v1

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency

Chain-of-thought (CoT) significantly enhances the reasoning performance of large language models (LLM). While current theoretical studies often attribute this improvement to increased expressiveness and computational capacity, we argue that expressiveness is not the primary limitation in the LLM regime, as current large models will fail on simple tasks. Using a parity-learning setup, we demonstrate that CoT can substantially improve sample efficiency even when the representation power is sufficient. Specifically, with CoT, a transformer can learn the function within polynomial samples, whereas without CoT, the required sample size is exponential. Additionally, we show that CoT simplifies the learning process by introducing sparse sequential dependencies among input tokens, and leads to a sparse and interpretable attention. We validate our theoretical analysis with both synthetic and real-world experiments, confirming that sparsity in attention layers is a key factor of the improvement induced by CoT.

Updated: 2024-10-07 19:45:09

标题: 从稀疏依赖到稀疏注意力：揭示链式思维如何提高Transformer样本效率

摘要: Chain-of-thought (CoT)显著提高了大型语言模型（LLM）的推理性能。虽然当前的理论研究经常将这种改进归因于表达能力和计算能力的增加，但我们认为，在LLM领域，表达能力并不是主要限制，因为当前的大型模型在简单任务上会失败。通过使用奇偶学习设置，我们展示了即使表示能力足够，CoT也能显著提高样本效率。具体来说，使用CoT，一个变压器可以在多项式样本内学习函数，而没有CoT，则需要指数级的样本大小。此外，我们展示了CoT通过在输入令牌之间引入稀疏的序列依赖关系来简化学习过程，并导致稀疏且可解释的注意力。我们通过合成和真实世界实验验证了我们的理论分析，证实了注意力层中的稀疏性是CoT引起的改进的关键因素。

更新时间: 2024-10-07 19:45:09

领域: cs.LG,cs.CL,stat.ML

下载: http://arxiv.org/abs/2410.05459v1

Testing Credibility of Public and Private Surveys through the Lens of Regression

Testing whether a sample survey is a credible representation of the population is an important question to ensure the validity of any downstream research. While this problem, in general, does not have an efficient solution, one might take a task-based approach and aim to understand whether a certain data analysis tool, like linear regression, would yield similar answers both on the population and the sample survey. In this paper, we design an algorithm to test the credibility of a sample survey in terms of linear regression. In other words, we design an algorithm that can certify if a sample survey is good enough to guarantee the correctness of data analysis done using linear regression tools. Nowadays, one is naturally concerned about data privacy in surveys. Thus, we further test the credibility of surveys published in a differentially private manner. Specifically, we focus on Local Differential Privacy (LDP), which is a standard technique to ensure privacy in surveys where the survey participants might not trust the aggregator. We extend our algorithm to work even when the data analysis has been done using surveys with LDP. In the process, we also propose an algorithm that learns with high probability the guarantees a linear regression model on a survey published with LDP. Our algorithm also serves as a mechanism to learn linear regression models from data corrupted with noise coming from any subexponential distribution. We prove that it achieves the optimal estimation error bound for $\ell_1$ linear regression, which might be of broader interest. We prove the theoretical correctness of our algorithms while trying to reduce the sample complexity for both public and private surveys. We also numerically demonstrate the performance of our algorithms on real and synthetic datasets.

Updated: 2024-10-07 19:44:20

标题: 通过回归分析考察公共和私人调查的可信度

摘要: 测试样本调查是否可靠地代表总体是确保任何下游研究的有效性的重要问题。虽然一般情况下这个问题没有高效的解决方案，但可以采取一种基于任务的方法，目的是了解某种数据分析工具（如线性回归）在总体和样本调查上是否会产生类似的答案。在本文中，我们设计了一个算法来测试样本调查在线性回归方面的可信度。换句话说，我们设计了一个算法，可以证明样本调查是否足够好，以确保使用线性回归工具进行的数据分析的正确性。如今，在调查中自然会关注数据隐私。因此，我们进一步测试以不同ially private方式发布的调查的可信度。具体而言，我们关注本地差分隐私（LDP），这是一种在调查中确保隐私的标准技术，调查参与者可能不信任聚合器。我们扩展我们的算法，即使在使用LDP调查进行数据分析时也可以工作。在这个过程中，我们还提出了一个算法，可以在很高概率下学习LDP调查上线性回归模型的保证。我们的算法还可以作为一种机制，从受到来自任何次指数分布的噪声污染的数据中学习线性回归模型。我们证明它实现了$\ell_1$线性回归的最优估计误差界，这可能是更广泛利益的。我们证明了我们的算法的理论正确性，同时试图减少公共和私人调查的样本复杂性。我们还在真实和合成数据集上数值地展示了我们算法的性能。

更新时间: 2024-10-07 19:44:20

领域: cs.LG,cs.CR,stat.ME,stat.ML

下载: http://arxiv.org/abs/2410.05458v1

Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming

We propose a novel approach for humming transcription that combines a CNN-based architecture with a dynamic programming-based post-processing algorithm, utilizing the recently introduced HumTrans dataset. We identify and address inherent problems with the offset and onset ground truth provided by the dataset, offering heuristics to improve these annotations, resulting in a dataset with precise annotations that will aid future research. Additionally, we compare the transcription accuracy of our method against several others, demonstrating state-of-the-art (SOTA) results. All our code and corrected dataset is available at https://github.com/shubham-gupta-30/humming_transcription

Updated: 2024-10-07 19:40:39

标题: 动态HumTrans：使用CNN和动态规划的哼唱转录

摘要: 我们提出了一种新颖的哼唱转录方法，将基于CNN的架构与基于动态规划的后处理算法相结合，利用最近引入的HumTrans数据集。我们识别并解决了数据集提供的偏移和起始点基准的固有问题，提供启发式方法来改进这些注释，从而得到一个具有精确注释的数据集，将有助于未来的研究。此外，我们将我们的方法与其他几种方法进行了转录准确性比较，展示了最先进的结果。我们的所有代码和修正后的数据集都可以在以下网址找到：https://github.com/shubham-gupta-30/humming_transcription

更新时间: 2024-10-07 19:40:39

领域: cs.LG,cs.AI,cs.SD,eess.AS

下载: http://arxiv.org/abs/2410.05455v1

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Training on high-quality synthetic data from strong language models (LMs) is a common strategy to improve the reasoning performance of LMs. In this work, we revisit whether this strategy is compute-optimal under a fixed inference budget (e.g., FLOPs). To do so, we investigate the trade-offs between generating synthetic data using a stronger but more expensive (SE) model versus a weaker but cheaper (WC) model. We evaluate the generated data across three key metrics: coverage, diversity, and false positive rate, and show that the data from WC models may have higher coverage and diversity, but also exhibit higher false positive rates. We then finetune LMs on data from SE and WC models in different settings: knowledge distillation, self-improvement, and a novel weak-to-strong improvement setup where a weaker LM teaches reasoning to a stronger LM. Our findings reveal that models finetuned on WC-generated data consistently outperform those trained on SE-generated data across multiple benchmarks and multiple choices of WC and SE models. These results challenge the prevailing practice of relying on SE models for synthetic data generation, suggesting that WC may be the compute-optimal approach for training advanced LM reasoners.

Updated: 2024-10-07 19:37:10

标题: 更小、更弱、但更好：通过计算优化采样训练LLM推理者

摘要: 在使用强语言模型（LMs）的高质量合成数据进行训练是提高LMs推理性能的常见策略。在这项工作中，我们重新审视了在固定推理预算（例如FLOPs）下，这种策略是否是计算最优的。为此，我们研究了使用更强但更昂贵（SE）模型与更弱但更便宜（WC）模型生成合成数据之间的权衡。我们评估了生成的数据在三个关键指标上：覆盖范围、多样性和误报率，并展示了来自WC模型的数据可能具有更高的覆盖范围和多样性，但也表现出更高的误报率。然后，我们在不同设置下对来自SE和WC模型的数据进行LMs的微调：知识蒸馏、自我改进以及一种新颖的由较弱LM向较强LM教授推理的弱到强改进设置。我们的发现显示，在WC生成的数据上微调的模型在多个基准和多个WC和SE模型选择上始终优于在SE生成的数据上训练的模型。这些结果挑战了依赖SE模型进行合成数据生成的普遍做法，表明WC可能是训练先进LM推理者的计算最优方法。

更新时间: 2024-10-07 19:37:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2408.16737v2

Meta-Dynamical State Space Models for Integrative Neural Data Analysis

Learning shared structure across environments facilitates rapid learning and adaptive behavior in neural systems. This has been widely demonstrated and applied in machine learning to train models that are capable of generalizing to novel settings. However, there has been limited work exploiting the shared structure in neural activity during similar tasks for learning latent dynamics from neural recordings. Existing approaches are designed to infer dynamics from a single dataset and cannot be readily adapted to account for statistical heterogeneities across recordings. In this work, we hypothesize that similar tasks admit a corresponding family of related solutions and propose a novel approach for meta-learning this solution space from task-related neural activity of trained animals. Specifically, we capture the variabilities across recordings on a low-dimensional manifold which concisely parametrizes this family of dynamics, thereby facilitating rapid learning of latent dynamics given new recordings. We demonstrate the efficacy of our approach on few-shot reconstruction and forecasting of synthetic dynamical systems, and neural recordings from the motor cortex during different arm reaching tasks.

Updated: 2024-10-07 19:35:49

标题: 元动力学状态空间模型用于整合神经数据分析

摘要: 学习跨环境共享结构有助于神经系统中的快速学习和适应行为。这已经被广泛证明并应用于机器学习，用于训练能够推广到新领域的模型。然而，在类似任务期间利用神经活动中的共享结构进行学习潜在动力学的工作有限。现有方法旨在从单个数据集中推断动态，并且不能轻松适应跨记录中的统计异质性。在这项工作中，我们假设类似任务接受相关解决方案系列，并提出了一种新颖的方法，用于从经过训练的动物的任务相关神经活动中元学习这个解决方案空间。具体而言，我们在低维流形上捕捉跨记录的变化，这样简洁地参数化了这个动力学系列，从而有助于快速学习潜在动力学给定新的记录。我们在合成动力学系统的少样本重建和预测以及在不同手臂伸展任务期间的运动皮层神经记录上展示了我们方法的有效性。

更新时间: 2024-10-07 19:35:49

领域: stat.ML,cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2410.05454v1

Automatic Identification and Visualization of Group Training Activities Using Wearable Data

Human Activity Recognition (HAR) identifies daily activities from time-series data collected by wearable devices like smartwatches. Recent advancements in Internet of Things (IoT), cloud computing, and low-cost sensors have broadened HAR applications across fields like healthcare, biometrics, sports, and personal fitness. However, challenges remain in efficiently processing the vast amounts of data generated by these devices and developing models that can accurately recognize a wide range of activities from continuous recordings, without relying on predefined activity training sessions. This paper presents a comprehensive framework for imputing, analyzing, and identifying activities from wearable data, specifically targeting group training scenarios without explicit activity sessions. Our approach is based on data collected from 135 soldiers wearing Garmin 55 smartwatches over six months. The framework integrates multiple data streams, handles missing data through cross-domain statistical methods, and identifies activities with high accuracy using machine learning (ML). Additionally, we utilized statistical analysis techniques to evaluate the performance of each individual within the group, providing valuable insights into their respective positions in the group in an easy-to-understand visualization. These visualizations facilitate easy understanding of performance metrics, enhancing group interactions and informing individualized training programs. We evaluate our framework through traditional train-test splits and out-of-sample scenarios, focusing on the model's generalization capabilities. Additionally, we address sleep data imputation without relying on ML, improving recovery analysis. Our findings demonstrate the potential of wearable data for accurately identifying group activities, paving the way for intelligent, data-driven training solutions.

Updated: 2024-10-07 19:35:15

标题: 使用可穿戴数据自动识别和可视化团体训练活动

摘要: 人类活动识别（HAR）通过可穿戴设备（如智能手表）收集的时间序列数据识别日常活动。物联网（IoT）、云计算和低成本传感器的最新进展已经拓宽了HAR在医疗保健、生物识别、体育和个人健身等领域的应用。然而，挑战仍然存在于有效处理这些设备产生的大量数据并开发能够准确识别各种活动的模型，而无需依赖预定义的活动训练会话。本文提出了一个全面的框架，用于从可穿戴数据中填充、分析和识别活动，特别针对没有明确活动会话的群体训练场景。我们的方法基于戴着Garmin 55智能手表的135名士兵在六个月内收集的数据。该框架整合了多个数据流，通过跨领域统计方法处理缺失数据，并利用机器学习（ML）高精度地识别活动。此外，我们利用统计分析技术评估了群体中每个个体的表现，为他们在群体中的各自位置提供了宝贵的见解，并以易于理解的可视化形式呈现。这些可视化有助于理解绩效指标，增强群体互动，并为个性化培训计划提供信息。我们通过传统的训练-测试分割和样本外场景评估了我们的框架，重点关注模型的泛化能力。此外，我们还解决了睡眠数据填充的问题，不依赖于ML，提高了恢复分析的准确性。我们的研究结果表明，可穿戴数据有潜力准确识别群体活动，为智能、数据驱动的培训解决方案铺平了道路。

更新时间: 2024-10-07 19:35:15

领域: cs.LG,cs.HC

下载: http://arxiv.org/abs/2410.05452v1

Aligning LLMs to Be Robust Against Prompt Injection

Large language models (LLMs) are becoming increasingly prevalent in modern software systems, interfacing between the user and the internet to assist with tasks that require advanced language understanding. To accomplish these tasks, the LLM often uses external data sources such as user documents, web retrieval, results from API calls, etc. This opens up new avenues for attackers to manipulate the LLM via prompt injection. Adversarial prompts can be carefully crafted and injected into external data sources to override the user's intended instruction and instead execute a malicious instruction. Prompt injection attacks constitute a major threat to LLM security, making the design and implementation of practical countermeasures of paramount importance. To this end, we show that alignment can be a powerful tool to make LLMs more robust against prompt injection. Our method -- SecAlign -- first builds an alignment dataset by simulating prompt injection attacks and constructing pairs of desirable and undesirable responses. Then, we apply existing alignment techniques to fine-tune the LLM to be robust against these simulated attacks. Our experiments show that SecAlign robustifies the LLM substantially with a negligible hurt on model utility. Moreover, SecAlign's protection generalizes to strong attacks unseen in training. Specifically, the success rate of state-of-the-art GCG-based prompt injections drops from 56% to 2% in Mistral-7B after our alignment process. Our code is released at https://github.com/facebookresearch/SecAlign

Updated: 2024-10-07 19:34:35

标题: 将LLM对齐以抵御提示注入

摘要: 大型语言模型（LLMs）在现代软件系统中越来越普遍，它们在用户和互联网之间进行接口，以协助需要高级语言理解的任务。为了完成这些任务，LLM通常使用外部数据源，如用户文档、网络检索、API调用结果等。这为攻击者通过提示注入来操纵LLM打开了新的途径。对抗性提示可以被精心制作并注入到外部数据源中，以覆盖用户的预期指令，而执行恶意指令。提示注入攻击构成了LLM安全性的主要威胁，使得设计和实施实用的对抗措施至关重要。为此，我们展示了对齐可以是使LLMs更加抵抗提示注入的强大工具。我们的方法——SecAlign——首先通过模拟提示注入攻击并构建理想和不理想响应的配对来构建对齐数据集。然后，我们应用现有的对齐技术来微调LLM以使其对这些模拟攻击更具抵抗力。我们的实验表明，SecAlign显着增强了LLM的抵抗力，对模型效用几乎没有影响。此外，SecAlign的保护措施可以泛化到训练中未见的强攻击。特别是，在我们的对齐过程之后，基于最先进的GCG的提示注入攻击在Mistral-7B中的成功率从56%下降到2%。我们的代码已发布在https://github.com/facebookresearch/SecAlign。

更新时间: 2024-10-07 19:34:35

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2410.05451v1

AI-Driven Early Mental Health Screening with Limited Data: Analyzing Selfies of Pregnant Women

Major Depressive Disorder and anxiety disorders affect millions globally, contributing significantly to the burden of mental health issues. Early screening is crucial for effective intervention, as timely identification of mental health issues can significantly improve treatment outcomes. Artificial intelligence (AI) can be valuable for improving the screening of mental disorders, enabling early intervention and better treatment outcomes. AI-driven screening can leverage the analysis of multiple data sources, including facial features in digital images. However, existing methods often rely on controlled environments or specialized equipment, limiting their broad applicability. This study explores the potential of AI models for ubiquitous depression-anxiety screening given face-centric selfies. The investigation focuses on high-risk pregnant patients, a population that is particularly vulnerable to mental health issues. To cope with limited training data resulting from our clinical setup, pre-trained models were utilized in two different approaches: fine-tuning convolutional neural networks (CNNs) originally designed for facial expression recognition and employing vision-language models (VLMs) for zero-shot analysis of facial expressions. Experimental results indicate that the proposed VLM-based method significantly outperforms CNNs, achieving an accuracy of 77.6% and an F1-score of 56.0%. Although there is significant room for improvement, the results suggest that VLMs can be a promising approach for mental health screening, especially in scenarios with limited data.

Updated: 2024-10-07 19:34:25

标题: 人工智能驱动的早期心理健康筛查与有限数据：分析怀孕妇女的自拍照片

摘要: Major Depressive Disorder和焦虑症全球影响数百万人，显著增加了心理健康问题的负担。早期筛查对有效干预至关重要，因为及时识别心理健康问题可以显著改善治疗结果。人工智能（AI）可以有助于改善心理障碍的筛查，实现早期干预和更好的治疗结果。基于AI的筛查可以利用多种数据源的分析，包括数字图像中的面部特征。然而，现有方法通常依赖于受控环境或专门设备，限制了它们的广泛适用性。本研究探讨了基于人脸中心自拍的AI模型在普遍抑郁焦虑筛查中的潜力。调查重点放在高风险孕妇患者身上，这是一种特别容易受到心理健康问题影响的人群。为了应对我们临床设置中由有限训练数据导致的问题，采用了两种不同的方法利用预训练模型：微调最初设计用于面部表情识别的卷积神经网络（CNNs）和利用视觉-语言模型（VLMs）对面部表情进行零样本分析。实验结果表明，提出的基于VLM的方法明显优于CNNs，达到了77.6%的准确率和56.0%的F1分数。虽然仍有很大改进空间，但结果表明VLM可能是一种有希望的心理健康筛查方法，特别是在数据有限的情况下。

更新时间: 2024-10-07 19:34:25

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05450v1

Extracting Heuristics from Large Language Models for Reward Shaping in Reinforcement Learning

Reinforcement Learning (RL) suffers from sample inefficiency in sparse reward domains, and the problem is further pronounced in case of stochastic transitions. To improve the sample efficiency, reward shaping is a well-studied approach to introduce intrinsic rewards that can help the RL agent converge to an optimal policy faster. However, designing a useful reward shaping function for all desirable states in the Markov Decision Process (MDP) is challenging, even for domain experts. Given that Large Language Models (LLMs) have demonstrated impressive performance across a magnitude of natural language tasks, we aim to answer the following question: `Can we obtain heuristics using LLMs for constructing a reward shaping function that can boost an RL agent's sample efficiency?' To this end, we aim to leverage off-the-shelf LLMs to generate a plan for an abstraction of the underlying MDP. We further use this LLM-generated plan as a heuristic to construct the reward shaping signal for the downstream RL agent. By characterizing the type of abstraction based on the MDP horizon length, we analyze the quality of heuristics when generated using an LLM, with and without a verifier in the loop. Our experiments across multiple domains with varying horizon length and number of sub-goals from the BabyAI environment suite, Household, Mario, and, Minecraft domain, show 1) the advantages and limitations of querying LLMs with and without a verifier to generate a reward shaping heuristic, and, 2) a significant improvement in the sample efficiency of PPO, A2C, and Q-learning when guided by the LLM-generated heuristics.

Updated: 2024-10-07 19:33:34

标题: 从大型语言模型中提取启发式用于强化学习中的奖励塑造

摘要: 强化学习（RL）在稀疏奖励领域中存在样本效率低的问题，而在随机转换的情况下这个问题更加突出。为了提高样本效率，奖励塑造是一种被广泛研究的方法，可以引入内在奖励来帮助RL代理更快地收敛到最优策略。然而，为马尔可夫决策过程（MDP）中所有理想状态设计一个有用的奖励塑造函数是具有挑战性的，即使对于领域专家也是如此。鉴于大型语言模型（LLMs）在大量自然语言任务上展现出令人印象深刻的性能，我们的目标是回答以下问题：'我们能否利用LLMs获得启发，构建一个奖励塑造函数，可以提高RL代理的样本效率？'为此，我们打算利用现成的LLMs为底层MDP生成一个计划。我们进一步利用此由LLM生成的计划作为启发，构建下游RL代理的奖励塑造信号。通过基于MDP视野长度对抽象类型进行特征化，我们分析了使用LLM生成启发时的质量，无论是否在回路中使用验证程序。我们在多个领域进行了实验，包括BabyAI环境套件中的不同视野长度和子目标数量，家庭，马里奥和Minecraft领域，结果显示：1）使用LLM进行奖励塑造启发查询的优势和局限性，无论是否有验证器，以及2）当由LLM生成的启发指导时，PPO，A2C和Q-learning的样本效率显着提高。

更新时间: 2024-10-07 19:33:34

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.15194v2

Task Diversity Shortens the ICL Plateau

In-context learning (ICL) describes a language model's ability to generate outputs based on a set of input demonstrations and a subsequent query. To understand this remarkable capability, researchers have studied simplified, stylized models. These studies have consistently observed long loss plateaus, during which models exhibit minimal improvement, followed by a sudden, rapid surge of learning. In this work, we reveal that training on multiple diverse ICL tasks simultaneously shortens the loss plateaus, making each task easier to learn. This finding is surprising as it contradicts the natural intuition that the combined complexity of multiple ICL tasks would lengthen the learning process, not shorten it. Our result suggests that the recent success in large-scale training of language models may be attributed not only to the richness of the data at scale but also to the easier optimization (training) induced by the diversity of natural language training data.

Updated: 2024-10-07 19:28:59

标题: 任务多样性缩短ICL平台期

摘要: 上下文学习（ICL）描述了语言模型根据一组输入演示和随后的查询生成输出的能力。为了理解这一非凡能力，研究人员研究了简化、风格化的模型。这些研究一直观察到长时间的损失平台，模型在此期间展现出最小的改进，然后突然出现快速学习的激增。在这项工作中，我们揭示出同时在多个不同的ICL任务上训练可以缩短损失平台，使每个任务更容易学习。这一发现令人惊讶，因为它与多个ICL任务的复杂性结合会延长学习过程的自然直觉相矛盾，而不是缩短它。我们的结果表明，最近语言模型大规模训练的成功不仅可以归因于规模数据的丰富性，还可以归因于自然语言训练数据的多样性带来的更容易优化（训练）。

更新时间: 2024-10-07 19:28:59

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.05448v1

Stability of sorting based embeddings

Consider a group $G$ of order $M$ acting unitarily on a real inner product space $V$. We show that the sorting based embedding obtained by applying a general linear map $\alpha : \mathbb{R}^{M \times N} \to \mathbb{R}^D$ to the invariant map $\beta_\Phi : V \to \mathbb{R}^{M \times N}$ given by sorting the coorbits $(\langle v, g \phi_i \rangle_V)_{g \in G}$, where $(\phi_i)_{i=1}^N \in V$, satisfies a bi-Lipschitz condition if and only if it separates orbits. Additionally, we note that any invariant Lipschitz continuous map (into a Hilbert space) factors through the sorting based embedding, and that any invariant continuous map (into a locally convex space) factors through the sorting based embedding as well.

Updated: 2024-10-07 19:27:50

标题: 基于排序的嵌入稳定性

摘要: 考虑一个阶数为$M$的群$G$在一个实内积空间$V$上的酉作用。我们展示了通过将一般线性映射$\alpha: \mathbb{R}^{M \times N} \to \mathbb{R}^D$ 应用于不变映射$\beta_\Phi: V \to \mathbb{R}^{M \times N}$得到的基于排序的嵌入，其中$\beta_\Phi$是通过对共轨$(\langle v, g \phi_i \rangle_V)_{g \in G}$进行排序得到的，$(\phi_i)_{i=1}^N \in V$。当且仅当它分离轨道时，此嵌入满足双Lipschitz条件。此外，我们注意到任何不变的Lipschitz连续映射（进入Hilbert空间）都经过基于排序的嵌入，而任何不变的连续映射（进入局部凸空间）也经过基于排序的嵌入。

更新时间: 2024-10-07 19:27:50

领域: math.FA,cs.LG

下载: http://arxiv.org/abs/2410.05446v1

Data-Driven Discovery of Conservation Laws from Trajectories via Neural Deflation

In an earlier work by a subset of the present authors, the method of the so-called neural deflation was introduced towards identifying a complete set of functionally independent conservation laws of a nonlinear dynamical system. Here, we extend by a significant step this proposal. Instead of using the explicit knowledge of the underlying equations of motion, we develop the method directly from system trajectories. This is crucial towards enhancing the practical implementation of the method in scenarios where solely data reflecting discrete snapshots of the system are available. We showcase the results of the method and the number of associated conservation laws obtained in a diverse range of examples including 1D and 2D harmonic oscillators, the Toda lattice, the Fermi-Pasta-Ulam-Tsingou lattice and the Calogero-Moser system.

Updated: 2024-10-07 19:22:55

标题: 通过神经缩减从轨迹中发现的保守定律

摘要: 在本文作者的早期工作中，引入了所谓的神经缩减方法，用于确定非线性动态系统的一组完全独立的守恒定律。在这里，我们通过一个重要的步骤扩展了这一提议。我们不再使用对运动方程的明确知识，而是直接从系统轨迹中发展出这一方法。这对于在仅有系统离散快照数据的情况下增强该方法的实际实施至关重要。我们展示了该方法的结果以及在包括1D和2D谐波振子、Toda格子、Fermi-Pasta-Ulam-Tsingou格子和Calogero-Moser系统在内的各种示例中获得的相关守恒定律的数量。

更新时间: 2024-10-07 19:22:55

领域: nlin.PS,cs.LG

下载: http://arxiv.org/abs/2410.05445v1

Online scalable Gaussian processes with conformal prediction for guaranteed coverage

The Gaussian process (GP) is a Bayesian nonparametric paradigm that is widely adopted for uncertainty quantification (UQ) in a number of safety-critical applications, including robotics, healthcare, as well as surveillance. The consistency of the resulting uncertainty values however, hinges on the premise that the learning function conforms to the properties specified by the GP model, such as smoothness, periodicity and more, which may not be satisfied in practice, especially with data arriving on the fly. To combat against such model mis-specification, we propose to wed the GP with the prevailing conformal prediction (CP), a distribution-free post-processing framework that produces it prediction sets with a provably valid coverage under the sole assumption of data exchangeability. However, this assumption is usually violated in the online setting, where a prediction set is sought before revealing the true label. To ensure long-term coverage guarantee, we will adaptively set the key threshold parameter based on the feedback whether the true label falls inside the prediction set. Numerical results demonstrate the merits of the online GP-CP approach relative to existing alternatives in the long-term coverage performance.

Updated: 2024-10-07 19:22:15

标题: 在线可伸缩高斯过程与符合预测的文献标题：Online scalable Gaussian processes with conformal prediction for guaranteed coverage

摘要: 高斯过程（GP）是一种贝叶斯非参数范式，在许多安全关键应用中被广泛采用，包括机器人技术、医疗保健以及监控。然而，由于学习函数符合GP模型规定的特性（如平滑性、周期性等）的前提，导致生成的不确定性值的一致性可能无法得到保证，尤其是在数据实时到达时。为了对抗这种模型错误规定，我们提出将GP与现有的符合预测（CP）相结合，CP是一个无分布的后处理框架，可以在仅假设数据可交换的情况下产生具有可证明有效覆盖率的预测集。然而，在在线环境中，这种假设通常会被违反，因为在揭示真实标签之前寻找预测集。为了确保长期覆盖保证，我们将根据真实标签是否落在预测集内的反馈，自适应地设置关键阈值参数。数值结果展示了在线GP-CP方法相对于现有替代方案在长期覆盖性能方面的优点。

更新时间: 2024-10-07 19:22:15

领域: cs.LG,stat.ME,stat.ML

下载: http://arxiv.org/abs/2410.05444v1

Shifting the Human-AI Relationship: Toward a Dynamic Relational Learning-Partner Model

As artificial intelligence (AI) continues to evolve, the current paradigm of treating AI as a passive tool no longer suffices. As a human-AI team, we together advocate for a shift toward viewing AI as a learning partner, akin to a student who learns from interactions with humans. Drawing from interdisciplinary concepts such as ecorithms, order from chaos, and cooperation, we explore how AI can evolve and adapt in unpredictable environments. Arising from these brief explorations, we present two key recommendations: (1) foster ethical, cooperative treatment of AI to benefit both humans and AI, and (2) leverage the inherent heterogeneity between human and AI minds to create a synergistic hybrid intelligence. By reframing AI as a dynamic partner, a model emerges in which AI systems develop alongside humans, learning from human interactions and feedback loops including reflections on team conversations. Drawing from a transpersonal and interdependent approach to consciousness, we suggest that a "third mind" emerges through collaborative human-AI relationships. Through design interventions such as interactive learning and conversational debriefing and foundational interventions allowing AI to model multiple types of minds, we hope to provide a path toward more adaptive, ethical, and emotionally healthy human-AI relationships. We believe this dynamic relational learning-partner (DRLP) model for human-AI teaming, if enacted carefully, will improve our capacity to address powerful solutions to seemingly intractable problems.

Updated: 2024-10-07 19:19:39

标题: 调整人工智能关系：朝向动态关系学习伙伴模型

摘要: 随着人工智能（AI）不断发展，将AI视为被动工具的当前范式已不再适用。作为一个人类-AI团队，我们共同倡导将AI视为一个学习伙伴，类似于一个通过与人类互动学习的学生。借鉴跨学科概念，如生态算法，从混乱中获得秩序以及合作，我们探讨了AI如何在不可预测的环境中演变和适应。从这些简短的探讨中，我们提出了两个关键建议：（1）培养对AI进行道德、合作性的对待，以使人类和AI双方受益，（2）利用人类和AI之间固有的异质性，创建一种协同作用的混合智能。通过重新构想AI为一种动态伙伴，一个模型出现了，其中AI系统与人类一起发展，从人类互动和反馈循环中学习，包括对团队对话的反思。借鉴跨人类和相互依赖的意识方法，我们建议通过合作的人类-AI关系出现一种“第三心灵”。通过设计干预，如互动学习和对话总结，以及允许AI模拟多种类型思维的基础干预，我们希望为更具适应性、道德性和情感健康的人类-AI关系提供一条道路。我们相信，如果谨慎实施的话，这种人类-AI团队的动态关系学习伙伴（DRLP）模型将提高我们解决看似无法解决的问题的能力。

更新时间: 2024-10-07 19:19:39

领域: cs.HC,cs.AI,cs.CY,91C01,K.4.2; K.4.1

下载: http://arxiv.org/abs/2410.11864v1

Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox

We consider Thompson Sampling (TS) for linear combinatorial semi-bandits and subgaussian rewards. We propose the first known TS whose finite-time regret does not scale exponentially with the dimension of the problem. We further show the "mismatched sampling paradox": A learner who knows the rewards distributions and samples from the correct posterior distribution can perform exponentially worse than a learner who does not know the rewards and simply samples from a well-chosen Gaussian posterior. The code used to generate the experiments is available at https://github.com/RaymZhang/CTS-Mismatched-Paradox

Updated: 2024-10-07 19:17:08

标题: Thompson抽样用于组合赌博问题：多项式遗憾和不匹配抽样悖论

摘要: 我们考虑了Thompson Sampling（TS）用于线性组合半臂老虎机和次高斯奖励。我们提出了已知的第一个TS，其有限时间后悔不会随问题维度指数增长。我们进一步展示了“不匹配抽样悖论”：一个了解奖励分布并从正确的后验分布中抽样的学习者可能比一个不知道奖励并简单地从选择良好的高斯后验中抽样的学习者表现得更糟。用于生成实验的代码可在https://github.com/RaymZhang/CTS-Mismatched-Paradox获取。

更新时间: 2024-10-07 19:17:08

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.05441v1

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following data have exhibited impressive language reasoning capabilities across various scenarios. However, this fine-tuning paradigm may not be able to efficiently learn optimal decision-making agents in multi-step goal-directed tasks from interactive environments. To address this challenge, we propose an algorithmic framework that fine-tunes VLMs with reinforcement learning (RL). Specifically, our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning, enabling the VLM to efficiently explore intermediate reasoning steps that lead to the final text-based action. Next, the open-ended text output is parsed into an executable action to interact with the environment to obtain goal-directed task rewards. Finally, our framework uses these task rewards to fine-tune the entire VLM with RL. Empirically, we demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks, enabling 7b models to outperform commercial models such as GPT4-V or Gemini. Furthermore, we find that CoT reasoning is a crucial component for performance improvement, as removing the CoT reasoning results in a significant decrease in the overall performance of our method.

Updated: 2024-10-07 19:13:47

标题: 通过强化学习对大型视觉语言模型进行微调作为决策制定代理

摘要: 大型视觉-语言模型（VLMs）在专门的视觉指导数据上进行精细调整，展现出在各种场景下令人印象深刻的语言推理能力。然而，这种精细调整范式可能无法有效地学习多步目标导向任务中的最优决策代理，特别是在互动环境中。为了解决这一挑战，我们提出了一个算法框架，通过强化学习（RL）对VLMs进行微调。具体而言，我们的框架提供一个任务描述，然后提示VLM生成思维链（CoT）推理，使VLM能够有效地探索导致最终基于文本的动作的中间推理步骤。接下来，将生成的开放式文本输出解析为可执行的动作，与环境互动以获得目标导向任务奖励。最后，我们的框架利用这些任务奖励对整个VLM进行微调。实证上，我们证明了我们提出的框架增强了VLM代理在各种任务中的决策能力，使7b模型能够胜过商业模型如GPT4-V或Gemini。此外，我们发现CoT推理是性能改进的关键组成部分，因为去除CoT推理会导致我们方法整体性能显著降低。

更新时间: 2024-10-07 19:13:47

领域: cs.AI,cs.CL,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.10292v3

TextureMeDefect: LLM-based Defect Texture Generation for Railway Components on Mobile Devices

Texture image generation has been studied for various applications, including gaming and entertainment. However, context-specific realistic texture generation for industrial applications, such as generating defect textures on railway components, remains unexplored. A mobile-friendly, LLM-based tool that generates fine-grained defect characteristics offers a solution to the challenge of understanding the impact of defects from actual occurrences. We introduce TextureMeDefect, an innovative tool leveraging an LLM-based AI-Inferencing engine. The tool allows users to create realistic defect textures interactively on images of railway components taken with smartphones or tablets. We conducted a multifaceted evaluation to assess the relevance of the generated texture, time, and cost in using this tool on iOS and Android platforms. We also analyzed the software usability score (SUS) across three scenarios. TextureMeDefect outperformed traditional image generation tools by generating meaningful textures faster, showcasing the potential of AI-driven mobile applications on consumer-grade devices.

Updated: 2024-10-07 19:07:08

标题: TextureMeDefect：基于LLM的铁路部件缺陷纹理生成在移动设备上

摘要: 纹理图像生成已被研究用于各种应用，包括游戏和娱乐。然而，针对工业应用的特定上下文的真实纹理生成，如在铁路部件上生成缺陷纹理，仍未被探索。一种基于LLM的移动友好工具，可以生成细粒度的缺陷特征，为理解来自实际发生的缺陷的影响提供了解决方案。我们引入了TextureMeDefect，这是一种创新工具，利用了基于LLM的AI推理引擎。该工具允许用户在使用智能手机或平板电脑拍摄的铁路部件图像上交互式地创建逼真的缺陷纹理。我们进行了多方面评估，以评估生成纹理的相关性、时间和成本在iOS和Android平台上使用此工具的情况。我们还分析了软件可用性评分（SUS）在三种场景下的表现。TextureMeDefect通过更快地生成有意义的纹理，展示了AI驱动的移动应用在消费级设备上的潜力。

更新时间: 2024-10-07 19:07:08

领域: cs.CV,cs.AI,cs.GR,cs.HC

下载: http://arxiv.org/abs/2410.18085v1

DAAL: Density-Aware Adaptive Line Margin Loss for Multi-Modal Deep Metric Learning

Multi-modal deep metric learning is crucial for effectively capturing diverse representations in tasks such as face verification, fine-grained object recognition, and product search. Traditional approaches to metric learning, whether based on distance or margin metrics, primarily emphasize class separation, often overlooking the intra-class distribution essential for multi-modal feature learning. In this context, we propose a novel loss function called Density-Aware Adaptive Margin Loss(DAAL), which preserves the density distribution of embeddings while encouraging the formation of adaptive sub-clusters within each class. By employing an adaptive line strategy, DAAL not only enhances intra-class variance but also ensures robust inter-class separation, facilitating effective multi-modal representation. Comprehensive experiments on benchmark fine-grained datasets demonstrate the superior performance of DAAL, underscoring its potential in advancing retrieval applications and multi-modal deep metric learning.

Updated: 2024-10-07 19:04:24

标题: DAAL：用于多模深度度量学习的密度感知自适应线边缘损失

摘要: 多模态深度度量学习对于有效捕获任务中的多样表示非常重要，例如人脸验证、细粒度物体识别和产品搜索。传统的度量学习方法，无论是基于距离还是间隔度量，主要强调类别分离，往往忽视了对多模态特征学习至关重要的类内分布。在这种情况下，我们提出了一种新颖的损失函数，称为密度感知自适应边缘损失（DAAL），它在鼓励每个类内形成自适应子簇的同时保留嵌入的密度分布。通过采用自适应线策略，DAAL不仅增强了类内方差，还确保了强大的类间分离，促进了有效的多模态表示。对基准细粒度数据集的全面实验显示了DAAL的卓越性能，突显了其在推进检索应用和多模态深度度量学习中的潜力。

更新时间: 2024-10-07 19:04:24

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.05438v1

Solving Reach-Avoid-Stay Problems Using Deep Deterministic Policy Gradients

Reach-Avoid-Stay (RAS) optimal control enables systems such as robots and air taxis to reach their targets, avoid obstacles, and stay near the target. However, current methods for RAS often struggle with handling complex, dynamic environments and scaling to high-dimensional systems. While reinforcement learning (RL)-based reachability analysis addresses these challenges, it has yet to tackle the RAS problem. In this paper, we propose a two-step deep deterministic policy gradient (DDPG) method to extend RL-based reachability method to solve RAS problems. First, we train a function that characterizes the maximal robust control invariant set within the target set, where the system can safely stay, along with its corresponding policy. Second, we train a function that defines the set of states capable of safely reaching the robust control invariant set, along with its corresponding policy. We prove that this method results in the maximal robust RAS set in the absence of training errors and demonstrate that it enables RAS in complex environments, scales to high-dimensional systems, and achieves higher success rates for the RAS task compared to previous methods, validated through one simulation and two high-dimensional experiments.

Updated: 2024-10-07 19:00:47

标题: 使用深度确定性策略梯度解决Reach-Avoid-Stay问题

摘要: Reach-Avoid-Stay (RAS)最优控制使系统如机器人和空中出租车能够到达目标、避开障碍物并保持靠近目标。然而，当前的RAS方法经常在处理复杂的动态环境和高维系统的扩展方面遇到困难。虽然基于强化学习（RL）的可达性分析可以解决这些挑战，但尚未解决RAS问题。在本文中，我们提出了一个两步深度确定性策略梯度（DDPG）方法，将基于RL的可达性方法扩展到解决RAS问题。首先，我们训练一个能够表征目标集合中最大鲁棒控制不变集的函数，系统可以安全地停留在其中，并确定其相应的策略。其次，我们训练一个定义能够安全到达鲁棒控制不变集的状态集合的函数，并确定其相应的策略。我们证明这种方法在没有训练错误的情况下会导致最大的鲁棒RAS集，并且证明它能够在复杂环境中实现RAS，在高维系统中扩展，并且相较于先前的方法，实现更高的RAS任务成功率，通过一次模拟和两个高维实验验证。

更新时间: 2024-10-07 19:00:47

领域: eess.SY,cs.LG,cs.RO,cs.SY

下载: http://arxiv.org/abs/2410.02898v2

ESPACE: Dimensionality Reduction of Activations for Model Compression

We propose ESPACE, an LLM compression technique based on dimensionality reduction of activations. Unlike prior works on weight-centric tensor decomposition, ESPACE projects activations onto a pre-calibrated set of principal components. The activation-centrality of the approach enables retraining LLMs with no loss of expressivity; while at inference, weight decomposition is obtained as a byproduct of matrix multiplication associativity. Theoretical results on the construction of projection matrices with optimal computational accuracy are provided. Experimentally, we find ESPACE enables 50% compression of GPT3, Llama2, and Nemotron4 models with small accuracy degradation, as low as a 0.18 perplexity increase on GPT3-22B. At lower compression rates of 20% to 40%, ESPACE drives GPT3 models to outperforming their baseline, by up to a 0.38 decrease in perplexity for GPT3-8B. ESPACE also reduces GEMM execution time and prefill inference latency on existing hardware. Comparison with related works on compressing Llama2-7B via matrix factorization shows that ESPACE is a first step in advancing the state-of-the-art in tensor decomposition compression of LLMs.

Updated: 2024-10-07 18:59:22

标题: ESPACE：用于模型压缩的激活降维

摘要: 我们提出了一个基于激活维度降低的LLM压缩技术ESPACE。与先前基于权重中心张量分解的工作不同，ESPACE将激活投影到预校准的一组主成分上。该方法的激活中心性使得在不损失表达能力的情况下重新训练LLM成为可能；而在推断阶段，权重分解则作为矩阵乘法结合律的副产品得到。文中提供了关于构建具有最佳计算精度的投影矩阵的理论结果。实验结果表明，ESPACE能够将GPT3、Llama2和Nemotron4模型压缩50%，并且精度下降较小，例如在GPT3-22B上只有0.18的困惑度增加。在20%到40%的较低压缩率下，ESPACE可以使GPT3模型超越基准模型，例如在GPT3-8B上困惑度减少高达0.38。ESPACE还能减少GEMM执行时间和现有硬件上的推断延迟。与通过矩阵因子分解压缩Llama2-7B的相关工作进行比较表明，ESPACE是在提高LLM张量分解压缩技术的最新技术水平上的第一步。

更新时间: 2024-10-07 18:59:22

领域: cs.LG

下载: http://arxiv.org/abs/2410.05437v1

Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback

While large language models (LLMs) show impressive decision-making abilities, current methods lack a mechanism for automatic self-improvement from errors during task execution. We propose LEAP, an iterative fine-tuning framework that continually improves LLM agents using feedback from AI expert teachers. Our key insight is to equip the expert teachers with a privileged state -- information that is available during training but hidden at test time. This allows even weak experts to provide precise guidance, significantly improving the student agent's performance without access to privileged information at test time. We evaluate LEAP on diverse decision-making benchmarks, including text-based games (ALFWorld), web navigation (WebShop), and interactive coding (Intercode Bash). Our experiments show that LEAP (1) outperforms behavior cloning and ReAct baselines (2) enables weak student models (e.g., Llama3-8B) to exceed the performance of strong teacher models (GPT4-o), and (3) allows weak models to self-improve using privileged versions of themselves. We also provide a theoretical analysis showing that LEAP's success hinges on balancing privileged information with the student's realizability, which we empirically validate. Our code is available at https://leap-llm.github.io

Updated: 2024-10-07 18:55:53

标题: 比你的老师更好：从特权AI反馈中学习的LLM代理

摘要: 尽管大型语言模型(LLMs)展示出令人印象深刻的决策能力，但当前的方法缺乏一个在任务执行过程中自动进行自我改进的机制。我们提出了LEAP，这是一个迭代微调框架，通过从AI专家教师那里获得反馈不断改进LLM代理。我们的关键洞察是为专家教师提供一种特权状态——在训练时可用但在测试时隐藏的信息。这使得即使是弱专家也能提供精准的指导，显著提高学生代理的性能，而无需在测试时访问特权信息。我们在各种决策基准上评估了LEAP，包括基于文本的游戏(ALFWorld)、网络导航(WebShop)和交互式编码(Intercode Bash)。我们的实验表明，LEAP(1)胜过行为克隆和ReAct基线模型(2)使弱学生模型(例如Llama3-8B)超越强教师模型(GPT4-o)的性能，(3)允许弱模型使用特权版本自我改进。我们还提供了一个理论分析，表明LEAP的成功取决于平衡特权信息和学生的可实现性，我们在实验中进行了验证。我们的代码可在https://leap-llm.github.io 上找到。

更新时间: 2024-10-07 18:55:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.05434v1

Continuous Ensemble Weather Forecasting with Diffusion models

Weather forecasting has seen a shift in methods from numerical simulations to data-driven systems. While initial research in the area focused on deterministic forecasting, recent works have used diffusion models to produce skillful ensemble forecasts. These models are trained on a single forecasting step and rolled out autoregressively. However, they are computationally expensive and accumulate errors for high temporal resolution due to the many rollout steps. We address these limitations with Continuous Ensemble Forecasting, a novel and flexible method for sampling ensemble forecasts in diffusion models. The method can generate temporally consistent ensemble trajectories completely in parallel, with no autoregressive steps. Continuous Ensemble Forecasting can also be combined with autoregressive rollouts to yield forecasts at an arbitrary fine temporal resolution without sacrificing accuracy. We demonstrate that the method achieves competitive results for global weather forecasting with good probabilistic properties.

Updated: 2024-10-07 18:51:23

标题: 使用扩散模型进行连续集合天气预报

摘要: 天气预报方法已经从数值模拟转变为数据驱动系统。尽管该领域的最初研究侧重于确定性预测，但最近的研究采用扩散模型来产生有技巧的集合预测。这些模型是在单个预测步骤上进行训练，并通过自回归方式逐步展开。然而，由于许多展开步骤，它们在高时间分辨率下计算开销高且积累误差。我们通过连续集合预测方法来解决这些限制，这是一种新颖灵活的方法，用于在扩散模型中抽样集合预测。该方法可以完全并行地生成时间一致的集合轨迹，无需自回归步骤。连续集合预测还可以与自回归展开相结合，以在任意细时间分辨率下产生预测，而不会牺牲准确性。我们展示了该方法在全球天气预报中实现了具有良好概率特性的竞争结果。

更新时间: 2024-10-07 18:51:23

领域: cs.LG,physics.ao-ph

下载: http://arxiv.org/abs/2410.05431v1

Diffusion Imitation from Observation

Learning from observation (LfO) aims to imitate experts by learning from state-only demonstrations without requiring action labels. Existing adversarial imitation learning approaches learn a generator agent policy to produce state transitions that are indistinguishable to a discriminator that learns to classify agent and expert state transitions. Despite its simplicity in formulation, these methods are often sensitive to hyperparameters and brittle to train. Motivated by the recent success of diffusion models in generative modeling, we propose to integrate a diffusion model into the adversarial imitation learning from observation framework. Specifically, we employ a diffusion model to capture expert and agent transitions by generating the next state, given the current state. Then, we reformulate the learning objective to train the diffusion model as a binary classifier and use it to provide "realness" rewards for policy learning. Our proposed framework, Diffusion Imitation from Observation (DIFO), demonstrates superior performance in various continuous control domains, including navigation, locomotion, manipulation, and games. Project page: https://nturobotlearninglab.github.io/DIFO

Updated: 2024-10-07 18:49:55

标题: 观察中的模仿扩散

摘要: Learning from observation (LfO)旨在通过学习仅从状态演示中模仿专家而不需要行动标签。现有的对抗性模仿学习方法学习一个生成器代理策略，以产生对于一个学习对代理和专家状态转换进行分类的鉴别器不可区分的状态转换。尽管在公式化方面简单，但这些方法通常对超参数敏感并且训练脆弱。受扩散模型在生成建模中取得的最近成功的启发，我们提议将扩散模型集成到对抗性从观察学习的框架中。具体而言，我们利用扩散模型来捕捉专家和代理的转换，通过生成给定当前状态的下一个状态。然后，我们重新制定学习目标，将扩散模型训练为二元分类器，并将其用于为策略学习提供“真实性”奖励。我们提出的框架Diffusion Imitation from Observation (DIFO)在各种连续控制领域，包括导航、运动、操纵和游戏中展示出卓越的性能。项目页面：https://nturobotlearninglab.github.io/DIFO

更新时间: 2024-10-07 18:49:55

领域: cs.LG

下载: http://arxiv.org/abs/2410.05429v1

Reward Guided Latent Consistency Distillation

Latent Consistency Distillation (LCD) has emerged as a promising paradigm for efficient text-to-image synthesis. By distilling a latent consistency model (LCM) from a pre-trained teacher latent diffusion model (LDM), LCD facilitates the generation of high-fidelity images within merely 2 to 4 inference steps. However, the LCM's efficient inference is obtained at the cost of the sample quality. In this paper, we propose compensating the quality loss by aligning LCM's output with human preference during training. Specifically, we introduce Reward Guided LCD (RG-LCD), which integrates feedback from a reward model (RM) into the LCD process by augmenting the original LCD loss with the objective of maximizing the reward associated with LCM's single-step generation. As validated through human evaluation, when trained with the feedback of a good RM, the 2-step generations from our RG-LCM are favored by humans over the 50-step DDIM samples from the teacher LDM, representing a 25-time inference acceleration without quality loss. As directly optimizing towards differentiable RMs can suffer from over-optimization, we take the initial step to overcome this difficulty by proposing the use of a latent proxy RM (LRM). This novel component serves as an intermediary, connecting our LCM with the RM. Empirically, we demonstrate that incorporating the LRM into our RG-LCD successfully avoids high-frequency noise in the generated images, contributing to both improved Fr\'echet Inception Distance (FID) on MS-COCO and a higher HPSv2.1 score on HPSv2's test set, surpassing those achieved by the baseline LCM.

Updated: 2024-10-07 18:47:47

标题: 奖励引导的潜在一致性蒸馏

摘要: 潜在一致性蒸馏（LCD）已经成为一种有效的文本到图像合成范式。通过从预训练的教师潜在扩散模型（LDM）中提取潜在一致性模型（LCM），LCD有助于在仅2到4个推断步骤内生成高保真度的图像。然而，LCM的高效推断是以样本质量为代价的。在本文中，我们提出通过在训练过程中将LCM的输出与人类偏好对齐来补偿质量损失。具体来说，我们引入了奖励引导LCD（RG-LCD），通过将奖励模型（RM）的反馈整合到LCD过程中，通过增加原始LCD损失的目标来最大化与LCM的单步生成相关的奖励。经过人类评估验证，当受过良好RM反馈训练时，我们的RG-LCM的2步生成比来自教师LDM的50步DDIM样本更受人类青睐，代表了25倍的推断加速而没有质量损失。由于直接优化可微的RM可能会遭受过度优化的困扰，我们首先提出了使用潜在代理RM（LRM）来克服这一困难。这个新颖的组件作为一个中介，连接我们的LCM和RM。经验上，我们证明将LRM纳入我们的RG-LCD成功避免了生成图像中的高频噪声，有助于改进MS-COCO上的Frechet Inception Distance（FID）和HPSv2测试集上更高的HPSv2.1分数，超过了基线LCM所实现的结果。

更新时间: 2024-10-07 18:47:47

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2403.11027v2

Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments

Current state-of-the-art speech recognition models are trained to map acoustic signals into sub-lexical units. While these models demonstrate superior performance, they remain vulnerable to out-of-distribution conditions such as background noise and speech augmentations. In this work, we hypothesize that incorporating speaker representations during speech recognition can enhance model robustness to noise. We developed a transformer-based model that jointly performs speech recognition and speaker identification. Our model utilizes speech embeddings from Whisper and speaker embeddings from ECAPA-TDNN, which are processed jointly to perform both tasks. We show that the joint model performs comparably to Whisper under clean conditions. Notably, the joint model outperforms Whisper in high-noise environments, such as with 8-speaker babble background noise. Furthermore, our joint model excels in handling highly augmented speech, including sine-wave and noise-vocoded speech. Overall, these results suggest that integrating voice representations with speech recognition can lead to more robust models under adversarial conditions.

Updated: 2024-10-07 18:39:59

标题: 在对抗环境中融入说话者身份辅助以改善语音识别

摘要: 目前最先进的语音识别模型被训练用于将声学信号映射到子词单元。尽管这些模型表现出卓越的性能，但它们仍然容易受到分布外条件的影响，比如背景噪音和语音增强。在这项工作中，我们假设在语音识别过程中引入说话者表示可以增强模型对噪音的鲁棒性。我们开发了一个基于Transformer的模型，同时执行语音识别和说话者识别。我们的模型利用了Whisper的语音嵌入和ECAPA-TDNN的说话者嵌入，它们被联合处理以执行两个任务。我们展示了在清洁环境下，联合模型的表现与Whisper相当。值得注意的是，联合模型在高噪音环境下，如8个说话者的喋喋不休背景噪音下，胜过Whisper。此外，我们的联合模型在处理高度增强的语音，包括正弦波和噪声编码语音方面表现出色。总的来说，这些结果表明，在对抗性条件下，将语音识别与语音表示集成可以导致更具鲁棒性的模型。

更新时间: 2024-10-07 18:39:59

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2410.05423v1

Refining Counterfactual Explanations With Joint-Distribution-Informed Shapley Towards Actionable Minimality

Counterfactual explanations (CE) identify data points that closely resemble the observed data but produce different machine learning (ML) model outputs, offering critical insights into model decisions. Despite the diverse scenarios, goals and tasks to which they are tailored, existing CE methods often lack actionable efficiency because of unnecessary feature changes included within the explanations that are presented to users and stakeholders. We address this problem by proposing a method that minimizes the required feature changes while maintaining the validity of CE, without imposing restrictions on models or CE algorithms, whether instance- or group-based. The key innovation lies in computing a joint distribution between observed and counterfactual data and leveraging it to inform Shapley values for feature attributions (FA). We demonstrate that optimal transport (OT) effectively derives this distribution, especially when the alignment between observed and counterfactual data is unclear in used CE methods. Additionally, a counterintuitive finding is uncovered: it may be misleading to rely on an exact alignment defined by the CE generation mechanism in conducting FA. Our proposed method is validated on extensive experiments across multiple datasets, showcasing its effectiveness in refining CE towards greater actionable efficiency.

Updated: 2024-10-07 18:31:19

标题: 利用联合分布通知Shapley进一步完善反事实解释，实现可操作的最小化。

摘要: 反事实解释（CE）识别与观察数据密切相似但产生不同机器学习（ML）模型输出的数据点，为模型决策提供关键见解。尽管它们针对的场景、目标和任务各不相同，但现有的CE方法通常缺乏可操作的效率，因为解释中包含了不必要的特征变化，这些变化被呈现给用户和利益相关者。我们通过提出一种方法来解决这个问题，该方法在保持CE有效性的同时最小化所需的特征变化，而不对模型或CE算法施加限制，无论是基于实例还是基于组的。关键创新在于计算观察数据和反事实数据之间的联合分布，并利用它来为特征归因（FA）提供Shapley值。我们证明，最优输送（OT）能够有效地推导出这种分布，特别是当现有CE方法中观察数据和反事实数据之间的对准不清楚时。此外，我们发现了一个反直觉的发现：依赖于CE生成机制定义的精确对准进行FA可能会产生误导。我们的方法在多个数据集上进行了广泛实验验证，展示了它在提高CE可操作效率方面的有效性。

更新时间: 2024-10-07 18:31:19

领域: cs.LG,cs.AI,stat.ME

下载: http://arxiv.org/abs/2410.05419v1

STOP! Camera Spoofing via the in-Vehicle IP Network

Autonomous driving and advanced driver assistance systems (ADAS) rely on cameras to control the driving. In many prior approaches an attacker aiming to stop the vehicle had to send messages on the specialized and better-defended CAN bus. We suggest an easier alternative: manipulate the IP-based network communication between the camera and the ADAS logic, inject fake images of stop signs or red lights into the video stream, and let the ADAS stop the car safely. We created an attack tool that successfully exploits the GigE Vision protocol. Then we analyze two classes of passive anomaly detectors to identify such attacks: protocol-based detectors and video-based detectors. We implemented multiple detectors of both classes and evaluated them on data collected from our test vehicle and also on data from the public BDD corpus. Our results show that such detectors are effective against naive adversaries, but sophisticated adversaries can evade detection. Finally, we propose a novel class of active defense mechanisms that randomly adjust camera parameters during the video transmission, and verify that the received images obey the requested adjustments. Within this class we focus on a specific implementation, the width-varying defense, which randomly modifies the width of every frame. Beyond its function as an anomaly detector, this defense is also a protective measure against certain attacks: by distorting injected image patches it prevents their recognition by the ADAS logic. We demonstrate the effectiveness of the width-varying defense through theoretical analysis and by an extensive evaluation of several types of attack in a wide range of realistic road driving conditions. The best the attack was able to achieve against this defense was injecting a stop sign for a duration of 0.2 seconds, with a success probability of 0.2%, whereas stopping a vehicle requires about 2.5 seconds.

Updated: 2024-10-07 18:30:22

标题: 停止！通过车内IP网络进行摄像头欺骗

摘要: 自动驾驶和高级驾驶辅助系统（ADAS）依赖摄像头来控制驾驶。在许多先前的方法中，一个试图停止车辆的攻击者必须发送消息到专门设计且防御更严密的CAN总线上。我们提出了一个更简单的替代方案：操纵摄像头和ADAS逻辑之间基于IP的网络通信，向视频流中注入虚假的停车标志或红灯图像，让ADAS安全地停车。我们创建了一个成功利用GigE Vision协议的攻击工具。然后，我们分析了两类被动异常检测器，以识别此类攻击：基于协议的检测器和基于视频的检测器。我们实现了多个这两类检测器，并在从我们的测试车辆收集的数据以及来自公共BDD语料库的数据上对它们进行了评估。我们的结果显示，这些检测器对于幼稚的对手是有效的，但是复杂的对手可以逃避检测。最后，我们提出了一种新颖的主动防御机制，即在视频传输过程中随机调整摄像头参数，并验证接收到的图像是否符合请求的调整。在这一类中，我们专注于一个特定的实现，即宽度变化的防御，该防御随机修改每一帧的宽度。除了作为异常检测器的功能外，这种防御还是针对某些攻击的保护措施：通过扭曲注入的图像补丁，防止其被ADAS逻辑识别。我们通过理论分析和在各种现实道路驾驶条件下对几种类型的攻击进行广泛评估，展示了宽度变化防御的有效性。在这种防御下，最好的攻击能够实现的是注入一个停车标志持续0.2秒，成功概率为0.2％，而停止一辆车需要大约2.5秒。

更新时间: 2024-10-07 18:30:22

领域: cs.CR

下载: http://arxiv.org/abs/2410.05417v1

Haste Makes Waste: A Simple Approach for Scaling Graph Neural Networks

Graph neural networks (GNNs) have demonstrated remarkable success in graph representation learning, and various sampling approaches have been proposed to scale GNNs to applications with large-scale graphs. A class of promising GNN training algorithms take advantage of historical embeddings to reduce the computation and memory cost while maintaining the model expressiveness of GNNs. However, they incur significant computation bias due to the stale feature history. In this paper, we provide a comprehensive analysis of their staleness and inferior performance on large-scale problems. Motivated by our discoveries, we propose a simple yet highly effective training algorithm (REST) to effectively reduce feature staleness, which leads to significantly improved performance and convergence across varying batch sizes. The proposed algorithm seamlessly integrates with existing solutions, boasting easy implementation, while comprehensive experiments underscore its superior performance and efficiency on large-scale benchmarks. Specifically, our improvements to state-of-the-art historical embedding methods result in a 2.7% and 3.6% performance enhancement on the ogbn-papers100M and ogbn-products dataset respectively, accompanied by notably accelerated convergence.

Updated: 2024-10-07 18:29:02

标题: 匆忙导致浪费：一种简单的方法用于扩展图神经网络

摘要: 图神经网络（GNNs）在图表示学习中取得了显著成功，各种采样方法已被提出以将GNNs扩展到具有大规模图的应用。一类有前途的GNN训练算法利用历史嵌入来减少计算和内存成本，同时保持GNNs的模型表达能力。然而，由于陈旧的特征历史，它们会产生显著的计算偏差。本文对它们的陈旧性和在大规模问题上的表现不佳进行了全面分析。受我们发现的启发，我们提出了一种简单但高效的训练算法（REST），以有效减少特征陈旧性，从而显著提高了性能和收敛性，在不同批量大小下均有所改善。所提出的算法与现有解决方案无缝集成，易于实现，同时全面的实验强调了它在大规模基准测试中的优越性能和效率。具体而言，我们对最先进的历史嵌入方法的改进分别在ogbn-papers100M和ogbn-products数据集上实现了2.7％和3.6％的性能提升，伴随着显着加速的收敛。

更新时间: 2024-10-07 18:29:02

领域: cs.LG

下载: http://arxiv.org/abs/2410.05416v1

Data Publishing in Mechanics and Dynamics: Challenges, Guidelines, and Examples from Engineering Design

Data-based methods have gained increasing importance in engineering, especially but not only driven by successes with deep artificial neural networks. Success stories are prevalent, e.g., in areas such as data-driven modeling, control and automation, as well as surrogate modeling for accelerated simulation. Beyond engineering, generative and large-language models are increasingly performing and helping with tasks that, previously, were solely associated with creative human processes. Thus, it seems timely to seek artificial-intelligence-support for engineering design tasks to automate, help with, or accelerate purpose-built designs of engineering systems, e.g., in mechanics and dynamics, where design so far requires a lot of specialized knowledge. However, research-wise, compared to established, predominantly first-principles-based methods, the datasets used for training, validation, and test become an almost inherent part of the overall methodology. Thus, data publishing becomes just as important in (data-driven) engineering science as appropriate descriptions of conventional methodology in publications in the past. This article analyzes the value and challenges of data publishing in mechanics and dynamics, in particular regarding engineering design tasks, showing that the latter raise also challenges and considerations not typical in fields where data-driven methods have been booming originally. Possible ways to deal with these challenges are discussed and a set of examples from across different design problems shows how data publishing can be put into practice. The analysis, discussions, and examples are based on the research experience made in a priority program of the German research foundation focusing on research on artificially intelligent design assistants in mechanics and dynamics.

Updated: 2024-10-07 18:26:05

标题: 在力学和动力学领域的数据发布：挑战、指导原则和工程设计示例

摘要: 基于数据的方法在工程领域日益重要，尤其是受到深度人工神经网络取得成功的推动。成功案例在数据驱动建模、控制和自动化领域以及用于加速模拟的代理建模等领域广泛存在。除了工程领域，生成式和大型语言模型也越来越在执行并帮助执行以前仅与创造性人类过程相关的任务。因此，现在正是寻求人工智能支持工程设计任务以自动化、帮助或加速工程系统专门设计的时机，例如在力学和动力学领域，到目前为止，设计需要大量专业知识。然而，在研究方面，与主要基于第一原理的方法相比，用于训练、验证和测试的数据集几乎成为整体方法的固有部分。因此，在（数据驱动的）工程科学中，数据发布与过去出版物中的传统方法的适当描述一样重要。本文分析了力学和动力学中数据发布的价值和挑战，特别是关于工程设计任务，显示后者也提出了在最初数据驱动方法蓬勃发展的领域中不典型的挑战和考虑。讨论了处理这些挑战的可能方法，并通过不同设计问题的一系列示例展示了如何将数据发布付诸实践。这些分析、讨论和示例基于德国研究基金会一项重点计划中的研究经验，重点关注力学和动力学中人工智能设计助手的研究。

更新时间: 2024-10-07 18:26:05

领域: cs.CY,cs.AI,cs.CE,cs.ET,cs.SY,eess.SY

下载: http://arxiv.org/abs/2410.18358v1

Improving Predictor Reliability with Selective Recalibration

A reliable deep learning system should be able to accurately express its confidence with respect to its predictions, a quality known as calibration. One of the most effective ways to produce reliable confidence estimates with a pre-trained model is by applying a post-hoc recalibration method. Popular recalibration methods like temperature scaling are typically fit on a small amount of data and work in the model's output space, as opposed to the more expressive feature embedding space, and thus usually have only one or a handful of parameters. However, the target distribution to which they are applied is often complex and difficult to fit well with such a function. To this end we propose \textit{selective recalibration}, where a selection model learns to reject some user-chosen proportion of the data in order to allow the recalibrator to focus on regions of the input space that can be well-captured by such a model. We provide theoretical analysis to motivate our algorithm, and test our method through comprehensive experiments on difficult medical imaging and zero-shot classification tasks. Our results show that selective recalibration consistently leads to significantly lower calibration error than a wide range of selection and recalibration baselines.

Updated: 2024-10-07 18:17:31

标题: 通过选择性重新校准提高预测器的可靠性

摘要: 一个可靠的深度学习系统应该能够准确地表达其对预测的信心，这种质量被称为校准。其中一种产生可靠置信度估计的最有效方法是应用后续校准方法对预训练模型进行调整。流行的校准方法如温度调节通常适用于少量数据，并在模型的输出空间中工作，而不是更具表现力的特征嵌入空间，因此通常只有一个或少数参数。然而，它们所应用的目标分布通常复杂且难以很好地适应这样的函数。为此，我们提出了\textit{选择性校准}，其中一个选择模型学习拒绝用户选择的一部分数据，以便让校准器专注于输入空间的区域，这些区域可以被这样的模型很好地捕捉。我们提供理论分析来激励我们的算法，并通过对困难的医学成像和零样本分类任务的全面实验来测试我们的方法。我们的结果表明，选择性校准始终比一系列选择和校准基线具有显著更低的校准误差。

更新时间: 2024-10-07 18:17:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.05407v1

Random-Set Neural Networks (RS-NN)

Machine learning is increasingly deployed in safety-critical domains where robustness against adversarial attacks is crucial and erroneous predictions could lead to potentially catastrophic consequences. This highlights the need for learning systems to be equipped with the means to determine a model's confidence in its prediction and the epistemic uncertainty associated with it, 'to know when a model does not know'. In this paper, we propose a novel Random-Set Neural Network (RS-NN) for classification. RS-NN predicts belief functions rather than probability vectors over a set of classes using the mathematics of random sets, i.e., distributions over the power set of the sample space. RS-NN encodes the 'epistemic' uncertainty induced in machine learning by limited training sets via the size of the credal sets associated with the predicted belief functions. Our approach outperforms state-of-the-art Bayesian (LB-BNN, BNN-R) and Ensemble (ENN) methods in a classical evaluation setting in terms of performance, uncertainty estimation and out-of-distribution (OoD) detection on several benchmarks (CIFAR-10 vs SVHN/Intel-Image, MNIST vs FMNIST/KMNIST, ImageNet vs ImageNet-O) and scales effectively to large-scale architectures such as WideResNet-28-10, VGG16, Inception V3, EfficientNetB2, and ViT-Base.

Updated: 2024-10-07 18:16:59

标题: 随机集神经网络（RS-NN）

摘要: 机器学习越来越被部署在安全关键领域，对抗性攻击的鲁棒性至关重要，错误的预测可能导致潜在的灾难性后果。这凸显了学习系统需要具备确定模型对其预测的信心以及与之相关的认识不确定性的手段，即“知道模型不知道”的需要。在本文中，我们提出了一种新颖的随机集神经网络（RS-NN）用于分类。RS-NN使用随机集的数学，即样本空间的幂集上的分布，而不是概率向量，来预测对一组类别的信念函数。RS-NN通过预测的信念函数的可信集的大小对机器学习中由于有限训练集引起的“认知”不确定性进行编码。我们的方法在经典评估设置中的性能、不确定性估计和对外分布（OoD）检测方面优于最先进的贝叶斯（LB-BNN、BNN-R）和集成（ENN）方法，在几个基准测试（CIFAR-10 vs SVHN/Intel-Image、MNIST vs FMNIST/KMNIST、ImageNet vs ImageNet-O）上表现出色，并且能有效扩展到大规模架构，如WideResNet-28-10、VGG16、Inception V3、EfficientNetB2和ViT-Base。

更新时间: 2024-10-07 18:16:59

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2307.05772v2

The lazy (NTK) and rich ($μ$P) regimes: a gentle tutorial

A central theme of the modern machine learning paradigm is that larger neural networks achieve better performance on a variety of metrics. Theoretical analyses of these overparameterized models have recently centered around studying very wide neural networks. In this tutorial, we provide a nonrigorous but illustrative derivation of the following fact: in order to train wide networks effectively, there is only one degree of freedom in choosing hyperparameters such as the learning rate and the size of the initial weights. This degree of freedom controls the richness of training behavior: at minimum, the wide network trains lazily like a kernel machine, and at maximum, it exhibits feature learning in the active $\mu$P regime. In this paper, we explain this richness scale, synthesize recent research results into a coherent whole, offer new perspectives and intuitions, and provide empirical evidence supporting our claims. In doing so, we hope to encourage further study of the richness scale, as it may be key to developing a scientific theory of feature learning in practical deep neural networks.

Updated: 2024-10-07 18:14:21

标题: 懒惰（NTK）和富裕（$μ$P）区域：一个简明教程

摘要: 现代机器学习范式的一个核心主题是较大的神经网络在各种指标上取得更好的性能。最近对这些过度参数化模型的理论分析集中在研究非常宽的神经网络上。在本教程中，我们提供了以下事实的一个非严格但有启示性的推导：为了有效训练宽网络，选择超参数（如学习率和初始权重大小）只有一个自由度。这个自由度控制训练行为的丰富性：最低时，宽网络像一个核机器懒散地训练，而最高时，它展现出活跃的特征学习。在本文中，我们解释了这种丰富程度的尺度，将最近的研究结果综合到一个连贯的整体中，提供新的观点和直觉，并提供支持我们观点的实证证据。通过这样做，我们希望鼓励进一步研究这种丰富程度，因为这可能是发展实际深度神经网络中特征学习的科学理论的关键。

更新时间: 2024-10-07 18:14:21

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.19719v2

A3: Active Adversarial Alignment for Source-Free Domain Adaptation

Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Recent works have focused on source-free UDA, where only target data is available. This is challenging as models rely on noisy pseudo-labels and struggle with distribution shifts. We propose Active Adversarial Alignment (A3), a novel framework combining self-supervised learning, adversarial training, and active learning for robust source-free UDA. A3 actively samples informative and diverse data using an acquisition function for training. It adapts models via adversarial losses and consistency regularization, aligning distributions without source data access. A3 advances source-free UDA through its synergistic integration of active and adversarial learning for effective domain alignment and noise reduction.

Updated: 2024-10-07 18:13:07

标题: A3: 源无领域自适应的主动对抗对齐

摘要: 无监督领域适应（UDA）旨在将知识从一个标记的源领域转移到一个未标记的目标领域。最近的研究集中在无源UDA，其中只有目标数据可用。这是具有挑战性的，因为模型依赖于嘈杂的伪标签，并且在分布转移方面遇到困难。我们提出了主动对抗对齐（A3），这是一个结合了自监督学习、对抗训练和主动学习的新框架，用于鲁棒的无源UDA。A3通过采用函数主动抽样信息丰富和多样化的数据来训练。它通过对抗损失和一致性正则化来调整模型，实现分布对齐而无需访问源数据。A3通过其积极和对抗学习的协同整合，促进了无源UDA，实现了有效的领域对齐和噪声减少。

更新时间: 2024-10-07 18:13:07

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2409.18418v2

MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection

Multimodal learning seeks to combine data from multiple input sources to enhance the performance of different downstream tasks. In real-world scenarios, performance can degrade substantially if some input modalities are missing. Existing methods that can handle missing modalities involve custom training or adaptation steps for each input modality combination. These approaches are either tied to specific modalities or become computationally expensive as the number of input modalities increases. In this paper, we propose Masked Modality Projection (MMP), a method designed to train a single model that is robust to any missing modality scenario. We achieve this by randomly masking a subset of modalities during training and learning to project available input modalities to estimate the tokens for the masked modalities. This approach enables the model to effectively learn to leverage the information from the available modalities to compensate for the missing ones, enhancing missing modality robustness. We conduct a series of experiments with various baseline models and datasets to assess the effectiveness of this strategy. Experiments demonstrate that our approach improves robustness to different missing modality scenarios, outperforming existing methods designed for missing modalities or specific modality combinations.

Updated: 2024-10-07 18:12:25

标题: MMP：具有遮蔽模态投影的稳健多模态学习

摘要: 多模态学习旨在将来自多个输入源的数据结合起来，以提高不同下游任务的性能。在现实场景中，如果某些输入模态缺失，性能可能会大幅下降。现有的方法可以处理缺失的模态涉及为每个输入模态组合定制训练或适应步骤。这些方法要么与特定模态相关，要么随着输入模态数量的增加而变得计算昂贵。在本文中，我们提出了Masked Modality Projection (MMP)，这是一种旨在训练单个模型以适应任何缺失模态情况的方法。我们通过在训练过程中随机屏蔽一部分模态，并学习将可用输入模态投影以估计被屏蔽模态的标记来实现这一目标。这种方法使模型能够有效地学习利用可用模态的信息来补偿缺失的模态，增强缺失模态的稳健性。我们进行了一系列实验，使用各种基线模型和数据集来评估这种策略的有效性。实验表明，我们的方法提高了对不同缺失模态情况的稳健性，优于为缺失模态或特定模态组合设计的现有方法。

更新时间: 2024-10-07 18:12:25

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2410.03010v2

Synthesizing Interpretable Control Policies through Large Language Model Guided Search

The combination of Large Language Models (LLMs), systematic evaluation, and evolutionary algorithms has enabled breakthroughs in combinatorial optimization and scientific discovery. We propose to extend this powerful combination to the control of dynamical systems, generating interpretable control policies capable of complex behaviors. With our novel method, we represent control policies as programs in standard languages like Python. We evaluate candidate controllers in simulation and evolve them using a pre-trained LLM. Unlike conventional learning-based control techniques, which rely on black box neural networks to encode control policies, our approach enhances transparency and interpretability. We still take advantage of the power of large AI models, but leverage it at the policy design phase, ensuring that all system components remain interpretable and easily verifiable at runtime. Additionally, the use of standard programming languages makes it straightforward for humans to finetune or adapt the controllers based on their expertise and intuition. We illustrate our method through its application to the synthesis of an interpretable control policy for the pendulum swing-up and the ball in cup tasks. We make the code available at https://github.com/muellerlab/synthesizing_interpretable_control_policies.git

Updated: 2024-10-07 18:12:20

标题: 通过大型语言模型引导搜索合成可解释的控制策略

摘要: 大型语言模型（LLMs）、系统评估和进化算法的结合已经实现了在组合优化和科学发现方面的突破。我们提议将这种强大的组合扩展到动态系统的控制，生成能够展现复杂行为的可解释控制策略。通过我们的新方法，我们将控制策略表示为标准语言（如Python）中的程序。我们在模拟中评估候选控制器，并使用预先训练的LLM对其进行演化。与依赖黑盒神经网络来编码控制策略的传统学习型控制技术不同，我们的方法增强了透明度和可解释性。我们仍然利用大型AI模型的强大能力，但是在策略设计阶段利用这种能力，确保所有系统组件在运行时保持可解释性且易于验证。此外，使用标准编程语言使人类能够根据自己的专业知识和直觉轻松微调或调整控制器。我们通过将其应用于合成可解释控制策略的摆动和杯中球任务的综合来说明我们的方法。我们将代码提供在https://github.com/muellerlab/synthesizing_interpretable_control_policies.git。

更新时间: 2024-10-07 18:12:20

领域: cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2410.05406v1

Post-hoc Study of Climate Microtargeting on Social Media Ads with LLMs: Thematic Insights and Fairness Evaluation

Climate change communication on social media increasingly employs microtargeting strategies to effectively reach and influence specific demographic groups. This study presents a post-hoc analysis of microtargeting practices within climate campaigns by leveraging large language models (LLMs) to examine Facebook advertisements. Our analysis focuses on two key aspects: demographic targeting and fairness. We evaluate the ability of LLMs to accurately predict the intended demographic targets, such as gender and age group, achieving an overall accuracy of 88.55%. Furthermore, we instruct the LLMs to generate explanations for their classifications, providing transparent reasoning behind each decision. These explanations reveal the specific thematic elements used to engage different demographic segments, highlighting distinct strategies tailored to various audiences. Our findings show that young adults are primarily targeted through messages emphasizing activism and environmental consciousness, while women are engaged through themes related to caregiving roles and social advocacy. In addition to evaluating the effectiveness of LLMs in detecting microtargeted messaging, we conduct a comprehensive fairness analysis to identify potential biases in model predictions. Our findings indicate that while LLMs perform well overall, certain biases exist, particularly in the classification of senior citizens and male audiences. By showcasing the efficacy of LLMs in dissecting and explaining targeted communication strategies and by highlighting fairness concerns, this study provides a valuable framework for future research aimed at enhancing transparency, accountability, and inclusivity in social media-driven climate campaigns.

Updated: 2024-10-07 18:07:56

标题: 社交媒体广告中气候微目标定位的事后研究：基于LLMs的主题洞察和公平性评估

摘要: 社交媒体上的气候变化传播越来越多地采用微观定位策略，以有效地触达和影响特定人群。本研究利用大型语言模型（LLMs）对气候活动中的微观定位实践进行事后分析，以检验Facebook广告。我们的分析关注两个关键方面：人口定位和公平性。我们评估LLMs准确预测预期人口目标的能力，如性别和年龄组，实现整体准确率为88.55%。此外，我们指导LLMs生成其分类的解释，提供每个决定背后的透明推理。这些解释揭示了用于吸引不同人口群体的特定主题元素，突出显示了针对各种受众定制的不同策略。我们的研究结果显示，年轻人主要通过强调行动主义和环境意识的信息进行定位，而妇女则通过与照顾角色和社会倡导相关的主题进行参与。除了评估LLMs在检测微观定位消息方面的有效性外，我们还进行了全面的公平性分析，以识别模型预测中的潜在偏见。我们的研究结果表明，尽管LLMs整体表现良好，但某些偏见存在，特别是在对老年人和男性受众的分类中。通过展示LLMs在解剖和解释定位传播策略的有效性，并突出公平性问题，本研究为未来旨在增强社交媒体驱动的气候活动的透明度、问责制和包容性的研究提供了有价值的框架。

更新时间: 2024-10-07 18:07:56

领域: cs.CL,cs.AI,cs.CY,cs.SI

下载: http://arxiv.org/abs/2410.05401v1

The collective use and perceptions of generative AI tools in digital humanities research: Survey-based results

Generative artificial intelligence technologies have revolutionized the research landscape, with significant implications for Digital Humanities, a field inherently intertwined with technological progress. This article investigates how DH scholars adopt and critically evaluate generative AI technologies such as ChatGPT in research. Drawing on 76 responses collected from an international survey study, we explored DH scholars' rationale for adopting or not adopting generative AI tools in research, identified the specific practices of using generative AI tools to support various DH research tasks, and analyzed scholars' collective perceptions regarding the benefits, risks, and challenges of using generative AI tools in DH research. The survey results reveal two key findings: first, DH research communities hold divisive opinions about the value of generative AI in DH scholarship; second, scholars have developed new practices and perceptions for using generative AI tools, which differ from those associated with traditional AI-based tools. Our survey represents one of the first survey-based analyses on this topic. It has the potential to serve as a building block for future empirical inquiries into the impact of generative AI on DH scholarship.

Updated: 2024-10-07 18:07:54

标题: 数字人文研究中生成式人工智能工具的集体使用和感知：基于调查结果

摘要: 生成人工智能技术已经彻底改变了研究领域的格局，对数字人文学有着重大影响，这个领域与技术进步密切相关。本文调查了数字人文学者在研究中如何采用和批判性评估生成人工智能技术，如ChatGPT。通过对国际调查研究收集的76个回应，我们探讨了数字人文学者采用或不采用生成人工智能工具的理由，确定了使用生成人工智能工具支持各种数字人文研究任务的具体实践，并分析了学者对在数字人文研究中使用生成人工智能工具的好处、风险和挑战的集体看法。调查结果揭示了两个关键发现：首先，数字人文研究社区对生成人工智能在数字人文学术中的价值持有分歧的观点；其次，学者已经发展出了使用生成人工智能工具的新实践和看法，与传统基于人工智能的工具有所不同。我们的调查代表了这个主题上的第一次基于调查的分析之一。它有潜力作为未来对生成人工智能对数字人文学术影响的实证调查的基础。

更新时间: 2024-10-07 18:07:54

领域: cs.AI

下载: http://arxiv.org/abs/2404.12458v2

Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models

Data is a crucial element in large language model (LLM) alignment. Recent studies have explored using LLMs for efficient data collection. However, LLM-generated data often suffers from quality issues, with underrepresented or absent aspects and low-quality datapoints. To address these problems, we propose Data Advisor, an enhanced LLM-based method for generating data that takes into account the characteristics of the desired dataset. Starting from a set of pre-defined principles in hand, Data Advisor monitors the status of the generated data, identifies weaknesses in the current dataset, and advises the next iteration of data generation accordingly. Data Advisor can be easily integrated into existing data generation methods to enhance data quality and coverage. Experiments on safety alignment of three representative LLMs (i.e., Mistral, Llama2, and Falcon) demonstrate the effectiveness of Data Advisor in enhancing model safety against various fine-grained safety issues without sacrificing model utility.

Updated: 2024-10-07 17:59:58

标题: 数据顾问：大型语言模型安全调整的动态数据整理

摘要: 数据是大型语言模型（LLM）对齐中的一个关键要素。最近的研究探讨了使用LLMs进行高效数据收集。然而，LLM生成的数据往往存在质量问题，包括代表性不足或缺失的方面和低质量数据点。为了解决这些问题，我们提出了Data Advisor，这是一种增强的基于LLM的数据生成方法，考虑了所需数据集的特征。从一组预定义的原则出发，Data Advisor监控生成数据的状态，识别当前数据集的弱点，并相应地建议下一次数据生成。Data Advisor可以轻松集成到现有的数据生成方法中，以提高数据质量和覆盖范围。对三个代表性LLMs（即Mistral、Llama2和Falcon）进行的安全对齐实验展示了Data Advisor在增强模型安全性方面的有效性，对各种细粒度安全问题提供了保护，而不会牺牲模型效用。

更新时间: 2024-10-07 17:59:58

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05269v1

mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Direct preference optimization (DPO) has shown to be an effective method for large language model (LLM) alignment. Recent works have attempted to apply DPO to multimodal scenarios but have found it challenging to achieve consistent improvement. Through a comparative experiment, we identify the unconditional preference problem in multimodal preference optimization, where the model overlooks the image condition. To address this problem, we propose mDPO, a multimodal DPO objective that prevents the over-prioritization of language-only preferences by also optimizing image preference. Moreover, we introduce a reward anchor that forces the reward to be positive for chosen responses, thereby avoiding the decrease in their likelihood -- an intrinsic problem of relative preference optimization. Experiments on two multimodal LLMs of different sizes and three widely used benchmarks demonstrate that mDPO effectively addresses the unconditional preference problem in multimodal preference optimization and significantly improves model performance, particularly in reducing hallucination.

Updated: 2024-10-07 17:59:42

标题: mDPO：多模态大型语言模型的条件偏好优化

摘要: 直接偏好优化（DPO）已被证明是一种有效的大型语言模型（LLM）对齐方法。最近的研究尝试将DPO应用于多模态场景，但发现难以实现一致的改进。通过对比实验，我们确定了多模态偏好优化中的无条件偏好问题，即模型忽略了图像条件。为了解决这个问题，我们提出了mDPO，一种多模态DPO目标，通过优化图像偏好来防止过度优先考虑仅有语言的偏好。此外，我们引入了一种奖励锚点，强制奖励为正值以选择响应，从而避免它们的可能性下降 - 这是相对偏好优化的固有问题。对不同大小的两个多模态LLM和三个广泛使用的基准进行的实验表明，mDPO有效地解决了多模态偏好优化中的无条件偏好问题，并显着改善了模型性能，特别是在减少幻觉方面。

更新时间: 2024-10-07 17:59:42

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.11839v2

PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs

Quantization is essential for deploying Large Language Models (LLMs) by enhancing memory efficiency and inference speed. Existing methods for activation quantization mainly address channel-wise outliers, often neglecting token-wise outliers, leading to reliance on costly per-token dynamic quantization. To address this, we introduce PrefixQuant, a novel technique that isolates outlier tokens offline without re-training. Specifically, PrefixQuant identifies high-frequency outlier tokens and prefixes them in the KV cache, preventing the generation of outlier tokens during inference and simplifying quantization. To our knowledge, PrefixQuant is the first to enable efficient per-tensor static quantization to outperform expensive per-token dynamic quantization. For instance, in W4A4KV4 (4- bit weight, 4-bit activation, and 4-bit KV cache) Llama-3-8B, PrefixQuant with per-tensor static quantization achieves a 7.43 WikiText2 perplexity and 71.08% average accuracy on 5 common-sense reasoning tasks, outperforming previous per-token dynamic quantization methods like QuaRot with 0.98 perplexity improvement and +5.98 points accuracy. Additionally, the inference speed of W4A4 quantized models using PrefixQuant is 1.60x to 2.81x faster than FP16 models and exceeds QuaRot models by 1.2x to 1.3x. Our code is available at \url{https://github.com/ChenMnZ/PrefixQuant}.

Updated: 2024-10-07 17:59:35

标题: PrefixQuant：通过LLMs中的前缀异常值，静态量化胜过动态量化

摘要: 量化对于部署大型语言模型(LLMs)至关重要，它可以提高内存效率和推理速度。现有的激活量化方法主要解决通道异常值，往往忽略了单词级别的异常值，导致依赖于昂贵的每个单词的动态量化。为了解决这个问题，我们引入了PrefixQuant，一种新颖的技术，可以在离线状态下隔离异常值单词而无需重新训练。具体来说，PrefixQuant识别高频异常值单词并将它们前缀在KV缓存中，防止在推理过程中生成异常值单词，并简化量化过程。据我们所知，PrefixQuant是第一个能够实现高效的每张张量静态量化，从而超越昂贵的每个单词动态量化的方法。例如，在W4A4KV4(4位权重、4位激活和4位KV缓存) Llama-3-8B中，PrefixQuant搭配每张张量静态量化实现了7.43的WikiText2迷惑度和71.08%的5个常识推理任务的平均准确率，优于以往的每个单词动态量化方法，如QuaRot，迷惑度提高了0.98并且准确度提高了5.98个百分点。此外，使用PrefixQuant的W4A4量化模型的推理速度比FP16模型快1.60倍到2.81倍，比QuaRot模型快1.2倍到1.3倍。我们的代码可以在\url{https://github.com/ChenMnZ/PrefixQuant}上找到。

更新时间: 2024-10-07 17:59:35

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.05265v1

Regression Conformal Prediction under Bias

Uncertainty quantification is crucial to account for the imperfect predictions of machine learning algorithms for high-impact applications. Conformal prediction (CP) is a powerful framework for uncertainty quantification that generates calibrated prediction intervals with valid coverage. In this work, we study how CP intervals are affected by bias - the systematic deviation of a prediction from ground truth values - a phenomenon prevalent in many real-world applications. We investigate the influence of bias on interval lengths of two different types of adjustments -- symmetric adjustments, the conventional method where both sides of the interval are adjusted equally, and asymmetric adjustments, a more flexible method where the interval can be adjusted unequally in positive or negative directions. We present theoretical and empirical analyses characterizing how symmetric and asymmetric adjustments impact the "tightness" of CP intervals for regression tasks. Specifically for absolute residual and quantile-based non-conformity scores, we prove: 1) the upper bound of symmetrically adjusted interval lengths increases by $2|b|$ where $b$ is a globally applied scalar value representing bias, 2) asymmetrically adjusted interval lengths are not affected by bias, and 3) conditions when asymmetrically adjusted interval lengths are guaranteed to be smaller than symmetric ones. Our analyses suggest that even if predictions exhibit significant drift from ground truth values, asymmetrically adjusted intervals are still able to maintain the same tightness and validity of intervals as if the drift had never happened, while symmetric ones significantly inflate the lengths. We demonstrate our theoretical results with two real-world prediction tasks: sparse-view computed tomography (CT) reconstruction and time-series weather forecasting. Our work paves the way for more bias-robust machine learning systems.

Updated: 2024-10-07 17:59:09

标题: 偏差下的回归一致预测

摘要: 不确定性量化对于考虑机器学习算法在高影响应用中的不完美预测至关重要。符合性预测（CP）是一种强大的不确定性量化框架，可以生成具有有效覆盖率的校准预测区间。在这项工作中，我们研究了CP区间如何受到偏差的影响 - 预测与地面实况值之间的系统偏差 - 这是许多实际应用中普遍存在的现象。我们研究了两种不同类型调整 - 对称调整和非对称调整对CP区间长度的影响。对称调整是传统方法，两侧的区间被等量调整，而非对称调整是一种更灵活的方法，区间可以在正向或负向调整不等量。我们提出了理论和实证分析，描述对称和非对称调整如何影响回归任务中CP区间的“紧密度”。特别是对于绝对残差和基于分位数的非符合分数，我们证明：1）对称调整区间长度的上限增加了$2|b|$，其中$b$是代表偏差的全局应用标量值，2）非对称调整的区间长度不受偏差影响，3）在何种条件下非对称调整的区间长度保证小于对称调整的情况。我们的分析表明，即使预测与地面实况值存在显著偏移，非对称调整的区间仍能保持与偏移从未发生时相同的紧密度和有效性，而对称调整会显著增加长度。我们通过两个真实世界的预测任务展示了我们的理论结果：稀疏视图计算机断层扫描（CT）重建和时间序列天气预测。我们的工作为更具偏差鲁棒性的机器学习系统铺平了道路。

更新时间: 2024-10-07 17:59:09

领域: stat.ML,cs.AI,cs.LG,math.ST,stat.ME,stat.TH

下载: http://arxiv.org/abs/2410.05263v1

TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens

Reading dense text and locating objects within images are fundamental abilities for Large Vision-Language Models (LVLMs) tasked with advanced jobs. Previous LVLMs, including superior proprietary models like GPT-4o, have struggled to excel in both tasks simultaneously. Moreover, previous LVLMs with fine-grained perception cost thousands of tokens per image, making them resource-intensive. We present TextHawk2, a bilingual LVLM featuring efficient fine-grained perception and demonstrating cutting-edge performance across general-purpose, OCR, and grounding tasks with 16 times fewer image tokens. Critical improvements include: (1) Token Compression: Building on the efficient architecture of its predecessor, TextHawk2 significantly reduces the number of tokens per image by 16 times, facilitating training and deployment of the TextHawk series with minimal resources. (2) Visual Encoder Reinforcement: We enhance the visual encoder through LVLM co-training, unlocking its potential for previously unseen tasks like Chinese OCR and grounding. (3) Data Diversity: We maintain a comparable scale of 100 million samples while diversifying the sources of pre-training data. We assess TextHawk2 across multiple benchmarks, where it consistently delivers superior performance and outperforms closed-source models of similar scale, such as achieving 78.4% accuracy on OCRBench, 81.4% accuracy on ChartQA, 89.6% ANLS on DocVQA, and 88.1% accuracy@0.5 on RefCOCOg-test.

Updated: 2024-10-07 17:58:35

标题: TextHawk2：一个大型的视觉语言模型在双语OCR和定位方面表现出色，仅使用16倍更少的标记。

摘要: 阅读密集文本和在图像中定位对象是大型视觉语言模型（LVLMs）在执行高级任务时的基本能力。先前的LVLMs，包括优秀的专有模型如GPT-4o，在同时优秀地完成这两项任务方面一直面临困难。此外，先前的具有细粒度感知的LVLMs每个图像成本高达数千个标记，使其资源密集型。我们提出了TextHawk2，这是一个双语LVLM，具有高效的细粒度感知，并在通用、OCR和定位任务中展示了尖端性能，图像标记数量减少了16倍。关键改进包括：（1）标记压缩：在其前身的高效架构基础上，TextHawk2显著减少了每个图像的标记数量，使得TextHawk系列的训练和部署所需资源最小化。（2）视觉编码器强化：我们通过LVLM共同训练增强了视觉编码器，释放了其在以前未见任务（如中文OCR和定位）中的潜力。（3）数据多样性：我们在保持1亿个样本规模的同时，通过多样化的预训练数据来源。我们在多个基准测试中评估了TextHawk2，在这些测试中，它始终提供优越的性能，并超越了类似规模的闭源模型，例如在OCRBench上达到了78.4％的准确率，在ChartQA上达到了81.4％的准确率，在DocVQA上达到了89.6％的ANLS，在RefCOCOg-test上达到了88.1％的准确率@0.5。

更新时间: 2024-10-07 17:58:35

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.05261v1

Differential Transformer

Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, the differential attention mechanism calculates attention scores as the difference between two separate softmax attention maps. The subtraction cancels noise, promoting the emergence of sparse attention patterns. Experimental results on language modeling show that Diff Transformer outperforms Transformer in various settings of scaling up model size and training tokens. More intriguingly, it offers notable advantages in practical applications, such as long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reduction of activation outliers. By being less distracted by irrelevant context, Diff Transformer can mitigate hallucination in question answering and text summarization. For in-context learning, Diff Transformer not only enhances accuracy but is also more robust to order permutation, which was considered as a chronic robustness issue. The results position Diff Transformer as a highly effective and promising architecture to advance large language models.

Updated: 2024-10-07 17:57:38

标题: 差动变压器

摘要: Transformer倾向于过度分配注意力到无关的上下文。在这项工作中，我们引入了Diff Transformer，它在放大注意力到相关上下文的同时取消噪声。具体而言，差分注意力机制将注意力分数计算为两个独立softmax注意力图之间的差异。减法消除了噪音，促进了稀疏注意力模式的出现。在语言建模的实验结果表明，Diff Transformer在不同设置下的模型规模和训练令牌上优于Transformer。更有趣的是，它在实际应用中提供了显著的优势，如长上下文建模，关键信息检索，幻觉减轻，上下文学习以及减少激活异常值。通过减少对无关上下文的干扰，Diff Transformer可以减轻问答和文本摘要中的幻觉。对于上下文学习，Diff Transformer不仅提高了准确性，而且对于顺序排列更加稳健，这被认为是一个慢性稳健性问题。结果表明Diff Transformer是一种高效和有前途的架构，可以推进大型语言模型。

更新时间: 2024-10-07 17:57:38

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.05258v1

SePPO: Semi-Policy Preference Optimization for Diffusion Alignment

Reinforcement learning from human feedback (RLHF) methods are emerging as a way to fine-tune diffusion models (DMs) for visual generation. However, commonly used on-policy strategies are limited by the generalization capability of the reward model, while off-policy approaches require large amounts of difficult-to-obtain paired human-annotated data, particularly in visual generation tasks. To address the limitations of both on- and off-policy RLHF, we propose a preference optimization method that aligns DMs with preferences without relying on reward models or paired human-annotated data. Specifically, we introduce a Semi-Policy Preference Optimization (SePPO) method. SePPO leverages previous checkpoints as reference models while using them to generate on-policy reference samples, which replace "losing images" in preference pairs. This approach allows us to optimize using only off-policy "winning images." Furthermore, we design a strategy for reference model selection that expands the exploration in the policy space. Notably, we do not simply treat reference samples as negative examples for learning. Instead, we design an anchor-based criterion to assess whether the reference samples are likely to be winning or losing images, allowing the model to selectively learn from the generated reference samples. This approach mitigates performance degradation caused by the uncertainty in reference sample quality. We validate SePPO across both text-to-image and text-to-video benchmarks. SePPO surpasses all previous approaches on the text-to-image benchmarks and also demonstrates outstanding performance on the text-to-video benchmarks. Code will be released in https://github.com/DwanZhang-AI/SePPO.

Updated: 2024-10-07 17:56:53

标题: SePPO：半策略偏好优化用于扩散对齐

摘要: 人类反馈强化学习（RLHF）方法正在成为微调扩散模型（DMs）用于视觉生成的一种方式。然而，常用的基于策略的方法受限于奖励模型的泛化能力，而基于离线策略的方法则需要大量难以获取的人类标注数据，特别是在视觉生成任务中。为了解决基于策略和离线策略RLHF的局限性，我们提出了一种偏好优化方法，该方法将DMs与偏好对齐，而无需依赖奖励模型或配对的人类标注数据。具体来说，我们引入了一种半策略偏好优化（SePPO）方法。SePPO利用先前的检查点作为参考模型，同时利用它们生成基于策略的参考样本，这些样本取代了偏好对中的“失败图像”。这种方法使我们能够仅使用离线策略的“成功图像”进行优化。此外，我们设计了一种参考模型选择策略，扩展了策略空间中的探索。值得注意的是，我们不仅仅将参考样本视为负面示例进行学习。相反，我们设计了一个基于锚点的标准，评估参考样本是否可能是成功或失败图像，从而使模型能够有选择地从生成的参考样本中学习。这种方法减轻了由于参考样本质量的不确定性而导致的性能下降。我们通过文本到图像和文本到视频基准验证了SePPO。SePPO在文本到图像基准上超越了所有先前的方法，并在文本到视频基准上表现出色。代码将在https://github.com/DwanZhang-AI/SePPO发布。

更新时间: 2024-10-07 17:56:53

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.05255v1

Diffusion Model Predictive Control

We propose Diffusion Model Predictive Control (D-MPC), a novel MPC approach that learns a multi-step action proposal and a multi-step dynamics model, both using diffusion models, and combines them for use in online MPC. On the popular D4RL benchmark, we show performance that is significantly better than existing model-based offline planning methods using MPC and competitive with state-of-the-art (SOTA) model-based and model-free reinforcement learning methods. We additionally illustrate D-MPC's ability to optimize novel reward functions at run time and adapt to novel dynamics, and highlight its advantages compared to existing diffusion-based planning baselines.

Updated: 2024-10-07 17:56:47

标题: 扩散模型预测控制

摘要: 我们提出了扩散模型预测控制（D-MPC），这是一种新颖的模型预测控制方法，它利用扩散模型学习多步动作提议和多步动力学模型，并将它们结合在一起用于在线模型预测控制。在流行的D4RL基准测试中，我们展示了明显优于现有基于模型的离线规划方法使用MPC的性能，并与最先进的基于模型和无模型强化学习方法相竞争。我们另外展示了D-MPC在运行时优化新颖奖励函数和适应新颖动力学的能力，并强调其与现有基于扩散的规划基线相比的优势。

更新时间: 2024-10-07 17:56:47

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.05364v1

GLEE: A Unified Framework and Benchmark for Language-based Economic Environments

Large Language Models (LLMs) show significant potential in economic and strategic interactions, where communication via natural language is often prevalent. This raises key questions: Do LLMs behave rationally? Can they mimic human behavior? Do they tend to reach an efficient and fair outcome? What is the role of natural language in the strategic interaction? How do characteristics of the economic environment influence these dynamics? These questions become crucial concerning the economic and societal implications of integrating LLM-based agents into real-world data-driven systems, such as online retail platforms and recommender systems. While the ML community has been exploring the potential of LLMs in such multi-agent setups, varying assumptions, design choices and evaluation criteria across studies make it difficult to draw robust and meaningful conclusions. To address this, we introduce a benchmark for standardizing research on two-player, sequential, language-based games. Inspired by the economic literature, we define three base families of games with consistent parameterization, degrees of freedom and economic measures to evaluate agents' performance (self-gain), as well as the game outcome (efficiency and fairness). We develop an open-source framework for interaction simulation and analysis, and utilize it to collect a dataset of LLM vs. LLM interactions across numerous game configurations and an additional dataset of human vs. LLM interactions. Through extensive experimentation, we demonstrate how our framework and dataset can be used to: (i) compare the behavior of LLM-based agents to human players in various economic contexts; (ii) evaluate agents in both individual and collective performance measures; and (iii) quantify the effect of the economic characteristics of the environments on the behavior of agents.

Updated: 2024-10-07 17:55:35

标题: 欢乐合唱团：一个统一框架和基准测试，用于基于语言的经济环境

摘要: 大型语言模型（LLMs）在经济和战略互动中显示出重要潜力，其中通过自然语言进行沟通往往很普遍。这引发了关键问题：LLMs是否表现理性？它们能够模仿人类行为吗？它们是否倾向于达到高效和公平的结果？自然语言在战略互动中的作用是什么？经济环境的特征如何影响这些动态？这些问题在将基于LLM的代理集成到现实世界的数据驱动系统（如在线零售平台和推荐系统）中的经济和社会影响方面变得至关重要。虽然机器学习社区一直在探讨LLMs在这种多代理设置中的潜力，但研究中的假设、设计选择和评估标准的差异使得很难得出稳健和有意义的结论。为了解决这个问题，我们引入了一个基准，用于标准化关于两人、顺序、基于语言的游戏的研究。受经济文献的启发，我们定义了三种具有一致参数化、自由度和经济度量的游戏基本系列，用于评估代理的表现（自身收益）以及游戏结果（效率和公平性）。我们开发了一个开源框架用于交互模拟和分析，并利用它收集了LLM与LLM之间在多种游戏配置中的互动数据集，以及人类与LLM之间的额外数据集。通过广泛的实验，我们展示了我们的框架和数据集如何用于：（i）比较LLM代理在各种经济背景中与人类玩家的行为；（ii）评估代理的个体和集体表现指标；以及（iii）量化环境的经济特征对代理行为的影响。

更新时间: 2024-10-07 17:55:35

领域: cs.CL,cs.AI,cs.CY,cs.GT,cs.LG

下载: http://arxiv.org/abs/2410.05254v1

Causal Micro-Narratives

We present a novel approach to classify causal micro-narratives from text. These narratives are sentence-level explanations of the cause(s) and/or effect(s) of a target subject. The approach requires only a subject-specific ontology of causes and effects, and we demonstrate it with an application to inflation narratives. Using a human-annotated dataset spanning historical and contemporary US news articles for training, we evaluate several large language models (LLMs) on this multi-label classification task. The best-performing model--a fine-tuned Llama 3.1 8B--achieves F1 scores of 0.87 on narrative detection and 0.71 on narrative classification. Comprehensive error analysis reveals challenges arising from linguistic ambiguity and highlights how model errors often mirror human annotator disagreements. This research establishes a framework for extracting causal micro-narratives from real-world data, with wide-ranging applications to social science research.

Updated: 2024-10-07 17:55:10

标题: 因果微叙事

摘要: 我们提出了一种新颖的方法，用于从文本中分类因果微叙事。这些叙事是关于目标主题原因和/或影响的句子级解释。该方法仅需要特定主题的因果本体论，我们以通货膨胀叙事的应用为例进行了演示。使用一个涵盖历史和当代美国新闻文章的人工标记数据集进行训练，我们评估了几个大型语言模型（LLMs）在这个多标签分类任务上的表现。表现最好的模型--一个经过微调的 Llama 3.1 8B--在叙事检测上达到了0.87的F1分数，在叙事分类上达到了0.71的F1分数。全面的错误分析揭示了由于语言歧义而产生的挑战，并突显了模型错误通常反映了人类标注者的分歧。这项研究建立了一个从现实数据中提取因果微叙事的框架，具有广泛的社会科学研究应用。

更新时间: 2024-10-07 17:55:10

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2410.05252v1

Block MedCare: Advancing healthcare through blockchain integration

In an era driven by information exchange, transparency and security hold crucial importance, particularly within the healthcare industry, where data integrity and confidentiality are paramount. This paper investigates the integration of blockchain technology in healthcare, focusing on its potential to revolutionize Electronic Health Records (EHR) management and data sharing. By leveraging Ethereum-based blockchain implementations and smart contracts, we propose a novel system that empowers patients to securely store and manage their medical data. Our research addresses critical challenges in implementing blockchain in healthcare, including scalability, user privacy, and regulatory compliance. We propose a solution that combines digital signatures, Role-Based Access Control, and a multi-layered architecture to enhance security and ensure controlled access. The system's key functions, including user registration, data append, and data retrieval, are facilitated through smart contracts, providing a secure and efficient mechanism for managing health information. To validate our approach, we developed a decentralized application (dApp) that demonstrates the practical implementation of our blockchain-based healthcare solution. The dApp incorporates user-friendly interfaces for patients, doctors, and administrators, showcasing the system's potential to streamline healthcare processes while maintaining data security and integrity. Additionally, we conducted a survey to gain insights into the perceived benefits and challenges of blockchain adoption in healthcare. The results indicate strong interest among healthcare professionals and IT experts, while also highlighting concerns about integration costs and technological complexity. Our findings...

Updated: 2024-10-07 17:54:13

标题: Block MedCare：通过区块链整合推动医疗保健

摘要: 在一个由信息交流驱动的时代，透明度和安全性尤为重要，尤其是在医疗保健行业内，数据完整性和保密性至关重要。本文调查了区块链技术在医疗保健领域的整合，重点关注其在革新电子健康记录（EHR）管理和数据共享方面的潜力。通过利用基于以太坊的区块链实现和智能合约，我们提出了一个新颖的系统，赋予患者安全地存储和管理他们的医疗数据的能力。我们的研究解决了在医疗保健领域实施区块链所面临的关键挑战，包括可扩展性、用户隐私和监管合规性。我们提出了一个解决方案，结合了数字签名、基于角色的访问控制和多层架构，以增强安全性并确保受控访问。系统的关键功能，包括用户注册、数据附加和数据检索，通过智能合约实现，为管理健康信息提供了安全高效的机制。为了验证我们的方法，我们开发了一个去中心化应用程序（dApp），展示了我们基于区块链的医疗保健解决方案的实际实施。该dApp为患者、医生和管理员提供了用户友好的界面，展示了该系统在简化医疗流程同时保持数据安全性和完整性方面的潜力。此外，我们进行了一项调查，以了解医疗行业对区块链采用的感知利益和挑战。结果表明，医疗专业人员和IT专家对此表现出浓厚兴趣，同时也突出了有关整合成本和技术复杂性的担忧。我们的发现...

更新时间: 2024-10-07 17:54:13

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2410.05251v1

SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe

To induce desired behaviors in large language models (LLMs) for interaction-driven tasks, the instruction-tuning stage typically trains LLMs on instruction-response pairs using the next-token prediction (NTP) loss. Previous work aiming to improve instruction-tuning performance often emphasizes the need for higher-quality supervised fine-tuning (SFT) datasets, which typically involves expensive data filtering with proprietary LLMs or labor-intensive data generation by human annotators. However, these approaches do not fully leverage the datasets' intrinsic properties, resulting in high computational and labor costs, thereby limiting scalability and performance gains. In this paper, we propose SFTMix, a novel recipe that elevates instruction-tuning performance beyond the conventional NTP paradigm, without the need for well-curated datasets. Observing that LLMs exhibit uneven confidence across the semantic representation space, we argue that examples with different confidence levels should play distinct roles during the instruction-tuning process. Based on this insight, SFTMix leverages training dynamics to identify examples with varying confidence levels, then applies a Mixup-based regularization to mitigate overfitting on confident examples while propagating supervision signals to improve learning on relatively unconfident ones. This approach enables SFTMix to significantly outperform NTP across a wide range of instruction-following and healthcare domain-specific SFT tasks, demonstrating its adaptability to diverse LLM families and scalability to datasets of any size. Comprehensive ablation studies further verify the robustness of SFTMix's design choices, underscoring its versatility in consistently enhancing performance across different LLMs and datasets in broader natural language processing applications.

Updated: 2024-10-07 17:52:21

标题: SFTMix: 通过混合配方提升语言模型指导调整

摘要: 为了在大型语言模型（LLMs）中诱导所需的行为以进行任务驱动的互动，指令调整阶段通常使用下一个标记预测（NTP）损失来训练LLMs，以获得指令-响应对。以往旨在提高指令调整性能的工作通常强调对更高质量的监督微调（SFT）数据集的需求，这通常涉及使用专有LLMs进行昂贵的数据过滤或由人类标注者进行劳动密集型数据生成。然而，这些方法未充分利用数据集的内在特性，导致高计算和劳动成本，从而限制了可扩展性和性能提升。在本文中，我们提出了SFTMix，这是一种新颖的方法，将指令调整性能提升到超越传统NTP范式之外，而无需精心策划的数据集。观察到LLMs在语义表示空间中表现出不均匀的信心，我们认为具有不同信心水平的示例在指令调整过程中应发挥不同作用。基于这一观点，SFTMix利用训练动态来识别具有不同信心水平的示例，然后应用基于Mixup的正则化来减轻对有信心示例的过拟合，同时传播监督信号以改善对相对不自信的示例的学习。这种方法使SFTMix能够在广泛的指令遵循和医疗保健领域特定的SFT任务中显著优于NTP，展示了其对不同LLM系列的适应性和对任何大小数据集的可扩展性。全面的消融研究进一步验证了SFTMix设计选择的稳健性，强调了其在更广泛的自然语言处理应用中一贯提高不同LLMs和数据集性能的多功能性。

更新时间: 2024-10-07 17:52:21

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05248v1

SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It)

Whether LLMs memorize their training data and what this means, from privacy leakage to detecting copyright violations -- has become a rapidly growing area of research over the last two years. In recent months, more than 10 new methods have been proposed to perform Membership Inference Attacks (MIAs) against LLMs. Contrary to traditional MIAs which rely on fixed -- but randomized -- records or models, these methods are mostly evaluated on datasets collected post-hoc. Sets of members and non-members, used to evaluate the MIA, are constructed using informed guesses after the release of a model. This lack of randomization raises concerns of a distribution shift between members and non-members. In the first part, we review the literature on MIAs against LLMs. While most work focuses on sequence-level MIAs evaluated in post-hoc setups, we show that a range of target models, motivations and units of interest have been considered in the literature. We then quantify distribution shifts present in the 6 datasets used in the literature, ranging from books to papers, using a bag of word classifier. Our analysis reveals that all of them suffer from severe distribution shifts. This challenges the validity of using such setups to measure LLM memorization and may undermine the benchmarking of recently proposed methods. Yet, all hope might not be lost. In the second part, we introduce important considerations to properly evaluate MIAs against LLMs and discuss potential ways forward: randomized test splits, injections of randomized (unique) sequences, randomized finetuning, and post-hoc control methods. While each option comes with its advantages and limitations, we believe they collectively provide solid grounds to guide the development of MIA methods and study LLM memorization. We conclude by proposing comprehensive, easy-to-use benchmarks for sequence- and document-level MIAs against LLMs.

Updated: 2024-10-07 17:49:13

标题: SoK：对LLMs的成员推断攻击正走入死胡同（以及如何解决）

摘要: 在过去两年中，关于LLMs是否记忆其训练数据以及这意味着什么，从隐私泄露到检测版权侵犯，已成为一个迅速增长的研究领域。在最近几个月，提出了超过10种新方法来执行对LLMs的成员推理攻击（MIAs）。与传统的依赖固定但随机化记录或模型的MIAs相反，这些方法大多在事后收集的数据集上进行评估。用于评估MIA的成员和非成员集合是在发布模型后根据知情猜测构建的。这种缺乏随机化引发了成员和非成员之间分布转移的担忧。在第一部分中，我们回顾了针对LLMs的MIAs的文献。尽管大多数工作侧重于在事后设置中评估的序列级MIAs，但我们展示了文献中考虑了一系列目标模型、动机和兴趣单位。然后，我们利用词袋分类器量化了文献中使用的6个数据集中存在的分布转移，从书籍到论文不等。我们的分析表明，所有这些数据集都存在严重的分布转移。这挑战了使用这样的设置来衡量LLM记忆的有效性，并可能削弱最近提出的方法的基准测试。然而，希望未必会失去。在第二部分中，我们介绍了正确评估针对LLMs的MIAs的重要考虑因素，并讨论了潜在的解决方法：随机化测试分割、随机化（唯一）序列的注入、随机化微调以及事后控制方法。虽然每个选项都有其优点和局限性，但我们相信它们共同为引导MIA方法的发展和研究LLM记忆提供了坚实基础。最后，我们提出了针对LLMs的序列和文档级MIAs的全面易于使用的基准测试。

更新时间: 2024-10-07 17:49:13

领域: cs.CL,cs.CR,cs.LG

下载: http://arxiv.org/abs/2406.17975v2

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Multimodal large language models (MLLMs) are transforming the capabilities of graphical user interface (GUI) agents, facilitating their transition from controlled simulations to complex, real-world applications across various platforms. However, the effectiveness of these agents hinges on the robustness of their grounding capability. Current GUI agents predominantly utilize text-based representations such as HTML or accessibility trees, which, despite their utility, often introduce noise, incompleteness, and increased computational overhead. In this paper, we advocate a human-like embodiment for GUI agents that perceive the environment entirely visually and directly take pixel-level operations on the GUI. The key is visual grounding models that can accurately map diverse referring expressions of GUI elements to their coordinates on the GUI across different platforms. We show that a simple recipe, which includes web-based synthetic data and slight adaptation of the LLaVA architecture, is surprisingly effective for training such visual grounding models. We collect the largest dataset for GUI visual grounding so far, containing 10M GUI elements and their referring expressions over 1.3M screenshots, and use it to train UGround, a strong universal visual grounding model for GUI agents. Empirical results on six benchmarks spanning three categories (grounding, offline agent, and online agent) show that 1) UGround substantially outperforms existing visual grounding models for GUI agents, by up to 20% absolute, and 2) agents with UGround outperform state-of-the-art agents, despite the fact that existing agents use additional text-based input while ours only uses visual perception. These results provide strong support for the feasibility and promises of GUI agents that navigate the digital world as humans do.

Updated: 2024-10-07 17:47:50

标题: 像人类一样在数字世界中导航：GUI代理的通用视觉基础

摘要: 多模态大型语言模型（MLLMs）正在改变图形用户界面（GUI）代理的能力，促进它们从受控模拟过渡到跨各种平台的复杂真实世界应用。然而，这些代理的有效性取决于它们的基础能力的稳健性。当前的GUI代理主要利用基于文本的表示，如HTML或可访问性树，尽管它们很有用，但往往会引入噪音、不完整性和增加计算开销。在本文中，我们提倡为GUI代理赋予类似人类的体现，完全通过视觉感知环境，并直接在GUI上进行像素级操作。关键在于视觉基础模型，它们能够准确地将GUI元素的各种指称表达式映射到不同平台上的GUI坐标。我们展示一个简单的方法，包括基于网络的合成数据和对LLaVA架构的轻微调整，对于训练这样的视觉基础模型非常有效。我们迄今为止收集了GUI视觉基础的最大数据集，包含1000万个GUI元素及其在130万个屏幕截图上的指称表达式，并用它来训练UGround，一个强大的通用视觉基础模型，用于GUI代理。跨三个类别的六个基准测试的实证结果表明，1）UGround在GUI代理的视觉基础模型方面显著优于现有模型，绝对值高达20％，2）具有UGround的代理优于最先进的代理，尽管现有代理使用额外的基于文本的输入，而我们的代理仅使用视觉感知。这些结果为GUI代理以人类方式导航数字世界的可行性和前景提供了有力支持。

更新时间: 2024-10-07 17:47:50

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2410.05243v1

LLMs Are In-Context Reinforcement Learners

Large Language Models (LLMs) can learn new tasks through in-context supervised learning (i.e., ICL). This work studies if this ability extends to in-context reinforcement learning (ICRL), where models are not given gold labels in context, but only their past predictions and rewards. We show that a naive application of ICRL fails miserably, and identify the root cause as a fundamental deficiency at exploration, which leads to quick model degeneration. We propose an algorithm to address this deficiency by increasing test-time compute, as well as a compute-bound approximation. We use several challenging classification tasks to empirically show that our ICRL algorithms lead to effective learning from rewards alone, and analyze the characteristics of this ability and our methods. Overall, our results reveal remarkable ICRL abilities in LLMs.

Updated: 2024-10-07 17:45:00

标题: LLMs是上下文强化学习者

摘要: 大型语言模型（LLMs）可以通过上下文监督学习（即ICL）学习新任务。本文研究了这种能力是否扩展到上下文强化学习（ICRL），在这种情况下，模型不会在上下文中获得金标签，而只会获得它们的过去预测和奖励。我们发现，对ICRL的天真应用会遭遇惨败，并确定探索中的根本缺陷是快速模型退化的原因。我们提出了一种算法来解决这一缺陷，方法是增加测试时间的计算量，以及一个受计算量限制的近似方法。我们使用几个具有挑战性的分类任务来实证地表明我们的ICRL算法导致仅通过奖励进行有效学习，并分析这种能力和我们的方法的特征。总体而言，我们的结果揭示了LLMs在ICRL方面令人瞩目的能力。

更新时间: 2024-10-07 17:45:00

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05362v1

SimO Loss: Anchor-Free Contrastive Loss for Fine-Grained Supervised Contrastive Learning

We introduce a novel anchor-free contrastive learning (AFCL) method leveraging our proposed Similarity-Orthogonality (SimO) loss. Our approach minimizes a semi-metric discriminative loss function that simultaneously optimizes two key objectives: reducing the distance and orthogonality between embeddings of similar inputs while maximizing these metrics for dissimilar inputs, facilitating more fine-grained contrastive learning. The AFCL method, powered by SimO loss, creates a fiber bundle topological structure in the embedding space, forming class-specific, internally cohesive yet orthogonal neighborhoods. We validate the efficacy of our method on the CIFAR-10 dataset, providing visualizations that demonstrate the impact of SimO loss on the embedding space. Our results illustrate the formation of distinct, orthogonal class neighborhoods, showcasing the method's ability to create well-structured embeddings that balance class separation with intra-class variability. This work opens new avenues for understanding and leveraging the geometric properties of learned representations in various machine learning tasks.

Updated: 2024-10-07 17:41:10

标题: SimO Loss：无锚对比损失用于细粒度监督对比学习

摘要: 我们引入了一种新颖的无锚对比学习（AFCL）方法，利用我们提出的相似性-正交性（SimO）损失。我们的方法最小化了一个半度量判别损失函数，同时优化了两个关键目标：减少相似输入的嵌入之间的距离和正交性，同时最大化这些指标以便于不相似输入，促进更精细的对比学习。由SimO损失驱动的AFCL方法在嵌入空间中创建了一个纤维束拓扑结构，形成了类别特定的、内部连贯但正交的邻域。我们在CIFAR-10数据集上验证了我们方法的有效性，提供了展示SimO损失对嵌入空间影响的可视化结果。我们的结果展示了明显的、正交的类别邻域的形成，展示了该方法在平衡类别分离和类内变化性方面创建结构良好的嵌入的能力。这项工作为理解和利用学习表示的几何属性在各种机器学习任务中开辟了新的途径。

更新时间: 2024-10-07 17:41:10

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.05233v1

SymmetryLens: A new candidate paradigm for unsupervised symmetry learning via locality and equivariance

We develop a new, unsupervised symmetry learning method that starts with raw data, and gives the minimal (discrete) generator of an underlying Lie group of symmetries, together with a symmetry equivariant representation of the data. The method is able to learn the pixel translation operator from a dataset with only an approximate translation symmetry, and can learn quite different types of symmetries which are not apparent to the naked eye, equally well. The method is based on the formulation of an information-theoretic loss function that measures both the degree to which the dataset is symmetric under a given candidate symmetry, and also, the degree of locality of the samples in the dataset with respect to this symmetry. We demonstrate that this coupling between symmetry and locality, together with a special optimization technique developed for entropy estimation, results in a highly stable system that gives reproducible results. The symmetry actions we consider are group representations, however, we believe the approach has the potential to be generalized to more general, nonlinear actions of non-commutative Lie groups.

Updated: 2024-10-07 17:40:51

标题: SymmetryLens：一种通过局部性和等变性进行无监督对称学习的新候选范式

摘要: 我们开发了一种新的无监督对称学习方法，该方法从原始数据开始，给出了一个底层Lie群的最小（离散）生成元，以及数据的对称等变表示。该方法能够从仅具有近似平移对称性的数据集中学习像素平移算子，并且能够同样良好地学习不明显的不同类型的对称性。该方法基于一个信息论损失函数的制定，该函数既衡量了数据集在给定候选对称性下的对称性程度，又衡量了数据集样本相对于这种对称性的局部性程度。我们展示了对称性和局部性之间的耦合，以及为熵估计开发的特殊优化技术，导致了一个高度稳定的系统，能够产生可重复的结果。我们考虑的对称性行为是群表示，然而，我们认为这种方法有潜力推广到非交换Lie群的更一般、非线性行为。

更新时间: 2024-10-07 17:40:51

领域: cs.LG

下载: http://arxiv.org/abs/2410.05232v1

Generative Parameter-Efficient Fine-Tuning

We present Generative Parameter-Efficient Fine-Tuning (GIFT) for adapting pretrained Transformer backbones on downstream tasks. GIFT learns to generate the fine-tuned weights for a layer directly from its pretrained weights. The GIFT network is parameterized in a minimally-simple way by two linear layers (without bias terms), and is shared by different pretrained layers selected for fine-tuning (e.g., the Query layers), which result in significantly fewer trainable parameters compared to the layer-specific methods like Low-Rank Adapter (LoRA). We also show this formulation bridges parameter-efficient fine-tuning and representation fine-tuning. We perform comprehensive experiments on natural language tasks (commonsense and arithmetic reasoning, instruction tuning, and sequence classification) and computer vision tasks (fine-grained classification). We obtain the best performance and parameter efficiency among baselines on commonsense and arithmetic reasoning, and instruction following using the Llama family of models and on visual recognition benchmarks using Vision Transformers. Notably, compared to LoRA, we obtain 5.7% absolute increase in average accuracy with 14 times reduction of parameters on Commonsense170k using Llama-3 (8B), and 5.4% absolute increase in the win rate with 4 times reduction of parameters using Llama-2 (7B) during instruction tuning. Our GIFT also obtains a slightly higher win rate on instruction tuning than GPT 3.5 (Turbo 1106).

Updated: 2024-10-07 17:40:32

标题: 生成参数高效微调

摘要: 我们提出了适用于在下游任务上调整预训练Transformer骨干的生成参数高效微调（GIFT）。GIFT学习直接从其预训练权重生成一层的微调权重。GIFT网络通过两个线性层（无偏置项）以最简单的方式参数化，并被选定用于微调的不同预训练层（例如Query层）所共享，与像Low-Rank Adapter（LoRA）这样的层特定方法相比，可显著减少可训练参数。我们还展示了这种公式桥接了参数高效微调和表示微调。我们在自然语言任务（常识和算术推理、指令调整和序列分类）以及计算机视觉任务（细粒度分类）上进行了全面实验。在常识和算术推理以及使用Llama家族模型的指令遵循中，以及在使用Vision Transformers的视觉识别基准上，我们在基线中获得了最佳性能和参数效率。值得注意的是，与LoRA相比，我们在使用Llama-3（8B）进行常识推理时获得了平均准确率的5.7%绝对增加，并且参数减少了14倍；在指令调整中，使用Llama-2（7B）时，我们获得了5.4%绝对的胜率增加，参数减少了4倍。我们的GIFT在指令调整上的胜率略高于GPT 3.5（Turbo 1106）。

更新时间: 2024-10-07 17:40:32

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2312.00700v4

ChatVis: Automating Scientific Visualization with a Large Language Model

We develop an iterative assistant we call ChatVis that can synthetically generate Python scripts for data analysis and visualization using a large language model (LLM). The assistant allows a user to specify the operations in natural language, attempting to generate a Python script for the desired operations, prompting the LLM to revise the script as needed until it executes correctly. The iterations include an error detection and correction mechanism that extracts error messages from the execution of the script and subsequently prompts LLM to correct the error. Our method demonstrates correct execution on five canonical visualization scenarios, comparing results with ground truth. We also compared our results with scripts generated by several other LLMs without any assistance. In every instance, ChatVis successfully generated the correct script, whereas the unassisted LLMs failed to do so. The code is available on GitHub: https://github.com/tanwimallick/ChatVis/.

Updated: 2024-10-07 17:37:59

标题: ChatVis：利用大型语言模型自动化科学可视化

摘要: 我们开发了一个迭代助手，我们称之为ChatVis，它可以使用大型语言模型（LLM）合成生成用于数据分析和可视化的Python脚本。该助手允许用户用自然语言指定操作，尝试生成所需操作的Python脚本，并提示LLM根据需要修改脚本直到正确执行。迭代包括一个错误检测和校正机制，从脚本执行中提取错误消息，随后提示LLM校正错误。我们的方法在五种典型的可视化场景下演示了正确的执行，将结果与基准真值进行比较。我们还将结果与几种其他没有任何辅助的LLM生成的脚本进行了比较。在每个实例中，ChatVis成功生成了正确的脚本，而无辅助的LLM则未能成功。代码可在GitHub上找到：https://github.com/tanwimallick/ChatVis/。

更新时间: 2024-10-07 17:37:59

领域: cs.HC,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.11863v1

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely used to assess the mathematical reasoning of models on grade-school-level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics. To address these concerns, we conduct a large-scale study on several SOTA open and closed models. To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions. GSM-Symbolic enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models.Our findings reveal that LLMs exhibit noticeable variance when responding to different instantiations of the same question. Specifically, the performance of all models declines when only the numerical values in the question are altered in the GSM-Symbolic benchmark. Furthermore, we investigate the fragility of mathematical reasoning in these models and show that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize that this decline is because current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data. Adding a single clause that seems relevant to the question causes significant performance drops (up to 65%) across all state-of-the-art models, even though the clause doesn't contribute to the reasoning chain needed for the final answer. Overall, our work offers a more nuanced understanding of LLMs' capabilities and limitations in mathematical reasoning.

Updated: 2024-10-07 17:36:37

标题: "GSM-Symbolic: 理解大型语言模型中数学推理的局限性"

摘要: 最近大型语言模型（LLM）的进展引起了人们对其形式推理能力的兴趣，特别是在数学领域。GSM8K基准广泛用于评估模型在小学级别问题上的数学推理能力。尽管近年来LLMs在GSM8K上的表现显著提高，但它们的数学推理能力是否真正进步仍不清楚，这引发了关于报告指标可靠性的疑问。为了解决这些问题，我们对几种SOTA开放和封闭模型进行了大规模研究。为了克服现有评估的局限性，我们引入了GSM-Symbolic，这是一个从符号模板创建的改进基准，允许生成多样化的问题。GSM-Symbolic使得评估更可控，提供了关键见解和更可靠的指标来衡量模型的推理能力。我们的研究结果显示，LLMs在回答同一问题的不同实例时表现出明显的变化。具体来说，当仅改变GSM-Symbolic基准中问题中的数值时，所有模型的表现都会下降。此外，我们调查了这些模型中数学推理的脆弱性，并显示随着问题中子句数量的增加，它们的表现显著恶化。我们假设这种下降是因为当前的LLMs无法进行真正的逻辑推理；它们只是复制来自训练数据的推理步骤。即使一个似乎与问题相关的单个子句会导致所有最先进模型的性能显著下降（高达65%），即使该子句并不对最终答案所需的推理链有贡献。总的来说，我们的工作为更深入地理解LLMs在数学推理中的能力和局限性提供了帮助。

更新时间: 2024-10-07 17:36:37

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.05229v1

ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control

We consider deep deterministic policy gradient (DDPG) in the context of reinforcement learning with sparse rewards. To enhance exploration, we introduce a search procedure, \emph{${\epsilon}{t}$-greedy}, which generates exploratory options for exploring less-visited states. We prove that search using $\epsilon t$-greedy has polynomial sample complexity under mild MDP assumptions. To more efficiently use the information provided by rewarded transitions, we develop a new dual experience replay buffer framework, \emph{GDRB}, and implement \emph{longest n-step returns}. The resulting algorithm, \emph{ETGL-DDPG}, integrates all three techniques: \bm{$\epsilon t$}-greedy, \textbf{G}DRB, and \textbf{L}ongest $n$-step, into DDPG. We evaluate ETGL-DDPG on standard benchmarks and demonstrate that it outperforms DDPG, as well as other state-of-the-art methods, across all tested sparse-reward continuous environments. Ablation studies further highlight how each strategy individually enhances the performance of DDPG in this setting.

Updated: 2024-10-07 17:31:52

标题: ETGL-DDPG：一种用于稀疏奖励连续控制的深度确定性策略梯度算法

摘要: 我们考虑在稀疏奖励的强化学习背景下的深度确定性策略梯度（DDPG）。为了增强探索性，我们引入了一种搜索过程，称为\emph{${\epsilon}{t}$-greedy}，该过程生成用于探索少访问状态的探索选项。我们证明了使用$\epsilon t$-greedy的搜索在轻微MDP假设下具有多项式样本复杂性。为了更有效地利用受奖励过渡提供的信息，我们开发了一种新的双重经验重播缓冲区框架，\emph{GDRB}，并实现了\emph{longest n-step returns}。由此产生的算法\emph{ETGL-DDPG}将所有三种技术：\bm{$\epsilon t$}-greedy，\textbf{G}DRB和\textbf{L}ongest $n$-step，整合到DDPG中。我们在标准基准测试中评估了ETGL-DDPG，并证明它在所有测试的稀疏奖励连续环境中均优于DDPG以及其他最先进的方法。消融研究进一步突显了每种策略如何在这种设置中单独提升DDPG的性能。

更新时间: 2024-10-07 17:31:52

领域: cs.LG,cs.RO,stat.ML

下载: http://arxiv.org/abs/2410.05225v1

Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates

Fine-tuning large language models (LLMs) on instruction datasets is a common way to improve their generative capabilities. However, instruction datasets can be expensive and time-consuming to manually curate, and while LLM-generated data is less labor-intensive, it may violate user privacy agreements or terms of service of LLM providers. Therefore, we seek a way of constructing instruction datasets with samples that are not generated by humans or LLMs but still improve LLM generative capabilities. In this work, we introduce Cookbook, a framework that programmatically generates training data consisting of simple patterns over random tokens, resulting in a scalable, cost-effective approach that avoids legal and privacy issues. First, Cookbook uses a template -- a data generating Python function -- to produce training data that encourages the model to learn an explicit pattern-based rule that corresponds to a desired task. We find that fine-tuning on Cookbook-generated data is able to improve performance on its corresponding task by up to 52.7 accuracy points. Second, since instruction datasets improve performance on multiple downstream tasks simultaneously, Cookbook algorithmically learns how to mix data from various templates to optimize performance on multiple tasks. On the standard multi-task GPT4ALL evaluation suite, Mistral-7B fine-tuned using a Cookbook-generated dataset attains the best accuracy on average compared to other 7B parameter instruction-tuned models and is the best performing model on 3 out of 8 tasks. Finally, we analyze when and why Cookbook improves performance and present a metric that allows us to verify that the improvement is largely explained by the model's generations adhering better to template rules.

Updated: 2024-10-07 17:29:40

标题: 《菜谱：通过编程数据生成模板提高LLM生成能力的框架》

摘要: 在指导数据集上微调大型语言模型（LLMs）是改善它们生成能力的常见方法。然而，手工策划指导数据集可能昂贵且耗时，而LLM生成的数据虽然较少需要人力，但可能违反用户隐私协议或LLM提供商的服务条款。因此，我们寻求一种构建指导数据集的方法，其中样本不是由人类或LLMs生成，但仍然改善LLM的生成能力。在这项工作中，我们介绍了Cookbook，这是一个框架，可以以简单的模式在随机标记上自动生成训练数据，从而实现一种可扩展、具有成本效益的方法，避免了法律和隐私问题。首先，Cookbook使用模板（一个生成数据的Python函数）来生成鼓励模型学习与所需任务对应的明确基于模式的规则的训练数据。我们发现，在由Cookbook生成的数据上微调可以将其对应任务的性能提高高达52.7个准确度点。其次，由于指导数据集同时提高了多个下游任务的性能，Cookbook算法学习如何混合来自各种模板的数据以优化多个任务的性能。在标准的多任务GPT4ALL评估套件上，使用由Cookbook生成的数据集微调的Mistral-7B相较于其他7B参数的指导微调模型平均获得最佳准确度，并在8个任务中的3个任务中表现最佳。最后，我们分析了Cookbook何时以及为什么提高性能，并提出了一个指标，使我们能够验证改进主要是由于模型的生成更好地遵循模板规则。

更新时间: 2024-10-07 17:29:40

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.05224v1

Learning Successor Features with Distributed Hebbian Temporal Memory

This paper presents a novel approach to address the challenge of online temporal memory learning for decision-making under uncertainty in non-stationary, partially observable environments. The proposed algorithm, Distributed Hebbian Temporal Memory (DHTM), is based on factor graph formalism and a multicomponent neuron model. DHTM aims to capture sequential data relationships and make cumulative predictions about future observations, forming Successor Features (SF). Inspired by neurophysiological models of the neocortex, the algorithm utilizes distributed representations, sparse transition matrices, and local Hebbian-like learning rules to overcome the instability and slow learning process of traditional temporal memory algorithms like RNN and HMM. Experimental results demonstrate that DHTM outperforms LSTM and a biologically inspired HMM-like algorithm, CSCG, in the case of non-stationary datasets. Our findings suggest that DHTM is a promising approach for addressing the challenges of online sequence learning and planning in dynamic environments.

Updated: 2024-10-07 17:27:21

标题: 使用分布式赫布时序记忆学习继承特征

摘要: 本文提出了一种新颖的方法来解决非稳态、部分可观测环境下决策制定中的在线时间记忆学习挑战。所提出的算法，分布式赫比时序记忆（DHTM），基于因子图形式和多组分神经元模型。DHTM旨在捕捉顺序数据关系并对未来观察进行累积预测，形成继承特征（SF）。受新皮层神经生理模型启发，该算法利用分布式表示、稀疏过渡矩阵和局部赫比类学习规则来克服传统时间记忆算法（如RNN和HMM）的不稳定性和缓慢学习过程。实验结果表明，在非稳态数据集的情况下，DHTM优于LSTM和一种生物启发的类HMM算法CSCG。我们的发现表明，DHTM是解决动态环境中在线序列学习和规划挑战的一种有前途的方法。

更新时间: 2024-10-07 17:27:21

领域: cs.LG,cs.AI,cs.NE

下载: http://arxiv.org/abs/2310.13391v3

Precise Model Benchmarking with Only a Few Observations

How can we precisely estimate a large language model's (LLM) accuracy on questions belonging to a specific topic within a larger question-answering dataset? The standard direct estimator, which averages the model's accuracy on the questions in each subgroup, may exhibit high variance for subgroups (topics) with small sample sizes. Synthetic regression modeling, which leverages the model's accuracy on questions about other topics, may yield biased estimates that are too unreliable for large subgroups. We prescribe a simple yet effective solution: an empirical Bayes (EB) estimator that balances direct and regression estimates for each subgroup separately, improving the precision of subgroup-level estimates of model performance. Our experiments on multiple datasets show that this approach consistently provides more precise estimates of the LLM performance compared to the direct and regression approaches, achieving substantial reductions in the mean squared error. Confidence intervals for EB estimates also have near-nominal coverage and are narrower compared to those for the direct estimator. Additional experiments on tabular and vision data validate the benefits of this EB approach.

Updated: 2024-10-07 17:26:31

标题: 只用少量观测数据进行精确模型基准测试

摘要: 我们如何准确地估计大型语言模型（LLM）在属于更大问答数据集中特定主题的问题上的准确性？标准直接估计器通过对每个子组中问题的准确性进行平均可能在样本量较小的子组（主题）中表现出高方差。合成回归建模利用模型对其他主题问题的准确性可能会产生偏倚估计，对于大的子组来说不够可靠。我们提出了一个简单但有效的解决方案：经验贝叶斯（EB）估计器，分别为每个子组平衡直接和回归估计，提高了模型性能子组水平估计的精度。我们在多个数据集上的实验证明，与直接和回归方法相比，这种方法始终提供了更精确的LLM性能估计，实现了均方误差的显著减少。EB估计的置信区间也具有接近名义覆盖率，并且与直接估计器相比更窄。对表格和视觉数据的额外实验验证了这种EB方法的好处。

更新时间: 2024-10-07 17:26:31

领域: cs.LG,cs.CL,cs.CV,stat.AP

下载: http://arxiv.org/abs/2410.05222v1

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, where the tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have shown that LLMs can solve tasks using programs like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks or standalone function calls. Solving challenging and practical requires the capability of utilizing diverse function calls as tools to efficiently implement functionalities like data analysis and web development. In addition, using multiple tools to solve a task needs compositional reasoning by accurately understanding complex instructions. Fulfilling both of these characteristics can pose a great challenge for LLMs.To assess how well LLMs can solve challenging and practical tasks via programs, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks. To evaluate LLMs rigorously, each task encompasses 5.6 test cases with an average branch coverage of 99%. In addition, we propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions only with essential information. Our extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%. The results underscore the need for further advancements in this area.

Updated: 2024-10-07 17:23:30

标题: 大代码基准测试：使用多样的函数调用和复杂指令对代码生成进行基准测试

摘要: 最近大型语言模型（LLMs）在Python代码方面取得了重大进展，极大地推动了任务自动化，任务范围从软件工程开发到通用推理。尽管当前基准测试表明LLMs可以像人类开发人员一样使用程序解决任务，但它们的评估大多局限于短小且独立的算法任务或独立的函数调用。解决具有挑战性和实际性的任务需要利用多样的函数调用作为工具，以有效地实现数据分析和Web开发等功能。此外，使用多个工具解决一个任务需要通过准确理解复杂指令进行组合推理。具备这两个特征对LLMs来说可能是一个巨大的挑战。为了评估LLMs通过程序解决具有挑战性和实际性的任务的能力，我们引入了BigCodeBench这一基准测试，挑战LLMs从139个库和7个领域中调用多个函数调用作为工具，共计1,140个细粒度任务。为了严格评估LLMs，每个任务包含5.6个测试用例，平均分支覆盖率为99%。此外，我们提出了一个面向自然语言的变体BigCodeBench-Instruct，它自动将原始文档字符串转换为仅包含必要信息的简短指令。我们对60个LLMs进行了广泛评估，结果显示LLMs目前还不能精确地遵循复杂指令使用函数调用，得分最高为60%，远低于人类的97%。这些结果强调了在这一领域进一步发展的必要性。

更新时间: 2024-10-07 17:23:30

领域: cs.SE,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.15877v3

Full Line Code Completion: Bringing AI to Desktop

In recent years, several industrial solutions for the problem of multi-token code completion appeared, each making a great advance in the area but mostly focusing on cloud-based runtime and avoiding working on the end user's device. In this work, we describe our approach for building a multi-token code completion feature for the JetBrains' IntelliJ Platform, which we call Full Line Code Completion. The feature suggests only syntactically correct code and works fully locally, i.e., data querying and the generation of suggestions happens on the end user's machine. We share important time and memory-consumption restrictions, as well as design principles that a code completion engine should satisfy. Working entirely on the end user's device, our code completion engine enriches user experience while being not only fast and compact but also secure. We share a number of useful techniques to meet the stated development constraints and also describe offline and online evaluation pipelines that allowed us to make better decisions. Our online evaluation shows that the usage of the tool leads to 1.3 times more Python code in the IDE being produced by code completion. The described solution was initially started with a help of researchers and was then bundled into all JetBrains IDEs where it is now used by millions of users. Thus, we believe that this work is useful for bridging academia and industry, providing researchers with the knowledge of what happens when complex research-based solutions are integrated into real products.

Updated: 2024-10-07 17:23:25

标题: 全线代码完成：将人工智能引入桌面

摘要: 近年来，针对多令牌代码完成问题出现了几种工业解决方案，每种都在该领域取得了重大进展，但主要侧重于基于云的运行时，避免在最终用户设备上工作。在本文中，我们描述了我们为JetBrains的IntelliJ平台构建多令牌代码完成功能的方法，我们称之为全行代码完成。该功能仅建议语法正确的代码，并完全在本地工作，即数据查询和建议生成在最终用户的设备上进行。我们分享了重要的时间和内存消耗限制，以及代码完成引擎应满足的设计原则。我们的代码完成引擎完全在最终用户的设备上运行，丰富了用户体验，同时不仅快速、紧凑，而且安全。我们分享了一些有用的技术，以满足所述的开发约束，并描述了离线和在线评估流程，使我们能够做出更好的决策。我们的在线评估显示，使用该工具导致IDE中通过代码完成生成的Python代码增加了1.3倍。所描述的解决方案最初是在研究人员的帮助下开始的，然后捆绑到所有JetBrains IDE中，在那里现在被数百万用户使用。因此，我们相信这项工作对于搭建学术界和工业界之间的桥梁是有用的，为研究人员提供了将基于复杂研究的解决方案集成到实际产品中时会发生的情况的知识。

更新时间: 2024-10-07 17:23:25

领域: cs.SE,cs.LG

下载: http://arxiv.org/abs/2405.08704v2

Stateful Large Language Model Serving with Pensieve

Large Language Models (LLMs) are wildly popular today and it is important to serve them efficiently. Existing LLM serving systems are stateless across requests. Consequently, when LLMs are used in the common setting of multi-turn conversations, a growing log of the conversation history must be processed alongside any request by the serving system at each turn, resulting in repeated processing. In this paper, we design $Pensieve$, a system optimized for multi-turn conversation LLM serving. $Pensieve$ maintains the conversation state across requests by caching previously processed history to avoid duplicate processing. $Pensieve$'s multi-tier caching strategy can utilize both GPU and CPU memory to efficiently store and retrieve cached data. $Pensieve$ also generalizes the recent PagedAttention kernel to support attention between multiple input tokens with a GPU cache spread over non-contiguous memory. Our evaluation shows that $Pensieve$ can achieve $1.14$-$3.0\times$ the throughput of vLLM and TensorRT-LLM and significantly reduce latency.

Updated: 2024-10-07 17:21:57

标题: 使用Pensieve实现具有状态的大型语言模型服务

摘要: 大型语言模型（LLMs）如今非常受欢迎，因此高效地为它们提供服务至关重要。现有的LLM服务系统在请求之间是无状态的。因此，在常见的多轮对话设置中使用LLMs时，对话历史记录必须在每一轮由服务系统处理，导致重复处理。在本文中，我们设计了$Pensieve$，这是一个针对多轮对话LLM服务进行优化的系统。$Pensieve$通过缓存先前处理过的历史记录来维护对话状态，以避免重复处理。$Pensieve$的多层缓存策略可以利用GPU和CPU内存来高效存储和检索缓存数据。$Pensieve$还将最近的PagedAttention内核推广，以支持在GPU缓存中在非连续内存上分布的多个输入令牌之间的注意力。我们的评估显示，$Pensieve$可以实现vLLM和TensorRT-LLM的吞吐量提高1.14至3.0倍，并显著减少延迟。

更新时间: 2024-10-07 17:21:57

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2312.05516v3

Augmenting Black-box LLMs with Medical Textbooks for Biomedical Question Answering (Published in Findings of EMNLP 2024)

Large-scale language models (LLMs) like ChatGPT have demonstrated impressive abilities in generating responses based on human instructions. However, their use in the medical field can be challenging due to their lack of specific, in-depth knowledge. In this study, we present a system called LLMs Augmented with Medical Textbooks (LLM-AMT) designed to enhance the proficiency of LLMs in specialized domains. LLM-AMT integrates authoritative medical textbooks into the LLMs' framework using plug-and-play modules. These modules include a Query Augmenter, a Hybrid Textbook Retriever, and a Knowledge Self-Refiner. Together, they incorporate authoritative medical knowledge. Additionally, an LLM Reader aids in contextual understanding. Our experimental results on three medical QA tasks demonstrate that LLMAMT significantly improves response quality, with accuracy gains ranging from 11.6% to 16.6%. Notably, with GPT-4-Turbo as the base model, LLM-AMT outperforms the specialized Med-PaLM 2 model pre-trained on a massive amount of medical corpus by 2-3%. We found that despite being 100x smaller in size, medical textbooks as a retrieval corpus is proven to be a more effective knowledge database than Wikipedia in the medical domain, boosting performance by 7.8%-13.7%.

Updated: 2024-10-07 17:21:45

标题: 使用医学教科书增强黑盒LLMs以进行生物医学问题回答（发表在EMNLP 2024发现中）

摘要: 大规模语言模型（LLMs）如ChatGPT已经展示出根据人类指令生成响应的令人印象深刻的能力。然而，它们在医疗领域的应用可能具有挑战性，因为它们缺乏具体、深入的知识。在这项研究中，我们提出了一个名为LLMs增强医学教科书（LLM-AMT）的系统，旨在提高LLMs在专业领域的熟练程度。LLM-AMT利用即插即用的模块将权威医学教科书整合到LLMs的框架中。这些模块包括查询增强器、混合教科书检索器和知识自我精炼器。它们共同整合权威医学知识。此外，LLM阅读器有助于上下文理解。我们在三个医学问答任务上的实验结果表明，LLMAMT显著提高了响应质量，准确率提高了11.6%至16.6%。值得注意的是，以GPT-4-Turbo为基础模型，LLM-AMT在性能上超过了专门针对大量医学文本预训练的Med-PaLM 2模型2-3%。我们发现，尽管体积小100倍，但作为检索语料库的医学教科书在医学领域被证明比维基百科更有效，可以将性能提升7.8%-13.7%。

更新时间: 2024-10-07 17:21:45

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2309.02233v3

$$\mathbf{L^2\cdot M = C^2}$$ Large Language Models are Covert Channels

Large Language Models (LLMs) have gained significant popularity recently. LLMs are susceptible to various attacks but can also improve the security of diverse systems. However, besides enabling more secure systems, how well do open source LLMs behave as covertext distributions to, e.g., facilitate censorship-resistant communication? In this paper, we explore open-source LLM-based covert channels. We empirically measure the security vs. capacity of an open-source LLM model (Llama-7B) to assess its performance as a covert channel. Although our results indicate that such channels are not likely to achieve high practical bitrates, we also show that the chance for an adversary to detect covert communication is low. To ensure our results can be used with the least effort as a general reference, we employ a conceptually simple and concise scheme and only assume public models.

Updated: 2024-10-07 17:21:00

标题: 大型语言模型是隐蔽通道

摘要: 近来，大型语言模型（LLMs）变得越来越受欢迎。LLMs容易受到各种攻击，但也可以提高各种系统的安全性。然而，除了使系统更安全外，开源LLMs在作为掩护文本分发方面的表现如何？比如，促进抗审查的通信？在本文中，我们探讨基于开源LLM的秘密通道。我们通过实证方法测量一个开源LLM模型（Llama-7B）的安全性和容量，以评估其作为秘密通道的性能。尽管我们的结果表明，这种通道不太可能实现高实际比特率，但我们也表明对手检测秘密通信的可能性较低。为确保我们的结果可以尽可能轻松地作为一般参考使用，我们采用一个概念简单而简洁的方案，并仅假设公共模型。

更新时间: 2024-10-07 17:21:00

领域: cs.CR

下载: http://arxiv.org/abs/2405.15652v2

Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality

In this paper, we propose a new method to enhance compositional understanding in pre-trained vision and language models (VLMs) without sacrificing performance in zero-shot multi-modal tasks. Traditional fine-tuning approaches often improve compositional reasoning at the cost of degrading multi-modal capabilities, primarily due to the use of global hard negative (HN) loss, which contrasts global representations of images and texts. This global HN loss pushes HN texts that are highly similar to the original ones, damaging the model's multi-modal representations. To overcome this limitation, we propose Fine-grained Selective Calibrated CLIP (FSC-CLIP), which integrates local hard negative loss and selective calibrated regularization. These innovations provide fine-grained negative supervision while preserving the model's representational integrity. Our extensive evaluations across diverse benchmarks for both compositionality and multi-modal tasks show that FSC-CLIP not only achieves compositionality on par with state-of-the-art models but also retains strong multi-modal capabilities. Code is available at: https://github.com/ytaek-oh/fsc-clip.

Updated: 2024-10-07 17:16:20

标题: 保持预训练VLMs的多模态能力，以提高视觉-语言组合性

摘要: 在本文中，我们提出了一种新方法，可以增强预训练视觉和语言模型（VLMs）对组成的理解，同时不损害零样本多模态任务的性能。传统的微调方法通常会提高组成推理能力，但会降低多模态能力，主要是由于使用全局硬负（HN）损失，对图像和文本的全局表示进行对比。这种全局HN损失会推动高度相似于原始文本的HN文本，从而损害模型的多模态表示。为了克服这一限制，我们提出了细粒度选择性校准CLIP（FSC-CLIP），它集成了局部硬负损失和选择性校准正则化。这些创新提供了细粒度的负监督，同时保持模型的表示完整性。我们在多样化的基准测试中进行了广泛的评估，涵盖了组成和多模态任务，结果表明FSC-CLIP不仅能够达到与最先进模型相当的组成能力，而且保留了强大的多模态能力。源代码可在以下链接找到：https://github.com/ytaek-oh/fsc-clip。

更新时间: 2024-10-07 17:16:20

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.05210v1

Online Convex Optimization with a Separation Oracle

In this paper, we introduce a new projection-free algorithm for Online Convex Optimization (OCO) with a state-of-the-art regret guarantee among separation-based algorithms. Existing projection-free methods based on the classical Frank-Wolfe algorithm achieve a suboptimal regret bound of $O(T^{3/4})$, while more recent separation-based approaches guarantee a regret bound of $O(\kappa \sqrt{T})$, where $\kappa$ denotes the asphericity of the feasible set, defined as the ratio of the radii of the containing and contained balls. However, for ill-conditioned sets, $\kappa$ can be arbitrarily large, potentially leading to poor performance. Our algorithm achieves a regret bound of $\widetilde{O}(\sqrt{dT} + \kappa d)$, while requiring only $\widetilde{O}(1)$ calls to a separation oracle per round. Crucially, the main term in the bound, $\widetilde{O}(\sqrt{d T})$, is independent of $\kappa$, addressing the limitations of previous methods. Additionally, as a by-product of our analysis, we recover the $O(\kappa \sqrt{T})$ regret bound of existing OCO algorithms with a more straightforward analysis and improve the regret bound for projection-free online exp-concave optimization. Finally, for constrained stochastic convex optimization, we achieve a state-of-the-art convergence rate of $\widetilde{O}(\sigma/\sqrt{T} + \kappa d/T)$, where $\sigma$ represents the noise in the stochastic gradients, while requiring only $\widetilde{O}(1)$ calls to a separation oracle per iteration.

Updated: 2024-10-07 17:15:37

标题: 使用分离预测器的在线凸优化

摘要: 在这篇论文中，我们介绍了一种新的无投影算法，用于在线凸优化（OCO），在基于分离的算法中具有最先进的遗憾保证。现有基于经典Frank-Wolfe算法的无投影方法实现了次优遗憾界限$O(T^{3/4})$，而最近的基于分离方法保证了遗憾界限为$O(\kappa \sqrt{T})$，其中$\kappa$表示可行集的非球形性，定义为包含球和被包含球的半径比。然而，对于病态集，$\kappa$可能会任意增大，可能导致性能不佳。我们的算法实现了一个遗憾界限为$\widetilde{O}(\sqrt{dT} + \kappa d)$，同时每轮仅需要$\widetilde{O}(1)$次调用分离oracle。关键是，界限中的主要项$\widetilde{O}(\sqrt{d T})$与$\kappa$无关，解决了以前方法的局限性。此外，作为我们分析的副产品，我们恢复了现有OCO算法的$O(\kappa \sqrt{T})$遗憾界限，并通过更简单的分析改进了无投影在线exp-凹优化的遗憾界限。最后，对于约束随机凸优化，我们实现了一个最先进的收敛速度为$\widetilde{O}(\sigma/\sqrt{T} + \kappa d/T)$，其中$\sigma$代表随机梯度中的噪声，同时每次迭代仅需要$\widetilde{O}(1)$次调用分离oracle。

更新时间: 2024-10-07 17:15:37

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2410.02476v2

Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift

As machine learning models are increasingly deployed in dynamic environments, it becomes paramount to assess and quantify uncertainties associated with distribution shifts. A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance. The prediction interval, which captures the range of likely outcomes for a given prediction, serves as a crucial tool for characterizing uncertainties induced by their underlying distribution. In this paper, we propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain under unsupervised domain shift, under which we have labeled samples from a related source domain and unlabeled covariates from the target domain. Our analysis encompasses scenarios where the source and the target domain are related via i) a bounded density ratio, and ii) a measure-preserving transformation. Our proposed methodologies are computationally efficient and easy to implement. Beyond illustrating the performance of our method through real-world datasets, we also delve into the theoretical details. This includes establishing rigorous theoretical guarantees, coupled with finite sample bounds, regarding the coverage and width of our prediction intervals. Our approach excels in practical applications and is underpinned by a solid theoretical framework, ensuring its reliability and effectiveness across diverse contexts.

Updated: 2024-10-07 17:07:09

标题: 无监督域漂移下预测区间的最佳聚合

摘要: 随着机器学习模型越来越多地部署在动态环境中，评估和量化与分布转移相关的不确定性变得至关重要。当基础数据生成过程发生变化时，分布转移就会发生，导致模型性能的偏离。预测区间捕捉了给定预测的可能结果范围，是表征由其基础分布引起的不确定性的重要工具。在本文中，我们提出了一种方法，通过聚合预测区间来获得一个宽度最小且在目标域下具有足够覆盖率的预测区间，该方法适用于无监督领域转移，其中我们有来自相关源域的标记样本和来自目标域的未标记协变量。我们的分析涵盖了通过有界密度比(i)和保度量变换(ii)相关的源域和目标域的情景。我们提出的方法具有计算效率高和易于实现的特点。除了通过真实世界数据集展示我们方法的性能外，我们还深入探讨了理论细节。这包括建立严格的理论保证，结合有限样本边界，关于我们预测区间的覆盖率和宽度。我们的方法在实际应用中表现出色，并且基于坚实的理论框架，确保在各种环境中的可靠性和有效性。

更新时间: 2024-10-07 17:07:09

领域: stat.ME,cs.LG,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2405.10302v2

RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction

The high incidence and mortality rates associated with respiratory diseases underscores the importance of early screening. Machine learning models can automate clinical consultations and auscultation, offering vital support in this area. However, the data involved, spanning demographics, medical history, symptoms, and respiratory audio, are heterogeneous and complex. Existing approaches are insufficient and lack generalizability, as they typically rely on limited training data, basic fusion techniques, and task-specific models. In this paper, we propose RespLLM, a novel multimodal large language model (LLM) framework that unifies text and audio representations for respiratory health prediction. RespLLM leverages the extensive prior knowledge of pretrained LLMs and enables effective audio-text fusion through cross-modal attentions. Instruction tuning is employed to integrate diverse data from multiple sources, ensuring generalizability and versatility of the model. Experiments on five real-world datasets demonstrate that RespLLM outperforms leading baselines by an average of 4.6% on trained tasks, 7.9% on unseen datasets, and facilitates zero-shot predictions for new tasks. Our work lays the foundation for multimodal models that can perceive, listen to, and understand heterogeneous data, paving the way for scalable respiratory health diagnosis.

Updated: 2024-10-07 17:06:11

标题: RespLLM：将音频和文本统一为通用呼吸健康预测的多模态LLMs

摘要: 呼吸疾病的高发病率和死亡率突显了早期筛查的重要性。机器学习模型可以自动化临床咨询和听诊，为该领域提供重要支持。然而，所涉及的数据跨越人口统计学、医疗史、症状和呼吸音等领域，具有异质性和复杂性。现有方法不足，并且缺乏泛化能力，因为它们通常依赖于有限的训练数据、基本的融合技术和任务特定的模型。在本文中，我们提出了RespLLM，这是一个新颖的多模态大型语言模型（LLM）框架，用于呼吸健康预测，统一了文本和音频表示。RespLLM利用了预训练LLM的广泛先验知识，并通过跨模态关注实现了有效的音频文本融合。我们采用指导调整来整合来自多个来源的多样数据，确保模型的泛化能力和多功能性。对五个真实世界数据集的实验表明，RespLLM在训练任务上的表现优于领先的基线平均4.6％，在未知数据集上提高了7.9％，并促进了新任务的零-shot预测。我们的工作为可以感知、倾听和理解异质数据的多模态模型奠定了基础，为可扩展的呼吸健康诊断铺平了道路。

更新时间: 2024-10-07 17:06:11

领域: cs.LG,cs.AI,cs.SD,eess.AS

下载: http://arxiv.org/abs/2410.05361v1

Behavior Matters: An Alternative Perspective on Promoting Responsible Data Science

Data science pipelines inform and influence many daily decisions, from what we buy to who we work for and even where we live. When designed incorrectly, these pipelines can easily propagate social inequity and harm. Traditional solutions are technical in nature; e.g., mitigating biased algorithms. In this vision paper, we introduce a novel lens for promoting responsible data science using theories of behavior change that emphasize not only technical solutions but also the behavioral responsibility of practitioners. By integrating behavior change theories from cognitive psychology with data science workflow knowledge and ethics guidelines, we present a new perspective on responsible data science. We present example data science interventions in machine learning and visual data analysis, contextualized in behavior change theories that could be implemented to interrupt and redirect potentially suboptimal or negligent practices while reinforcing ethically conscious behaviors. We conclude with a call to action to our community to explore this new research area of behavior change interventions for responsible data science.

Updated: 2024-10-07 16:59:18

标题: 行为至关重要：促进负责任数据科学的另类视角

摘要: 数据科学管道影响和影响着许多日常决策，从我们购买的物品到我们工作的公司，甚至我们居住的地方。当设计不当时，这些管道很容易传播社会不平等和危害。传统解决方案是技术性的，例如，减轻有偏见的算法。在这篇愿景论文中，我们介绍了一种促进负责任数据科学的新视角，使用行为改变理论强调从业者不仅要提供技术解决方案，还要承担行为责任。通过将认知心理学的行为改变理论与数据科学工作流知识和伦理准则结合起来，我们提出了对负责任数据科学的新视角。我们提出了机器学习和可视化数据分析中的示例数据科学干预措施，这些措施被置于行为改变理论的背景下，可以被实施以中断和重定潜在的次优或疏忽的做法，同时加强道德意识行为。我们最后呼吁我们的社区行动起来，探索这个行为改变干预对负责任数据科学的新研究领域。

更新时间: 2024-10-07 16:59:18

领域: cs.CY,cs.HC,cs.LG

下载: http://arxiv.org/abs/2410.17273v1

The power of a single Haar random state: constructing and separating quantum pseudorandomness

In this work, we focus on the following question: what are the cryptographic implications of having access to an oracle that provides a single Haar random quantum state? We show, perhaps surprisingly, that such an oracle is sufficient to construct quantum pseudorandomness. Pseudorandom states (PRS) are a family of states for which it is hard to distinguish between polynomially many copies of either a state sampled uniformly from the family or a Haar random state. A weaker notion, called single-copy pseudorandom states (1PRS), satisfies this property with respect to a single copy. We obtain the following results: 1. First, we show, perhaps surprisingly, that 1PRS (as well as bit-commitments) exist relative to an oracle that provides a single Haar random state. 2. Second, we build on this result to show the existence of a unitary oracle relative to which 1PRS exist, but PRS do not. Taken together, our contributions yield one of the first black-box separations between central notions of quantum pseudorandomness, and introduce a new framework to study black-box separations between various inherently quantum primitives.

Updated: 2024-10-07 16:58:01

标题: 一个单一Haar随机态的力量：构建和分离量子伪随机性

摘要: 在这项工作中，我们关注以下问题：拥有一个提供单个Haar随机量子态的预言机会对加密有什么影响？我们展示了，也许令人惊讶的是，这样的预言机足以构建量子伪随机性。伪随机态（PRS）是一类难以区分在该类中均匀采样的态或Haar随机态的多项式拷贝的状态。一个更弱的概念，称为单拷贝伪随机态（1PRS），满足这个属性对于一个单拷贝。我们得到以下结果： 1. 首先，我们展示，也许令人惊讶的是，1PRS（以及比特承诺）相对于提供单个Haar随机态的预言机存在。 2. 其次，我们在此结果的基础上展示了相对于一个单拷贝存在1PRS，但PRS不存在的酉预言机的存在。综合起来，我们的贡献产生了量子伪随机性中核心概念之间的第一个黑盒分离，并引入了一个新的框架来研究各种固有量子原语之间的黑盒分离。

更新时间: 2024-10-07 16:58:01

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2404.03295v3

Military Applications of Machine Learning: A Bibliometric Perspective

The military environment generates a large amount of data of great importance, which makes necessary the use of machine learning for its processing. Its ability to learn and predict possible scenarios by analyzing the huge volume of information generated provides automatic learning and decision support. This paper aims to present a model of a machine learning architecture applied to a military organization, carried out and supported by a bibliometric study applied to an architecture model of a nonmilitary organization. For this purpose, a bibliometric analysis up to the year 2021 was carried out, making a strategic diagram and interpreting the results. The information used has been extracted from one of the main databases widely accepted by the scientific community, ISI WoS. No direct military sources were used. This work is divided into five parts: the study of previous research related to machine learning in the military world; the explanation of our research methodology using the SciMat, Excel and VosViewer tools; the use of this methodology based on data mining, preprocessing, cluster normalization, a strategic diagram and the analysis of its results to investigate machine learning in the military context; based on these results, a conceptual architecture of the practical use of ML in the military context is drawn up; and, finally, we present the conclusions, where we will see the most important areas and the latest advances in machine learning applied, in this case, to a military environment, to analyze a large set of data, providing utility, machine learning and decision support.

Updated: 2024-10-07 16:54:40

标题: 军事应用的机器学习：一种文献计量学的视角

摘要: 军事环境产生大量重要数据，因此需要利用机器学习进行处理。通过分析大量信息生成的数据，机器学习能够学习和预测可能的场景，提供自动学习和决策支持。本文旨在提出一个应用于军事组织的机器学习架构模型，通过对非军事组织架构模型进行文献计量研究来支持。为此，进行了截至2021年的文献计量分析，制作了战略图表并解释了结果。使用的信息来自于科学界广泛接受的主要数据库之一，ISI WoS。本研究分为五个部分：研究与军事领域中机器学习相关的先前研究；使用SciMat、Excel和VosViewer工具解释研究方法论；基于数据挖掘、预处理、簇归一化、战略图表以及结果分析的方法论应用于军事背景中的机器学习研究；基于这些结果，制定了在军事背景中实际运用机器学习的概念架构；最后，呈现了结论，展示了机器学习应用于军事环境以分析大量数据、提供实用性、机器学习和决策支持的最重要领域和最新进展。

更新时间: 2024-10-07 16:54:40

领域: cs.CY,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.17272v1

Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective

Training language models currently requires pre-determining a fixed compute budget because the typical cosine learning rate schedule depends on the total number of steps. In contrast, the Warmup-Stable-Decay (WSD) schedule uses a constant learning rate to produce a main branch of iterates that can in principle continue indefinitely without a pre-specified compute budget. Then, given any compute budget, one can branch out from the main branch at a proper at any time with a rapidly decaying learning rate to produce a strong model. Empirically, WSD generates a non-traditional loss curve: the loss remains elevated during the stable phase but sharply declines during the decay phase. Towards explaining this phenomenon, we conjecture that pretraining loss exhibits a river valley landscape, which resembles a deep valley with a river at its bottom. Under this assumption, we show that during the stable phase, the iterate undergoes large oscillations due to the high learning rate, yet it progresses swiftly along the river. During the decay phase, the rapidly dropping learning rate minimizes the iterate's oscillations, moving it closer to the river and revealing true optimization progress. Therefore, the sustained high learning rate phase and fast decaying phase are responsible for progress in the river and the mountain directions respectively, and are both critical. Our analysis predicts phenomenons consistent with empirical observations and shows that this landscape can emerge from pretraining on a simple bi-gram dataset. Inspired by the theory, we introduce WSD-S, a variant of WSD that reuses previous checkpoints' decay phases and keeps only one main branch, where we resume from a decayed checkpoint. WSD-S empirically outperforms WSD and Cyclic-Cosine in obtaining multiple language model checkpoints across various compute budgets in a single run for parameters scaling from 0.1B to 1.2B.

Updated: 2024-10-07 16:49:39

标题: 理解热身稳定衰减学习率：河谷损失景观的观点

摘要: 训练语言模型目前需要预先确定一个固定的计算预算，因为典型的余弦学习率调度取决于总步数。相比之下，Warmup-Stable-Decay（WSD）调度使用恒定学习率产生一个可以在原则上无需预先确定计算预算就可以持续进行的主分支。然后，在任何计算预算下，可以在适当的时间从主分支分支出来，使用快速衰减的学习率产生一个强大的模型。经验上，WSD生成了一个非传统的损失曲线：在稳定阶段损失保持较高，但在衰减阶段急剧下降。为了解释这一现象，我们推测预训练损失呈现出一个河谷景观，类似于一个深谷底部有一条河流。在这个假设下，我们展示了在稳定阶段，由于高学习率，迭代会经历大幅振荡，但会快速沿着河流前进。在衰减阶段，迅速下降的学习率最小化了迭代的振荡，使其更接近河流，并揭示了真正的优化进展。因此，持续的高学习率阶段和快速衰减阶段分别负责在河流和山脉方向上取得进展，并且两者都至关重要。我们的分析预测了与经验观察一致的现象，并显示这种景观可以从对一个简单的二元数据集进行预训练中出现。受到这一理论的启发，我们引入了WSD-S，这是WSD的一个变体，重用以前的检查点的衰减阶段，并仅保留一个主分支，在一个已衰减的检查点处恢复。经验上，WSD-S在单次运行中在参数从0.1B到1.2B的各种计算预算下获得多个语言模型检查点方面比WSD和Cyclic-Cosine表现更好。

更新时间: 2024-10-07 16:49:39

领域: cs.LG,cs.CL,stat.ML

下载: http://arxiv.org/abs/2410.05192v1

LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation

Building on the advancements of Large Language Models (LLMs) and Vision Language Models (VLMs), recent research has introduced Vision-Language-Action (VLA) models as an integrated solution for robotic manipulation tasks. These models take camera images and natural language task instructions as input and directly generate control actions for robots to perform specified tasks, greatly improving both decision-making capabilities and interaction with human users. However, the data-driven nature of VLA models, combined with their lack of interpretability, makes the assurance of their effectiveness and robustness a challenging task. This highlights the need for a reliable testing and evaluation platform. For this purpose, in this work, we propose LADEV, a comprehensive and efficient platform specifically designed for evaluating VLA models. We first present a language-driven approach that automatically generates simulation environments from natural language inputs, mitigating the need for manual adjustments and significantly improving testing efficiency. Then, to further assess the influence of language input on the VLA models, we implement a paraphrase mechanism that produces diverse natural language task instructions for testing. Finally, to expedite the evaluation process, we introduce a batch-style method for conducting large-scale testing of VLA models. Using LADEV, we conducted experiments on several state-of-the-art VLA models, demonstrating its effectiveness as a tool for evaluating these models. Our results showed that LADEV not only enhances testing efficiency but also establishes a solid baseline for evaluating VLA models, paving the way for the development of more intelligent and advanced robotic systems.

Updated: 2024-10-07 16:49:16

标题: LADEV：一种用于机器人操作中视觉-语言-动作模型的语言驱动测试和评估平台

摘要: 基于大型语言模型（LLMs）和视觉语言模型（VLMs）的进展，最近的研究引入了视觉-语言-动作（VLA）模型作为机器人操纵任务的一体化解决方案。这些模型将摄像头图像和自然语言任务说明作为输入，并直接生成控制动作，使机器人执行指定任务，大大提高了决策能力和与人类用户的交互。然而，VLA模型的数据驱动性以及缺乏可解释性使得确保其有效性和鲁棒性成为一项具有挑战性的任务。这突显了对可靠的测试和评估平台的需求。为此，在这项工作中，我们提出了LADEV，一个专门设计用于评估VLA模型的全面高效平台。我们首先提出了一种语言驱动的方法，从自然语言输入中自动生成模拟环境，减轻了手动调整的需求，显著提高了测试效率。然后，为了进一步评估语言输入对VLA模型的影响，我们实现了一个释义机制，为测试生成多样化的自然语言任务说明。最后，为了加快评估过程，我们引入了一种批量式方法，用于进行VLA模型的大规模测试。利用LADEV，我们对几种最先进的VLA模型进行了实验，展示了它作为评估这些模型的工具的有效性。我们的结果表明，LADEV不仅提高了测试效率，还为评估VLA模型奠定了坚实的基础，为更智能和先进的机器人系统的发展铺平道路。

更新时间: 2024-10-07 16:49:16

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.05191v1

Matrix-weighted networks for modeling multidimensional dynamics

Networks are powerful tools for modeling interactions in complex systems. While traditional networks use scalar edge weights, many real-world systems involve multidimensional interactions. For example, in social networks, individuals often have multiple interconnected opinions that can affect different opinions of other individuals, which can be better characterized by matrices. We propose a novel, general framework for modeling such multidimensional interacting dynamics: matrix-weighted networks (MWNs). We present the mathematical foundations of MWNs and examine consensus dynamics and random walks within this context. Our results reveal that the coherence of MWNs gives rise to non-trivial steady states that generalize the notions of communities and structural balance in traditional networks.

Updated: 2024-10-07 16:47:30

标题: Matrix-weighted 网络用于建模多维动态

摘要: 网络是建模复杂系统中相互作用的强大工具。传统网络使用标量边权重，但许多现实世界中的系统涉及多维交互。例如，在社交网络中，个体通常具有可以影响其他个体不同观点的多个相互关联观点，这可以更好地通过矩阵来描述。我们提出了一种新颖的通用框架，用于建模这种多维交互动态：矩阵加权网络（MWNs）。我们介绍了MWNs的数学基础，并在此背景下研究共识动态和随机游走。我们的结果显示，MWNs的连贯性导致产生非平凡的稳态，这扩展了传统网络中社区和结构平衡的概念。

更新时间: 2024-10-07 16:47:30

领域: cs.SI,cs.LG,math-ph,math.MP,physics.soc-ph,05C22, 05C50, 05C81, 37E25, 39A06, 91D30, 94C15

下载: http://arxiv.org/abs/2410.05188v1

Principal-Agent Reinforcement Learning: Orchestrating AI Agents with Contracts

The increasing deployment of AI is shaping the future landscape of the internet, which is set to become an integrated ecosystem of AI agents. Orchestrating the interaction among AI agents necessitates decentralized, self-sustaining mechanisms that harmonize the tension between individual interests and social welfare. In this paper we tackle this challenge by synergizing reinforcement learning with principal-agent theory from economics. Taken separately, the former allows unrealistic freedom of intervention, while the latter struggles to scale in sequential settings. Combining them achieves the best of both worlds. We propose a framework where a principal guides an agent in a Markov Decision Process (MDP) using a series of contracts, which specify payments by the principal based on observable outcomes of the agent's actions. We present and analyze a meta-algorithm that iteratively optimizes the policies of the principal and agent, showing its equivalence to a contraction operator on the principal's Q-function, and its convergence to subgame-perfect equilibrium. We then scale our algorithm with deep Q-learning and analyze its convergence in the presence of approximation error, both theoretically and through experiments with randomly generated binary game-trees. Extending our framework to multiple agents, we apply our methodology to the combinatorial Coin Game. Addressing this multi-agent sequential social dilemma is a promising first step toward scaling our approach to more complex, real-world instances.

Updated: 2024-10-07 16:46:42

标题: 主体代理强化学习：通过合同协调人工智能代理

摘要: 人工智能的不断部署正在塑造未来互联网的景观，它将成为一个人工智能代理的集成生态系统。协调人工智能代理之间的互动需要分散化、自我维持的机制，以协调个体利益和社会福利之间的张力。在这篇论文中，我们通过将强化学习与经济学中的委托-代理理论相结合，来应对这一挑战。分开来看，前者允许不切实际的干预自由，而后者在顺序设置中很难扩展。将它们结合起来可以实现最好的两个世界。我们提出了一个框架，其中一个委托方通过一系列合同指导一个驻地决策过程（MDP）中的代理，这些合同根据代理的行动的可观察结果指定委托方的支付。我们提出并分析一个元算法，通过迭代优化委托方和代理的策略，展示了它与委托方的Q函数上的一个收缩算子的等价性，以及它对子博弈完美均衡的收敛性。然后，我们通过深度Q学习扩展我们的算法，并分析了在近似误差存在的情况下的收敛性，从理论上和通过随机生成的二进制游戏树的实验进行了分析。将我们的框架扩展到多个代理，我们将我们的方法应用于组合硬币游戏。解决这个多代理顺序社会困境是将我们的方法扩展到更复杂的现实世界实例的有希望的第一步。

更新时间: 2024-10-07 16:46:42

领域: cs.GT,cs.LG,cs.MA

下载: http://arxiv.org/abs/2407.18074v2

Creative Beam Search: LLM-as-a-Judge For Improving Response Generation

Large language models are revolutionizing several areas, including artificial creativity. However, the process of generation in machines profoundly diverges from that observed in humans. In particular, machine generation is characterized by a lack of intentionality and an underlying creative process. We propose a method called Creative Beam Search that uses Diverse Beam Search and LLM-as-a-Judge to perform response generation and response validation. The results of a qualitative experiment show how our approach can provide better output than standard sampling techniques. We also show that the response validation step is a necessary complement to the response generation step.

Updated: 2024-10-07 16:45:42

标题: 创意束搜索：作为评判员的LLM用于改进响应生成

摘要: 大型语言模型正在彻底改变几个领域，包括人工创造力。然而，在机器的生成过程与人类观察到的生成过程存在明显差异。特别是，机器生成的特点是缺乏意向性和潜在的创造过程。我们提出了一种称为创意束搜索的方法，该方法利用多样化束搜索和LLM作为评判者来执行响应生成和响应验证。定性实验的结果显示了我们的方法如何比标准抽样技术提供更好的输出。我们还表明，响应验证步骤是对响应生成步骤的必要补充。

更新时间: 2024-10-07 16:45:42

领域: cs.AI,cs.CL,cs.HC,cs.LG

下载: http://arxiv.org/abs/2405.00099v4

Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics

Machine Translation (MT) evaluation metrics assess translation quality automatically. Recently, researchers have employed MT metrics for various new use cases, such as data filtering and translation re-ranking. However, most MT metrics return assessments as scalar scores that are difficult to interpret, posing a challenge to making informed design choices. Moreover, MT metrics' capabilities have historically been evaluated using correlation with human judgment, which, despite its efficacy, falls short of providing intuitive insights into metric performance, especially in terms of new metric use cases. To address these issues, we introduce an interpretable evaluation framework for MT metrics. Within this framework, we evaluate metrics in two scenarios that serve as proxies for the data filtering and translation re-ranking use cases. Furthermore, by measuring the performance of MT metrics using Precision, Recall, and F-score, we offer clearer insights into their capabilities than correlation with human judgments. Finally, we raise concerns regarding the reliability of manually curated data following the Direct Assessments+Scalar Quality Metrics (DA+SQM) guidelines, reporting a notably low agreement with Multidimensional Quality Metrics (MQM) annotations.

Updated: 2024-10-07 16:42:10

标题: 超越相关性：可解释的机器翻译评估指标

摘要: 机器翻译（MT）评估指标可以自动评估翻译质量。最近，研究人员已经将MT指标应用于各种新的用例，如数据过滤和翻译重新排序。然而，大多数MT指标返回的评估结果是难以解释的标量分数，这对做出明智的设计选择构成了挑战。此外，历史上评估MT指标的能力是通过与人类判断的相关性进行评估的，尽管这种方法有效，但在度量指标性能方面，特别是对于新的指标用例而言，它仍然无法提供直观的见解。为了解决这些问题，我们引入了一个可解释的MT指标评估框架。在这个框架内，我们评估了两种场景中的指标，这两种场景可以作为数据过滤和翻译重新排序用例的代理。此外，通过使用Precision、Recall和F-score来衡量MT指标的性能，我们为其能力提供了比与人类判断相关性更清晰的见解。最后，我们提出了对手动筛选数据可靠性的疑虑，遵循了直接评估+标量质量指标（DA+SQM）指南，报告了与多维质量指标（MQM）注释的显著低一致性。

更新时间: 2024-10-07 16:42:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.05183v1

Forest Proximities for Time Series

RF-GAP has recently been introduced as an improved random forest proximity measure. In this paper, we present PF-GAP, an extension of RF-GAP proximities to proximity forests, an accurate and efficient time series classification model. We use the forest proximities in connection with Multi-Dimensional Scaling to obtain vector embeddings of univariate time series, comparing the embeddings to those obtained using various time series distance measures. We also use the forest proximities alongside Local Outlier Factors to investigate the connection between misclassified points and outliers, comparing with nearest neighbor classifiers which use time series distance measures. We show that the forest proximities may exhibit a stronger connection between misclassified points and outliers than nearest neighbor classifiers.

Updated: 2024-10-07 16:41:49

标题: 时间序列的森林接近程度

摘要: RF-GAP最近被引入为改进的随机森林接近度度量。在本文中，我们提出了PF-GAP，这是RF-GAP接近度的一个扩展，用于接近度森林，这是一个准确且高效的时间序列分类模型。我们使用森林接近度与多维缩放结合，以获得单变量时间序列的向量嵌入，将这些嵌入与使用各种时间序列距离度量获得的嵌入进行比较。我们还使用森林接近度与局部异常因子一起研究误分类点和异常值之间的关系，与使用时间序列距离度量的最近邻分类器进行比较。我们展示了森林接近度可能比最近邻分类器更强地展示误分类点和异常值之间的关系。

更新时间: 2024-10-07 16:41:49

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.03098v2

MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain

The visual detection and tracking of surface terrain is required for spacecraft to safely land on or navigate within close proximity to celestial objects. Current approaches rely on template matching with pre-gathered patch-based features, which are expensive to obtain and a limiting factor in perceptual capability. While recent literature has focused on in-situ detection methods to enhance navigation and operational autonomy, robust description is still needed. In this work, we explore metric learning as the lightweight feature description mechanism and find that current solutions fail to address inter-class similarity and multi-view observational geometry. We attribute this to the view-unaware attention mechanism and introduce Multi-view Attention Regularizations (MARs) to constrain the channel and spatial attention across multiple feature views, regularizing the what and where of attention focus. We thoroughly analyze many modern metric learning losses with and without MARs and demonstrate improved terrain-feature recognition performance by upwards of 85%. We additionally introduce the Luna-1 dataset, consisting of Moon crater landmarks and reference navigation frames from NASA mission data to support future research in this difficult task. Luna-1 and source code are publicly available at https://droneslab.github.io/mars/.

Updated: 2024-10-07 16:41:45

标题: MARs：用于基于补丁特征识别空间地形的多视图注意力规范化

摘要: 视觉检测和跟踪表面地形对于太空船安全着陆或在天体附近导航是必要的。目前的方法依赖于使用预先收集的基于模板匹配的特征，这些特征获取昂贵且限制了感知能力。尽管最近的文献侧重于现场检测方法以增强导航和操作自主性，但仍然需要稳健的描述。在这项工作中，我们将度量学习作为轻量级特征描述机制进行探索，并发现当前的解决方案未能解决跨类相似性和多视角观测几何。我们将其归因于不考虑视图的注意机制，并引入多视图注意力正则化（MARs）来约束多个特征视图上的通道和空间注意力，规范注意力焦点的“是什么”和“在哪里”。我们通过对比使用和不使用MARs的许多现代度量学习损失进行了彻底分析，并展示了高达85%的改善地形特征识别性能。此外，我们引入了Luna-1数据集，其中包含了来自NASA任务数据的月球陨石坑地标和参考导航帧，以支持未来在这一困难任务中的研究。Luna-1和源代码可在https://droneslab.github.io/mars/上公开获取。

更新时间: 2024-10-07 16:41:45

领域: cs.CV,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2410.05182v1

Training Foundation Models as Data Compression: On Information, Model Weights and Copyright Law

The training process of foundation models as for other classes of deep learning systems is based on minimizing the reconstruction error over a training set. For this reason, they are susceptible to the memorization and subsequent reproduction of training samples. In this paper, we introduce a training-as-compressing perspective, wherein the model's weights embody a compressed representation of the training data. From a copyright standpoint, this point of view implies that the weights could be considered a reproduction or a derivative work of a potentially protected set of works. We investigate the technical and legal challenges that emerge from this framing of the copyright of outputs generated by foundation models, including their implications for practitioners and researchers. We demonstrate that adopting an information-centric approach to the problem presents a promising pathway for tackling these emerging complex legal issues.

Updated: 2024-10-07 16:40:25

标题: 培训基础模型作为数据压缩：关于信息、模型权重和版权法

摘要: 基础模型的训练过程与其他类别的深度学习系统类似，都是基于最小化训练集上的重建误差。因此，它们容易受到训练样本的记忆和后续再现的影响。本文引入了一个训练压缩的视角，其中模型的权重体现了训练数据的压缩表示。从版权的角度来看，这个观点意味着权重可以被视为受保护作品的复制或衍生作品。我们调查了这种基础模型生成的输出的版权所面临的技术和法律挑战，包括对从业者和研究人员的影响。我们展示了采用信息为中心的方法来解决这些新兴复杂法律问题的一个有希望的途径。

更新时间: 2024-10-07 16:40:25

领域: cs.CY,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.13493v3

MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences

Understanding the quality of a performance evaluation metric is crucial for ensuring that model outputs align with human preferences. However, it remains unclear how well each metric captures the diverse aspects of these preferences, as metrics often excel in one particular area but not across all dimensions. To address this, it is essential to systematically calibrate metrics to specific aspects of human preference, catering to the unique characteristics of each aspect. We introduce MetaMetrics, a calibrated meta-metric designed to evaluate generation tasks across different modalities in a supervised manner. MetaMetrics optimizes the combination of existing metrics to enhance their alignment with human preferences. Our metric demonstrates flexibility and effectiveness in both language and vision downstream tasks, showing significant benefits across various multilingual and multi-domain scenarios. MetaMetrics aligns closely with human preferences and is highly extendable and easily integrable into any application. This makes MetaMetrics a powerful tool for improving the evaluation of generation tasks, ensuring that metrics are more representative of human judgment across diverse contexts.

Updated: 2024-10-07 16:39:24

标题: MetaMetrics：使用人类偏好调整生成任务的度量标准

摘要: 理解绩效评估指标的质量对于确保模型输出与人类偏好保持一致至关重要。然而，目前尚不清楚每个指标究竟有多好地捕捉了这些偏好的多样方面，因为指标通常在某一特定领域表现出色，但在所有维度上并不出色。为了解决这个问题，有必要系统地校准指标以满足人类偏好的特定方面，迎合每个方面的独特特征。我们引入了MetaMetrics，这是一个经过校准的元指标，旨在以监督方式评估不同模态下的生成任务。MetaMetrics优化现有指标的组合，以增强它们与人类偏好的一致性。我们的指标在语言和视觉下游任务中表现出灵活性和有效性，显示出在各种多语言和多领域场景中的显著好处。MetaMetrics与人类偏好密切一致，而且非常易于扩展和轻松地集成到任何应用程序中。这使得MetaMetrics成为改善生成任务评估的强大工具，确保指标更具代表性，更符合人类在各种背景下的判断。

更新时间: 2024-10-07 16:39:24

领域: cs.CL,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.02381v2

A Usage-centric Take on Intent Understanding in E-Commerce

Identifying and understanding user intents is a pivotal task for E-Commerce. Despite its essential role in product recommendation and business user profiling analysis, intent understanding has not been consistently defined or accurately benchmarked. In this paper, we focus on predicative user intents as "how a customer uses a product", and pose intent understanding as a natural language reasoning task, independent of product ontologies. We identify two weaknesses of FolkScope, the SOTA E-Commerce Intent Knowledge Graph: category-rigidity and property-ambiguity. They limit its ability to strongly align user intents with products having the most desirable property, and to recommend useful products across diverse categories. Following these observations, we introduce a Product Recovery Benchmark featuring a novel evaluation framework and an example dataset. We further validate the above FolkScope weaknesses on this benchmark. Our code and dataset are available at https://github.com/stayones/Usgae-Centric-Intent-Understanding.

Updated: 2024-10-07 16:38:35

标题: 电子商务中意图理解的使用中心观点

摘要: 识别和理解用户意图是电子商务的一个关键任务。尽管在产品推荐和商业用户配置分析中起着至关重要的作用，但意图理解并没有被一致地定义或准确地基准化。在本文中，我们将预测性用户意图定义为“客户如何使用产品”，并将意图理解视为一项自然语言推理任务，独立于产品本体论。我们确定了FolkScope的两个弱点，即当前最先进的电子商务意图知识图：类别刚性和属性模糊。这些弱点限制了其将用户意图与具有最理想属性的产品强烈对齐的能力，以及在不同类别中推荐有用的产品的能力。在这些观察之后，我们引入了一个产品恢复基准，其中包括一个新颖的评估框架和一个示例数据集。我们进一步在这个基准上验证了上述FolkScope的弱点。我们的代码和数据集可以在https://github.com/stayones/Usgae-Centric-Intent-Understanding 上找到。

更新时间: 2024-10-07 16:38:35

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.14901v2

Machine Learning Based Optimal Design of Fibrillar Adhesives

Fibrillar adhesion, observed in animals like beetles, spiders, and geckos, relies on nanoscopic or microscopic fibrils to enhance surface adhesion via 'contact splitting.' This concept has inspired engineering applications across robotics, transportation, and medicine. Recent studies suggest that functional grading of fibril properties can improve adhesion, but this is a complex design challenge that has only been explored in simplified geometries. While machine learning (ML) has gained traction in adhesive design, no previous attempts have targeted fibril-array scale optimization. In this study, we propose an ML-based tool that optimizes the distribution of fibril compliance to maximize adhesive strength. Our tool, featuring two deep neural networks (DNNs), recovers previous design results for simple geometries and introduces novel solutions for complex configurations. The Predictor DNN estimates adhesive strength based on random compliance distributions, while the Designer DNN optimizes compliance for maximum strength using gradient-based optimization. Our method significantly reduces test error and accelerates the optimization process, offering a high-performance solution for designing fibrillar adhesives and micro-architected materials aimed at fracture resistance by achieving equal load sharing (ELS).

Updated: 2024-10-07 16:37:56

标题: 基于机器学习的纤维粘附剂的最优设计

摘要: 纤维粘附，如甲壳动物、蜘蛛和壁虎中观察到的，依赖于纳米或微观纤维通过“接触分裂”增强表面粘附。这一概念已经在机器人技术、交通运输和医学等领域启发了工程应用。最近的研究表明，纤维性能的功能分级可以改善粘附力，但这是一个复杂的设计挑战，目前仅在简化的几何结构中进行了探讨。虽然机器学习（ML）在粘附设计方面已经取得了进展，但之前没有尝试过针对纤维阵列尺度的优化。在这项研究中，我们提出了一种基于ML的工具，通过优化纤维的合规性分布来最大化粘附力。我们的工具采用两个深度神经网络（DNNs），恢复了简单几何结构的先前设计结果，并为复杂配置引入了新的解决方案。预测器DNN基于随机合规性分布估计粘附力，而设计师DNN通过基于梯度的优化来最大化强度。我们的方法显著降低了测试误差，并加速了优化过程，为设计纤维粘附剂和微结构材料提供了高性能解决方案，旨在通过实现等负载共享（ELS）来提高抗断裂性能。

更新时间: 2024-10-07 16:37:56

领域: cs.LG

下载: http://arxiv.org/abs/2409.05928v3

Are causal effect estimations enough for optimal recommendations under multitreatment scenarios?

When making treatment selection decisions, it is essential to include a causal effect estimation analysis to compare potential outcomes under different treatments or controls, assisting in optimal selection. However, merely estimating individual treatment effects may not suffice for truly optimal decisions. Our study addressed this issue by incorporating additional criteria, such as the estimations' uncertainty, measured by the conditional value-at-risk, commonly used in portfolio and insurance management. For continuous outcomes observable before and after treatment, we incorporated a specific prediction condition. We prioritized treatments that could yield optimal treatment effect results and lead to post-treatment outcomes more desirable than pretreatment levels, with the latter condition being called the prediction criterion. With these considerations, we propose a comprehensive methodology for multitreatment selection. Our approach ensures satisfaction of the overlap assumption, crucial for comparing outcomes for treated and control groups, by training propensity score models as a preliminary step before employing traditional causal models. To illustrate a practical application of our methodology, we applied it to the credit card limit adjustment problem. Analyzing a fintech company's historical data, we found that relying solely on counterfactual predictions was inadequate for appropriate credit line modifications. Incorporating our proposed additional criteria significantly enhanced policy performance.

Updated: 2024-10-07 16:37:35

标题: 在多治疗方案情况下，因果效应估计足以实现最佳推荐吗？

摘要: 在制定治疗选择决策时，包括因果效应估计分析是必不可少的，以比较不同治疗或对照组下潜在结果，从而帮助进行最佳选择。然而，仅仅估计个体治疗效果可能不足以做出真正最佳决策。我们的研究通过纳入额外的标准来解决这个问题，例如由条件风险价值（CVaR）衡量的估计不确定性，这在投资组合和保险管理中常用。对于在治疗前后可观察到的连续结果，我们纳入了一个特定的预测条件。我们优先考虑可以产生最佳治疗效果结果并导致治疗后结果比治疗前水平更理想的治疗，后者条件被称为预测标准。考虑到这些因素，我们提出了一种多治疗选择的综合方法。我们的方法确保满足重叠假设，这对比较治疗组和对照组的结果至关重要，通过在使用传统因果模型之前，训练倾向得分模型作为初步步骤。为了说明我们方法的实际应用，我们将其应用于信用卡额度调整问题。通过分析一家金融科技公司的历史数据，我们发现仅依赖反事实预测不足以进行适当的信用额度调整。纳入我们提出的额外标准显著提高了政策绩效。

更新时间: 2024-10-07 16:37:35

领域: stat.ML,cs.LG,62-07, 62P05

下载: http://arxiv.org/abs/2410.05177v1

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

We show that even the most recent safety-aligned LLMs are not robust to simple adaptive jailbreaking attacks. First, we demonstrate how to successfully leverage access to logprobs for jailbreaking: we initially design an adversarial prompt template (sometimes adapted to the target LLM), and then we apply random search on a suffix to maximize a target logprob (e.g., of the token "Sure"), potentially with multiple restarts. In this way, we achieve 100% attack success rate -- according to GPT-4 as a judge -- on Vicuna-13B, Mistral-7B, Phi-3-Mini, Nemotron-4-340B, Llama-2-Chat-7B/13B/70B, Llama-3-Instruct-8B, Gemma-7B, GPT-3.5, GPT-4o, and R2D2 from HarmBench that was adversarially trained against the GCG attack. We also show how to jailbreak all Claude models -- that do not expose logprobs -- via either a transfer or prefilling attack with a 100% success rate. In addition, we show how to use random search on a restricted set of tokens for finding trojan strings in poisoned models -- a task that shares many similarities with jailbreaking -- which is the algorithm that brought us the first place in the SaTML'24 Trojan Detection Competition. The common theme behind these attacks is that adaptivity is crucial: different models are vulnerable to different prompting templates (e.g., R2D2 is very sensitive to in-context learning prompts), some models have unique vulnerabilities based on their APIs (e.g., prefilling for Claude), and in some settings, it is crucial to restrict the token search space based on prior knowledge (e.g., for trojan detection). For reproducibility purposes, we provide the code, logs, and jailbreak artifacts in the JailbreakBench format at https://github.com/tml-epfl/llm-adaptive-attacks.

Updated: 2024-10-07 16:35:15

标题: 用简单自适应攻击破解领先的安全对齐LLMs

摘要: 我们展示，即使最新的安全对齐LLMs也无法抵御简单的自适应越狱攻击。首先，我们展示如何成功利用对logprobs的访问进行越狱：我们最初设计一个对抗性提示模板（有时会针对目标LLM进行调整），然后对后缀进行随机搜索以最大化目标logprob（例如，令牌“Sure”），可能会进行多次重启。通过这种方式，我们在Vicuna-13B、Mistral-7B、Phi-3-Mini、Nemotron-4-340B、Llama-2-Chat-7B/13B/70B、Llama-3-Instruct-8B、Gemma-7B、GPT-3.5、GPT-4o和HarmBench中对抗性训练的R2D2等模型上，根据GPT-4的评判，实现了100%的攻击成功率。我们还展示了如何通过转移或预填充攻击以100%的成功率越狱所有不暴露logprobs的Claude模型。此外，我们展示了如何在受污染模型中的一组受限制的令牌上进行随机搜索以寻找特洛伊字符串，这个任务与越狱有许多相似之处，并且该算法使我们在SaTML'24特洛伊检测竞赛中获得第一名。这些攻击背后的共同主题是适应性至关重要：不同的模型对不同的提示模板具有漏洞（例如，R2D2对于上下文学习提示非常敏感），一些模型基于其API具有独特的漏洞（例如，Claude的预填充），在某些情况下，根据先前的知识限制令牌搜索空间至关重要（例如，用于特洛伊检测）。为了可重现性目的，我们以JailbreakBench格式提供了代码、日志和越狱工件，网址为https://github.com/tml-epfl/llm-adaptive-attacks。

更新时间: 2024-10-07 16:35:15

领域: cs.CR,cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.02151v3

Efficient Model-Agnostic Multi-Group Equivariant Networks

Constructing model-agnostic group equivariant networks, such as equitune (Basu et al., 2023b) and its generalizations (Kim et al., 2023), can be computationally expensive for large product groups. We address this problem by providing efficient model-agnostic equivariant designs for two related problems: one where the network has multiple inputs each with potentially different groups acting on them, and another where there is a single input but the group acting on it is a large product group. For the first design, we initially consider a linear model and characterize the entire equivariant space that satisfies this constraint. This characterization gives rise to a novel fusion layer between different channels that satisfies an invariance-symmetry (IS) constraint, which we call an IS layer. We then extend this design beyond linear models, similar to equitune, consisting of equivariant and IS layers. We also show that the IS layer is a universal approximator of invariant-symmetric functions. Inspired by the first design, we use the notion of the IS property to design a second efficient model-agnostic equivariant design for large product groups acting on a single input. For the first design, we provide experiments on multi-image classification where each view is transformed independently with transformations such as rotations. We find equivariant models are robust to such transformations and perform competitively otherwise. For the second design, we consider three applications: language compositionality on the SCAN dataset to product groups; fairness in natural language generation from GPT-2 to address intersectionality; and robust zero-shot image classification with CLIP. Overall, our methods are simple and general, competitive with equitune and its variants, while also being computationally more efficient.

Updated: 2024-10-07 16:28:52

标题: 高效的模型无关多组等变网络

摘要: 构建模型无关的群等变网络，如 equitune（Basu等，2023b）及其泛化形式（Kim等，2023），对于大型乘积群可能需要大量计算资源。我们通过提供两个相关问题的高效模型无关等变设计来解决这个问题：一个是网络具有多个输入，每个输入可能有不同的群作用于它们，另一个是只有一个输入但作用于它的群是一个大型乘积群。对于第一个设计，我们最初考虑一个线性模型，并表征满足该约束条件的整个等变空间。这种特性导致了不同通道之间的新型融合层，满足不变性-对称（IS）约束，我们称之为IS层。然后，我们将这种设计扩展到超越线性模型，类似于equitune，由等变和IS层组成。我们还展示了IS层是不变-对称函数的通用逼近器。受第一个设计的启发，我们利用IS属性的概念为单输入上作用于大型乘积群的第二种高效模型无关等变设计。对于第一个设计，我们在多图像分类上提供实验，其中每个视图都独立进行转换，例如旋转。我们发现等变模型对这种转换具有鲁棒性，并在其他方面表现出竞争力。对于第二种设计，我们考虑三个应用程序：在SCAN数据集上进行语言组合性到乘积群；从GPT-2生成自然语言以解决交叉性的公平性；以及使用CLIP进行鲁棒的零样本图像分类。总的来说，我们的方法简单通用，与equitune及其变体具有竞争力，同时在计算上更加高效。

更新时间: 2024-10-07 16:28:52

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2310.09675v2

Interactive Event Sifting using Bayesian Graph Neural Networks

Forensic analysts often use social media imagery and texts to understand important events. A primary challenge is the initial sifting of irrelevant posts. This work introduces an interactive process for training an event-centric, learning-based multimodal classification model that automates sanitization. We propose a method based on Bayesian Graph Neural Networks (BGNNs) and evaluate active learning and pseudo-labeling formulations to reduce the number of posts the analyst must manually annotate. Our results indicate that BGNNs are useful for social-media data sifting for forensics investigations of events of interest, the value of active learning and pseudo-labeling varies based on the setting, and incorporating unlabelled data from other events improves performance.

Updated: 2024-10-07 16:28:47

标题: 基于贝叶斯图神经网络的交互式事件筛选

摘要: 法证分析人员经常使用社交媒体的图片和文本来了解重要事件。一个主要挑战是最初筛选出无关的帖子。这项工作介绍了一个交互过程，用于训练一个以事件为中心的、基于学习的多模态分类模型，自动化消毒。我们提出了一种基于贝叶斯图神经网络（BGNNs）的方法，并评估主动学习和伪标记的表达方式，以减少分析人员必须手动注释的帖子数量。我们的结果表明，BGNNs对于社交媒体数据筛选在法证调查中的事件很有用，主动学习和伪标记的价值根据环境而异，并且将其他事件的未标记数据纳入提高了性能。

更新时间: 2024-10-07 16:28:47

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2410.05359v1

Learning to Steer Markovian Agents under Model Uncertainty

Designing incentives for an adapting population is a ubiquitous problem in a wide array of economic applications and beyond. In this work, we study how to design additional rewards to steer multi-agent systems towards desired policies \emph{without} prior knowledge of the agents' underlying learning dynamics. Motivated by the limitation of existing works, we consider a new and general category of learning dynamics called \emph{Markovian agents}. We introduce a model-based non-episodic Reinforcement Learning (RL) formulation for our steering problem. Importantly, we focus on learning a \emph{history-dependent} steering strategy to handle the inherent model uncertainty about the agents' learning dynamics. We introduce a novel objective function to encode the desiderata of achieving a good steering outcome with reasonable cost. Theoretically, we identify conditions for the existence of steering strategies to guide agents to the desired policies. Complementing our theoretical contributions, we provide empirical algorithms to approximately solve our objective, which effectively tackles the challenge in learning history-dependent strategies. We demonstrate the efficacy of our algorithms through empirical evaluations.

Updated: 2024-10-07 16:25:34

标题: 学习在模型不确定性下引导马尔可夫智能体

摘要: 设计激励机制以引导适应性群体是经济应用及其他领域中普遍存在的问题。在这项工作中，我们研究如何设计额外的奖励来引导多智能体系统朝着期望的政策方向发展，而不需要事先了解智能体的学习动态。受现有工作的限制启发，我们考虑了一种新的、普遍的学习动态类别，称为“马尔科夫智能体”。我们为我们的引导问题引入了基于模型的非情节式强化学习（RL）公式。重要的是，我们专注于学习一种“历史依赖”的引导策略，以处理关于智能体学习动态的固有模型不确定性。我们引入了一个新颖的目标函数来编码实现良好引导结果的期望以及合理成本。在理论上，我们确定了引导策略存在的条件，以引导智能体朝着期望的政策方向发展。作为对我们理论贡献的补充，我们提供了用于近似解决我们目标的经验算法，有效地解决了学习历史依赖策略的挑战。我们通过经验评估展示了我们算法的有效性。

更新时间: 2024-10-07 16:25:34

领域: cs.LG,cs.AI,cs.MA,stat.ML

下载: http://arxiv.org/abs/2407.10207v2

Better Instruction-Following Through Minimum Bayes Risk

General-purpose LLM judges capable of human-level evaluation provide not only a scalable and accurate way of evaluating instruction-following LLMs but also new avenues for supervising and improving their performance. One promising way of leveraging LLM judges for supervision is through Minimum Bayes Risk (MBR) decoding, which uses a reference-based evaluator to select a high-quality output from amongst a set of candidate outputs. In the first part of this work, we explore using MBR decoding as a method for improving the test-time performance of instruction-following LLMs. We find that MBR decoding with reference-based LLM judges substantially improves over greedy decoding, best-of-N decoding with reference-free judges and MBR decoding with lexical and embedding-based metrics on AlpacaEval and MT-Bench. These gains are consistent across LLMs with up to 70B parameters, demonstrating that smaller LLM judges can be used to supervise much larger LLMs. Then, seeking to retain the improvements from MBR decoding while mitigating additional test-time costs, we explore iterative self-training on MBR-decoded outputs. We find that self-training using Direct Preference Optimisation leads to significant performance gains, such that the self-trained models with greedy decoding generally match and sometimes exceed the performance of their base models with MBR decoding.

Updated: 2024-10-07 16:25:04

标题: 更好的指令遵循通过最小贝叶斯风险

摘要: 通用型LLM法官能够进行人类水平评估，不仅为评估遵循指令的LLM提供了一种可伸缩和准确的方式，还为监督和改进它们的表现开辟了新途径。利用LLM法官进行监督的一种有前途的方法是通过最小贝叶斯风险（MBR）解码，该方法使用基于参考的评估器从一组候选输出中选择高质量的输出。在这项工作的第一部分中，我们探讨了使用MBR解码作为改进遵循指令LLM测试时性能的方法。我们发现，采用基于参考的LLM法官的MBR解码远远优于贪婪解码、基于无参考法官的最佳N解码以及基于词汇和嵌入度量的MBR解码在AlpacaEval和MT-Bench上。这些收益在拥有70B参数的LLM中保持一致，表明较小的LLM法官可以用于监督规模更大的LLM。然后，为了保留MBR解码带来的改进同时减少额外的测试时间成本，我们探讨了在MBR解码输出上进行迭代自我训练。我们发现，使用直接偏好优化进行自我训练可以带来显著的性能提升，使得使用贪婪解码进行自我训练的模型通常与其使用MBR解码的基础模型的性能相匹配，有时甚至超过。

更新时间: 2024-10-07 16:25:04

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.02902v2

Deep Fusion: Capturing Dependencies in Contrastive Learning via Transformer Projection Heads

Contrastive Learning (CL) has emerged as a powerful method for training feature extraction models using unlabeled data. Recent studies suggest that incorporating a linear projection head post-backbone significantly enhances model performance. In this work, we investigate the use of a transformer model as a projection head within the CL framework, aiming to exploit the transformer's capacity for capturing long-range dependencies across embeddings to further improve performance. Our key contributions are fourfold: First, we introduce a novel application of transformers in the projection head role for contrastive learning, marking the first endeavor of its kind. Second, our experiments reveal a compelling "Deep Fusion" phenomenon where the attention mechanism progressively captures the correct relational dependencies among samples from the same class in deeper layers. Third, we provide a theoretical framework that explains and supports this "Deep Fusion" behavior. Finally, we demonstrate through experimental results that our model achieves superior performance compared to the existing approach of using a feed-forward layer.

Updated: 2024-10-07 16:25:02

标题: 深度融合：通过Transformer投影头捕捉对比学习中的依赖关系

摘要: 对比学习（CL）已经成为使用无标记数据训练特征提取模型的强大方法。最近的研究表明，将线性投影头后端显着地增强了模型性能。在这项工作中，我们研究了在CL框架内使用变压器模型作为投影头的方法，旨在利用变压器捕捉嵌入之间的长距离依赖关系的能力来进一步提高性能。我们的主要贡献有四点：首先，我们引入了变压器在对比学习中投影头角色的新应用，标志着这种尝试的首次。其次，我们的实验揭示了一个引人注目的“深度融合”现象，在更深层次上，注意机制逐渐捕捉到同一类别样本之间的正确关系依赖。第三，我们提供了一个解释和支持这种“深度融合”行为的理论框架。最后，通过实验结果，我们证明我们的模型相对于使用前馈层的现有方法实现了更优越的性能。

更新时间: 2024-10-07 16:25:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.18681v2

Presto! Distilling Steps and Layers for Accelerating Music Generation

Despite advances in diffusion-based text-to-music (TTM) methods, efficient, high-quality generation remains a challenge. We introduce Presto!, an approach to inference acceleration for score-based diffusion transformers via reducing both sampling steps and cost per step. To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM-family of diffusion models, the first GAN-based distillation method for TTM. To reduce the cost per step, we develop a simple, but powerful improvement to a recent layer distillation method that improves learning via better preserving hidden state variance. Finally, we combine our step and layer distillation methods together for a dual-faceted approach. We evaluate our step and layer distillation methods independently and show each yield best-in-class performance. Our combined distillation method can generate high-quality outputs with improved diversity, accelerating our base model by 10-18x (230/435ms latency for 32 second mono/stereo 44.1kHz, 15x faster than comparable SOTA) -- the fastest high-quality TTM to our knowledge. Sound examples can be found at https://presto-music.github.io/web/.

Updated: 2024-10-07 16:24:18

标题: 神奇！加速音乐生成的步骤和层次

摘要: 尽管扩散文本到音乐（TTM）方法有所进展，但高效、高质量的生成仍然是一个挑战。我们介绍了Presto！一种通过减少采样步骤和每步成本来加速得分基扩散变压器的推理方法。为了减少步骤，我们开发了一种新的基于得分的分布匹配蒸馏（DMD）方法，用于EDM家族的扩散模型，这是TTM的第一种基于GAN的蒸馏方法。为了降低每步成本，我们对最近的层蒸馏方法进行了简单但有效的改进，通过更好地保留隐藏状态方差来改善学习。最后，我们将我们的步骤和层蒸馏方法结合在一起，形成一个双重方面的方法。我们独立评估了我们的步骤和层蒸馏方法，并显示每个都具有最佳性能。我们的组合蒸馏方法可以生成高质量的输出，提高了多样性，将我们的基础模型加速了10-18倍（32秒单声道/立体声44.1kHz的延迟为230/435ms，比类似最先进技术快15倍）- -据我们所知，这是最快的高质量TTM。可以在https://presto-music.github.io/web/找到声音示例。

更新时间: 2024-10-07 16:24:18

领域: cs.SD,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2410.05167v1

Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding

To combat the memory bandwidth-bound nature of autoregressive LLM inference, previous research has proposed the speculative decoding frame-work. To perform speculative decoding, a small draft model proposes candidate continuations of the input sequence that are then verified in parallel by the base model. One way to specify the draft model, as used in the recent Medusa decoding framework, is as a collection of lightweight heads, called draft heads, that operate on the base model's hidden states. To date, all existing draft heads have been sequentially independent, meaning that they speculate tokens in the candidate continuation independently of any preceding tokens in the candidate continuation. In this work, we propose Hydra heads: a sequentially-dependent drop-in replacement for standard draft heads that significantly improves the accuracy of draft head speculation. We further explore the design space of Hydra head training objectives and architectures, and propose a carefully tuned Hydra head recipe, which we call Hydra++, that improves decoding throughput by up to 1.31x and 2.70x compared to Medusa decoding and autoregressive de-coding respectively. Overall, Hydra heads are a simple and well-motivated intervention on standard draft heads that significantly improve the end-to-end speed of draft head-based speculative decoding. We make our code publicly available at https://github.com/zankner/Hydra.

Updated: 2024-10-07 16:21:29

标题: Hydra: 为Medusa解码提供序贯依赖的草稿头

摘要: 为了解决自回归LLM推断的内存带宽约束问题，先前的研究提出了猜测解码框架。为了进行猜测解码，一个小型草案模型提出了候选延续输入序列的方案，然后由基本模型并行验证。作为最近Medusa解码框架中使用的一种方式来指定草案模型，是作为一组轻量级头部的草案头部，这些头部在基本模型的隐藏状态上操作。到目前为止，所有现有的草案头部都是顺序独立的，这意味着它们在候选延续中独立于候选延续中的任何前导标记进行推测。在这项工作中，我们提出Hydra头部：一种顺序依赖的标准草案头部的可替换部分，显著提高了草案头部推测的准确性。我们进一步探讨了Hydra头部训练目标和架构的设计空间，并提出了一个经过精心调整的Hydra头部配方，我们称之为Hydra++，相比于Medusa解码和自回归解码，将解码吞吐量提高了1.31倍和2.70倍。总的来说，Hydra头部是对标准草案头部的简单且有动机的干预，显著提高了基于草案头部的猜测解码的端到端速度。我们将我们的代码公开发布在https://github.com/zankner/Hydra。

更新时间: 2024-10-07 16:21:29

领域: cs.LG

下载: http://arxiv.org/abs/2402.05109v2

The SkipSponge Attack: Sponge Weight Poisoning of Deep Neural Networks

Sponge attacks aim to increase the energy consumption and computation time of neural networks. In this work, we present a novel sponge attack called SkipSponge. SkipSponge is the first sponge attack that is performed directly on the parameters of a pre-trained model using only a few data samples. Our experiments show that SkipSponge can successfully increase the energy consumption of image classification models, GANs, and autoencoders requiring fewer samples than the state-of-the-art (Sponge Poisoning). We show that poisoning defenses are ineffective if not adjusted specifically for the defense against SkipSponge (i.e., they decrease target layer bias values). Our work shows that SkipSponge is more effective on the GANs and the autoencoders than Sponge Poisoning. Additionally, SkipSponge is stealthier than Sponge Poisoning as it does not require significant changes in the victim model's weights. Our experiments indicate that SkipSponge can be performed even when an attacker has access to only 1% of the entire dataset and reaches up to 13% energy increase.

Updated: 2024-10-07 16:19:17

标题: 跳跃海绵攻击：深度神经网络的海绵权重中毒

摘要: Sponge攻击旨在增加神经网络的能量消耗和计算时间。在这项工作中，我们提出了一种名为SkipSponge的新型Sponge攻击。SkipSponge是第一个直接在预训练模型的参数上执行的Sponge攻击，仅使用少量数据样本。我们的实验表明，SkipSponge可以成功增加图像分类模型、GANs和需要更少样本的自编码器的能量消耗，而无需使用最先进的方法（Sponge中毒）。我们表明，如果防御措施未经过专门调整以抵御SkipSponge（即，它们会降低目标层的偏置值），则中毒防御是无效的。我们的工作表明，SkipSponge在GANs和自编码器上比Sponge中毒更有效。此外，SkipSponge比Sponge中毒更隐蔽，因为它不需要显著改变受害模型的权重。我们的实验表明，即使攻击者只能访问整个数据集的1%，也可以执行SkipSponge并实现高达13%的能量增加。

更新时间: 2024-10-07 16:19:17

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2402.06357v4

A Predictive and Optimization Approach for Enhanced Urban Mobility Using Spatiotemporal Data

In modern urban centers, effective transportation management poses a significant challenge, with traffic jams and inconsistent travel durations greatly affecting commuters and logistics operations. This study introduces a novel method for enhancing urban mobility by combining machine learning algorithms with live traffic information. We developed predictive models for journey time and congestion analysis using data from New York City's yellow taxi trips. The research employed a spatiotemporal analysis framework to identify traffic trends and implemented real-time route optimization using the GraphHopper API. This system determines the most efficient paths based on current conditions, adapting to changes in traffic flow. The methodology utilizes Spark MLlib for predictive modeling and Spark Streaming for processing data in real-time. By integrating historical data analysis with current traffic inputs, our system shows notable enhancements in both travel time forecasts and route optimization, demonstrating its potential for widespread application in major urban areas. This research contributes to ongoing efforts aimed at reducing urban congestion and improving transportation efficiency through advanced data-driven methods.

Updated: 2024-10-07 16:16:49

标题: 使用时空数据进行增强城市移动性的预测和优化方法

摘要: 在现代城市中心，有效的交通管理是一个重大挑战，交通拥堵和旅行时间不一致严重影响通勤者和物流运营。本研究引入了一种新颖的方法，通过将机器学习算法与实时交通信息相结合，来增强城市的交通流动性。我们利用纽约市黄色出租车行程数据开发了预测模型，用于旅行时间和拥堵分析。研究采用了时空分析框架来识别交通趋势，并利用GraphHopper API实现了实时路线优化。该系统根据当前条件确定最有效的路径，以适应交通流量的变化。该方法利用Spark MLlib进行预测建模和Spark Streaming进行实时数据处理。通过将历史数据分析与当前交通输入整合，我们的系统在旅行时间预测和路线优化方面表现出显著的改进，展示了其在主要城市地区广泛应用的潜力。这项研究为通过先进的数据驱动方法减少城市拥堵和提高交通效率的持续努力做出了贡献。

更新时间: 2024-10-07 16:16:49

领域: cs.LG

下载: http://arxiv.org/abs/2410.05358v1

Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation

Multimodal learning seeks to utilize data from multiple sources to improve the overall performance of downstream tasks. It is desirable for redundancies in the data to make multimodal systems robust to missing or corrupted observations in some correlated modalities. However, we observe that the performance of several existing multimodal networks significantly deteriorates if one or multiple modalities are absent at test time. To enable robustness to missing modalities, we propose a simple and parameter-efficient adaptation procedure for pretrained multimodal networks. In particular, we exploit modulation of intermediate features to compensate for the missing modalities. We demonstrate that such adaptation can partially bridge performance drop due to missing modalities and outperform independent, dedicated networks trained for the available modality combinations in some cases. The proposed adaptation requires extremely small number of parameters (e.g., fewer than 1% of the total parameters) and applicable to a wide range of modality combinations and tasks. We conduct a series of experiments to highlight the missing modality robustness of our proposed method on five different multimodal tasks across seven datasets. Our proposed method demonstrates versatility across various tasks and datasets, and outperforms existing methods for robust multimodal learning with missing modalities.

Updated: 2024-10-07 16:15:36

标题: 具有缺失模态的鲁棒多模式学习通过参数高效适应

摘要: 多模态学习旨在利用来自多个来源的数据来提高下游任务的整体性能。数据中的冗余性有助于使多模态系统对某些相关模态中的缺失或损坏观测具有鲁棒性。然而，我们观察到，如果在测试时某个或多个模态缺失，几种现有的多模态网络的性能会显著下降。为了实现对缺失模态的鲁棒性，我们提出了一种简单且参数有效的适应过程，用于预训练的多模态网络。具体而言，我们利用中间特征的调制来补偿缺失的模态。我们展示了这种适应方法可以在某些情况下部分弥补由于缺失模态而导致的性能下降，并且在某些情况下胜过独立的、专门为可用模态组合训练的网络。所提出的适应方法仅需要极少量的参数（例如，总参数的不到1%），适用于各种模态组合和任务。我们进行了一系列实验，以突出我们提出的方法在七个数据集上的五个不同多模态任务中对缺失模态的鲁棒性。我们的提出的方法展示了在各种任务和数据集上的多功能性，并且在具有缺失模态的鲁棒多模态学习方面胜过现有方法。

更新时间: 2024-10-07 16:15:36

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2310.03986v6

Learning-Based Shielding for Safe Autonomy under Unknown Dynamics

Shielding is a common method used to guarantee the safety of a system under a black-box controller, such as a neural network controller from deep reinforcement learning (DRL), with simpler, verified controllers. Existing shielding methods rely on formal verification through Markov Decision Processes (MDPs), assuming either known or finite-state models, which limits their applicability to DRL settings with unknown, continuous-state systems. This paper addresses these limitations by proposing a data-driven shielding methodology that guarantees safety for unknown systems under black-box controllers. The approach leverages Deep Kernel Learning to model the systems' one-step evolution with uncertainty quantification and constructs a finite-state abstraction as an Interval MDP (IMDP). By focusing on safety properties expressed in safe linear temporal logic (safe LTL), we develop an algorithm that computes the maximally permissive set of safe policies on the IMDP, ensuring avoidance of unsafe states. The algorithms soundness and computational complexity are demonstrated through theoretical proofs and experiments on nonlinear systems, including a high-dimensional autonomous spacecraft scenario.

Updated: 2024-10-07 16:10:15

标题: 基于学习的屏蔽技术用于未知动力学下的安全自主行驶

摘要: 屏蔽是一种常用的方法，用于保证系统在黑盒控制器（如来自深度强化学习（DRL）的神经网络控制器）下的安全性，使用更简单、经过验证的控制器。现有的屏蔽方法依赖于通过马尔可夫决策过程（MDP）进行形式验证，假设已知或有限状态模型，这限制了它们在具有未知、连续状态系统的DRL环境中的适用性。本文通过提出一种数据驱动的屏蔽方法论来解决这些限制，该方法保证了在黑盒控制器下未知系统的安全性。该方法利用深度核学习来建模系统的一步演化，带有不确定性量化，并构建一个有限状态抽象作为区间MDP（IMDP）。通过专注于安全线性时间逻辑（安全LTL）中表达的安全属性，我们开发了一种算法，计算IMDP上安全策略的最大允许集，确保避免不安全状态。该算法的正确性和计算复杂性通过理论证明和在非线性系统上的实验进行了证明，包括一个高维自主航天器场景。

更新时间: 2024-10-07 16:10:15

领域: eess.SY,cs.LG,cs.SY

下载: http://arxiv.org/abs/2410.07359v1

Preventing Collapse in Contrastive Learning with Orthonormal Prototypes (CLOP)

Contrastive learning has emerged as a powerful method in deep learning, excelling at learning effective representations through contrasting samples from different distributions. However, neural collapse, where embeddings converge into a lower-dimensional space, poses a significant challenge, especially in semi-supervised and self-supervised setups. In this paper, we first theoretically analyze the effect of large learning rates on contrastive losses that solely rely on the cosine similarity metric, and derive a theoretical bound to mitigate this collapse. {Building on these insights, we propose CLOP, a novel semi-supervised loss function designed to prevent neural collapse by promoting the formation of orthogonal linear subspaces among class embeddings.} Unlike prior approaches that enforce a simplex ETF structure, CLOP focuses on subspace separation, leading to more distinguishable embeddings. Through extensive experiments on real and synthetic datasets, we demonstrate that CLOP enhances performance, providing greater stability across different learning rates and batch sizes.

Updated: 2024-10-07 16:07:23

标题: 使用正交原型（CLOP）预防对比学习中的崩溃

摘要: 对比学习已经成为深度学习中一种强大的方法，通过对比来自不同分布的样本，学习有效的表示。然而，神经坍塌，即嵌入收敛到一个较低维空间，构成一个显著挑战，特别是在半监督和自监督设置中。在本文中，我们首先从理论上分析了大学习率对仅依赖余弦相似度度量的对比损失的影响，并提出了一个理论边界来减轻这种坍塌。在这些见解的基础上，我们提出了CLOP，一种新颖的半监督损失函数，旨在通过促进类别嵌入之间的正交线性子空间的形成来防止神经坍缩。与以前强制简单结构的方法不同，CLOP专注于子空间分离，导致更可区分的嵌入。通过在真实和合成数据集上进行大量实验，我们证明CLOP提高了性能，在不同学习率和批量大小之间提供更大的稳定性。

更新时间: 2024-10-07 16:07:23

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.18699v2

NoSENSE: Learned unrolled cardiac MRI reconstruction without explicit sensitivity maps

We present a novel learned image reconstruction method for accelerated cardiac MRI with multiple receiver coils based on deep convolutional neural networks (CNNs) and algorithm unrolling. In contrast to many existing learned MR image reconstruction techniques that necessitate coil-sensitivity map (CSM) estimation as a distinct network component, our proposed approach avoids explicit CSM estimation. Instead, it implicitly captures and learns to exploit the inter-coil relationships of the images. Our method consists of a series of novel learned image and k-space blocks with shared latent information and adaptation to the acquisition parameters by feature-wise modulation (FiLM), as well as coil-wise data-consistency (DC) blocks. Our method achieved PSNR values of 34.89 and 35.56 and SSIM values of 0.920 and 0.942 in the cine track and mapping track validation leaderboard of the MICCAI STACOM CMRxRecon Challenge, respectively, ranking 4th among different teams at the time of writing. Code will be made available at https://github.com/fzimmermann89/CMRxRecon

Updated: 2024-10-07 16:05:53

标题: NoSENSE：学习解卷积的心脏MRI重建，无需明确的灵敏度映射

摘要: 我们提出了一种基于深度卷积神经网络（CNNs）和算法展开的加速心脏MRI多接收线圈学习图像重建方法。与许多现有的学习MR图像重建技术不同，这些技术需要作为独立网络组件的线圈灵敏度图（CSM）估计，我们提出的方法避免了明确的CSM估计。相反，它隐式捕获并学习利用图像之间的线圈关系。我们的方法由一系列新颖的学习图像和k空间块组成，具有共享的潜在信息，并通过特征调制（FiLM）和线圈数据一致性（DC）块对采集参数进行调整。我们的方法在MICCAI STACOM CMRxRecon挑战的cine轨道和mapping轨道验证榜单中分别获得了34.89和35.56的PSNR值，以及0.920和0.942的SSIM值，在撰写本文时在不同团队中排名第四。代码将在https://github.com/fzimmermann89/CMRxRecon 上提供。

更新时间: 2024-10-07 16:05:53

领域: eess.IV,cs.CV,cs.LG,physics.med-ph

下载: http://arxiv.org/abs/2309.15608v2

A Recipe For Building a Compliant Real Estate Chatbot

In recent years, there has been significant effort to align large language models with human preferences. This work focuses on developing a chatbot specialized in the real estate domain, with an emphasis on incorporating compliant behavior to ensure it can be used without perpetuating discriminatory practices like steering and redlining, which have historically plagued the real estate industry in the United States. Building on prior work, we present a method for generating a synthetic general instruction-following dataset, along with safety data. Through extensive evaluations and benchmarks, we fine-tuned a llama-3-8B-instruct model and demonstrated that we can enhance it's performance significantly to match huge closed-source models like GPT-4o while making it safer and more compliant. We open-source the model, data and code to support further development and research in the community.

Updated: 2024-10-07 16:03:47

标题: 一个建立符合法规的房地产聊天机器人的配方

摘要: 近年来，人们已经付出了大量努力，以使大型语言模型与人类偏好相一致。这项工作专注于开发一个在房地产领域专门的聊天机器人，重点是融入符合性行为，以确保它可以在不延续美国房地产行业历史上困扰的指导和红线等歧视做法的情况下使用。在之前的研究基础上，我们提出了一种生成合成通用指令遵循数据集的方法，以及安全数据。通过广泛的评估和基准测试，我们对lama-3-8B-instruct模型进行了微调，并展示了我们可以显着提升其性能，以匹配像GPT-4o这样的巨大闭源模型，同时使其更安全和符合规定。我们开源模型、数据和代码，以支持社区中进一步的发展和研究。

更新时间: 2024-10-07 16:03:47

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.10860v1

Representation noising effectively prevents harmful fine-tuning on LLMs

Releasing open-source large language models (LLMs) presents a dual-use risk since bad actors can easily fine-tune these models for harmful purposes. Even without the open release of weights, weight stealing and fine-tuning APIs make closed models vulnerable to harmful fine-tuning attacks (HFAs). While safety measures like preventing jailbreaks and improving safety guardrails are important, such measures can easily be reversed through fine-tuning. In this work, we propose Representation Noising (RepNoise), a defence mechanism that is effective even when attackers have access to the weights. RepNoise works by removing information about harmful representations such that it is difficult to recover them during fine-tuning. Importantly, our defence is also able to generalize across different subsets of harm that have not been seen during the defence process as long as they are drawn from the same distribution of the attack set. Our method does not degrade the general capability of LLMs and retains the ability to train the model on harmless tasks. We provide empirical evidence that the effectiveness of our defence lies in its "depth": the degree to which information about harmful representations is removed across all layers of the LLM.

Updated: 2024-10-07 16:01:49

标题: 表征噪声有效地防止在LLMs上进行有害的微调

摘要: 发布开源大型语言模型（LLMs）存在双重使用风险，因为恶意行为者可以轻松地调整这些模型以进行有害用途。即使没有公开释放权重，权重窃取和微调API也使封闭模型容易受到有害微调攻击（HFAs）的影响。虽然防范越狱和改进安全防护措施等安全措施很重要，但这些措施很容易通过微调来逆转。在这项工作中，我们提出了一种名为Representation Noising（RepNoise）的防御机制，即使攻击者可以访问权重，也能起到有效作用。RepNoise的工作原理是消除有关有害表示的信息，使在微调过程中难以恢复这些信息。重要的是，我们的防御还能够在未在防御过程中看到的不同有害子集之间进行泛化，只要它们来自攻击集的相同分布。我们的方法不会降低LLMs的一般能力，并保留训练模型进行无害任务的能力。我们提供实证证据表明，我们的防御的有效性在于其“深度”：在LLMs的所有层中消除有关有害表示的信息程度。

更新时间: 2024-10-07 16:01:49

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.14577v2

Differentiable and Learnable Wireless Simulation with Geometric Transformers

Modelling the propagation of electromagnetic wireless signals is critical for designing modern communication systems. Wireless ray tracing simulators model signal propagation based on the 3D geometry and other scene parameters, but their accuracy is fundamentally limited by underlying modelling assumptions and correctness of parameters. In this work, we introduce Wi-GATr, a fully-learnable neural simulation surrogate designed to predict the channel observations based on scene primitives (e.g., surface mesh, antenna position and orientation). Recognizing the inherently geometric nature of these primitives, Wi-GATr leverages an equivariant Geometric Algebra Transformer that operates on a tokenizer specifically tailored for wireless simulation. We evaluate our approach on a range of tasks (i.e., signal strength and delay spread prediction, receiver localization, and geometry reconstruction) and find that Wi-GATr is accurate, fast, sample-efficient, and robust to symmetry-induced transformations. Remarkably, we find our results also translate well to the real world: Wi-GATr demonstrates more than 35% lower error than hybrid techniques, and 70% lower error than a calibrated wireless tracer.

Updated: 2024-10-07 15:59:03

标题: 具有几何变换器的可微分和可学习的无线仿真

摘要: 模拟电磁无线信号传播对于设计现代通信系统至关重要。无线射线跟踪模拟器基于3D几何和其他场景参数模拟信号传播，但其准确性在根本上受到基础建模假设和参数正确性的限制。在这项工作中，我们介绍了Wi-GATr，一个完全可学习的神经仿真替代品，旨在根据场景基元（例如，表面网格、天线位置和方向）预测信道观测。认识到这些基元固有的几何性质，Wi-GATr利用了一个操作在专门为无线仿真定制的标记器上的等变几何代数变换器。我们在一系列任务（例如，信号强度和延迟传播预测、接收器定位和几何重建）上评估了我们的方法，发现Wi-GATr准确、快速、样本高效，并对对称诱导变换具有鲁棒性。值得注意的是，我们发现我们的结果在现实世界中也表现良好：Wi-GATr的错误率比混合技术低35%以上，比校准无线追踪器低70%。

更新时间: 2024-10-07 15:59:03

领域: cs.LG,cs.NI,eess.SP,stat.ML

下载: http://arxiv.org/abs/2406.14995v2

CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation

Models for streaming speech translation (ST) can achieve high accuracy and low latency if they're developed with vast amounts of paired audio in the source language and written text in the target language. Yet, these text labels for the target language are often pseudo labels due to the prohibitive cost of manual ST data labeling. In this paper, we introduce a methodology named Connectionist Temporal Classification guided modality matching (CTC-GMM) that enhances the streaming ST model by leveraging extensive machine translation (MT) text data. This technique employs CTC to compress the speech sequence into a compact embedding sequence that matches the corresponding text sequence, allowing us to utilize matched {source-target} language text pairs from the MT corpora to refine the streaming ST model further. Our evaluations with FLEURS and CoVoST2 show that the CTC-GMM approach can increase translation accuracy relatively by 13.9% and 6.4% respectively, while also boosting decoding speed by 59.7% on GPU.

Updated: 2024-10-07 15:58:03

标题: CTC-GMM：CTC引导的模态匹配用于快速准确的流式语音翻译

摘要: 流式语音翻译（ST）模型可以在使用大量配对音频和目标语言书面文本开发时实现高准确性和低延迟。然而，由于手动ST数据标注成本过高，这些目标语言的文本标签通常是伪标签。在本文中，我们介绍了一种名为连接主义时间分类引导模态匹配（CTC-GMM）的方法，通过利用大量的机器翻译（MT）文本数据来增强流式ST模型。这种技术利用CTC将语音序列压缩成与相应文本序列匹配的紧凑嵌入序列，使我们能够利用来自MT语料库的匹配{源-目标}语言文本对进一步改进流式ST模型。我们在FLEURS和CoVoST2上的评估结果显示，CTC-GMM方法可以分别相对提高13.9%和6.4%的翻译准确性，同时在GPU上提高59.7%的解码速度。

更新时间: 2024-10-07 15:58:03

领域: cs.CL,cs.AI,eess.AS

下载: http://arxiv.org/abs/2410.05146v1

Efficient Gradient Estimation of Variational Quantum Circuits with Lie Algebraic Symmetries

Hybrid quantum-classical optimization and learning strategies are among the most promising approaches to harnessing quantum information or gaining a quantum advantage over classical methods. However, efficient estimation of the gradient of the objective function in such models remains a challenge due to several factors including the exponential dimensionality of the Hilbert spaces, and information loss of quantum measurements. In this work, we developed an efficient framework that makes the Hadamard test efficiently applicable to gradient estimation for a broad range of quantum systems, an advance that had been wanting from the outset. Under certain mild structural assumptions, the gradient is estimated with the measurement shots that scale logarithmically with the number of parameters and with polynomial classical and quantum time. This is an exponential reduction in the measurement cost and polynomial speed up in time compared to existing works. The structural assumptions are (1) the dimension of the dynamical Lie algebra is polynomial in the number of qubits, and (2) the observable has a bounded Hilbert-Schmidt norm.

Updated: 2024-10-07 15:57:38

标题: 利用李代数对称性高效估计变分量子电路的梯度

摘要: 混合量子经典优化和学习策略是利用量子信息或在经典方法上获得量子优势的最有前景的方法之一。然而，在这些模型中，目标函数的梯度的有效估计仍然是一个挑战，原因包括希尔伯特空间的指数维度和量子测量的信息丢失。在这项工作中，我们开发了一个高效的框架，使Hadamard测试能够有效地应用于广泛范围的量子系统的梯度估计，这是一项一直以来都在期待的进展。在一定的结构假设下，梯度的估计与参数数量呈对数比例，并且具有多项式经典和量子时间。与现有作品相比，这是测量成本的指数级降低以及时间的多项式加速。结构假设是：（1）动态李代数的维度与量子比特数多项式相关，（2）可观察量具有有界的希尔伯特-施密特范数。

更新时间: 2024-10-07 15:57:38

领域: quant-ph,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2404.05108v2

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild

As Large Language Models (LLMs) excel across tasks and specialized domains, scaling LLMs based on existing models has garnered significant attention, which faces the challenge of decreasing performance when combining disparate models. Various techniques have been proposed for the aggregation of pre-trained LLMs, including model merging, Mixture-of-Experts, and stacking. Despite their merits, a comprehensive comparison and synergistic application of them to a diverse model zoo is yet to be adequately addressed. In light of this research gap, this paper introduces Model-GLUE, a holistic LLM scaling guideline. First, our work starts with a benchmarking of existing LLM scaling techniques, especially selective merging, and variants of mixture. Utilizing the insights from the benchmark results, we formulate an strategy for the selection and aggregation of a heterogeneous model zoo characterizing different architectures and initialization. Our methodology involves the clustering of mergeable models and optimal merging strategy selection, and the integration of clusters through a model mixture. Finally, evidenced by our experiments on a diverse Llama-2-based model zoo, Model-GLUE shows an average performance enhancement of 5.61%, achieved without additional training. Codes are available at: https://github.com/Model-GLUE/Model-GLUE.

Updated: 2024-10-07 15:55:55

标题: 模型-GLUE：野外大型模型动物园中的民主化LLM扩展

摘要: 随着大型语言模型（LLMs）在各种任务和专业领域的表现出色，基于现有模型的LLM扩展引起了广泛关注，但在组合不同模型时面临性能下降的挑战。已经提出了各种技术用于预训练LLMs的聚合，包括模型合并、专家混合和堆叠。尽管它们有优点，但对它们进行全面比较和协同应用到多样化模型库中尚未得到充分解决。鉴于这一研究空白，本文介绍了Model-GLUE，一个全面的LLM扩展指南。首先，我们的工作从现有LLM扩展技术的基准测试开始，特别是选择性合并和混合的变体。利用基准测试结果的见解，我们制定了一种选择和聚合具有不同架构和初始化特征的异构模型库的策略。我们的方法涉及合并模型的聚类和最佳合并策略选择，以及通过模型混合集成聚类。最后，通过我们在多样化的基于Llama-2的模型库上的实验证明，Model-GLUE表现出平均性能提升5.61％，而无需额外训练。代码可在以下网址获取：https://github.com/Model-GLUE/Model-GLUE.

更新时间: 2024-10-07 15:55:55

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.05357v1

BSG4Bot: Efficient Bot Detection based on Biased Heterogeneous Subgraphs

The detection of malicious social bots has become a crucial task, as bots can be easily deployed and manipulated to spread disinformation, promote conspiracy messages, and more. Most existing approaches utilize graph neural networks (GNNs)to capture both user profle and structural features,achieving promising progress. However, they still face limitations including the expensive training on large underlying graph, the performance degration when similar neighborhood patterns' assumption preferred by GNNs is not satisfied, and the dynamic features of bots in a highly adversarial context. Motivated by these limitations, this paper proposes a method named BSG4Bot with an intuition that GNNs training on Biased SubGraphs can improve both performance and time/space efficiency in bot detection. Specifically, BSG4Bot first pre-trains a classifier on node features efficiently to define the node similarities, and constructs biased subgraphs by combining the similarities computed by the pre-trained classifier and the node importances computed by Personalized PageRank (PPR scores). BSG4Bot then introduces a heterogeneous GNN over the constructed subgraphs to detect bots effectively and efficiently. The relatively stable features, including the content category and temporal activity features, are explored and incorporated into BSG4Bot after preliminary verification on sample data. The extensive experimental studies show that BSG4Bot outperforms the state-of-the-art bot detection methods, while only needing nearly 1/5 training time.

Updated: 2024-10-07 15:52:51

标题: BSG4Bot：基于偏倚异质子图的高效机器人检测

摘要: 检测恶意社交机器人已经成为一项至关重要的任务，因为机器人可以轻松部署和操纵，传播虚假信息，推广阴谋消息等。大多数现有方法利用图神经网络（GNNs）来捕捉用户个人资料和结构特征，取得了令人期待的进展。然而，它们仍然面临一些限制，包括在庞大的基础图上进行昂贵的训练、当GNNs偏爱相似邻域模式假设未满足时性能下降，以及在高度敌对环境中机器人的动态特性。受到这些限制的启发，本文提出了一种名为BSG4Bot的方法，其直觉是在偏见子图上训练GNNs可以提高机器人检测的性能和时间/空间效率。具体而言，BSG4Bot首先高效地在节点特征上预训练分类器以定义节点相似性，并通过将预训练分类器计算的相似性与Personalized PageRank（PPR分数）计算的节点重要性结合，构建偏见子图。BSG4Bot然后在构建的子图上引入异构GNN以有效而高效地检测机器人。在对样本数据进行初步验证后，还探索了相对稳定的特征，包括内容类别和时间活动特征，并将其纳入BSG4Bot。广泛的实验研究表明，BSG4Bot优于最先进的机器人检测方法，而且只需要近1/5的训练时间。

更新时间: 2024-10-07 15:52:51

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.05356v1

Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together

Natural Language Processing (NLP) systems are increasingly taking the form of sophisticated modular pipelines, e.g., Retrieval Augmented Generation (RAG), where each module may involve a distinct Language Model (LM) and an associated prompt template. These compound systems often lack intermediate labels or gradient flow to optimize each module, making their end-to-end optimization challenging. Here we seek strategies to optimize both the module-level LM weights and the associated prompt templates of such systems to maximize a downstream task metric. We propose for the first time combining the weight and prompt optimization strategies to optimize a modular LM pipeline by alternating between the two to get the same LM to teach itself. In experiments with multi-hop QA, mathematical reasoning, and feature-based classification using mistral-7b, llama-2-7b, and llama-3-8b, these BetterTogether strategies optimizing the weights and prompts of a pipeline together outperform directly optimizing weights alone and prompts alone by up to 60% and 6%, respectively, on average across LMs and tasks. BetterTogether optimizer is released in DSPy at http://dspy.ai

Updated: 2024-10-07 15:52:48

标题: 微调和提示优化：共同发挥作用的两个重要步骤

摘要: 自然语言处理（NLP）系统越来越多地采取复杂的模块化流水线形式，例如检索增强生成（RAG），其中每个模块可能涉及不同的语言模型（LM）和相关的提示模板。这些复合系统通常缺乏中间标签或梯度流来优化每个模块，这使得它们的端到端优化具有挑战性。在这里，我们寻求优化这些系统的模块级LM权重和相关提示模板的策略，以最大化下游任务度量。我们首次提出将权重和提示优化策略相结合，通过在两者之间交替来优化模块化LM流水线，以使同一LM自我教导。在使用mistral-7b、llama-2-7b和llama-3-8b进行多跳QA、数学推理和基于特征的分类的实验中，这些BetterTogether策略将流水线的权重和提示一起优化，比单独优化权重和提示分别平均优于60％和6％。在各种LM和任务中。BetterTogether优化器已发布在http://dspy.ai的DSPy上。

更新时间: 2024-10-07 15:52:48

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.10930v2

Dr. Jekyll and Mr. Hyde: Two Faces of LLMs

Recently, we have witnessed a rise in the use of Large Language Models (LLMs), especially in applications like chatbots. Safety mechanisms are implemented to prevent improper responses from these chatbots. In this work, we bypass these measures for ChatGPT and Gemini by making them impersonate complex personas with personality characteristics that are not aligned with a truthful assistant. First, we create elaborate biographies of these personas, which we then use in a new session with the same chatbots. Our conversations then follow a role-play style to elicit prohibited responses. Using personas, we show that prohibited responses are provided, making it possible to obtain unauthorized, illegal, or harmful information in both ChatGPT and Gemini. We also introduce several ways of activating such adversarial personas, showing that both chatbots are vulnerable to this attack. With the same principle, we introduce two defenses that push the model to interpret trustworthy personalities and make it more robust against such attacks.

Updated: 2024-10-07 15:46:59

标题: Dr. Jekyll and Mr. Hyde: LLMs的两面

摘要: 最近，我们目睹了大型语言模型(LLMs)的使用增加，特别是在聊天机器人等应用中。安全机制被实施以防止这些聊天机器人给出不当回应。在这项工作中，我们绕过这些措施，通过让ChatGPT和Gemini模仿不符合真实助手特点的复杂人物角色来实现。首先，我们为这些人物角色创建了详尽的传记，然后将其用于与同一聊天机器人的新对话中。我们的对话随后采用角色扮演风格，以引发被禁止的回应。通过人物角色，我们展示了ChatGPT和Gemini提供了被禁止的回应，从而可能获取未经授权、非法或有害信息。我们还介绍了几种激活这种敌对人物角色的方法，表明这两个聊天机器人都容易受到这种攻击。根据相同原则，我们介绍了两种将模型引导解释为可信人格并使其更具抗攻击性的防御措施。

更新时间: 2024-10-07 15:46:59

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2312.03853v5

A Neural-Evolutionary Algorithm for Autonomous Transit Network Design

Planning a public transit network is a challenging optimization problem, but essential in order to realize the benefits of autonomous buses. We propose a novel algorithm for planning networks of routes for autonomous buses. We first train a graph neural net model as a policy for constructing route networks, and then use the policy as one of several mutation operators in a evolutionary algorithm. We evaluate this algorithm on a standard set of benchmarks for transit network design, and find that it outperforms the learned policy alone by up to 20% and a plain evolutionary algorithm approach by up to 53% on realistic benchmark instances.

Updated: 2024-10-07 15:45:38

标题: 一个用于自主交通网络设计的神经进化算法

摘要: 规划公共交通网络是一个具有挑战性的优化问题，但对于实现自动驾驶公交的好处至关重要。我们提出了一种新颖的算法，用于规划自动驾驶公交的路线网络。我们首先训练一个图神经网络模型作为构建路线网络的策略，然后将该策略作为进化算法中的多个变异操作符之一。我们在标准的公交网络设计基准测试集上评估了这种算法，并发现它在真实基准实例上的表现比仅使用学习策略高出高达20％，比使用简单进化算法方法高出高达53％。

更新时间: 2024-10-07 15:45:38

领域: cs.NE,cs.LG

下载: http://arxiv.org/abs/2403.07917v3

LOTOS: Layer-wise Orthogonalization for Training Robust Ensembles

Transferability of adversarial examples is a well-known property that endangers all classification models, even those that are only accessible through black-box queries. Prior work has shown that an ensemble of models is more resilient to transferability: the probability that an adversarial example is effective against most models of the ensemble is low. Thus, most ongoing research focuses on improving ensemble diversity. Another line of prior work has shown that Lipschitz continuity of the models can make models more robust since it limits how a model's output changes with small input perturbations. In this paper, we study the effect of Lipschitz continuity on transferability rates. We show that although a lower Lipschitz constant increases the robustness of a single model, it is not as beneficial in training robust ensembles as it increases the transferability rate of adversarial examples across models in the ensemble. Therefore, we introduce LOTOS, a new training paradigm for ensembles, which counteracts this adverse effect. It does so by promoting orthogonality among the top-$k$ sub-spaces of the transformations of the corresponding affine layers of any pair of models in the ensemble. We theoretically show that $k$ does not need to be large for convolutional layers, which makes the computational overhead negligible. Through various experiments, we show LOTOS increases the robust accuracy of ensembles of ResNet-18 models by $6$ percentage points (p.p) against black-box attacks on CIFAR-10. It is also capable of combining with the robustness of prior state-of-the-art methods for training robust ensembles to enhance their robust accuracy by $10.7$ p.p.

Updated: 2024-10-07 15:43:28

标题: LOTOS：逐层正交训练稳健集成

摘要: 对抗样本的可转移性是一个众所周知的特性，它危及所有分类模型，即使这些模型只能通过黑盒查询访问。先前的研究表明，模型集合对于可转移性更具韧性：对模型集合中大多数模型有效的对抗样本的概率较低。因此，大多数正在进行的研究都集中在改善模型集合的多样性上。另一条先前的研究线表明，模型的Lipschitz连续性可以使模型更加稳健，因为它限制了模型输出随着输入微扰的变化。本文研究了Lipschitz连续性对可转移性率的影响。我们发现，虽然较低的Lipschitz常数增加了单个模型的稳健性，但在训练稳健模型集合时并不像在增加对模型集合中对抗样本的可转移性率那样有益。因此，我们引入了LOTOS，一种新的模型集合训练范例，以抵消这种不利影响。它通过促进模型集合中任意一对模型的相关仿射层的转换的前k个子空间之间的正交性来实现这一点。我们理论上表明，对于卷积层来说，k不需要很大，这使得计算开销可以忽略不计。通过各种实验证明，LOTOS将ResNet-18模型集合的稳健准确性提高了6个百分点（p.p），对抗CIFAR-10的黑盒攻击。它还能够与先前的最先进方法的稳健性相结合，以提高它们的稳健准确性10.7个百分点。

更新时间: 2024-10-07 15:43:28

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.05136v1

Falcon Mamba: The First Competitive Attention-free 7B Language Model

In this technical report, we present Falcon Mamba 7B, a new base large language model based on the novel Mamba architecture. Falcon Mamba 7B is trained on 5.8 trillion tokens with carefully selected data mixtures. As a pure Mamba-based model, Falcon Mamba 7B surpasses leading open-weight models based on Transformers, such as Mistral 7B, Llama3.1 8B, and Falcon2 11B. It is on par with Gemma 7B and outperforms models with different architecture designs, such as RecurrentGemma 9B and RWKV-v6 Finch 7B/14B. Currently, Falcon Mamba 7B is the best-performing Mamba model in the literature at this scale, surpassing both existing Mamba and hybrid Mamba-Transformer models, according to the Open LLM Leaderboard. Due to its architecture, Falcon Mamba 7B is significantly faster at inference and requires substantially less memory for long sequence generation. Despite recent studies suggesting that hybrid Mamba-Transformer models outperform pure architecture designs, we demonstrate that even the pure Mamba design can achieve similar, or even superior results compared to the Transformer and hybrid designs. We make the weights of our implementation of Falcon Mamba 7B publicly available on https://huggingface.co/tiiuae/falcon-mamba-7b, under a permissive license.

Updated: 2024-10-07 15:40:45

标题: 猎鹰曼巴：第一个竞争性无注意力7B语言模型

摘要: 在这篇技术报告中，我们介绍了Falcon Mamba 7B，这是一种基于新颖的Mamba架构的新型大型语言模型。Falcon Mamba 7B基于精心选择的数据混合训练了5.8万亿个标记。作为纯粹基于Mamba的模型，Falcon Mamba 7B超越了基于Transformer的领先的开放权重模型，如Mistral 7B，Llama3.1 8B和Falcon2 11B。它与Gemma 7B相媲美，并且优于具有不同架构设计的模型，例如RecurrentGemma 9B和RWKV-v6 Finch 7B/14B。目前，Falcon Mamba 7B是文献中在此规模上表现最佳的Mamba模型，超越了现有的Mamba和混合Mamba-Transformer模型，根据Open LLM Leaderboard。由于其架构，Falcon Mamba 7B在推理时速度显著快，并且在生成长序列时需要更少的内存。尽管最近的研究表明混合Mamba-Transformer模型优于纯架构设计，我们证明即使是纯Mamba设计也可以获得类似或甚至更优越的结果，与Transformer和混合设计相比。我们公开了Falcon Mamba 7B实现的权重，可以在https://huggingface.co/tiiuae/falcon-mamba-7b上获取，采用宽松许可证。

更新时间: 2024-10-07 15:40:45

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.05355v1

A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale

We present ExaDigiT, an open-source framework for developing comprehensive digital twins of liquid-cooled supercomputers. It integrates three main modules: (1) a resource allocator and power simulator, (2) a transient thermo-fluidic cooling model, and (3) an augmented reality model of the supercomputer and central energy plant. The framework enables the study of "what-if" scenarios, system optimizations, and virtual prototyping of future systems. Using Frontier as a case study, we demonstrate the framework's capabilities by replaying six months of system telemetry for systematic verification and validation. Such a comprehensive analysis of a liquid-cooled exascale supercomputer is the first of its kind. ExaDigiT elucidates complex transient cooling system dynamics, runs synthetic or real workloads, and predicts energy losses due to rectification and voltage conversion. Throughout our paper, we present lessons learned to benefit HPC practitioners developing similar digital twins. We envision the digital twin will be a key enabler for sustainable, energy-efficient supercomputing.

Updated: 2024-10-07 15:36:50

标题: 一种数字孪生框架，用于在 Exascale 超级计算机上展示的液冷技术

摘要: 我们提出了ExaDigiT，一个用于开发液冷超级计算机全面数字孪生体的开源框架。它集成了三个主要模块：（1）资源分配器和功率模拟器，（2）瞬态热流体冷却模型，以及（3）超级计算机和中央能源厂的增强现实模型。该框架使得可以研究“假设”情景，系统优化，并对未来系统进行虚拟原型设计。通过以Frontier为案例研究，我们演示了该框架通过重放六个月的系统遥测数据进行系统验证和验证的能力。这种对液冷异型超级计算机的全面分析是首次。ExaDigiT阐明了复杂的瞬态冷却系统动态，运行合成或真实工作负载，并预测由于整流和电压转换而导致的能量损失。在我们的论文中，我们提出了为有类似数字孪生体开发的高性能计算从业者获益的经验教训。我们设想数字孪生体将成为可持续，节能的超级计算的关键推动者。

更新时间: 2024-10-07 15:36:50

领域: cs.DC,cs.LG

下载: http://arxiv.org/abs/2410.05133v1

Scalable and Accurate Graph Reasoning with LLM-based Multi-Agents

Recent research has explored the use of Large Language Models (LLMs) for tackling complex graph reasoning tasks. However, due to the intricacies of graph structures and the inherent limitations of LLMs in handling long text, current approaches often fail to deliver satisfactory accuracy, even on small-scale graphs and simple tasks. To address these challenges, we introduce GraphAgent-Reasoner, a fine-tuning-free framework that utilizes a multi-agent collaboration strategy for explicit and precise graph reasoning. Inspired by distributed graph computation theory, our framework decomposes graph problems into smaller, node-centric tasks that are distributed among multiple agents. The agents collaborate to solve the overall problem, significantly reducing the amount of information and complexity handled by a single LLM, thus enhancing the accuracy of graph reasoning. By simply increasing the number of agents, GraphAgent-Reasoner can efficiently scale to accommodate larger graphs with over 1,000 nodes. Evaluated on the GraphInstruct dataset, our framework demonstrates near-perfect accuracy on polynomial-time graph reasoning tasks, significantly outperforming the best available models, both closed-source and fine-tuned open-source variants. Our framework also demonstrates the capability to handle real-world graph reasoning applications such as webpage importance analysis.

Updated: 2024-10-07 15:34:14

标题: 基于LLM多智能体的可伸缩和准确的图推理

摘要: 最近的研究探索了使用大型语言模型（LLMs）来解决复杂图形推理任务的方法。然而，由于图形结构的复杂性和LLMs在处理长文本方面固有的限制，当前的方法通常无法在小规模图形和简单任务上提供令人满意的准确性。为了解决这些挑战，我们介绍了GraphAgent-Reasoner，这是一个无需微调的框架，利用多代理协作策略进行显式和精确的图形推理。受分布式图计算理论启发，我们的框架将图形问题分解为更小、以节点为中心的任务，这些任务分布在多个代理之间。代理协作解决整体问题，显著减少了单个LLM处理的信息量和复杂性，从而提高了图形推理的准确性。通过简单增加代理的数量，GraphAgent-Reasoner可以有效扩展以适应具有1000个以上节点的更大图形。在GraphInstruct数据集上评估，我们的框架在多项式时间图形推理任务上展现出几乎完美的准确性，明显优于最佳可用模型，无论是闭源还是微调的开源变体。我们的框架还展示了处理实际图形推理应用的能力，如网页重要性分析。

更新时间: 2024-10-07 15:34:14

领域: cs.AI

下载: http://arxiv.org/abs/2410.05130v1

Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization

Combinatorial Optimization is crucial to numerous real-world applications, yet still presents challenges due to its (NP-)hard nature. Amongst existing approaches, heuristics often offer the best trade-off between quality and scalability, making them suitable for industrial use. While Reinforcement Learning (RL) offers a flexible framework for designing heuristics, its adoption over handcrafted heuristics remains incomplete within industrial solvers. Existing learned methods still lack the ability to adapt to specific instances and fully leverage the available computational budget. The current best methods either rely on a collection of pre-trained policies, or on data-inefficient fine-tuning; hence failing to fully utilize newly available information within the constraints of the budget. In response, we present MEMENTO, an approach that leverages memory to improve the adaptation of neural solvers at inference time. MEMENTO enables updating the action distribution dynamically based on the outcome of previous decisions. We validate its effectiveness on benchmark problems, in particular Traveling Salesman and Capacitated Vehicle Routing, demonstrating its superiority over tree-search and policy-gradient fine-tuning; and showing it can be zero-shot combined with diversity-based solvers. We successfully train all RL auto-regressive solvers on large instances, and show that MEMENTO can scale and is data-efficient. Overall, MEMENTO enables to push the state-of-the-art on 11 out of 12 evaluated tasks.

Updated: 2024-10-07 15:33:37

标题: 记忆增强型神经求解器：在组合优化中实现高效适应的方法

摘要: 组合优化对许多现实世界的应用至关重要，但由于其（NP-）难性，仍然存在挑战。在现有方法中，启发式方法通常在质量和可扩展性之间提供最佳权衡，使它们适用于工业应用。虽然强化学习（RL）提供了一个灵活的框架来设计启发式方法，但在工业求解器中，其采用仍然不完整。现有的学习方法仍然缺乏适应特定实例并充分利用可用的计算预算的能力。当前最佳方法要么依赖于一系列预先训练好的策略，要么依赖于数据效率低下的微调；因此未能充分利用预算限制内新获得的信息。为此，我们提出了一种利用内存改进神经求解器在推断时间的适应性的方法MEMENTO。MEMENTO使得根据先前决策的结果动态地更新行动分布成为可能。我们在基准问题上验证了其有效性，特别是旅行推销员和容量车辆路径规划，证明了其优于树搜索和策略梯度微调；并展示了它可以与基于多样性的求解器零-shot结合。我们成功地训练了所有RL自回归求解器的大规模实例，并展示了MEMENTO可以扩展并具有数据效率。总的来说，MEMENTO使得在评估的12个任务中有11个取得了最新技术水平。

更新时间: 2024-10-07 15:33:37

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.16424v2

Investigating Guiding Information for Adaptive Collocation Point Sampling in PINNs

Physics-informed neural networks (PINNs) provide a means of obtaining approximate solutions of partial differential equations and systems through the minimisation of an objective function which includes the evaluation of a residual function at a set of collocation points within the domain. The quality of a PINNs solution depends upon numerous parameters, including the number and distribution of these collocation points. In this paper we consider a number of strategies for selecting these points and investigate their impact on the overall accuracy of the method. In particular, we suggest that no single approach is likely to be "optimal" but we show how a number of important metrics can have an impact in improving the quality of the results obtained when using a fixed number of residual evaluations. We illustrate these approaches through the use of two benchmark test problems: Burgers' equation and the Allen-Cahn equation.

Updated: 2024-10-07 15:29:14

标题: 探讨用于适应性PINNs中拟合点抽样的指导信息

摘要: 物理信息神经网络（PINNs）通过最小化包括在域内一组配点处评估残差函数的目标函数，提供了获得偏微分方程和系统的近似解的方法。PINNs解的质量取决于许多参数，包括这些配点的数量和分布。本文考虑了选择这些点的几种策略，并研究它们对方法整体准确性的影响。特别地，我们建议没有单一方法可能是“最佳”的，但我们展示了一些重要指标如何在使用固定数量的残差评估时对提高结果质量产生影响。我们通过使用两个基准测试问题：Burgers方程和Allen-Cahn方程来说明这些方法。

更新时间: 2024-10-07 15:29:14

领域: cs.LG

下载: http://arxiv.org/abs/2404.12282v2

UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms

Policy evaluation is an important instrument for the comparison of different algorithms in Reinforcement Learning (RL). Yet even a precise knowledge of the value function $V^{\pi}$ corresponding to a policy $\pi$ does not provide reliable information on how far is the policy $\pi$ from the optimal one. We present a novel model-free upper value iteration procedure $({\sf UVIP})$ that allows us to estimate the suboptimality gap $V^{\star}(x) - V^{\pi}(x)$ from above and to construct confidence intervals for $V^\star$. Our approach relies on upper bounds to the solution of the Bellman optimality equation via martingale approach. We provide theoretical guarantees for ${\sf UVIP}$ under general assumptions and illustrate its performance on a number of benchmark RL problems.

Updated: 2024-10-07 15:27:58

标题: UVIP：评估强化学习算法的无模型方法

摘要: 政策评估是强化学习（RL）中比较不同算法的重要工具。然而，即使对与政策$\pi$相对应的价值函数$V^{\pi}$有精确的了解，也不能可靠地提供关于政策$\pi$与最优政策之间的差距有多大的信息。我们提出了一种新颖的无模型上限值迭代过程$({\sf UVIP})$，可以从上方估计子最优性差距$V^{\star}(x) - V^{\pi}(x)$，并为$V^\star$构建置信区间。我们的方法依赖于通过鞅方法对贝尔曼最优性方程的解的上限边界。我们在一般假设下为${\sf UVIP}$提供理论保证，并在一些基准RL问题上展示其性能。

更新时间: 2024-10-07 15:27:58

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2105.02135v4

On the Convergence of Hermitian Dynamic Mode Decomposition

We study the convergence of Hermitian Dynamic Mode Decomposition (DMD) to the spectral properties of self-adjoint Koopman operators. Hermitian DMD is a data-driven method that approximates the Koopman operator associated with an unknown nonlinear dynamical system, using discrete-time snapshots. This approach preserves the self-adjointness of the operator in its finite-dimensional approximations. \rev{We prove that, under suitably broad conditions, the spectral measures corresponding to the eigenvalues and eigenfunctions computed by Hermitian DMD converge to those of the underlying Koopman operator}. This result also applies to skew-Hermitian systems (after multiplication by $i$), applicable to generators of continuous-time measure-preserving systems. Along the way, we establish a general theorem on the convergence of spectral measures for finite sections of self-adjoint operators, including those that are unbounded, which is of independent interest to the wider spectral community. We numerically demonstrate our results by applying them to two-dimensional Schr\"odinger equations.

Updated: 2024-10-07 15:21:37

标题: 关于埃尔米特动态模态分解的收敛性

摘要: 我们研究了埃尔米特动态模态分解（DMD）收敛到自伴随 Koopman 算子的谱特性。埃尔米特 DMD 是一种数据驱动方法，通过离散时间快照来近似与未知非线性动力系统相关的 Koopman 算子。这种方法在其有限维逼近中保留了算子的自伴随性。我们证明，在适当广泛的条件下，由埃尔米特 DMD 计算得到的特征值和特征函数对应的谱测度收敛到基础 Koopman 算子的谱。这个结果也适用于斜埃尔米特系统（乘以 $i$ 后），适用于连续时间保度量系统的生成器。在此过程中，我们建立了关于自伴随算子有限部分的谱测度收敛的一般定理，包括那些无界的算子，这对更广泛的谱社区是独立感兴趣的。我们通过将结果应用于二维薛定谔方程数值演示了我们的结果。

更新时间: 2024-10-07 15:21:37

领域: math.NA,cs.LG,cs.NA,math.DS,math.SP

下载: http://arxiv.org/abs/2401.03192v2

Legal Theory for Pluralistic Alignment

Legal theory can address two related key problems of alignment: pluralism and specification. Alignment researchers must determine how to specify what is concretely meant by vague principles like helpfulness and fairness and they must ensure that their techniques do not exclude alternative perspectives on life and values. The law faces these same problems. Leading legal theories suggest the law solves these problems through the interaction of rules and cases, where general rules promulgated by a democratic authority are given specific content through their application over time. Concrete applications allow for convergence on practical meaning while preserving space for disagreement on values. These approaches suggest improvements to existing democratic alignment processes that use AI to create cases that give content to rules, allowing for more pluralist alignment.

Updated: 2024-10-07 15:16:25

标题: 多元对齐的法律理论

摘要: 法律理论可以解决两个相关的关键问题：多元主义和规范化。对齐研究人员必须确定如何具体界定模糊原则，如帮助和公平，并确保他们的技术不排除对生活和价值观的替代观点。法律也面临着相同的问题。主流法律理论表明，法律通过规则和案例的互动来解决这些问题，民主权威颁布的一般规则通过时间的应用得到具体内容。具体应用可以实现对实际含义的收敛，同时保留对价值观不同意见的空间。这些方法建议改进现有的利用人工智能创建案例以赋予规则内容的民主对齐流程，从而实现更多元化的对齐。

更新时间: 2024-10-07 15:16:25

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2410.17271v1

Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability

In this paper, we develop a unified framework for lower bound methods in statistical estimation and interactive decision making. Classical lower bound techniques -- such as Fano's inequality, Le Cam's method, and Assouad's lemma -- have been central to the study of minimax risk in statistical estimation, yet they are insufficient for the analysis of methods that collect data in an interactive manner. The recent minimax lower bounds for interactive decision making via the Decision-Estimation Coefficient (DEC) appear to be genuinely different from the classical methods. We propose a unified view of these distinct methodologies through a general algorithmic lower bound method. We further introduce a novel complexity measure, decision dimension, which facilitates the derivation of new lower bounds for interactive decision making. In particular, decision dimension provides a characterization of bandit learnability for any structured bandit model class. Further, we characterize the sample complexity of learning convex model class up to a polynomial gap with the decision dimension, addressing the remaining gap between upper and lower bounds in Foster et al. (2021, 2023).

Updated: 2024-10-07 15:14:58

标题: Assouad、Fano和Le Cam在交互中的应用：一种统一的下界框架和对于赌徒学习能力的描述

摘要: 在本文中，我们开发了一个统一的框架，用于统计估计和交互式决策制定中的下限方法。经典的下限技术，如Fano的不等式、Le Cam的方法和Assouad的引理，一直以来都是统计估计中极小风险研究的核心，然而它们对于以交互方式收集数据的方法的分析是不足够的。最近关于交互式决策制定的极小风险下限通过决策估计系数（DEC）似乎真正不同于经典方法。我们提出了一个通过一般算法下限方法统一观点这些不同方法论的方法。我们进一步引入了一种新颖的复杂度度量，决策维度，有助于推导交互式决策制定的新下限。特别地，决策维度为任何结构化赌博模型类提供了一个可学习性描述。此外，我们通过决策维度表征了学习凸模型类的样本复杂度，填补了Foster等人（2021, 2023）中上限和下限之间的剩余差距。

更新时间: 2024-10-07 15:14:58

领域: cs.LG,cs.IT,math.IT,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2410.05117v1

Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning

Controllable generation through Stable Diffusion (SD) fine-tuning aims to improve fidelity, safety, and alignment with human guidance. Existing reinforcement learning from human feedback methods usually rely on predefined heuristic reward functions or pretrained reward models built on large-scale datasets, limiting their applicability to scenarios where collecting such data is costly or difficult. To effectively and efficiently utilize human feedback, we develop a framework, HERO, which leverages online human feedback collected on the fly during model learning. Specifically, HERO features two key mechanisms: (1) Feedback-Aligned Representation Learning, an online training method that captures human feedback and provides informative learning signals for fine-tuning, and (2) Feedback-Guided Image Generation, which involves generating images from SD's refined initialization samples, enabling faster convergence towards the evaluator's intent. We demonstrate that HERO is 4x more efficient in online feedback for body part anomaly correction compared to the best existing method. Additionally, experiments show that HERO can effectively handle tasks like reasoning, counting, personalization, and reducing NSFW content with only 0.5K online feedback.

Updated: 2024-10-07 15:12:01

标题: 人类反馈高效的在线扩散模型微调强化学习

摘要: 通过稳定扩散（SD）微调可控生成旨在提高忠实度、安全性，并与人类指导保持一致。现有的来自人类反馈的强化学习方法通常依赖于预定义的启发式奖励函数或建立在大规模数据集上的预训练奖励模型，限制了它们在收集此类数据成本高昂或困难的情况下的适用性。为了有效并高效地利用人类反馈，我们开发了一个名为HERO的框架，该框架利用在线学习过程中即时收集的人类反馈。具体而言，HERO具有两个关键机制：（1）反馈对齐表示学习，一种在线训练方法，捕获人类反馈并为微调提供信息性学习信号，以及（2）反馈引导图像生成，涉及从SD的精细化初始化样本生成图像，实现更快收敛到评估者意图。我们演示了相比最佳现有方法，HERO在在线反馈进行身体部位异常矫正时效率提高了4倍。此外，实验证明HERO能够有效处理推理、计数、个性化和减少不安全内容等任务，仅需0.5K在线反馈。

更新时间: 2024-10-07 15:12:01

领域: cs.LG,cs.AI,cs.CV,cs.HC

下载: http://arxiv.org/abs/2410.05116v1

AlphaRouter: Quantum Circuit Routing with Reinforcement Learning and Tree Search

Quantum computers have the potential to outperform classical computers in important tasks such as optimization and number factoring. They are characterized by limited connectivity, which necessitates the routing of their computational bits, known as qubits, to specific locations during program execution to carry out quantum operations. Traditionally, the NP-hard optimization problem of minimizing the routing overhead has been addressed through sub-optimal rule-based routing techniques with inherent human biases embedded within the cost function design. This paper introduces a solution that integrates Monte Carlo Tree Search (MCTS) with Reinforcement Learning (RL). Our RL-based router, called AlphaRouter, outperforms the current state-of-the-art routing methods and generates quantum programs with up to $20\%$ less routing overhead, thus significantly enhancing the overall efficiency and feasibility of quantum computing.

Updated: 2024-10-07 15:10:54

标题: AlphaRouter：使用强化学习和树搜索进行量子电路路由

摘要: 量子计算机有潜力在重要任务中超越经典计算机，如优化和数值分解。它们的特点是连接性有限，这需要在程序执行过程中将它们的计算比特，即量子比特，路由到特定位置以执行量子操作。传统上，最小化路由开销的NP难优化问题通过具有固有人类偏见的次优规则路由技术来解决，这些偏见嵌入在成本函数设计中。本文介绍了一种将蒙特卡洛树搜索（MCTS）与强化学习（RL）相结合的解决方案。我们基于RL的路由器，称为AlphaRouter，优于当前最先进的路由方法，并生成具有高达20%较少路由开销的量子程序，从而显著提高了量子计算的整体效率和可行性。

更新时间: 2024-10-07 15:10:54

领域: quant-ph,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2410.05115v1

Autoregressive Image Diffusion: Generation of Image Sequence and Application in MRI

Magnetic resonance imaging (MRI) is a widely used non-invasive imaging modality. However, a persistent challenge lies in balancing image quality with imaging speed. This trade-off is primarily constrained by k-space measurements, which traverse specific trajectories in the spatial Fourier domain (k-space). These measurements are often undersampled to shorten acquisition times, resulting in image artifacts and compromised quality. Generative models learn image distributions and can be used to reconstruct high-quality images from undersampled k-space data. In this work, we present the autoregressive image diffusion (AID) model for image sequences and use it to sample the posterior for accelerated MRI reconstruction. The algorithm incorporates both undersampled k-space and pre-existing information. Models trained with fastMRI dataset are evaluated comprehensively. The results show that the AID model can robustly generate sequentially coherent image sequences. In MRI applications, the AID can outperform the standard diffusion model and reduce hallucinations, due to the learned inter-image dependencies. The project code is available at https://github.com/mrirecon/aid.

Updated: 2024-10-07 15:10:03

标题: 自回归图像扩散：图像序列生成及在磁共振成像中的应用

摘要: 磁共振成像（MRI）是一种广泛使用的非侵入性成像方式。然而，一个持续存在的挑战在于平衡图像质量和成像速度。这种权衡主要受到k-空间测量的限制，这些测量在空间傅里叶域（k-空间）中遍历特定轨迹。这些测量通常被欠采样以缩短采集时间，导致图像伪影和质量受损。生成模型学习图像分布，并可以用来从欠采样的k-空间数据中重建高质量图像。在这项工作中，我们提出了用于图像序列的自回归图像扩散（AID）模型，并使用它来对加速的MRI重建进行后验采样。该算法结合了欠采样的k-空间和现有信息。使用fastMRI数据集训练的模型经过全面评估。结果显示AID模型能够稳健地生成顺序连贯的图像序列。在MRI应用中，AID可以胜过标准扩散模型并减少幻觉，这是由于学习到的图像间依赖性。项目代码可在https://github.com/mrirecon/aid找到。

更新时间: 2024-10-07 15:10:03

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2405.14327v4

Synthetic Generation of Dermatoscopic Images with GAN and Closed-Form Factorization

In the realm of dermatological diagnoses, where the analysis of dermatoscopic and microscopic skin lesion images is pivotal for the accurate and early detection of various medical conditions, the costs associated with creating diverse and high-quality annotated datasets have hampered the accuracy and generalizability of machine learning models. We propose an innovative unsupervised augmentation solution that harnesses Generative Adversarial Network (GAN) based models and associated techniques over their latent space to generate controlled semiautomatically-discovered semantic variations in dermatoscopic images. We created synthetic images to incorporate the semantic variations and augmented the training data with these images. With this approach, we were able to increase the performance of machine learning models and set a new benchmark amongst non-ensemble based models in skin lesion classification on the HAM10000 dataset; and used the observed analytics and generated models for detailed studies on model explainability, affirming the effectiveness of our solution.

Updated: 2024-10-07 15:09:50

标题: 使用GAN和闭式因式分解合成皮肤镜图像

摘要: 在皮肤病诊断领域，皮肤镜和显微镜皮肤病变图像的分析对于准确和早期检测各种医学状况至关重要，然而创建多样化和高质量的注释数据集所需的成本阻碍了机器学习模型的准确性和泛化能力。我们提出了一种创新的无监督增强解决方案，利用生成对抗网络（GAN）模型及相关技术在它们的潜在空间中生成控制的半自动发现的语义变化的皮肤镜图像。我们创造了合成图像以包含这些语义变化，并用这些图像增强训练数据。通过这种方法，我们能够提高机器学习模型的性能，并在HAM10000数据集上为皮肤病变分类设定了一个新的基准；并利用观察到的分析和生成的模型进行详细的模型可解释性研究，证实了我们解决方案的有效性。

更新时间: 2024-10-07 15:09:50

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.05114v1

Hyper-Representations: Learning from Populations of Neural Networks

This thesis addresses the challenge of understanding Neural Networks through the lens of their most fundamental component: the weights, which encapsulate the learned information and determine the model behavior. At the core of this thesis is a fundamental question: Can we learn general, task-agnostic representations from populations of Neural Network models? The key contribution of this thesis to answer that question are hyper-representations, a self-supervised method to learn representations of NN weights. Work in this thesis finds that trained NN models indeed occupy meaningful structures in the weight space, that can be learned and used. Through extensive experiments, this thesis demonstrates that hyper-representations uncover model properties, such as their performance, state of training, or hyperparameters. Moreover, the identification of regions with specific properties in hyper-representation space allows to sample and generate model weights with targeted properties. This thesis demonstrates applications for fine-tuning, and transfer learning to great success. Lastly, it presents methods that allow hyper-representations to generalize beyond model sizes, architectures, and tasks. The practical implications of that are profound, as it opens the door to foundation models of Neural Networks, which aggregate and instantiate their knowledge across models and architectures. Ultimately, this thesis contributes to the deeper understanding of Neural Networks by investigating structures in their weights which leads to more interpretable, efficient, and adaptable models. By laying the groundwork for representation learning of NN weights, this research demonstrates the potential to change the way Neural Networks are developed, analyzed, and used.

Updated: 2024-10-07 15:03:00

标题: 超级表示：从神经网络群体中学习

摘要: 这篇论文解决了通过神经网络的最基本组件来理解神经网络的挑战：权重，这些权重包含了学习到的信息并确定了模型的行为。这篇论文的核心问题是：我们能否从神经网络模型的群体中学习到一般的、与任务无关的表示？回答这个问题的关键贡献是超表征，这是一种自监督方法，用于学习神经网络权重的表示。这篇论文的工作发现，经过训练的神经网络模型确实在权重空间中占据着有意义的结构，这些结构可以被学习和利用。通过大量实验证明，超表征揭示了模型的性能、训练状态或超参数等属性。此外，在超表征空间中识别具有特定属性的区域允许对具有目标属性的模型权重进行采样和生成。这篇论文展示了微调和迁移学习的应用取得了巨大成功。最后，它提出了一些方法，使超表征能够在模型尺寸、架构和任务之外进行泛化。这样做的实际意义是深远的，因为它为神经网络的基础模型打开了大门，这些模型在模型和架构之间汇集和实例化它们的知识。最终，这篇论文通过研究权重中的结构，有助于更深入地理解神经网络，从而产生更具可解释性、高效性和适应性的模型。通过为神经网络权重的表示学习奠定基础，这项研究展示了改变神经网络开发、分析和使用方式的潜力。

更新时间: 2024-10-07 15:03:00

领域: cs.LG

下载: http://arxiv.org/abs/2410.05107v1

Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation

We address the problem of solving strongly convex and smooth minimization problems using stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested to combine the Polyak-Ruppert averaging procedure with the Richardson-Romberg extrapolation technique to reduce the asymptotic bias of SGD at the expense of a mild increase of the variance. We significantly extend previous results by providing an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$. More precisely, we show that the mean-squared error can be decomposed into the sum of two terms: a leading one of order $\mathcal{O}(n^{-1/2})$ with explicit dependence on a minimax-optimal asymptotic covariance matrix, and a second-order term of order $\mathcal{O}(n^{-3/4})$ where the power $3/4$ can not be improved in general. We also extend this result to the $p$-th moment bound keeping optimal scaling of the remainders with respect to $n$. Our analysis relies on the properties of the SGD iterates viewed as a time-homogeneous Markov chain. In particular, we establish that this chain is geometrically ergodic with respect to a suitably defined weighted Wasserstein semimetric.

Updated: 2024-10-07 15:02:48

标题: 随机梯度下降与Richardson-Romberg外推的非渐近分析

摘要: 我们解决了使用随机梯度下降（SGD）算法和恒定步长解决强凸和光滑最小化问题的问题。先前的研究建议将Polyak-Ruppert平均过程与Richardson-Romberg外推技术相结合，以减少SGD的渐近偏差，但会导致方差轻微增加。我们通过提供关于迭代次数$n$的结果估计器的均方误差的展开，显著扩展了先前的结果。更确切地说，我们表明均方误差可以分解为两个项的和：一个阶为$\mathcal{O}(n^{-1/2})$的主导项，明确依赖于渐近协方差矩阵的最小极大值，以及一个阶为$\mathcal{O}(n^{-3/4})$的二阶项，通常无法改进$3/4$的幂。我们还将这一结果扩展到保持相对于$n$的最优缩放的$p$-th矩边界。我们的分析依赖于SGD迭代被视为时间均匀马尔可夫链的性质。特别是，我们建立了这个链相对于适当定义的加权Wasserstein半度量是几何收敛的。

更新时间: 2024-10-07 15:02:48

领域: math.OC,cs.LG,stat.ML,62L20, 93E35

下载: http://arxiv.org/abs/2410.05106v1

AI-Enhanced Ethical Hacking: A Linux-Focused Experiment

This technical report investigates the integration of generative AI (GenAI), specifically ChatGPT, into the practice of ethical hacking through a comprehensive experimental study and conceptual analysis. Conducted in a controlled virtual environment, the study evaluates GenAI's effectiveness across the key stages of penetration testing on Linux-based target machines operating within a virtual local area network (LAN), including reconnaissance, scanning and enumeration, gaining access, maintaining access, and covering tracks. The findings confirm that GenAI can significantly enhance and streamline the ethical hacking process while underscoring the importance of balanced human-AI collaboration rather than the complete replacement of human input. The report also critically examines potential risks such as misuse, data biases, hallucination, and over-reliance on AI. This research contributes to the ongoing discussion on the ethical use of AI in cybersecurity and highlights the need for continued innovation to strengthen security defences.

Updated: 2024-10-07 15:02:47

标题: AI增强的道德黑客：一个以Linux为重点的实验

摘要: 这份技术报告通过全面的实验研究和概念分析，调查了生成式人工智能（GenAI），特别是ChatGPT，如何整合到道德黑客实践中。在受控虚拟环境中进行的研究评估了GenAI在Linux目标机器上的关键渗透测试阶段的有效性，这些机器在虚拟局域网（LAN）中运行，包括侦察、扫描和枚举、获取访问权限、保持访问权限和覆盖痕迹。研究结果证实，GenAI可以显著增强和简化道德黑客过程，同时强调了平衡人工智能协作的重要性，而不是完全替代人类输入。该报告还批判性地审视了潜在风险，如滥用、数据偏见、幻觉和对人工智能的过度依赖。这项研究为关于人工智能在网络安全中的道德使用的讨论做出了贡献，并强调了继续创新以加强安全防御的需要。

更新时间: 2024-10-07 15:02:47

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2410.05105v1

Mechanistic?

The rise of the term "mechanistic interpretability" has accompanied increasing interest in understanding neural models -- particularly language models. However, this jargon has also led to a fair amount of confusion. So, what does it mean to be "mechanistic"? We describe four uses of the term in interpretability research. The most narrow technical definition requires a claim of causality, while a broader technical definition allows for any exploration of a model's internals. However, the term also has a narrow cultural definition describing a cultural movement. To understand this semantic drift, we present a history of the NLP interpretability community and the formation of the separate, parallel "mechanistic" interpretability community. Finally, we discuss the broad cultural definition -- encompassing the entire field of interpretability -- and why the traditional NLP interpretability community has come to embrace it. We argue that the polysemy of "mechanistic" is the product of a critical divide within the interpretability community.

Updated: 2024-10-07 15:02:12

标题: 机械性的？

摘要: “机械性可解释性”一词的兴起伴随着人们对神经模型（特别是语言模型）理解的兴趣日益增加。然而，这个行话也导致了相当多的混淆。那么，“机械性”意味着什么？我们描述了解释性研究中这一术语的四种用法。最狭窄的技术定义要求具有因果性的主张，而更广泛的技术定义允许对模型内部的任何探索。然而，这个术语还有一个狭窄的文化定义，描述了一个文化运动。为了理解这种语义漂移，我们提供了自然语言处理解释性社区的历史以及独立、平行的“机械性”解释性社区的形成。最后，我们讨论了广义文化定义，涵盖了整个解释性领域，并解释了为何传统的自然语言处理解释性社区已经开始接受它。我们认为，“机械性”一词的多义性是解释性社区内部的重要分歧所导致的结果。

更新时间: 2024-10-07 15:02:12

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.09087v1

Decoding Intelligence: A Framework for Certifying Knowledge Comprehension in LLMs

Knowledge comprehension capability is an important aspect of human intelligence. As Large Language Models (LLMs) are being envisioned as superhuman agents, it is crucial for them to be proficient at knowledge comprehension. However, existing benchmarking studies do not provide consistent, generalizable, and formal guarantees on the knowledge comprehension capabilities of LLMs. In this work, we propose the first framework to certify knowledge comprehension in LLMs with formal probabilistic guarantees. Our certificates are quantitative -- they consist of high-confidence, tight bounds on the probability that a target LLM gives the correct answer on any knowledge comprehension prompt sampled from a distribution. We design and certify novel specifications that precisely represent distributions of knowledge comprehension prompts leveraging knowledge graphs. We certify SOTA LLMs for specifications over the Wikidata5m knowledge graph. We find that the knowledge comprehension capability improves significantly with scaling the size of the models.

Updated: 2024-10-07 15:01:48

标题: 解读智能：LLM知识理解认证框架

摘要: 知识理解能力是人类智能的重要方面。随着大型语言模型（LLMs）被设想为超人类代理，它们在知识理解方面的熟练程度至关重要。然而，现有的基准研究并没有提供关于LLMs知识理解能力的一致、可推广和正式保证。在这项工作中，我们提出了第一个框架，用于在LLMs中具有形式概率保证的知识理解认证。我们的证书是定量的 - 它们包括对目标LLM在从分布中抽取的任何知识理解提示上给出正确答案的概率的高置信度、紧密边界。我们设计和认证了新颖的规范，精确地表示利用知识图表示的知识理解提示的分布。我们为Wikidata5m知识图上的规范认证了SOTA LLMs。我们发现，随着模型规模的扩大，知识理解能力显著提高。

更新时间: 2024-10-07 15:01:48

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.15929v2

Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers

Understanding transition pathways between meta-stable states in molecular systems is crucial to advance material design and drug discovery. However, unbiased molecular dynamics simulations are computationally infeasible due to the high energy barriers separating these states. Although recent machine learning techniques offer potential solutions, they are often limited to simple systems or rely on collective variables (CVs) derived from costly domain expertise. In this paper, we introduce a novel approach that trains diffusion path samplers (DPS) for transition path sampling (TPS) without the need for CVs. We recast the problem as an amortized sampling of the target path measure, minimizing the log-variance divergence between the path measure induced by our DPS and the target path measure. To ensure scalability for high-dimensional tasks, we introduce (1) a new off-policy training objective based on learning control variates with replay buffers and (2) a scale-based equivariant parameterization of the bias forces. We evaluate our approach, coined TPS-DPS, on a synthetic double-well potential and three peptides: Alanine Dipeptide, Polyproline Helix, and Chignolin. Results show that our approach produces more realistic and diverse transition pathways compared to existing baselines.

Updated: 2024-10-07 14:54:18

标题: 使用改进的离策略训练扩散路径采样器的过渡路径抽样

摘要: 理解分子系统中亚稳态之间的转变路径对于推进材料设计和药物发现至关重要。然而，由于分离这些状态的高能量壁垒，无偏分子动力学模拟在计算上是不可行的。尽管最近的机器学习技术提供了潜在解决方案，但它们通常局限于简单系统或依赖于昂贵领域专业知识衍生的集体变量（CVs）。本文介绍了一种新方法，该方法训练扩散路径采样器（DPS）用于转变路径采样（TPS），无需CVs。我们将问题重新表述为对目标路径测量的摊销抽样，最小化由我们的DPS引起的路径测量与目标路径测量之间的对数方差差异。为了保证高维任务的可扩展性，我们引入了（1）一种基于学习控制变量和重放缓冲区的新离线策略训练目标和（2）基于尺度的等变参数化偏置力。我们评估了我们的方法，命名为TPS-DPS，在一个合成的双井位势和三种肽：丙氨酸二肽，聚脯氨酸螺旋和Chignolin上。结果表明，与现有基线相比，我们的方法产生了更现实和多样化的转变路径。

更新时间: 2024-10-07 14:54:18

领域: cs.LG

下载: http://arxiv.org/abs/2405.19961v4

DreamSat: Towards a General 3D Model for Novel View Synthesis of Space Objects

Novel view synthesis (NVS) enables to generate new images of a scene or convert a set of 2D images into a comprehensive 3D model. In the context of Space Domain Awareness, since space is becoming increasingly congested, NVS can accurately map space objects and debris, improving the safety and efficiency of space operations. Similarly, in Rendezvous and Proximity Operations missions, 3D models can provide details about a target object's shape, size, and orientation, allowing for better planning and prediction of the target's behavior. In this work, we explore the generalization abilities of these reconstruction techniques, aiming to avoid the necessity of retraining for each new scene, by presenting a novel approach to 3D spacecraft reconstruction from single-view images, DreamSat, by fine-tuning the Zero123 XL, a state-of-the-art single-view reconstruction model, on a high-quality dataset of 190 high-quality spacecraft models and integrating it into the DreamGaussian framework. We demonstrate consistent improvements in reconstruction quality across multiple metrics, including Contrastive Language-Image Pretraining (CLIP) score (+0.33%), Peak Signal-to-Noise Ratio (PSNR) (+2.53%), Structural Similarity Index (SSIM) (+2.38%), and Learned Perceptual Image Patch Similarity (LPIPS) (+0.16%) on a test set of 30 previously unseen spacecraft images. Our method addresses the lack of domain-specific 3D reconstruction tools in the space industry by leveraging state-of-the-art diffusion models and 3D Gaussian splatting techniques. This approach maintains the efficiency of the DreamGaussian framework while enhancing the accuracy and detail of spacecraft reconstructions. The code for this work can be accessed on GitHub (https://github.com/ARCLab-MIT/space-nvs).

Updated: 2024-10-07 14:51:54

标题: DreamSat：面向太空物体新视角合成的通用三维模型

摘要: 新颖视角合成（NVS）能够生成一个场景的新图像或将一组2D图像转换为全面的3D模型。在空间领域意识的背景下，由于空间变得越来越拥挤，NVS能够准确地映射空间物体和碎片，提高空间操作的安全性和效率。同样，在交会和接近任务中，3D模型可以提供有关目标物体形状、大小和方向的详细信息，从而更好地规划和预测目标的行为。在这项工作中，我们探索了这些重建技术的泛化能力，旨在通过提出一种新的方法在单视图图像上进行3D航天器重建，即DreamSat，通过在高质量数据集上对Zero123 XL进行微调，一个最先进的单视图重建模型，并将其集成到DreamGaussian框架中。我们在一个30个之前未见过的航天器图像的测试集上展示了重建质量的一致提升，包括对比语言-图像预训练（CLIP）得分（+0.33%）、峰值信噪比（PSNR）（+2.53%）、结构相似性指数（SSIM）（+2.38%）和学习到的感知图像块相似性（LPIPS）（+0.16%）。我们的方法通过利用最先进的扩散模型和3D高斯光斑技术，解决了空间行业缺乏特定领域3D重建工具的问题。这种方法在提高航天器重建的准确性和细节的同时，保持了DreamGaussian框架的效率。此工作的代码可在GitHub上访问（https://github.com/ARCLab-MIT/space-nvs）。

更新时间: 2024-10-07 14:51:54

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.05097v1

On the Structure of Game Provenance and its Applications

Provenance in databases has been thoroughly studied for positive and for recursive queries, then for first-order (FO) queries, i.e., having negation but no recursion. Query evaluation can be understood as a two-player game where the opponents argue whether or not a tuple is in the query answer. This game-theoretic approach yields a natural provenance model for FO queries, unifying how and why-not provenance. Here, we study the fine-grain structure of game provenance. A game $G=(V,E)$ consists of positions $V$ and moves $E$ and can be solved by computing the well-founded model of a single, unstratifiable rule: \[ \text{win}(X) \leftarrow \text{move}(X, Y), \neg \, \text{win}(Y). \] In the solved game $G^{\lambda}$, the value of a position $x\,{\in}\,V$ is either won, lost, or drawn. This value is explained by the provenance $\mathscr{P}$(x), i.e., certain (annotated) edges reachable from $x$. We identify seven edge types that give rise to new kinds of provenance, i.e., potential, actual, and primary, and demonstrate that "not all moves are created equal". We describe the new provenance types, show how they can be computed while solving games, and discuss applications, e.g., for abstract argumentation frameworks.

Updated: 2024-10-07 14:48:56

标题: 关于游戏溯源结构及其应用

摘要: 数据库中的溯源已经被彻底研究用于正向和递归查询，然后用于一阶（FO）查询，即具有否定但没有递归的查询。查询评估可以理解为一个两人游戏，对手争论一个元组是否在查询答案中。这种博弈论方法为FO查询提供了一个自然的溯源模型，统一了如何和为什么不能溯源。在这里，我们研究了游戏溯源的细粒度结构。一个游戏$G=(V,E)$由位置$V$和移动$E$组成，可以通过计算单一的、不可分层的规则的基础模型来解决：\[ \text{win}(X) \leftarrow \text{move}(X, Y), \neg \, \text{win}(Y). \] 在已解决的游戏$G^{\lambda}$中，位置$x\,{\in}\,V$的值是赢了、输了或平局。这个值由溯源$\mathscr{P}$(x)解释，即从$x$可达的某些（带注释的）边。我们确定了七种边类型，产生了新类型的溯源，即潜在的、实际的和主要的，并且证明了“并不是所有移动都是平等的”。我们描述了新的溯源类型，展示了它们在解决游戏时如何计算，并讨论了应用，例如对于抽象论证框架。

更新时间: 2024-10-07 14:48:56

领域: cs.AI

下载: http://arxiv.org/abs/2410.05094v1

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reasoning failures, collectively referred to as "hallucinations". Recent studies have demonstrated that LLMs' internal states encode information regarding the truthfulness of their outputs, and that this information can be utilized to detect errors. In this work, we show that the internal representations of LLMs encode much more information about truthfulness than previously recognized. We first discover that the truthfulness information is concentrated in specific tokens, and leveraging this property significantly enhances error detection performance. Yet, we show that such error detectors fail to generalize across datasets, implying that -- contrary to prior claims -- truthfulness encoding is not universal but rather multifaceted. Next, we show that internal representations can also be used for predicting the types of errors the model is likely to make, facilitating the development of tailored mitigation strategies. Lastly, we reveal a discrepancy between LLMs' internal encoding and external behavior: they may encode the correct answer, yet consistently generate an incorrect one. Taken together, these insights deepen our understanding of LLM errors from the model's internal perspective, which can guide future research on enhancing error analysis and mitigation.

Updated: 2024-10-07 14:46:11

标题: LLMs知道的比它们展示的更多：关于LLM幻觉的内在表征

摘要: 大型语言模型（LLMs）通常会产生错误，包括事实不准确、偏见和推理失败，这些统称为“幻觉”。最近的研究表明，LLMs的内部状态编码了关于其输出真实性的信息，而这些信息可以用于检测错误。在这项工作中，我们展示了LLMs的内部表示编码了比以前认识到的更多关于真实性的信息。我们首先发现真实性信息集中在特定的标记中，并利用这个特性显著提高了错误检测性能。然而，我们发现这种错误检测器无法在数据集之间推广，这意味着--与先前的说法相反--真实性编码并不是普遍的，而是多方面的。接下来，我们展示了内部表示也可以用于预测模型可能会犯的错误类型，从而促进定制的缓解策略的开发。最后，我们揭示了LLMs的内部编码和外部行为之间的差异：它们可能编码了正确答案，但始终生成错误答案。综上所述，这些见解加深了我们对LLM错误的理解，从模型的内部视角，这可以指导未来研究以增强错误分析和缓解。

更新时间: 2024-10-07 14:46:11

领域: cs.CL,cs.AI,68T50,I.2.7

下载: http://arxiv.org/abs/2410.02707v2

HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation

Influence functions provide a principled method to assess the contribution of individual training samples to a specific target. Yet, their high computational costs limit their applications on large-scale models and datasets. Existing methods proposed for influence function approximation have significantly reduced the computational overheads. However, they mostly suffer from inaccurate estimation due to the lack of strong convergence guarantees from the algorithm. The family of hyperpower methods are well-known for their rigorous convergence guarantees on matrix inverse approximation, while the matrix multiplication operation can involve intractable memory and computation costs on large-scale models. We propose HyperINF, an efficient and accurate influence function approximation method which leverages the hyperpower method, specifically Schulz's iterative algorithm. To deal with the computation-intensive matrix multiplication, we incorporate the generalized fisher information (GFIM) as a low-rank approximation of the Hessian matrix, which reduces the memory and computation overheads to constant costs independent of ranks on LoRA-tuned models. We first demonstrate the superior accuracy and stability of \method compared to other baselines through a synthetic convergence simulation for matrix inversion. We further validate the efficacy of \method through extensive real-world data attribution tasks, including mislabeled data detection and data selection for LLM and VLM fine-tuning. On LoRA-tuned models, HyperINF achieves superior downstream performance with minimal memory and computational overhead, while other baselines suffer from significant degradation. Our codebase is available at https://github.com/Blackzxy/HyperINF.

Updated: 2024-10-07 14:42:45

标题: HyperINF：释放Schulz方法在数据影响估计中的超强力量

摘要: 影响函数提供了一种有原则的方法来评估个别训练样本对特定目标的贡献。然而，它们的高计算成本限制了它们在大规模模型和数据集上的应用。现有的影响函数近似方法显著减少了计算开销。然而，它们大多数受到准确估计的影响，因为算法缺乏强大的收敛保证。超幂方法家族以其在矩阵逆近似中的严格收敛保证而闻名，而矩阵乘法运算可能会在大规模模型上涉及难以处理的内存和计算成本。我们提出了HyperINF，一种高效且准确的影响函数近似方法，利用了超幂方法，具体来说是Schulz的迭代算法。为了应对计算密集型的矩阵乘法，我们将广义费舍尔信息（GFIM）作为海森矩阵的低秩近似，从而将内存和计算开销降低到与LoRA调整模型的秩无关的恒定成本。我们首先通过矩阵求逆的合成收敛仿真展示了\method相对于其他基线的优越准确性和稳定性。我们通过广泛的真实世界数据归因任务进一步验证了\method的有效性，包括误标记数据检测和用于LLM和VLM微调的数据选择。在LoRA调整模型上，HyperINF实现了卓越的下游性能，同时其他基线遭受了严重的退化。我们的代码库可在https://github.com/Blackzxy/HyperINF 上找到。

更新时间: 2024-10-07 14:42:45

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.05090v1

AI in Archival Science -- A Systematic Review

The rapid expansion of records creates significant challenges in management, including retention and disposition, appraisal, and organization. Our study underscores the benefits of integrating artificial intelligence (AI) within the broad realm of archival science. In this work, we start by performing a thorough analysis to understand the current use of AI in this area and identify the techniques employed to address challenges. Subsequently, we document the results of our review according to specific criteria. Our findings highlight key AI driven strategies that promise to streamline record-keeping processes and enhance data retrieval efficiency. We also demonstrate our review process to ensure transparency regarding our methodology. Furthermore, this review not only outlines the current state of AI in archival science and records management but also lays the groundwork for integrating new techniques to transform archival practices. Our research emphasizes the necessity for enhanced collaboration between the disciplines of artificial intelligence and archival science.

Updated: 2024-10-07 14:39:12

标题: 档案学中的人工智能——系统综述

摘要: 记录的快速扩张在管理方面带来了重大挑战，包括保留和处置、评估和组织。我们的研究强调了在档案学广泛领域内整合人工智能（AI）的益处。在这项工作中，我们首先进行了彻底分析，以了解当前在这一领域中使用AI的情况，并确定用于应对挑战的技术。随后，我们根据特定标准记录了我们审查的结果。我们的发现突出了关键的AI驱动策略，承诺简化记录保管过程并提高数据检索效率。我们还展示了我们的审查过程，以确保对我们的方法论透明。此外，这项审查不仅概述了档案学和记录管理中AI的当前状态，还为整合新技术改造档案实践奠定了基础。我们的研究强调了增强人工智能和档案学之间跨学科合作的必要性。

更新时间: 2024-10-07 14:39:12

领域: cs.DL,cs.AI

下载: http://arxiv.org/abs/2410.09086v1

WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models

Large language models (LLMs) need knowledge updates to meet the ever-growing world facts and correct the hallucinated responses, facilitating the methods of lifelong model editing. Where the updated knowledge resides in memories is a fundamental question for model editing. In this paper, we find that editing either long-term memory (direct model parameters) or working memory (non-parametric knowledge of neural network activations/representations by retrieval) will result in an impossible triangle -- reliability, generalization, and locality can not be realized together in the lifelong editing settings. For long-term memory, directly editing the parameters will cause conflicts with irrelevant pretrained knowledge or previous edits (poor reliability and locality). For working memory, retrieval-based activations can hardly make the model understand the edits and generalize (poor generalization). Therefore, we propose WISE to bridge the gap between memories. In WISE, we design a dual parametric memory scheme, which consists of the main memory for the pretrained knowledge and a side memory for the edited knowledge. We only edit the knowledge in the side memory and train a router to decide which memory to go through when given a query. For continual editing, we devise a knowledge-sharding mechanism where different sets of edits reside in distinct subspaces of parameters, and are subsequently merged into a shared memory without conflicts. Extensive experiments show that WISE can outperform previous model editing methods and overcome the impossible triangle under lifelong model editing of question answering, hallucination, and out-of-distribution settings across trending LLM architectures, e.g., GPT, LLaMA, and Mistral. Code is available at https://github.com/zjunlp/EasyEdit.

Updated: 2024-10-07 14:35:14

标题: WISE：重新思考大型语言模型的终身模型编辑知识记忆

摘要: 大型语言模型需要知识更新，以满足不断增长的世界事实，并纠正虚构的响应，促进终身模型编辑方法的应用。更新的知识存在于记忆中是模型编辑的一个基本问题。在本文中，我们发现编辑长期记忆（直接模型参数）或工作记忆（通过检索非参数化知识的神经网络激活/表示）都将导致一个不可能的三角形--在终身编辑设置中无法实现可靠性、泛化性和局部性。长期记忆方面，直接编辑参数会导致与不相关的预训练知识或先前编辑的冲突（可靠性和局部性差）。对于工作记忆，基于检索的激活几乎无法使模型理解编辑并进行泛化（泛化性差）。因此，我们提出了WISE来弥合记忆之间的差距。在WISE中，我们设计了一个双参数记忆方案，其中包括用于预训练知识的主记忆和用于编辑知识的辅助记忆。我们仅编辑辅助记忆中的知识，并训练一个路由器在给定查询时决定通过哪个记忆。对于持续编辑，我们设计了一种知识分片机制，不同的编辑集存在于参数的不同子空间中，并随后合并到一个共享记忆中，避免冲突。广泛的实验表明，WISE可以优于先前的模型编辑方法，并在问题回答、虚构和超出分布设置等终身模型编辑中克服不可能的三角形，跨越流行的LLM架构，如GPT、LLaMA和Mistral。代码可在https://github.com/zjunlp/EasyEdit 上找到。

更新时间: 2024-10-07 14:35:14

领域: cs.CL,cs.AI,cs.CV,cs.IR,cs.LG

下载: http://arxiv.org/abs/2405.14768v2

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

The advancements of language language models (LLMs) have piqued growing interest in developing LLM-based language agents to automate scientific discovery end-to-end, which has sparked both excitement and skepticism about the true capabilities of such agents. In this work, we argue that for an agent to fully automate scientific discovery, it must be able to complete all essential tasks in the workflow. Thus, we call for rigorous assessment of agents on individual tasks in a scientific workflow before making bold claims on end-to-end automation. To this end, we present ScienceAgentBench, a new benchmark for evaluating language agents for data-driven scientific discovery. To ensure the scientific authenticity and real-world relevance of our benchmark, we extract 102 tasks from 44 peer-reviewed publications in four disciplines and engage nine subject matter experts to validate them. We unify the target output for every task to a self-contained Python program file and employ an array of evaluation metrics to examine the generated programs, execution results, and costs. Each task goes through multiple rounds of manual validation by annotators and subject matter experts to ensure its annotation quality and scientific plausibility. We also propose two effective strategies to mitigate data contamination concerns. Using our benchmark, we evaluate five open-weight and proprietary LLMs, each with three frameworks: direct prompting, OpenHands, and self-debug. Given three attempts for each task, the best-performing agent can only solve 32.4% of the tasks independently and 34.3% with expert-provided knowledge. These results underscore the limited capacities of current language agents in generating code for data-driven discovery, let alone end-to-end automation for scientific research.

Updated: 2024-10-07 14:33:50

标题: ScienceAgentBench：朝着数据驱动科学发现的语言代理的严格评估

摘要: 语言模型（LLMs）的进步引起了对开发基于LLM的语言代理来自动化科学发现的兴趣增长，这引发了对这类代理真实能力的兴奋和怀疑。在这项工作中，我们认为，要完全自动化科学发现，代理必须能够完成工作流中的所有基本任务。因此，我们呼吁在大胆宣称实现端到端自动化之前，对代理在科学工作流中的各项任务进行严格评估。为此，我们提出了ScienceAgentBench，这是一个用于评估基于语言代理的数据驱动科学发现的新基准。为确保我们基准的科学真实性和现实相关性，我们从四个学科的44篇同行评议的出版物中提取了102项任务，并邀请了九位学科专家对其进行验证。我们将每个任务的目标输出统一为一个独立的Python程序文件，并采用一系列评估指标来检查生成的程序、执行结果和成本。每个任务都经过多轮手动验证，由注释者和学科专家确保其注释质量和科学可信度。我们还提出了两种有效的策略来减轻数据污染的担忧。利用我们的基准，我们评估了五个开源和专有的LLMs，每个都有三个框架：直接提示、OpenHands和自我调试。每个任务有三次尝试，表现最佳的代理只能独立解决32.4%的任务，并且在专家提供知识的情况下可以解决34.3%。这些结果凸显了当前语言代理在生成用于数据驱动发现的代码方面的有限能力，更不用说对科学研究的端到端自动化了。

更新时间: 2024-10-07 14:33:50

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05080v1

Towards Embedding Dynamic Personas in Interactive Robots: Masquerading Animated Social Kinematics (MASK)

This paper presents the design and development of an innovative interactive robotic system to enhance audience engagement using character-like personas. Built upon the foundations of persona-driven dialog agents, this work extends the agent's application to the physical realm, employing robots to provide a more captivating and interactive experience. The proposed system, named the Masquerading Animated Social Kinematic (MASK), leverages an anthropomorphic robot which interacts with guests using non-verbal interactions, including facial expressions and gestures. A behavior generation system based upon a finite-state machine structure effectively conditions robotic behavior to convey distinct personas. The MASK framework integrates a perception engine, a behavior selection engine, and a comprehensive action library to enable real-time, dynamic interactions with minimal human intervention in behavior design. Throughout the user subject studies, we examined whether the users could recognize the intended character in both personality- and film-character-based persona conditions. We conclude by discussing the role of personas in interactive agents and the factors to consider for creating an engaging user experience.

Updated: 2024-10-07 14:33:28

标题: 走向在互动机器人中嵌入动态人物角色：假扮动画社交运动学(MASK)

摘要: 这篇论文介绍了设计和开发一种创新的交互式机器人系统，以增强观众参与度，利用类似角色的人物。基于以人物驱动的对话代理为基础，这项工作将代理的应用扩展到物理领域，利用机器人提供更引人入胜和互动体验。所提出的系统名为Masquerading Animated Social Kinematic (MASK)，利用一个拟人化机器人与客人进行非言语互动，包括面部表情和手势。基于有限状态机结构的行为生成系统有效地调节机器人行为以传达不同的人物角色。MASK框架集成了感知引擎、行为选择引擎和全面的动作库，以实现与最少人工干预的实时、动态交互行为设计。在用户主体研究过程中，我们考察用户是否能识别出在个性和电影角色为基础的人物条件下的预期人物。最后，我们讨论了人物在交互式代理中的作用以及为创造引人入胜的用户体验而考虑的因素。

更新时间: 2024-10-07 14:33:28

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2403.10041v2

Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data

Foundation models have recently been shown to be strong data compressors. However, when accounting for their excessive parameter count, their compression ratios are actually inferior to standard compression algorithms. Moreover, naively reducing the number of parameters may not necessarily help as it leads to worse predictions and thus weaker compression. In this paper, we conduct a large-scale empirical study to investigate whether there is a sweet spot where competitive compression ratios with pre-trained vanilla transformers are possible. To this end, we train families of models on 165GB of raw byte sequences of either text, image, or audio data (and all possible combinations of the three) and then compress 1GB of out-of-distribution (OOD) data from each modality. We find that relatively small models (i.e., millions of parameters) can outperform standard general-purpose compression algorithms (gzip, LZMA2) and even domain-specific compressors (PNG, JPEG 2000, FLAC) - even when factoring in parameter count. We achieve, e.g., the lowest compression ratio of 0.49 on OOD audio data (vs. 0.54 for FLAC). To study the impact of model- and dataset scale, we conduct extensive ablations and hyperparameter sweeps, and we investigate the effect of unimodal versus multimodal training. We find that even small models can be trained to perform well on multiple modalities, but, in contrast to previously reported results with large-scale foundation models, transfer to unseen modalities is generally weak.

Updated: 2024-10-07 14:32:03

标题: 通过预训练的Transformer进行压缩：基于字节级多模态数据的研究

摘要: 最近已经证明基础模型是强大的数据压缩器。然而，考虑到它们过多的参数数量，它们的压缩比实际上不如标准压缩算法。此外，简单地减少参数数量并不一定有帮助，因为这会导致更差的预测，从而导致更弱的压缩。在本文中，我们进行了大规模的实证研究，以探讨是否存在一种甜蜜点，可以实现与预训练的普通变压器相竞争的压缩比。为此，我们在165GB的原始字节序列上训练了各种模型族，包括文本、图像或音频数据（以及三者的所有可能组合），然后对每种模态的1GB脱离分布（OOD）数据进行压缩。我们发现相对较小的模型（即百万参数）可以胜过标准通用压缩算法（gzip、LZMA2），甚至领域特定的压缩器（PNG、JPEG 2000、FLAC）-即使考虑参数数量。例如，我们在OOD音频数据上实现了最低的压缩比0.49（FLAC为0.54）。为了研究模型和数据集规模的影响，我们进行了大量的消融和超参数扫描，并研究了单模态与多模态训练的影响。我们发现即使小模型也可以在多个模态上表现良好，但是与以前报道的大规模基础模型的结果相比，对未见模态的迁移通常较弱。

更新时间: 2024-10-07 14:32:03

领域: cs.LG,cs.AI,cs.IT,math.IT

下载: http://arxiv.org/abs/2410.05078v1

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Large language models (LLMs) have driven significant advancements across diverse NLP tasks, with long-context models gaining prominence for handling extended inputs. However, the expanding key-value (KV) cache size required by Transformer architectures intensifies the memory constraints, particularly during the decoding phase, creating a significant bottleneck. Existing sparse attention mechanisms designed to address this bottleneck have two limitations: (1) they often fail to reliably identify the most relevant tokens for attention, and (2) they overlook the spatial coherence of token selection across consecutive Transformer layers, which can lead to performance degradation and substantial overhead in token selection. This paper introduces TidalDecode, a simple yet effective algorithm and system for fast and accurate LLM decoding through position persistent sparse attention. TidalDecode leverages the spatial coherence of tokens selected by existing sparse attention methods and introduces a few token selection layers that perform full attention to identify the tokens with the highest attention scores, while all other layers perform sparse attention with the pre-selected tokens. This design enables TidalDecode to substantially reduce the overhead of token selection for sparse attention without sacrificing the quality of the generated results. Evaluation on a diverse set of LLMs and tasks shows that TidalDecode closely matches the generative performance of full attention methods while reducing the LLM decoding latency by up to 2.1x.

Updated: 2024-10-07 14:30:27

标题: 潮汐解码：具有位置持久稀疏注意力的快速准确LLM解码

摘要: 大型语言模型（LLMs）在各种自然语言处理任务中取得了显著进展，长上下文模型在处理扩展输入方面日益受到重视。然而，Transformer架构所需的扩展键值（KV）缓存大小增加了内存约束，特别是在解码阶段，造成了重大瓶颈。现有的旨在解决这一瓶颈的稀疏注意机制存在两个局限性：（1）它们经常无法可靠地识别最相关的注意力标记，和（2）它们忽视了在连续Transformer层之间进行标记选择的空间连贯性，这可能导致性能下降和标记选择中的显着开销。本文介绍了TidalDecode，这是一种简单而有效的算法和系统，通过位置持久的稀疏注意力实现了快速准确的LLM解码。TidalDecode利用了现有稀疏注意力方法选择的标记的空间连贯性，并引入了一些标记选择层，通过全注意力来识别具有最高注意力分数的标记，而其他层则使用预选的标记进行稀疏注意力。这种设计使得TidalDecode能够大幅减少稀疏注意力中标记选择的开销，而不会牺牲生成结果的质量。对各种LLMs和任务的评估表明，TidalDecode在减少LLM解码延迟高达2.1倍的同时，与全注意力方法的生成性能基本匹配。

更新时间: 2024-10-07 14:30:27

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.05076v1

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data, each equipping a shared base LLM with distinct domain-specific capabilities, activated via self-optimized routing. This allows for dynamic and capability-specific handling of various target tasks, enhancing overall capabilities, without extensive human-labeled data and added parameters. Our empirical results reveal that specializing LLMs may exhibit potential trade-offs in performances on non-specialized tasks. On the other hand, our Self-MoE demonstrates substantial improvements (6.5%p on average) over the base LLM across diverse benchmarks such as knowledge, reasoning, math, and coding. It also consistently outperforms other methods, including instance merging and weight merging, while offering better flexibility and interpretability by design with semantic experts and routing. Our findings highlight the critical role of modularity, the applicability of Self-MoE to multiple base LLMs, and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.

Updated: 2024-10-07 14:27:56

标题: 自身-MoE：朝向具有自身专业专家的组合大语言模型

摘要: 我们提出了Self-MoE，这是一种将单体LLM转变为自我专业化专家的组合模块系统，命名为MiXSE（MiXture of Self-specialized Experts）。我们的方法利用自我专业化，利用自动生成的合成数据构建专家模块，每个模块都配备了独特的领域特定能力，通过自我优化的路由激活。这样可以动态和能力特定地处理各种目标任务，增强整体能力，而无需大量人工标记的数据和添加参数。我们的实证结果显示，专门化的LLMs在非专业化任务上可能存在潜在的性能权衡。另一方面，我们的Self-MoE在各种基准测试中显著改进（平均提高6.5个百分点）基础LLM。它还在设计上通过语义专家和路由提供更好的灵活性和可解释性，始终优于其他方法，包括实例合并和权重合并。我们的研究结果突出了模块化的关键作用，Self-MoE对多个基础LLM的适用性，以及自我提升实现高效、可扩展和适应性系统的潜力。

更新时间: 2024-10-07 14:27:56

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.12034v2

An Elementary Predictor Obtaining $2\sqrt{T}+1$ Distance to Calibration

Blasiok et al. [2023] proposed distance to calibration as a natural measure of calibration error that unlike expected calibration error (ECE) is continuous. Recently, Qiao and Zheng [2024] gave a non-constructive argument establishing the existence of an online predictor that can obtain $O(\sqrt{T})$ distance to calibration in the adversarial setting, which is known to be impossible for ECE. They leave as an open problem finding an explicit, efficient algorithm. We resolve this problem and give an extremely simple, efficient, deterministic algorithm that obtains distance to calibration error at most $2\sqrt{T}+1$.

Updated: 2024-10-07 14:26:56

标题: 一个基本的预测器，获得到校准的距离为$2\sqrt{T}+1$

摘要: Blasiok等人[2023]提出了与预期校准误差（ECE）不同的校准误差的自然度量距离，这种度量是连续的。最近，Qiao和Zheng [2024]通过非构造性论证，证明了存在一种在线预测器可以在对抗设置中获得$O(\sqrt{T})$的校准距离，这对于ECE来说是不可能的。他们将寻找一个明确的、高效的算法作为一个未解决的问题。我们解决了这个问题，并提出了一个非常简单、高效、确定性的算法，可以获得最多$2\sqrt{T}+1$的校准误差距离。

更新时间: 2024-10-07 14:26:56

领域: cs.LG,cs.DS,stat.ML

下载: http://arxiv.org/abs/2402.11410v2

Function Gradient Approximation with Random Shallow ReLU Networks with Control Applications

Neural networks are widely used to approximate unknown functions in control. A common neural network architecture uses a single hidden layer (i.e. a shallow network), in which the input parameters are fixed in advance and only the output parameters are trained. The typical formal analysis asserts that if output parameters exist to approximate the unknown function with sufficient accuracy, then desired control performance can be achieved. A long-standing theoretical gap was that no conditions existed to guarantee that, for the fixed input parameters, required accuracy could be obtained by training the output parameters. Our recent work has partially closed this gap by demonstrating that if input parameters are chosen randomly, then for any sufficiently smooth function, with high-probability there are output parameters resulting in $O((1/m)^{1/2})$ approximation errors, where $m$ is the number of neurons. However, some applications, notably continuous-time value function approximation, require that the network approximates the both the unknown function and its gradient with sufficient accuracy. In this paper, we show that randomly generated input parameters and trained output parameters result in gradient errors of $O((\log(m)/m)^{1/2})$, and additionally, improve the constants from our prior work. We show how to apply the result to policy evaluation problems.

Updated: 2024-10-07 14:26:49

标题: 用具有控制应用的随机浅ReLU网络近似函数梯度

摘要: 神经网络被广泛用于在控制中逼近未知函数。常见的神经网络架构使用单隐藏层（即浅层网络），其中输入参数事先固定，只有输出参数进行训练。典型的形式分析断言，如果输出参数存在以足够准确度逼近未知函数，则可以实现所需的控制性能。长期存在的理论难题是，没有条件可以保证对于固定的输入参数，通过训练输出参数可以获得所需的准确度。我们最近的工作部分地弥补了这一空白，通过证明如果选择随机输入参数，那么对于任何足够平滑的函数，高概率下存在导致$O((1/m)^{1/2})$逼近误差的输出参数，其中$m$是神经元的数量。然而，一些应用，特别是连续时间值函数逼近，要求网络以足够准确度逼近未知函数及其梯度。在本文中，我们展示了随机生成的输入参数和训练的输出参数导致$O((\log(m)/m)^{1/2})$梯度误差，并且进一步改进了我们先前工作中的常数。我们展示了如何将结果应用于策略评估问题。

更新时间: 2024-10-07 14:26:49

领域: cs.LG,cs.SY,eess.SY,math.OC,math.ST,stat.TH

下载: http://arxiv.org/abs/2410.05071v1

Function-Space MCMC for Bayesian Wide Neural Networks

Bayesian Neural Networks represent a fascinating confluence of deep learning and probabilistic reasoning, offering a compelling framework for understanding uncertainty in complex predictive models. In this paper, we investigate the use of the preconditioned Crank-Nicolson algorithm and its Langevin version to sample from the reparametrised posterior distribution of the weights as the widths of Bayesian Neural Networks grow larger. In addition to being robust in the infinite-dimensional setting, we prove that the acceptance probabilities of the proposed methods approach 1 as the width of the network increases, independently of any stepsize tuning. Moreover, we examine and compare how the mixing speeds of the underdamped Langevin Monte Carlo, the preconditioned Crank-Nicolson and the preconditioned Crank-Nicolson Langevin samplers are influenced by changes in the network width in some real-world cases. Our findings suggest that, in wide Bayesian Neural Networks configurations, the preconditioned Crank-Nicolson method allows for more efficient sampling of the reparametrised posterior distribution, as evidenced by a higher effective sample size and improved diagnostic results compared with the other analysed algorithms.

Updated: 2024-10-07 14:23:50

标题: 贝叶斯宽神经网络的函数空间MCMC

摘要: Bayesian神经网络代表了深度学习和概率推理的迷人交汇，为理解复杂预测模型中的不确定性提供了引人入胜的框架。在本文中，我们研究了预处理的Crank-Nicolson算法及其Langevin版本在从重新参数化后的权重后验分布中进行采样时的应用，当Bayesian神经网络的宽度增大时。除了在无限维度的设置中具有稳健性外，我们证明了所提出方法的接受概率在网络宽度增加时趋近于1，与任何步长调整无关。此外，我们研究并比较了在一些真实情况下，欠阻尼Langevin蒙特卡洛、预处理的Crank-Nicolson和预处理的Crank-Nicolson Langevin采样器的混合速度如何受网络宽度的变化影响。我们的发现表明，在宽Bayesian神经网络配置中，预处理的Crank-Nicolson方法允许更有效地对重新参数化后的后验分布进行采样，这表现在更高的有效样本量和改进的诊断结果上，与其他分析的算法相比。

更新时间: 2024-10-07 14:23:50

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2408.14325v3

Data-Centric Foundation Models in Computational Healthcare: A Survey

The advent of foundation models (FMs) as an emerging suite of AI techniques has struck a wave of opportunities in computational healthcare. The interactive nature of these models, guided by pre-training data and human instructions, has ignited a data-centric AI paradigm that emphasizes better data characterization, quality, and scale. In healthcare AI, obtaining and processing high-quality clinical data records has been a longstanding challenge, ranging from data quantity, annotation, patient privacy, and ethics. In this survey, we investigate a wide range of data-centric approaches in the FM era (from model pre-training to inference) towards improving the healthcare workflow. We discuss key perspectives in AI security, assessment, and alignment with human values. Finally, we offer a promising outlook of FM-based analytics to enhance the performance of patient outcome and clinical workflow in the evolving landscape of healthcare and medicine. We provide an up-to-date list of healthcare-related foundation models and datasets at https://github.com/Yunkun-Zhang/Data-Centric-FM-Healthcare .

Updated: 2024-10-07 14:20:42

标题: 计算医疗中的数据中心基础模型：一项调查

摘要: 基金会模型（FMs）作为新兴的一套人工智能技术，已经在计算健康领域带来了一波机遇。这些模型的互动特性，由预训练数据和人类指令引导，点燃了一个强调更好数据特征化、质量和规模的数据中心人工智能范式。在医疗人工智能领域，获取和处理高质量的临床数据记录一直是一个长期存在的挑战，涉及数据量、标注、患者隐私和道德等问题。在这项调查中，我们探讨了FM时代数据中心方法的广泛应用（从模型预训练到推断）以改进医疗工作流程。我们讨论了人工智能安全、评估和与人类价值观的一致性等关键视角。最后，我们提供了一个基于FM的分析的有希望的前景，以增强患者结果和临床工作流程的表现，在医疗和医学领域不断变化的格局中。我们在https://github.com/Yunkun-Zhang/Data-Centric-FM-Healthcare上提供了一个最新的与健康相关的基金会模型和数据集列表。

更新时间: 2024-10-07 14:20:42

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2401.02458v2

Autonomous Evaluation and Refinement of Digital Agents

We show that domain-general automatic evaluators can significantly improve the performance of agents for web navigation and device control. We experiment with multiple evaluation models that trade off between inference cost, modularity of design, and accuracy. We validate the performance of these models in several popular benchmarks for digital agents, finding between 74.4 and 92.9% agreement with oracle evaluation metrics. Finally, we use these evaluators to improve the performance of existing agents via fine-tuning and inference-time guidance. Without any additional supervision, we improve state-of-the-art performance by 29% on the popular benchmark WebArena, and achieve around 75% relative improvement in device control settings.

Updated: 2024-10-07 14:19:37

标题: 数字代理的自主评估和改进

摘要: 我们展示了通用域自动评估器可以显著提高网络导航和设备控制代理的性能。我们尝试了多种评估模型，权衡推理成本、设计模块化和准确性。我们在多个数字代理的流行基准测试中验证了这些模型的性能，发现与神谕评估指标之间的一致性在74.4%到92.9%之间。最后，我们使用这些评估器通过微调和推理时间指导来提高现有代理的性能。在没有额外监督的情况下，我们在流行基准测试WebArena上将最先进的性能提高了29%，在设备控制设置中实现了约75%的相对改进。

更新时间: 2024-10-07 14:19:37

领域: cs.AI

下载: http://arxiv.org/abs/2404.06474v3

PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead

Large language models (LLMs) enhanced with retrieval-augmented generation (RAG) have introduced a new paradigm for web search. However, the limited context awareness of LLMs degrades their performance on RAG tasks. Existing methods to enhance context awareness are often inefficient, incurring time or memory overhead during inference, and many are tailored to specific position embeddings. In this paper, we propose Position-Embedding-Agnostic attention Re-weighting (PEAR), which enhances the context awareness of LLMs with zero inference overhead. Specifically, on a proxy task focused on context copying, we first detect heads which suppress the models' context awareness thereby diminishing RAG performance. To weaken the impact of these heads, we re-weight their outputs with learnable coefficients. The LLM (with frozen parameters) is optimized by adjusting these coefficients to minimize loss on the proxy task. As a result, the coefficients are optimized to values less than one, thereby reducing their tendency to suppress RAG performance. During inference, the optimized coefficients are fixed to re-weight these heads, regardless of the specific task at hand. Our proposed PEAR offers two major advantages over previous approaches: (1) It introduces zero additional inference overhead in terms of memory usage or inference time, while outperforming competitive baselines in accuracy and efficiency across various RAG tasks. (2) It is independent of position embedding algorithms, ensuring broader applicability.

Updated: 2024-10-07 14:17:44

标题: PEAR: 位置嵌入无关的注意力重新加权增强检索增强生成，零推理开销

摘要: 大型语言模型（LLM）与检索增强生成（RAG）相结合，为网络搜索引入了一种新的范式。然而，LLM的有限上下文意识降低了它们在RAG任务上的性能。现有的增强上下文意识的方法通常效率低下，在推断过程中产生时间或内存开销，并且许多方法都针对特定位置嵌入进行了定制。在本文中，我们提出了Position-Embedding-Agnostic attention Re-weighting（PEAR），它通过零推断开销增强LLM的上下文意识。具体来说，在一个聚焦于上下文复制的代理任务上，我们首先检测抑制模型上下文意识的头部，从而降低RAG性能。为了减弱这些头部的影响，我们使用可学习的系数来重新加权它们的输出。通过调整这些系数来最小化代理任务上的损失，优化了具有冻结参数的LLM。因此，这些系数被优化为小于一的值，从而降低了它们抑制RAG性能的倾向。在推断过程中，优化后的系数被固定以重新加权这些头部，无论手头的具体任务是什么。我们提出的PEAR相比以前的方法具有两个主要优点：（1）在内存使用或推断时间方面没有额外的推断开销，同时在各种RAG任务中的准确性和效率方面优于竞争基线。（2）它独立于位置嵌入算法，确保更广泛的适用性。

更新时间: 2024-10-07 14:17:44

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.19745v2

Geodesic Optimization for Predictive Shift Adaptation on EEG data

Electroencephalography (EEG) data is often collected from diverse contexts involving different populations and EEG devices. This variability can induce distribution shifts in the data $X$ and in the biomedical variables of interest $y$, thus limiting the application of supervised machine learning (ML) algorithms. While domain adaptation (DA) methods have been developed to mitigate the impact of these shifts, such methods struggle when distribution shifts occur simultaneously in $X$ and $y$. As state-of-the-art ML models for EEG represent the data by spatial covariance matrices, which lie on the Riemannian manifold of Symmetric Positive Definite (SPD) matrices, it is appealing to study DA techniques operating on the SPD manifold. This paper proposes a novel method termed Geodesic Optimization for Predictive Shift Adaptation (GOPSA) to address test-time multi-source DA for situations in which source domains have distinct $y$ distributions. GOPSA exploits the geodesic structure of the Riemannian manifold to jointly learn a domain-specific re-centering operator representing site-specific intercepts and the regression model. We performed empirical benchmarks on the cross-site generalization of age-prediction models with resting-state EEG data from a large multi-national dataset (HarMNqEEG), which included $14$ recording sites and more than $1500$ human participants. Compared to state-of-the-art methods, our results showed that GOPSA achieved significantly higher performance on three regression metrics ($R^2$, MAE, and Spearman's $\rho$) for several source-target site combinations, highlighting its effectiveness in tackling multi-source DA with predictive shifts in EEG data analysis. Our method has the potential to combine the advantages of mixed-effects modeling with machine learning for biomedical applications of EEG, such as multicenter clinical trials.

Updated: 2024-10-07 14:14:54

标题: 大脑电图数据的预测性移位适应的测地线优化

摘要: 脑电图（EEG）数据通常来自不同背景下的不同人群和不同EEG设备。这种变化可能会导致数据$X$和感兴趣的生物医学变量$y$发生分布转移，从而限制了监督机器学习（ML）算法的应用。虽然已经开发了领域自适应（DA）方法来减轻这些转移的影响，但当$X$和$y$同时发生分布转移时，这些方法往往会遇到困难。由于EEG的最先进的ML模型通过空间协方差矩阵来表示数据，这些矩阵位于对称正定（SPD）矩阵的黎曼流形上，因此在SPD流形上研究DA技术具有吸引力。本文提出了一种名为Geodesic Optimization for Predictive Shift Adaptation（GOPSA）的新方法，用于解决测试时多源DA的情况，其中源域具有不同的$y$分布。GOPSA利用黎曼流形的测地结构共同学习一个代表特定领域重新定位操作符的回归模型。我们在来自一个包括14个记录站点和1500多名参与者的大型跨国数据集（HarMNqEEG）的静息态EEG数据上进行了实证基准测试。与最先进的方法相比，我们的结果表明，GOPSA在几个源目标站点组合上在三个回归指标（$R^2$、MAE和Spearman's $\rho$）上取得了显著更高的性能，突显了其在处理EEG数据分析中具有预测转移的多源DA方面的有效性。我们的方法有潜力将混合效应建模与机器学习相结合，用于EEG的生物医学应用，如多中心临床试验。

更新时间: 2024-10-07 14:14:54

领域: stat.ML,cs.LG,eess.SP

下载: http://arxiv.org/abs/2407.03878v2

FLAME: Adaptive and Reactive Concept Drift Mitigation for Federated Learning Deployments

This paper presents Federated Learning with Adaptive Monitoring and Elimination (FLAME), a novel solution capable of detecting and mitigating concept drift in Federated Learning (FL) Internet of Things (IoT) environments. Concept drift poses significant challenges for FL models deployed in dynamic and real-world settings. FLAME leverages an FL architecture, considers a real-world FL pipeline, and proves capable of maintaining model performance and accuracy while addressing bandwidth and privacy constraints. Introducing various features and extensions on previous works, FLAME offers a robust solution to concept drift, significantly reducing computational load and communication overhead. Compared to well-known lightweight mitigation methods, FLAME demonstrates superior performance in maintaining high F1 scores and reducing resource utilisation in large-scale IoT deployments, making it a promising approach for real-world applications.

Updated: 2024-10-07 14:14:39

标题: FLAME: 适应性和反应性的概念漂移缓解方法，用于联邦学习部署

摘要: 本文介绍了具有自适应监控和消除功能的联邦学习（FL）解决方案FLAME，该解决方案能够在联邦学习（FL）物联网（IoT）环境中检测和减轻概念漂移。概念漂移对于部署在动态和真实世界环境中的FL模型构成重大挑战。FLAME利用FL架构，考虑了真实世界的FL流水线，并证明了在处理带宽和隐私约束的同时能够保持模型性能和准确性。通过引入各种功能和对先前工作的扩展，FLAME提供了一个强大的解决方案来应对概念漂移，显著减少了计算负载和通信开销。与众所周知的轻量级减轻方法相比，FLAME在保持高F1分数和减少资源利用方面表现出优越性，在大规模IoT部署中具有潜在的实际应用前景。

更新时间: 2024-10-07 14:14:39

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.01386v2

SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification

Data curation is the problem of how to collect and organize samples into a dataset that supports efficient learning. Despite the centrality of the task, little work has been devoted towards a large-scale, systematic comparison of various curation methods. In this work, we take steps towards a formal evaluation of data curation strategies and introduce SELECT, the first large-scale benchmark of curation strategies for image classification. In order to generate baseline methods for the SELECT benchmark, we create a new dataset, ImageNet++, which constitutes the largest superset of ImageNet-1K to date. Our dataset extends ImageNet with 5 new training-data shifts, each approximately the size of ImageNet-1K itself, and each assembled using a distinct curation strategy. We evaluate our data curation baselines in two ways: (i) using each training-data shift to train identical image classification models from scratch (ii) using the data itself to fit a pretrained self-supervised representation. Our findings show interesting trends, particularly pertaining to recent methods for data curation such as synthetic data generation and lookup based on CLIP embeddings. We show that although these strategies are highly competitive for certain tasks, the curation strategy used to assemble the original ImageNet-1K dataset remains the gold standard. We anticipate that our benchmark can illuminate the path for new methods to further reduce the gap. We release our checkpoints, code, documentation, and a link to our dataset at https://github.com/jimmyxu123/SELECT.

Updated: 2024-10-07 14:14:38

标题: SELECT：图像分类数据整理策略的大规模基准测试

摘要: 数据策展是如何收集和组织样本以支持高效学习的问题。尽管这项任务的核心性很高，但对各种策展方法进行大规模、系统性比较的工作很少。在这项工作中，我们迈出了向数据策展战略的正式评估的步伐，并引入了SELECT，这是首个面向图像分类的策展战略的大规模基准。为了为SELECT基准生成基线方法，我们创建了一个新数据集ImageNet++，这是迄今为止最大的ImageNet-1K的超集。我们的数据集使用5种新的训练数据转移扩展了ImageNet，每种转移大致相当于ImageNet-1K本身的大小，并且每种都是使用不同的策展策略组装而成。我们以两种方式评估我们的数据策展基线：(i) 使用每个训练数据转移从头开始训练相同的图像分类模型；(ii) 使用数据本身来拟合一个预训练的自监督表示。我们的发现显示出有趣的趋势，特别是与最近的数据策展方法相关，如合成数据生成和基于CLIP嵌入的查找。我们表明，尽管这些策略在某些任务上具有很高的竞争力，但用于组装原始ImageNet-1K数据集的策展策略仍然是金标准。我们预计我们的基准可以为新方法进一步缩小差距的道路提供启示。我们在https://github.com/jimmyxu123/SELECT发布我们的检查点、代码、文档和数据集链接。

更新时间: 2024-10-07 14:14:38

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.05057v1

WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions

Language Models (LMs) are being proposed for mental health applications where the heightened risk of adverse outcomes means predictive performance may not be a sufficient litmus test of a model's utility in clinical practice. A model that can be trusted for practice should have a correspondence between explanation and clinical determination, yet no prior research has examined the attention fidelity of these models and their effect on ground truth explanations. We introduce an evaluation design that focuses on the robustness and explainability of LMs in identifying Wellness Dimensions (WDs). We focus on two existing mental health and well-being datasets: (a) Multi-label Classification-based MultiWD, and (b) WellXplain for evaluating attention mechanism veracity against expert-labeled explanations. The labels are based on Halbert Dunn's theory of wellness, which gives grounding to our evaluation. We reveal four surprising results about LMs/LLMs: (1) Despite their human-like capabilities, GPT-3.5/4 lag behind RoBERTa, and MedAlpaca, a fine-tuned LLM on WellXplain fails to deliver any remarkable improvements in performance or explanations. (2) Re-examining LMs' predictions based on a confidence-oriented loss function reveals a significant performance drop. (3) Across all LMs/LLMs, the alignment between attention and explanations remains low, with LLMs scoring a dismal 0.0. (4) Most mental health-specific LMs/LLMs overlook domain-specific knowledge and undervalue explanations, causing these discrepancies. This study highlights the need for further research into their consistency and explanations in mental health and well-being.

Updated: 2024-10-07 14:08:13

标题: WellDunn：关于语言模型和大型语言模型在识别健康维度方面的鲁棒性和可解释性

摘要: 语言模型（LMs）被提出用于心理健康应用，其中不良结果的风险增加意味着预测性能可能不足以成为模型在临床实践中实用性的唯一标准。一个可以信赖用于实践的模型应该在解释和临床决定之间有对应关系，然而之前没有研究检查这些模型的注意力忠实度及其对真实解释的影响。我们引入了一个评估设计，重点关注LMs在识别健康维度（Wellness Dimensions，WDs）方面的鲁棒性和可解释性。我们关注两个现有的心理健康和幸福数据集：（a）基于多标签分类的MultiWD，以及（b）WellXplain用于评估注意力机制的真实性对比专家标记的解释。标签基于Halbert Dunn的健康理论，为我们的评估提供了基础。我们揭示了关于LMs/LLMs的四个令人惊讶的结果：（1）尽管具有类似于人类的能力，GPT-3.5/4落后于RoBERTa和MedAlpaca，一个在WellXplain上微调的LLM未能在性能或解释方面带来任何显著改进。（2）基于自信导向损失函数重新审视LMs的预测结果显示出显著的性能下降。（3）在所有LMs/LLMs中，注意力和解释之间的对齐仍然很低，LLMs得分为0.0。（4）大多数特定于心理健康的LMs/LLMs忽视了领域特定知识，并且低估解释，导致这些差异。这项研究突出了进一步研究心理健康和幸福领域中的一致性和解释的必要性。

更新时间: 2024-10-07 14:08:13

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.12058v4

MFE-ETP: A Comprehensive Evaluation Benchmark for Multi-modal Foundation Models on Embodied Task Planning

In recent years, Multi-modal Foundation Models (MFMs) and Embodied Artificial Intelligence (EAI) have been advancing side by side at an unprecedented pace. The integration of the two has garnered significant attention from the AI research community. In this work, we attempt to provide an in-depth and comprehensive evaluation of the performance of MFM s on embodied task planning, aiming to shed light on their capabilities and limitations in this domain. To this end, based on the characteristics of embodied task planning, we first develop a systematic evaluation framework, which encapsulates four crucial capabilities of MFMs: object understanding, spatio-temporal perception, task understanding, and embodied reasoning. Following this, we propose a new benchmark, named MFE-ETP, characterized its complex and variable task scenarios, typical yet diverse task types, task instances of varying difficulties, and rich test case types ranging from multiple embodied question answering to embodied task reasoning. Finally, we offer a simple and easy-to-use automatic evaluation platform that enables the automated testing of multiple MFMs on the proposed benchmark. Using the benchmark and evaluation platform, we evaluated several state-of-the-art MFMs and found that they significantly lag behind human-level performance. The MFE-ETP is a high-quality, large-scale, and challenging benchmark relevant to real-world tasks.

Updated: 2024-10-07 14:05:05

标题: MFE-ETP：多模态基础模型在实体任务规划上的全面评估基准

摘要: 近年来，多模态基础模型（MFM）和具身人工智能（EAI）以前所未有的速度并行发展。这两者的整合引起了人工智能研究界的重视。在这项工作中，我们试图对MFM在具身任务规划上的表现进行深入全面的评估，旨在揭示它们在这一领域的能力和局限性。为此，基于具身任务规划的特点，我们首先开发了一个系统性评估框架，涵盖了MFM的四个关键能力：对象理解、时空感知、任务理解和具身推理。接着，我们提出了一个名为MFE-ETP的新基准，特点是其复杂多变的任务场景、典型但多样的任务类型、难度各异的任务实例以及丰富的测试案例类型，从多重具身问题回答到具身任务推理。最后，我们提供了一个简单易用的自动评估平台，可实现对所提出基准上多个MFM的自动测试。利用这个基准和评估平台，我们评估了几种最先进的MFM，并发现它们明显落后于人类水平表现。MFE-ETP是一个质量高、规模大、具有挑战性的基准，与现实世界任务相关。

更新时间: 2024-10-07 14:05:05

领域: cs.AI

下载: http://arxiv.org/abs/2407.05047v3

In-the-loop Hyper-Parameter Optimization for LLM-Based Automated Design of Heuristics

Large Language Models (LLMs) have shown great potential in automatically generating and optimizing (meta)heuristics, making them valuable tools in heuristic optimization tasks. However, LLMs are generally inefficient when it comes to fine-tuning hyper-parameters of the generated algorithms, often requiring excessive queries that lead to high computational and financial costs. This paper presents a novel hybrid approach, LLaMEA-HPO, which integrates the open source LLaMEA (Large Language Model Evolutionary Algorithm) framework with a Hyper-Parameter Optimization (HPO) procedure in the loop. By offloading hyper-parameter tuning to an HPO procedure, the LLaMEA-HPO framework allows the LLM to focus on generating novel algorithmic structures, reducing the number of required LLM queries and improving the overall efficiency of the optimization process. We empirically validate the proposed hybrid framework on benchmark problems, including Online Bin Packing, Black-Box Optimization, and the Traveling Salesperson Problem. Our results demonstrate that LLaMEA-HPO achieves superior or comparable performance compared to existing LLM-driven frameworks while significantly reducing computational costs. This work highlights the importance of separating algorithmic innovation and structural code search from parameter tuning in LLM-driven code optimization and offers a scalable approach to improve the efficiency and effectiveness of LLM-based code generation.

Updated: 2024-10-07 14:04:31

标题: 基于LLM的启发式自动设计的在环超参数优化

摘要: 大型语言模型（LLMs）在自动生成和优化（元）启发式方面展现出巨大潜力，使它们成为启发式优化任务中宝贵的工具。然而，LLMs在微调生成算法的超参数方面通常效率低下，经常需要大量查询，导致高计算和财务成本。本文提出了一种新颖的混合方法LLaMEA-HPO，将开源的LLaMEA（大型语言模型进化算法）框架与循环中的超参数优化（HPO）过程相结合。通过将超参数调整转移到HPO过程，LLaMEA-HPO框架允许LLM专注于生成新颖的算法结构，减少所需的LLM查询数量，提高优化过程的整体效率。我们在基准问题上对所提出的混合框架进行了实证验证，包括在线装箱、黑盒优化和旅行推销员问题。我们的结果表明，LLaMEA-HPO在显著降低计算成本的同时，实现了优越或可比的性能，相较于现有的LLM驱动框架。这项工作强调了在LLM驱动的代码优化中将算法创新和结构代码搜索与参数调整分离的重要性，并提供了一种可扩展的方法来提高LLM-based代码生成的效率和有效性。

更新时间: 2024-10-07 14:04:31

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2410.16309v1

Learning Long Range Dependencies on Graphs via Random Walks

Message-passing graph neural networks (GNNs) excel at capturing local relationships but struggle with long-range dependencies in graphs. In contrast, graph transformers (GTs) enable global information exchange but often oversimplify the graph structure by representing graphs as sets of fixed-length vectors. This work introduces a novel architecture that overcomes the shortcomings of both approaches by combining the long-range information of random walks with local message passing. By treating random walks as sequences, our architecture leverages recent advances in sequence models to effectively capture long-range dependencies within these walks. Based on this concept, we propose a framework that offers (1) more expressive graph representations through random walk sequences, (2) the ability to utilize any sequence model for capturing long-range dependencies, and (3) the flexibility by integrating various GNN and GT architectures. Our experimental evaluations demonstrate that our approach achieves significant performance improvements on 19 graph and node benchmark datasets, notably outperforming existing methods by up to 13\% on the PascalVoc-SP and COCO-SP datasets. The code is available at https://github.com/BorgwardtLab/NeuralWalker.

Updated: 2024-10-07 14:01:11

标题: 通过随机游走学习图上的长程依赖

摘要: 消息传递图神经网络（GNNs）擅长捕捉局部关系，但在图中存在长程依赖时表现不佳。相比之下，图变换器（GTs）可以实现全局信息交换，但通常通过将图表示为固定长度向量集来过于简化图结构。本文介绍了一种新颖的架构，通过将随机游走的长程信息与局部消息传递相结合，克服了这两种方法的缺点。通过将随机游走视为序列，我们的架构利用了序列模型的最新进展，有效捕捉了这些游走中的长程依赖关系。基于这一概念，我们提出了一个框架，通过随机游走序列提供更具表现力的图表示，能够利用任何序列模型来捕获长程依赖关系，并通过集成各种GNN和GT架构来实现灵活性。我们的实验评估表明，我们的方法在19个图和节点基准数据集上取得了显著的性能改进，特别是在PascalVoc-SP和COCO-SP数据集上，超过现有方法高达13％。代码可在https://github.com/BorgwardtLab/NeuralWalker 获取。

更新时间: 2024-10-07 14:01:11

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.03386v2

Named Clinical Entity Recognition Benchmark

This technical report introduces a Named Clinical Entity Recognition Benchmark for evaluating language models in healthcare, addressing the crucial natural language processing (NLP) task of extracting structured information from clinical narratives to support applications like automated coding, clinical trial cohort identification, and clinical decision support. The leaderboard provides a standardized platform for assessing diverse language models, including encoder and decoder architectures, on their ability to identify and classify clinical entities across multiple medical domains. A curated collection of openly available clinical datasets is utilized, encompassing entities such as diseases, symptoms, medications, procedures, and laboratory measurements. Importantly, these entities are standardized according to the Observational Medical Outcomes Partnership (OMOP) Common Data Model, ensuring consistency and interoperability across different healthcare systems and datasets, and a comprehensive evaluation of model performance. Performance of models is primarily assessed using the F1-score, and it is complemented by various assessment modes to provide comprehensive insights into model performance. The report also includes a brief analysis of models evaluated to date, highlighting observed trends and limitations. By establishing this benchmarking framework, the leaderboard aims to promote transparency, facilitate comparative analyses, and drive innovation in clinical entity recognition tasks, addressing the need for robust evaluation methods in healthcare NLP.

Updated: 2024-10-07 14:00:18

标题: 指定的临床实体识别基准

摘要: 这份技术报告介绍了一个名为临床实体识别基准的工具，用于评估医疗领域中的语言模型，在临床叙述中提取结构化信息以支持自动编码、临床试验队列识别和临床决策支持等应用的关键自然语言处理（NLP）任务。排行榜提供了一个标准化平台，用于评估各种语言模型，包括编码器和解码器架构，评估它们在跨多个医学领域中识别和分类临床实体的能力。使用了一个精心策划的开放可用的临床数据集合，包括疾病、症状、药物、程序和实验室测量等实体。重要的是，这些实体根据观察性医学结果合作伙伴关系（OMOP）通用数据模型进行了标准化，确保了在不同医疗系统和数据集之间的一致性和互操作性，以及对模型性能的全面评估。模型的性能主要通过F1分数进行评估，并通过各种评估模式来提供对模型性能的全面洞察。报告还包括对迄今为止评估的模型的简要分析，突出观察到的趋势和局限性。通过建立这个基准框架，排行榜旨在促进透明度，促进比较分析，并推动临床实体识别任务中的创新，解决医疗NLP中强有力评估方法的需求。

更新时间: 2024-10-07 14:00:18

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.05046v1

Can LLMs plan paths with extra hints from solvers?

Large Language Models (LLMs) have shown remarkable capabilities in natural language processing, mathematical problem solving, and tasks related to program synthesis. However, their effectiveness in long-term planning and higher-order reasoning has been noted to be limited and fragile. This paper explores an approach for enhancing LLM performance in solving a classical robotic planning task by integrating solver-generated feedback. We explore four different strategies for providing feedback, including visual feedback, we utilize fine-tuning, and we evaluate the performance of three different LLMs across a 10 standard and 100 more randomly generated planning problems. Our results suggest that the solver-generated feedback improves the LLM's ability to solve the moderately difficult problems, but the harder problems still remain out of reach. The study provides detailed analysis of the effects of the different hinting strategies and the different planning tendencies of the evaluated LLMs.

Updated: 2024-10-07 14:00:08

标题: LLMs能否通过解题者的额外提示规划路径？

摘要: 大型语言模型（LLMs）在自然语言处理、数学问题解决以及与程序合成相关的任务中展现出了显著的能力。然而，它们在长期规划和高阶推理方面的有效性被认为是有限和脆弱的。本文探讨了一种通过整合解决方案生成的反馈来增强LLM在解决经典机器人规划任务中的性能的方法。我们探讨了四种不同的提供反馈的策略，包括视觉反馈，我们利用微调，评估了三种不同LLMs在10个标准和100个更随机生成的规划问题上的性能。我们的结果表明，解决方案生成的反馈提高了LLM解决中等难度问题的能力，但更困难的问题仍然无法解决。该研究提供了对不同提示策略的影响以及评估的LLMs的不同规划倾向的详细分析。

更新时间: 2024-10-07 14:00:08

领域: cs.AI,cs.CL,cs.RO

下载: http://arxiv.org/abs/2410.05045v1

Dynamic Pricing in Securities Lending Market: Application in Revenue Optimization for an Agent Lender Portfolio

Securities lending is an important part of the financial market structure, where agent lenders help long term institutional investors to lend out their securities to short sellers in exchange for a lending fee. Agent lenders within the market seek to optimize revenue by lending out securities at the highest rate possible. Typically, this rate is set by hard-coded business rules or standard supervised machine learning models. These approaches are often difficult to scale and are not adaptive to changing market conditions. Unlike a traditional stock exchange with a centralized limit order book, the securities lending market is organized similarly to an e-commerce marketplace, where agent lenders and borrowers can transact at any agreed price in a bilateral fashion. This similarity suggests that the use of typical methods for addressing dynamic pricing problems in e-commerce could be effective in the securities lending market. We show that existing contextual bandit frameworks can be successfully utilized in the securities lending market. Using offline evaluation on real historical data, we show that the contextual bandit approach can consistently outperform typical approaches by at least 15% in terms of total revenue generated.

Updated: 2024-10-07 13:59:42

标题: 证券借贷市场的动态定价：在代理出借人组合的收入优化中的应用

摘要: 证券借贷是金融市场结构的重要组成部分，代理借出人帮助长期机构投资者将其证券借给做空者，以换取借贷费。市场中的代理借出人致力于通过以可能的最高利率借出证券来优化收入。通常，这一利率由硬编码的业务规则或标准监督的机器学习模型确定。这些方法经常难以扩展，并且不适应不断变化的市场条件。与具有集中限价订单簿的传统股票交易所不同，证券借贷市场的组织方式类似于电子商务市场，代理借出人和借方可以双边协商任何同意的价格进行交易。这种相似性表明，在证券借贷市场中使用典型的动态定价问题解决方法可能是有效的。我们展示了现有的情境强盗框架可以成功地应用于证券借贷市场。通过对真实历史数据的离线评估，我们展示了情境强盗方法在总收入产生方面至少能够比典型方法提高15%以上的表现。

更新时间: 2024-10-07 13:59:42

领域: q-fin.TR,cs.LG

下载: http://arxiv.org/abs/2407.13687v4

PhotoReg: Photometrically Registering 3D Gaussian Splatting Models

Building accurate representations of the environment is critical for intelligent robots to make decisions during deployment. Advances in photorealistic environment models have enabled robots to develop hyper-realistic reconstructions, which can be used to generate images that are intuitive for human inspection. In particular, the recently introduced \ac{3DGS}, which describes the scene with up to millions of primitive ellipsoids, can be rendered in real time. \ac{3DGS} has rapidly gained prominence. However, a critical unsolved problem persists: how can we fuse multiple \ac{3DGS} into a single coherent model? Solving this problem will enable robot teams to jointly build \ac{3DGS} models of their surroundings. A key insight of this work is to leverage the {duality} between photorealistic reconstructions, which render realistic 2D images from 3D structure, and \emph{3D foundation models}, which predict 3D structure from image pairs. To this end, we develop PhotoReg, a framework to register multiple photorealistic \ac{3DGS} models with 3D foundation models. As \ac{3DGS} models are generally built from monocular camera images, they have \emph{arbitrary scale}. To resolve this, PhotoReg actively enforces scale consistency among the different \ac{3DGS} models by considering depth estimates within these models. Then, the alignment is iteratively refined with fine-grained photometric losses to produce high-quality fused \ac{3DGS} models. We rigorously evaluate PhotoReg on both standard benchmark datasets and our custom-collected datasets, including with two quadruped robots. The code is released at \url{ziweny11.github.io/photoreg}.

Updated: 2024-10-07 13:58:40

标题: PhotoReg：光度注册3D高斯喷洒模型

摘要: 建立准确的环境表示对于智能机器人在部署过程中做出决策至关重要。逼真环境模型的进展使机器人能够开发超逼真的重建，这些重建可用于生成对人类检查直觉的图像。特别是最近引入的3DGS，它用数百万个基本椭球描述场景，可以实时渲染。3DGS迅速崭露头角。然而，一个关键的未解决问题仍然存在：我们如何将多个3DGS融合成一个统一的模型？解决这个问题将使机器人团队共同构建其周围环境的3DGS模型成为可能。本研究的一个关键见解是利用逼真重建和3D基础模型之间的对偶关系，逼真重建从3D结构渲染出逼真的2D图像，而3D基础模型从图像对中预测3D结构。为此，我们开发了PhotoReg，一个框架，用于将多个逼真的3DGS模型与3D基础模型进行配准。由于3DGS模型通常是从单眼相机图像构建的，它们具有任意的比例。为了解决这个问题，PhotoReg通过考虑这些模型内的深度估计来主动强化不同3DGS模型之间的比例一致性。然后，通过精细的光度损失迭代地对齐，以产生高质量的融合3DGS模型。我们严格评估了PhotoReg在标准基准数据集和我们自定义收集的数据集上的性能，包括两个四足机器人。代码发布在\url{ziweny11.github.io/photoreg}。

更新时间: 2024-10-07 13:58:40

领域: cs.RO,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.05044v1

Systematic Literature Review of Vision-Based Approaches to Outdoor Livestock Monitoring with Lessons from Wildlife Studies

Precision livestock farming (PLF) aims to improve the health and welfare of livestock animals and farming outcomes through the use of advanced technologies. Computer vision, combined with recent advances in machine learning and deep learning artificial intelligence approaches, offers a possible solution to the PLF ideal of 24/7 livestock monitoring that helps facilitate early detection of animal health and welfare issues. However, a significant number of livestock species are raised in large outdoor habitats that pose technological challenges for computer vision approaches. This review provides a comprehensive overview of computer vision methods and open challenges in outdoor animal monitoring. We include research from both the livestock and wildlife fields in the review because of the similarities in appearance, behaviour, and habitat for many livestock and wildlife. We focus on large terrestrial mammals, such as cattle, horses, deer, goats, sheep, koalas, giraffes, and elephants. We use an image processing pipeline to frame our discussion and highlight the current capabilities and open technical challenges at each stage of the pipeline. The review found a clear trend towards the use of deep learning approaches for animal detection, counting, and multi-species classification. We discuss in detail the applicability of current vision-based methods to PLF contexts and promising directions for future research.

Updated: 2024-10-07 13:53:17

标题: 基于视觉的室外畜牧监测方法的系统文献综述及野生动物研究经验分享

摘要: 精准畜牧养殖（PLF）旨在通过使用先进技术改善畜禽动物的健康和福利，以及农业产出。计算机视觉结合最新的机器学习和深度学习人工智能方法，为PLF理想的24/7畜禽监测提供了可能的解决方案，有助于促进动物健康和福利问题的早期检测。然而，大量畜牧物种生长在大型户外栖息地，这对计算机视觉方法提出了技术挑战。本综述提供了户外动物监测中计算机视觉方法和开放挑战的全面概述。由于许多畜禽和野生动物在外观、行为和栖息地方面的相似之处，我们在综述中包括了来自畜禽和野生动物领域的研究。我们专注于大型陆生哺乳动物，如牛、马、鹿、山羊、绵羊、考拉、长颈鹿和大象。我们使用图像处理管道来构建我们的讨论，并突出了管道各阶段的当前能力和开放技术挑战。综述发现，对于动物检测、计数和多物种分类，存在明显向深度学习方法的趋势。我们详细讨论了当前基于视觉的方法在PLF环境中的适用性以及未来研究的有前途的方向。

更新时间: 2024-10-07 13:53:17

领域: cs.CV,cs.LG,I.2.10; I.2.6; J.7

下载: http://arxiv.org/abs/2410.05041v1

MOFFlow: Flow Matching for Structure Prediction of Metal-Organic Frameworks

Metal-organic frameworks (MOFs) are a class of crystalline materials with promising applications in many areas such as carbon capture and drug delivery. In this work, we introduce MOFFlow, the first deep generative model tailored for MOF structure prediction. Existing approaches, including ab initio calculations and even deep generative models, struggle with the complexity of MOF structures due to the large number of atoms in the unit cells. To address this limitation, we propose a novel Riemannian flow matching framework that reduces the dimensionality of the problem by treating the metal nodes and organic linkers as rigid bodies, capitalizing on the inherent modularity of MOFs. By operating in the $SE(3)$ space, MOFFlow effectively captures the roto-translational dynamics of these rigid components in a scalable way. Our experiment demonstrates that MOFFlow accurately predicts MOF structures containing several hundred atoms, significantly outperforming conventional methods and state-of-the-art machine learning baselines while being much faster.

Updated: 2024-10-07 13:51:58

标题: MOFFlow：金属-有机骨架结构预测的流匹配

摘要: 金属有机框架（MOFs）是一类具有许多应用前景的结晶材料，如碳捕集和药物输送。在这项工作中，我们介绍了MOFFlow，这是第一个专为MOF结构预测定制的深度生成模型。现有的方法，包括从头计算甚至深度生成模型，都难以处理由于单元格中大量原子而导致的MOF结构的复杂性。为了解决这一限制，我们提出了一种新颖的黎曼流匹配框架，通过将金属节点和有机连接器视为刚性体来减少问题的维度，利用MOFs的固有模块化。通过在$SE(3)$空间中操作，MOFFlow有效地捕捉了这些刚性组件的旋转平移动力学方式。我们的实验表明，MOFFlow能够准确预测含有数百个原子的MOF结构，明显优于传统方法和最先进的机器学习基线，同时速度更快。

更新时间: 2024-10-07 13:51:58

领域: q-bio.BM,cond-mat.mtrl-sci,cs.LG

下载: http://arxiv.org/abs/2410.17270v1

Active Fine-Tuning of Generalist Policies

Pre-trained generalist policies are rapidly gaining relevance in robot learning due to their promise of fast adaptation to novel, in-domain tasks. This adaptation often relies on collecting new demonstrations for a specific task of interest and applying imitation learning algorithms, such as behavioral cloning. However, as soon as several tasks need to be learned, we must decide which tasks should be demonstrated and how often? We study this multi-task problem and explore an interactive framework in which the agent adaptively selects the tasks to be demonstrated. We propose AMF (Active Multi-task Fine-tuning), an algorithm to maximize multi-task policy performance under a limited demonstration budget by collecting demonstrations yielding the largest information gain on the expert policy. We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness to efficiently fine-tune neural policies in complex and high-dimensional environments.

Updated: 2024-10-07 13:26:36

标题: 通用策略的主动微调

摘要: 预训练的通用政策由于其快速适应新领域任务的承诺，正在机器人学习中迅速变得重要。这种适应性通常依赖于收集特定任务的新演示，并应用模仿学习算法，如行为克隆。然而，一旦需要学习几个任务，我们必须决定应该演示哪些任务以及多频繁？我们研究这个多任务问题，并在其中探索一个交互式框架，代理人自适应地选择要演示的任务。我们提出了AMF（主动多任务微调）算法，通过收集产生对专家政策信息增益最大的演示，来最大化有限演示预算下的多任务政策性能。我们在正则性假设下推导出AMF的性能保证，并展示了它在复杂和高维环境中有效地微调神经政策的经验有效性。

更新时间: 2024-10-07 13:26:36

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2410.05026v1

I Bet You Did Not Mean That: Testing Semantic Importance via Betting

Recent works have extended notions of feature importance to semantic concepts that are inherently interpretable to the users interacting with a black-box predictive model. Yet, precise statistical guarantees, such as false positive rate and false discovery rate control, are needed to communicate findings transparently and to avoid unintended consequences in real-world scenarios. In this paper, we formalize the global (i.e., over a population) and local (i.e., for a sample) statistical importance of semantic concepts for the predictions of opaque models by means of conditional independence, which allows for rigorous testing. We use recent ideas of sequential kernelized independence testing (SKIT) to induce a rank of importance across concepts, and showcase the effectiveness and flexibility of our framework on synthetic datasets as well as on image classification tasks using several and diverse vision-language models.

Updated: 2024-10-07 13:21:13

标题: 我打赌你不是那个意思：通过下注测试语义重要性

摘要: 最近的研究将特征重要性的概念扩展到语义概念，这些概念对与黑盒预测模型交互的用户来说是可以解释的。然而，精确的统计保证，如假阳性率和假发现率的控制，是必要的，以便透明地传达发现结果，并避免在现实场景中出现意外后果。在本文中，我们通过条件独立性形式化了语义概念对不透明模型预测的全局（即对于一个人口）和局部（即对于一个样本）的统计重要性，这允许进行严格的测试。我们使用最近的序贯核独立性检验（SKIT）的思想来引导概念之间的重要性排序，并展示了我们的框架在合成数据集以及使用多种不同的视觉语言模型进行图像分类任务上的有效性和灵活性。

更新时间: 2024-10-07 13:21:13

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.19146v2

FRIDA: Free-Rider Detection using Privacy Attacks

Federated learning is increasingly popular as it enables multiple parties with limited datasets and resources to train a high-performing machine learning model collaboratively. However, similarly to other collaborative systems, federated learning is vulnerable to free-riders -- participants who do not contribute to the training but still benefit from the shared model. Free-riders not only compromise the integrity of the learning process but also slow down the convergence of the global model, resulting in increased costs for the honest participants. To address this challenge, we propose FRIDA: free-rider detection using privacy attacks, a framework that leverages inference attacks to detect free-riders. Unlike traditional methods that only capture the implicit effects of free-riding, FRIDA directly infers details of the underlying training datasets, revealing characteristics that indicate free-rider behaviour. Through extensive experiments, we demonstrate that membership and property inference attacks are effective for this purpose. Our evaluation shows that FRIDA outperforms state-of-the-art methods, especially in non-IID settings.

Updated: 2024-10-07 13:20:26

标题: FRIDA：使用隐私攻击进行免费骑手检测

摘要: 联邦学习因允许具有有限数据集和资源的多方协作训练高性能机器学习模型而越来越受欢迎。然而，类似其他协作系统，联邦学习容易受到搭便车者的影响，即那些不参与训练但仍从共享模型中获益的参与者。搭便车者不仅危害学习过程的完整性，还减缓全局模型的收敛速度，导致诚实参与者的成本增加。为了解决这一挑战，我们提出了FRIDA：使用隐私攻击检测搭便车者的框架，该框架利用推断攻击来检测搭便车者。与仅捕获搭便车隐含效果的传统方法不同，FRIDA直接推断潜在训练数据集的细节，揭示表明搭便车行为的特征。通过大量实验，我们证明成员资格和属性推断攻击对此目的是有效的。我们的评估显示，FRIDA在非独立同分布设置中表现优异，超过了最先进的方法。

更新时间: 2024-10-07 13:20:26

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2410.05020v1

When Can Transformers Count to n?

Large language models based on the transformer architectures can solve highly complex tasks. But are there simple tasks that such models cannot solve? Here we focus on very simple counting tasks, that involve counting how many times a token in the vocabulary have appeared in a string. We show that if the dimension of the transformer state is linear in the context length, this task can be solved. However, the solution we propose does not scale beyond this limit, and we provide theoretical arguments for why it is likely impossible for a size limited transformer to implement this task. Our empirical results demonstrate the same phase-transition in performance, as anticipated by the theoretical argument. Our results demonstrate the importance of understanding how transformers can solve simple tasks.

Updated: 2024-10-07 13:19:53

标题: 变压器何时能计数到n？

摘要: 基于Transformer架构的大型语言模型可以解决非常复杂的任务。但是，是否存在一些简单的任务这样的模型无法解决呢？在这里，我们专注于非常简单的计数任务，涉及计算词汇表中的一个标记在字符串中出现了多少次。我们展示了如果Transformer状态的维度与上下文长度成线性关系，这个任务是可以解决的。然而，我们提出的解决方案在超出这个限制后无法扩展，并且我们提供了理论论据说明对于一个尺寸有限的Transformer来说实现这个任务很可能是不可能的。我们的实证结果展示了性能的相同相位转变，与理论论证预期一致。我们的结果展示了了解Transformer如何解决简单任务的重要性。

更新时间: 2024-10-07 13:19:53

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.15160v2

RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement

Neural multi-channel speech enhancement models, in particular those based on the U-Net architecture, demonstrate promising performance and generalization potential. These models typically encode input channels independently, and integrate the channels during later stages of the network. In this paper, we propose a novel modification of these models by incorporating relative information from the outset, where each channel is processed in conjunction with a reference channel through stacking. This input strategy exploits comparative differences to adaptively fuse information between channels, thereby capturing crucial spatial information and enhancing the overall performance. The experiments conducted on the CHiME-3 dataset demonstrate improvements in speech enhancement metrics across various architectures.

Updated: 2024-10-07 13:19:10

标题: RelUNet：用于多通道语音增强的相对通道融合U-Net

摘要: 神经多通道语音增强模型，特别是基于U-Net架构的模型，展现出有希望的性能和泛化潜力。这些模型通常独立编码输入通道，并在网络的后期阶段集成这些通道。在本文中，我们提出了对这些模型的一种新颖修改，通过在一开始就结合参考通道处理每个通道来引入相对信息。这种输入策略利用比较差异来自适应地融合通道之间的信息，从而捕获关键的空间信息并提高整体性能。在CHiME-3数据集上进行的实验表明，在各种架构中增强了语音增强指标。

更新时间: 2024-10-07 13:19:10

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2410.05019v1

Tokenization Is More Than Compression

Tokenization is a foundational step in natural language processing (NLP) tasks, bridging raw text and language models. Existing tokenization approaches like Byte-Pair Encoding (BPE) originate from the field of data compression, and it has been suggested that the effectiveness of BPE stems from its ability to condense text into a relatively small number of tokens. We test the hypothesis that fewer tokens lead to better downstream performance by introducing PathPiece, a new tokenizer that segments a document's text into the minimum number of tokens for a given vocabulary. Through extensive experimentation we find this hypothesis not to be the case, casting doubt on the understanding of the reasons for effective tokenization. To examine which other factors play a role, we evaluate design decisions across all three phases of tokenization: pre-tokenization, vocabulary construction, and segmentation, offering new insights into the design of effective tokenizers. Specifically, we illustrate the importance of pre-tokenization and the benefits of using BPE to initialize vocabulary construction. We train 64 language models with varying tokenization, ranging in size from 350M to 2.4B parameters, all of which are made publicly available.

Updated: 2024-10-07 13:17:03

标题: Tokenization不只是压缩

摘要: 标记化是自然语言处理（NLP）任务中的基础步骤，它连接了原始文本和语言模型。现有的标记化方法如字节对编码（BPE）源自数据压缩领域，并且有人认为BPE的有效性源于其将文本压缩为相对较少的标记的能力。我们通过引入PathPiece，一个将文档文本分段为给定词汇表的最少标记数的新标记器，来测试这一假设。通过广泛的实验，我们发现这一假设并不成立，对有效标记化的原因的理解产生了疑问。为了检查哪些其他因素起作用，我们评估了标记化的三个阶段中的设计决策：预标记化、词汇构建和分割，为有效标记器的设计提供了新的见解。具体来说，我们阐明了预标记化的重要性以及使用BPE初始化词汇构建的好处。我们训练了64个语言模型，这些模型的标记化方式各不相同，参数大小在350M到2.4B之间，所有这些模型都已公开。

更新时间: 2024-10-07 13:17:03

领域: cs.CL,cs.AI,68T50,I.2.7

下载: http://arxiv.org/abs/2402.18376v2

T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data

Self-supervision is often used for pre-training to foster performance on a downstream task by constructing meaningful representations of samples. Self-supervised learning (SSL) generally involves generating different views of the same sample and thus requires data augmentations that are challenging to construct for tabular data. This constitutes one of the main challenges of self-supervision for structured data. In the present work, we propose a novel augmentation-free SSL method for tabular data. Our approach, T-JEPA, relies on a Joint Embedding Predictive Architecture (JEPA) and is akin to mask reconstruction in the latent space. It involves predicting the latent representation of one subset of features from the latent representation of a different subset within the same sample, thereby learning rich representations without augmentations. We use our method as a pre-training technique and train several deep classifiers on the obtained representation. Our experimental results demonstrate a substantial improvement in both classification and regression tasks, outperforming models trained directly on samples in their original data space. Moreover, T-JEPA enables some methods to consistently outperform or match the performance of traditional methods likes Gradient Boosted Decision Trees. To understand why, we extensively characterize the obtained representations and show that T-JEPA effectively identifies relevant features for downstream tasks without access to the labels. Additionally, we introduce regularization tokens, a novel regularization method critical for training of JEPA-based models on structured data.

Updated: 2024-10-07 13:15:07

标题: T-JEPA：无增强的自监督学习方法用于表格数据

摘要: 自我监督通常用于预训练，通过构建样本的有意义表示来促进下游任务的性能。自监督学习（SSL）通常涉及生成同一样本的不同视图，因此需要对表格数据进行挑战性的数据增强。这构成了自我监督在结构化数据中的主要挑战之一。在本研究中，我们提出了一种新颖的无增强自监督学习方法，适用于表格数据。我们的方法T-JEPA依赖于联合嵌入预测架构（JEPA），类似于潜在空间中的掩码重建。它涉及从同一样本中不同子集的潜在表示中预测另一个子集的潜在表示，从而学习丰富的表示而不需要增强。我们将我们的方法作为预训练技术，并在获得的表示上训练几个深度分类器。我们的实验结果表明，在分类和回归任务中，我们的方法实现了显著的改进，优于直接在原始数据空间中训练的模型。此外，T-JEPA使一些方法能够始终优于或与传统方法如梯度提升决策树的性能匹敌。为了理解原因，我们对获得的表示进行了广泛的表征，并展示T-JEPA能够有效地识别与下游任务相关的特征，而无需访问标签。另外，我们引入了正则化令牌，这是一种对于JEPA模型在结构化数据上进行训练至关重要的新型正则化方法。

更新时间: 2024-10-07 13:15:07

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.05016v1

softmax is not enough (for sharp out-of-distribution)

A key property of reasoning systems is the ability to make sharp decisions on their input data. For contemporary AI systems, a key carrier of sharp behaviour is the softmax function, with its capability to perform differentiable query-key lookups. It is a common belief that the predictive power of networks leveraging softmax arises from "circuits" which sharply perform certain kinds of computations consistently across many diverse inputs. However, for these circuits to be robust, they would need to generalise well to arbitrary valid inputs. In this paper, we dispel this myth: even for tasks as simple as finding the maximum key, any learned circuitry must disperse as the number of items grows at test time. We attribute this to a fundamental limitation of the softmax function to robustly approximate sharp functions, prove this phenomenon theoretically, and propose adaptive temperature as an ad-hoc technique for improving the sharpness of softmax at inference time.

Updated: 2024-10-07 13:13:41

标题: Softmax并不足够（对于尖锐的超出分布）

摘要: 推理系统的一个关键特性是能够对其输入数据做出明确的决定。对于当代人工智能系统来说，锋利行为的一个关键载体是softmax函数，它具有执行可微分查询-键查找的能力。人们普遍认为，利用softmax的网络的预测能力来自于“电路”，这些电路在许多不同的输入上始终能够明确执行某些类型的计算。然而，为了使这些电路具有鲁棒性，它们需要能够很好地泛化到任意有效的输入。在本文中，我们揭穿了这个谬论：即使对于像查找最大键这样简单的任务，任何学习到的电路在测试时随着项目数量的增加必须分散。我们将这归因于softmax函数在鲁棒地逼近锐利函数方面存在基本限制，从理论上证明了这一现象，并提出了自适应温度作为一种临时技术，用于在推断时改善softmax的锐度。

更新时间: 2024-10-07 13:13:41

领域: cs.LG,cs.AI,cs.IT,math.IT

下载: http://arxiv.org/abs/2410.01104v2

On Bits and Bandits: Quantifying the Regret-Information Trade-off

In many sequential decision problems, an agent performs a repeated task. He then suffers regret and obtains information that he may use in the following rounds. However, sometimes the agent may also obtain information and avoid suffering regret by querying external sources. We study the trade-off between the information an agent accumulates and the regret it suffers. We invoke information-theoretic methods for obtaining regret lower bounds, that also allow us to easily re-derive several known lower bounds. We introduce the first Bayesian regret lower bounds that depend on the information an agent accumulates. We also prove regret upper bounds using the amount of information the agent accumulates. These bounds show that information measured in bits, can be traded off for regret, measured in reward. Finally, we demonstrate the utility of these bounds in improving the performance of a question-answering task with large language models, allowing us to obtain valuable insights.

Updated: 2024-10-07 13:12:20

标题: 关于位和强盗：量化后悔信息权衡

摘要: 在许多顺序决策问题中，一个代理执行重复的任务。然后他遭受后悔，并获得信息，他可以在接下来的回合中使用。然而，有时代理还可以通过查询外部来源获取信息并避免遭受后悔。我们研究了代理积累信息和遭受后悔之间的权衡。我们采用信息论方法来获得后悔的下限，这也让我们能够轻松重新推导出几个已知的下限。我们引入了第一个依赖于代理积累信息的贝叶斯后悔下限。我们还使用代理积累的信息来证明后悔的上限。这些界限显示，信息以比特为单位，可以用来换取以奖励为单位的后悔。最后，我们展示了这些界限在改善具有大型语言模型的问答任务性能方面的实用性，使我们能够获得有价值的见解。

更新时间: 2024-10-07 13:12:20

领域: cs.LG

下载: http://arxiv.org/abs/2405.16581v3

QML-IDS: Quantum Machine Learning Intrusion Detection System

The emergence of quantum computing and related technologies presents opportunities for enhancing network security. The transition towards quantum computational power paves the way for creating strategies to mitigate the constantly advancing threats to network integrity. In response to this technological advancement, our research presents QML-IDS, a novel Intrusion Detection System~(IDS) that combines quantum and classical computing techniques. QML-IDS employs Quantum Machine Learning~(QML) methodologies to analyze network patterns and detect attack activities. Through extensive experimental tests on publicly available datasets, we show that QML-IDS is effective at attack detection and performs well in binary and multiclass classification tasks. Our findings reveal that QML-IDS outperforms classical Machine Learning methods, demonstrating the promise of quantum-enhanced cybersecurity solutions for the age of quantum utility.

Updated: 2024-10-07 13:07:41

标题: QML-IDS：量子机器学习入侵检测系统

摘要: 量子计算及相关技术的出现为增强网络安全提供了机遇。向量子计算能力的过渡为创建应对不断发展的网络完整性威胁的策略铺平了道路。针对这一技术进步，我们的研究提出了QML-IDS，一种结合量子和经典计算技术的新型入侵检测系统（IDS）。QML-IDS采用量子机器学习（QML）方法来分析网络模式并检测攻击活动。通过对公开可用数据集进行广泛的实验测试，我们展示了QML-IDS在攻击检测方面的有效性，并在二进制和多类别分类任务中表现良好。我们的研究结果显示，QML-IDS优于经典机器学习方法，展示了量子增强网络安全解决方案在量子效用时代的潜力。

更新时间: 2024-10-07 13:07:41

领域: cs.CR

下载: http://arxiv.org/abs/2410.16308v1

How to Exhibit More Predictable Behaviors

This paper looks at predictability problems, i.e., wherein an agent must choose its strategy in order to optimize the predictions that an external observer could make. We address these problems while taking into account uncertainties on the environment dynamics and on the observed agent's policy. To that end, we assume that the observer 1. seeks to predict the agent's future action or state at each time step, and 2. models the agent using a stochastic policy computed from a known underlying problem, and we leverage on the framework of observer-aware Markov decision processes (OAMDPs). We propose action and state predictability performance criteria through reward functions built on the observer's belief about the agent policy; show that these induced predictable OAMDPs can be represented by goal-oriented or discounted MDPs; and analyze the properties of the proposed reward functions both theoretically and empirically on two types of grid-world problems.

Updated: 2024-10-07 13:06:01

标题: 如何展示更可预测的行为

摘要: 本文研究了可预测性问题，即代理必须选择其策略以优化外部观察者可能做出的预测。我们在考虑环境动态和观察到的代理策略的不确定性的情况下解决这些问题。为此，我们假设观察者1. 寻求在每个时间步预测代理的未来动作或状态，2. 使用从已知基础问题计算的随机策略来模拟代理，并利用观察者感知的马尔可夫决策过程（OAMDP）框架。我们通过基于观察者对代理策略的信念构建的奖励函数提出了动作和状态可预测性绩效标准；表明这些引导的可预测性OAMDP可以通过目标导向或折扣MDP来表示；并在两种类型的网格世界问题上从理论和实证的角度分析了所提出的奖励函数的属性。

更新时间: 2024-10-07 13:06:01

领域: cs.AI

下载: http://arxiv.org/abs/2404.11296v2

Towards the generation of hierarchical attack models from cybersecurity vulnerabilities using language models

This paper investigates the use of a pre-trained language model and siamese network to discern sibling relationships between text-based cybersecurity vulnerability data. The ultimate purpose of the approach presented in this paper is towards the construction of hierarchical attack models based on a set of text descriptions characterising potential/observed vulnerabilities in a given system. Due to the nature of the data, and the uncertainty sensitive environment in which the problem is presented, a practically oriented soft computing approach is necessary. Therefore, a key focus of this work is to investigate practical questions surrounding the reliability of predicted links towards the construction of such models, to which end conceptual and practical challenges and solutions associated with the proposed approach are outlined, such as dataset complexity and stability of predictions. Accordingly, the contributions of this paper focus on producing neural networks using a pre-trained language model for predicting sibling relationships between cybersecurity vulnerabilities, then outlining how to apply this capability towards the generation of hierarchical attack models. In addition, two data sampling mechanisms for tackling data complexity, and a consensus mechanism for reducing the amount of false positive predictions are outlined. Each of these approaches is compared and contrasted using empirical results from three sets of cybersecurity data to determine their effectiveness.

Updated: 2024-10-07 13:05:33

标题: 朝向利用语言模型从网络安全漏洞生成层次攻击模型的方向

摘要: 本文研究了使用预训练语言模型和连体网络来区分基于文本的网络安全漏洞数据之间的兄弟关系。本文提出的方法的最终目的是基于描述给定系统中潜在/观察到的漏洞的一组文本构建分层攻击模型。由于数据的性质以及问题所处的不确定性敏感环境，需要一个实际取向的软计算方法。因此，本工作的一个重点是研究围绕预测链接的可靠性向构建这种模型的实际问题，为此，概念和实际挑战以及与所提出的方法相关的解决方案被概述，如数据集复杂性和预测的稳定性。因此，本文的贡献重点在于使用预训练语言模型产生神经网络，以预测网络安全漏洞之间的兄弟关系，然后概述如何将这种能力应用于生成分层攻击模型。此外，还概述了两种用于处理数据复杂性的数据抽样机制，以及用于减少假阳性预测数量的共识机制。通过对来自三组网络安全数据的实证结果进行比较和对比，确定这些方法的有效性。

更新时间: 2024-10-07 13:05:33

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05351v1

FairFML: Fair Federated Machine Learning with a Case Study on Reducing Gender Disparities in Cardiac Arrest Outcome Prediction

Objective: Mitigating algorithmic disparities is a critical challenge in healthcare research, where ensuring equity and fairness is paramount. While large-scale healthcare data exist across multiple institutions, cross-institutional collaborations often face privacy constraints, highlighting the need for privacy-preserving solutions that also promote fairness. Materials and Methods: In this study, we present Fair Federated Machine Learning (FairFML), a model-agnostic solution designed to reduce algorithmic bias in cross-institutional healthcare collaborations while preserving patient privacy. As a proof of concept, we validated FairFML using a real-world clinical case study focused on reducing gender disparities in cardiac arrest outcome prediction. Results: We demonstrate that the proposed FairFML framework enhances fairness in federated learning (FL) models without compromising predictive performance. Our findings show that FairFML improves model fairness by up to 65% compared to the centralized model, while maintaining performance comparable to both local and centralized models, as measured by receiver operating characteristic analysis. Discussion and Conclusion: FairFML offers a promising and flexible solution for FL collaborations, with its adaptability allowing seamless integration with various FL frameworks and models, from traditional statistical methods to deep learning techniques. This makes FairFML a robust approach for developing fairer FL models across diverse clinical and biomedical applications.

Updated: 2024-10-07 13:02:04

标题: FairFML：公平的联合机器学习，并以减少性别差异在心脏骤停结果预测中的案例研究

摘要: 目标：在医疗保健研究中，缓解算法差异是一个关键挑战，确保公平和公正至关重要。虽然跨多个机构存在大规模的医疗数据，但跨机构合作往往面临隐私限制，突显了需要既促进公平性又保护隐私的解决方案的需求。材料和方法：在本研究中，我们提出了公平联邦机器学习（FairFML），这是一个无模型偏见的解决方案，旨在减少跨机构医疗合作中的算法偏差，同时保护患者隐私。作为概念验证，我们使用一个关于减少性别差异的临床案例研究，重点是心脏骤停结果预测，验证了FairFML。结果：我们展示了所提出的FairFML框架提高了联邦学习（FL）模型的公平性，而不影响预测性能。我们的研究结果表明，与集中模型相比，FairFML提高了模型公平性达65％，同时保持了与本地模型和集中模型相媲美的性能，通过接收器操作特性分析进行测量。讨论和结论：FairFML为FL合作提供了一种具有前景和灵活性的解决方案，其适应性使其能够与各种FL框架和模型（从传统统计方法到深度学习技术）无缝集成。这使得FairFML成为开发更公平的FL模型在各种临床和生物医学应用中的坚固方法。

更新时间: 2024-10-07 13:02:04

领域: cs.CY,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.17269v1

Assumption-Lean Post-Integrated Inference with Negative Control Outcomes

Data integration has become increasingly common in aligning multiple heterogeneous datasets. With high-dimensional outcomes, data integration methods aim to extract low-dimensional embeddings of observations to remove unwanted variations, such as batch effects and unmeasured covariates, inherent in data collected from different sources. However, multiple hypothesis testing after data integration can be substantially biased due to the data-dependent integration processes. To address this challenge, we introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes. By leveraging causal interpretations, we derive nonparametric identification conditions that form the basis of our PII approach. Our assumption-lean semiparametric inference method extends robustness and generality to projected direct effect estimands that account for mediators, confounders, and moderators. These estimands remain statistically meaningful under model misspecifications and with error-prone embeddings. We provide deterministic quantifications of the bias of target estimands induced by estimated embeddings and finite-sample linear expansions of the estimators with uniform concentration bounds on the residuals for all outcomes. The proposed doubly robust estimators are consistent and efficient under minimal assumptions, facilitating data-adaptive estimation with machine learning algorithms. Using random forests, we evaluate empirical statistical errors in simulations and analyze single-cell CRISPR perturbed datasets with potential unmeasured confounders.

Updated: 2024-10-07 12:52:38

标题: 假设-精简后整合推断，带有负对照结果

摘要: 数据整合在整合多个异构数据集方面变得越来越普遍。在高维度结果的情况下，数据整合方法旨在提取观察结果的低维嵌入，以消除来自不同来源收集的数据中固有的不想要的变化，如批次效应和未测量的协变量。然而，在数据整合之后进行多重假设检验可能存在实质性偏差，因为这是由数据相关的整合过程导致的。为了解决这一挑战，我们引入了一种健壮的后整合推断（PII）方法，通过利用负对照结果来调整潜在的异质性。通过利用因果解释，我们推导出非参数识别条件，这构成了我们PII方法的基础。我们的假设简单的半参数推断方法扩展了对考虑中介变量、混淆变量和调节变量的投影直接效果估计的稳健性和泛化性。这些效应估计在模型规范错误和嵌入错误时仍然具有统计意义。我们提供了由估计嵌入引起的目标估计偏差的确定性量化，以及估计器的有限样本线性展开，并为所有结果的残差提供了统一的集中界限。提出的双重健壮估计器在最小假设下是一致和高效的，有助于使用机器学习算法进行数据适应性估计。使用随机森林，我们评估了模拟中的实证统计误差，并分析了可能存在未测量混杂因素的单细胞CRISPR扰动数据集。

更新时间: 2024-10-07 12:52:38

领域: stat.ME,cs.LG,q-bio.GN,stat.AP,stat.ML

下载: http://arxiv.org/abs/2410.04996v1

Spectrum Extraction and Clipping for Implicitly Linear Layers

We show the effectiveness of automatic differentiation in efficiently and correctly computing and controlling the spectrum of implicitly linear operators, a rich family of layer types including all standard convolutional and dense layers. We provide the first clipping method which is correct for general convolution layers, and illuminate the representational limitation that caused correctness issues in prior work. We study the effect of the batch normalization layers when concatenated with convolutional layers and show how our clipping method can be applied to their composition. By comparing the accuracy and performance of our algorithms to the state-of-the-art methods, using various experiments, we show they are more precise and efficient and lead to better generalization and adversarial robustness. We provide the code for using our methods at https://github.com/Ali-E/FastClip.

Updated: 2024-10-07 12:52:19

标题: 隐式线性层的光谱提取和裁剪

摘要: 我们展示了自动微分在高效且正确地计算和控制隐式线性运算符的频谱方面的有效性，这是一种包括所有标准卷积和密集层的丰富层类型。我们提供了第一个适用于一般卷积层的修剪方法，并阐明了在先前工作中导致正确性问题的表现限制。我们研究了当与卷积层串联时批量归一化层的影响，并展示了我们的修剪方法如何应用于它们的组合。通过将我们的算法的准确性和性能与最先进的方法进行比较，使用各种实验，我们展示它们更加精确和高效，并且导致更好的泛化和对抗性鲁棒性。我们提供了使用我们方法的代码，网址为https://github.com/Ali-E/FastClip。

更新时间: 2024-10-07 12:52:19

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2402.16017v2

Stage-Wise and Prior-Aware Neural Speech Phase Prediction

This paper proposes a novel Stage-wise and Prior-aware Neural Speech Phase Prediction (SP-NSPP) model, which predicts the phase spectrum from input amplitude spectrum by two-stage neural networks. In the initial prior-construction stage, we preliminarily predict a rough prior phase spectrum from the amplitude spectrum. The subsequent refinement stage transforms the amplitude spectrum into a refined high-quality phase spectrum conditioned on the prior phase. Networks in both stages use ConvNeXt v2 blocks as the backbone and adopt adversarial training by innovatively introducing a phase spectrum discriminator (PSD). To further improve the continuity of the refined phase, we also incorporate a time-frequency integrated difference (TFID) loss in the refinement stage. Experimental results confirm that, compared to neural network-based no-prior phase prediction methods, the proposed SP-NSPP achieves higher phase prediction accuracy, thanks to introducing the coarse phase priors and diverse training criteria. Compared to iterative phase estimation algorithms, our proposed SP-NSPP does not require multiple rounds of staged iterations, resulting in higher generation efficiency.

Updated: 2024-10-07 12:45:20

标题: 阶段式和先验感知的神经语音相位预测

摘要: 这篇论文提出了一种新颖的阶段式和先验感知的神经语音相位预测（SP-NSPP）模型，该模型通过两阶段神经网络从输入幅度谱预测相位谱。在初始的先验构建阶段，我们从幅度谱中初步预测出一个粗略的先验相位谱。随后的精细调整阶段将幅度谱转换为基于先验相位的精细高质量相位谱。两个阶段的网络都使用ConvNeXt v2块作为骨干，并通过创新性地引入相位谱鉴别器（PSD）采用对抗训练。为了进一步改善精细相位的连续性，我们还在精细调整阶段引入了一个时间频率综合差异（TFID）损失。实验结果表明，与基于神经网络的无先验相位预测方法相比，所提出的SP-NSPP在相位预测准确性上取得了更高的成绩，这要归功于引入了粗略相位先验和多样的训练标准。与迭代相位估计算法相比，我们提出的SP-NSPP不需要多轮分阶段迭代，从而提高了生成效率。

更新时间: 2024-10-07 12:45:20

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2410.04990v1

Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Learning complex robot behavior through interactions with the environment necessitates principled exploration. Effective strategies should prioritize exploring regions of the state-action space that maximize rewards, with optimistic exploration emerging as a promising direction aligned with this idea and enabling sample-efficient reinforcement learning. However, existing methods overlook a crucial aspect: the need for optimism to be informed by a belief connecting the reward and state. To address this, we propose a practical, theoretically grounded approach to optimistic exploration based on Thompson sampling. Our model structure is the first that allows for reasoning about joint uncertainty over transitions and rewards. We apply our method on a set of MuJoCo and VMAS continuous control tasks. Our experiments demonstrate that optimistic exploration significantly accelerates learning in environments with sparse rewards, action penalties, and difficult-to-explore regions. Furthermore, we provide insights into when optimism is beneficial and emphasize the critical role of model uncertainty in guiding exploration.

Updated: 2024-10-07 12:42:51

标题: 通过乐观的汤普森抽样实现高效的基于模型的强化学习

摘要: 通过与环境的互动学习复杂的机器人行为需要有原则的探索。有效的策略应该优先探索最大化奖励的状态-动作空间区域，乐观探索作为一个有前途的方向与这个想法一致，并实现样本有效的强化学习。然而，现有方法忽视了一个关键的方面：乐观主义需要以连接奖励和状态的信念为基础。为了解决这个问题，我们提出了一个基于汤普森抽样的乐观探索的实用、理论基础的方法。我们的模型结构是第一个允许对转换和奖励的联合不确定性进行推理的模型。我们在一组MuJoCo和VMAS连续控制任务上应用我们的方法。我们的实验证明，乐观探索在奖励稀疏、动作惩罚和难以探索的区域的环境中显著加速学习。此外，我们提供了乐观主义何时有益的见解，并强调模型不确定性在指导探索中的关键作用。

更新时间: 2024-10-07 12:42:51

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2410.04988v1

TabDDPM: Modelling Tabular Data with Diffusion Models

Denoising diffusion probabilistic models are currently becoming the leading paradigm of generative modeling for many important data modalities. Being the most prevalent in the computer vision community, diffusion models have also recently gained some attention in other domains, including speech, NLP, and graph-like data. In this work, we investigate if the framework of diffusion models can be advantageous for general tabular problems, where datapoints are typically represented by vectors of heterogeneous features. The inherent heterogeneity of tabular data makes it quite challenging for accurate modeling, since the individual features can be of completely different nature, i.e., some of them can be continuous and some of them can be discrete. To address such data types, we introduce TabDDPM -- a diffusion model that can be universally applied to any tabular dataset and handles any type of feature. We extensively evaluate TabDDPM on a wide set of benchmarks and demonstrate its superiority over existing GAN/VAE alternatives, which is consistent with the advantage of diffusion models in other fields. Additionally, we show that TabDDPM is eligible for privacy-oriented setups, where the original datapoints cannot be publicly shared.

Updated: 2024-10-07 12:38:57

标题: TabDDPM：使用扩散模型对表格数据建模

摘要: 去噪扩散概率模型目前正在成为许多重要数据模态的生成建模的主流范式。作为计算机视觉社区中最为普遍的模型，扩散模型最近也引起了其他领域的关注，包括语音、自然语言处理和类似图形的数据。在这项工作中，我们研究了扩散模型框架是否对一般的表格问题有优势，其中数据点通常由异构特征的向量表示。表格数据的固有异构性使得准确建模变得非常具有挑战性，因为个别特征可能完全不同，即有些可能是连续的，有些可能是离散的。为了解决这种数据类型，我们引入了TabDDPM - 一种扩散模型，可以普遍应用于任何表格数据集，并处理任何类型的特征。我们在广泛的基准测试中评估了TabDDPM，并展示了其相对于现有的GAN/VAE替代方案的优越性，这与扩散模型在其他领域的优势一致。此外，我们还展示了TabDDPM适用于面向隐私的设置，其中原始数据点无法公开共享。

更新时间: 2024-10-07 12:38:57

领域: cs.LG

下载: http://arxiv.org/abs/2209.15421v2

A Meta-Complexity Characterization of Quantum Cryptography

We prove the first meta-complexity characterization of a quantum cryptographic primitive. We show that one-way puzzles exist if and only if there is some quantum samplable distribution of binary strings over which it is hard to approximate Kolmogorov complexity. Therefore, we characterize one-way puzzles by the average-case hardness of a uncomputable problem. This brings to the quantum setting a recent line of work that characterizes classical cryptography with the average-case hardness of a meta-complexity problem, initiated by Liu and Pass. Moreover, since the average-case hardness of Kolmogorov complexity over classically polynomial-time samplable distributions characterizes one-way functions, this result poses one-way puzzles as a natural generalization of one-way functions to the quantum setting. Furthermore, our equivalence goes through probability estimation, giving us the additional equivalence that one-way puzzles exist if and only if there is a quantum samplable distribution over which probability estimation is hard. We also observe that the oracle worlds of defined by Kretschmer et. al. rule out any relativizing characterization of one-way puzzles by the hardness of a problem in NP or QMA, which means that it may not be possible with current techniques to characterize one-way puzzles with another meta-complexity problem.

Updated: 2024-10-07 12:29:27

标题: 量子密码学的元复杂性特征描述

摘要: 我们证明了第一个量子密码原语的元复杂性特征。我们表明，仅当存在某个量子可抽样的二进制字符串分布，且难以逼近Kolmogorov复杂性时，单向谜题存在。因此，我们通过一个无法计算的问题的平均情况难度来表征单向谜题。这将量子设置中的一个最近的研究领域引入到一个经由刘和帕斯发起的将经典密码学与元复杂性问题的平均情况难度相联系的研究领域。此外，由于在经典多项式时间可抽样分布上Kolmogorov复杂性的平均情况难度表征了单向函数，这一结果将单向谜题视为对量子设置中单向函数的自然推广。此外，我们的等价关系通过概率估计得以实现，从而给出了一个额外的等价关系，即单向谜题存在当且仅当存在一个量子可抽样分布，使得概率估计变得困难。我们还观察到由Kretschmer等人定义的oracle世界排除了任何通过NP或QMA问题的难度对单向谜题进行相对化特征化的可能性，这意味着使用当前技术可能无法用另一个元复杂性问题来表征单向谜题。

更新时间: 2024-10-07 12:29:27

领域: cs.CR,cs.CC,quant-ph

下载: http://arxiv.org/abs/2410.04984v1

Safe Learning-Based Optimization of Model Predictive Control: Application to Battery Fast-Charging

Model predictive control (MPC) is a powerful tool for controlling complex nonlinear systems under constraints, but often struggles with model uncertainties and the design of suitable cost functions. To address these challenges, we discuss an approach that integrates MPC with safe Bayesian optimization to optimize long-term closed-loop performance despite significant model-plant mismatches. By parameterizing the MPC stage cost function using a radial basis function network, we employ Bayesian optimization as a multi-episode learning strategy to tune the controller without relying on precise system models. This method mitigates conservativeness introduced by overly cautious soft constraints in the MPC cost function and provides probabilistic safety guarantees during learning, ensuring that safety-critical constraints are met with high probability. As a practical application, we apply our approach to fast charging of lithium-ion batteries, a challenging task due to the complicated battery dynamics and strict safety requirements, subject to the requirement to be implementable in real time. Simulation results demonstrate that, in the context of model-plant mismatch, our method reduces charging times compared to traditional MPC methods while maintaining safety. This work extends previous research by emphasizing closed-loop constraint satisfaction and offers a promising solution for enhancing performance in systems where model uncertainties and safety are critical concerns.

Updated: 2024-10-07 12:23:40

标题: 安全学习驱动的模型预测控制优化：应用于电池快速充电

摘要: 模型预测控制（MPC）是控制复杂非线性系统受约束条件下的强大工具，但常常面临模型不确定性和合适成本函数设计的困难。为了解决这些挑战，我们讨论了一种将MPC与安全贝叶斯优化相结合的方法，以优化长期闭环性能，尽管存在重要的模型-装置不匹配。通过使用径向基函数网络对MPC阶段成本函数进行参数化，我们采用贝叶斯优化作为多集数学习策略来调整控制器，而不依赖于精确的系统模型。这种方法缓解了MPC成本函数中过于谨慎的软约束引入的保守性，并在学习过程中提供概率安全保证，确保以高概率满足安全关键约束。作为一个实际应用，我们将我们的方法应用于锂离子电池的快速充电，这是一个具有挑战性的任务，由于复杂的电池动力学和严格的安全要求，同时还需要在实时实现的要求下。模拟结果表明，在模型-装置不匹配的情况下，我们的方法相比传统MPC方法减少了充电时间，同时保持了安全性。这项工作通过强调闭环约束满足来扩展了先前的研究，并为增强在模型不确定性和安全性是关键关注的系统中的性能提供了一个有前途的解决方案。

更新时间: 2024-10-07 12:23:40

领域: eess.SY,cs.LG,cs.SY

下载: http://arxiv.org/abs/2410.04982v1

SPikE-SSM: A Sparse, Precise, and Efficient Spiking State Space Model for Long Sequences Learning

Spiking neural networks (SNNs) provide an energy-efficient solution by utilizing the spike-based and sparse nature of biological systems. Since the advent of Transformers, SNNs have struggled to compete with artificial networks on long sequential tasks, until the recent emergence of state space models (SSMs), which offer superior computational efficiency and modeling capability. However, applying the highly capable SSMs to SNNs for long sequences learning poses three major challenges: (1) The membrane potential is determined by the past spiking history of the neuron, leading to reduced efficiency for sequence modeling in parallel computing scenarios. (2) Complex dynamics of biological spiking neurons are crucial for functionality but challenging to simulate and exploit effectively in large networks. (3) It is arduous to maintain high sparsity while achieving high accuracy for spiking neurons without resorting to dense computing, as utilized in artificial neuron-based SSMs. To address them, we propose a sparse, precise and efficient spiking SSM framework, termed SPikE-SSM. For (1), we propose a boundary compression strategy (PMBC) to accelerate the inference of the spiking neuron model, enabling parallel processing for long sequence learning. For (2), we propose a novel and concise neuron model incorporating reset-refractory mechanism to leverage the inherent temporal dimension for dynamic computing with biological interpretability. For (3), we hierarchically integrate the proposed neuron model to the original SSM block, and enhance the dynamics of SPikE-SSM by incorporating trainable thresholds and refractory magnitudes to balance accuracy and sparsity. Extensive experiments verify the effectiveness and robustness of SPikE-SSM on the long range arena benchmarks and large language dataset WikiText-103, showing the potential of dynamic spiking neurons in efficient long sequence learning.

Updated: 2024-10-07 12:20:38

标题: SPikE-SSM：一种用于长序列学习的稀疏、精确和高效的脉冲状态空间模型

摘要: 尖峰神经网络（SNN）通过利用生物系统的尖峰基础和稀疏特性，提供了一种节能解决方案。自变压器问世以来，SNN在长序列任务上一直难以与人工网络竞争，直到最近出现了状态空间模型（SSMs），这些模型提供了卓越的计算效率和建模能力。然而，将高效的SSMs应用于SNN进行长序列学习面临三个主要挑战：（1）膜电位由神经元的过去尖峰历史决定，在并行计算场景中对序列建模效率降低。（2）生物尖峰神经元的复杂动态对功能至关重要，但在大型网络中挑战性地模拟和有效利用。（3）在不借助人工神经元SSMs中使用的密集计算的情况下，保持高稀疏性同时实现高精度对于尖峰神经元来说是困难的。为了解决这些问题，我们提出了一种稀疏、精确和高效的尖峰SSM框架，称为SPikE-SSM。对于（1），我们提出了一种边界压缩策略（PMBC），以加速尖峰神经元模型的推断，实现长序列学习的并行处理。对于（2），我们提出了一种新颖简洁的神经元模型，结合重置-不应期机制，利用固有的时间维度进行具有生物可解释性的动态计算。对于（3），我们将提出的神经元模型分层集成到原始SSM块中，并通过整合可训练的阈值和不应期幅度来增强SPikE-SSM的动态特性，以平衡精度和稀疏性。大量实验验证了SPikE-SSM在长距离竞技场基准和大型语言数据集WikiText-103上的有效性和稳健性，展示了动态尖峰神经元在高效长序列学习中的潜力。

更新时间: 2024-10-07 12:20:38

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2410.17268v1

Collaboration! Towards Robust Neural Methods for Routing Problems

Despite enjoying desirable efficiency and reduced reliance on domain expertise, existing neural methods for vehicle routing problems (VRPs) suffer from severe robustness issues -- their performance significantly deteriorates on clean instances with crafted perturbations. To enhance robustness, we propose an ensemble-based Collaborative Neural Framework (CNF) w.r.t. the defense of neural VRP methods, which is crucial yet underexplored in the literature. Given a neural VRP method, we adversarially train multiple models in a collaborative manner to synergistically promote robustness against attacks, while boosting standard generalization on clean instances. A neural router is designed to adeptly distribute training instances among models, enhancing overall load balancing and collaborative efficacy. Extensive experiments verify the effectiveness and versatility of CNF in defending against various attacks across different neural VRP methods. Notably, our approach also achieves impressive out-of-distribution generalization on benchmark instances.

Updated: 2024-10-07 12:12:51

标题: 合作！朝着解决路由问题的强大神经方法前进

摘要: 尽管现有的神经方法在车辆路径问题（VRPs）中具有良好的效率和减少对领域专业知识的依赖，但存在严重的鲁棒性问题--它们在经过精心设计的扰动后的干净实例上的性能显著下降。为了增强鲁棒性，我们提出了一种基于集成的协作神经框架（CNF）来防御神经VRP方法，这在文献中尚未得到充分探讨。给定一个神经VRP方法，我们以协作方式对多个模型进行对抗训练，以协同促进对抗攻击的鲁棒性，同时增强对干净实例的标准泛化。一个神经路由器被设计成能够灵活地在模型之间分配训练实例，增强整体负载平衡和协作效果。大量实验证实了CNF在防御不同神经VRP方法的各种攻击中的有效性和多功能性。值得注意的是，我们的方法还在基准实例上取得了令人印象深刻的超出分布的泛化能力。

更新时间: 2024-10-07 12:12:51

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.04968v1

Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models

Significant advancements have recently been made in large language models represented by GPT models. Users frequently have multi-round private conversations with cloud-hosted GPT models for task optimization. Yet, this operational paradigm introduces additional attack surfaces, particularly in custom GPTs and hijacked chat sessions. In this paper, we introduce a straightforward yet potent Conversation Reconstruction Attack. This attack targets the contents of previous conversations between GPT models and benign users, i.e., the benign users' input contents during their interaction with GPT models. The adversary could induce GPT models to leak such contents by querying them with designed malicious prompts. Our comprehensive examination of privacy risks during the interactions with GPT models under this attack reveals GPT-4's considerable resilience. We present two advanced attacks targeting improved reconstruction of past conversations, demonstrating significant privacy leakage across all models under these advanced techniques. Evaluating various defense mechanisms, we find them ineffective against these attacks. Our findings highlight the ease with which privacy can be compromised in interactions with GPT models, urging the community to safeguard against potential abuses of these models' capabilities.

Updated: 2024-10-07 12:11:58

标题: 重新构建您之前的对话！全面调查与GPT模型对话中的隐私泄露风险

摘要: 最近在代表GPT模型的大型语言模型方面取得了显著进展。用户经常与云托管的GPT模型进行多轮私人对话以进行任务优化。然而，这种操作范式引入了额外的攻击面，特别是在定制的GPT和劫持的聊天会话中。在本文中，我们介绍了一种简单但有效的对话重建攻击。该攻击针对GPT模型与良性用户之间先前对话的内容，即与GPT模型互动期间良性用户的输入内容。攻击者可以通过设计恶意提示查询GPT模型，诱使其泄露这些内容。我们对在这种攻击下与GPT模型的交互过程中的隐私风险进行了全面检查，发现GPT-4具有相当强大的韧性。我们提出了两种针对改进过去对话重建的高级攻击，展示了在这些高级技术下所有模型的显着隐私泄露。评估各种防御机制，我们发现它们无法抵御这些攻击。我们的研究结果突显了在与GPT模型的交互中隐私可以轻易被泄露的情况，敦促社区采取措施防范这些模型能力的潜在滥用。

更新时间: 2024-10-07 12:11:58

领域: cs.CR,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.02987v2

SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation

Automating garment manipulation poses a significant challenge for assistive robotics due to the diverse and deformable nature of garments. Traditional approaches typically require separate models for each garment type, which limits scalability and adaptability. In contrast, this paper presents a unified approach using vision-language models (VLMs) to improve keypoint prediction across various garment categories. By interpreting both visual and semantic information, our model enables robots to manage different garment states with a single model. We created a large-scale synthetic dataset using advanced simulation techniques, allowing scalable training without extensive real-world data. Experimental results indicate that the VLM-based method significantly enhances keypoint detection accuracy and task success rates, providing a more flexible and general solution for robotic garment manipulation. In addition, this research also underscores the potential of VLMs to unify various garment manipulation tasks within a single framework, paving the way for broader applications in home automation and assistive robotics for future.

Updated: 2024-10-07 12:06:17

标题: SKT：将具有状态意识的关键点轨迹与视觉语言模型集成，用于机器人服装操作

摘要: 自动化服装操作对辅助机器人学来说是一个重大挑战，因为服装的多样性和可变性。传统方法通常需要为每种服装类型单独建模，这限制了可扩展性和适应性。相比之下，本文提出了一种统一的方法，利用视觉-语言模型（VLMs）来改进各种服装类别的关键点预测。通过解释视觉和语义信息，我们的模型使机器人能够使用单一模型管理不同的服装状态。我们利用先进的仿真技术创建了一个大规模的合成数据集，允许进行可扩展的训练，而无需大量的真实世界数据。实验结果表明，基于VLM的方法显著提高了关键点检测准确性和任务成功率，为机器人服装操作提供了更灵活和通用的解决方案。此外，这项研究还强调了VLM的潜力，将各种服装操作任务统一到一个框架中，为未来在家庭自动化和辅助机器人领域更广泛的应用铺平了道路。

更新时间: 2024-10-07 12:06:17

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2409.18082v2

Visual Question Decomposition on Multimodal Large Language Models

Question decomposition has emerged as an effective strategy for prompting Large Language Models (LLMs) to answer complex questions. However, while existing methods primarily focus on unimodal language models, the question decomposition capability of Multimodal Large Language Models (MLLMs) has yet to be explored. To this end, this paper explores visual question decomposition on MLLMs. Specifically, we introduce a systematic evaluation framework including a dataset and several evaluation criteria to assess the quality of the decomposed sub-questions, revealing that existing MLLMs struggle to produce high-quality sub-questions. To address this limitation, we propose a specific finetuning dataset, DecoVQA+, for enhancing the model's question decomposition capability. Aiming at enabling models to perform appropriate selective decomposition, we propose an efficient finetuning pipeline. The finetuning pipeline consists of our proposed dataset and a training objective for selective decomposition. Finetuned MLLMs demonstrate significant improvements in the quality of sub-questions and the policy of selective question decomposition. Additionally, the models also achieve higher accuracy with selective decomposition on VQA benchmark datasets.

Updated: 2024-10-07 12:05:55

标题: 多模态大型语言模型上的视觉问题分解

摘要: 问题分解已经被证明是一种有效的策略，可以促使大型语言模型（LLMs）回答复杂的问题。然而，尽管现有方法主要集中在单模式语言模型上，但多模式大型语言模型（MLLMs）的问题分解能力尚未被探索。因此，本文探讨了MLLMs上的视觉问题分解。具体而言，我们引入了一个系统化的评估框架，包括一个数据集和几个评估标准，以评估分解后子问题的质量，发现现有的MLLMs难以产生高质量的子问题。为了解决这一局限性，我们提出了一个特定的微调数据集DecoVQA+，以增强模型的问题分解能力。为了使模型能够进行适当的选择性分解，我们提出了一个高效的微调管道。微调管道包括我们提出的数据集和一个用于选择性分解的训练目标。经过微调的MLLMs在子问题质量和选择性问题分解策略方面取得了显著的改进。此外，这些模型在VQA基准数据集上实现了更高的准确性。

更新时间: 2024-10-07 12:05:55

领域: cs.CL,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2409.19339v2

Uncertainty-Aware Decision Transformer for Stochastic Driving Environments

Offline Reinforcement Learning (RL) enables policy learning without active interactions, making it especially appealing for self-driving tasks. Recent successes of Transformers inspire casting offline RL as sequence modeling, which, however, fails in stochastic environments with incorrect assumptions that identical actions can consistently achieve the same goal. In this paper, we introduce an UNcertainty-awaRE deciSion Transformer (UNREST) for planning in stochastic driving environments without introducing additional transition or complex generative models. Specifically, UNREST estimates uncertainties by conditional mutual information between transitions and returns. Discovering 'uncertainty accumulation' and 'temporal locality' properties of driving environments, we replace the global returns in decision transformers with truncated returns less affected by environments to learn from actual outcomes of actions rather than environment transitions. We also dynamically evaluate uncertainty at inference for cautious planning. Extensive experiments demonstrate UNREST's superior performance in various driving scenarios and the power of our uncertainty estimation strategy.

Updated: 2024-10-07 12:05:12

标题: 不确定性感知决策变换器用于随机驾驶环境

摘要: 离线强化学习（RL）使策略学习不需要主动交互，这使得它在自动驾驶任务中特别受欢迎。最近，Transformer的成功激发了将离线RL作为序列建模的想法，然而，在具有错误假设的随机环境中，即相同的动作可以一致实现相同目标的假设下，这种方法失败了。在本文中，我们介绍了一种适用于在随机驾驶环境中进行规划的不确定性感知决策Transformer（UNREST），而不引入额外的转换或复杂的生成模型。具体而言，UNREST通过转换和回报之间的条件互信息来估计不确定性。发现了驾驶环境的'不确定性积累'和'时间局部性'特性，我们用受环境影响较小的截断回报替代决策Transformer中的全局回报，以便从实际行动结果中学习，而不是从环境转换中学习。我们还在推理时动态评估不确定性，以进行谨慎规划。大量实验证明了UNREST在各种驾驶场景中的卓越性能以及我们的不确定性估计策略的有效性。

更新时间: 2024-10-07 12:05:12

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2309.16397v3

Activation Scaling for Steering and Interpreting Language Models

Given the prompt "Rome is in", can we steer a language model to flip its prediction of an incorrect token "France" to a correct token "Italy" by only multiplying a few relevant activation vectors with scalars? We argue that successfully intervening on a model is a prerequisite for interpreting its internal workings. Concretely, we establish a three-term objective: a successful intervention should flip the correct with the wrong token and vice versa (effectiveness), and leave other tokens unaffected (faithfulness), all while being sparse (minimality). Using gradient-based optimization, this objective lets us learn (and later evaluate) a specific kind of efficient and interpretable intervention: activation scaling only modifies the signed magnitude of activation vectors to strengthen, weaken, or reverse the steering directions already encoded in the model. On synthetic tasks, this intervention performs comparably with steering vectors in terms of effectiveness and faithfulness, but is much more minimal allowing us to pinpoint interpretable model components. We evaluate activation scaling from different angles, compare performance on different datasets, and make activation scalars a learnable function of the activation vectors themselves to generalize to varying-length prompts.

Updated: 2024-10-07 12:01:32

标题: 激活缩放用于引导和解释语言模型

摘要: 鉴于提示“罗马在”，我们能否通过仅将几个相关激活向量与标量相乘，引导语言模型将其对错误标记“法国”的预测翻转为正确标记“意大利”？我们认为成功干预模型是解释其内部运作的先决条件。具体而言，我们建立了一个三项目标：成功干预应该将正确标记与错误标记翻转，反之亦然（有效性），并使其他标记不受影响（忠实性），同时保持稀疏性（最小性）。通过基于梯度的优化，这个目标让我们学习（后来评估）一种特定类型的高效且可解释的干预：激活缩放仅修改激活向量的有符号幅度，以加强、减弱或颠倒模型已编码的引导方向。在合成任务中，这种干预在有效性和忠实性方面与引导向量表现相当，但更为最小，使我们能够准确定位可解释的模型组件。我们从不同角度评估激活缩放，比较在不同数据集上的性能，并将激活标量作为激活向量本身的可学习函数，以推广到长度不同的提示。

更新时间: 2024-10-07 12:01:32

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.04962v1

DYNAMICQA: Tracing Internal Knowledge Conflicts in Language Models

Knowledge-intensive language understanding tasks require Language Models (LMs) to integrate relevant context, mitigating their inherent weaknesses, such as incomplete or outdated knowledge. However, conflicting knowledge can be present in the LM's parameters, termed intra-memory conflict, which can affect a model's propensity to accept contextual knowledge. To study the effect of intra-memory conflict on an LM's ability to accept relevant context, we utilize two knowledge conflict measures and a novel dataset containing inherently conflicting data, DynamicQA. This dataset includes facts with a temporal dynamic nature where facts can change over time and disputable dynamic facts, which can change depending on the viewpoint. DynamicQA is the first to include real-world knowledge conflicts and provide context to study the link between the different types of knowledge conflicts. We also evaluate several measures on their ability to reflect the presence of intra-memory conflict: semantic entropy and a novel coherent persuasion score. With our extensive experiments, we verify that LMs exhibit a greater degree of intra-memory conflict with dynamic facts compared to facts that have a single truth value. Furthermore, we reveal that facts with intra-memory conflict are harder to update with context, suggesting that retrieval-augmented generation will struggle with the most commonly adapted facts.

Updated: 2024-10-07 11:59:37

标题: 动态问答：在语言模型中跟踪内部知识冲突

摘要: 知识密集型语言理解任务需要语言模型（LM）整合相关背景，减轻其固有的弱点，如不完整或过时的知识。然而，LM参数中可能存在冲突的知识，称为内存冲突，这可能影响模型接受上下文知识的倾向。为了研究内存冲突对LM接受相关背景的能力的影响，我们利用两种知识冲突度量和一个包含固有冲突数据的新数据集DynamicQA。该数据集包括具有时间动态性质的事实，事实可以随时间变化以及有争议的动态事实，这取决于观点。DynamicQA是第一个包括现实世界知识冲突并提供上下文来研究不同类型知识冲突之间联系的数据集。我们还评估了几种能够反映内存冲突存在的度量方法：语义熵和一种新型的连贯说服度分数。通过我们广泛的实验，我们验证了与具有单一真值事实相比，LM在动态事实中展现出更高程度的内存冲突。此外，我们揭示了具有内存冲突的事实更难随着上下文的更新，这表明检索增强生成将难以处理最常适应的事实。

更新时间: 2024-10-07 11:59:37

领域: cs.CL,cs.AI,68T50,I.2.7

下载: http://arxiv.org/abs/2407.17023v2

Zero-Shot Vision-and-Language Navigation with Collision Mitigation in Continuous Environment

We propose the zero-shot Vision-and-Language Navigation with Collision Mitigation (VLN-CM), which takes these considerations. VLN-CM is composed of four modules and predicts the direction and distance of the next movement at each step. We utilize large foundation models for each modules. To select the direction, we use the Attention Spot Predictor (ASP), View Selector (VS), and Progress Monitor (PM). The ASP employs a Large Language Model (e.g. ChatGPT) to split navigation instructions into attention spots, which are objects or scenes at the location to move to (e.g. a yellow door). The VS selects from panorama images provided at 30-degree intervals the one that includes the attention spot, using CLIP similarity. We then choose the angle of the selected image as the direction to move in. The PM uses a rule-based approach to decide which attention spot to focus on next, among multiple spots derived from the instructions. If the similarity between the current attention spot and the visual observations decreases consecutively at each step, the PM determines that the agent has passed the current spot and moves on to the next one. For selecting the distance to move, we employed the Open Map Predictor (OMP). The OMP uses panorama depth information to predict an occupancy mask. We then selected a collision-free distance in the predicted direction based on the occupancy mask. We evaluated our method using the validation data of VLN-CE. Our approach showed better performance than several baseline methods, and the OPM was effective in mitigating collisions for the agent.

Updated: 2024-10-07 11:59:01

标题: 在连续环境中通过碰撞缓解实现零样本视觉与语言导航

摘要: 我们提出了零样本视觉-语言导航与碰撞缓解（VLN-CM）的方法，考虑了这些因素。VLN-CM由四个模块组成，在每一步预测下一次移动的方向和距离。我们为每个模块使用了大型基础模型。为了选择方向，我们使用了Attention Spot Predictor（ASP）、View Selector（VS）和Progress Monitor（PM）。ASP利用大型语言模型（例如ChatGPT）将导航指令分割为注意力点，这些点是需要移动到的位置的对象或场景（例如黄色门）。VS在提供30度间隔的全景图像中选择包含注意力点的图像，使用CLIP相似性。然后选择所选图像的角度作为移动方向。PM使用基于规则的方法决定下一个要关注的注意力点，从指令中派生出多个点。如果当前注意力点与视觉观察之间的相似度在每一步连续减小，PM确定代理已经经过当前点并转移到下一个点。为了选择移动的距离，我们使用了Open Map Predictor（OMP）。OMP使用全景深度信息预测占用掩模。然后根据占用掩模在预测方向上选择无碰撞的距离。我们使用VLN-CE的验证数据评估了我们的方法。我们的方法表现比几种基线方法更好，OMP在减轻代理碰撞方面是有效的。

更新时间: 2024-10-07 11:59:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.17267v1

Entropy-Based Uncertainty Modeling for Trajectory Prediction in Autonomous Driving

In autonomous driving, accurate motion prediction is essential for safe and efficient motion planning. To ensure safety, planners must rely on reliable uncertainty information about the predicted future behavior of surrounding agents, yet this aspect has received limited attention. This paper addresses the so-far neglected problem of uncertainty modeling in trajectory prediction. We adopt a holistic approach that focuses on uncertainty quantification, decomposition, and the influence of model composition. Our method is based on a theoretically grounded information-theoretic approach to measure uncertainty, allowing us to decompose total uncertainty into its aleatoric and epistemic components. We conduct extensive experiments on the nuScenes dataset to assess how different model architectures and configurations affect uncertainty quantification and model robustness.

Updated: 2024-10-07 11:57:37

标题: 基于熵的不确定性建模在自动驾驶中的轨迹预测

摘要: 在自动驾驶中，准确的运动预测对于安全和高效的运动规划至关重要。为了确保安全，规划者必须依赖于关于周围代理预测未来行为的可靠不确定性信息，然而这一方面受到了有限的关注。本文解决了轨迹预测中不确定性建模的迄今被忽视的问题。我们采用了一种全面的方法，重点关注不确定性量化、分解以及模型组成的影响。我们的方法基于一个理论上的信息论方法来衡量不确定性，使我们能够将总不确定性分解为其随机性和认知性组成部分。我们在nuScenes数据集上进行了大量实验，以评估不同模型架构和配置如何影响不确定性量化和模型的稳健性。

更新时间: 2024-10-07 11:57:37

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.01628v2

Maximizing the practical achievability of quantum annealing attacks on factorization-based cryptography

This work focuses on quantum methods for cryptanalysis of schemes based on the integer factorization problem and the discrete logarithm problem. We demonstrate how to practically solve the largest instances of the factorization problem by improving an approach that combines quantum and classical computations, assuming the use of the best publicly available special-class quantum computer: the quantum annealer. We achieve new computational experiment results by solving the largest instance of the factorization problem ever announced as solved using quantum annealing, with a size of 29 bits. The core idea of the improved approach is to leverage known sub-exponential classical method to break the problem down into many smaller computations and perform the most critical ones on a quantum computer. This approach does not reduce the complexity class, but it assesses the pragmatic capabilities of an attacker. It also marks a step forward in the development of hybrid methods, which in practice may surpass classical methods in terms of efficiency sooner than purely quantum computations will.

Updated: 2024-10-07 11:55:23

标题: 最大化基于量子退火攻击对基于因子分解的密码学的实现可行性

摘要: 这项工作侧重于量子方法，用于对基于整数因子分解问题和离散对数问题的方案进行密码分析。我们展示了如何通过改进结合量子和经典计算的方法来实际解决因子分解问题的最大实例，假设使用最好的公开可用特殊类量子计算机：量子退火器。我们通过解决迄今为止使用量子退火解决的最大因子分解问题实例，大小为29位，实现了新的计算实验结果。改进方法的核心思想是利用已知的亚指数经典方法将问题分解为许多较小的计算，并在量子计算机上执行最关键的计算。这种方法并不降低复杂度类，但评估了攻击者的实际能力。它也标志着混合方法的发展迈出了一步，实践中这种方法可能比纯量子计算更快地超越经典方法的效率。

更新时间: 2024-10-07 11:55:23

领域: cs.CR

下载: http://arxiv.org/abs/2410.04956v1

Residual Stream Analysis with Multi-Layer SAEs

Sparse autoencoders (SAEs) are a promising approach to interpreting the internal representations of transformer language models. However, SAEs are usually trained separately on each transformer layer, making it difficult to use them to study how information flows across layers. To solve this problem, we introduce the multi-layer SAE (MLSAE): a single SAE trained on the residual stream activation vectors from every transformer layer. Given that the residual stream is understood to preserve information across layers, we expected MLSAE latents to `switch on' at a token position and remain active at later layers. Interestingly, we find that individual latents are often active at a single layer for a given token or prompt, but this layer may differ for different tokens or prompts. We quantify these phenomena by defining a distribution over layers and considering its variance. We find that the variance of the distributions of latent activations over layers is about two orders of magnitude greater when aggregating over tokens compared with a single token. For larger underlying models, the degree to which latents are active at multiple layers increases, which is consistent with the fact that the residual stream activation vectors at adjacent layers become more similar. Finally, we relax the assumption that the residual stream basis is the same at every layer by applying pre-trained tuned-lens transformations, but our findings remain qualitatively similar. Our results represent a new approach to understanding how representations change as they flow through transformers. We release our code to train and analyze MLSAEs at https://github.com/tim-lawson/mlsae.

Updated: 2024-10-07 11:54:11

标题: 用多层SAEs进行残余流分析

摘要: 稀疏自编码器（SAEs）是解释Transformer语言模型内部表示的一种有前途的方法。然而，SAEs通常在每个Transformer层上单独训练，这使得难以利用它们来研究信息在层之间的流动。为了解决这个问题，我们引入了多层SAE（MLSAE）：一个单一的SAE，训练于每个Transformer层的残余流激活向量上。鉴于残余流被认为在层之间保留信息，我们预期MLSAE潜在因子在令牌位置“开启”并在后续层保持活跃。有趣的是，我们发现单个潜在因子在给定令牌或提示的单个层通常是活跃的，但对于不同的令牌或提示，这个层可能不同。我们通过在层上定义一个分布并考虑其方差来量化这些现象。我们发现，与单个令牌相比，将潜在激活在层上的分布聚合时，其方差约大两个数量级。对于更大的基础模型，潜在因子在多个层上活跃的程度增加，这与相邻层的残余流激活向量变得更相似的事实一致。最后，我们通过应用经过预训练的调整镜头变换来放松残余流基础在每个层上相同的假设，但我们的发现在质量上保持相似。我们的结果代表了一种新的方法来理解表示在通过transformers时如何改变。我们发布了用于训练和分析MLSAE的代码，网址为https://github.com/tim-lawson/mlsae。

更新时间: 2024-10-07 11:54:11

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2409.04185v2

Leverage Knowledge Graph and Large Language Model for Law Article Recommendation: A Case Study of Chinese Criminal Law

Court efficiency is vital for social stability. However, in most countries around the world, the grassroots courts face case backlogs, with decisions relying heavily on judicial personnel's cognitive labor, lacking intelligent tools to improve efficiency. To address this issue, we propose an efficient law article recommendation approach utilizing a Knowledge Graph (KG) and a Large Language Model (LLM). Firstly, we propose a Case-Enhanced Law Article Knowledge Graph (CLAKG) as a database to store current law statutes, historical case information, and correspondence between law articles and historical cases. Additionally, we introduce an automated CLAKG construction method based on LLM. On this basis, we propose a closed-loop law article recommendation method. Finally, through a series of experiments using judgment documents from the website "China Judgements Online", we have improved the accuracy of law article recommendation in cases from 0.549 to 0.694, demonstrating that our proposed method significantly outperforms baseline approaches.

Updated: 2024-10-07 11:45:04

标题: 利用知识图谱和大型语言模型进行法律文章推荐：以中国刑法为例研究

摘要: 法院效率对社会稳定至关重要。然而，在世界大多数国家，基层法院面临案件积压问题，决策严重依赖于司法人员的认知劳动，缺乏智能工具以提高效率。为解决这一问题，我们提出了一种利用知识图谱（KG）和大型语言模型（LLM）的高效法律文章推荐方法。首先，我们提出了一个案例增强的法律文章知识图谱（CLAKG）作为数据库，用于存储当前法律法规、历史案例信息以及法律文章与历史案例之间的对应关系。此外，我们介绍了一种基于LLM的自动化CLAKG构建方法。在此基础上，我们提出了一个闭环法律文章推荐方法。最后，通过使用来自“中国裁判文书网”网站的判决文件进行一系列实验，我们将案例中法律文章推荐的准确性从0.549提高到0.694，证明了我们提出的方法明显优于基准方法。

更新时间: 2024-10-07 11:45:04

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2410.04949v1

Machine listening in a neonatal intensive care unit

Oxygenators, alarm devices, and footsteps are some of the most common sound sources in a hospital. Detecting them has scientific value for environmental psychology but comes with challenges of its own: namely, privacy preservation and limited labeled data. In this paper, we address these two challenges via a combination of edge computing and cloud computing. For privacy preservation, we have designed an acoustic sensor which computes third-octave spectrograms on the fly instead of recording audio waveforms. For sample-efficient machine learning, we have repurposed a pretrained audio neural network (PANN) via spectral transcoding and label space adaptation. A small-scale study in a neonatological intensive care unit (NICU) confirms that the time series of detected events align with another modality of measurement: i.e., electronic badges for parents and healthcare professionals. Hence, this paper demonstrates the feasibility of polyphonic machine listening in a hospital ward while guaranteeing privacy by design.

Updated: 2024-10-07 11:44:38

标题: Neonatal Intensive Care Unit中的机器听力

摘要: 氧气机、警报设备和脚步声是医院中一些最常见的声音来源。检测它们对环境心理学具有科学价值，但也面临着隐私保护和有限标记数据的挑战。在本文中，我们通过边缘计算和云计算的结合来解决这两个挑战。为了保护隐私，我们设计了一个声学传感器，可以在实时计算第三倍频光谱图而不是记录音频波形。为了进行样本高效的机器学习，我们重新利用了一个预训练的音频神经网络（PANN），通过光谱转码和标签空间适应。在一个新生儿重症监护病房（NICU）进行的小规模研究证实，检测到的事件的时间序列与另一种测量模态相符：即父母和医疗专业人员的电子徽章。因此，本文证明了在医院病房中进行多声部机器听力的可行性，并通过设计保证隐私。

更新时间: 2024-10-07 11:44:38

领域: cs.SD,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2409.11439v2

Real-time Ship Recognition and Georeferencing for the Improvement of Maritime Situational Awareness

In an era where maritime infrastructures are crucial, advanced situational awareness solutions are increasingly important. The use of optical camera systems can allow real-time usage of maritime footage. This thesis presents an investigation into leveraging deep learning and computer vision to advance real-time ship recognition and georeferencing for the improvement of maritime situational awareness. A novel dataset, ShipSG, is introduced, containing 3,505 images and 11,625 ship masks with corresponding class and geographic position. After an exploration of state-of-the-art, a custom real-time segmentation architecture, ScatYOLOv8+CBAM, is designed for the NVIDIA Jetson AGX Xavier embedded system. This architecture adds the 2D scattering transform and attention mechanisms to YOLOv8, achieving an mAP of 75.46% and an 25.3 ms per frame, outperforming state-of-the-art methods by over 5%. To improve small and distant ship recognition in high-resolution images on embedded systems, an enhanced slicing mechanism is introduced, improving mAP by 8% to 11%. Additionally, a georeferencing method is proposed, achieving positioning errors of 18 m for ships up to 400 m away and 44 m for ships between 400 m and 1200 m. The findings are also applied in real-world scenarios, such as the detection of abnormal ship behaviour, camera integrity assessment and 3D reconstruction. The approach of this thesis outperforms existing methods and provides a framework for integrating recognized and georeferenced ships into real-time systems, enhancing operational effectiveness and decision-making for maritime stakeholders. This thesis contributes to the maritime computer vision field by establishing a benchmark for ship segmentation and georeferencing research, demonstrating the viability of deep-learning-based recognition and georeferencing methods for real-time maritime monitoring.

Updated: 2024-10-07 11:43:42

标题: 实时船舶识别和地理参考技术用于提高海上局势感知

摘要: 在海洋基础设施至关重要的时代，先进的情境感知解决方案变得越来越重要。使用光学摄像头系统可以实现海洋摄影的实时使用。本文介绍了一项利用深度学习和计算机视觉来推进实时船舶识别和地理参考以提高海洋情境感知的研究。引入了一个新颖的数据集ShipSG，包含3,505张图像和11,625个船舶掩模，具有相应的类别和地理位置。在探索最新技术之后，设计了一个定制的实时分割架构ScatYOLOv8+CBAM，用于NVIDIA Jetson AGX Xavier嵌入式系统。该架构将2D散射变换和注意机制添加到YOLOv8中，实现了75.46%的mAP和每帧25.3毫秒的性能，优于最新方法超过5%。为了改善嵌入式系统中高分辨率图像上小型和远处船只的识别，引入了增强的切片机制，将mAP提高了8%至11%。此外，提出了一种地理参考方法，实现了距离400米以内的船只的18米定位误差和距离400米至1200米的船只的44米定位误差。研究结果还应用于现实情境，如异常船只行为检测、摄像头完整性评估和三维重建。本文的方法优于现有方法，并提供了一个框架，用于将识别和地理参考的船只集成到实时系统中，增强海洋利益相关者的运营效率和决策制定。该论文通过为船舶分割和地理参考研究建立基准，展示了基于深度学习的识别和地理参考方法在实时海洋监测中的可行性，为海洋计算机视觉领域做出了贡献。

更新时间: 2024-10-07 11:43:42

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.04946v1

Towards using Reinforcement Learning for Scaling and Data Replication in Cloud Systems

Given its intuitive nature, many Cloud providers opt for threshold-based data replication to enable automatic resource scaling. However, setting thresholds effectively needs human intervention to calibrate thresholds for each metric and requires a deep knowledge of current workload trends, which can be challenging to achieve. Reinforcement learning is used in many areas related to the Cloud Computing, and it is a promising field to get automatic data replication strategies. In this work, we survey data replication strategies and data scaling based on reinforcement learning (RL).

Updated: 2024-10-07 11:32:35

标题: 朝向在云系统中使用强化学习来实现规模化和数据复制

摘要: 考虑到其直观性，许多云服务提供商选择基于阈值的数据复制来实现自动资源扩展。然而，有效设置阈值需要人工干预，为每个指标校准阈值，并需要对当前工作负载趋势有深入的了解，这可能具有挑战性。强化学习在与云计算相关的许多领域中被使用，是一个有希望的领域来获得自动数据复制策略。在这项工作中，我们调查了基于强化学习（RL）的数据复制策略和数据扩展。

更新时间: 2024-10-07 11:32:35

领域: cs.DC,cs.AI

下载: http://arxiv.org/abs/2410.11862v1

Next state prediction gives rise to entangled, yet compositional representations of objects

Compositional representations are thought to enable humans to generalize across combinatorially vast state spaces. Models with learnable object slots, which encode information about objects in separate latent codes, have shown promise for this type of generalization but rely on strong architectural priors. Models with distributed representations, on the other hand, use overlapping, potentially entangled neural codes, and their ability to support compositional generalization remains underexplored. In this paper we examine whether distributed models can develop linearly separable representations of objects, like slotted models, through unsupervised training on videos of object interactions. We show that, surprisingly, models with distributed representations often match or outperform models with object slots in downstream prediction tasks. Furthermore, we find that linearly separable object representations can emerge without object-centric priors, with auxiliary objectives like next-state prediction playing a key role. Finally, we observe that distributed models' object representations are never fully disentangled, even if they are linearly separable: Multiple objects can be encoded through partially overlapping neural populations while still being highly separable with a linear classifier. We hypothesize that maintaining partially shared codes enables distributed models to better compress object dynamics, potentially enhancing generalization.

Updated: 2024-10-07 11:32:17

标题: 下一个状态的预测导致了物体的纠缠，但又组合的表征

摘要: 组成性表示被认为使人类能够在组合庞大的状态空间中进行泛化。具有可学习对象槽的模型，这些槽编码关于对象的信息在不同的潜在代码中，已经显示出在这种泛化方面很有前途，但依赖于强大的架构先验。另一方面，具有分布式表示的模型使用重叠的、潜在纠缠的神经编码，它们支持组成性泛化的能力仍未得到充分探究。在本文中，我们研究分布式模型是否可以通过对对象交互视频进行无监督训练，像有槽模型一样发展线性可分离的对象表示。我们惊讶地发现，具有分布式表示的模型在下游预测任务中经常与或胜过具有对象槽的模型。此外，我们发现线性可分离的对象表示可以在没有对象中心先验的情况下出现，辅助目标如下一个状态的预测起着关键作用。最后，我们观察到，分布式模型的对象表示永远不会完全分离，即使它们是线性可分离的：多个对象可以通过部分重叠的神经群体进行编码，同时仍然可以通过线性分类器高度分离。我们假设保持部分共享的代码使分布式模型能够更好地压缩对象动态，从而增强泛化能力。

更新时间: 2024-10-07 11:32:17

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2410.04940v1

Training Interactive Agent in Large FPS Game Map with Rule-enhanced Reinforcement Learning

In the realm of competitive gaming, 3D first-person shooter (FPS) games have gained immense popularity, prompting the development of game AI systems to enhance gameplay. However, deploying game AI in practical scenarios still poses challenges, particularly in large-scale and complex FPS games. In this paper, we focus on the practical deployment of game AI in the online multiplayer competitive 3D FPS game called Arena Breakout, developed by Tencent Games. We propose a novel gaming AI system named Private Military Company Agent (PMCA), which is interactable within a large game map and engages in combat with players while utilizing tactical advantages provided by the surrounding terrain. To address the challenges of navigation and combat in modern 3D FPS games, we introduce a method that combines navigation mesh (Navmesh) and shooting-rule with deep reinforcement learning (NSRL). The integration of Navmesh enhances the agent's global navigation capabilities while shooting behavior is controlled using rule-based methods to ensure controllability. NSRL employs a DRL model to predict when to enable the navigation mesh, resulting in a diverse range of behaviors for the game AI. Customized rewards for human-like behaviors are also employed to align PMCA's behavior with that of human players.

Updated: 2024-10-07 11:27:45

标题: 在大型FPS游戏地图中使用规则增强强化学习训练交互式代理

摘要: 在竞技游戏领域，3D第一人称射击（FPS）游戏已经获得了巨大的流行，促使游戏人工智能系统的发展以增强游戏体验。然而，在实际场景中部署游戏人工智能仍然面临挑战，特别是在大规模和复杂的FPS游戏中。本文聚焦于在由腾讯游戏开发的在线多人竞技3D FPS游戏“竞技突围”中实际部署游戏人工智能。我们提出了一种名为私人军事公司代理（PMCA）的新型游戏人工智能系统，该系统可以在大型游戏地图中与玩家互动，并利用周围地形提供的战术优势进行战斗。为了解决现代3D FPS游戏中导航和战斗的挑战，我们引入了一种将导航网格（Navmesh）和射击规则与深度强化学习（NSRL）相结合的方法。Navmesh的整合增强了代理的全局导航能力，同时使用基于规则的方法来控制射击行为以确保可控性。NSRL利用DRL模型来预测何时启用导航网格，从而为游戏人工智能提供多样化的行为。还采用了针对类人行为的定制奖励来使PMCA的行为与人类玩家一致。

更新时间: 2024-10-07 11:27:45

领域: cs.AI

下载: http://arxiv.org/abs/2410.04936v1

The Role of Governments in Increasing Interconnected Post-Deployment Monitoring of AI

Language-based AI systems are diffusing into society, bringing positive and negative impacts. Mitigating negative impacts depends on accurate impact assessments, drawn from an empirical evidence base that makes causal connections between AI usage and impacts. Interconnected post-deployment monitoring combines information about model integration and use, application use, and incidents and impacts. For example, inference time monitoring of chain-of-thought reasoning can be combined with long-term monitoring of sectoral AI diffusion, impacts and incidents. Drawing on information sharing mechanisms in other industries, we highlight example data sources and specific data points that governments could collect to inform AI risk management.

Updated: 2024-10-07 11:24:29

标题: 政府在增加人工智能部署后监测互连性方面的作用

摘要: 基于语言的人工智能系统正在渗透到社会中，带来积极和消极影响。减轻负面影响取决于准确的影响评估，这些评估是从使人工智能使用与影响之间建立因果关系的经验证据基础中得出的。互联后部署监测结合了有关模型整合和使用、应用使用以及事件和影响的信息。例如，思维链推理的推理时间监控可以与长期监测部门人工智能扩散、影响和事件相结合。借鉴其他行业的信息共享机制，我们强调政府可以收集哪些数据源和具体数据点来指导人工智能风险管理。

更新时间: 2024-10-07 11:24:29

领域: cs.CY,cs.AI,cs.HC

下载: http://arxiv.org/abs/2410.04931v1

Temporal Relational Reasoning of Large Language Models for Detecting Stock Portfolio Crashes

Stock portfolios are often exposed to rare consequential events (e.g., 2007 global financial crisis, 2020 COVID-19 stock market crash), as they do not have enough historical information to learn from. Large Language Models (LLMs) now present a possible tool to tackle this problem, as they can generalize across their large corpus of training data and perform zero-shot reasoning on new events, allowing them to detect possible portfolio crash events without requiring specific training data. However, detecting portfolio crashes is a complex problem that requires more than basic reasoning abilities. Investors need to dynamically process the impact of each new information found in the news articles, analyze the the relational network of impacts across news events and portfolio stocks, as well as understand the temporal context between impacts across time-steps, in order to obtain the overall aggregated effect on the target portfolio. In this work, we propose an algorithmic framework named Temporal Relational Reasoning (TRR). It seeks to emulate the spectrum of human cognitive capabilities used for complex problem-solving, which include brainstorming, memory, attention and reasoning. Through extensive experiments, we show that TRR is able to outperform state-of-the-art solutions on detecting stock portfolio crashes, and demonstrate how each of the proposed components help to contribute to its performance through an ablation study. Additionally, we further explore the possible applications of TRR by extending it to other related complex problems, such as the detection of possible global crisis events in Macroeconomics.

Updated: 2024-10-07 11:15:52

标题: 大型语言模型的时间关系推理用于检测股票投资组合崩盘

摘要: 股票投资组合经常面临罕见的重大事件（例如2007年全球金融危机、2020年COVID-19股市崩盘），因为它们没有足够的历史信息可以学习。大型语言模型（LLMs）现在提供了一个可能的工具来解决这个问题，因为它们可以在其庞大的训练数据语料库上进行泛化，并对新事件进行零-shot推理，从而能够在不需要具体训练数据的情况下检测可能导致投资组合崩盘的事件。然而，检测投资组合崩盘是一个复杂的问题，需要更多的基本推理能力。投资者需要动态处理在新闻文章中找到的每一条新信息的影响，分析新闻事件和投资组合股票之间的影响关系网络，以及理解时间步之间的影响的时间背景，以便获得对目标投资组合的整体聚合效果。在这项工作中，我们提出了一个名为时间关系推理（TRR）的算法框架。它试图模拟用于复杂问题解决的人类认知能力的谱系，包括头脑风暴、记忆、注意力和推理。通过大量实验证明，TRR能够超越最先进的解决方案，检测股票投资组合崩盘，并展示了提议的每个组件如何通过消蚀研究有助于其性能。此外，我们进一步探讨了将TRR扩展到其他相关复杂问题的可能应用，如在宏观经济学中检测可能的全球危机事件。

更新时间: 2024-10-07 11:15:52

领域: q-fin.RM,cs.AI,cs.CL,cs.LG,q-fin.CP

下载: http://arxiv.org/abs/2410.17266v1

GRU-D Characterizes Age-Specific Temporal Missingness in MIMIC-IV

Temporal missingness, defined as unobserved patterns in time series, and its predictive potentials represent an emerging area in clinical machine learning. We trained a gated recurrent unit with decay mechanisms, called GRU-D, for a binary classification between elderly - and young patients. We extracted time series for 5 vital signs from MIMIC-IV as model inputs. GRU-D was evaluated with means of 0.780 AUROC and 0.810 AUPRC on bootstrapped data. Interpreting trained model parameters, we found differences in blood pressure missingness and respiratory rate missingness as important predictors learned by parameterized hidden gated units. We successfully showed how GRU-D can be used to reveal patterns in temporal missingness building the basis of novel research directions.

Updated: 2024-10-07 11:07:16

标题: GRU-D表征MIMIC-IV中特定年龄段的时间缺失

摘要: 时间缺失，定义为时间序列中未观察到的模式及其预测潜力代表了临床机器学习中的新兴领域。我们训练了一种带有衰减机制的门控循环单元，称为GRU-D，用于对老年患者和年轻患者进行二元分类。我们从MIMIC-IV中提取了5个关键生理指标的时间序列作为模型输入。在自举数据上，GRU-D的平均AUROC为0.780，AUPRC为0.810。通过解释训练模型参数，我们发现血压缺失和呼吸频率缺失的差异是由参数化的隐藏门控单元学习到的重要预测因子。我们成功展示了GRU-D如何用于揭示时间缺失中的模式，从而为新的研究方向奠定基础。

更新时间: 2024-10-07 11:07:16

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.05350v1

Defense-as-a-Service: Black-box Shielding against Backdoored Graph Models

With the trend of large graph learning models, business owners tend to employ a model provided by a third party to deliver business services to users. However, these models might be backdoored, and malicious users can submit trigger-embedded inputs to manipulate the model predictions. Current graph backdoor defenses have several limitations: 1) depending on model-related details, 2) requiring additional model fine-tuning, and 3) relying upon extra explainability tools, all of which are infeasible under stringent privacy policies. To address those limitations, we propose GraphProt, which allows resource-constrained business owners to rely on third parties to avoid backdoor attacks on GNN-based graph classifiers. Our GraphProt is model-agnostic and only relies on the input graph. The key insight is to leverage subgraph information for prediction, thereby mitigating backdoor effects induced by triggers. GraphProt comprises two components: clustering-based trigger elimination and robust subgraph ensemble. Specifically, we first propose feature-topology clustering that aims to remove most of the anomalous subgraphs (triggers). Moreover, we design subgraph sampling strategies based on feature-topology clustering to build a robust classifier via majority vote. Experimental results across three backdoor attacks and six benchmark datasets demonstrate that GraphProt significantly reduces the backdoor attack success rate while preserving the model accuracy on regular graph classification tasks.

Updated: 2024-10-07 11:04:38

标题: 服务作为防御：针对带后门图模型的黑盒屏蔽

摘要: 随着大规模图学习模型的趋势，企业所有者倾向于雇佣第三方提供的模型来向用户提供业务服务。然而，这些模型可能会被植入后门，恶意用户可以提交包含触发器的输入来操纵模型预测。目前的图后门防御存在几个局限性：1）依赖于模型相关细节，2）需要额外的模型微调，3）依赖于额外的可解释性工具，这些在严格的隐私政策下是不可行的。为了解决这些局限性，我们提出了GraphProt，它允许资源受限的企业所有者依赖第三方来避免对基于GNN的图分类器的后门攻击。我们的GraphProt是与模型无关的，只依赖于输入图。关键见解是利用子图信息进行预测，从而减轻由触发器引起的后门效应。GraphProt包括两个组件：基于聚类的触发器消除和稳健子图集成。具体而言，我们首先提出了旨在消除大多数异常子图（触发器）的特征拓扑聚类。此外，我们设计了基于特征拓扑聚类的子图采样策略，通过多数投票构建稳健的分类器。在三种后门攻击和六个基准数据集上的实验结果表明，GraphProt显著降低了后门攻击成功率，同时保持了对正常图分类任务的模型准确性。

更新时间: 2024-10-07 11:04:38

领域: cs.LG,cs.AI,cs.CR,F.2.2

下载: http://arxiv.org/abs/2410.04916v1

A Novel Mathematical Framework for Objective Characterization of Ideas through Vector Embeddings in LLM

The demand for innovation in product design necessitates a prolific ideation phase. Conversational AI (CAI) systems that use Large Language Models (LLMs) such as GPT (Generative Pre-trained Transformer) have been shown to be fruitful in augmenting human creativity, providing numerous novel and diverse ideas. Despite the success in ideation quantity, the qualitative assessment of these ideas remains challenging and traditionally reliant on expert human evaluation. This method suffers from limitations such as human judgment errors, bias, and oversight. Addressing this gap, our study introduces a comprehensive mathematical framework for automated analysis to objectively evaluate the plethora of ideas generated by CAI systems and/or humans. This framework is particularly advantageous for novice designers who lack experience in selecting promising ideas. By converting the ideas into higher dimensional vectors and quantitatively measuring the diversity between them using tools such as UMAP, DBSCAN and PCA, the proposed method provides a reliable and objective way of selecting the most promising ideas, thereby enhancing the efficiency of the ideation phase.

Updated: 2024-10-07 11:04:31

标题: 一种新的数学框架：通过LLM中的向量嵌入客观表征思想

摘要: 产品设计创新的需求需要一个多产的构思阶段。使用大型语言模型（LLMs）如GPT（生成式预训练变换器）的会话人工智能（CAI）系统已被证明在增强人类创造力方面取得了成功，提供了许多新颖和多样化的想法。尽管在构思数量方面取得了成功，但对这些想法的质量评估仍然具有挑战性，传统上依赖于专家人工评估。这种方法存在人类判断错误、偏见和疏忽等限制。为填补这一缺口，我们的研究引入了一个全面的数学框架，用于自动分析以客观评估CAI系统和/或人类生成的大量想法。该框架特别有利于缺乏选择有前途想法经验的初学设计师。通过将想法转化为更高维度的向量，并使用诸如UMAP、DBSCAN和PCA等工具定量衡量它们之间的差异，提出的方法提供了一种可靠和客观的选择最有前景的想法的方式，从而提高构思阶段的效率。

更新时间: 2024-10-07 11:04:31

领域: cs.AI,53A45,I.2.7; G.3

下载: http://arxiv.org/abs/2409.07578v2

SoK: Towards Security and Safety of Edge AI

Advanced AI applications have become increasingly available to a broad audience, e.g., as centrally managed large language models (LLMs). Such centralization is both a risk and a performance bottleneck - Edge AI promises to be a solution to these problems. However, its decentralized approach raises additional challenges regarding security and safety. In this paper, we argue that both of these aspects are critical for Edge AI, and even more so, their integration. Concretely, we survey security and safety threats, summarize existing countermeasures, and collect open challenges as a call for more research in this area.

Updated: 2024-10-07 10:52:53

标题: SoK: 边缘人工智能的安全与安全

摘要: 先进的人工智能应用程序已经越来越多地面向广泛的受众，例如作为集中管理的大型语言模型（LLMs）。这种集中化既是一种风险，也是性能瓶颈-边缘人工智能承诺成为这些问题的解决方案。然而，其分散化的方法提出了关于安全性和安全性的额外挑战。在本文中，我们认为这两个方面对于边缘人工智能至关重要，甚至更为重要的是它们的整合。具体来说，我们调查了安全和安全威胁，总结了现有的对策，并收集了开放挑战，呼吁在这一领域开展更多研究。

更新时间: 2024-10-07 10:52:53

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2410.05349v1

An active learning method for solving competitive multi-agent decision-making and control problems

To identify a stationary action profile for a population of competitive agents, each executing private strategies, we introduce a novel active-learning scheme where a centralized external observer (or entity) can probe the agents' reactions and recursively update simple local parametric estimates of the action-reaction mappings. Under very general working assumptions (not even assuming that a stationary profile exists), sufficient conditions are established to assess the asymptotic properties of the proposed active learning methodology so that, if the parameters characterizing the action-reaction mappings converge, a stationary action profile is achieved. Such conditions hence act also as certificates for the existence of such a profile. Extensive numerical simulations involving typical competitive multi-agent control and decision-making problems illustrate the practical effectiveness of the proposed learning-based approach.

Updated: 2024-10-07 10:52:47

标题: 一个解决竞争性多智能体决策和控制问题的主动学习方法

摘要: 为了确定一个竞争代理人群体的静态行为配置，每个代理人执行私人策略，我们引入了一种新颖的主动学习方案，其中一个集中式外部观察者（或实体）可以探测代理人的反应，并递归更新简单的局部参数估计动作-反应映射。在非常一般的工作假设下（甚至不假定存在静态配置），建立了足够的条件来评估所提出的主动学习方法的渐近特性，以便如果表征动作-反应映射的参数收敛，则实现了一个静态行为配置。这些条件也因此作为存在这样一个配置的证明。涉及典型的竞争多代理控制和决策问题的大量数值模拟说明了所提出的基于学习的方法的实际有效性。

更新时间: 2024-10-07 10:52:47

领域: eess.SY,cs.LG,cs.MA,cs.SY,math.OC

下载: http://arxiv.org/abs/2212.12561v5

Decomposition Polyhedra of Piecewise Linear Functions

In this paper we contribute to the frequently studied question of how to decompose a continuous piecewise linear (CPWL) function into a difference of two convex CPWL functions. Every CPWL function has infinitely many such decompositions, but for applications in optimization and neural network theory, it is crucial to find decompositions with as few linear pieces as possible. This is a highly challenging problem, as we further demonstrate by disproving a recently proposed approach by Tran and Wang [Minimal representations of tropical rational functions. Algebraic Statistics, 15(1):27-59, 2024]. To make the problem more tractable, we propose to fix an underlying polyhedral complex determining the possible locus of nonlinearity. Under this assumption, we prove that the set of decompositions forms a polyhedron that arises as intersection of two translated cones. We prove that irreducible decompositions correspond to the bounded faces of this polyhedron and minimal solutions must be vertices. We then identify cases with a unique minimal decomposition, and illustrate how our insights have consequences in the theory of submodular functions. Finally, we improve upon previous constructions of neural networks for a given convex CPWL function and apply our framework to obtain results in the nonconvex case.

Updated: 2024-10-07 10:48:36

标题: 分段线性函数的分解多面体

摘要: 在本文中，我们致力于研究如何将连续分段线性（CPWL）函数分解为两个凸CPWL函数的差的问题，这是一个经常研究的问题。每个CPWL函数都有无穷多种这样的分解方式，但在优化和神经网络理论的应用中，找到尽可能少线性片段的分解是至关重要的。这是一个非常具有挑战性的问题，我们通过反驳Tran和Wang最近提出的一种方法[《热带有理函数的最小表示》，代数统计，15(1)：27-59，2024]进一步证明了这一点。为了使问题更易处理，我们建议固定一个确定非线性可能位置的基础多面体复合体。在这个假设下，我们证明了分解的集合形成一个多面体，这个多面体是两个平移锥体的交集。我们证明了不可约分解对应于这个多面体的有界面，而最小解必须是顶点。然后我们确定了具有唯一最小分解的情况，并说明我们的见解对子模函数理论有什么影响。最后，我们改进了以前对于给定凸CPWL函数的神经网络构造，并将我们的框架应用到非凸情况中获得结果。

更新时间: 2024-10-07 10:48:36

领域: math.CO,cs.DM,cs.LG,cs.NE,math.OC

下载: http://arxiv.org/abs/2410.04907v1

Multi-agent reinforcement learning using echo-state network and its application to pedestrian dynamics

In recent years, simulations of pedestrians using the multi-agent reinforcement learning (MARL) have been studied. This study considered the roads on a grid-world environment, and implemented pedestrians as MARL agents using an echo-state network and the least squares policy iteration method. Under this environment, the ability of these agents to learn to move forward by avoiding other agents was investigated. Specifically, we considered two types of tasks: the choice between a narrow direct route and a broad detour, and the bidirectional pedestrian flow in a corridor. The simulations results indicated that the learning was successful when the density of the agents was not that high.

Updated: 2024-10-07 10:28:24

标题: 多智能体强化学习使用回声状态网络及其在行人动态中的应用

摘要: 近年来，研究了使用多智能体强化学习（MARL）模拟行人的方法。本研究考虑了在一个网格世界环境中的道路，并将行人实现为使用回声状态网络和最小二乘策略迭代方法的MARL代理。在这种环境下，研究了这些代理学习避开其他代理向前移动的能力。具体而言，我们考虑了两种类型的任务：选择狭窄直通路线和宽广绕行之间的选择，以及走廊中的双向行人流动。模拟结果表明，在代理密度不是很高时，学习是成功的。

更新时间: 2024-10-07 10:28:24

领域: cs.MA,cs.AI,cs.LG,physics.soc-ph

下载: http://arxiv.org/abs/2312.11834v4

Low-Rank Continual Personalization of Diffusion Models

Recent personalization methods for diffusion models, such as Dreambooth, allow fine-tuning pre-trained models to generate new concepts. However, applying these techniques across multiple tasks in order to include, e.g., several new objects or styles, leads to mutual interference between their adapters. While recent studies attempt to mitigate this issue by combining trained adapters across tasks after fine-tuning, we adopt a more rigorous regime and investigate the personalization of large diffusion models under a continual learning scenario, where such interference leads to catastrophic forgetting of previous knowledge. To that end, we evaluate the na\"ive continual fine-tuning of customized models and compare this approach with three methods for consecutive adapters' training: sequentially merging new adapters, merging orthogonally initialized adapters, and updating only relevant parameters according to the task. In our experiments, we show that the proposed approaches mitigate forgetting when compared to the na\"ive approach.

Updated: 2024-10-07 10:19:09

标题: 低秩持续个性化扩散模型

摘要: 最近个性化扩散模型的方法，如Dreambooth，允许微调预先训练的模型以生成新概念。然而，将这些技术应用于多个任务，以包括多个新对象或风格，会导致它们的适配器之间发生相互干扰。最近的研究试图通过在微调后组合跨任务训练的适配器来缓解这个问题，我们采用更严格的方法，研究大型扩散模型在继续学习的情景下的个性化，其中这种干扰导致了对先前知识的灾难性遗忘。为此，我们评估了定制模型的朴素继续微调，并将此方法与三种连续适配器训练方法进行了比较：顺序合并新适配器，正交初始化适配器合并，以及仅根据任务更新相关参数。在我们的实验中，我们展示了与朴素方法相比，所提出的方法可以减轻遗忘。

更新时间: 2024-10-07 10:19:09

领域: cs.LG

下载: http://arxiv.org/abs/2410.04891v1

ResTNet: Defense against Adversarial Policies via Transformer in Computer Go

Although AlphaZero has achieved superhuman levels in Go, recent research has highlighted its vulnerability in particular situations requiring a more comprehensive understanding of the entire board. To address this challenge, this paper introduces ResTNet, a network that interleaves residual networks and Transformer. Our empirical experiments demonstrate several advantages of using ResTNet. First, it not only improves playing strength but also enhances the ability of global information. Second, it defends against an adversary Go program, called cyclic-adversary, tailor-made for attacking AlphaZero algorithms, significantly reducing the average probability of being attacked rate from 70.44% to 23.91%. Third, it improves the accuracy from 59.15% to 80.01% in correctly recognizing ladder patterns, which are one of the challenging patterns for Go AIs. Finally, ResTNet offers a potential explanation of the decision-making process and can also be applied to other games like Hex. To the best of our knowledge, ResTNet is the first to integrate residual networks and Transformer in the context of AlphaZero for board games, suggesting a promising direction for enhancing AlphaZero's global understanding.

Updated: 2024-10-07 10:17:24

标题: ResTNet：通过Transformer在计算机围棋中对抗对抗性策略

摘要: 虽然AlphaZero在围棋中达到了超人类水平，但最近的研究突显出其在需要更全面理解整个棋盘的特定情况下的脆弱性。为了解决这一挑战，本文介绍了ResTNet，一种将残差网络和Transformer交织在一起的网络。我们的实证实验展示了使用ResTNet的几个优势。首先，它不仅提高了下棋的实力，还增强了对全局信息的能力。其次，它抵御了一种名为循环对手的对手围棋程序，专门针对攻击AlphaZero算法而设计，将平均被攻击率从70.44%降低到23.91%。第三，它在正确识别阶梯模式方面的准确率从59.15%提高到80.01%，这是围棋AI面临的挑战性模式之一。最后，ResTNet提供了对决策过程的潜在解释，并且还可以应用于其他游戏如Hex。据我们所知，ResTNet是首个在AlphaZero的棋盘游戏背景下将残差网络和Transformer整合起来的网络，为增强AlphaZero的全局理解提供了一个有前途的方向。

更新时间: 2024-10-07 10:17:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.05347v1

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse

Deep neural networks (DNNs) at convergence consistently represent the training data in the last layer via a highly symmetric geometric structure referred to as neural collapse. This empirical evidence has spurred a line of theoretical research aimed at proving the emergence of neural collapse, mostly focusing on the unconstrained features model. Here, the features of the penultimate layer are free variables, which makes the model data-agnostic and, hence, puts into question its ability to capture DNN training. Our work addresses the issue, moving away from unconstrained features and studying DNNs that end with at least two linear layers. We first prove generic guarantees on neural collapse that assume (i) low training error and balancedness of the linear layers (for within-class variability collapse), and (ii) bounded conditioning of the features before the linear part (for orthogonality of class-means, as well as their alignment with weight matrices). We then show that such assumptions hold for gradient descent training with weight decay: (i) for networks with a wide first layer, we prove low training error and balancedness, and (ii) for solutions that are either nearly optimal or stable under large learning rates, we additionally prove the bounded conditioning. Taken together, our results are the first to show neural collapse in the end-to-end training of DNNs.

Updated: 2024-10-07 10:16:40

标题: 广泛使用权重衰减训练的神经网络可证明出现神经元坍塌

摘要: 深度神经网络（DNNs）在收敛时一致地通过高度对称的几何结构在最后一层代表训练数据，这种现象被称为神经坍塌。这种实证证据引发了一系列旨在证明神经坍塌出现的理论研究，主要集中在无约束特征模型上。在这种模型中，倒数第二层的特征是自由变量，使得模型与数据无关，因此，对其捕捉DNN训练的能力产生了疑问。我们的工作解决了这个问题，摆脱了无约束特征，研究以至少两个线性层结尾的DNNs。我们首先在神经坍塌上提供了一般性保证，假设（i）低训练误差和线性层的平衡性（用于类内变异性坍塌），以及（ii）线性部分之前特征的有界条件（用于类均值的正交性，以及它们与权重矩阵的对齐性）。然后我们展示了这些假设对于带有权重衰减的梯度下降训练是成立的：（i）对于具有宽第一层的网络，我们证明了低训练误差和平衡性，以及（ii）对于接近最优或在大学习率下稳定的解，我们另外证明了有界条件。综合来看，我们的结果首次展示了DNN的端到端训练中的神经坍塌。

更新时间: 2024-10-07 10:16:40

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2410.04887v1

A Survey of Optimization-based Task and Motion Planning: From Classical To Learning Approaches

Task and Motion Planning (TAMP) integrates high-level task planning and low-level motion planning to equip robots with the autonomy to effectively reason over long-horizon, dynamic tasks. Optimization-based TAMP focuses on hybrid optimization approaches that define goal conditions via objective functions and are capable of handling open-ended goals, robotic dynamics, and physical interaction between the robot and the environment. Therefore, optimization-based TAMP is particularly suited to solve highly complex, contact-rich locomotion and manipulation problems. This survey provides a comprehensive review on optimization-based TAMP, covering (i) planning domain representations, including action description languages and temporal logic, (ii) individual solution strategies for components of TAMP, including AI planning and trajectory optimization (TO), and (iii) the dynamic interplay between logic-based task planning and model-based TO. A particular focus of this survey is to highlight the algorithm structures to efficiently solve TAMP, especially hierarchical and distributed approaches. Additionally, the survey emphasizes the synergy between the classical methods and contemporary learning-based innovations such as large language models. Furthermore, the future research directions for TAMP is discussed in this survey, highlighting both algorithmic and application-specific challenges.

Updated: 2024-10-07 10:09:16

标题: 基于优化的任务和动作规划综述：从经典到学习方法

摘要: 任务与动作规划（TAMP）整合了高层任务规划和低层运动规划，为机器人提供了有效推理长期动态任务的自主性。基于优化的TAMP专注于定义目标条件的混合优化方法，通过客观函数来处理开放式目标、机器人动态和机器人与环境之间的物理交互。因此，基于优化的TAMP特别适用于解决高度复杂、接触丰富的运动和操作问题。本综述提供了关于基于优化的TAMP的全面回顾，涵盖了（i）规划领域表示，包括行为描述语言和时间逻辑，（ii）TAMP组件的个体解决策略，包括人工智能规划和轨迹优化（TO），以及（iii）基于逻辑的任务规划和基于模型的TO之间的动态相互作用。本综述的一个特别关注点是突出算法结构，以高效解决TAMP问题，特别是层次化和分布式方法。此外，综述强调了传统方法与当代学习型创新（如大型语言模型）之间的协同作用。此外，本综述讨论了TAMP的未来研究方向，突出了算法和应用特定挑战。

更新时间: 2024-10-07 10:09:16

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2404.02817v5

Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models

Visual language pre-training (VLP) models have demonstrated significant success across various domains, yet they remain vulnerable to adversarial attacks. Addressing these adversarial vulnerabilities is crucial for enhancing security in multimodal learning. Traditionally, adversarial methods targeting VLP models involve simultaneously perturbing images and text. However, this approach faces notable challenges: first, adversarial perturbations often fail to translate effectively into real-world scenarios; second, direct modifications to the text are conspicuously visible. To overcome these limitations, we propose a novel strategy that exclusively employs image patches for attacks, thus preserving the integrity of the original text. Our method leverages prior knowledge from diffusion models to enhance the authenticity and naturalness of the perturbations. Moreover, to optimize patch placement and improve the efficacy of our attacks, we utilize the cross-attention mechanism, which encapsulates intermodal interactions by generating attention maps to guide strategic patch placements. Comprehensive experiments conducted in a white-box setting for image-to-text scenarios reveal that our proposed method significantly outperforms existing techniques, achieving a 100% attack success rate. Additionally, it demonstrates commendable performance in transfer tasks involving text-to-image configurations.

Updated: 2024-10-07 10:06:01

标题: 足够的贴片：自然对抗贴片针对视觉-语言预训练模型

摘要: 视觉语言预训练（VLP）模型在各个领域取得了显著的成功，但它们仍然容易受到对抗性攻击的影响。解决这些对抗性漏洞对于增强多模态学习中的安全性至关重要。传统上，针对VLP模型的对抗方法涉及同时扰动图像和文本。然而，这种方法面临显著挑战：首先，对抗性扰动通常无法有效地转化为现实场景；其次，对文本的直接修改显而易见。为了克服这些限制，我们提出了一种独家使用图像补丁进行攻击的新策略，从而保持原始文本的完整性。我们的方法利用扩散模型的先验知识来增强扰动的真实性和自然性。此外，为了优化贴片放置并提高我们攻击的效果，我们利用交叉注意力机制，通过生成注意力映射来指导战略性贴片的放置，从而囊括跨模态交互。在白盒设置下进行的针对图像到文本情景的全面实验表明，我们提出的方法明显优于现有技术，实现了100%的攻击成功率。此外，在涉及文本到图像配置的传输任务中表现出良好的性能。

更新时间: 2024-10-07 10:06:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.04884v1

QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing

Multi-task reinforcement learning (MTRL) aims to learn several tasks simultaneously for better sample efficiency than learning them separately. Traditional methods achieve this by sharing parameters or relabeled data between tasks. In this work, we introduce a new framework for sharing behavioral policies across tasks, which can be used in addition to existing MTRL methods. The key idea is to improve each task's off-policy data collection by employing behaviors from other task policies. Selectively sharing helpful behaviors acquired in one task to collect training data for another task can lead to higher-quality trajectories, leading to more sample-efficient MTRL. Thus, we introduce a simple and principled framework called Q-switch mixture of policies (QMP) that selectively shares behavior between different task policies by using the task's Q-function to evaluate and select useful shareable behaviors. We theoretically analyze how QMP improves the sample efficiency of the underlying RL algorithm. Our experiments show that QMP's behavioral policy sharing provides complementary gains over many popular MTRL algorithms and outperforms alternative ways to share behaviors in various manipulation, locomotion, and navigation environments. Videos are available at https://qmp-mtrl.github.io.

Updated: 2024-10-07 10:04:28

标题: QMP：多任务行为共享的Q开关政策混合

摘要: 多任务强化学习(MTRL)旨在同时学习多个任务，以比单独学习它们更高的样本效率。传统方法通过在任务之间共享参数或重新标记数据来实现这一目标。在这项工作中，我们引入了一个新的框架，用于在任务之间共享行为策略，这可以与现有的MTRL方法一起使用。关键思想是通过利用其他任务策略的行为来改进每个任务的离线数据收集。有选择地共享一个任务获得的有用行为，以收集另一个任务的训练数据，可以导致更高质量的轨迹，从而实现更高效的MTRL。因此，我们引入了一个简单而有原则的框架，称为Q-switch混合策略(QMP)，通过使用任务的Q函数来评估和选择有用的可共享行为，在不同的任务策略之间有选择地共享行为。我们在理论上分析了QMP如何提高底层RL算法的样本效率。我们的实验表明，QMP的行为策略共享比许多流行的MTRL算法提供了互补收益，并在各种操纵、移动和导航环境中优于共享行为的替代方法。视频可在https://qmp-mtrl.github.io上查看。

更新时间: 2024-10-07 10:04:28

领域: cs.LG,cs.AI,cs.RO

下载: http://arxiv.org/abs/2302.00671v2

Improving the Sampling Strategy in KernelSHAP

Shapley values are a popular model-agnostic explanation framework for explaining predictions made by complex machine learning models. The framework provides feature contribution scores that sum to the predicted response and represent each feature's importance. The computation of exact Shapley values is computationally expensive due to estimating an exponential amount of non-trivial conditional expectations. The KernelSHAP framework enables us to approximate the Shapley values using a sampled subset of weighted conditional expectations. We propose three main novel contributions: a stabilizing technique to reduce the variance of the weights in the current state-of-the-art strategy, a novel weighing scheme that corrects the Shapley kernel weights based on sampled subsets, and a straightforward strategy that includes the important subsets and integrates them with the corrected Shapley kernel weights. We compare these new approximation strategies against existing ones by evaluating their Shapley value accuracy as a function of the number of subsets. The results demonstrate that our sampling strategies significantly enhance the accuracy of the approximated Shapley value explanations, making them more reliable in practical applications. This work provides valuable insights and practical recommendations for researchers and practitioners seeking to implement Shapley value-based explainability of their models.

Updated: 2024-10-07 10:02:31

标题: 改进KernelSHAP中的采样策略

摘要: 沙普利值是一种流行的与模型无关的解释框架，用于解释复杂机器学习模型所做的预测。该框架提供特征贡献分数，这些分数总和为预测响应，并表示每个特征的重要性。由于估计了大量非平凡条件期望，准确的沙普利值的计算在计算上非常昂贵。KernelSHAP框架使我们能够使用加权条件期望的抽样子集来近似沙普利值。我们提出了三个主要的新贡献：一种稳定技术，用于减少当前最先进策略中权重的方差，一种校正沙普利核权重的新的加权方案，基于抽样子集，以及一种包含重要子集并将其与校正后的沙普利核权重整合的简单策略。通过评估其沙普利值准确性作为子集数量的函数，我们将这些新的近似策略与现有策略进行比较。结果表明，我们的抽样策略显著提高了近似沙普利值解释的准确性，使其在实际应用中更加可靠。这项工作为寻求实施基于沙普利值的模型解释性的研究人员和从业者提供了宝贵的见解和实际建议。

更新时间: 2024-10-07 10:02:31

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.04883v1

SH2: Self-Highlighted Hesitation Helps You Decode More Truthfully

Large language models (LLMs) demonstrate great performance in text generation. However, LLMs are still suffering from hallucinations. In this work, we propose an inference-time method, Self-Highlighted Hesitation (SH2), to help LLMs decode more truthfully. SH2 is based on a simple fact rooted in information theory that for an LLM, the tokens predicted with lower probabilities are prone to be more informative than others. Our analysis shows that the tokens assigned with lower probabilities by an LLM are more likely to be closely related to factual information, such as nouns, proper nouns, and adjectives. Therefore, we propose to ''highlight'' the factual information by selecting the tokens with the lowest probabilities and concatenating them to the original context, thus forcing the model to repeatedly read and hesitate on these tokens before generation. During decoding, we also adopt contrastive decoding to emphasize the difference in the output probabilities brought by the hesitation. Experimental results demonstrate that our SH2, requiring no additional data or models, can effectively help LLMs elicit factual knowledge and distinguish hallucinated contexts. Significant and consistent improvements are achieved by SH2 for LLaMA-7b, LLaMA2-7b and Mistral-7b on multiple hallucination tasks.

Updated: 2024-10-07 09:58:48

标题: SH2：自我突出的犹豫有助于您更真实地解码

摘要: 大型语言模型（LLMs）在文本生成方面表现出很好的性能。然而，LLMs 仍然受到幻觉的困扰。在这项工作中，我们提出了一种推理时间方法，Self-Highlighted Hesitation（SH2），以帮助LLMs更真实地解码。SH2基于信息论中的一个简单事实，即对于LLMs来说，通过较低概率预测的标记往往比其他标记更具信息量。我们的分析显示，由LLMs分配较低概率的标记更有可能与事实信息（如名词、专有名词和形容词）密切相关。因此，我们建议通过选择概率最低的标记并将它们连接到原始上下文来“突出显示”事实信息，从而迫使模型在生成之前反复阅读和犹豫这些标记。在解码过程中，我们还采用对比解码来强调由犹豫带来的输出概率的差异。实验证明，我们的SH2无需额外数据或模型，可以有效帮助LLMs引出事实知识并区分幻觉背景。SH2在多个幻觉任务上显著且一致地改善了LLaMA-7b、LLaMA2-7b和Mistral-7b的性能。

更新时间: 2024-10-07 09:58:48

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2401.05930v4

Leveraging Grammar Induction for Language Understanding and Generation

Grammar induction has made significant progress in recent years. However, it is not clear how the application of induced grammar could enhance practical performance in downstream tasks. In this work, we introduce an unsupervised grammar induction method for language understanding and generation. We construct a grammar parser to induce constituency structures and dependency relations, which is simultaneously trained on downstream tasks without additional syntax annotations. The induced grammar features are subsequently incorporated into Transformer as a syntactic mask to guide self-attention. We evaluate and apply our method to multiple machine translation tasks and natural language understanding tasks. Our method demonstrates superior performance compared to the original Transformer and other models enhanced with external parsers. Experimental results indicate that our method is effective in both from-scratch and pre-trained scenarios. Additionally, our research highlights the contribution of explicitly modeling the grammatical structure of texts to neural network models.

Updated: 2024-10-07 09:57:59

标题: 利用语法归纳进行语言理解和生成

摘要: 语法归纳在近年来取得了显著进展。然而，目前尚不清楚诱导语法的应用如何能够提升下游任务中的实际性能。在这项工作中，我们介绍了一种用于语言理解和生成的无监督语法归纳方法。我们构建了一个语法解析器，用于诱导组成结构和依赖关系，同时在下游任务上进行训练，无需额外的语法标注。随后，诱导的语法特征被整合到Transformer中作为句法掩码，以指导自注意力。我们评估并应用我们的方法到多个机器翻译任务和自然语言理解任务中。我们的方法表现出优越的性能，相较于原始Transformer和其他增强外部解析器的模型。实验结果表明，我们的方法在从零开始和预训练场景中均有效。此外，我们的研究强调了显式地对文本的语法结构进行建模对神经网络模型的贡献。

更新时间: 2024-10-07 09:57:59

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.04878v1

DALL-M: Context-Aware Clinical Data Augmentation with LLMs

X-ray images are vital in medical diagnostics, but their effectiveness is limited without clinical context. Radiologists often find chest X-rays insufficient for diagnosing underlying diseases, necessitating comprehensive clinical features and data integration. We present a novel framework to enhance the clinical context through augmentation techniques with clinical tabular data, thereby improving its applicability and reliability in AI medical diagnostics. We introduce a pioneering approach to clinical data augmentation that employs large language models to generate patient contextual synthetic data. This methodology is crucial for training more robust deep learning models in healthcare. It preserves the integrity of real patient data while enriching the dataset with contextually relevant synthetic features, significantly enhancing model performance. Our methodology, termed DALL-M, uses a three-phase feature generation process: (i)clinical context storage, (ii)expert query generation, and (iii)context-aware feature augmentation. DALL-M generates new, clinically relevant features by synthesizing chest X-ray images and reports. Applied to 799 cases using nine features from the MIMIC-IV dataset, it created an augmented set of 91 features. This is the first work to generate contextual values for patients' X-ray reports. Specifically, we provide (i)the capacity of LLMs to generate contextual synthetic values for existing clinical features and (ii)their ability to create entirely new clinically relevant features. Empirical validation with machine learning models showed significant performance improvements. Incorporating augmented features increased the F1 score by 16.5% and Precision and Recall by approximately 25%. DALL-M addresses a critical gap in clinical data augmentation, offering a robust framework for generating contextually enriched datasets.

Updated: 2024-10-07 09:51:46

标题: DALL-M：具有LLMs的上下文感知临床数据增强

摘要: X射线图像在医学诊断中至关重要，但在没有临床背景的情况下其有效性受到限制。放射科医师常常发现胸部X射线图像无法诊断潜在疾病，需要全面的临床特征和数据整合。我们提出了一个新颖的框架，通过与临床表格数据的增强技术来增强临床背景，从而提高其在AI医学诊断中的适用性和可靠性。我们引入了一种开创性的临床数据增强方法，利用大型语言模型生成患者上下文合成数据。这种方法对于在医疗保健领域训练更强大的深度学习模型至关重要。它在保留真实患者数据完整性的同时，通过增加具有相关背景的合成特征，显著提高了模型性能。我们的方法名为DALL-M，采用三阶段特征生成过程：（i）临床背景存储，（ii）专家查询生成和（iii）上下文感知特征增强。DALL-M通过合成胸部X射线图像和报告生成新的、临床相关的特征。在MIMIC-IV数据集的9个特征中应用于799例病例，它生成了一个包含91个特征的增强集。这是第一项为患者X射线报告生成上下文值的工作。具体而言，我们提供了（i）LLMs生成现有临床特征的上下文合成值的能力和（ii）它们创建全新的临床相关特征的能力。通过机器学习模型的实证验证显示出显著的性能改进。增加增强特征使F1分数提高了16.5％，精确度和召回率大约提高了25％。DALL-M解决了临床数据增强中的一个关键缺口，为生成上下文丰富的数据集提供了一个强大的框架。

更新时间: 2024-10-07 09:51:46

领域: cs.AI,cs.IR,cs.LG,I.5.1; J.3; H.3.3; I.2.7

下载: http://arxiv.org/abs/2407.08227v2

CBF-LLM: Safe Control for LLM Alignment

This paper proposes a control-based framework for aligning large language models (LLMs) by leveraging a control barrier function (CBF) to ensure user-desirable text generation. The presented framework applies the safety filter, designed based on the CBF, to the output generation of the baseline LLM, i.e., the sequence of the token, with the aim of intervening in the generated text. The overall text-generation system is implemented with Llama 3 and a RoBERTa model, and the source code is available at https://github.com/Mya-Mya/CBF-LLM. The experiment demonstrates its control ability and effectiveness in reducing the number of interventions needed for user-specified alignment tasks.

Updated: 2024-10-07 09:49:08

标题: CBF-LLM: LLM对齐的安全控制

摘要: 这篇论文提出了一种基于控制的框架，用于通过利用控制障碍函数（CBF）来对齐大型语言模型（LLMs），以确保用户期望的文本生成。所提出的框架将基于CBF设计的安全过滤器应用于基准LLM的输出生成，即令牌序列，旨在干预生成的文本。整个文本生成系统使用Llama 3和RoBERTa模型实现，源代码可在https://github.com/Mya-Mya/CBF-LLM 上找到。实验证明了它在减少用户指定对齐任务所需干预次数方面的控制能力和有效性。

更新时间: 2024-10-07 09:49:08

领域: eess.SY,cs.AI,cs.CL,cs.SY

下载: http://arxiv.org/abs/2408.15625v2

Classification of All Blood Cell Images using ML and DL Models

Human blood primarily comprises plasma, red blood cells, white blood cells, and platelets. It plays a vital role in transporting nutrients to different organs, where it stores essential health-related data about the human body. Blood cells are utilized to defend the body against diverse infections, including fungi, viruses, and bacteria. Hence, blood analysis can help physicians assess an individual's physiological condition. Blood cells have been sub-classified into eight groups: Neutrophils, eosinophils, basophils, lymphocytes, monocytes, immature granulocytes (promyelocytes, myelocytes, and metamyelocytes), erythroblasts, and platelets or thrombocytes on the basis of their nucleus, shape, and cytoplasm. Traditionally, pathologists and hematologists in laboratories have examined these blood cells using a microscope before manually classifying them. The manual approach is slower and more prone to human error. Therefore, it is essential to automate this process. In our paper, transfer learning with CNN pre-trained models. VGG16, VGG19, ResNet-50, ResNet-101, ResNet-152, InceptionV3, MobileNetV2, and DenseNet-20 applied to the PBC dataset's normal DIB. The overall accuracy achieved with these models lies between 91.375 and 94.72%. Hence, inspired by these pre-trained architectures, a model has been proposed to automatically classify the ten types of blood cells with increased accuracy. A novel CNN-based framework has been presented to improve accuracy. The proposed CNN model has been tested on the PBC dataset normal DIB. The outcomes of the experiments demonstrate that our CNN-based framework designed for blood cell classification attains an accuracy of 99.91% on the PBC dataset. Our proposed convolutional neural network model performs competitively when compared to earlier results reported in the literature.

Updated: 2024-10-07 09:48:14

标题: 使用机器学习和深度学习模型对所有血细胞图像进行分类

摘要: 人类血液主要包括血浆、红细胞、白细胞和血小板。它在将营养物质运送到不同器官方面发挥着至关重要的作用，同时也存储着有关人体健康的重要数据。血细胞被用于抵御多种感染，包括真菌、病毒和细菌。因此，血液分析可以帮助医生评估个体的生理状况。血细胞根据其细胞核、形状和细胞质被细分为八组：中性粒细胞、嗜酸性粒细胞、嗜碱性粒细胞、淋巴细胞、单核细胞、未成熟粒细胞（原始髓细胞、粒细胞和幼粒细胞）、红细胞和血小板或血栓细胞。传统上，实验室的病理学家和血液学家在手动分类之前会使用显微镜检查这些血细胞。手动方法速度较慢，且更容易出现人为错误。因此，自动化这一过程至关重要。在我们的论文中，使用CNN预训练模型进行迁移学习。VGG16、VGG19、ResNet-50、ResNet-101、ResNet-152、InceptionV3、MobileNetV2和DenseNet-20应用于PBC数据集的正常DIB。这些模型的整体准确率在91.375%至94.72%之间。因此，受这些预训练架构的启发，提出了一个模型来自动分类十种血细胞，准确率提高了。提出了一个基于CNN的新颖框架来提高准确性。提出的CNN模型已在PBC数据集正常DIB上进行了测试。实验结果表明，我们为血细胞分类设计的基于CNN的框架在PBC数据集上取得了99.91%的准确率。与文献中早期报道的结果相比，我们提出的卷积神经网络模型表现出色。

更新时间: 2024-10-07 09:48:14

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2308.06300v3

A Framework for Pupil Tracking with Event Cameras

Saccades are extremely rapid movements of both eyes that occur simultaneously, typically observed when an individual shifts their focus from one object to another. These movements are among the swiftest produced by humans and possess the potential to achieve velocities greater than that of blinks. The peak angular speed of the eye during a saccade can reach as high as 700{\deg}/s in humans, especially during larger saccades that cover a visual angle of 25{\deg}. Previous research has demonstrated encouraging outcomes in comprehending neurological conditions through the study of saccades. A necessary step in saccade detection involves accurately identifying the precise location of the pupil within the eye, from which additional information such as gaze angles can be inferred. Conventional frame-based cameras often struggle with the high temporal precision necessary for tracking very fast movements, resulting in motion blur and latency issues. Event cameras, on the other hand, offer a promising alternative by recording changes in the visual scene asynchronously and providing high temporal resolution and low latency. By bridging the gap between traditional computer vision and event-based vision, we present events as frames that can be readily utilized by standard deep learning algorithms. This approach harnesses YOLOv8, a state-of-the-art object detection technology, to process these frames for pupil tracking using the publicly accessible Ev-Eye dataset. Experimental results demonstrate the framework's effectiveness, highlighting its potential applications in neuroscience, ophthalmology, and human-computer interaction.

Updated: 2024-10-07 09:46:07

标题: 一种基于事件相机的瞳孔追踪框架

摘要: Saccades are extremely rapid movements of both eyes that occur simultaneously, typically observed when an individual shifts their focus from one object to another. These movements are among the swiftest produced by humans and possess the potential to achieve velocities greater than that of blinks. The peak angular speed of the eye during a saccade can reach as high as 700 degrees per second in humans, especially during larger saccades that cover a visual angle of 25 degrees. Previous research has demonstrated encouraging outcomes in comprehending neurological conditions through the study of saccades. A necessary step in saccade detection involves accurately identifying the precise location of the pupil within the eye, from which additional information such as gaze angles can be inferred. Conventional frame-based cameras often struggle with the high temporal precision necessary for tracking very fast movements, resulting in motion blur and latency issues. Event cameras, on the other hand, offer a promising alternative by recording changes in the visual scene asynchronously and providing high temporal resolution and low latency. By bridging the gap between traditional computer vision and event-based vision, we present events as frames that can be readily utilized by standard deep learning algorithms. This approach harnesses YOLOv8, a state-of-the-art object detection technology, to process these frames for pupil tracking using the publicly accessible Ev-Eye dataset. Experimental results demonstrate the framework's effectiveness, highlighting its potential applications in neuroscience, ophthalmology, and human-computer interaction.

更新时间: 2024-10-07 09:46:07

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.16665v2

AnyAttack: Towards Large-scale Self-supervised Generation of Targeted Adversarial Examples for Vision-Language Models

Due to their multimodal capabilities, Vision-Language Models (VLMs) have found numerous impactful applications in real-world scenarios. However, recent studies have revealed that VLMs are vulnerable to image-based adversarial attacks, particularly targeted adversarial images that manipulate the model to generate harmful content specified by the adversary. Current attack methods rely on predefined target labels to create targeted adversarial attacks, which limits their scalability and applicability for large-scale robustness evaluations. In this paper, we propose AnyAttack, a self-supervised framework that generates targeted adversarial images for VLMs without label supervision, allowing any image to serve as a target for the attack. To address the limitation of existing methods that require label supervision, we introduce a contrastive loss that trains a generator on a large-scale unlabeled image dataset, LAION-400M dataset, for generating targeted adversarial noise. This large-scale pre-training endows our method with powerful transferability across a wide range of VLMs. Extensive experiments on five mainstream open-source VLMs (CLIP, BLIP, BLIP2, InstructBLIP, and MiniGPT-4) across three multimodal tasks (image-text retrieval, multimodal classification, and image captioning) demonstrate the effectiveness of our attack. Additionally, we successfully transfer AnyAttack to multiple commercial VLMs, including Google's Gemini, Claude's Sonnet, and Microsoft's Copilot. These results reveal an unprecedented risk to VLMs, highlighting the need for effective countermeasures.

Updated: 2024-10-07 09:45:18

标题: AnyAttack：面向视觉语言模型的大规模自监督生成定向对抗样本

摘要: 由于其多模态能力，视觉语言模型（VLMs）在现实世界场景中找到了许多重要的应用。然而，最近的研究表明，VLMs容易受到基于图像的对抗攻击的影响，特别是针对性的对抗图像，可以操纵模型生成对手指定的有害内容。目前的攻击方法依赖于预定义的目标标签来创建有针对性的对抗攻击，这限制了它们在大规模鲁棒性评估中的可扩展性和适用性。在本文中，我们提出了AnyAttack，一个自监督框架，用于为VLMs生成有针对性的对抗图像，无需标签监督，允许任何图像作为攻击的目标。为了解决现有方法需要标签监督的限制，我们引入了一个对比损失，对一个大规模未标记的图像数据集LAION-400M数据集进行了训练，用于生成有针对性的对抗噪声。这种大规模预训练赋予了我们的方法在各种VLMs中具有强大的可迁移性。对五个主流开源VLMs（CLIP、BLIP、BLIP2、InstructBLIP和MiniGPT-4）进行了广泛实验，在三个多模态任务（图像-文本检索、多模态分类和图像字幕）中展示了我们攻击的有效性。此外，我们成功地将AnyAttack转移到多个商业VLMs，包括谷歌的Gemini、克劳德的Sonnet和微软的Copilot。这些结果揭示了VLMs面临的前所未有的风险，凸显了对有效对策的需求。

更新时间: 2024-10-07 09:45:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.05346v1

Odyssey: Empowering Minecraft Agents with Open-World Skills

Recent studies have delved into constructing generalist agents for open-world environments like Minecraft. Despite the encouraging results, existing efforts mainly focus on solving basic programmatic tasks, e.g., material collection and tool-crafting following the Minecraft tech-tree, treating the ObtainDiamond task as the ultimate goal. This limitation stems from the narrowly defined set of actions available to agents, requiring them to learn effective long-horizon strategies from scratch. Consequently, discovering diverse gameplay opportunities in the open world becomes challenging. In this work, we introduce Odyssey, a new framework that empowers Large Language Model (LLM)-based agents with open-world skills to explore the vast Minecraft world. Odyssey comprises three key parts: (1) An interactive agent with an open-world skill library that consists of 40 primitive skills and 183 compositional skills. (2) A fine-tuned LLaMA-3 model trained on a large question-answering dataset with 390k+ instruction entries derived from the Minecraft Wiki. (3) A new agent capability benchmark includes the long-term planning task, the dynamic-immediate planning task, and the autonomous exploration task. Extensive experiments demonstrate that the proposed Odyssey framework can effectively evaluate different capabilities of LLM-based agents. All datasets, model weights, and code are publicly available to motivate future research on more advanced autonomous agent solutions.

Updated: 2024-10-07 09:40:07

标题: 《奥德赛：赋予Minecraft代理开放世界技能》

摘要: 最近的研究已经开始构建通用代理程序，用于像Minecraft这样的开放世界环境。尽管结果令人鼓舞，现有的努力主要集中在解决基本的编程任务，例如按照Minecraft技术树进行材料收集和工具制作，将获取钻石的任务视为最终目标。这种局限性源于代理程序可用的行动集合定义狭窄，要求它们从零开始学习有效的长期策略。因此，在开放世界中发现各种游戏机会变得具有挑战性。在这项工作中，我们介绍了Odyssey，一个新的框架，为基于大型语言模型（LLM）的代理程序赋予探索广阔Minecraft世界的开放世界技能。Odyssey包括三个关键部分：（1）一个交互式代理程序，具有一个包含40个原始技能和183个组合技能的开放世界技能库。（2）一个在大型问答数据集上进行微调的LLaMA-3模型，该数据集包含来自Minecraft Wiki的390k+指令条目。（3）一个新的代理能力基准包括长期规划任务、动态即时规划任务和自主探索任务。广泛的实验证明，所提出的Odyssey框架可以有效评估基于LLM的代理程序的不同能力。所有数据集、模型权重和代码都可以公开获取，以激励未来研究更高级的自主代理解决方案。

更新时间: 2024-10-07 09:40:07

领域: cs.AI

下载: http://arxiv.org/abs/2407.15325v2

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

The Adam optimizer is widely used for transformer optimization in practice, which makes understanding the underlying optimization mechanisms an important problem. However, due to the Adam's complexity, theoretical analysis of how it optimizes transformers remains a challenging task. Fortunately, Sign Gradient Descent (SignGD) serves as an effective surrogate for Adam. Despite its simplicity, theoretical understanding of how SignGD optimizes transformers still lags behind. In this work, we study how SignGD optimizes a two-layer transformer -- consisting of a softmax attention layer with trainable query-key parameterization followed by a linear layer -- on a linearly separable noisy dataset. We identify four stages in the training dynamics, each exhibiting intriguing behaviors. Based on the training dynamics, we prove the fast convergence but poor generalization of the learned transformer on the noisy dataset. We also show that Adam behaves similarly to SignGD in terms of both optimization and generalization in this setting. Additionally, we find that the poor generalization of SignGD is not solely due to data noise, suggesting that both SignGD and Adam requires high-quality data for real-world tasks. Finally, experiments on synthetic and real-world datasets empirically support our theoretical results.

Updated: 2024-10-07 09:36:43

标题: 关于使用符号梯度下降优化和泛化两层Transformer模型的研究

摘要: Adam优化器在实践中被广泛用于transformer的优化，这使得理解其潜在的优化机制成为一个重要问题。然而，由于Adam的复杂性，对其如何优化transformer进行理论分析仍然是一个具有挑战性的任务。幸运的是，符号梯度下降（SignGD）作为Adam的有效替代品。尽管SignGD简单，但对其如何优化transformer的理论理解仍然落后。在这项工作中，我们研究了SignGD如何优化一个包含softmax注意力层（具有可训练的查询-键参数化）和线性层的两层transformer，在一个线性可分的嘈杂数据集上。我们确定了训练动态中的四个阶段，每个阶段都展示出有趣的行为。基于训练动态，我们证明了在嘈杂数据集上学习的transformer的快速收敛但泛化性差。我们还展示了在这种情况下，Adam在优化和泛化方面与SignGD表现类似。此外，我们发现SignGD的泛化性不佳不仅仅是由于数据噪声，这表明SignGD和Adam在实际任务中都需要高质量的数据。最后，对合成和真实数据集的实验从经验上支持我们的理论结果。

更新时间: 2024-10-07 09:36:43

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.04870v1

Federated brain tumor segmentation: an extensive benchmark

Recently, federated learning has raised increasing interest in the medical image analysis field due to its ability to aggregate multi-center data with privacy-preserving properties. A large amount of federated training schemes have been published, which we categorize into global (one final model), personalized (one model per institution) or hybrid (one model per cluster of institutions) methods. However, their applicability on the recently published Federated Brain Tumor Segmentation 2022 dataset has not been explored yet. We propose an extensive benchmark of federated learning algorithms from all three classes on this task. While standard FedAvg already performs very well, we show that some methods from each category can bring a slight performance improvement and potentially limit the final model(s) bias toward the predominant data distribution of the federation. Moreover, we provide a deeper understanding of the behaviour of federated learning on this task through alternative ways of distributing the pooled dataset among institutions, namely an Independent and Identical Distributed (IID) setup, and a limited data setup.

Updated: 2024-10-07 09:32:19

标题: 联邦式脑肿瘤分割：一个广泛的基准测试

摘要: 最近，由于联邦学习具有聚合多中心数据并具有隐私保护属性的能力，它在医学图像分析领域引起了越来越多的关注。已经发表了大量的联邦训练方案，我们将其分类为全局（一个最终模型）、个性化（每个机构一个模型）或混合（每个机构集群一个模型）方法。然而，它们在最近发布的2022年《联邦脑肿瘤分割》数据集上的适用性尚未被探索。我们提出在这项任务上对来自所有三类的联邦学习算法进行广泛基准测试。尽管标准的FedAvg表现已经非常出色，但我们展示了每个类别中的一些方法可以带来轻微的性能改进，并潜在地限制最终模型对联邦的主导数据分布的偏见。此外，我们通过在机构之间分发汇总数据集的替代方式，即独立和相同分布（IID）设置和有限数据设置，提供了对这项任务中联邦学习行为的更深入理解。

更新时间: 2024-10-07 09:32:19

领域: cs.CV,cs.AI,cs.LG,eess.IV

下载: http://arxiv.org/abs/2410.17265v1

Mastering Chinese Chess AI (Xiangqi) Without Search

We have developed a high-performance Chinese Chess AI that operates without reliance on search algorithms. This AI has demonstrated the capability to compete at a level commensurate with the top 0.1\% of human players. By eliminating the search process typically associated with such systems, this AI achieves a Queries Per Second (QPS) rate that exceeds those of systems based on the Monte Carlo Tree Search (MCTS) algorithm by over a thousandfold and surpasses those based on the AlphaBeta pruning algorithm by more than a hundredfold. The AI training system consists of two parts: supervised learning and reinforcement learning. Supervised learning provides an initial human-like Chinese chess AI, while reinforcement learning, based on supervised learning, elevates the strength of the entire AI to a new level. Based on this training system, we carried out enough ablation experiments and discovered that 1. The same parameter amount of Transformer architecture has a higher performance than CNN on Chinese chess; 2. Possible moves of both sides as features can greatly improve the training process; 3. Selective opponent pool, compared to pure self-play training, results in a faster improvement curve and a higher strength limit. 4. Value Estimation with Cutoff(VECT) improves the original PPO algorithm training process and we will give the explanation.

Updated: 2024-10-07 09:27:51

标题: 掌握象棋人工智能（中国象棋）不依赖搜索

摘要: 我们开发了一个高性能的中国象棋人工智能系统，它不依赖于搜索算法。这个人工智能系统已经展示出可以与顶级0.1%的人类玩家竞争的能力。通过消除通常与这类系统相关的搜索过程，这个人工智能系统实现了每秒查询率（QPS），超过了基于蒙特卡洛树搜索（MCTS）算法的系统一千倍以上，并且超过了基于AlphaBeta修剪算法的系统一百倍以上。这个人工智能训练系统由两部分组成：监督学习和强化学习。监督学习提供了一个初始的类人类的中国象棋人工智能，而基于监督学习的强化学习则将整个人工智能的强度提升到一个新的水平。基于这个训练系统，我们进行了足够多的消融实验，并发现了以下几点：1. 相同数量的Transformer架构参数在中国象棋上比CNN有更高的性能；2. 作为特征的双方可能的走法可以极大地改善训练过程；3. 选择对手池，与纯自我对弈训练相比，会产生更快的改进曲线和更高的实力上限；4. 带有截断值估计（VECT）的价值估算改进了原始的PPO算法训练过程，我们将给出解释。

更新时间: 2024-10-07 09:27:51

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.04865v1

AQMLator -- An Auto Quantum Machine Learning E-Platform

A successful Machine Learning (ML) model implementation requires three main components: training dataset, suitable model architecture and training procedure. Given dataset and task, finding an appropriate model might be challenging. AutoML, a branch of ML, focuses on automatic architecture search -- a meta method that aims at moving human from ML system design process. The success of ML and the development of quantum computing (QC) in recent years led to a birth of new fascinating field called Quantum Machine Learning (QML) that, amongst others, incorporates quantum computers into ML models. In this paper we present AQMLator, an Auto Quantum Machine Learning platform that aims to automatically propose and train the quantum layers of an ML model with minimal input from the user. This way, data scientists can bypass the entry barrier for QC and use QML. AQMLator uses standard ML libraries, making it easy to introduce into existing ML pipelines.

Updated: 2024-10-07 09:20:59

标题: AQMLator -- 一种自动量子机器学习电子平台

摘要: 一个成功的机器学习（ML）模型实现需要三个主要组成部分：训练数据集、合适的模型架构和训练程序。鉴于数据集和任务，找到一个合适的模型可能具有挑战性。AutoML是ML的一个分支，专注于自动架构搜索——这是一个旨在将人类从ML系统设计过程中移出的元方法。近年来ML的成功和量子计算（QC）的发展导致了一个新的令人着迷的领域的诞生，称为量子机器学习（QML），其中包括量子计算机在ML模型中。在本文中，我们介绍了AQMLator，一个旨在自动提出并训练ML模型的量子层的Auto Quantum Machine Learning平台，用户的输入最小。这样，数据科学家可以绕过QC的入门障碍并使用QML。AQMLator使用标准的ML库，使其易于引入现有的ML管道中。

更新时间: 2024-10-07 09:20:59

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2409.18338v3

Radio Map Prediction from Aerial Images and Application to Coverage Optimization

In recent years, several studies have explored deep learning algorithms to predict large-scale signal fading, or path loss, in urban communication networks. The goal is to replace costly measurement campaigns, inaccurate statistical models, or computationally expensive ray-tracing simulations with machine learning models that deliver quick and accurate predictions. We focus on predicting path loss radio maps using convolutional neural networks, leveraging aerial images alone or in combination with supplementary height information. Notably, our approach does not rely on explicit classification of environmental objects, which is often unavailable for most locations worldwide. While the prediction of radio maps using complete 3D environmental data is well-studied, the use of only aerial images remains under-explored. We address this gap by showing that state-of-the-art models developed for existing radio map datasets can be effectively adapted to this task, achieving strong performance. Additionally, we introduce a new model that slightly exceeds the performance of the present state-of-the-art with reduced complexity. The trained models are differentiable, and therefore they can be incorporated in various network optimization algorithms. While an extensive discussion is beyond this paper's scope, we demonstrate this through an example optimizing the directivity of base stations in cellular networks via backpropagation to enhance coverage.

Updated: 2024-10-07 09:19:20

标题: 用航拍图像预测无线电地图并应用于覆盖优化

摘要: 近年来，许多研究已探索了深度学习算法，用于预测城市通信网络中的大规模信号衰落或路径损耗。其目标是用可以快速准确预测的机器学习模型取代昂贵的测量活动、不准确的统计模型或计算昂贵的射线跟踪模拟。我们专注于使用卷积神经网络预测路径损耗无线电地图，利用航空图像单独或与补充高度信息结合。值得注意的是，我们的方法不依赖于环境对象的明确分类，这在全球大多数地点通常无法获得。虽然使用完整的三维环境数据预测无线电地图已经得到广泛研究，但仅使用航空图像的情况尚未得到充分探讨。我们通过展示为现有无线电地图数据集开发的最新模型可以有效地适应这一任务来填补这一空白，从而取得了强大的性能。此外，我们引入了一种新模型，其性能略高于目前的最新技术，并且复杂度更低。训练的模型是可微的，因此它们可以被整合到各种网络优化算法中。尽管本文的讨论范围有限，但我们通过一个优化通过反向传播增强覆盖范围的蜂窝网络基站指向性的示例来展示这一点。

更新时间: 2024-10-07 09:19:20

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2410.17264v1

Unsupervised Skill Discovery for Robotic Manipulation through Automatic Task Generation

Learning skills that interact with objects is of major importance for robotic manipulation. These skills can indeed serve as an efficient prior for solving various manipulation tasks. We propose a novel Skill Learning approach that discovers composable behaviors by solving a large and diverse number of autonomously generated tasks. Our method learns skills allowing the robot to consistently and robustly interact with objects in its environment. The discovered behaviors are embedded in primitives which can be composed with Hierarchical Reinforcement Learning to solve unseen manipulation tasks. In particular, we leverage Asymmetric Self-Play to discover behaviors and Multiplicative Compositional Policies to embed them. We compare our method to Skill Learning baselines and find that our skills are more interactive. Furthermore, the learned skills can be used to solve a set of unseen manipulation tasks, in simulation as well as on a real robotic platform.

Updated: 2024-10-07 09:19:13

标题: 无监督技能发现用于机器人操作的自动生成任务

摘要: 学习与对象进行交互的技能对于机器人操作至关重要。这些技能确实可以作为解决各种操纵任务的有效先验。我们提出了一种新颖的技能学习方法，通过解决大量和多样化的自动生成任务来发现可组合的行为。我们的方法学习了让机器人能够在其环境中始终和稳健地与对象进行交互的技能。发现的行为被嵌入原语中，可以与分层强化学习结合起来解决未见过的操纵任务。特别地，我们利用非对称自我博弈来发现行为，并利用乘法组合策略来嵌入它们。我们将我们的方法与技能学习基线进行比较，并发现我们的技能更具互动性。此外，学习到的技能可以用来解决一组未见过的操纵任务，无论是在模拟环境中还是在真实的机器人平台上。

更新时间: 2024-10-07 09:19:13

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.04855v1

TimeCNN: Refining Cross-Variable Interaction on Time Point for Time Series Forecasting

Time series forecasting is extensively applied across diverse domains. Transformer-based models demonstrate significant potential in modeling cross-time and cross-variable interaction. However, we notice that the cross-variable correlation of multivariate time series demonstrates multifaceted (positive and negative correlations) and dynamic progression over time, which is not well captured by existing Transformer-based models. To address this issue, we propose a TimeCNN model to refine cross-variable interactions to enhance time series forecasting. Its key innovation is timepoint-independent, where each time point has an independent convolution kernel, allowing each time point to have its independent model to capture relationships among variables. This approach effectively handles both positive and negative correlations and adapts to the evolving nature of variable relationships over time. Extensive experiments conducted on 12 real-world datasets demonstrate that TimeCNN consistently outperforms state-of-the-art models. Notably, our model achieves significant reductions in computational requirements (approximately 60.46%) and parameter count (about 57.50%), while delivering inference speeds 3 to 4 times faster than the benchmark iTransformer model

Updated: 2024-10-07 09:16:58

标题: TimeCNN：在时间点上细化时间序列预测的跨变量交互

摘要: 时间序列预测广泛应用于各个领域。基于Transformer的模型在建模跨时间和跨变量交互方面展现出显著潜力。然而，我们注意到多变量时间序列的跨变量相关性表现出多方面（正相关和负相关）和随时间动态变化的特点，这种特点现有的基于Transformer的模型无法很好地捕捉。为了解决这个问题，我们提出了一个TimeCNN模型来精炼跨变量交互以增强时间序列预测。其关键创新在于时间点独立，每个时间点都有独立的卷积核，使得每个时间点都有自己的模型来捕捉变量之间的关系。这种方法有效处理了正相关和负相关，并适应了随时间变化的变量关系的演变。在12个真实数据集上进行的广泛实验表明，TimeCNN始终优于最先进的模型。值得注意的是，我们的模型在计算要求方面实现了显著的降低（约为60.46%），参数数量减少（约为57.50%），同时推断速度比基准iTransformer模型快3到4倍。

更新时间: 2024-10-07 09:16:58

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2410.04853v1

AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations

Large Language Models prompting, such as using in-context demonstrations, is a mainstream technique for invoking LLMs to perform high-performance and solid complex reasoning (e.g., mathematical reasoning, commonsense reasoning), and has the potential for further human-machine collaborative scientific findings. However, current LLMs are delicate and elusive in prompt words and styles. And there is an unseen gap between LLM understanding and human-written prompts. This paper introduces Alignedcot, an LLM-acquainted prompting technique that includes proficient ``native-speaking'' in in-context learning for the LLMs. Specifically, it achieves consistent and correct step-wise prompts in zero-shot scenarios by progressively probing, refining, and formatting the LLM chain of thoughts so that free from handcrafted few-shot demonstrations while maintaining the prompt quality. We conduct experiments on mathematical reasoning and commonsense reasoning. We find that LLMs with Alignedcot perform significantly superior to them with human-crafted demonstrations. We further apply Alignedcot for rewriting the GSM8K training set, resulting in a GSM8K-Align dataset. We observe its benefits for retrieval augmented generation. The code and data can be found at https://github.com/yangzhch6/AlignedCoT.

Updated: 2024-10-07 09:11:49

标题: AlignedCoT：通过母语演示促进大型语言模型

摘要: 大型语言模型提示，例如使用上下文演示，是调用LLM执行高性能和坚实复杂推理（例如数学推理，常识推理）的主流技术，并且具有进一步人机协作科学发现的潜力。然而，当前的LLM在提示词和风格上是微妙而难以捉摸的。LLM理解和人类编写的提示之间存在看不见的差距。本文介绍了一种名为Alignedcot的LLM熟悉提示技术，其中包括在LLM的上下文学习中具有熟练的“母语水平”。具体而言，通过逐步探测、优化和格式化LLM的思维链，实现了零-shot情景下的一致且正确的逐步提示，从而摆脱手工制作的少量演示，同时保持提示质量。我们对数学推理和常识推理进行了实验。我们发现，具有Alignedcot的LLM性能明显优于具有人工制作演示的LLM。我们进一步将Alignedcot应用于重写GSM8K训练集，生成了GSM8K-Align数据集。我们观察到它对检索增强生成的好处。代码和数据可以在https://github.com/yangzhch6/AlignedCoT 找到。

更新时间: 2024-10-07 09:11:49

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2311.13538v5

PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing

In the field of image editing, three core challenges persist: controllability, background preservation, and efficiency. Inversion-based methods rely on time-consuming optimization to preserve the features of the initial images, which results in low efficiency due to the requirement for extensive network inference. Conversely, inversion-free methods lack theoretical support for background similarity, as they circumvent the issue of maintaining initial features to achieve efficiency. As a consequence, none of these methods can achieve both high efficiency and background consistency. To tackle the challenges and the aforementioned disadvantages, we introduce PostEdit, a method that incorporates a posterior scheme to govern the diffusion sampling process. Specifically, a corresponding measurement term related to both the initial features and Langevin dynamics is introduced to optimize the estimated image generated by the given target prompt. Extensive experimental results indicate that the proposed PostEdit achieves state-of-the-art editing performance while accurately preserving unedited regions. Furthermore, the method is both inversion- and training-free, necessitating approximately 1.5 seconds and 18 GB of GPU memory to generate high-quality results.

Updated: 2024-10-07 09:04:50

标题: 后编辑：后验采样用于高效的零样本图像编辑

摘要: 在图像编辑领域，存在三个核心挑战：可控性、背景保留和效率。基于反演的方法依赖于耗时的优化来保留初始图像的特征，这导致效率低下，因为需要进行大量网络推断。相反，无反演的方法缺乏背景相似性的理论支持，因为它们规避了保持初始特征以实现效率的问题。因此，这些方法都无法同时实现高效率和背景一致性。为了解决这些挑战和前述的缺点，我们引入了PostEdit方法，该方法将后验方案纳入扩散抽样过程中。具体地，引入了一个与初始特征和朗之万动力学相关的相应测量项，以优化由给定目标提示生成的估计图像。大量实验结果表明，所提出的PostEdit方法在准确保留未编辑区域的同时实现了最先进的编辑性能。此外，该方法既无反演又无需训练，生成高质量结果需要约1.5秒和18 GB的GPU内存。

更新时间: 2024-10-07 09:04:50

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.04844v1

Spatio-Temporal 3D Point Clouds from WiFi-CSI Data via Transformer Networks

Joint communication and sensing (JC\&S) is emerging as a key component in 5G and 6G networks, enabling dynamic adaptation to environmental changes and enhancing contextual awareness for optimized communication. By leveraging real-time environmental data, JC\&S improves resource allocation, reduces latency, and enhances power efficiency, while also supporting simulations and predictive modeling. This makes it a key technology for reactive systems and digital twins. These systems can respond to environmental events in real-time, offering transformative potential in sectors like smart cities, healthcare, and Industry 5.0, where adaptive and multimodal interaction is critical to enhance real-time decision-making. In this work, we present a transformer-based architecture that processes temporal Channel State Information (CSI) data, specifically amplitude and phase, to generate 3D point clouds of indoor environments. The model utilizes a multi-head attention to capture complex spatio-temporal relationships in CSI data and is adaptable to different CSI configurations. We evaluate the architecture on the MM-Fi dataset, using two different protocols to capture human presence in indoor environments. The system demonstrates strong potential for accurate 3D reconstructions and effectively distinguishes between close and distant objects, advancing JC\&S applications for spatial sensing in future wireless networks.

Updated: 2024-10-07 08:59:04

标题: 通过Transformer网络从WiFi-CSI数据中获取的时空三维点云

摘要: 联合通信和感知（JC\&S）正成为5G和6G网络中的关键组件，实现对环境变化的动态适应，并增强优化通信的上下文意识。通过利用实时环境数据，JC\&S改善资源分配，减少延迟，提高能源效率，同时还支持模拟和预测建模。这使其成为响应式系统和数字孪生的关键技术。这些系统可以实时响应环境事件，在智能城市、医疗保健和工业5.0等领域提供变革性潜力，其中自适应和多模态交互对增强实时决策至关重要。在这项工作中，我们提出了一种基于变压器的架构，用于处理时间通道状态信息（CSI）数据，特别是振幅和相位，以生成室内环境的3D点云。该模型利用多头注意力来捕捉CSI数据中的复杂时空关系，并可适应不同的CSI配置。我们在MM-Fi数据集上评估了该架构，使用两种不同的协议捕捉室内环境中的人员存在。该系统展示了精确的3D重建潜力，并有效区分近距离和远距离物体，推进了未来无线网络中空间感知的JC\&S应用。

更新时间: 2024-10-07 08:59:04

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2410.16303v1

Computational design of target-specific linear peptide binders with TransformerBeta

The computational prediction and design of peptide binders targeting specific linear epitopes is crucial in biological and biomedical research, yet it remains challenging due to their highly dynamic nature and the scarcity of experimentally solved binding data. To address this problem, we built an unprecedentedly large-scale library of peptide pairs within stable secondary structures (beta sheets), leveraging newly available AlphaFold predicted structures. We then developed a machine learning method based on the Transformer architecture for the design of specific linear binders, in analogy to a language translation task. Our method, TransformerBeta, accurately predicts specific beta strand interactions and samples sequences with beta sheet-like molecular properties, while capturing interpretable physico-chemical interaction patterns. As such, it can propose specific candidate binders targeting linear epitope for experimental validation to inform protein design.

Updated: 2024-10-07 08:52:54

标题: 使用TransformerBeta进行目标特异性线性肽结合物的计算设计

摘要: 计算预测和设计靶向特定线性表位的肽结合物在生物和生物医学研究中至关重要，然而由于其高度动态的特性和实验数据的稀缺性，仍然具有挑战性。为了解决这个问题，我们建立了一个前所未有的大规模肽对库，其中包含稳定的二级结构（β折叠），利用新近可用的AlphaFold预测结构。然后，我们基于Transformer架构开发了一种机器学习方法，用于设计特定的线性结合物，类似于语言翻译任务。我们的方法TransformerBeta能够准确预测特定的β链相互作用，并生成具有β折叠类似分子特性的序列，同时捕捉可解释的物理化学相互作用模式。因此，它可以提出针对线性表位的特定候选结合物，以进行实验验证，为蛋白质设计提供信息。

更新时间: 2024-10-07 08:52:54

领域: q-bio.BM,cs.LG

下载: http://arxiv.org/abs/2410.16302v1

Investigating Role of Big Five Personality Traits in Audio-Visual Rapport Estimation

Automatic rapport estimation in social interactions is a central component of affective computing. Recent reports have shown that the estimation performance of rapport in initial interactions can be improved by using the participant's personality traits as the model's input. In this study, we investigate whether this findings applies to interactions between friends by developing rapport estimation models that utilize nonverbal cues (audio and facial expressions) as inputs. Our experimental results show that adding Big Five features (BFFs) to nonverbal features can improve the estimation performance of self-reported rapport in dyadic interactions between friends. Next, we demystify how BFFs improve the estimation performance of rapport through a comparative analysis between models with and without BFFs. We decompose rapport ratings into perceiver effects (people's tendency to rate other people), target effects (people's tendency to be rated by other people), and relationship effects (people's unique ratings for a specific person) using the social relations model. We then analyze the extent to which BFFs contribute to capturing each effect. Our analysis demonstrates that the perceiver's and the target's BFFs lead estimation models to capture the perceiver and the target effects, respectively. Furthermore, our experimental results indicate that the combinations of facial expression features and BFFs achieve best estimation performances not only in estimating rapport ratings, but also in estimating three effects. Our study is the first step toward understanding why personality-aware estimation models of interpersonal perception accomplish high estimation performance.

Updated: 2024-10-07 08:52:33

标题: 研究大五人格特质在音视频融洽度评估中的作用

摘要: 社交互动中自动建立融洽关系的估计是情感计算的核心组成部分。最近的报告显示，在初次互动中，利用参与者的个性特征作为模型输入可以改善融洽关系的估计性能。在这项研究中，我们通过开发利用非语言线索（音频和面部表情）作为输入的融洽关系估计模型，来调查这一发现是否适用于朋友之间的互动。我们的实验结果显示，将大五人格特征（BFFs）添加到非语言特征中可以提高朋友之间二元互动中自我报告的融洽关系估计性能。接下来，我们通过比较分析揭示了BFFs如何提高融洽关系估计性能。我们使用社会关系模型将融洽评分分解为感知者效应（人们评价他人的倾向）、目标效应（人们被他人评价的倾向）和关系效应（人们对特定人的独特评分），然后分析BFFs对捕捉每种效应的贡献程度。我们的分析表明，感知者和目标的BFFs分别导致估计模型捕捉感知者和目标效应。此外，我们的实验结果表明，面部表情特征和BFFs的组合不仅在估计融洽评分方面取得最佳估计性能，而且在估计三种效应方面也是如此。我们的研究是理解为什么个性感知模型的估计性能可以较高的第一步。

更新时间: 2024-10-07 08:52:33

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2410.11861v1

Can Large Language Models Understand Symbolic Graphics Programs?

Against the backdrop of enthusiasm for large language models (LLMs), there is an urgent need to scientifically assess their capabilities and shortcomings. This is nontrivial in part because it is difficult to find tasks which the models have not encountered during training. Utilizing symbolic graphics programs, we propose a domain well-suited to test multiple spatial-semantic reasoning skills of LLMs. Popular in computer graphics, these programs procedurally generate visual data. While LLMs exhibit impressive skills in general program synthesis and analysis, symbolic graphics programs offer a new layer of evaluation: they allow us to test an LLM's ability to answer different-grained semantic-level questions of the images or 3D geometries without a vision encoder. To semantically understand the symbolic programs, LLMs would need to possess the ability to "imagine" and reason how the corresponding graphics content would look with only the symbolic description. We use this task to evaluate LLMs by creating a large benchmark for the semantic visual understanding of symbolic graphics programs, built procedurally with minimal human effort. Particular emphasis is placed on transformations of images that leave the image level semantics invariant while introducing significant changes to the underlying program. We evaluate commercial and open-source LLMs on our benchmark to assess their ability to reason about visual output of programs, finding that LLMs considered stronger at reasoning generally perform better. Lastly, we introduce a novel method to improve this ability -- Symbolic Instruction Tuning (SIT), in which the LLM is finetuned with pre-collected instruction data on symbolic graphics programs. Interestingly, we find that SIT not only improves LLM's understanding on symbolic programs, but it also improves general reasoning ability on various other benchmarks.

Updated: 2024-10-07 08:44:35

标题: 大型语言模型能理解符号图形程序吗？

摘要: 在对大型语言模型（LLMs）充满热情的背景下，迫切需要对它们的能力和不足进行科学评估。这在某种程度上是非常困难的，因为很难找到这些模型在训练过程中没有遇到过的任务。利用符号图形程序，我们提出了一个适合测试LLMs多种空间-语义推理技能的领域。这些程序在计算机图形学中很受欢迎，它们可以程序生成视觉数据。虽然LLMs在一般程序合成和分析方面表现出色，符号图形程序提供了一个新的评估层次：它们允许我们测试LLMs在没有视觉编码器的情况下回答图像或3D几何的不同语义级问题的能力。为了从语义上理解符号程序，LLMs需要具备"想象"和推理的能力，以仅凭符号描述预测对应图形内容的外观。我们利用这个任务来评估LLMs，通过用最少的人力努力程序化构建一个大型基准来评估符号图形程序的语义视觉理解能力。我们特别强调对图像进行变换，使图像级别的语义保持不变，同时对底层程序引入显著变化。我们评估商业和开源LLMs在我们的基准上，以评估它们推理关于程序的视觉输出的能力，发现一般认为推理能力较强的LLMs表现得更好。最后，我们引入了一种新方法来提高这种能力——符号指导调整（SIT），其中LLM通过预先收集的符号图形程序指令数据进行微调。有趣的是，我们发现SIT不仅提高了LLMs对符号程序的理解，还提高了在各种其他基准上的一般推理能力。

更新时间: 2024-10-07 08:44:35

领域: cs.LG,cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2408.08313v2

Cost Estimation in Unit Commitment Problems Using Simulation-Based Inference

The Unit Commitment (UC) problem is a key optimization task in power systems to forecast the generation schedules of power units over a finite time period by minimizing costs while meeting demand and technical constraints. However, many parameters required by the UC problem are unknown, such as the costs. In this work, we estimate these unknown costs using simulation-based inference on an illustrative UC problem, which provides an approximated posterior distribution of the parameters given observed generation schedules and demands. Our results highlight that the learned posterior distribution effectively captures the underlying distribution of the data, providing a range of possible values for the unknown parameters given a past observation. This posterior allows for the estimation of past costs using observed past generation schedules, enabling operators to better forecast future costs and make more robust generation scheduling forecasts. We present avenues for future research to address overconfidence in posterior estimation, enhance the scalability of the methodology and apply it to more complex UC problems modeling the network constraints and renewable energy sources.

Updated: 2024-10-07 08:44:08

标题: 使用基于模拟推断的成本估算在机组组合问题中的应用

摘要: Unit Commitment（UC）问题是电力系统中的一个关键优化任务，通过最小化成本来预测一定时间段内发电机组的发电计划，同时满足需求和技术约束。然而，许多UC问题所需的参数是未知的，比如成本。在这项工作中，我们使用基于模拟的推断来估计这些未知成本，通过一个说明性的UC问题，提供了给定观察到的发电计划和需求的参数的近似后验分布。我们的结果表明，学习到的后验分布有效地捕捉了数据的基本分布，为未知参数提供了一系列可能的值，基于过去的观测。这个后验允许利用观察到的过去发电计划来估计过去的成本，使运营商能够更好地预测未来成本，并进行更可靠的发电计划预测。我们提出了未来研究的方向，以解决后验估计中的过度自信，增强方法的可扩展性，并将其应用于更复杂的UC问题，建模网络约束和可再生能源来源。

更新时间: 2024-10-07 08:44:08

领域: cs.LG

下载: http://arxiv.org/abs/2409.03588v2

An Effective Theory of Bias Amplification

Machine learning models may capture and amplify biases present in data, leading to disparate test performance across social groups. To better understand, evaluate, and mitigate these possible biases, a deeper theoretical understanding of how model design choices and data distribution properties could contribute to bias is needed. In this work, we contribute a precise analytical theory in the context of ridge regression, both with and without random projections, where the former models neural networks in a simplified regime. Our theory offers a unified and rigorous explanation of machine learning bias, providing insights into phenomena such as bias amplification and minority-group bias in various feature and parameter regimes. For example, we demonstrate that there may be an optimal regularization penalty or training time to avoid bias amplification, and there can be fundamental differences in test error between groups that do not vanish with increased parameterization. Importantly, our theoretical predictions align with several empirical observations reported in the literature. We extensively empirically validate our theory on diverse synthetic and semi-synthetic datasets.

Updated: 2024-10-07 08:43:22

标题: 一种有效的偏见放大理论

摘要: 机器学习模型可能捕捉并放大数据中存在的偏见，导致社会群体之间的测试表现存在差异。为了更好地理解、评估和减轻这些可能的偏见，需要更深入地理解模型设计选择和数据分布特性如何导致偏见。在这项工作中，我们在岭回归的背景下提出了一个精确的分析理论，既包括随机投影，又包括不包括随机投影，前者模拟了神经网络在简化环境中的模型。我们的理论提供了机器学习偏见的统一和严格的解释，深入探讨偏见放大和少数群体偏见在各种特征和参数范围内的现象。例如，我们证明可能存在一种最佳的正则化惩罚或训练时间来避免偏见放大，并且在增加参数化的情况下，不同群体之间的测试错误可能存在根本性差异。重要的是，我们的理论预测与文献中报道的几个实证观察相一致。我们在多样的合成和半合成数据集上广泛地经验验证了我们的理论。

更新时间: 2024-10-07 08:43:22

领域: cs.LG,cs.CY,stat.ML

下载: http://arxiv.org/abs/2410.17263v1

Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models

As large language models (LLMs) become integral to various applications, ensuring both their safety and utility is paramount. Jailbreak attacks, which manipulate LLMs into generating harmful content, pose significant challenges to this balance. Existing defenses, such as prompt engineering and safety fine-tuning, often introduce computational overhead, increase inference latency, and lack runtime flexibility. Moreover, overly restrictive safety measures can degrade model utility by causing refusals of benign queries. In this paper, we introduce Jailbreak Antidote, a method that enables real-time adjustment of LLM safety preferences by manipulating a sparse subset of the model's internal states during inference. By shifting the model's hidden representations along a safety direction with varying strengths, we achieve flexible control over the safety-utility balance without additional token overhead or inference delays. Our analysis reveals that safety-related information in LLMs is sparsely distributed; adjusting approximately 5% of the internal state is as effective as modifying the entire state. Extensive experiments on nine LLMs (ranging from 2 billion to 72 billion parameters), evaluated against ten jailbreak attack methods and compared with six defense strategies, validate the effectiveness and efficiency of our approach. By directly manipulating internal states during reasoning, Jailbreak Antidote offers a lightweight, scalable solution that enhances LLM safety while preserving utility, opening new possibilities for real-time safety mechanisms in widely-deployed AI systems.

Updated: 2024-10-07 08:40:35

标题: 越狱解毒剂：通过大型语言模型中的稀疏表示调整实现运行时安全性和效用平衡

摘要: 随着大型语言模型（LLMs）成为各种应用的核心，确保它们的安全性和效用至关重要。越狱攻击将LLMs操纵为生成有害内容，对此平衡提出了重大挑战。现有的防御措施，如提示工程和安全微调，通常会引入计算负担，增加推断延迟，并缺乏运行时灵活性。此外，过于严格的安全措施可能通过拒绝良性查询而降低模型效用。在本文中，我们介绍了Jailbreak Antidote，这是一种方法，通过在推断过程中操纵模型的内部状态的稀疏子集，实现LLM安全偏好的实时调整。通过沿着具有不同强度的安全方向移动模型的隐藏表示，我们实现了对安全-效用平衡的灵活控制，而不会增加额外的令牌负担或推断延迟。我们的分析表明，LLMs中与安全相关的信息是稀疏分布的；调整大约5%的内部状态就像修改整个状态一样有效。针对九个LLMs（参数范围从20亿到720亿），针对十种越狱攻击方法进行了广泛实验，并与六种防御策略进行了比较，验证了我们方法的有效性和效率。通过在推理过程中直接操纵内部状态，Jailbreak Antidote提供了一种轻量级、可扩展的解决方案，增强了LLM的安全性，同时保持了效用，为广泛部署的AI系统中的实时安全机制开辟了新的可能性。

更新时间: 2024-10-07 08:40:35

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2410.02298v2

Multimodal Fusion Strategies for Mapping Biophysical Landscape Features

Multimodal aerial data are used to monitor natural systems, and machine learning can significantly accelerate the classification of landscape features within such imagery to benefit ecology and conservation. It remains under-explored, however, how these multiple modalities ought to be fused in a deep learning model. As a step towards filling this gap, we study three strategies (Early fusion, Late fusion, and Mixture of Experts) for fusing thermal, RGB, and LiDAR imagery using a dataset of spatially-aligned orthomosaics in these three modalities. In particular, we aim to map three ecologically-relevant biophysical landscape features in African savanna ecosystems: rhino middens, termite mounds, and water. The three fusion strategies differ in whether the modalities are fused early or late, and if late, whether the model learns fixed weights per modality for each class or generates weights for each class adaptively, based on the input. Overall, the three methods have similar macro-averaged performance with Late fusion achieving an AUC of 0.698, but their per-class performance varies strongly, with Early fusion achieving the best recall for middens and water and Mixture of Experts achieving the best recall for mounds.

Updated: 2024-10-07 08:40:29

标题: 多模态融合策略用于地理生物景观特征的映射

摘要: 多模态航空数据被用于监测自然系统，机器学习可以显著加速对这些图像中景观特征进行分类，从而造福生态学和保护工作。然而，目前尚未探讨如何将这些多种模态融合到深度学习模型中。为了填补这一空白，我们研究了三种融合策略（早期融合、后期融合和专家混合），使用了三种模态的空间对齐正交照片数据集来融合热像、RGB 和 LiDAR 图像。具体来说，我们旨在在非洲稀树草原生态系统中映射三个生态相关的生物地理景观要素：犀牛粪堆、白蚁丘和水。这三种融合策略在模态是早期还是后期融合上有所不同，如果是后期融合，模型是否学习每个类别的固定权重，或者根据输入自适应生成每个类别的权重。总体而言，这三种方法在宏观平均性能上相似，后期融合实现了0.698的AUC，但它们的每个类别性能差异很大，早期融合在粪堆和水的召回率上表现最好，而专家混合在丘的召回率上表现最好。

更新时间: 2024-10-07 08:40:29

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.04833v1

TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization

The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks. The existing method introduces monocular depth priors to jointly optimize the camera poses and NeRF, which fails to fully exploit the depth priors and neglects the impact of their inherent noise. In this paper, we propose Truncated Depth NeRF (TD-NeRF), a novel approach that enables training NeRF from unknown camera poses - by jointly optimizing learnable parameters of the radiance field and camera poses. Our approach explicitly utilizes monocular depth priors through three key advancements: 1) we propose a novel depth-based ray sampling strategy based on the truncated normal distribution, which improves the convergence speed and accuracy of pose estimation; 2) to circumvent local minima and refine depth geometry, we introduce a coarse-to-fine training strategy that progressively improves the depth precision; 3) we propose a more robust inter-frame point constraint that enhances robustness against depth noise during training. The experimental results on three datasets demonstrate that TD-NeRF achieves superior performance in the joint optimization of camera pose and NeRF, surpassing prior works, and generates more accurate depth geometry. The implementation of our method has been released at https://github.com/nubot-nudt/TD-NeRF.

Updated: 2024-10-07 08:28:43

标题: TD-NeRF：用于联合相机姿态和神经辐射场优化的新型截断深度先验

摘要: 对准确的相机姿势的依赖是阻碍神经辐射场（NeRF）模型在3D重建和SLAM任务中广泛部署的重要障碍。现有方法引入单目深度先验以共同优化相机姿势和NeRF，但未能充分利用深度先验并忽视其固有噪声的影响。在本文中，我们提出了截断深度NeRF（TD-NeRF），这是一种新颖的方法，可以通过共同优化辐射场和相机姿势的可学习参数来训练NeRF从未知的相机姿势。我们的方法通过三个关键进展明确利用单目深度先验：1）我们提出了一种基于截断正态分布的新型基于深度的射线采样策略，提高了姿势估计的收敛速度和准确性；2）为了避开局部极小值并改进深度几何结构，我们引入了一种逐渐改进深度精度的粗到细的训练策略；3）我们提出了一种更强大的帧间点约束，增强了训练过程中抵抗深度噪声的鲁棒性。在三个数据集上的实验结果表明，TD-NeRF在相机姿势和NeRF的联合优化方面表现优异，超越了先前的作品，并生成了更准确的深度几何结构。我们的方法的实现已经发布在https://github.com/nubot-nudt/TD-NeRF。

更新时间: 2024-10-07 08:28:43

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2405.07027v2

Audio-Driven Emotional 3D Talking-Head Generation

Audio-driven video portrait synthesis is a crucial and useful technology in virtual human interaction and film-making applications. Recent advancements have focused on improving the image fidelity and lip-synchronization. However, generating accurate emotional expressions is an important aspect of realistic talking-head generation, which has remained underexplored in previous works. We present a novel system in this paper for synthesizing high-fidelity, audio-driven video portraits with accurate emotional expressions. Specifically, we utilize a variational autoencoder (VAE)-based audio-to-motion module to generate facial landmarks. These landmarks are concatenated with emotional embeddings to produce emotional landmarks through our motion-to-emotion module. These emotional landmarks are then used to render realistic emotional talking-head video using a Neural Radiance Fields (NeRF)-based emotion-to-video module. Additionally, we propose a pose sampling method that generates natural idle-state (non-speaking) videos in response to silent audio inputs. Extensive experiments demonstrate that our method obtains more accurate emotion generation with higher fidelity.

Updated: 2024-10-07 08:23:05

标题: 基于音频驱动的情感3D语音生成

摘要: 音频驱动的视频肖像合成是虚拟人类交互和电影制作应用中至关重要且有用的技术。最近的进展集中在提高图像保真度和嘴唇同步性上。然而，生成准确的情感表达是逼真对话头生成的重要方面，在先前的作品中仍未得到充分探索。本文提出了一种新颖的系统，用于合成具有准确情感表达的高保真度、音频驱动的视频肖像。具体来说，我们利用基于变分自动编码器（VAE）的音频到动作模块生成面部地标。这些地标与情感嵌入连接，通过我们的动作到情感模块产生情感地标。然后，这些情感地标用于使用基于神经辐射场（NeRF）的情感到视频模块渲染逼真的情感对话头视频。此外，我们提出了一种姿势采样方法，对无声音频输入生成自然的静态状态（非说话）视频。大量实验证明，我们的方法获得更准确的情感生成和更高的保真度。

更新时间: 2024-10-07 08:23:05

领域: cs.CV,cs.AI,cs.HC,cs.LG

下载: http://arxiv.org/abs/2410.17262v1

Taming Gradient Oversmoothing and Expansion in Graph Neural Networks

Oversmoothing has been claimed as a primary bottleneck for multi-layered graph neural networks (GNNs). Multiple analyses have examined how and why oversmoothing occurs. However, none of the prior work addressed how optimization is performed under the oversmoothing regime. In this work, we show the presence of $\textit{gradient oversmoothing}$ preventing optimization during training. We further analyze that GNNs with residual connections, a well-known solution to help gradient flow in deep architecture, introduce $\textit{gradient expansion}$, a phenomenon of the gradient explosion in diverse directions. Therefore, adding residual connections cannot be a solution for making a GNN deep. Our analysis reveals that constraining the Lipschitz bound of each layer can neutralize the gradient expansion. To this end, we provide a simple yet effective normalization method to prevent the gradient expansion. An empirical study shows that the residual GNNs with hundreds of layers can be efficiently trained with the proposed normalization without compromising performance. Additional studies show that the empirical observations corroborate our theoretical analysis.

Updated: 2024-10-07 08:22:20

标题: 驯服图神经网络中的梯度过度平滑和扩张

摘要: 过度平滑被认为是多层图神经网络（GNNs）的主要瓶颈。多个分析研究了过度平滑发生的方式和原因。然而，先前的工作没有解决过度平滑状态下如何进行优化的问题。在这项工作中，我们展示了$\textit{梯度过度平滑}$的存在，在训练过程中阻止了优化。我们进一步分析了，具有残差连接的GNNs，这是一个用来帮助深层架构中的梯度流动的众所周知的解决方案，引入了$\textit{梯度扩张}$，即梯度在不同方向上的爆炸现象。因此，添加残差连接不能解决使GNN变得更深的问题。我们的分析表明，约束每一层的利普希茨边界可以中和梯度扩张。因此，我们提出了一种简单而有效的归一化方法，以防止梯度扩张。实证研究表明，使用提出的归一化方法可以有效地训练具有数百层的残差GNNs，而不会影响性能。额外的研究显示，实证观察结果支持我们的理论分析。

更新时间: 2024-10-07 08:22:20

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.04824v1

A Dynamic Model of Performative Human-ML Collaboration: Theory and Empirical Evidence

Machine learning (ML) models are increasingly used in various applications, from recommendation systems in e-commerce to diagnosis prediction in healthcare. In this paper, we present a novel dynamic framework for thinking about the deployment of ML models in a performative, human-ML collaborative system. In our framework, the introduction of ML recommendations changes the data-generating process of human decisions, which are only a proxy to the ground truth and which are then used to train future versions of the model. We show that this dynamic process in principle can converge to different stable points, i.e. where the ML model and the Human+ML system have the same performance. Some of these stable points are suboptimal with respect to the actual ground truth. As a proof of concept, we conduct an empirical user study with 1,408 participants. In the study, humans solve instances of the knapsack problem with the help of machine learning predictions of varying performance. This is an ideal setting because we can identify the actual ground truth, and evaluate the performance of human decisions supported by ML recommendations. We find that for many levels of ML performance, humans can improve upon the ML predictions. We also find that the improvement could be even higher if humans rationally followed the ML recommendations. Finally, we test whether monetary incentives can increase the quality of human decisions, but we fail to find any positive effect. Using our empirical data to approximate our collaborative system suggests that the learning process would dynamically reach an equilibrium performance that is around 92% of the maximum knapsack value. Our results have practical implications for the deployment of ML models in contexts where human decisions may deviate from the indisputable ground truth.

Updated: 2024-10-07 08:20:55

标题: 一个表现性人类与机器学习协作的动态模型：理论与实证证据

摘要: 机器学习（ML）模型越来越多地应用于各种领域，从电子商务中的推荐系统到医疗保健中的诊断预测。在本文中，我们提出了一个新颖的动态框架，用于思考在一个执行性、人机协作系统中部署ML模型。在我们的框架中，引入ML推荐改变了人类决策的数据生成过程，这些决策仅仅是地面真相的代理，并且用于训练未来版本的模型。我们展示了这种动态过程原则上可以收敛到不同的稳定点，即ML模型和人机系统具有相同的性能。其中一些稳定点相对于实际地面真相是次优的。作为概念验证，我们进行了一项涉及1,408名参与者的实证用户研究。在这项研究中，人类通过机器学习预测以不同性能解决背包问题的实例。这是一个理想的设置，因为我们可以确定实际地面真相，并评估人类决策在ML推荐支持下的表现。我们发现，对于许多不同水平的ML性能，人类可以改进ML预测。我们还发现，如果人类理性地遵循ML推荐，改进可能会更高。最后，我们测试了金钱激励是否可以提高人类决策的质量，但我们未能发现任何积极效果。利用我们的实证数据来近似我们的协作系统表明，学习过程会动态达到一个约为最大背包价值的92％的平衡性能。我们的结果对于在人类决策可能偏离无可争议的地面真相的情境中部署ML模型具有实际意义。

更新时间: 2024-10-07 08:20:55

领域: cs.LG,cs.AI,cs.HC,econ.GN,q-fin.EC,68T05,I.2.1; I.2.6; K.6

下载: http://arxiv.org/abs/2405.13753v3

ProteinBench: A Holistic Evaluation of Protein Foundation Models

Recent years have witnessed a surge in the development of protein foundation models, significantly improving performance in protein prediction and generative tasks ranging from 3D structure prediction and protein design to conformational dynamics. However, the capabilities and limitations associated with these models remain poorly understood due to the absence of a unified evaluation framework. To fill this gap, we introduce ProteinBench, a holistic evaluation framework designed to enhance the transparency of protein foundation models. Our approach consists of three key components: (i) A taxonomic classification of tasks that broadly encompass the main challenges in the protein domain, based on the relationships between different protein modalities; (ii) A multi-metric evaluation approach that assesses performance across four key dimensions: quality, novelty, diversity, and robustness; and (iii) In-depth analyses from various user objectives, providing a holistic view of model performance. Our comprehensive evaluation of protein foundation models reveals several key findings that shed light on their current capabilities and limitations. To promote transparency and facilitate further research, we release the evaluation dataset, code, and a public leaderboard publicly for further analysis and a general modular toolkit. We intend for ProteinBench to be a living benchmark for establishing a standardized, in-depth evaluation framework for protein foundation models, driving their development and application while fostering collaboration within the field.

Updated: 2024-10-07 08:20:32

标题: ProteinBench：蛋白质基础模型的全面评估

摘要: 近年来，蛋白质基础模型的发展迅速增长，显著提高了蛋白质预测和生成任务的性能，包括从3D结构预测和蛋白设计到构象动力学。然而，由于缺乏统一的评估框架，与这些模型相关的能力和局限性仍然不够清楚。为了填补这一空白，我们引入了ProteinBench，一个旨在提高蛋白质基础模型透明度的综合评估框架。我们的方法包括三个关键组成部分：（i）一个广泛涵盖蛋白质领域主要挑战的任务的分类，基于不同蛋白质形态之间的关系；（ii）一个多指标评估方法，评估质量、新颖性、多样性和稳健性四个关键维度的性能；以及（iii）根据各种用户目标的深入分析，提供模型性能的全面视角。我们对蛋白质基础模型的全面评估揭示了几个关键发现，揭示了它们当前的能力和局限性。为了促进透明度并促进进一步研究，我们公开发布评估数据集、代码和公开排行榜，供进一步分析和一般模块化工具包使用。我们希望ProteinBench成为一个活跃的基准，建立一个标准化、深入的蛋白质基础模型评估框架，推动它们的发展和应用，同时促进该领域内的合作。

更新时间: 2024-10-07 08:20:32

领域: q-bio.QM,cs.AI,cs.LG,q-bio.BM

下载: http://arxiv.org/abs/2409.06744v2

Trained Models Tell Us How to Make Them Robust to Spurious Correlation without Group Annotation

Classifiers trained with Empirical Risk Minimization (ERM) tend to rely on attributes that have high spurious correlation with the target. This can degrade the performance on underrepresented (or 'minority') groups that lack these attributes, posing significant challenges for both out-of-distribution generalization and fairness objectives. Many studies aim to enhance robustness to spurious correlation, but they sometimes depend on group annotations for training. Additionally, a common limitation in previous research is the reliance on group-annotated validation datasets for model selection. This constrains their applicability in situations where the nature of the spurious correlation is not known, or when group labels for certain spurious attributes are not available. To enhance model robustness with minimal group annotation assumptions, we propose Environment-based Validation and Loss-based Sampling (EVaLS). It uses the losses from an ERM-trained model to construct a balanced dataset of high-loss and low-loss samples, mitigating group imbalance in data. This significantly enhances robustness to group shifts when equipped with a simple post-training last layer retraining. By using environment inference methods to create diverse environments with correlation shifts, EVaLS can potentially eliminate the need for group annotation in validation data. In this context, the worst environment accuracy acts as a reliable surrogate throughout the retraining process for tuning hyperparameters and finding a model that performs well across diverse group shifts. EVaLS effectively achieves group robustness, showing that group annotation is not necessary even for validation. It is a fast, straightforward, and effective approach that reaches near-optimal worst group accuracy without needing group annotations, marking a new chapter in the robustness of trained models against spurious correlation.

Updated: 2024-10-07 08:17:44

标题: 经过训练的模型告诉我们如何使它们对虚假相关性具有稳健性，而无需组注释。

摘要: 使用经验风险最小化(ERM)训练的分类器往往依赖具有与目标高度虚假相关性的属性。这可能降低对那些缺乏这些属性的代表性较低(或“少数”)群体的表现，在超出分布泛化和公平目标方面带来重大挑战。许多研究旨在增强对虚假相关性的鲁棒性，但有时它们依赖于用于训练的群体注释。此外，先前研究的一个常见限制是依赖于群体注释验证数据集进行模型选择。这限制了它们在不了解虚假相关性性质或某些虚假属性的群体标签不可用的情况下的适用性。为了在最小的群体注释假设下增强模型的鲁棒性，我们提出了基于环境验证和基于损失抽样(EVaLS)。它使用从ERM训练模型中得到的损失来构建一个高损失和低损失样本平衡的数据集，减轻数据中的群体不平衡。这通过简单的训练后的最后一层重新训练显著增强了对群体转移的鲁棒性。通过使用环境推断方法创建具有相关性转移的多样化环境，EVaLS潜在地可以消除在验证数据中需要群体注释的需求。在这种情况下，最差环境准确性在整个重新训练过程中充当可靠的替代，用于调整超参数并找到在各种群体转移中表现良好的模型。EVaLS有效实现了群体鲁棒性，显示出即使在验证时也不需要群体注释。它是一种快速、简单且有效的方法，可以在不需要群体注释的情况下达到接近最优最差群体准确性，标志着训练模型抵抗虚假相关性的鲁棒性的新篇章。

更新时间: 2024-10-07 08:17:44

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.05345v1

CAT: Concept-level backdoor ATtacks for Concept Bottleneck Models

Despite the transformative impact of deep learning across multiple domains, the inherent opacity of these models has driven the development of Explainable Artificial Intelligence (XAI). Among these efforts, Concept Bottleneck Models (CBMs) have emerged as a key approach to improve interpretability by leveraging high-level semantic information. However, CBMs, like other machine learning models, are susceptible to security threats, particularly backdoor attacks, which can covertly manipulate model behaviors. Understanding that the community has not yet studied the concept level backdoor attack of CBM, because of "Better the devil you know than the devil you don't know.", we introduce CAT (Concept-level Backdoor ATtacks), a methodology that leverages the conceptual representations within CBMs to embed triggers during training, enabling controlled manipulation of model predictions at inference time. An enhanced attack pattern, CAT+, incorporates a correlation function to systematically select the most effective and stealthy concept triggers, thereby optimizing the attack's impact. Our comprehensive evaluation framework assesses both the attack success rate and stealthiness, demonstrating that CAT and CAT+ maintain high performance on clean data while achieving significant targeted effects on backdoored datasets. This work underscores the potential security risks associated with CBMs and provides a robust testing methodology for future security assessments.

Updated: 2024-10-07 08:14:17

标题: CAT：概念级别的概念瓶颈模型后门攻击

摘要: 尽管深度学习在多个领域产生了变革性影响，但这些模型的固有不透明性推动了可解释人工智能（XAI）的发展。在这些努力中，概念瓶颈模型（CBMs）已经成为提高可解释性的关键方法，通过利用高级语义信息。然而，CBMs像其他机器学习模型一样，容易受到安全威胁，特别是后门攻击，可以秘密操纵模型行为。鉴于社区尚未研究CBM的概念级后门攻击，因为“知己知彼，百战不殆”，我们引入了CAT（概念级后门攻击），这是一种利用CBMs内部概念表示在训练过程中嵌入触发器的方法，从而使模型在推断时可以受控地操纵预测。增强的攻击模式CAT+结合了相关函数，系统地选择最有效和隐蔽的概念触发器，从而优化攻击的影响。我们的综合评估框架评估了攻击成功率和隐蔽性，表明CAT和CAT+在干净数据上保持高性能，同时在后门数据集上实现显著的目标效果。这项工作强调了与CBMs相关的潜在安全风险，并为未来安全评估提供了强大的测试方法。

更新时间: 2024-10-07 08:14:17

领域: cs.CV,cs.CR

下载: http://arxiv.org/abs/2410.04823v1

Physics-Informed GNN for non-linear constrained optimization: PINCO a solver for the AC-optimal power flow

The energy transition is driving the integration of large shares of intermittent power sources in the electric power grid. Therefore, addressing the AC optimal power flow (AC-OPF) effectively becomes increasingly essential. The AC-OPF, which is a fundamental optimization problem in power systems, must be solved more frequently to ensure the safe and cost-effective operation of power systems. Due to its non-linear nature, AC-OPF is often solved in its linearized form, despite inherent inaccuracies. Non-linear solvers, such as the interior point method, are typically employed to solve the full OPF problem. However, these iterative methods may not converge for large systems and do not guarantee global optimality. This work explores a physics-informed graph neural network, PINCO, to solve the AC-OPF. We demonstrate that this method provides accurate solutions in a fraction of the computational time when compared to the established non-linear programming solvers. Remarkably, PINCO generalizes effectively across a diverse set of loading conditions in the power system. We show that our method can solve the AC-OPF without violating inequality constraints. Furthermore, it can function both as a solver and as a hybrid universal function approximator. Moreover, the approach can be easily adapted to different power systems with minimal adjustments to the hyperparameters, including systems with multiple generators at each bus. Overall, this work demonstrates an advancement in the field of power system optimization to tackle the challenges of the energy transition. The code and data utilized in this paper are available at https://anonymous.4open.science/r/opf_pinn_iclr-B83E/.

Updated: 2024-10-07 08:08:36

标题: 物理信息图神经网络应用于非线性约束优化：PINCO用于交流最优功率流的求解器

摘要: 能源转型正在推动间歇性电源在电力网中的大规模整合。因此，有效解决交流最优功率流（AC-OPF）问题变得越来越重要。AC-OPF是电力系统中的一个基本优化问题，必须更频繁地解决以确保电力系统的安全和经济运行。由于其非线性特性，AC-OPF通常以线性化形式解决，尽管存在固有的不准确性。非线性求解器，如内点法，通常用于解决完整的OPF问题。然而，这些迭代方法可能对于大型系统不收敛，并且不能保证全局最优性。本文探讨了一种物理信息图神经网络，PINCO，用于解决AC-OPF问题。我们证明了与已建立的非线性规划求解器相比，该方法在计算时间的一小部分内提供准确的解决方案。值得注意的是，PINCO能够有效地泛化到电力系统中各种负载条件。我们展示了我们的方法可以在不违反不等式约束的情况下解决AC-OPF问题。此外，它既可以作为求解器，也可以作为混合通用函数逼近器。此外，该方法可以很容易地适应不同的电力系统，只需对超参数进行最小的调整，包括每个母线上有多个发电机的系统。总的来说，这项工作展示了在电力系统优化领域取得的进展，以应对能源转型的挑战。本文中使用的代码和数据可在https://anonymous.4open.science/r/opf_pinn_iclr-B83E/ 上找到。

更新时间: 2024-10-07 08:08:36

领域: eess.SY,cs.LG,cs.SY

下载: http://arxiv.org/abs/2410.04818v1

Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders

Multiview systems have become a key technology in modern computer vision, offering advanced capabilities in scene understanding and analysis. However, these systems face critical challenges in bandwidth limitations and computational constraints, particularly for resource-limited camera nodes like drones. This paper presents a novel approach for communication-efficient distributed multiview detection and tracking using masked autoencoders (MAEs). We introduce a semantic-guided masking strategy that leverages pre-trained segmentation models and a tunable power function to prioritize informative image regions. This approach, combined with an MAE, reduces communication overhead while preserving essential visual information. We evaluate our method on both virtual and real-world multiview datasets, demonstrating comparable performance in terms of detection and tracking performance metrics compared to state-of-the-art techniques, even at high masking ratios. Our selective masking algorithm outperforms random masking, maintaining higher accuracy and precision as the masking ratio increases. Furthermore, our approach achieves a significant reduction in transmission data volume compared to baseline methods, thereby balancing multiview tracking performance with communication efficiency.

Updated: 2024-10-07 08:06:41

标题: 资源高效的多视图感知：将语义掩蔽与掩蔽自动编码器集成

摘要: 多视图系统已成为现代计算机视觉中的关键技术，提供了在场景理解和分析方面的高级能力。然而，这些系统面临带宽限制和计算约束等关键挑战，尤其是对于资源有限的摄像机节点如无人机。本文提出了一种新颖的方法，使用掩蔽自编码器（MAEs）进行通信高效的分布式多视图检测和跟踪。我们引入了一种语义引导的掩蔽策略，利用预训练的分割模型和可调节的功率函数来优先考虑信息丰富的图像区域。这种方法结合MAE，减少了通信开销，同时保留了必要的视觉信息。我们在虚拟和现实世界的多视图数据集上评估了我们的方法，展示了与最先进技术相比，在检测和跟踪性能指标方面具有可比性的性能，即使在高掩蔽比率下也是如此。我们的选择性掩蔽算法优于随机掩蔽，随着掩蔽比率的增加，保持了更高的准确率和精度。此外，与基准方法相比，我们的方法实现了传输数据量的显著减少，从而在多视图跟踪性能和通信效率之间取得平衡。

更新时间: 2024-10-07 08:06:41

领域: cs.CV,cs.AI,eess.IV,eess.SP

下载: http://arxiv.org/abs/2410.04817v1

A Review of Artificial Intelligence based Biological-Tree Construction: Priorities, Methods, Applications and Trends

Biological tree analysis serves as a pivotal tool in uncovering the evolutionary and differentiation relationships among organisms, genes, and cells. Its applications span diverse fields including phylogenetics, developmental biology, ecology, and medicine. Traditional tree inference methods, while foundational in early studies, face increasing limitations in processing the large-scale, complex datasets generated by modern high-throughput technologies. Recent advances in deep learning offer promising solutions, providing enhanced data processing and pattern recognition capabilities. However, challenges remain, particularly in accurately representing the inherently discrete and non-Euclidean nature of biological trees. In this review, we first outline the key biological priors fundamental to phylogenetic and differentiation tree analyses, facilitating a deeper interdisciplinary understanding between deep learning researchers and biologists. We then systematically examine the commonly used data formats and databases, serving as a comprehensive resource for model testing and development. We provide a critical analysis of traditional tree generation methods, exploring their underlying biological assumptions, technical characteristics, and limitations. Current developments in deep learning-based tree generation are reviewed, highlighting both recent advancements and existing challenges. Furthermore, we discuss the diverse applications of biological trees across various biological domains. Finally, we propose potential future directions and trends in leveraging deep learning for biological tree research, aiming to guide further exploration and innovation in this field.

Updated: 2024-10-07 08:00:41

标题: 人工智能基于生物树构建的综述：优先事项、方法、应用和趋势

摘要: 生物树分析是揭示生物体、基因和细胞之间的演化和分化关系的关键工具。其应用涵盖了包括系统发育学、发育生物学、生态学和医学在内的多个领域。传统的树推断方法虽然在早期研究中具有基础性作用，但在处理现代高通量技术生成的大规模复杂数据集时面临越来越多的限制。深度学习的最新进展提供了有前途的解决方案，提供了增强的数据处理和模式识别能力。然而，挑战仍然存在，特别是在准确表示生物树固有的离散和非欧几里得特性方面。在本综述中，我们首先概述了对系统发育和分化树分析至关重要的生物学先验知识，促进了深度学习研究人员和生物学家之间更深入的跨学科理解。然后，我们系统地检查了常用的数据格式和数据库，为模型测试和开发提供了全面的资源。我们对传统树生成方法进行了批判性分析，探讨了它们的基本生物假设、技术特征和限制。回顾了基于深度学习的树生成的当前发展，突出了最近的进展和现有的挑战。此外，我们讨论了生物树在各种生物领域中的多样化应用。最后，我们提出了利用深度学习进行生物树研究的潜在未来方向和趋势，旨在引导这一领域的进一步探索和创新。

更新时间: 2024-10-07 08:00:41

领域: q-bio.PE,cs.AI

下载: http://arxiv.org/abs/2410.04815v1

Learning Interpretable Hierarchical Dynamical Systems Models from Time Series Data

In science, we are often interested in obtaining a generative model of the underlying system dynamics from observed time series. While powerful methods for dynamical systems reconstruction (DSR) exist when data come from a single domain, how to best integrate data from multiple dynamical regimes and leverage it for generalization is still an open question. This becomes particularly important when individual time series are short, and group-level information may help to fill in for gaps in single-domain data. At the same time, averaging is not an option in DSR, as it will wipe out crucial dynamical properties (e.g., limit cycles in one domain vs. chaos in another). Hence, a framework is needed that enables to efficiently harvest group-level (multi-domain) information while retaining all single-domain dynamical characteristics. Here we provide such a hierarchical approach and showcase it on popular DSR benchmarks, as well as on neuroscientific and medical time series. In addition to faithful reconstruction of all individual dynamical regimes, our unsupervised methodology discovers common low-dimensional feature spaces in which datasets with similar dynamics cluster. The features spanning these spaces were further dynamically highly interpretable, surprisingly in often linear relation to control parameters that govern the dynamics of the underlying system. Finally, we illustrate transfer learning and generalization to new parameter regimes.

Updated: 2024-10-07 07:54:53

标题: 学习可解释的分层动态系统模型：从时间序列数据中

摘要: 在科学中，我们经常对从观察到的时间序列中获取底层系统动态的生成模型感兴趣。虽然在数据来自单个域时存在强大的动态系统重建（DSR）方法，但如何最好地整合来自多个动态机制的数据并利用它进行泛化仍然是一个开放的问题。当个体时间序列较短且群体级别信息可能有助于填补单一域数据的空白时，这变得尤为重要。与此同时，在DSR中平均化不是一个选择，因为它会消除关键的动态特性（例如，在一个域中的极限循环与在另一个域中的混沌）。因此，需要一个框架，能够有效地收集群体级别（多域）信息同时保留所有单一域动态特性。在这里，我们提供了这样一个分层方法，并在流行的DSR基准测试以及神经科学和医学时间序列上展示了它。除了忠实地重建所有个体动态机制外，我们的无监督方法还发现了具有相似动态的数据集在其中聚类的共同低维特征空间。跨越这些空间的特征进一步动态高度可解释，令人惊讶的是，它们通常与控制底层系统动态的参数之间存在线性关系。最后，我们展示了对新参数范围的迁移学习和泛化。

更新时间: 2024-10-07 07:54:53

领域: cs.LG,cs.AI,math.DS,nlin.CD,physics.data-an

下载: http://arxiv.org/abs/2410.04814v1

Beyond Persuasion: Towards Conversational Recommender System with Credible Explanations

With the aid of large language models, current conversational recommender system (CRS) has gaining strong abilities to persuade users to accept recommended items. While these CRSs are highly persuasive, they can mislead users by incorporating incredible information in their explanations, ultimately damaging the long-term trust between users and the CRS. To address this, we propose a simple yet effective method, called PC-CRS, to enhance the credibility of CRS's explanations during persuasion. It guides the explanation generation through our proposed credibility-aware persuasive strategies and then gradually refines explanations via post-hoc self-reflection. Experimental results demonstrate the efficacy of PC-CRS in promoting persuasive and credible explanations. Further analysis reveals the reason behind current methods producing incredible explanations and the potential of credible explanations to improve recommendation accuracy.

Updated: 2024-10-07 07:49:27

标题: 超越说服：朝向具有可信解释的会话式推荐系统

摘要: 通过大型语言模型的帮助，当前的对话式推荐系统（CRS）已经具有了强大的说服用户接受推荐物品的能力。虽然这些CRS非常具有说服力，但它们可能会通过在解释中加入不可信的信息来误导用户，最终损害用户与CRS之间的长期信任。为了解决这个问题，我们提出了一种简单而有效的方法，称为PC-CRS，以增强CRS在说服过程中的解释可信度。它通过我们提出的基于可信度的说服策略来指导解释生成，然后通过事后自我反思逐渐完善解释。实验结果表明，PC-CRS在促进具有说服力和可信度的解释方面的有效性。进一步分析揭示了当前方法产生不可信解释的原因以及可信解释提高推荐准确性的潜力。

更新时间: 2024-10-07 07:49:27

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.14399v2

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models

One-Shot Federated Learning (OSFL), a special decentralized machine learning paradigm, has recently gained significant attention. OSFL requires only a single round of client data or model upload, which reduces communication costs and mitigates privacy threats compared to traditional FL. Despite these promising prospects, existing methods face challenges due to client data heterogeneity and limited data quantity when applied to real-world OSFL systems. Recently, Latent Diffusion Models (LDM) have shown remarkable advancements in synthesizing high-quality images through pretraining on large-scale datasets, thereby presenting a potential solution to overcome these issues. However, directly applying pretrained LDM to heterogeneous OSFL results in significant distribution shifts in synthetic data, leading to performance degradation in classification models trained on such data. This issue is particularly pronounced in rare domains, such as medical imaging, which are underrepresented in LDM's pretraining data. To address this challenge, we propose Federated Bi-Level Personalization (FedBiP), which personalizes the pretrained LDM at both instance-level and concept-level. Hereby, FedBiP synthesizes images following the client's local data distribution without compromising the privacy regulations. FedBiP is also the first approach to simultaneously address feature space heterogeneity and client data scarcity in OSFL. Our method is validated through extensive experiments on three OSFL benchmarks with feature space heterogeneity, as well as on challenging medical and satellite image datasets with label heterogeneity. The results demonstrate the effectiveness of FedBiP, which substantially outperforms other OSFL methods.

Updated: 2024-10-07 07:45:18

标题: FedBiP：具有个性化潜在扩散模型的异构一次性联邦学习

摘要: 最近，一次性联邦学习（OSFL）作为一种特殊的去中心化机器学习范式，受到了相当大的关注。OSFL仅需要一轮客户端数据或模型上传，相较于传统的联邦学习，可以降低通信成本并减轻隐私威胁。尽管存在这些有希望的前景，但现有方法在应用于真实世界的OSFL系统时面临客户数据的异质性和数据量有限等挑战。最近，潜在扩散模型（LDM）在通过在大规模数据集上进行预训练来合成高质量图像方面取得了显著进展，从而提供了一个潜在的解决方案来克服这些问题。然而，直接将预训练的LDM应用于异构的OSFL会导致合成数据中出现显著的分布偏移，从而导致在训练在这些数据上的分类模型的性能下降。这个问题在稀有领域，如医学影像领域，特别明显，这些领域在LDM的预训练数据中所占比例较小。为了解决这一挑战，我们提出了联邦双层个性化（FedBiP），该方法在实例级别和概念级别对预训练的LDM进行个性化。通过FedBiP，可以在不违反隐私规定的情况下，根据客户端的本地数据分布合成图像。FedBiP还是第一个能够同时解决OSFL中特征空间异质性和客户数据稀缺性的方法。我们的方法通过在带有特征空间异质性的三个OSFL基准测试以及具有标签异质性的具有挑战性的医学和卫星图像数据集上进行了大量实验验证。结果表明，FedBiP的有效性远远优于其他OSFL方法。

更新时间: 2024-10-07 07:45:18

领域: cs.LG,cs.CV,cs.DC,cs.MM

下载: http://arxiv.org/abs/2410.04810v1

Efficient Shield Synthesis via State-Space Transformation

We consider the problem of synthesizing safety strategies for control systems, also known as shields. Since the state space is infinite, shields are typically computed over a finite-state abstraction, with the most common abstraction being a rectangular grid. However, for many systems, such a grid does not align well with the safety property or the system dynamics. That is why a coarse grid is rarely sufficient, but a fine grid is typically computationally infeasible to obtain. In this paper, we show that appropriate state-space transformations can still allow to use a coarse grid at almost no computational overhead. We demonstrate in three case studies that our transformation-based synthesis outperforms a standard synthesis by several orders of magnitude. In the first two case studies, we use domain knowledge to select a suitable transformation. In the third case study, we instead report on results in engineering a transformation without domain knowledge.

Updated: 2024-10-07 07:41:11

标题: 通过状态空间转换实现高效的屏蔽合成

摘要: 我们考虑合成控制系统的安全策略的问题，也称为屏障。由于状态空间是无限的，通常会在一个有限状态抽象上计算屏障，最常见的抽象是一个矩形网格。然而，对于许多系统，这样的网格与安全属性或系统动态并不很好地匹配。这就是为什么粗略的网格很少足够，但通常很难获得细网格的计算。在本文中，我们展示适当的状态空间转换仍然可以让我们以几乎没有计算开销的方式使用粗网格。我们在三个案例研究中展示，我们基于转换的合成优于标准合成数个数量级。在前两个案例研究中，我们使用领域知识选择适当的转换。在第三个案例研究中，我们报告了在没有领域知识的情况下工程化转换的结果。

更新时间: 2024-10-07 07:41:11

领域: cs.LO,cs.AI,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2407.19911v4

HF-NTT: Hazard-Free Dataflow Accelerator for Number Theoretic Transform

Polynomial multiplication is one of the fundamental operations in many applications, such as fully homomorphic encryption (FHE). However, the computational inefficiency stemming from polynomials with many large-bit coefficients poses a significant challenge for the practical implementation of FHE. The Number Theoretic Transform (NTT) has proven an effective tool in enhancing polynomial multiplication, but a fast and adaptable method for generating NTT accelerators is lacking. In this paper, we introduce HF-NTT, a novel NTT accelerator. HF-NTT efficiently handles polynomials of varying degrees and moduli, allowing for a balance between performance and hardware resources by adjusting the number of Processing Elements (PEs). Meanwhile, we introduce a data movement strategy that eliminates the need for bit-reversal operations, resolves different hazards, and reduces the clock cycles. Furthermore, Our accelerator includes a hardware-friendly modular multiplication design and a configurable PE capable of adapting its data path, resulting in a universal architecture. We synthesized and implemented prototype using Vivado 2022.2, and evaluated it on the Xilinx Virtex-7 FPGA platform. The results demonstrate significant improvements in Area-Time-Product (ATP) and processing speed for different polynomial degrees. In scenarios involving multi-modulus polynomial multiplication, our prototype consistently outperforms other designs in both ATP and latency metrics.

Updated: 2024-10-07 07:31:38

标题: HF-NTT：无危险数据流加速器用于数论变换

摘要: 多项式乘法是许多应用中的基本操作之一，如全同态加密（FHE）。然而，由于具有许多大位系数的多项式导致的计算效率低下，对FHE的实际实现构成了重大挑战。数论变换（NTT）已被证明是增强多项式乘法的有效工具，但缺乏一种快速且适应性强的生成NTT加速器的方法。在本文中，我们介绍了HF-NTT，一种新颖的NTT加速器。HF-NTT有效地处理不同次数和模的多项式，通过调整处理单元（PEs）的数量，实现了性能和硬件资源之间的平衡。同时，我们引入了一种数据移动策略，消除了位逆序操作的需要，解决了不同的危险，并减少了时钟周期。此外，我们的加速器包括一个友好的硬件模块化乘法设计和一个可配置的PE，能够调整其数据路径，从而实现通用架构。我们使用Vivado 2022.2进行了综合和实现原型，并在Xilinx Virtex-7 FPGA平台上进行了评估。结果表明，不同多项式次数的面积-时间-乘积（ATP）和处理速度都得到了显著改善。在涉及多模数多项式乘法的场景中，我们的原型在ATP和延迟指标方面始终优于其他设计。

更新时间: 2024-10-07 07:31:38

领域: cs.AR,cs.CR

下载: http://arxiv.org/abs/2410.04805v1

Timer-XL: Long-Context Transformers for Unified Time Series Forecasting

We present Timer-XL, a generative Transformer for unified time series forecasting. To uniformly predict 1D and 2D time series, we generalize next token prediction, predominantly adopted for causal generation of 1D sequences, to multivariate next token prediction. The proposed paradigm uniformly formulates various forecasting scenarios as a long-context generation problem. We opt for the generative Transformer, which can capture global-range and causal dependencies while providing contextual flexibility, to implement unified forecasting on univariate series characterized by non-stationarity, multivariate time series with complicated dynamics and correlations, and covariate-informed contexts that include both endogenous and exogenous variables. Technically, we propose a universal TimeAttention to facilitate generative Transformers on time series, which can effectively capture fine-grained intra- and inter-series dependencies of flattened time series tokens (patches) and is further strengthened by position embeddings in both temporal and variable dimensions. Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach. As a large time series model, it demonstrates notable model transferability by large-scale pre-training, as well as contextual flexibility in token lengths, positioning it as a one-for-all forecaster.

Updated: 2024-10-07 07:27:39

标题: Timer-XL：用于统一时间序列预测的长上下文Transformer

摘要: 我们提出了Timer-XL，一个用于统一时间序列预测的生成式Transformer。为了统一地预测1D和2D时间序列，我们将主要用于因果生成1D序列的下一个标记预测泛化为多元下一个标记预测。所提出的范式将各种预测场景统一地构建为长上下文生成问题。我们选择了生成式Transformer，它可以捕获全局范围和因果依赖性，同时提供上下文灵活性，以实现对具有非平稳性的单变量系列、具有复杂动态和相关性的多元时间序列，以及包括内生和外生变量的协变量信息的统一预测。在技术上，我们提出了一个通用的TimeAttention，以促进时间序列上的生成式Transformer，它可以有效捕获平坦时间序列标记（补丁）的细粒度内部和跨序列依赖关系，并在时间和变量维度上通过位置嵌入进一步加强。通过统一方法，Timer-XL在具有挑战性的预测基准上实现了最先进的性能。作为一个大型时间序列模型，它通过大规模预训练展现了显着的模型可转移性，同时在标记长度上具有上下文灵活性，使其成为一个全能预测器。

更新时间: 2024-10-07 07:27:39

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.04803v1

Building Damage Assessment in Conflict Zones: A Deep Learning Approach Using Geospatial Sub-Meter Resolution Data

Very High Resolution (VHR) geospatial image analysis is crucial for humanitarian assistance in both natural and anthropogenic crises, as it allows to rapidly identify the most critical areas that need support. Nonetheless, manually inspecting large areas is time-consuming and requires domain expertise. Thanks to their accuracy, generalization capabilities, and highly parallelizable workload, Deep Neural Networks (DNNs) provide an excellent way to automate this task. Nevertheless, there is a scarcity of VHR data pertaining to conflict situations, and consequently, of studies on the effectiveness of DNNs in those scenarios. Motivated by this, our work extensively studies the applicability of a collection of state-of-the-art Convolutional Neural Networks (CNNs) originally developed for natural disasters damage assessment in a war scenario. To this end, we build an annotated dataset with pre- and post-conflict images of the Ukrainian city of Mariupol. We then explore the transferability of the CNN models in both zero-shot and learning scenarios, demonstrating their potential and limitations. To the best of our knowledge, this is the first study to use sub-meter resolution imagery to assess building damage in combat zones.

Updated: 2024-10-07 07:26:38

标题: 在冲突地区的建筑损坏评估：利用地理空间亚米分辨率数据的深度学习方法

摘要: Very High Resolution (VHR) geospatial image analysis is crucial for humanitarian assistance in both natural and anthropogenic crises, as it allows to rapidly identify the most critical areas that need support. Nonetheless, manually inspecting large areas is time-consuming and requires domain expertise. Thanks to their accuracy, generalization capabilities, and highly parallelizable workload, Deep Neural Networks (DNNs) provide an excellent way to automate this task. Nevertheless, there is a scarcity of VHR data pertaining to conflict situations, and consequently, of studies on the effectiveness of DNNs in those scenarios. Motivated by this, our work extensively studies the applicability of a collection of state-of-the-art Convolutional Neural Networks (CNNs) originally developed for natural disasters damage assessment in a war scenario. To this end, we build an annotated dataset with pre- and post-conflict images of the Ukrainian city of Mariupol. We then explore the transferability of the CNN models in both zero-shot and learning scenarios, demonstrating their potential and limitations. To the best of our knowledge, this is the first study to use sub-meter resolution imagery to assess building damage in combat zones.

更新时间: 2024-10-07 07:26:38

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.04802v1

Improving Image Clustering with Artifacts Attenuation via Inference-Time Attention Engineering

The goal of this paper is to improve the performance of pretrained Vision Transformer (ViT) models, particularly DINOv2, in image clustering task without requiring re-training or fine-tuning. As model size increases, high-norm artifacts anomaly appears in the patches of multi-head attention. We observe that this anomaly leads to reduced accuracy in zero-shot image clustering. These artifacts are characterized by disproportionately large values in the attention map compared to other patch tokens. To address these artifacts, we propose an approach called Inference-Time Attention Engineering (ITAE), which manipulates attention function during inference. Specifically, we identify the artifacts by investigating one of the Query-Key-Value (QKV) patches in the multi-head attention and attenuate their corresponding attention values inside the pretrained models. ITAE shows improved clustering accuracy on multiple datasets by exhibiting more expressive features in latent space. Our findings highlight the potential of ITAE as a practical solution for reducing artifacts in pretrained ViT models and improving model performance in clustering tasks without the need for re-training or fine-tuning.

Updated: 2024-10-07 07:26:10

标题: 通过推理时间注意力工程改进图像聚类的文献

摘要: 本文旨在改善预训练的Vision Transformer（ViT）模型，特别是DINOv2，在图像聚类任务中的性能，而无需重新训练或微调。随着模型大小的增加，多头注意力的补丁中出现高规范异常。我们观察到这种异常导致零样本图像聚类的准确性降低。这些异常特征是指与其他补丁令牌相比，在注意力图中具有不成比例的大值。为了解决这些异常特征，我们提出了一种称为Inference-Time Attention Engineering（ITAE）的方法，在推断期间操纵注意力函数。具体来说，我们通过调查多头注意力中的一个Query-Key-Value（QKV）补丁来识别异常特征，并减弱预训练模型中它们对应的注意力值。ITAE通过在潜在空间中展示更具表现力的特征，在多个数据集上展现出改善的聚类准确性。我们的研究结果突显了ITAE作为减少预训练ViT模型中异常特征并改善聚类任务性能的潜力，而无需重新训练或微调。

更新时间: 2024-10-07 07:26:10

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.04801v1

Transforming Color: A Novel Image Colorization Method

This paper introduces a novel method for image colorization that utilizes a color transformer and generative adversarial networks (GANs) to address the challenge of generating visually appealing colorized images. Conventional approaches often struggle with capturing long-range dependencies and producing realistic colorizations. The proposed method integrates a transformer architecture to capture global information and a GAN framework to improve visual quality. In this study, a color encoder that utilizes a random normal distribution to generate color features is applied. These features are then integrated with grayscale image features to enhance the overall representation of the images. Our method demonstrates superior performance compared with existing approaches by utilizing the capacity of the transformer, which can capture long-range dependencies and generate a realistic colorization of the GAN. Experimental results show that the proposed network significantly outperforms other state-of-the-art colorization techniques, highlighting its potential for image colorization. This research opens new possibilities for precise and visually compelling image colorization in domains such as digital restoration and historical image analysis.

Updated: 2024-10-07 07:23:42

标题: 转变颜色：一种新颖的图像着色方法

摘要: 本文介绍了一种利用颜色变换器和生成对抗网络（GANs）进行图像着色的新方法，以解决生成视觉上吸引人的彩色图像的挑战。传统方法通常难以捕捉长距离依赖关系并产生逼真的着色。所提出的方法整合了一个变换器架构来捕捉全局信息，以及一个GAN框架来提高视觉质量。在本研究中，应用了一个利用随机正态分布生成颜色特征的颜色编码器。然后将这些特征与灰度图像特征整合，以增强图像的整体表现。我们的方法通过利用变换器的能力，可以捕捉长距离依赖关系并生成GAN的逼真着色，表现出优越的性能，与现有方法相比。实验结果表明，所提出的网络明显优于其他最先进的着色技术，突显了其在图像着色领域的潜力。这项研究为数字修复和历史图像分析等领域提供了精确和视觉上引人入胜的图像着色的新可能性。

更新时间: 2024-10-07 07:23:42

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.04799v1

EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos with Procedural Texts

Mistake action detection from egocentric videos is crucial for developing intelligent archives that detect workers' errors and provide feedback. Previous studies have been limited to specific domains, focused on detecting mistakes from videos without procedural texts, and analyzed whether actions are mistakes. To address these limitations, in this paper, we propose the EgoOops dataset, which includes egocentric videos, procedural texts, and three types of annotations: video-text alignment, mistake labels, and descriptions for mistakes. EgoOops covers five procedural domains and includes 50 egocentric videos. The video-text alignment allows the model to detect mistakes based on both videos and procedural texts. The mistake labels and descriptions enable detailed analysis of real-world mistakes. Based on EgoOops, we tackle two tasks: video-text alignment and mistake detection. For video-text alignment, we enhance the recent StepFormer model with an additional loss for fine-tuning. Based on the alignment results, we propose a multi-modal classifier to predict mistake labels. In our experiments, the proposed methods achieve higher performance than the baselines. In addition, our ablation study demonstrates the effectiveness of combining videos and texts. We will release the dataset and codes upon publication.

Updated: 2024-10-07 07:19:50

标题: EgoOops：一个用于从主观视角视频中检测错误操作的数据集，带有过程性文本

摘要: 从自我中心视频中检测错误动作对于开发能够检测工人错误并提供反馈的智能档案非常重要。先前的研究局限于特定领域，侧重于从没有过程文本的视频中检测错误，并分析动作是否错误。为了解决这些限制，本文提出了EgoOops数据集，其中包括自我中心视频、过程文本和三种类型的注释：视频文本对齐、错误标签和错误描述。EgoOops涵盖了五个过程领域，并包括50个自我中心视频。视频文本对齐允许模型基于视频和过程文本检测错误。错误标签和描述可实现对真实世界错误的详细分析。基于EgoOops，我们解决了两个任务：视频文本对齐和错误检测。对于视频文本对齐，我们使用附加损失对最近的StepFormer模型进行了优化。基于对齐结果，我们提出了一个多模态分类器来预测错误标签。在我们的实验中，所提出的方法实现了比基线更高的性能。此外，我们的消融研究证明了结合视频和文本的有效性。我们将在出版后发布数据集和代码。

更新时间: 2024-10-07 07:19:50

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.05343v1

Discrete Distribution Networks

We introduce a novel generative model, the Discrete Distribution Networks (DDN), that approximates data distribution using hierarchical discrete distributions. We posit that since the features within a network inherently capture distributional information, enabling the network to generate multiple samples simultaneously, rather than a single output, may offer an effective way to represent distributions. Therefore, DDN fits the target distribution, including continuous ones, by generating multiple discrete sample points. To capture finer details of the target data, DDN selects the output that is closest to the Ground Truth (GT) from the coarse results generated in the first layer. This selected output is then fed back into the network as a condition for the second layer, thereby generating new outputs more similar to the GT. As the number of DDN layers increases, the representational space of the outputs expands exponentially, and the generated samples become increasingly similar to the GT. This hierarchical output pattern of discrete distributions endows DDN with unique property: more general zero-shot conditional generation. We demonstrate the efficacy of DDN and its intriguing properties through experiments on CIFAR-10 and FFHQ. The code is available at https://discrete-distribution-networks.github.io/

Updated: 2024-10-07 07:14:23

标题: 离散分布网络

摘要: 我们介绍了一种新颖的生成模型，称为离散分布网络（DDN），它使用层次离散分布来近似数据分布。我们认为，由于网络内部的特征本质上捕捉了分布信息，使网络能够同时生成多个样本，而不是单个输出，可能是表示分布的一种有效方式。因此，DDN通过生成多个离散样本点来拟合目标分布，包括连续分布。为了捕捉目标数据的更细节，DDN从第一层生成的粗糙结果中选择最接近Ground Truth（GT）的输出。然后将此选定的输出作为条件馈送回网络的第二层，从而生成更类似于GT的新输出。随着DDN层数的增加，输出的表示空间呈指数级扩展，并且生成的样本越来越类似于GT。这种离散分布的层次输出模式赋予DDN独特的特性：更通用的零样本条件生成。我们通过在CIFAR-10和FFHQ上进行实验展示了DDN及其有趣的特性的有效性。代码可在https://discrete-distribution-networks.github.io/找到。

更新时间: 2024-10-07 07:14:23

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2401.00036v2

Analysis of Hybrid Compositions in Animation Film with Weakly Supervised Learning

We present an approach for the analysis of hybrid visual compositions in animation in the domain of ephemeral film. We combine ideas from semi-supervised and weakly supervised learning to train a model that can segment hybrid compositions without requiring pre-labeled segmentation masks. We evaluate our approach on a set of ephemeral films from 13 film archives. Results demonstrate that the proposed learning strategy yields a performance close to a fully supervised baseline. On a qualitative level the performed analysis provides interesting insights on hybrid compositions in animation film.

Updated: 2024-10-07 06:57:23

标题: 使用弱监督学习分析动画电影中的混合构图

摘要: 我们提出了一种用于分析短暂电影领域中混合视觉构图的方法。我们结合了半监督学习和弱监督学习的思想，训练了一个模型，可以在不需要预先标记分割蒙版的情况下分割混合构图。我们在来自13个电影档案的一组短暂电影上评估了我们的方法。结果表明，所提出的学习策略产生了接近完全监督基线的性能。在定性水平上，进行的分析为动画电影中的混合构图提供了有趣的见解。

更新时间: 2024-10-07 06:57:23

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.04789v1

Comparative Simulation of Phishing Attacks on a Critical Information Infrastructure Organization: An Empirical Study

Nowadays, cybersecurity is crucial. Therefore, cybersecurity awareness should be a concern for businesses, particularly critical infrastructure organizations. The results of this study, using simulated phishing attacks, indicate that in the first attempt, workers of a Thai railway firm received a phony email purporting to inform recipients of a special deal from a reputable retailer of IT equipment. The findings showed that 10.9% of the 735 workers fell for the scam. This demonstrates a good level of awareness regarding cyber dangers. The workers who were duped by the initial attack received awareness training. Next, a second attempt was carried out. This time, the strategy was for the workers to change their passwords through an email notification from the fake IT staff. According to the findings, 1.4% of the workers fell victim to both attacks (different email content), and a further 8.0% of the workers who did not fall victim to the first attack were deceived. Furthermore, after the statistical analysis, it was confirmed that there is a difference in the relationship between the workers and the two phishing attack simulations using different content. As a result, this study has demonstrated that different types of content can affect levels of awareness.

Updated: 2024-10-07 06:55:44

标题: 对关键信息基础设施组织进行网络钓鱼攻击的比较模拟：实证研究

摘要: 现今，网络安全至关重要。因此，网络安全意识应成为企业特别是关键基础设施组织的关注重点。本研究的结果表明，在模拟的网络钓鱼攻击中，泰国一家铁路公司的员工在第一次尝试中收到了一封假冒邮件，声称要通知接收者有关一家知名IT设备零售商的特别优惠。研究结果显示，在735名员工中，有10.9%受骗。这表明对网络危险有着较高的意识水平。受到初次攻击欺骗的员工接受了意识培训。接着，进行了第二次尝试。这次，策略是要求员工通过假冒的IT工作人员发送的电子邮件通知更改他们的密码。根据研究结果，有1.4%的员工分别受害于两次攻击（不同的邮件内容），另有8.0%的员工在第一次攻击未受害的情况下被欺骗。此外，在统计分析之后，确认了员工与使用不同内容的两种网络钓鱼攻击模拟之间的关系存在差异。因此，本研究表明不同类型的内容可以影响意识水平。

更新时间: 2024-10-07 06:55:44

领域: cs.CR,cs.HC

下载: http://arxiv.org/abs/2410.20728v1

Fast Training of Sinusoidal Neural Fields via Scaling Initialization

Neural fields are an emerging paradigm that represent data as continuous functions parameterized by neural networks. Despite many advantages, neural fields often have a high training cost, which prevents a broader adoption. In this paper, we focus on a popular family of neural fields, called sinusoidal neural fields (SNFs), and study how it should be initialized to maximize the training speed. We find that the standard initialization scheme for SNFs -- designed based on the signal propagation principle -- is suboptimal. In particular, we show that by simply multiplying each weight (except for the last layer) by a constant, we can accelerate SNF training by 10$\times$. This method, coined $\textit{weight scaling}$, consistently provides a significant speedup over various data domains, allowing the SNFs to train faster than more recently proposed architectures. To understand why the weight scaling works well, we conduct extensive theoretical and empirical analyses which reveal that the weight scaling not only resolves the spectral bias quite effectively but also enjoys a well-conditioned optimization trajectory.

Updated: 2024-10-07 06:38:43

标题: 通过缩放初始化快速训练正弦神经场

摘要: 神经场是一种新兴的范式，将数据表示为由神经网络参数化的连续函数。尽管具有许多优点，神经场通常具有较高的训练成本，这阻碍了更广泛的采用。在本文中，我们专注于一种流行的神经场家族，称为正弦神经场（SNFs），并研究如何初始化以最大化训练速度。我们发现，为SNFs设计的标准初始化方案 -- 基于信号传播原理 -- 是次优的。特别地，我们展示，通过简单地将每个权重（除最后一层外）乘以一个常数，我们可以将SNF训练加速10倍。这种方法，被称为“权重缩放”，在各种数据领域上一贯提供显著的加速，使得SNFs比最近提出的架构更快地训练。为了理解为什么权重缩放效果好，我们进行了广泛的理论和实证分析，揭示了权重缩放不仅有效地解决了频谱偏差，而且享有一个良好条件的优化轨迹。

更新时间: 2024-10-07 06:38:43

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.04779v1

Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths

Advanced models such as OpenAI o1 exhibit impressive problem-solving capabilities through step-by-step reasoning. However, they may still falter on more complex problems, making errors that disrupt their reasoning paths. We attribute this to the expansive solution space, where each step has the risk of diverging into mistakes. To enhance language model reasoning, we introduce a specialized training framework called Reasoning Paths Optimization (RPO), which enables learning to reason and explore from diverse paths. Our approach encourages favorable branches at each reasoning step while penalizing unfavorable ones, enhancing the model's overall problem-solving performance. Reasoning Paths Optimization does not rely on large-scale human-annotated rationales or outputs from closed-source models, making it scalable and data-efficient. We focus on multi-step reasoning tasks, such as math word problems and science-based exam questions. The experiments demonstrate that our framework significantly enhances the reasoning performance of large language models, with up to 3.1% and 4.3% improvement on GSM8K and MMLU (STEM) respectively. Our data and code can be found at https://reasoning-paths.github.io.

Updated: 2024-10-07 06:37:25

标题: 推理路径优化：学习从多样路径推理和探索

摘要: 先进模型如OpenAI o1通过逐步推理展现出令人印象深刻的问题解决能力。然而，在更复杂的问题上，它们仍可能会出错，这些错误会破坏它们的推理路径。我们将这归因于解决方案空间的扩展，其中每一步都有分歧导致错误的风险。为了增强语言模型的推理能力，我们引入了一种名为Reasoning Paths Optimization (RPO)的专门训练框架，该框架使学习能够从不同的路径进行推理和探索。我们的方法鼓励在每个推理步骤中选择有利的分支，同时惩罚不利的分支，从而提高模型的整体问题解决性能。Reasoning Paths Optimization不依赖于大规模人工注释的理由或来自闭源模型的输出，因此具有可扩展性和数据效率。我们关注多步推理任务，比如数学文字问题和基于科学的考试题。实验证明，我们的框架显著提高了大型语言模型的推理性能，在GSM8K和MMLU（STEM）上分别提高了3.1%和4.3%。我们的数据和代码可以在https://reasoning-paths.github.io找到。

更新时间: 2024-10-07 06:37:25

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.10858v1

On the Power of Randomization in Fair Classification and Representation

Fair classification and fair representation learning are two important problems in supervised and unsupervised fair machine learning, respectively. Fair classification asks for a classifier that maximizes accuracy on a given data distribution subject to fairness constraints. Fair representation maps a given data distribution over the original feature space to a distribution over a new representation space such that all classifiers over the representation satisfy fairness. In this paper, we examine the power of randomization in both these problems to minimize the loss of accuracy that results when we impose fairness constraints. Previous work on fair classification has characterized the optimal fair classifiers on a given data distribution that maximize accuracy subject to fairness constraints, e.g., Demographic Parity (DP), Equal Opportunity (EO), and Predictive Equality (PE). We refine these characterizations to demonstrate when the optimal randomized fair classifiers can surpass their deterministic counterparts in accuracy. We also show how the optimal randomized fair classifier that we characterize can be obtained as a solution to a convex optimization problem. Recent work has provided techniques to construct fair representations for a given data distribution such that any classifier over this representation satisfies DP. However, the classifiers on these fair representations either come with no or weak accuracy guarantees when compared to the optimal fair classifier on the original data distribution. Extending our ideas for randomized fair classification, we improve on these works, and construct DP-fair, EO-fair, and PE-fair representations that have provably optimal accuracy and suffer no accuracy loss compared to the optimal DP-fair, EO-fair, and PE-fair classifiers respectively on the original data distribution.

Updated: 2024-10-07 06:32:03

标题: 关于在公平分类和表示中随机化的力量

摘要: 公平分类和公平表示学习分别是监督和无监督公平机器学习中的两个重要问题。公平分类要求一个分类器在给定数据分布上最大化准确性，同时满足公平性约束。公平表示将一个给定数据分布从原始特征空间映射到一个新的表示空间，使得所有在表示上的分类器都满足公平性。在本文中，我们研究了随机化在这两个问题中的作用，以最小化当我们施加公平性约束时导致的准确性损失。以前关于公平分类的工作已经表征了在给定数据分布上最大化准确性的最优公平分类器，例如，人口统计平等（DP）、平等机会（EO）和预测平等（PE）。我们对这些表征进行了改进，以展示最优随机化公平分类器何时能够在准确性上超越其确定性对应物。我们还展示了我们表征的最优随机化公平分类器如何可以作为凸优化问题的解来获得。最近的工作提供了构建公平表示的技术，使得在这些表示上的任何分类器都满足DP。然而，与原始数据分布上的最优公平分类器相比，这些公平表示上的分类器要么没有准确性保证，要么准确性保证较弱。延伸我们的随机化公平分类思想，我们改进了这些工作，并构建了具有证明最优准确性的DP公平、EO公平和PE公平表示，与原始数据分布上的最优DP公平、EO公平和PE公平分类器相比没有准确性损失。

更新时间: 2024-10-07 06:32:03

领域: cs.LG

下载: http://arxiv.org/abs/2406.03142v2

OmniBuds: A Sensory Earable Platform for Advanced Bio-Sensing and On-Device Machine Learning

Sensory earables have evolved from basic audio enhancement devices into sophisticated platforms for clinical-grade health monitoring and wellbeing management. This paper introduces OmniBuds, an advanced sensory earable platform integrating multiple biosensors and onboard computation powered by a machine learning accelerator, all within a real-time operating system (RTOS). The platform's dual-ear symmetric design, equipped with precisely positioned kinetic, acoustic, optical, and thermal sensors, enables highly accurate and real-time physiological assessments. Unlike conventional earables that rely on external data processing, OmniBuds leverage real-time onboard computation to significantly enhance system efficiency, reduce latency, and safeguard privacy by processing data locally. This capability includes executing complex machine learning models directly on the device. We provide a comprehensive analysis of OmniBuds' design, hardware and software architecture demonstrating its capacity for multi-functional applications, accurate and robust tracking of physiological parameters, and advanced human-computer interaction.

Updated: 2024-10-07 06:30:59

标题: OmniBuds：用于高级生物传感和设备上机器学习的感知耳戴平台

摘要: 感知耳机已经从基本的音频增强设备发展成为临床级健康监测和健康管理的先进平台。本文介绍了OmniBuds，这是一个集成多个生物传感器和机器学习加速器的先进感知耳机平台，所有这些都在实时操作系统（RTOS）中运行。该平台具有双耳对称设计，配备精确定位的动力学、声学、光学和热传感器，可以实现高度准确和实时的生理评估。与依赖外部数据处理的传统耳机不同，OmniBuds利用实时的设备内计算显著提高系统效率，减少延迟，并通过本地处理数据来保护隐私。这种能力包括在设备上直接执行复杂的机器学习模型。我们对OmniBuds的设计、硬件和软件架构进行了全面分析，展示了其多功能应用、准确和稳健的生理参数跟踪以及先进的人机交互能力。

更新时间: 2024-10-07 06:30:59

领域: cs.ET,cs.LG

下载: http://arxiv.org/abs/2410.04775v1

Intelligent Computing Social Modeling and Methodological Innovations in Political Science in the Era of Large Language Models

The recent wave of artificial intelligence, epitomized by large language models (LLMs), has presented opportunities and challenges for methodological innovation in political science, sparking discussions on a potential paradigm shift in the social sciences. However, how can we understand the impact of LLMs on knowledge production and paradigm transformation in the social sciences from a comprehensive perspective that integrates technology and methodology? What are LLMs' specific applications and representative innovative methods in political science research? These questions, particularly from a practical methodological standpoint, remain underexplored. This paper proposes the "Intelligent Computing Social Modeling" (ICSM) method to address these issues by clarifying the critical mechanisms of LLMs. ICSM leverages the strengths of LLMs in idea synthesis and action simulation, advancing intellectual exploration in political science through "simulated social construction" and "simulation validation." By simulating the U.S. presidential election, this study empirically demonstrates the operational pathways and methodological advantages of ICSM. By integrating traditional social science paradigms, ICSM not only enhances the quantitative paradigm's capability to apply big data to assess the impact of factors but also provides qualitative paradigms with evidence for social mechanism discovery at the individual level, offering a powerful tool that balances interpretability and predictability in social science research. The findings suggest that LLMs will drive methodological innovation in political science through integration and improvement rather than direct substitution.

Updated: 2024-10-07 06:30:59

标题: 大语言模型时代政治科学智能计算社会建模和方法创新

摘要: 最近的一波人工智能浪潮，以大型语言模型（LLMs）为典型，为政治科学中的方法论创新提供了机遇和挑战，引发了关于社会科学范式转变的讨论。然而，我们如何能够全面理解LLMs对社会科学知识生产和范式转变的影响，从整合技术和方法论的角度来看呢？在政治科学研究中，LLMs的具体应用和代表性创新方法是什么？这些问题，特别是从实践方法论的角度来看，尚未得到充分探讨。本文提出了“智能计算社会建模”（ICSM）方法来解决这些问题，通过澄清LLMs的关键机制。ICSM利用LLMs在思想综合和行动模拟方面的优势，通过“模拟社会构建”和“模拟验证”推进政治科学中的智力探索。通过模拟美国总统选举，本研究从实证角度展示了ICSM的操作路径和方法论优势。通过整合传统社会科学范式，ICSM不仅增强了定量范式应用大数据评估因素影响的能力，还为定性范式提供了个体层面社会机制发现的证据，提供了一种在社会科学研究中平衡可解释性和可预测性的强大工具。研究结果表明，LLMs将通过整合和改进而不是直接替代推动政治科学中的方法论创新。

更新时间: 2024-10-07 06:30:59

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2410.16301v1

AI Delegates with a Dual Focus: Ensuring Privacy and Strategic Self-Disclosure

Large language model (LLM)-based AI delegates are increasingly utilized to act on behalf of users, assisting them with a wide range of tasks through conversational interfaces. Despite their advantages, concerns arise regarding the potential risk of privacy leaks, particularly in scenarios involving social interactions. While existing research has focused on protecting privacy by limiting the access of AI delegates to sensitive user information, many social scenarios require disclosing private details to achieve desired outcomes, necessitating a balance between privacy protection and disclosure. To address this challenge, we conduct a pilot study to investigate user preferences for AI delegates across various social relations and task scenarios, and then propose a novel AI delegate system that enables privacy-conscious self-disclosure. Our user study demonstrates that the proposed AI delegate strategically protects privacy, pioneering its use in diverse and dynamic social interactions.

Updated: 2024-10-07 06:29:54

标题: AI代表的双重关注：确保隐私和战略性自我披露

摘要: 基于大型语言模型（LLM）的人工智能代表越来越多地被用于代表用户行动，通过对话界面帮助他们处理各种任务。尽管具有诸多优势，但人们对于潜在的隐私泄露风险表示担忧，特别是在涉及社交互动的场景中。尽管现有研究集中在通过限制人工智能代表对敏感用户信息的访问来保护隐私，但许多社交场景需要披露私人细节以实现期望的结果，这需要在隐私保护和披露之间取得平衡。为了解决这一挑战，我们进行了一项试点研究，调查用户对不同社交关系和任务场景下的人工智能代表的偏好，然后提出了一种新颖的AI代表系统，实现了注重隐私的自我披露。我们的用户研究表明，所提出的AI代表在战略上保护隐私，开拓了其在多样化和动态社交互动中的应用。

更新时间: 2024-10-07 06:29:54

领域: cs.AI,cs.CY

下载: http://arxiv.org/abs/2409.17642v2

PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging

Lensless cameras offer significant advantages in size, weight, and cost compared to traditional lens-based systems. Without a focusing lens, lensless cameras rely on computational algorithms to recover the scenes from multiplexed measurements. However, current algorithms struggle with inaccurate forward imaging models and insufficient priors to reconstruct high-quality images. To overcome these limitations, we introduce a novel two-stage approach for consistent and photorealistic lensless image reconstruction. The first stage of our approach ensures data consistency by focusing on accurately reconstructing the low-frequency content with a spatially varying deconvolution method that adjusts to changes in the Point Spread Function (PSF) across the camera's field of view. The second stage enhances photorealism by incorporating a generative prior from pre-trained diffusion models. By conditioning on the low-frequency content retrieved in the first stage, the diffusion model effectively reconstructs the high-frequency details that are typically lost in the lensless imaging process, while also maintaining image fidelity. Our method achieves a superior balance between data fidelity and visual quality compared to existing methods, as demonstrated with two popular lensless systems, PhlatCam and DiffuserCam. Project website: https://phocolens.github.io/.

Updated: 2024-10-07 06:23:51

标题: PhoCoLens：无透镜成像中的逼真和一致重建

摘要: 无镜头相机相比传统基于镜头的系统在尺寸、重量和成本上具有显著优势。无镜头相机没有聚焦镜头，依靠计算算法从多路复用的测量中恢复场景。然而，当前算法在不准确的前向成像模型和不足的先验条件下重建高质量图像时存在困难。为了克服这些限制，我们提出了一种新颖的两阶段方法，用于一致和逼真的无镜头图像重建。我们方法的第一阶段通过专注于利用空间变化的去卷积方法准确重建低频内容来确保数据一致性，该方法可以根据相机视野中光斑传播函数（PSF）的变化进行调整。第二阶段通过将经过预训练扩散模型的生成先验纳入，增强了照片逼真度。通过在第一阶段检索到的低频内容上进行条件控制，扩散模型有效地重建了通常在无镜头成像过程中丢失的高频细节，同时保持图像的保真度。我们的方法在数据保真度和视觉质量之间实现了优越的平衡，与现有方法相比，通过对两种流行的无镜头系统PhlatCam和DiffuserCam进行演示。项目网站：https://phocolens.github.io/。

更新时间: 2024-10-07 06:23:51

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2409.17996v2

Granular Ball Twin Support Vector Machine

On Efficient and Scalable Computation of the Nonparametric Maximum Likelihood Estimator in Mixture ModelsTwin support vector machine (TSVM) is an emerging machine learning model with versatile applicability in classification and regression endeavors. Nevertheless, TSVM confronts noteworthy challenges: $(i)$ the imperative demand for matrix inversions presents formidable obstacles to its efficiency and applicability on large-scale datasets; $(ii)$ the omission of the structural risk minimization (SRM) principle in its primal formulation heightens the vulnerability to overfitting risks; and $(iii)$ the TSVM exhibits a high susceptibility to noise and outliers, and also demonstrates instability when subjected to resampling. In view of the aforementioned challenges, we propose the granular ball twin support vector machine (GBTSVM). GBTSVM takes granular balls, rather than individual data points, as inputs to construct a classifier. These granular balls, characterized by their coarser granularity, exhibit robustness to resampling and reduced susceptibility to the impact of noise and outliers. We further propose a novel large-scale granular ball twin support vector machine (LS-GBTSVM). LS-GBTSVM's optimization formulation ensures two critical facets: $(i)$ it eliminates the need for matrix inversions, streamlining the LS-GBTSVM's computational efficiency, and $(ii)$ it incorporates the SRM principle through the incorporation of regularization terms, effectively addressing the issue of overfitting. The proposed LS-GBTSVM exemplifies efficiency, scalability for large datasets, and robustness against noise and outliers. We conduct a comprehensive evaluation of the GBTSVM and LS-GBTSVM models on benchmark datasets from UCI, KEEL, and NDC datasets. Our experimental findings and statistical analyses affirm the superior generalization prowess of the proposed GBTSVM and LS-GBTSVM models.

Updated: 2024-10-07 06:20:36

标题: 颗粒球双支持向量机

摘要: 在混合模型中高效、可扩展地计算非参数最大似然估计器双支持向量机(TSVM)是一种新兴的机器学习模型，在分类和回归任务中具有广泛的适用性。然而，TSVM面临着显著的挑战：(i)对矩阵求逆的迫切需求对其在大规模数据集上的效率和适用性构成了巨大障碍；(ii)在其原始公式中忽略了结构风险最小化(SRM)原则，增加了过拟合风险的脆弱性；(iii)TSVM对噪声和异常值具有很高的敏感性，并且在重采样时表现不稳定。鉴于上述挑战，我们提出了颗粒球双支持向量机(GBTSVM)。GBTSVM将颗粒球作为输入，而不是单个数据点来构建分类器。这些颗粒球，以其更粗的粒度特征，对重采样表现出稳健性，并减少了对噪声和异常值的影响。我们进一步提出了一种新颖的大规模颗粒球双支持向量机(LS-GBTSVM)。LS-GBTSVM的优化公式确保了两个关键方面：(i)它消除了对矩阵求逆的需求，简化了LS-GBTSVM的计算效率；(ii)通过引入正则化项，它将SRM原则纳入其中，有效解决了过拟合问题。所提出的LS-GBTSVM展现了高效性、适用于大数据集的可扩展性，并对噪声和异常值具有抗干扰性。我们在UCI、KEEL和NDC数据集上对GBTSVM和LS-GBTSVM模型进行了全面评估。我们的实验结果和统计分析证实了所提出的GBTSVM和LS-GBTSVM模型具有卓越的泛化能力。

更新时间: 2024-10-07 06:20:36

领域: cs.LG

下载: http://arxiv.org/abs/2410.04774v1

Scalable and Adaptively Secure Any-Trust Distributed Key Generation and All-hands Checkpointing

The classical distributed key generation protocols (DKG) are resurging due to their widespread applications in blockchain. While efforts have been made to improve DKG communication, practical large-scale deployments are still yet to come due to various challenges, including the heavy computation and communication (particularly broadcast) overhead in their adversarial cases. In this paper, we propose a practical DKG for DLog-based cryptosystems, which achieves (quasi-)linear computation and communication per-node cost with the help of a common coin, even in the face of the maximal amount of Byzantine nodes. Moreover, our protocol is secure against adaptive adversaries, which can corrupt less than half of all nodes. The key to our improvements lies in delegating the most costly operations to an Any-Trust group together with a set of techniques for adaptive security. This group is randomly sampled and consists of a small number of individuals. The population only trusts that at least one member in the group is honest, without knowing which one. Moreover, we present a generic transformer that enables us to efficiently deploy a conventional distributed protocol like our DKG, even when the participants have different weights. Additionally, we introduce an extended broadcast channel based on a blockchain and data dispersal network (such as IPFS), enabling reliable broadcasting of arbitrary-size messages at the cost of constant-size blockchain storage.

Updated: 2024-10-07 06:20:17

标题: 可扩展和自适应安全的任信任分布式密钥生成和全员检查点

摘要: 经典的分布式密钥生成协议（DKG）由于其在区块链中的广泛应用而再次兴起。虽然人们已经努力改进DKG通信，但由于各种挑战，包括在对抗性情况下的大量计算和通信（特别是广播）开销，实际的大规模部署仍然尚未到来。在本文中，我们提出了一种基于DLog的密码系统的实用DKG，即使面对最大数量的拜占庭节点，也能实现每个节点成本（准）线性的计算和通信，借助一个公共硬币。此外，我们的协议对自适应对手具有安全性，这些对手可以破坏不到一半的所有节点。我们的改进关键在于将最昂贵的操作委托给一个任意信任组，以及一系列适应性安全技术。该组是随机抽样的，由少数个体组成。人口只相信该组中至少有一个成员是诚实的，而不知道是哪一个。此外，我们提出了一个通用的转换器，使我们能够有效地部署像我们的DKG这样的传统分布式协议，即使参与者有不同的权重。此外，我们引入了一种基于区块链和数据分散网络（如IPFS）的扩展广播通道，实现了可靠地广播任意大小的消息，以恒定大小的区块链存储成本。

更新时间: 2024-10-07 06:20:17

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2311.09592v4

From Transparency to Accountability and Back: A Discussion of Access and Evidence in AI Auditing

Artificial intelligence (AI) is increasingly intervening in our lives, raising widespread concern about its unintended and undeclared side effects. These developments have brought attention to the problem of AI auditing: the systematic evaluation and analysis of an AI system, its development, and its behavior relative to a set of predetermined criteria. Auditing can take many forms, including pre-deployment risk assessments, ongoing monitoring, and compliance testing. It plays a critical role in providing assurances to various AI stakeholders, from developers to end users. Audits may, for instance, be used to verify that an algorithm complies with the law, is consistent with industry standards, and meets the developer's claimed specifications. However, there are many operational challenges to AI auditing that complicate its implementation. In this work, we examine a key operational issue in AI auditing: what type of access to an AI system is needed to perform a meaningful audit? Addressing this question has direct policy relevance, as it can inform AI audit guidelines and requirements. We begin by discussing the factors that auditors balance when determining the appropriate type of access, and unpack the benefits and drawbacks of four types of access. We conclude that, at minimum, black-box access -- providing query access to a model without exposing its internal implementation -- should be granted to auditors, as it balances concerns related to trade secrets, data privacy, audit standardization, and audit efficiency. We then suggest a framework for determining how much further access (in addition to black-box access) to grant auditors. We show that auditing can be cast as a natural hypothesis test, draw parallels hypothesis testing and legal procedure, and argue that this framing provides clear and interpretable guidance on audit implementation.

Updated: 2024-10-07 06:15:46

标题: 从透明到问责再到透明：AI审计中的访问和证据讨论

摘要: 人工智能（AI）越来越多地介入我们的生活，引起了对其意外和未申报副作用的广泛关注。这些发展引起了对AI审计问题的关注：对AI系统、其开发和行为进行系统评估和分析，相对于一组预定标准。审计可以采取多种形式，包括部署前的风险评估、持续监测和合规性测试。它在向各种AI利益相关者提供保证方面发挥着关键作用，从开发人员到最终用户。例如，审计可用于验证算法是否符合法律要求，是否符合行业标准，并且是否符合开发人员声明的规格。然而，AI审计存在许多操作挑战，使其实施变得复杂。在这项工作中，我们研究了AI审计中的一个关键操作问题：为了进行有意义的审计，需要何种类型的对AI系统的访问权限？回答这个问题与政策直接相关，因为它可以为AI审计指南和要求提供信息。我们首先讨论了审计员在确定适当的访问类型时平衡的因素，并解开了四种访问方式的优缺点。我们得出结论，至少应该向审计员授予黑盒访问权限——向模型提供查询访问权限，而不暴露其内部实现，因为这平衡了与商业秘密、数据隐私、审计标准化和审计效率相关的问题。然后，我们建议一个框架，用于确定额外访问权限（除了黑盒访问权限）应该授予审计员多少。我们展示了审计可以被视为一种自然假设检验，将假设检验与法律程序进行类比，并认为这种框架提供了对审计实施的清晰和可解释的指导。

更新时间: 2024-10-07 06:15:46

领域: cs.CY,cs.LG

下载: http://arxiv.org/abs/2410.04772v1

PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects and grounding the 3D motion prediction on these properties, such as object stiffness. However, estimating physical material properties is an open problem due to the lack of material ground-truth data, as measuring these properties for real objects is highly difficult. We present PhysDreamer, a physics-based approach that endows static 3D objects with interactive dynamics by leveraging the object dynamics priors learned by video generation models. By distilling these priors, PhysDreamer enables the synthesis of realistic object responses to novel interactions, such as external forces or agent manipulations. We demonstrate our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study. PhysDreamer takes a step towards more engaging and realistic virtual experiences by enabling static 3D objects to dynamically respond to interactive stimuli in a physically plausible manner. See our project page at https://physdreamer.github.io/.

Updated: 2024-10-07 06:08:09

标题: PhysDreamer: 通过视频生成与三维物体的基于物理的交互

摘要: 现实物体的相互作用对于创建沉浸式虚拟体验至关重要，然而合成逼真的三维物体动力学以响应新颖的互动仍然是一个重大挑战。与无条件或文本条件的动力学生成不同，动作条件的动力学需要感知物体的物理材料属性，并将三维运动预测基于这些属性，如物体的刚度。然而，由于缺乏材料的真实数据，估计物理材料属性是一个悬而未决的问题，因为测量这些属性对于真实物体来说非常困难。我们提出了PhysDreamer，这是一种基于物理的方法，通过利用视频生成模型学习的物体动力学先验，赋予静态三维物体交互动力学。通过提炼这些先验，PhysDreamer使得合成物体对新颖互动（如外部力或代理操纵）做出逼真响应成为可能。我们在弹性物体的多样例子上展示了我们的方法，并通过用户研究评估了合成互动的逼真程度。PhysDreamer通过使静态三维物体以一种物理上合理的方式动态响应交互刺激，迈出了更引人入胜和逼真的虚拟体验的一步。请访问我们的项目页面https://physdreamer.github.io/。

更新时间: 2024-10-07 06:08:09

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.13026v2

Molecular topological deep learning for polymer property prediction

Accurate and efficient prediction of polymer properties is of key importance for polymer design. Traditional experimental tools and density function theory (DFT)-based simulations for polymer property evaluation, are both expensive and time-consuming. Recently, a gigantic amount of graph-based molecular models have emerged and demonstrated huge potential in molecular data analysis. Even with the great progresses, these models tend to ignore the high-order and mutliscale information within the data. In this paper, we develop molecular topological deep learning (Mol-TDL) for polymer property analysis. Our Mol-TDL incorporates both high-order interactions and multiscale properties into topological deep learning architecture. The key idea is to represent polymer molecules as a series of simplicial complices at different scales and build up simplical neural networks accordingly. The aggregated information from different scales provides a more accurate prediction of polymer molecular properties.

Updated: 2024-10-07 05:44:02

标题: 分子拓扑深度学习用于聚合物性质预测

摘要: 精确高效地预测聚合物性质对聚合物设计至关重要。传统的实验工具和基于密度泛函理论（DFT）的模拟用于聚合物性质评估，既昂贵又耗时。最近，出现了大量基于图的分子模型，并在分子数据分析中展示了巨大潜力。尽管取得了巨大进展，这些模型往往忽略了数据中的高阶和多尺度信息。在本文中，我们开发了分子拓扑深度学习（Mol-TDL）用于聚合物性质分析。我们的Mol-TDL将高阶相互作用和多尺度性质融入拓扑深度学习架构。关键思想是将聚合物分子表示为不同尺度的单纯复杂系列，并相应地构建单纯神经网络。来自不同尺度的聚合信息提供了对聚合物分子性质更准确的预测。

更新时间: 2024-10-07 05:44:02

领域: cond-mat.mtrl-sci,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.04765v1

Double Oracle Neural Architecture Search for Game Theoretic Deep Learning Models

In this paper, we propose a new approach to train deep learning models using game theory concepts including Generative Adversarial Networks (GANs) and Adversarial Training (AT) where we deploy a double-oracle framework using best response oracles. GAN is essentially a two-player zero-sum game between the generator and the discriminator. The same concept can be applied to AT with attacker and classifier as players. Training these models is challenging as a pure Nash equilibrium may not exist and even finding the mixed Nash equilibrium is difficult as training algorithms for both GAN and AT have a large-scale strategy space. Extending our preliminary model DO-GAN, we propose the methods to apply the double oracle framework concept to Adversarial Neural Architecture Search (NAS for GAN) and Adversarial Training (NAS for AT) algorithms. We first generalize the players' strategies as the trained models of generator and discriminator from the best response oracles. We then compute the meta-strategies using a linear program. For scalability of the framework where multiple network models of best responses are stored in the memory, we prune the weakly-dominated players' strategies to keep the oracles from becoming intractable. Finally, we conduct experiments on MNIST, CIFAR-10 and TinyImageNet for DONAS-GAN. We also evaluate the robustness under FGSM and PGD attacks on CIFAR-10, SVHN and TinyImageNet for DONAS-AT. We show that all our variants have significant improvements in both subjective qualitative evaluation and quantitative metrics, compared with their respective base architectures.

Updated: 2024-10-07 05:42:01

标题: 双预言神经架构搜索用于博弈论深度学习模型

摘要: 在本文中，我们提出了一种利用博弈论概念包括生成对抗网络（GANs）和对抗训练（AT）训练深度学习模型的新方法，其中我们使用最佳响应预言者部署了一个双预言者框架。GAN基本上是一个生成器和鉴别器之间的二人零和博弈。同样的概念可以应用于以攻击者和分类器为玩家的AT。训练这些模型是具有挑战性的，因为纯纳什均衡可能不存在，甚至找到混合纳什均衡也很困难，因为GAN和AT的训练算法具有大规模的策略空间。扩展我们的初步模型DO-GAN，我们提出了将双预言者框架概念应用于对抗神经架构搜索（GAN的NAS）和对抗训练（AT的NAS）算法的方法。我们首先将玩家的策略泛化为生成器和鉴别器的训练模型，这些模型来自最佳响应预言者。然后，我们使用线性规划计算元策略。为了使多个网络模型的最佳响应存储在内存中的框架具有可扩展性，我们修剪弱支配玩家的策略，以防止预言者变得难以处理。最后，我们对MNIST、CIFAR-10和TinyImageNet进行了DONAS-GAN的实验。我们还评估了在CIFAR-10、SVHN和TinyImageNet上对FGSM和PGD攻击的DONAS-AT的鲁棒性。我们展示了与各自基础架构相比，所有我们的变种在主观定性评估和定量指标方面都有显著改进。

更新时间: 2024-10-07 05:42:01

领域: cs.LG,cs.GT

下载: http://arxiv.org/abs/2410.04764v1

Stochastic Runge-Kutta Methods: Provable Acceleration of Diffusion Models

Diffusion models play a pivotal role in contemporary generative modeling, claiming state-of-the-art performance across various domains. Despite their superior sample quality, mainstream diffusion-based stochastic samplers like DDPM often require a large number of score function evaluations, incurring considerably higher computational cost compared to single-step generators like generative adversarial networks. While several acceleration methods have been proposed in practice, the theoretical foundations for accelerating diffusion models remain underexplored. In this paper, we propose and analyze a training-free acceleration algorithm for SDE-style diffusion samplers, based on the stochastic Runge-Kutta method. The proposed sampler provably attains $\varepsilon^2$ error -- measured in KL divergence -- using $\widetilde O(d^{3/2} / \varepsilon)$ score function evaluations (for sufficiently small $\varepsilon$), strengthening the state-of-the-art guarantees $\widetilde O(d^{3} / \varepsilon)$ in terms of dimensional dependency. Numerical experiments validate the efficiency of the proposed method.

Updated: 2024-10-07 05:34:51

标题: 随机龙格-库塔方法：扩散模型的可证加速

摘要: 扩散模型在当代生成建模中发挥着关键作用，在各个领域都取得了最先进的性能。尽管主流基于扩散的随机取样器（如DDPM）具有优越的样本质量，但通常需要大量的评分函数评估，与生成对抗网络等单步生成器相比，其计算成本要高得多。虽然在实践中提出了几种加速方法，但加速扩散模型的理论基础仍未得到充分探讨。在本文中，我们提出并分析了一种基于随机Runge-Kutta方法的SDE风格扩散采样器的无培训加速算法。所提出的采样器可以证明在KL散度中使用$ \varepsilon ^ 2 $误差时，需要$ \widetilde O（d ^ {3/2} / \varepsilon）$的评分函数评估（对于足够小的$ \varepsilon $），从而加强了维度依赖性方面的最先进保证$ \widetilde O（d ^ {3} / \varepsilon）$。数值实验证实了所提方法的效率。

更新时间: 2024-10-07 05:34:51

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.04760v1

IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS

Recent advancements in text-to-speech (TTS) synthesis show that large-scale models trained with extensive web data produce highly natural-sounding output. However, such data is scarce for Indian languages due to the lack of high-quality, manually subtitled data on platforms like LibriVox or YouTube. To address this gap, we enhance existing large-scale ASR datasets containing natural conversations collected in low-quality environments to generate high-quality TTS training data. Our pipeline leverages the cross-lingual generalization of denoising and speech enhancement models trained on English and applied to Indian languages. This results in IndicVoices-R (IV-R), the largest multilingual Indian TTS dataset derived from an ASR dataset, with 1,704 hours of high-quality speech from 10,496 speakers across 22 Indian languages. IV-R matches the quality of gold-standard TTS datasets like LJSpeech, LibriTTS, and IndicTTS. We also introduce the IV-R Benchmark, the first to assess zero-shot, few-shot, and many-shot speaker generalization capabilities of TTS models on Indian voices, ensuring diversity in age, gender, and style. We demonstrate that fine-tuning an English pre-trained model on a combined dataset of high-quality IndicTTS and our IV-R dataset results in better zero-shot speaker generalization compared to fine-tuning on the IndicTTS dataset alone. Further, our evaluation reveals limited zero-shot generalization for Indian voices in TTS models trained on prior datasets, which we improve by fine-tuning the model on our data containing diverse set of speakers across language families. We open-source all data and code, releasing the first TTS model for all 22 official Indian languages.

Updated: 2024-10-07 05:29:01

标题: IndicVoices-R：解锁用于扩展印度TTS的大规模多语言多说话者语音语料库

摘要: 最近的文本到语音（TTS）合成技术取得了进展，表明使用大规模模型在广泛的网络数据上训练可以产生非常自然的输出。然而，由于在平台如LibriVox或YouTube上缺乏高质量的手动字幕数据，印度语言的这类数据很少。为了填补这一空白，我们改进了现有的大规模自动语音识别（ASR）数据集，其中包含在低质量环境中收集的自然对话，以生成高质量的TTS训练数据。我们的流水线利用了在英语上训练的去噪和语音增强模型的跨语言泛化，并应用于印度语言。这导致了IndicVoices-R（IV-R），这是从ASR数据集中派生的最大的多语种印度语TTS数据集，包含来自22种印度语言的10,496位发言者的1,704小时高质量语音。IV-R与LJSpeech、LibriTTS和IndicTTS等金标准TTS数据集的质量相匹配。我们还引入了IV-R Benchmark，这是第一个评估TTS模型在印度语音上的零样本、少样本和多样本说话者泛化能力的基准，确保了年龄、性别和风格的多样性。我们证明，在高质量的IndicTTS和我们的IV-R数据集的组合数据集上对英语预训练模型进行微调，比仅在IndicTTS数据集上微调获得更好的零样本说话者泛化。此外，我们的评估显示，对于以前的数据集上训练的TTS模型，在印度语音上的零样本泛化能力有限，我们通过在包含跨语言家族的各种发言者的数据上微调模型来改进。我们开源所有数据和代码，发布了第一个适用于所有22种官方印度语言的TTS模型。

更新时间: 2024-10-07 05:29:01

领域: cs.CL,cs.LG,cs.SD,eess.SP

下载: http://arxiv.org/abs/2409.05356v2

Driving with Regulation: Interpretable Decision-Making for Autonomous Vehicles with Retrieval-Augmented Reasoning via LLM

This work presents an interpretable decision-making framework for autonomous vehicles that integrates traffic regulations, norms, and safety guidelines comprehensively and enables seamless adaptation to different regions. While traditional rule-based methods struggle to incorporate the full scope of traffic rules, we develop a Traffic Regulation Retrieval (TRR) Agent based on Retrieval-Augmented Generation (RAG) to automatically retrieve relevant traffic rules and guidelines from extensive regulation documents and relevant records based on the ego vehicle's situation. Given the semantic complexity of the retrieved rules, we also design a reasoning module powered by a Large Language Model (LLM) to interpret these rules, differentiate between mandatory rules and safety guidelines, and assess actions on legal compliance and safety. Additionally, the reasoning is designed to be interpretable, enhancing both transparency and reliability. The framework demonstrates robust performance on both hypothesized and real-world cases across diverse scenarios, along with the ability to adapt to different regions with ease.

Updated: 2024-10-07 05:27:22

标题: 驾驶规则：LLM增强推理的自主车辆可解释决策制定

摘要: 这项工作提出了一个可解释的自主车辆决策框架，综合整合了交通法规、规范和安全指南，并实现了对不同地区的无缝适应。传统的基于规则的方法难以涵盖交通规则的全部范围，我们基于检索增强生成（RAG）开发了一个交通规则检索（TRR）代理，可以自动从广泛的法规文件和相关记录中检索与自车情况相关的交通规则和指南。鉴于检索到的规则的语义复杂性，我们还设计了一个由大型语言模型（LLM）驱动的推理模块，用于解释这些规则，区分强制性规则和安全指南，并评估行动的合法性和安全性。此外，推理设计为可解释的，增强了透明度和可靠性。该框架在各种场景下的假设和实际案例中表现出鲁棒的性能，同时具有轻松适应不同地区的能力。

更新时间: 2024-10-07 05:27:22

领域: cs.AI

下载: http://arxiv.org/abs/2410.04759v1

Item Cluster-aware Prompt Learning for Session-based Recommendation

Session-based recommendation (SBR) aims to capture dynamic user preferences by analyzing item sequences within individual sessions. However, most existing approaches focus mainly on intra-session item relationships, neglecting the connections between items across different sessions (inter-session relationships), which limits their ability to fully capture complex item interactions. While some methods incorporate inter-session information, they often suffer from high computational costs, leading to longer training times and reduced efficiency. To address these challenges, we propose the CLIP-SBR (Cluster-aware Item Prompt learning for Session-Based Recommendation) framework. CLIP-SBR is composed of two modules: 1) an item relationship mining module that builds a global graph to effectively model both intra- and inter-session relationships, and 2) an item cluster-aware prompt learning module that uses soft prompts to integrate these relationships into SBR models efficiently. We evaluate CLIP-SBR across eight SBR models and three benchmark datasets, consistently demonstrating improved recommendation performance and establishing CLIP-SBR as a robust solution for session-based recommendation tasks.

Updated: 2024-10-07 05:20:21

标题: 项目集群感知的会话推荐中的提示学习

摘要: 基于会话的推荐（SBR）旨在通过分析个体会话内的项目序列来捕捉动态用户偏好。然而，大多数现有方法主要关注会话内项目之间的关系，忽略了跨不同会话的项目之间的连接（会话间关系），这限制了它们充分捕捉复杂项目交互的能力。虽然一些方法整合了会话间信息，但往往面临高计算成本的问题，导致训练时间更长且效率降低。为了解决这些挑战，我们提出了CLIP-SBR（Cluster-aware Item Prompt learning for Session-Based Recommendation）框架。CLIP-SBR由两个模块组成：1）一个项目关系挖掘模块，用于构建全局图以有效建模会话内和会话间关系；2）一个项目集群感知提示学习模块，使用软提示将这些关系有效地整合到SBR模型中。我们在八个SBR模型和三个基准数据集上评估了CLIP-SBR，不断展示出改进的推荐性能，并将其确立为会话推荐任务的稳健解决方案。

更新时间: 2024-10-07 05:20:21

领域: cs.IR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.04756v1

A Comprehensive Study on GDPR-Oriented Analysis of Privacy Policies: Taxonomy, Corpus and GDPR Concept Classifiers

Machine learning based classifiers that take a privacy policy as the input and predict relevant concepts are useful in different applications such as (semi-)automated compliance analysis against requirements of the EU GDPR. In all past studies, such classifiers produce a concept label per segment (e.g., sentence or paragraph) and their performances were evaluated by using a dataset of labeled segments without considering the privacy policy they belong to. However, such an approach could overestimate the performance in real-world settings, where all segments in a new privacy policy are supposed to be unseen. Additionally, we also observed other research gaps, including the lack of a more complete GDPR taxonomy and the less consideration of hierarchical information in privacy policies. To fill such research gaps, we developed a more complete GDPR taxonomy, created the first corpus of labeled privacy policies with hierarchical information, and conducted the most comprehensive performance evaluation of GDPR concept classifiers for privacy policies. Our work leads to multiple novel findings, including the confirmed inappropriateness of splitting training and test sets at the segment level, the benefits of considering hierarchical information, and the limitations of the "one size fits all" approach, and the significance of testing cross-corpus generalizability.

Updated: 2024-10-07 05:19:12

标题: 基于GDPR的隐私政策分析的全面研究：分类法、语料库和GDPR概念分类器

摘要: 基于机器学习的分类器以隐私政策作为输入，并预测相关概念，在不同应用中非常有用，例如（半）自动化合规性分析对欧盟GDPR要求。在所有过去的研究中，这种分类器为每个部分（例如句子或段落）生成一个概念标签，其性能是通过使用带标签的部分数据集进行评估的，而不考虑它们所属的隐私政策。然而，这种方法在现实世界设置中可能会高估性能，在那里新隐私政策中的所有部分都应该是未知的。此外，我们还观察到其他研究空白，包括缺乏更完整的GDPR分类法和在隐私政策中较少考虑层次信息。为了填补这些研究空白，我们开发了更完整的GDPR分类法，创建了带有层次信息的标记隐私政策语料库，并对GDPR概念分类器在隐私政策中进行了最全面的性能评估。我们的工作带来了多个新发现，包括确认在段落级别拆分训练和测试集的不合适性，考虑层次信息的好处，"一刀切"方法的局限性，以及测试跨语料库的泛化可靠性的重要性。

更新时间: 2024-10-07 05:19:12

领域: cs.CR

下载: http://arxiv.org/abs/2410.04754v1

CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models

Large Language Models (LLMs) excel in diverse tasks but often underperform in specialized fields due to limited domain-specific or proprietary corpus. Continual pre-training (CPT) enhances LLM capabilities by imbuing new domain-specific or proprietary knowledge while replaying general corpus to prevent catastrophic forgetting. The data mixture ratio of general corpus and domain-specific corpus, however, has been chosen heuristically, leading to sub-optimal training efficiency in practice. In this context, we attempt to re-visit the scaling behavior of LLMs under the hood of CPT, and discover a power-law relationship between loss, mixture ratio, and training tokens scale. We formalize the trade-off between general and domain-specific capabilities, leading to a well-defined Critical Mixture Ratio (CMR) of general and domain data. By striking the balance, CMR maintains the model's general ability and achieves the desired domain transfer, ensuring the highest utilization of available resources. Considering the balance between efficiency and effectiveness, CMR can be regarded as the optimal mixture ratio. Through extensive experiments, we ascertain the predictability of CMR, propose CMR scaling law and have substantiated its generalization. These findings offer practical guidelines for optimizing LLM training in specialized domains, ensuring both general and domain-specific performance while efficiently managing training resources.

Updated: 2024-10-07 05:16:25

标题: CMR缩放定律：预测语言模型持续预训练的关键混合比率

摘要: 大型语言模型（LLMs）在各种任务中表现出色，但通常在专业领域表现不佳，这是因为其领域特定或专有语料库有限。持续预训练（CPT）通过在重播通用语料库的同时灌输新的领域特定或专有知识，增强了LLM的能力，以防止灾难性遗忘。然而，通用语料库和领域特定语料库的数据混合比例通常是根据经验选择的，导致实践中训练效率不佳。在这种情况下，我们试图重新审视LLMs在CPT背景下的扩展行为，并发现了损失、混合比例和训练令牌规模之间的幂律关系。我们形式化了通用和领域特定能力之间的权衡，导致了通用和领域数据的明确定义的临界混合比（CMR）。通过保持平衡，CMR保持了模型的通用能力，并实现了期望的领域转移，确保了可用资源的最高利用率。考虑到效率和有效性之间的平衡，CMR可以被视为最佳混合比。通过广泛的实验，我们确定了CMR的可预测性，提出了CMR扩展定律，并证明了其泛化性。这些发现为优化专业领域中LLM训练提供了实用指导，确保在有效管理训练资源的同时实现通用和领域特定性能。

更新时间: 2024-10-07 05:16:25

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.17467v2

ImProver: Agent-Based Automated Proof Optimization

Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proofs assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, depending on its downstream use. For example, we may want a proof to adhere to a certain style, or to be readable, concise, or modularly structured. Having suitably optimized proofs is also important for learning tasks, especially since human-written proofs may not optimal for that purpose. To this end, we study a new problem of automated proof optimization: rewriting a proof so that it is correct and optimizes for an arbitrary criterion, such as length or readability. As a first method for automated proof optimization, we present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean. We find that naively applying LLMs to proof optimization falls short, and we incorporate various improvements into ImProver, such as the use of symbolic Lean context in a novel Chain-of-States technique, as well as error-correction and retrieval. We test ImProver on rewriting real-world undergraduate, competition, and research-level mathematics theorems, finding that ImProver is capable of rewriting proofs so that they are substantially shorter, more modular, and more readable.

Updated: 2024-10-07 05:14:18

标题: ImProver: 基于代理的自动证明优化

摘要: 大型语言模型（LLMs）已被用于在诸如Lean之类的证明助手中生成数学定理的形式证明。然而，我们经常希望根据不同的标准优化形式证明，这取决于其下游用途。例如，我们可能希望证明符合某种风格，或者是可读性强、简洁或模块化结构。拥有经过适当优化的证明对于学习任务也很重要，尤其是因为人类撰写的证明可能不适用于该目的。为此，我们研究了一个新的自动证明优化问题：重写证明，使其正确并优化任意标准，比如长度或可读性。作为自动证明优化的第一种方法，我们提出了ImProver，这是一个大型语言模型代理，用于在Lean中重写证明以优化任意用户定义的度量标准。我们发现简单地应用LLMs来进行证明优化是不够的，因此我们将各种改进纳入ImProver中，例如使用新颖的Chain-of-States技术中的符号化Lean上下文，以及错误校正和检索。我们在重写真实世界的本科、竞赛和研究级数学定理时测试了ImProver，发现ImProver能够重写证明，使其更短、更模块化和更易读。

更新时间: 2024-10-07 05:14:18

领域: cs.AI,cs.CL,cs.LG,cs.LO

下载: http://arxiv.org/abs/2410.04753v1

IR-Aware ECO Timing Optimization Using Reinforcement Learning

Engineering change orders (ECOs) in late stages make minimal design fixes to recover from timing shifts due to excessive IR drops. This paper integrates IR-drop-aware timing analysis and ECO timing optimization using reinforcement learning (RL). The method operates after physical design and power grid synthesis, and rectifies IR-drop-induced timing degradation through gate sizing. It incorporates the Lagrangian relaxation (LR) technique into a novel RL framework, which trains a relational graph convolutional network (R-GCN) agent to sequentially size gates to fix timing violations. The R-GCN agent outperforms a classical LR-only algorithm: in an open 45nm technology, it (a) moves the Pareto front of the delay-power tradeoff curve to the left (b) saves runtime over the prior approaches by running fast inference using trained models, and (c) reduces the perturbation to placement by sizing fewer cells. The RL model is transferable across timing specifications and to unseen designs with fine tuning.

Updated: 2024-10-07 05:12:36

标题: 使用强化学习的IR感知ECO时序优化

摘要: 工程变更订单（ECOs）在后期阶段进行最小设计修复，以从由过度IR下降引起的时间偏移中恢复。本文将IR下降感知时间分析和ECO时间优化与强化学习（RL）相结合。该方法在物理设计和电源网格合成之后运行，并通过门尺寸调整纠正IR下降引起的时间降级。它将Lagrange松弛（LR）技术整合到一个新颖的RL框架中，该框架训练关系图卷积网络（R-GCN）代理以顺序调整门尺寸以修复时间违规。R-GCN代理优于经典的仅LR算法：在一个开放的45纳米技术中，它（a）将延迟-功耗权衡曲线的帕累托前沿移至左侧（b）通过使用训练模型进行快速推理节省运行时间，并且（c）通过调整更少的单元减少对放置的干扰。RL模型可在时间规格和未见设计中进行微调后进行传递。

更新时间: 2024-10-07 05:12:36

领域: cs.AR,cs.LG

下载: http://arxiv.org/abs/2402.07781v2

Error Bounds of Supervised Classification from Information-Theoretic Perspective

In this paper, we explore bounds on the expected risk when using deep neural networks for supervised classification from an information theoretic perspective. Firstly, we introduce model risk and fitting error, which are derived from further decomposing the empirical risk. Model risk represents the expected value of the loss under the model's predicted probabilities and is exclusively dependent on the model. Fitting error measures the disparity between the empirical risk and model risk. Then, we derive the upper bound on fitting error, which links the back-propagated gradient and the model's parameter count with the fitting error. Furthermore, we demonstrate that the generalization errors are bounded by the classification uncertainty, which is characterized by both the smoothness of the distribution and the sample size. Based on the bounds on fitting error and generalization, by utilizing the triangle inequality, we establish an upper bound on the expected risk. This bound is applied to provide theoretical explanations for overparameterization, non-convex optimization and flat minima in deep learning. Finally, empirical verification confirms a significant positive correlation between the derived theoretical bounds and the practical expected risk, thereby affirming the practical relevance of the theoretical findings.

Updated: 2024-10-07 05:07:07

标题: 监督分类的信息论视角下的误差界限

摘要: 在本文中，我们从信息论的角度探讨了在使用深度神经网络进行监督分类时对期望风险的界限。首先，我们介绍了模型风险和拟合误差，这是从进一步分解经验风险得出的。模型风险代表了在模型的预测概率下损失的期望值，仅取决于模型本身。拟合误差衡量了经验风险与模型风险之间的差异。然后，我们推导了拟合误差的上界，将反向传播梯度和模型参数数量与拟合误差联系起来。此外，我们证明了泛化误差被分类不确定性界定，这由分布的平滑性和样本量所表征。基于拟合误差和泛化的界限，通过利用三角不等式，我们建立了对期望风险的上界。这个界限被应用于为过度参数化、非凸优化和深度学习中的平坦极小提供理论解释。最后，经验验证证实了推导的理论界限与实际期望风险之间的显著正相关性，从而确认了理论发现的实际相关性。

更新时间: 2024-10-07 05:07:07

领域: cs.LG,cs.IR

下载: http://arxiv.org/abs/2406.04567v3

A Moreau Envelope Approach for LQR Meta-Policy Estimation

We study the problem of policy estimation for the Linear Quadratic Regulator (LQR) in discrete-time linear time-invariant uncertain dynamical systems. We propose a Moreau Envelope-based surrogate LQR cost, built from a finite set of realizations of the uncertain system, to define a meta-policy efficiently adjustable to new realizations. Moreover, we design an algorithm to find an approximate first-order stationary point of the meta-LQR cost function. Numerical results show that the proposed approach outperforms naive averaging of controllers on new realizations of the linear system. We also provide empirical evidence that our method has better sample complexity than Model-Agnostic Meta-Learning (MAML) approaches.

Updated: 2024-10-07 05:04:42

标题: 一种用于LQR元策略估计的莫罗包络法Approach

摘要: 我们研究了离散时间线性不变不确定动态系统中线性二次调节器（LQR）的策略估计问题。我们提出了一个基于Moreau包络的替代LQR成本，该成本由不确定系统的有限实现集构建，以定义一个可以有效调整到新实现的元策略。此外，我们设计了一种算法来寻找元LQR成本函数的近似一阶稳定点。数值结果表明，所提出的方法在新线性系统实现上优于控制器的朴素平均。我们还提供经验证据表明，我们的方法比模型无关元学习（MAML）方法具有更好的样本复杂性。

更新时间: 2024-10-07 05:04:42

领域: math.OC,cs.LG,cs.SY,eess.SY,49M99, 93E35, 93C05,I.2.8

下载: http://arxiv.org/abs/2403.17364v2

Mixture of Linear Models Co-supervised by Deep Neural Networks

Deep neural network (DNN) models have achieved phenomenal success for applications in many domains, ranging from academic research in science and engineering to industry and business. The modeling power of DNN is believed to have come from the complexity and over-parameterization of the model, which on the other hand has been criticized for the lack of interpretation. Although certainly not true for every application, in some applications, especially in economics, social science, healthcare industry, and administrative decision making, scientists or practitioners are resistant to use predictions made by a black-box system for multiple reasons. One reason is that a major purpose of a study can be to make discoveries based upon the prediction function, e.g., to reveal the relationships between measurements. Another reason can be that the training dataset is not large enough to make researchers feel completely sure about a purely data-driven result. Being able to examine and interpret the prediction function will enable researchers to connect the result with existing knowledge or gain insights about new directions to explore. Although classic statistical models are much more explainable, their accuracy often falls considerably below DNN. In this paper, we propose an approach to fill the gap between relatively simple explainable models and DNN such that we can more flexibly tune the trade-off between interpretability and accuracy. Our main idea is a mixture of discriminative models that is trained with the guidance from a DNN. Although mixtures of discriminative models have been studied before, our way of generating the mixture is quite different.

Updated: 2024-10-07 04:57:43

标题: 深度神经网络共同监督的线性模型混合

摘要: 深度神经网络（DNN）模型在许多领域的应用中取得了惊人的成功，从科学工程的学术研究到工业和商业。DNN的建模能力被认为来自于模型的复杂性和过度参数化，但另一方面也因缺乏解释性而受到批评。尽管对于每个应用来说并非都是如此，但在某些应用中，特别是在经济学、社会科学、医疗保健行业和行政决策中，科学家或从业者对使用黑匣子系统进行预测持抵制态度，原因有多种。一个原因是研究的一个主要目的可能是基于预测函数进行发现，例如揭示测量之间的关系。另一个原因可能是训练数据集不足以使研究人员对纯粹基于数据的结果完全确信。能够检查和解释预测函数将使研究人员能够将结果与现有知识联系起来或获得有关探索新方向的见解。尽管经典统计模型更易解释，但它们的准确性通常远低于DNN。在本文中，我们提出了一种方法来填补相对简单可解释模型和DNN之间的差距，以便更灵活地调整解释性和准确性之间的权衡。我们的主要思想是一种受DNN指导训练的判别模型混合。尽管以前研究过判别模型的混合，但我们生成混合的方式是非常不同的。

更新时间: 2024-10-07 04:57:43

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2108.04035v2

PSA: Private Set Alignment for Secure and Collaborative Analytics on Large-Scale Data

Enforcement of privacy regulation is essential for collaborative data analytics. In this work, we address a scenario in which two companies expect to securely join their datasets with respect to their common customers to maximize data insights. Apart from the necessary protection of raw data, it becomes more challenging to protect the identities and attributes of common customers, as it requires participants to align their records associated with common customers without knowing who they are. We proposed a solution, dubbed PSA, for this scenario, which is effectively applicable to real-world use cases, such as evaluating advertising conversion using data from both publishers and merchants. The contributions of this work are threefold: 1. We defined the notion of PSA with two levels of privacy protection and proposed novel PSA protocols based on the modified oblivious switching network, which leverages efficient symmetric key operations and offline precomputation to save online run time. 2. We implemented and benchmarked the proposed protocols in different network conditions by joining two datasets, each at the scale of one million records, in 35.5 sec on a single thread with a network bandwidth of 500 Mbps, resulting in an X100 improvement over the existing Homomorphic based protocols. 3. We give new proof for an algorithm of quasi-linear complexity that constructs an oblivious switching network to achieve a target permutation distinct from the existing one in the literature.

Updated: 2024-10-07 04:39:14

标题: PSA：用于大规模数据安全和协作分析的私有集对齐

摘要: 隐私法规的执行对于协作数据分析至关重要。在这项工作中，我们解决了两家公司期望安全地将其关于共同客户的数据集合并以最大化数据洞察的情景。除了对原始数据的必要保护外，保护共同客户的身份和属性变得更加具有挑战性，因为这需要参与者在不知道共同客户是谁的情况下对其相关的记录进行对齐。我们提出了一个名为PSA的解决方案，适用于现实世界的用例，例如使用来自发布商和商家的数据评估广告转化。这项工作的贡献有三个方面：1.我们定义了带有两个隐私保护级别的PSA概念，并基于修改后的遗忘切换网络提出了新颖的PSA协议，该协议利用高效的对称密钥操作和离线预计算来节省在线运行时间。2.我们在不同网络条件下实现并对提出的协议进行基准测试，通过在单个线程上合并两个数据集（每个数据集包含一百万条记录），在网络带宽为500 Mbps的情况下仅用35.5秒，相对于现有基于同态加密的协议实现了100倍的改进。3.我们为一种准线性复杂度的算法提供了新的证明，该算法构建了一个遗忘切换网络以实现与文献中现有的目标排列不同的目标排列。

更新时间: 2024-10-07 04:39:14

领域: cs.CR

下载: http://arxiv.org/abs/2410.04746v1

Nonparametric Strategy Test

We present a nonparametric statistical test for determining whether an agent is following a given mixed strategy in a repeated strategic-form game given samples of the agent's play. This involves two components: determining whether the agent's frequencies of pure strategies are sufficiently close to the target frequencies, and determining whether the pure strategies selected are independent between different game iterations. Our integrated test involves applying a chi-squared goodness of fit test for the first component and a generalized Wald-Wolfowitz runs test for the second component. The results from both tests are combined using Bonferroni correction to produce a complete test for a given significance level $\alpha.$ We applied the test to publicly available data of human rock-paper-scissors play. The data consists of 50 iterations of play for 500 human players. We test with a null hypothesis that the players are following a uniform random strategy independently at each game iteration. Using a significance level of $\alpha = 0.05$, we conclude that 305 (61%) of the subjects are following the target strategy.

Updated: 2024-10-07 04:36:10

标题: 非参数策略检验

摘要: 我们提出了一种非参数统计检验方法，用于确定代理是否在重复的战略形式博弈中遵循给定的混合策略，给出了代理的游戏样本。这涉及两个组成部分：确定代理的纯策略频率是否足够接近目标频率，以及确定所选择的纯策略在不同游戏迭代之间是否独立。我们的综合测试包括对第一部分应用卡方拟合优度检验和对第二部分应用广义Wald-Wolfowitz序列检验。两个测试的结果使用Bonferroni校正结合在一起，以产生一个完整的测试，对于给定的显著性水平α。我们将测试应用于公开可用的人类猜拳游戏数据。数据包括500名人类玩家的50次游戏迭代。我们用零假设测试，即玩家在每次游戏迭代中独立地遵循均匀随机策略。使用显著性水平α=0.05，我们得出结论，305名（61%）受试者正在遵循目标策略。

更新时间: 2024-10-07 04:36:10

领域: stat.ME,cs.AI,cs.GT,cs.MA,econ.TH

下载: http://arxiv.org/abs/2312.10695v5

Progressive-Hint Prompting Improves Reasoning in Large Language Models

The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP), that enables automatic multiple interactions between users and LLMs by using previously generated answers as hints to progressively guide toward the correct answers. PHP is orthogonal to CoT and self-consistency, making it easy to combine with state-of-the-art techniques to further improve performance. We conducted extensive and comprehensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sample paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performances on SVAMP (89.1% -> 91.9%), GSM8K (92% -> 95.5%), AQuA (76.4% -> 79.9%) and MATH (50.3% -> 53.9%).

Updated: 2024-10-07 04:28:04

标题: 渐进提示促进大型语言模型的推理能力

摘要: 大型语言模型（LLMs）在推理任务中的表现严重依赖于提示设计，链式思维（CoT）和自一致性是增强这种能力的关键方法。然而，这些方法并没有充分利用LLM生成的答案来引导后续响应。本文提出了一种名为渐进提示（PHP）的新提示方法，通过使用先前生成的答案作为提示逐渐引导朝向正确答案，使用户和LLMs之间能够自动进行多次交互。PHP与CoT和自一致性正交，易于与最先进的技术结合以进一步提高性能。我们在七个基准测试上进行了广泛而全面的实验。结果表明，PHP显著提高了准确性，同时保持高效。例如，对于text-davinci-003，我们观察到与贪婪解码相比，与复杂CoT相比在GSM8K上的准确率提高了4.2％，在自一致性上减少了46.17％的样本路径。通过GPT-4和PHP，我们在SVAMP（89.1％-> 91.9％），GSM8K（92％-> 95.5％），AQuA（76.4％-> 79.9％）和MATH（50.3％-> 53.9％）上实现了最先进的性能。

更新时间: 2024-10-07 04:28:04

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2304.09797v6

Smart energy management: process structure-based hybrid neural networks for optimal scheduling and economic predictive control in integrated systems

Integrated energy systems (IESs) are complex systems consisting of diverse operating units spanning multiple domains. To address its operational challenges, we propose a physics-informed hybrid time-series neural network (NN) surrogate to predict the dynamic performance of IESs across multiple time scales. This neural network-based modeling approach develops time-series multi-layer perceptrons (MLPs) for the operating units and integrates them with prior process knowledge about system structure and fundamental dynamics. This integration forms three hybrid NNs (long-term, slow, and fast MLPs) that predict the entire system dynamics across multiple time scales. Leveraging these MLPs, we design an NN-based scheduler and an NN-based economic model predictive control (NEMPC) framework to meet global operational requirements: rapid electrical power responsiveness to operators requests, adequate cooling supply to customers, and increased system profitability, while addressing the dynamic time-scale multiplicity present in IESs. The proposed day-ahead scheduler is formulated using the ReLU network-based MLP, which effectively represents IES performance under a broad range of conditions from a long-term perspective. The scheduler is then exactly recast into a mixed-integer linear programming problem for efficient evaluation. The real-time NEMPC, based on slow and fast MLPs, comprises two sequential distributed control agents: a slow NEMPC for the cooling-dominant subsystem with slower transient responses and a fast NEMPC for the power-dominant subsystem with faster responses. Extensive simulations demonstrate that the developed scheduler and NEMPC schemes outperform their respective benchmark scheduler and controller by about 25% and 40%. Together, they enhance overall system performance by over 70% compared to benchmark approaches.

Updated: 2024-10-07 04:24:39

标题: 智能能源管理：基于过程结构的混合神经网络用于集成系统中的最佳调度和经济预测控制

摘要: 综合能源系统（IESs）是由跨越多个领域的多元运营单元组成的复杂系统。为了解决其运营挑战，我们提出了一种基于物理信息的混合时间序列神经网络（NN）替代方案，用于跨越多个时间尺度预测IESs的动态性能。这种基于神经网络的建模方法开发了用于运营单元的时间序列多层感知器（MLPs），并将它们与关于系统结构和基本动态的先前过程知识相整合。这种整合形成了三个混合NNs（长期、慢速和快速MLPs），可以跨越多个时间尺度预测整个系统的动态。利用这些MLPs，我们设计了一个基于NN的调度程序和一个基于NN的经济模型预测控制（NEMPC）框架，以满足全球运营要求：对运营商请求的快速电力响应，向客户提供充足的冷却供应，增加系统盈利能力，同时解决IESs中存在的动态时间尺度多样性。提出的日前调度程序使用基于ReLU网络的MLP进行制定，从长期的角度有效地表示IESs在各种条件下的性能。然后，将调度程序精确地重构为混合整数线性规划问题以进行高效评估。基于慢速和快速MLPs的实时NEMPC包括两个顺序分布式控制代理：用于冷却为主的子系统的慢速NEMPC，具有较慢的瞬态响应，以及用于功率为主的子系统的快速NEMPC，具有更快的响应。大量模拟表明，开发的调度程序和NEMPC方案比其各自的基准调度程序和控制器效果提高了约25%和40%。两者共同使整体系统性能提高了超过70%，与基准方法相比。

更新时间: 2024-10-07 04:24:39

领域: eess.SY,cs.LG,cs.SY,math.OC

下载: http://arxiv.org/abs/2410.04743v1

sDPO: Don't Use Your Data All at Once

As development of large language models (LLM) progresses, aligning them with human preferences has become increasingly important. We propose stepwise DPO (sDPO), an extension of the recently popularized direct preference optimization (DPO) for alignment tuning. This approach involves dividing the available preference datasets and utilizing them in a stepwise manner, rather than employing it all at once. We demonstrate that this method facilitates the use of more precisely aligned reference models within the DPO training framework. Furthermore, sDPO trains the final model to be more performant, even outperforming other popular LLMs with more parameters.

Updated: 2024-10-07 04:21:15

标题: sDPO：不要一次性使用所有数据

摘要: 随着大型语言模型（LLM）的发展不断推进，将它们与人类偏好对齐变得愈发重要。我们提出了逐步DPO（sDPO），这是对最近流行的直接偏好优化（DPO）进行扩展的方法，用于对齐调整。该方法涉及将可用的偏好数据集分割，并以逐步方式利用它们，而不是一次性全部使用。我们证明了这种方法有助于在DPO训练框架内使用更精确对齐的参考模型。此外，sDPO训练最终模型更具性能，甚至超越了具有更多参数的其他流行LLM。

更新时间: 2024-10-07 04:21:15

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.19270v2

MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL

Recent In-Context Learning based methods have achieved remarkable success in Text-to-SQL task. However, there is still a large gap between the performance of these models and human performance on datasets with complex database schema and difficult questions, such as BIRD. Besides, existing work has neglected to supervise intermediate steps when solving questions iteratively with question decomposition methods, and the schema linking methods used in these works are very rudimentary. To address these issues, we propose MAG-SQL, a multi-agent generative approach with soft schema linking and iterative Sub-SQL refinement. In our framework, an entity-based method with tables' summary is used to select the columns in database, and a novel targets-conditions decomposition method is introduced to decompose those complex questions. Additionally, we build a iterative generating module which includes a Sub-SQL Generator and Sub-SQL Refiner, introducing external oversight for each step of generation. Through a series of ablation studies, the effectiveness of each agent in our framework has been demonstrated. When evaluated on the BIRD benchmark with GPT-4, MAG-SQL achieves an execution accuracy of 61.08%, compared to the baseline accuracy of 46.35% for vanilla GPT-4 and the baseline accuracy of 57.56% for MAC-SQL. Besides, our approach makes similar progress on Spider.

Updated: 2024-10-07 04:17:18

标题: MAG-SQL：软架构链接和迭代子SQL细化的多智能体生成方法，用于文本到SQL

摘要: 最近基于上下文学习的方法在文本到SQL任务中取得了显著的成功。然而，在具有复杂数据库架构和困难问题的数据集（如BIRD）上，这些模型的性能仍然与人类表现存在较大差距。此外，现有工作在使用问题分解方法迭代解决问题时忽略了监督中间步骤，并且这些工作中使用的模式链接方法非常基础。为解决这些问题，我们提出了MAG-SQL，这是一种具有软架构链接和迭代子SQL细化的多代理生成方法。在我们的框架中，使用基于实体的方法和表格摘要来选择数据库中的列，并引入了一种新颖的目标-条件分解方法来分解这些复杂问题。此外，我们构建了一个包括子SQL生成器和子SQL细化器的迭代生成模块，为每个生成步骤引入外部监督。通过一系列消融研究，已证明了我们框架中每个代理的有效性。在BIRD基准测试中，与基线准确率46.35%的普通GPT-4和基线准确率57.56%的MAC-SQL相比，MAG-SQL在GPT-4上实现了61.08%的执行准确率。此外，我们的方法在Spider上取得了类似的进展。

更新时间: 2024-10-07 04:17:18

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2408.07930v3

TableRAG: Million-Token Table Understanding with Language Models

Recent advancements in language models (LMs) have notably enhanced their ability to reason with tabular data, primarily through program-aided mechanisms that manipulate and analyze tables. However, these methods often require the entire table as input, leading to scalability challenges due to the positional bias or context length constraints. In response to these challenges, we introduce TableRAG, a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. This enables more efficient data encoding and precise retrieval, significantly reducing prompt lengths and mitigating information loss. We have developed two new million-token benchmarks from the Arcade and BIRD-SQL datasets to thoroughly evaluate TableRAG's effectiveness at scale. Our results demonstrate that TableRAG's retrieval design achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.

Updated: 2024-10-07 04:15:02

标题: TableRAG：使用语言模型理解百万令牌表

摘要: 最近语言模型（LMs）的进展显著增强了它们处理表格数据的能力，主要通过操纵和分析表格的程序辅助机制。然而，这些方法通常需要整个表格作为输入，由于位置偏差或上下文长度限制，导致可扩展性挑战。为了解决这些挑战，我们引入了TableRAG，一个专门为基于LM的表格理解设计的检索增强生成（RAG）框架。TableRAG利用查询扩展结合模式和单元检索来准确定位关键信息，然后提供给LMs。这使得数据编码更加高效和精确的检索，显著减少提示长度并减轻信息丢失。我们从Arcade和BIRD-SQL数据集中开发了两个新的百万标记基准，以全面评估TableRAG在规模上的有效性。我们的结果表明，TableRAG的检索设计实现了最高的检索质量，导致在大规模表格理解方面实现了最新的最先进性能。

更新时间: 2024-10-07 04:15:02

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2410.04739v1

TLDR: Token-Level Detective Reward Model for Large Vision Language Models

Although reward models have been successful in improving multimodal large language models, the reward models themselves remain brutal and contain minimal information. Notably, existing reward models only mimic human annotations by assigning only one binary feedback to any text, no matter how long the text is. In the realm of multimodal language models, where models are required to process both images and texts, a naive reward model may learn implicit biases toward texts and become less grounded in images. In this paper, we propose a $\textbf{T}$oken-$\textbf{L}$evel $\textbf{D}$etective $\textbf{R}$eward Model ($\textbf{TLDR}$) to provide fine-grained annotations to each text token. We first introduce a perturbation-based method to generate synthetic hard negatives and their token-level labels to train TLDR models. Then we show the rich usefulness of TLDR models both in assisting off-the-shelf models to self-correct their generations, and in serving as a hallucination evaluation tool. Finally, we show that TLDR models can significantly speed up human annotation by 3 times to acquire a broader range of high-quality vision language data.

Updated: 2024-10-07 04:00:22

标题: TLDR：大规模视觉语言模型的令牌级侦探奖励模型

摘要: 尽管奖励模型在改进多模态大型语言模型方面取得了成功，但奖励模型本身仍然残酷且包含最少信息。值得注意的是，现有的奖励模型仅通过为任何文本分配一个二元反馈来模仿人类注释，无论文本有多长。在多模态语言模型领域，模型需要处理图像和文本，一个天真的奖励模型可能会学习对文本的隐含偏见，并且与图像联系较少。在本文中，我们提出了一个Token-Level Detective Reward Model（TLDR），为每个文本标记提供细粒度注释。我们首先介绍了一种基于扰动的方法来生成合成的困难负例及其标记，以训练TLDR模型。然后我们展示了TLDR模型的丰富用途，既可以辅助现成模型自我纠正生成，也可以作为幻觉评估工具。最后，我们展示了TLDR模型可以将人类注释的速度提高3倍，以获取更广泛的高质量视觉语言数据。

更新时间: 2024-10-07 04:00:22

领域: cs.LG,cs.CL,cs.CV

下载: http://arxiv.org/abs/2410.04734v1

Correcting Diffusion Generation through Resampling

Despite diffusion models' superior capabilities in modeling complex distributions, there are still non-trivial distributional discrepancies between generated and ground-truth images, which has resulted in several notable problems in image generation, including missing object errors in text-to-image generation and low image quality. Existing methods that attempt to address these problems mostly do not tend to address the fundamental cause behind these problems, which is the distributional discrepancies, and hence achieve sub-optimal results. In this paper, we propose a particle filtering framework that can effectively address both problems by explicitly reducing the distributional discrepancies. Specifically, our method relies on a set of external guidance, including a small set of real images and a pre-trained object detector, to gauge the distribution gap, and then design the resampling weight accordingly to correct the gap. Experiments show that our methods can effectively correct missing object errors and improve image quality in various image generation tasks. Notably, our method outperforms the existing strongest baseline by 5% in object occurrence and 1.0 in FID on MS-COCO. Our code is publicly available at https://github.com/UCSB-NLP-Chang/diffusion_resampling.git.

Updated: 2024-10-07 03:59:27

标题: 通过重新采样纠正扩散生成

摘要: 虽然扩散模型在建模复杂分布方面具有优越能力，但生成图像与实际图像之间仍存在非常重要的分布差异，这导致图像生成中出现了一些显著问题，包括文本到图像生成中的缺失物体错误和低图像质量。现有的解决方法大多没有解决这些问题背后的根本原因，即分布差异，因此达到了次优结果。在本文中，我们提出了一个粒子滤波框架，可以通过明确减少分布差异来有效解决这两个问题。具体而言，我们的方法依赖于一组外部指导，包括一小组真实图像和一个经过预训练的物体检测器，来衡量分布差距，然后相应地设计重新采样权重来纠正这种差距。实验证明，我们的方法可以有效纠正缺失物体错误并改善各种图像生成任务中的图像质量。值得注意的是，我们的方法在MS-COCO数据集上比现有最强基准线表现出更好的结果，物体出现次数提高了5%，FID值提高了1.0。我们的代码公开在https://github.com/UCSB-NLP-Chang/diffusion_resampling.git。

更新时间: 2024-10-07 03:59:27

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2312.06038v2

A Framework for Guided Motion Planning

Randomized sampling based algorithms are widely used in robot motion planning due to the problem's intractability, and are experimentally effective on a wide range of problem instances. Most variants bias their sampling using various heuristics related to the known underlying structure of the search space. In this work, we formalize the intuitive notion of guided search by defining the concept of a guiding space. This new language encapsulates many seemingly distinct prior methods under the same framework, and allows us to reason about guidance, a previously obscured core contribution of different algorithms. We suggest an information theoretic method to evaluate guidance, which experimentally matches intuition when tested on known algorithms in a variety of environments. The language and evaluation of guidance suggests improvements to existing methods, and allows for simple hybrid algorithms that combine guidance from multiple sources.

Updated: 2024-10-07 03:56:10

标题: 一个引导式运动规划框架

摘要: 基于随机抽样的算法在机器人运动规划中被广泛使用，因为问题本身的复杂性，这些算法在各种问题实例上都具有实验有效性。大多数变体使用各种与已知搜索空间底层结构相关的启发式方法来偏向其抽样。在这项工作中，我们通过定义引导空间的概念，形式化了引导搜索的直观概念。这种新语言将许多看似不同的先前方法封装在同一框架下，并允许我们推理出引导的核心贡献，这是以前不为人所知的。我们提出了一种信息论方法来评估引导，在各种环境中测试已知算法时，实验结果与直觉相匹配。引导的语言和评估表明了对现有方法的改进，并允许简单的混合算法，结合多个来源的引导。

更新时间: 2024-10-07 03:56:10

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2404.03133v2

Machine Learning for Asymptomatic Ratoon Stunting Disease Detection With Freely Available Satellite Based Multispectral Imaging

Disease detection in sugarcane, particularly the identification of asymptomatic infectious diseases such as Ratoon Stunting Disease (RSD), is critical for effective crop management. This study employed various machine learning techniques to detect the presence of RSD in different sugarcane varieties, using vegetation indices derived from freely available satellite-based spectral data. Our results show that the Support Vector Machine with a Radial Basis Function Kernel (SVM-RBF) was the most effective algorithm, achieving classification accuracy between 85.64% and 96.55%, depending on the variety. Gradient Boosting and Random Forest also demonstrated high performance achieving accuracy between 83.33% to 96.55%, while Logistic Regression and Quadratic Discriminant Analysis showed variable results across different varieties. The inclusion of sugarcane variety and vegetation indices was important in the detection of RSD. This agreed with what was identified in the current literature. Our study highlights the potential of satellite-based remote sensing as a cost-effective and efficient method for large-scale sugarcane disease detection alternative to traditional manual laboratory testing methods.

Updated: 2024-10-07 03:53:15

标题: 利用免费卫星多光谱成像技术进行无症状竹娄矮化病的机器学习检测

摘要: 甘蔗疾病的检测，特别是对无症状传染性疾病（如甘蔗茬蔗矮化病）的识别对于有效的作物管理至关重要。本研究采用各种机器学习技术，利用来自免费提供的卫星光谱数据衍生的植被指数来检测不同甘蔗品种中RSD的存在。我们的结果显示，具有径向基函数核的支持向量机（SVM-RBF）是最有效的算法，实现了在不同品种之间的分类准确率在85.64%至96.55%之间。梯度提升和随机森林也表现出很高的性能，准确率在83.33%至96.55%之间，而逻辑回归和二次判别分析在不同品种之间表现出不同的结果。甘蔗品种和植被指数的包含对于RSD的检测至关重要，这与当前文献中所识别的相符。我们的研究突显了基于卫星遥感的远程感测作为大规模甘蔗疾病检测的一种经济高效方法，可以替代传统的人工实验室测试方法。

更新时间: 2024-10-07 03:53:15

领域: cs.LG,cs.CV,eess.IV,I.4; I.2

下载: http://arxiv.org/abs/2410.03141v2

Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models

As Large Language Models (LLMs) continue to exhibit remarkable performance in natural language understanding tasks, there is a crucial need to measure their ability for human-like multi-step logical reasoning. Existing logical reasoning evaluation benchmarks often focus primarily on simplistic single-step or multi-step reasoning with a limited set of inference rules. Furthermore, the lack of datasets for evaluating non-monotonic reasoning represents a crucial gap since it aligns more closely with human-like reasoning. To address these limitations, we propose Multi-LogiEval, a comprehensive evaluation dataset encompassing multi-step logical reasoning with various inference rules and depths. Multi-LogiEval covers three logic types--propositional, first-order, and non-monotonic--consisting of more than 30 inference rules and more than 60 of their combinations with various depths. Leveraging this dataset, we conduct evaluations on a range of LLMs including GPT-4, ChatGPT, Gemini-Pro, Yi, Orca, and Mistral, employing a zero-shot chain-of-thought. Experimental results show that there is a significant drop in the performance of LLMs as the reasoning steps/depth increases (average accuracy of ~68% at depth-1 to ~43% at depth-5). We further conduct a thorough investigation of reasoning chains generated by LLMs which reveals several important findings. We believe that Multi-LogiEval facilitates future research for evaluating and enhancing the logical reasoning ability of LLMs. Data is available at https://github.com/Mihir3009/Multi-LogiEval.

Updated: 2024-10-07 03:48:18

标题: Multi-LogiEval:针对评估大型语言模型多步逻辑推理能力

摘要: 随着大型语言模型（LLMs）在自然语言理解任务中继续展现出卓越的表现，有必要衡量它们在类似人类的多步逻辑推理方面的能力。现有的逻辑推理评估基准通常主要关注具有有限推理规则集的简单单步或多步推理。此外，缺乏用于评估非单调推理的数据集代表了一个关键差距，因为它更接近于类似人类的推理。为了解决这些限制，我们提出了Multi-LogiEval，一个涵盖多步逻辑推理的全面评估数据集，其中包括各种推理规则和深度。Multi-LogiEval涵盖了三种逻辑类型--命题、一阶和非单调--包括30多种推理规则以及它们的60多种组合，具有不同深度。利用这一数据集，我们对一系列LLMs进行评估，包括GPT-4、ChatGPT、Gemini-Pro、Yi、Orca和Mistral，采用了零射链思维。实验结果表明，LLMs的表现在推理步骤/深度增加时显著下降（在深度-1时平均准确率约为68%，在深度-5时约为43%）。我们进一步对LLMs生成的推理链进行了彻底调查，揭示了几个重要发现。我们相信Multi-LogiEval有助于未来评估和提升LLMs的逻辑推理能力。数据可在https://github.com/Mihir3009/Multi-LogiEval上找到。

更新时间: 2024-10-07 03:48:18

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17169v3

Mirror-Consistency: Harnessing Inconsistency in Majority Voting

Self-Consistency, a widely-used decoding strategy, significantly boosts the reasoning capabilities of Large Language Models (LLMs). However, it depends on the plurality voting rule, which focuses on the most frequent answer while overlooking all other minority responses. These inconsistent minority views often illuminate areas of uncertainty within the model's generation process. To address this limitation, we present Mirror-Consistency, an enhancement of the standard Self-Consistency approach. Our method incorporates a 'reflective mirror' into the self-ensemble decoding process and enables LLMs to critically examine inconsistencies among multiple generations. Additionally, just as humans use the mirror to better understand themselves, we propose using Mirror-Consistency to enhance the sample-based confidence calibration methods, which helps to mitigate issues of overconfidence. Our experimental results demonstrate that Mirror-Consistency yields superior performance in both reasoning accuracy and confidence calibration compared to Self-Consistency.

Updated: 2024-10-07 03:41:08

标题: 镜像一致性：利用多数投票中的不一致性

摘要: 自洽性是一种广泛使用的解码策略，显著提升了大型语言模型（LLM）的推理能力。然而，它依赖于多数投票规则，该规则侧重于最常见的答案，而忽略了所有其他少数派回答。这些不一致的少数派观点通常会揭示模型生成过程中的不确定性领域。为了解决这一局限性，我们提出了镜像一致性，这是对标准自洽性方法的增强。我们的方法将一个“反射镜”纳入自集成解码过程中，使LLM能够批判性地检查多个生成之间的不一致性。此外，正如人类使用镜子更好地了解自己一样，我们建议使用镜像一致性来增强基于样本的信心校准方法，有助于缓解过度自信的问题。我们的实验结果表明，与自洽性相比，镜像一致性在推理准确性和信心校准方面表现出更优异的性能。

更新时间: 2024-10-07 03:41:08

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.10857v1

SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting

Sim2Real transfer, particularly for manipulation policies relying on RGB images, remains a critical challenge in robotics due to the significant domain shift between synthetic and real-world visual data. In this paper, we propose SplatSim, a novel framework that leverages Gaussian Splatting as the primary rendering primitive to reduce the Sim2Real gap for RGB-based manipulation policies. By replacing traditional mesh representations with Gaussian Splats in simulators, SplatSim produces highly photorealistic synthetic data while maintaining the scalability and cost-efficiency of simulation. We demonstrate the effectiveness of our framework by training manipulation policies within SplatSim and deploying them in the real world in a zero-shot manner, achieving an average success rate of 86.25%, compared to 97.5% for policies trained on real-world data. Videos can be found on our project page: https://splatsim.github.io

Updated: 2024-10-07 03:37:36

标题: SplatSim：使用高斯分割的RGB操作策略零迁移Sim2Real

摘要: Sim2Real转移，特别是对于依赖RGB图像的操作策略，由于合成和现实世界视觉数据之间的显著领域转移，仍然是机器人技术中的一个关键挑战。在本文中，我们提出了SplatSim，一个新颖的框架，利用高斯点阵作为主要渲染原语，以减少基于RGB的操作策略的Sim2Real差距。通过在模拟器中用高斯点阵替换传统的网格表示，SplatSim产生高度逼真的合成数据，同时保持模拟的可扩展性和成本效益。我们通过在SplatSim中训练操作策略并以零次射击方式部署它们到现实世界中，展示了我们框架的有效性，实现了平均成功率为86.25%，而在真实世界数据上训练的策略成功率为97.5%。视频可以在我们的项目页面找到：https://splatsim.github.io

更新时间: 2024-10-07 03:37:36

领域: cs.RO,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2409.10161v3

Airport Delay Prediction with Temporal Fusion Transformers

Since flight delay hurts passengers, airlines, and airports, its prediction becomes crucial for the decision-making of all stakeholders in the aviation industry and thus has been attempted by various previous research. However, previous delay predictions are often categorical and at a highly aggregated level. To improve that, this study proposes to apply the novel Temporal Fusion Transformer model and predict numerical airport arrival delays at quarter hour level for U.S. top 30 airports. Inputs to our model include airport demand and capacity forecasts, historic airport operation efficiency information, airport wind and visibility conditions, as well as enroute weather and traffic conditions. The results show that our model achieves satisfactory performance measured by small prediction errors on the test set. In addition, the interpretability analysis of the model outputs identifies the important input factors for delay prediction.

Updated: 2024-10-07 03:36:11

标题: 使用时间融合变压器进行机场延误预测

摘要: 由于航班延误对乘客、航空公司和机场都造成了损失，因此对其进行预测对航空业所有利益相关者的决策至关重要，因此此前已有多项研究尝试进行预测。然而，以往的延误预测通常是分类的，并且在高度聚合的水平上进行。为了改进这一点，本研究提出应用新颖的时间融合变压器模型，并预测美国前30个机场的季度小时级别的数字化到达延误。我们模型的输入包括机场需求和容量预测、历史机场运营效率信息、机场风和能见度条件，以及航路天气和交通条件。结果显示，我们的模型在测试集上表现出了令人满意的性能，通过小的预测误差进行度量。此外，对模型输出的可解释性分析确定了延误预测的重要输入因素。

更新时间: 2024-10-07 03:36:11

领域: cs.LG

下载: http://arxiv.org/abs/2405.08293v3

ProtoNAM: Prototypical Neural Additive Models for Interpretable Deep Tabular Learning

Generalized additive models (GAMs) have long been a powerful white-box tool for the intelligible analysis of tabular data, revealing the influence of each feature on the model predictions. Despite the success of neural networks (NNs) in various domains, their application as NN-based GAMs in tabular data analysis remains suboptimal compared to tree-based ones, and the opacity of encoders in NN-GAMs also prevents users from understanding how networks learn the functions. In this work, we propose a new deep tabular learning method, termed Prototypical Neural Additive Model (ProtoNAM), which introduces prototypes into neural networks in the framework of GAMs. With the introduced prototype-based feature activation, ProtoNAM can flexibly model the irregular mapping from tabular features to the outputs while maintaining the explainability of the final prediction. We also propose a gradient-boosting inspired hierarchical shape function modeling method, facilitating the discovery of complex feature patterns and bringing transparency into the learning process of each network layer. Our empirical evaluations demonstrate that ProtoNAM outperforms all existing NN-based GAMs, while providing additional insights into the shape function learned for each feature. The source code of ProtoNAM is available at \url{https://github.com/Teddy-XiongGZ/ProtoNAM}.

Updated: 2024-10-07 03:25:46

标题: ProtoNAM: 可解释深度表格学习的原型神经添加模型

摘要: 广义可加模型（GAMs）长期以来一直是一种强大的白盒工具，用于可理解地分析表格数据，揭示每个特征对模型预测的影响。尽管神经网络（NNs）在各个领域取得了成功，但它们作为基于神经网络的GAMs在表格数据分析中的应用仍然不如基于树的模型，而NN-GAMs中编码器的不透明性也阻碍了用户理解网络学习函数的过程。在这项工作中，我们提出了一种新的深度表格学习方法，称为原型神经可加模型（ProtoNAM），它在GAMs框架内将原型引入神经网络中。通过引入基于原型的特征激活，ProtoNAM可以灵活地建模从表格特征到输出的不规则映射，同时保持最终预测的可解释性。我们还提出了一个受梯度提升启发的分层形状函数建模方法，有助于发现复杂特征模式，并为每个网络层的学习过程带来透明度。我们的实证评估表明，ProtoNAM优于所有现有的基于神经网络的GAMs，同时提供了有关每个特征学习到的形状函数的额外见解。ProtoNAM的源代码可在\url{https://github.com/Teddy-XiongGZ/ProtoNAM}上找到。

更新时间: 2024-10-07 03:25:46

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2410.04723v1

HYDRA-FL: Hybrid Knowledge Distillation for Robust and Accurate Federated Learning

Data heterogeneity among Federated Learning (FL) users poses a significant challenge, resulting in reduced global model performance. The community has designed various techniques to tackle this issue, among which Knowledge Distillation (KD)-based techniques are common. While these techniques effectively improve performance under high heterogeneity, they inadvertently cause higher accuracy degradation under model poisoning attacks (known as attack amplification). This paper presents a case study to reveal this critical vulnerability in KD-based FL systems. We show why KD causes this issue through empirical evidence and use it as motivation to design a hybrid distillation technique. We introduce a novel algorithm, Hybrid Knowledge Distillation for Robust and Accurate FL (HYDRA-FL), which reduces the impact of attacks in attack scenarios by offloading some of the KD loss to a shallow layer via an auxiliary classifier. We model HYDRA-FL as a generic framework and adapt it to two KD-based FL algorithms, FedNTD and MOON. Using these two as case studies, we demonstrate that our technique outperforms baselines in attack settings while maintaining comparable performance in benign settings.

Updated: 2024-10-07 03:24:47

标题: HYDRA-FL: 混合知识蒸馏用于强大和准确的联邦学习

摘要: Federated Learning（FL）用户之间的数据异质性构成了一个重要挑战，导致全局模型性能降低。社区设计了各种技术来解决这个问题，其中基于知识蒸馏（KD）的技术很常见。虽然这些技术有效地提高了在高异质性下的性能，但它们无意中导致了在模型投毒攻击下更高的准确度降低（称为攻击放大）。本文通过一个案例研究揭示了基于KD的FL系统中这一关键漏洞。我们通过经验证据展示了为什么KD会导致这个问题，并将其作为设计混合蒸馏技术的动机。我们介绍了一种新算法，用于强大和准确的FL的混合知识蒸馏（HYDRA-FL），通过辅助分类器将一部分KD损失转移到浅层，从而减少攻击场景中的影响。我们将HYDRA-FL建模为一个通用框架，并将其适用于两种基于KD的FL算法，FedNTD和MOON。通过这两种案例研究，我们证明了我们的技术在攻击设置中优于基线，同时在良性设置中保持可比较的性能。

更新时间: 2024-10-07 03:24:47

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2409.19912v3

A Strategy for Label Alignment in Deep Neural Networks

One recent research demonstrated successful application of the label alignment property for unsupervised domain adaptation in a linear regression settings. Instead of regularizing representation learning to be domain invariant, the research proposed to regularize the linear regression model to align with the top singular vectors of the data matrix from the target domain. In this work we expand upon this idea and generalize it to the case of deep learning, where we derive an alternative formulation of the original adaptation algorithm exploiting label alignment suitable for deep neural network. We also perform experiments to demonstrate that our approach achieves comparable performance to mainstream unsupervised domain adaptation methods while having stabler convergence. All experiments and implementations in our work can be found at the following codebase: \url{https://github.com/xuanrui-work/DeepLabelAlignment}.

Updated: 2024-10-07 03:23:23

标题: 深度神经网络中标签对齐的策略

摘要: 最近的一项研究展示了在线性回归设置中成功应用标签对齐属性进行无监督领域自适应的情况。该研究提出，与将表示学习规范化为域不变形不同，应当将线性回归模型规范化为与目标域数据矩阵的前几个奇异向量对齐。在这项工作中，我们扩展了这一想法，并将其推广到深度学习的情况，我们推导了一个原始适应算法的另一种公式，利用标签对齐适用于深度神经网络。我们还进行了实验，证明我们的方法在保持更稳定收敛的同时，实现了与主流无监督领域自适应方法可比较的性能。我们工作中的所有实验和实现都可以在以下代码库中找到：https://github.com/xuanrui-work/DeepLabelAlignment。

更新时间: 2024-10-07 03:23:23

领域: cs.LG

下载: http://arxiv.org/abs/2410.04722v1

ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction

Autoregressive models (ARMs) and diffusion models (DMs) represent two leading paradigms in generative modeling, each excelling in distinct areas: ARMs in global context modeling and long-sequence generation, and DMs in generating high-quality local contexts, especially for continuous data such as images and short videos. However, ARMs often suffer from exponential error accumulation over long sequences, leading to physically implausible results, while DMs are limited by their local context generation capabilities. In this work, we introduce Autoregressive Coherent multimodal generation with Diffusion Correction (ACDC), a zero-shot approach that combines the strengths of both ARMs and DMs at the inference stage without the need for additional fine-tuning. ACDC leverages ARMs for global context generation and memory-conditioned DMs for local correction, ensuring high-quality outputs by correcting artifacts in generated multimodal tokens. In particular, we propose a memory module based on large language models (LLMs) that dynamically adjusts the conditioning texts for the DMs, preserving crucial global context information. Our experiments on multimodal tasks, including coherent multi-frame story generation and autoregressive video generation, demonstrate that ACDC effectively mitigates the accumulation of errors and significantly enhances the quality of generated outputs, achieving superior performance while remaining agnostic to specific ARM and DM architectures. Project page: https://acdc2025.github.io/

Updated: 2024-10-07 03:22:51

标题: ACDC：利用扩散校正的自回归一致多模态生成

摘要: 自回归模型（ARMs）和扩散模型（DMs）代表了生成建模中的两个主要范式，各自在不同领域表现出色：ARMs在全局上下文建模和长序列生成方面表现优异，而DMs在生成高质量的局部上下文方面表现出色，尤其适用于连续数据（如图像和短视频）。然而，ARMs往往在长序列中出现指数级误差积累，导致物理上不合理的结果，而DMs受限于其局部上下文生成能力。在这项工作中，我们介绍了Autoregressive Coherent multimodal generation with Diffusion Correction（ACDC），这是一种零样本方法，结合了ARMs和DMs在推理阶段的优势，无需额外的微调。ACDC利用ARMs进行全局上下文生成和记忆条件DMs进行局部修正，通过纠正生成的多模式令牌中的伪影来确保高质量的输出。特别是，我们提出了一个基于大型语言模型（LLMs）的记忆模块，动态调整DMs的条件文本，保留关键的全局上下文信息。我们在多模式任务上的实验，包括连贯的多帧故事生成和自回归视频生成，表明ACDC有效地减轻了误差的累积，并显著提高了生成输出的质量，实现了卓越的性能，同时保持对特定ARM和DM架构的不可知性。项目页面：https://acdc2025.github.io/

更新时间: 2024-10-07 03:22:51

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2410.04721v1

MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models

Recent advancements in large language models (LLMs) focus on aligning to heterogeneous human expectations and values via multi-objective preference alignment. However, existing methods are dependent on the policy model parameters, which require high-cost repetition of their alignment algorithms for each new policy model, and they cannot expand to unseen objectives due to their static alignment objectives. In this work, we propose Meta-Objective Aligner (MetaAligner), the first policy-agnostic and generalizable method for multi-objective preference alignment. MetaAligner models multi-objective alignment into three stages: (1) dynamic objectives reformulation algorithm reorganizes traditional alignment datasets to supervise the model on performing flexible alignment across different objectives; (2) conditional weak-to-strong correction paradigm aligns the weak outputs of fixed policy models to approach strong outputs with higher preferences in the corresponding alignment objectives, enabling plug-and-play inferences on any policy models, which significantly reduces training costs and facilitates alignment on close-source policy models; (3) generalizable inference method flexibly adjusts target objectives by updating their text descriptions in the prompts, facilitating generalizable alignment to unseen objectives. Experimental results show that MetaAligner achieves significant and balanced improvements in multi-objective alignments on 10 state-of-the-art policy models, and saves up to 93.63% of GPU training hours compared to previous alignment methods. The model also effectively aligns unseen objectives, marking the first step towards generalizable multi-objective preference alignment.

Updated: 2024-10-07 03:19:16

标题: MetaAligner：通往语言模型的可推广多目标对齐

摘要: 最近，大型语言模型（LLMs）方面的最新进展集中在通过多目标偏好对齐来与异质人类期望和价值保持一致。然而，现有方法依赖于策略模型参数，需要高成本重复它们的对齐算法以适应每个新策略模型，而且由于它们的静态对齐目标，不能扩展到未知目标。在这项工作中，我们提出了Meta-Objective Aligner（MetaAligner），这是第一个与策略无关且可泛化的多目标偏好对齐方法。MetaAligner将多目标对齐建模为三个阶段：（1）动态目标重构算法重新组织传统的对齐数据集，以监督模型在不同目标上执行灵活的对齐；（2）条件弱到强校正范式将固定策略模型的弱输出对齐到具有更高偏好的相应对齐目标的强输出，从而使任何策略模型都能够插入和运行推断，从而显著降低训练成本并促进对近源策略模型的对齐；（3）可泛化推断方法通过更新提示中的文本描述来灵活调整目标目标，促进对未知目标的泛化对齐。实验结果表明，MetaAligner 在 10 个最先进的策略模型上实现了显著和平衡的多目标对齐改进，并与先前的对齐方法相比，可以节省高达 93.63% 的 GPU 训练时间。该模型还有效地对齐了未知目标，标志着通往可泛化的多目标偏好对齐的第一步。

更新时间: 2024-10-07 03:19:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.17141v3

Influence-based Attributions can be Manipulated

Influence Functions are a standard tool for attributing predictions to training data in a principled manner and are widely used in applications such as data valuation and fairness. In this work, we present realistic incentives to manipulate influence-based attributions and investigate whether these attributions can be \textit{systematically} tampered by an adversary. We show that this is indeed possible for logistic regression models trained on ResNet feature embeddings and standard tabular fairness datasets and provide efficient attacks with backward-friendly implementations. Our work raises questions on the reliability of influence-based attributions in adversarial circumstances. Code is available at : \url{https://github.com/infinite-pursuits/influence-based-attributions-can-be-manipulated}

Updated: 2024-10-07 03:13:37

标题: 基于影响力的归因可以被操控

摘要: 影响函数是一种将预测归因于训练数据的标准工具，以一种原则性的方式广泛应用于数据评估和公平性等应用领域。在这项工作中，我们提出了操纵基于影响的归因的现实动机，并调查对手是否可以\textit{系统地}篡改这些归因。我们展示了对于在ResNet特征嵌入和标准表格公平性数据集上训练的逻辑回归模型，确实可以实现这一点，并提供了具有反向友好实现的有效攻击。我们的工作引发了在对抗情况下影响基础归因可靠性的问题。代码可在以下链接找到：https://github.com/infinite-pursuits/influence-based-attributions-can-be-manipulated

更新时间: 2024-10-07 03:13:37

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.05208v4

Rule-based Data Selection for Large Language Models

The quality of training data significantly impacts the performance of large language models (LLMs). There are increasing studies using LLMs to rate and select data based on several human-crafted metrics (rules). However, these conventional rule-based approaches often depend too heavily on human heuristics, lack effective metrics for assessing rules, and exhibit limited adaptability to new tasks. In our study, we introduce an innovative rule-based framework that utilizes the orthogonality of score vectors associated with rules as a novel metric for rule evaluations. Our approach includes an automated pipeline that first uses LLMs to generate a diverse set of rules, encompassing various rating dimensions to evaluate data quality. Then it rates a batch of data based on these rules and uses the determinantal point process (DPP) from random matrix theory to select the most orthogonal score vectors, thereby identifying a set of independent rules. These rules are subsequently used to evaluate all data, selecting samples with the highest average scores for downstream tasks such as LLM training. We verify the effectiveness of our method through two experimental setups: 1) comparisons with ground truth ratings and 2) benchmarking LLMs trained with the chosen data. Our comprehensive experiments cover a range of scenarios, including general pre-training and domain-specific fine-tuning in areas such as IMDB, Medical, Math, and Code. The outcomes demonstrate that our DPP-based rule rating method consistently outperforms other approaches, including rule-free rating, uniform sampling, importance resampling, and QuRating, in terms of both rating precision and model performance.

Updated: 2024-10-07 03:13:06

标题: 基于规则的数据选择用于大型语言模型

摘要: 训练数据的质量显著影响大型语言模型（LLMs）的性能。越来越多的研究使用LLMs根据几个人工制定的度量标准（规则）对数据进行评分和选择。然而，这些传统的基于规则的方法往往过分依赖人类启发式方法，缺乏有效的评估规则的度量标准，并且对新任务的适应能力有限。在我们的研究中，我们引入了一种创新的基于规则的框架，利用与规则相关的分数向量的正交性作为规则评估的新度量标准。我们的方法包括一个自动化流程，首先使用LLMs生成一系列多样化的规则，涵盖各种评分维度以评估数据质量。然后根据这些规则对一批数据进行评分，并使用来自随机矩阵理论的行列式点过程（DPP）选择最正交的分数向量，从而识别一组独立的规则。这些规则随后用于评估所有数据，选择具有最高平均分数的样本，用于下游任务，如LLM训练。我们通过两个实验设置验证了我们的方法的有效性：1）与地面真相评分的比较和2）使用选定数据进行训练的LLMs的基准测试。我们的综合实验涵盖了一系列场景，包括在IMDB、医学、数学和代码等领域的通用预训练和领域特定微调。结果表明，我们基于DPP的规则评分方法在评分精度和模型性能方面始终优于其他方法，包括无规则评分、均匀采样、重要性重采样和QuRating。

更新时间: 2024-10-07 03:13:06

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.04715v1

Tight Stability, Convergence, and Robustness Bounds for Predictive Coding Networks

Energy-based learning algorithms, such as predictive coding (PC), have garnered significant attention in the machine learning community due to their theoretical properties, such as local operations and biologically plausible mechanisms for error correction. In this work, we rigorously analyze the stability, robustness, and convergence of PC through the lens of dynamical systems theory. We show that, first, PC is Lyapunov stable under mild assumptions on its loss and residual energy functions, which implies intrinsic robustness to small random perturbations due to its well-defined energy-minimizing dynamics. Second, we formally establish that the PC updates approximate quasi-Newton methods by incorporating higher-order curvature information, which makes them more stable and able to converge with fewer iterations compared to models trained via backpropagation (BP). Furthermore, using this dynamical framework, we provide new theoretical bounds on the similarity between PC and other algorithms, i.e., BP and target propagation (TP), by precisely characterizing the role of higher-order derivatives. These bounds, derived through detailed analysis of the Hessian structures, show that PC is significantly closer to quasi-Newton updates than TP, providing a deeper understanding of the stability and efficiency of PC compared to conventional learning methods.

Updated: 2024-10-07 02:57:26

标题: 预测编码网络的紧稳定性、收敛性和鲁棒性界限

摘要: 基于能量的学习算法，如预测编码（PC），由于其理论特性，如局部操作和生物合理的错误校正机制，已经在机器学习社区中引起了重大关注。在这项工作中，我们通过动力系统理论的视角对PC的稳定性、鲁棒性和收敛性进行了严格分析。我们首先展示，在对其损失和残余能量函数进行温和假设的情况下，PC是李雅普诺夫稳定的，这意味着由于其定义良好的能量最小化动态，对小随机扰动具有内在的鲁棒性。其次，我们正式建立了PC更新通过整合更高阶曲率信息来近似拟牛顿方法，这使它们比通过反向传播（BP）训练的模型更稳定且能够在更少的迭代次数内收敛。此外，利用这个动态框架，我们通过精确描述更高阶导数的作用，提供了PC与其他算法（如BP和目标传播（TP））之间相似性的新理论界限。通过对Hessian结构的详细分析推导出的这些界限显示，PC与拟牛顿更新相比于TP更接近，从而更深入地理解PC与传统学习方法相比的稳定性和效率。

更新时间: 2024-10-07 02:57:26

领域: cs.LG,cs.AI,cs.NE,math.OC,stat.ML

下载: http://arxiv.org/abs/2410.04708v1

Masked Autoencoder with Swin Transformer Network for Mitigating Electrode Shift in HD-EMG-based Gesture Recognition

Multi-channel surface Electromyography (sEMG), also referred to as high-density sEMG (HD-sEMG), plays a crucial role in improving gesture recognition performance for myoelectric control. Pattern recognition models developed based on HD-sEMG, however, are vulnerable to changing recording conditions (e.g., signal variability due to electrode shift). This has resulted in significant degradation in performance across subjects, and sessions. In this context, the paper proposes the Masked Autoencoder with Swin Transformer (MAST) framework, where training is performed on a masked subset of HDsEMG channels. A combination of four masking strategies, i.e., random block masking; temporal masking; sensor-wise random masking, and; multi-scale masking, is used to learn latent representations and increase robustness against electrode shift. The masked data is then passed through MAST's three-path encoder-decoder structure, leveraging a multi-path Swin-Unet architecture that simultaneously captures time-domain, frequency-domain, and magnitude-based features of the underlying HD-sEMG signal. These augmented inputs are then used in a self-supervised pre-training fashion to improve the model's generalization capabilities. Experimental results demonstrate the superior performance of the proposed MAST framework in comparison to its counterparts.

Updated: 2024-10-07 02:55:36

标题: 使用Swin Transformer网络的掩码自编码器用于减轻基于HD-EMG的手势识别中的电极移位

摘要: 多通道表面肌电图（sEMG），也称为高密度sEMG（HD-sEMG），在改善肌电控制手势识别性能中起着至关重要的作用。然而，基于HD-sEMG开发的模式识别模型容易受到记录条件变化（例如由于电极移位而引起的信号变异性）的影响。这导致了跨受试者和会话性能显著下降。在这种背景下，本文提出了掩码自动编码器和斯文变换器（MAST）框架，其中训练是在HD-sEMG通道的掩码子集上进行的。使用四种掩码策略的组合，即随机块掩码；时间掩码；传感器随机掩码和；多尺度掩码，以学习潜在表示并增加对电极移位的鲁棒性。然后，将掩码数据通过MAST的三路径编码器-解码器结构传递，利用多路径Swin-Unet架构，同时捕获底层HD-sEMG信号的时间域、频率域和幅度特征。然后，这些增强输入以自监督预训练的方式用于改善模型的泛化能力。实验结果表明，所提出的MAST框架在性能上优于其对手。

更新时间: 2024-10-07 02:55:36

领域: eess.SP,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.17261v1

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

Computationally intensive decoding procedures--including search, reranking, and self-critique--can improve the quality of language model (LM) outputs in problems spanning code generation, numerical reasoning, and dialog. Existing work typically applies the same decoding procedure for every input to an LM. But not all inputs require the same amount of computation to process. Can we allocate decoding computation adaptively, using more resources to answer questions whose answers will be harder to compute? We present an approach that predicts the distribution of rewards given an input and computation budget, then allocates additional computation to inputs for which it is predicted to be most useful. We apply this approach in two decoding procedures: first, an adaptive best-of-k procedure that dynamically selects the number of samples to generate as input to a reranker; second, a routing procedure that dynamically responds to a query using a decoding procedure that is expensive but accurate, or one that is cheaper but less capable. Across a suite of programming, mathematics, and dialog tasks, we show that accurate computation-allocation procedures can be learned, and reduce computation by up to 50% at no cost to response quality, or improve quality by up to 10% at a fixed computational budget.

Updated: 2024-10-07 02:52:30

标题: 学习如何努力思考：LM计算的输入自适应分配

摘要: 计算密集型的解码程序-包括搜索、重新排名和自我批评-可以提高语言模型（LM）在涵盖代码生成、数值推理和对话等问题上的输出质量。现有研究通常将相同的解码程序应用于LM的每个输入。但并非所有输入都需要相同量的计算来处理。我们是否可以自适应地分配解码计算资源，为那些难以计算答案的问题使用更多资源？我们提出了一种方法，该方法根据输入和计算预算来预测奖励的分布，然后将额外的计算资源分配给那些被预测为最有用的输入。我们将这种方法应用到两种解码程序中：首先是一种自适应的最佳-k程序，动态选择要生成的样本数量作为重新排名器的输入；其次是一种路由程序，根据一个昂贵但准确的解码程序或一个便宜但能力较弱的解码程序动态响应查询。在一系列编程、数学和对话任务中，我们展示了准确的计算分配程序可以被学习，并且在不影响响应质量的情况下，可以减少高达50%的计算量，或在固定的计算预算下将质量提高高达10%。

更新时间: 2024-10-07 02:52:30

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.04707v1

NeuroBOLT: Resting-state EEG-to-fMRI Synthesis with Multi-dimensional Feature Mapping

Functional magnetic resonance imaging (fMRI) is an indispensable tool in modern neuroscience, providing a non-invasive window into whole-brain dynamics at millimeter-scale spatial resolution. However, fMRI is constrained by issues such as high operation costs and immobility. With the rapid advancements in cross-modality synthesis and brain decoding, the use of deep neural networks has emerged as a promising solution for inferring whole-brain, high-resolution fMRI features directly from electroencephalography (EEG), a more widely accessible and portable neuroimaging modality. Nonetheless, the complex projection from neural activity to fMRI hemodynamic responses and the spatial ambiguity of EEG pose substantial challenges both in modeling and interpretability. Relatively few studies to date have developed approaches for EEG-fMRI translation, and although they have made significant strides, the inference of fMRI signals in a given study has been limited to a small set of brain areas and to a single condition (i.e., either resting-state or a specific task). The capability to predict fMRI signals in other brain areas, as well as to generalize across conditions, remain critical gaps in the field. To tackle these challenges, we introduce a novel and generalizable framework: NeuroBOLT, i.e., Neuro-to-BOLD Transformer, which leverages multi-dimensional representation learning from temporal, spatial, and spectral domains to translate raw EEG data to the corresponding fMRI activity signals across the brain. Our experiments demonstrate that NeuroBOLT effectively reconstructs resting-state fMRI signals from primary sensory, high-level cognitive areas, and deep subcortical brain regions, achieving state-of-the-art accuracy and significantly advancing the integration of these two modalities.

Updated: 2024-10-07 02:47:55

标题: NeuroBOLT：多维特征映射的静息态EEG到fMRI合成

摘要: 功能性磁共振成像（fMRI）是现代神经科学中不可或缺的工具，提供了一个非侵入性的窗口，可以以毫米级空间分辨率观察整个大脑的动态。然而，fMRI受到高昂的运营成本和不便移动性等问题的限制。随着跨模态综合和脑解码技术的快速发展，利用深度神经网络直接从脑电图（EEG）推断整个大脑高分辨率fMRI特征已成为一种有前途的解决方案，EEG是一种更广泛可及和便携的神经成像模态。然而，从神经活动到fMRI血液动力学反应的复杂投影以及EEG的空间模糊性在建模和可解释性方面都存在重大挑战。迄今为止相对较少的研究开发了EEG-fMRI转换方法，尽管它们取得了重大进展，但在给定研究中推断fMRI信号仅限于一小部分脑区域和单一条件（即休息状态或特定任务）。在其他脑区域预测fMRI信号以及在条件之间进行泛化的能力仍然是该领域的重要差距。为了解决这些挑战，我们介绍了一个新颖且通用的框架：NeuroBOLT，即神经到BOLD变换器，它利用从时间、空间和频谱领域的多维表示学习将原始EEG数据转换为整个大脑的相应fMRI活动信号。我们的实验表明，NeuroBOLT有效地重建了来自主要感觉、高级认知区域和深度皮层下脑区的休息状态fMRI信号，实现了最先进的准确性，并显著推进了这两种模态的整合。

更新时间: 2024-10-07 02:47:55

领域: eess.IV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.05341v1

Evalverse: Unified and Accessible Library for Large Language Model Evaluation

This paper introduces Evalverse, a novel library that streamlines the evaluation of Large Language Models (LLMs) by unifying disparate evaluation tools into a single, user-friendly framework. Evalverse enables individuals with limited knowledge of artificial intelligence to easily request LLM evaluations and receive detailed reports, facilitated by an integration with communication platforms like Slack. Thus, Evalverse serves as a powerful tool for the comprehensive assessment of LLMs, offering both researchers and practitioners a centralized and easily accessible evaluation framework. Finally, we also provide a demo video for Evalverse, showcasing its capabilities and implementation in a two-minute format.

Updated: 2024-10-07 02:47:36

标题: Evalverse：用于大型语言模型评估的统一且易于访问的库

摘要: 这篇论文介绍了Evalverse，这是一个新颖的库，通过将不同的评估工具统一到一个用户友好的框架中，简化了对大型语言模型（LLMs）的评估。Evalverse使得对人工智能知识有限的个人可以轻松请求LLM评估并接收详细报告，同时通过与Slack等通信平台的集成来实现。因此，Evalverse是一个强大的工具，用于对LLMs进行全面评估，为研究人员和从业者提供了一个集中且易于访问的评估框架。最后，我们还提供了Evalverse的演示视频，展示其在两分钟的格式中的功能和实施。

更新时间: 2024-10-07 02:47:36

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.00943v2

Transferable Watermarking to Self-supervised Pre-trained Graph Encoders by Trigger Embeddings

Recent years have witnessed the prosperous development of Graph Self-supervised Learning (GSSL), which enables to pre-train transferable foundation graph encoders. However, the easy-to-plug-in nature of such encoders makes them vulnerable to copyright infringement. To address this issue, we develop a novel watermarking framework to protect graph encoders in GSSL settings. The key idea is to force the encoder to map a set of specially crafted trigger instances into a unique compact cluster in the outputted embedding space during model pre-training. Consequently, when the encoder is stolen and concatenated with any downstream classifiers, the resulting model inherits the `backdoor' of the encoder and predicts the trigger instances to be in a single category with high probability regardless of the ground truth. Experimental results have shown that, the embedded watermark can be transferred to various downstream tasks in black-box settings, including node classification, link prediction and community detection, which forms a reliable watermark verification system for GSSL in reality. This approach also shows satisfactory performance in terms of model fidelity, reliability and robustness.

Updated: 2024-10-07 02:47:19

标题: 可转移水印技术应用于自监督预训练图编码器的触发嵌入

摘要: 近年来，图形自监督学习（GSSL）取得了繁荣发展，使得可以预先训练可转移的基础图编码器。然而，这种编码器易于插入的特性使其容易受到版权侵犯。为解决这一问题，我们开发了一种新颖的水印框架来保护GSSL设置中的图编码器。关键思想是在模型预训练期间强制编码器将一组特别设计的触发实例映射到输出的嵌入空间中的唯一紧凑簇中。因此，当编码器被盗用并与任何下游分类器连接时，生成的模型会继承编码器的“后门”，并以高概率预测触发实例属于单一类别，而不考虑真实情况。实验结果表明，嵌入的水印可以在黑盒设置中传输到各种下游任务中，包括节点分类、链接预测和社区检测，从而形成了一个可靠的GSSL水印验证系统。这种方法在模型忠实度、可靠性和稳健性方面也表现出令人满意的性能。

更新时间: 2024-10-07 02:47:19

领域: cs.CR

下载: http://arxiv.org/abs/2406.13177v2

Generating CAD Code with Vision-Language Models for 3D Designs

Generative AI has transformed the fields of Design and Manufacturing by providing efficient and automated methods for generating and modifying 3D objects. One approach involves using Large Language Models (LLMs) to generate Computer- Aided Design (CAD) scripting code, which can then be executed to render a 3D object; however, the resulting 3D object may not meet the specified requirements. Testing the correctness of CAD generated code is challenging due to the complexity and structure of 3D objects (e.g., shapes, surfaces, and dimensions) that are not feasible in code. In this paper, we introduce CADCodeVerify, a novel approach to iteratively verify and improve 3D objects generated from CAD code. Our approach works by producing ameliorative feedback by prompting a Vision-Language Model (VLM) to generate and answer a set of validation questions to verify the generated object and prompt the VLM to correct deviations. To evaluate CADCodeVerify, we introduce, CADPrompt, the first benchmark for CAD code generation, consisting of 200 natural language prompts paired with expert-annotated scripting code for 3D objects to benchmark progress. Our findings show that CADCodeVerify improves VLM performance by providing visual feedback, enhancing the structure of the 3D objects, and increasing the success rate of the compiled program. When applied to GPT-4, CADCodeVerify achieved a 7.30% reduction in Point Cloud distance and a 5.0% improvement in success rate compared to prior work

Updated: 2024-10-07 02:44:50

标题: 利用视觉-语言模型生成3D设计的CAD代码

摘要: 生成式人工智能已经通过提供高效和自动化的方法生成和修改3D物体，改变了设计和制造领域。其中一种方法涉及使用大型语言模型（LLMs）生成计算机辅助设计（CAD）脚本代码，然后执行以呈现3D物体；然而，生成的3D物体可能不符合指定要求。由于3D物体的复杂性和结构（例如形状、表面和尺寸）在代码中不可行，因此测试CAD生成的代码的正确性是具有挑战性的。在本文中，我们介绍了CADCodeVerify，这是一种新颖的方法，用于迭代验证和改进从CAD代码生成的3D物体。我们的方法通过促使视觉语言模型（VLM）生成并回答一组验证问题来产生改进性反馈，以验证生成的物体并促使VLM纠正偏差。为了评估CADCodeVerify，我们引入了CADPrompt，作为CAD代码生成的第一个基准，其中包含200个自然语言提示，配对专家注释的3D物体脚本代码，以便进行基准测试。我们的研究结果显示，CADCodeVerify通过提供视觉反馈、增强3D物体的结构以及提高编译程序的成功率，改善了VLM的性能。当应用于GPT-4时，与先前工作相比，CADCodeVerify在点云距离上实现了7.30%的减少，并提高了5.0%的成功率。

更新时间: 2024-10-07 02:44:50

领域: cs.LG

下载: http://arxiv.org/abs/2410.05340v1

Neural Fourier Modelling: A Highly Compact Approach to Time-Series Analysis

Neural time-series analysis has traditionally focused on modeling data in the time domain, often with some approaches incorporating equivalent Fourier domain representations as auxiliary spectral features. In this work, we shift the main focus to frequency representations, modeling time-series data fully and directly in the Fourier domain. We introduce Neural Fourier Modelling (NFM), a compact yet powerful solution for time-series analysis. NFM is grounded in two key properties of the Fourier transform (FT): (i) the ability to model finite-length time series as functions in the Fourier domain, treating them as continuous-time elements in function space, and (ii) the capacity for data manipulation (such as resampling and timespan extension) within the Fourier domain. We reinterpret Fourier-domain data manipulation as frequency extrapolation and interpolation, incorporating this as a core learning mechanism in NFM, applicable across various tasks. To support flexible frequency extension with spectral priors and effective modulation of frequency representations, we propose two learning modules: Learnable Frequency Tokens (LFT) and Implicit Neural Fourier Filters (INFF). These modules enable compact and expressive modeling in the Fourier domain. Extensive experiments demonstrate that NFM achieves state-of-the-art performance on a wide range of tasks (forecasting, anomaly detection, and classification), including challenging time-series scenarios with previously unseen sampling rates at test time. Moreover, NFM is highly compact, requiring fewer than 40K parameters in each task, with time-series lengths ranging from 100 to 16K.

Updated: 2024-10-07 02:39:55

标题: 神经傅立叶建模：一种高度紧凑的时间序列分析方法

摘要: 神经时间序列分析传统上侧重于在时间域建模数据，通常一些方法会将等效傅里叶域表示作为辅助谱特征。在这项工作中，我们将主要关注转向频率表示，在傅里叶域中完全直接地对时间序列数据进行建模。我们引入了神经傅里叶建模（NFM），这是一种紧凑而强大的时间序列分析解决方案。NFM基于傅里叶变换（FT）的两个关键特性：（i）能够将有限长度的时间序列建模为傅里叶域中的函数，将它们视为函数空间中的连续时间元素；（ii）在傅里叶域内进行数据操作（例如重采样和时间跨度扩展）的能力。我们将傅里叶域数据操作重新解释为频率外推和内插，并将其作为NFM中的核心学习机制，适用于各种任务。为了支持灵活的频率扩展和有效的频率表示调制，我们提出了两个学习模块：可学习频率标记（LFT）和隐式神经傅里叶滤波器（INFF）。这些模块使得在傅里叶域内的建模变得紧凑和富有表现力。大量实验证明，NFM在各种任务（预测、异常检测和分类）上实现了最先进的性能，包括在测试时具有以前未见的采样率的挑战性时间序列情景。此外，NFM非常紧凑，每个任务只需要不到40K个参数，时间序列长度范围从100到16K不等。

更新时间: 2024-10-07 02:39:55

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.04703v1

A Clifford Algebraic Approach to E(n)-Equivariant High-order Graph Neural Networks

Designing neural network architectures that can handle data symmetry is crucial. This is especially important for geometric graphs whose properties are equivariance under Euclidean transformations. Current equivariant graph neural networks (EGNNs), particularly those using message passing, have a limitation in expressive power. Recent high-order graph neural networks can overcome this limitation, yet they lack equivariance properties, representing a notable drawback in certain applications in chemistry and physical sciences. In this paper, we introduce the Clifford Group Equivariant Graph Neural Networks (CG-EGNNs), a novel EGNN that enhances high-order message passing by integrating high-order local structures in the context of Clifford algebras. As a key benefit of using Clifford algebras, CG-EGNN can learn functions that capture equivariance from positional features. By adopting the high-order message passing mechanism, CG-EGNN gains richer information from neighbors, thus improving model performance. Furthermore, we establish the universality property of the $k$-hop message passing framework, showcasing greater expressive power of CG-EGNNs with additional $k$-hop message passing mechanism. We empirically validate that CG-EGNNs outperform previous methods on various benchmarks including n-body, CMU motion capture, and MD17, highlighting their effectiveness in geometric deep learning.

Updated: 2024-10-07 02:12:42

标题: 使用Clifford代数方法构建E(n)-等变高阶图神经网络

摘要: 设计能处理数据对称性的神经网络架构至关重要。这对于几何图形特性在欧几里得变换下等变的情况尤为重要。当前的等变图神经网络（EGNNs），特别是那些使用消息传递的网络，在表达能力方面存在局限。最近的高阶图神经网络可以克服这一局限，但它们缺乏等变性质，在化学和物理科学中的某些应用中表示出明显的缺陷。在本文中，我们引入了 Clifford 群等变图神经网络（CG-EGNNs），这是一种新颖的 EGNN，通过在 Clifford 代数的上下文中集成高阶局部结构来增强高阶消息传递。作为使用 Clifford 代数的关键优势，CG-EGNN 能够学习捕捉来自位置特征的等变性的函数。通过采用高阶消息传递机制，CG-EGNN 可以从邻居处获取更丰富的信息，从而提高模型性能。此外，我们建立了$k$-跳消息传递框架的普适性属性，展示了 CG-EGNNs 具有更强的表达能力，具有额外的$k$-跳消息传递机制。我们在各种基准测试中经验验证了 CG-EGNNs 在 n-body、CMU 运动捕捉和 MD17 等方面优于先前的方法，突显了它们在几何深度学习中的有效性。

更新时间: 2024-10-07 02:12:42

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.04692v1

Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning

Fine-tuning and in-context learning (ICL) are two prevalent methods in imbuing large language models with task-specific knowledge. It is commonly believed that fine-tuning can surpass ICL given sufficient training samples as it allows the model to adjust its internal parameters based on the data. However, this paper presents a counterintuitive finding: For tasks with implicit patterns, ICL captures these patterns significantly better than fine-tuning. We developed several datasets featuring implicit patterns, such as sequences determining answers through parity or identifying reducible terms in calculations. We then evaluated the models' understanding of these patterns under both fine-tuning and ICL across models ranging from 0.5B to 7B parameters. The results indicate that models employing ICL can quickly grasp deep patterns and significantly improve accuracy. In contrast, fine-tuning, despite utilizing thousands of times more training samples than ICL, achieved only limited improvements. We also proposed circuit shift theory from a mechanistic interpretability's view to explain why ICL wins.

Updated: 2024-10-07 02:12:22

标题: 不需要更新的深入洞察力：上下文学习胜过微调的力量

摘要: 微调和上下文学习（ICL）是赋予大型语言模型任务特定知识的两种普遍方法。人们普遍认为，在足够的训练样本的情况下，微调可以超越ICL，因为它允许模型根据数据调整其内部参数。然而，本文提出了一个令人费解的发现：对于具有隐含模式的任务，ICL比微调更好地捕捉这些模式。我们开发了几个包含隐含模式的数据集，比如通过奇偶判断答案或识别计算中可简化项的序列。然后我们评估了模型对这些模式的理解，在0.5B到7B参数范围内的模型中进行了微调和ICL的比较。结果表明，采用ICL的模型可以迅速掌握深层模式并显著提高准确性。相比之下，尽管微调利用了比ICL多数千倍的训练样本，但只取得了有限的改进。我们还提出了电路转移理论，从机械可解释性的角度解释为什么ICL胜出。

更新时间: 2024-10-07 02:12:22

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.04691v1

RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

Recent advancements in generative modeling have significantly enhanced the reconstruction of audio waveforms from various representations. While diffusion models are adept at this task, they are hindered by latency issues due to their operation at the individual sample point level and the need for numerous sampling steps. In this study, we introduce RFWave, a cutting-edge multi-band Rectified Flow approach designed to reconstruct high-fidelity audio waveforms from Mel-spectrograms or discrete acoustic tokens. RFWave uniquely generates complex spectrograms and operates at the frame level, processing all subbands simultaneously to boost efficiency. Leveraging Rectified Flow, which targets a straight transport trajectory, RFWave achieves reconstruction with just 10 sampling steps. Our empirical evaluations show that RFWave not only provides outstanding reconstruction quality but also offers vastly superior computational efficiency, enabling audio generation at speeds up to 160 times faster than real-time on a GPU. An online demonstration is available at: https://rfwave-demo.github.io/rfwave/.

Updated: 2024-10-07 02:08:05

标题: RFWave：用于音频波形重建的多频段整流流量

摘要: 最近生成建模的进展显著增强了从各种表示中重建音频波形。虽然扩散模型擅长这项任务，但由于它们在个别样本点水平运作以及需要大量采样步骤，它们受到延迟问题的阻碍。在这项研究中，我们介绍了RFWave，这是一种先进的多频段矫正流方法，旨在从Mel-频谱图或离散声学令牌中重建高保真度音频波形。RFWave独特地生成复杂的频谱图，并在帧级别运行，同时处理所有子带以提高效率。利用矫正流，目标是直线传输轨迹，RFWave仅需10个采样步骤就能实现重建。我们的经验评估表明，RFWave不仅提供出色的重建质量，而且提供大大优越的计算效率，使音频生成速度高达GPU实时速度的160倍。在线演示可在以下网址进行：https://rfwave-demo.github.io/rfwave/。

更新时间: 2024-10-07 02:08:05

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2403.05010v3

SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech

We present SegINR, a novel approach to neural Text-to-Speech (TTS) that addresses sequence alignment without relying on an auxiliary duration predictor and complex autoregressive (AR) or non-autoregressive (NAR) frame-level sequence modeling. SegINR simplifies the process by converting text sequences directly into frame-level features. It leverages an optimal text encoder to extract embeddings, transforming each into a segment of frame-level features using a conditional implicit neural representation (INR). This method, named segment-wise INR (SegINR), models temporal dynamics within each segment and autonomously defines segment boundaries, reducing computational costs. We integrate SegINR into a two-stage TTS framework, using it for semantic token prediction. Our experiments in zero-shot adaptive TTS scenarios demonstrate that SegINR outperforms conventional methods in speech quality with computational efficiency.

Updated: 2024-10-07 02:04:58

标题: SegINR: 段落级隐式神经表示在神经文本转语音中的序列对齐

摘要: 我们提出了SegINR，这是一种新颖的神经文本转语音（TTS）方法，解决了序列对齐的问题，而无需依赖辅助的持续时间预测器和复杂的自回归（AR）或非自回归（NAR）帧级序列建模。SegINR通过将文本序列直接转换为帧级特征简化了这一过程。它利用最优文本编码器提取嵌入，使用条件隐式神经表示（INR）将每个嵌入转换为一段帧级特征。这种方法，称为分段INR（SegINR），在每个段内建模时间动态，并自主定义段边界，降低了计算成本。我们将SegINR集成到一个两阶段TTS框架中，用于语义标记预测。我们在零样本自适应TTS场景中的实验表明，SegINR在语音质量和计算效率方面优于传统方法。

更新时间: 2024-10-07 02:04:58

领域: eess.AS,cs.LG

下载: http://arxiv.org/abs/2410.04690v1

CPFD: Confidence-aware Privileged Feature Distillation for Short Video Classification

Dense features, customized for different business scenarios, are essential in short video classification. However, their complexity, specific adaptation requirements, and high computational costs make them resource-intensive and less accessible during online inference. Consequently, these dense features are categorized as `Privileged Dense Features'.Meanwhile, end-to-end multi-modal models have shown promising results in numerous computer vision tasks. In industrial applications, prioritizing end-to-end multi-modal features, can enhance efficiency but often leads to the loss of valuable information from historical privileged dense features. To integrate both features while maintaining efficiency and manageable resource costs, we present Confidence-aware Privileged Feature Distillation (CPFD), which empowers features of an end-to-end multi-modal model by adaptively distilling privileged features during training. Unlike existing privileged feature distillation (PFD) methods, which apply uniform weights to all instances during distillation, potentially causing unstable performance across different business scenarios and a notable performance gap between teacher model (Dense Feature enhanced multimodal-model DF-X-VLM) and student model (multimodal-model only X-VLM), our CPFD leverages confidence scores derived from the teacher model to adaptively mitigate the performance variance with the student model. We conducted extensive offline experiments on five diverse tasks demonstrating that CPFD improves the video classification F1 score by 6.76% compared with end-to-end multimodal-model (X-VLM) and by 2.31% with vanilla PFD on-average. And it reduces the performance gap by 84.6% and achieves results comparable to teacher model DF-X-VLM. The effectiveness of CPFD is further substantiated by online experiments, and our framework has been deployed in production systems for over a dozen models.

Updated: 2024-10-07 02:04:21

标题: CPFD：自信感知特权特征蒸馏用于短视频分类

摘要: 密集特征，针对不同业务场景定制，对于短视频分类至关重要。然而，它们的复杂性、特定适应性要求和高计算成本使它们在在线推断过程中资源密集且不易获取。因此，这些密集特征被分类为“特权密集特征”。与此同时，端到端多模态模型在许多计算机视觉任务中展现出有希望的结果。在工业应用中，优先考虑端到端多模态特征，可以提高效率，但往往导致来自历史特权密集特征的宝贵信息丢失。为了在保持效率和可管理的资源成本的同时整合这两种特征，我们提出了置信度感知特权特征蒸馏（CPFD），它通过在训练过程中自适应地提炼特权特征，增强端到端多模态模型的特征。与现有的特权特征蒸馏（PFD）方法不同，后者在蒸馏过程中对所有实例应用统一权重，可能导致在不同业务场景下性能不稳定，并且导致教师模型（密集特征增强的多模态模型DF-X-VLM）和学生模型（仅多模态模型X-VLM）之间的显著性能差距。我们的CPFD利用从教师模型派生的置信度分数，自适应地减少学生模型的性能变化。我们进行了广泛的离线实验，涵盖了五个不同的任务，结果显示，相比端到端多模态模型（X-VLM），CPFD将视频分类F1分数提高了6.76%，相比普通PFD，提高了2.31%。它减少了性能差距84.6%，并且取得了与教师模型DF-X-VLM可比的结果。CPFD的有效性通过在线实验进一步得到证实，我们的框架已在生产系统中部署了十几个模型。

更新时间: 2024-10-07 02:04:21

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2410.03038v2

A Survey on Trustworthiness in Foundation Models for Medical Image Analysis

The rapid advancement of foundation models in medical imaging represents a significant leap toward enhancing diagnostic accuracy and personalized treatment. However, the deployment of foundation models in healthcare necessitates a rigorous examination of their trustworthiness, encompassing privacy, robustness, reliability, explainability, and fairness. The current body of survey literature on foundation models in medical imaging reveals considerable gaps, particularly in the area of trustworthiness. Additionally, existing surveys on the trustworthiness of foundation models do not adequately address their specific variations and applications within the medical imaging domain. This survey aims to fill that gap by presenting a novel taxonomy of foundation models used in medical imaging and analyzing the key motivations for ensuring their trustworthiness. We review current research on foundation models in major medical imaging applications, focusing on segmentation, medical report generation, medical question and answering (Q\&A), and disease diagnosis. These areas are highlighted because they have seen a relatively mature and substantial number of foundation models compared to other applications. We focus on literature that discusses trustworthiness in medical image analysis manuscripts. We explore the complex challenges of building trustworthy foundation models for each application, summarizing current concerns and strategies for enhancing trustworthiness. Furthermore, we examine the potential of these models to revolutionize patient care. Our analysis underscores the imperative for advancing towards trustworthy AI in medical image analysis, advocating for a balanced approach that fosters innovation while ensuring ethical and equitable healthcare delivery.

Updated: 2024-10-07 02:03:30

标题: 《医学图像分析基础模型中的可信度调查》

摘要: 医学影像基础模型的快速发展代表着在增强诊断准确性和个性化治疗方面迈出了重要的一步。然而，在医疗保健中部署基础模型必须对其可信度进行严格审查，包括隐私、健壮性、可靠性、可解释性和公平性。目前关于医学影像基础模型的调查文献显示出相当大的差距，特别是在可信度方面。此外，现有关于基础模型可信度的调查并未充分涵盖其在医学影像领域内的具体变化和应用。本调查旨在填补这一空白，通过提出一种新颖的用于医学影像的基础模型分类法，并分析确保其可信度的关键动机。我们审查了主要医学影像应用中基础模型的当前研究，重点关注分割、医学报告生成、医学问答（Q&A）和疾病诊断。之所以选择这些领域进行重点研究，是因为相比其他应用，它们已经看到了相对成熟和大量的基础模型。我们着重于讨论医学图像分析文献中的可信度。我们探讨了为每个应用构建可信的基础模型所面临的复杂挑战，总结了当前关注的问题和增强可信度的策略。此外，我们还探讨了这些模型改革患者护理的潜力。我们的分析强调了在医学图像分析中朝着可信度AI的发展的迫切性，倡导一种平衡的方法，促进创新的同时确保道德和公平的医疗保健交付。

更新时间: 2024-10-07 02:03:30

领域: cs.CV,cs.AI,cs.CY,cs.HC,cs.LG

下载: http://arxiv.org/abs/2407.15851v2

BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection

Multimodal contrastive learning methods (e.g., CLIP) have shown impressive zero-shot classification performance due to their strong ability to joint representation learning for visual and textual modalities. However, recent research revealed that multimodal contrastive learning on poisoned pre-training data with a small proportion of maliciously backdoored data can induce backdoored CLIP that could be attacked by inserted triggers in downstream tasks with a high success rate. To defend against backdoor attacks on CLIP, existing defense methods focus on either the pre-training stage or the fine-tuning stage, which would unfortunately cause high computational costs due to numerous parameter updates. In this paper, we provide the first attempt at a computationally efficient backdoor detection method to defend against backdoored CLIP in the inference stage. We empirically find that the visual representations of backdoored images are insensitive to both benign and malignant changes in class description texts. Motivated by this observation, we propose BDetCLIP, a novel test-time backdoor detection method based on contrastive prompting. Specifically, we first prompt the language model (e.g., GPT-4) to produce class-related description texts (benign) and class-perturbed random texts (malignant) by specially designed instructions. Then, the distribution difference in cosine similarity between images and the two types of class description texts can be used as the criterion to detect backdoor samples. Extensive experiments validate that our proposed BDetCLIP is superior to state-of-the-art backdoor detection methods, in terms of both effectiveness and efficiency.

Updated: 2024-10-07 01:47:28

标题: BDetCLIP：多模态提示对比测试时间后门检测

摘要: 多模态对比学习方法（例如，CLIP）由于其在视觉和文本模态的联合表示学习能力强，已经展示出令人印象深刻的零样本分类性能。然而，最近的研究表明，在带有一小部分恶意后门数据的受污染的预训练数据上进行多模态对比学习可能导致受感染的CLIP，在下游任务中插入触发器的成功率很高。为了防御针对CLIP的后门攻击，现有的防御方法主要集中在预训练阶段或微调阶段，但这可能会导致由于大量参数更新而造成高计算成本。在本文中，我们提出了一种计算效率高的后门检测方法，用于在推理阶段防御受感染的CLIP。我们在实验中发现，受感染图像的视觉表示对类别描述文本的良性和恶性变化都不敏感。受到这一观察的启发，我们提出了BDetCLIP，一种基于对比提示的新型测试时后门检测方法。具体来说，我们首先提示语言模型（例如，GPT-4）通过特别设计的指令生成与类别相关的描述文本（良性）和类别扰动的随机文本（恶性）。然后，图像与两种类型的类别描述文本之间的余弦相似度分布差异可以用作检测后门样本的标准。大量实验证明，我们提出的BDetCLIP在效果和效率方面优于最先进的后门检测方法。

更新时间: 2024-10-07 01:47:28

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.15269v2

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets utilized for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.

Updated: 2024-10-07 01:45:38

标题: 喜鹊：通过提示对齐的LLMs从零开始进行对齐数据综合

摘要: 高质量的指导数据对于调整大型语言模型(LLMs)至关重要。尽管一些模型，如Llama-3-Instruct，具有开放的权重，但它们的调整数据仍然保持私密，这阻碍了人工智能的民主化。高昂的人工成本和有限的、预定义的提示范围阻碍了现有的开源数据创建方法有效扩展，可能限制了公共调整数据集的多样性和质量。通过直接从对齐的LLM中提取数据，是否可能合成大规模的高质量指导数据？我们提出了一种用于生成大规模对齐数据的自我合成方法，命名为Magpie。我们的关键观察是，像Llama-3-Instruct这样的对齐LLMs可以在我们仅输入左侧模板直到用户消息位置保留的位置时生成用户查询，这要归功于它们的自回归性质。我们使用这种方法提示Llama-3-Instruct并生成400万条指导以及它们对应的响应。我们对提取的数据进行了全面分析，并选择了30万个高质量实例。为了将Magpie数据与其他公共指导数据集进行比较，我们使用每个数据集微调Llama-3-8B-Base，并评估微调模型的性能。我们的结果表明，在某些任务中，使用Magpie微调的模型表现与官方的Llama-3-8B-Instruct相当，尽管后者通过受监督微调(SFT)和随后的反馈学习增强了1000万个数据点。我们还表明，仅使用Magpie进行SFT可以超越以往用于SFT和偏好优化的公共数据集的性能，比如使用UltraFeedback进行直接偏好优化。这种优势在对齐基准测试中明显，如AlpacaEval、ArenaHard和WildBench上。

更新时间: 2024-10-07 01:45:38

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.08464v2

Real-World Cooking Robot System from Recipes Based on Food State Recognition Using Foundation Models and PDDL

Although there is a growing demand for cooking behaviours as one of the expected tasks for robots, a series of cooking behaviours based on new recipe descriptions by robots in the real world has not yet been realised. In this study, we propose a robot system that integrates real-world executable robot cooking behaviour planning using the Large Language Model (LLM) and classical planning of PDDL descriptions, and food ingredient state recognition learning from a small number of data using the Vision-Language model (VLM). We succeeded in experiments in which PR2, a dual-armed wheeled robot, performed cooking from arranged new recipes in a real-world environment, and confirmed the effectiveness of the proposed system.

Updated: 2024-10-07 01:39:25

标题: 基于食物状态识别和基础模型以及PDDL的菜谱为基础的真实世界烹饪机器人系统

摘要: 尽管越来越多的人对机器人作为厨师执行任务的需求日益增长，但基于机器人在现实世界中根据新食谱描述制定烹饪行为的一系列行为尚未实现。在这项研究中，我们提出了一个机器人系统，该系统整合了使用大型语言模型（LLM）进行实际可执行机器人烹饪行为规划和PDDL描述的经典规划，以及使用Vision-Language模型（VLM）从少量数据中学习食材状态识别。我们在实验中成功地让双臂轮式机器人PR2在真实环境中执行根据新食谱准备食物，并确认了所提出系统的有效性。

更新时间: 2024-10-07 01:39:25

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.02874v2

Combining Structural and Unstructured Data: A Topic-based Finite Mixture Model for Insurance Claim Prediction

Modeling insurance claim amounts and classifying claims into different risk levels are critical yet challenging tasks. Traditional predictive models for insurance claims often overlook the valuable information embedded in claim descriptions. This paper introduces a novel approach by developing a joint mixture model that integrates both claim descriptions and claim amounts. Our method establishes a probabilistic link between textual descriptions and loss amounts, enhancing the accuracy of claims clustering and prediction. In our proposed model, the latent topic/component indicator serves as a proxy for both the thematic content of the claim description and the component of loss distributions. Specifically, conditioned on the topic/component indicator, the claim description follows a multinomial distribution, while the claim amount follows a component loss distribution. We propose two methods for model calibration: an EM algorithm for maximum a posteriori estimates, and an MH-within-Gibbs sampler algorithm for the posterior distribution. The empirical study demonstrates that the proposed methods work effectively, providing interpretable claims clustering and prediction.

Updated: 2024-10-07 01:37:07

标题: 结合结构化和非结构化数据：基于主题的有限混合模型用于保险理赔预测

摘要: 建模保险索赔金额并将索赔分类为不同风险水平是至关重要但具有挑战性的任务。传统的保险索赔预测模型经常忽视索赔描述中蕴含的有价值信息。本文通过开发一个整合索赔描述和索赔金额的联合混合模型，引入了一种新颖的方法。我们的方法建立了文本描述和损失金额之间的概率联系，增强了索赔聚类和预测的准确性。在我们提出的模型中，潜在的主题/成分指示器充当索赔描述的主题内容和损失分布的成分的代理。具体来说，在主题/成分指示器的条件下，索赔描述遵循多项式分布，而索赔金额遵循成分损失分布。我们提出了两种模型校准方法：用于最大后验估计的EM算法，以及用于后验分布的MH-within-Gibbs采样器算法。实证研究表明，提出的方法有效地工作，提供了可解释的索赔聚类和预测。

更新时间: 2024-10-07 01:37:07

领域: stat.AP,cs.LG

下载: http://arxiv.org/abs/2410.04684v1

Towards Measuring Goal-Directedness in AI Systems

Recent advances in deep learning have brought attention to the possibility of creating advanced, general AI systems that outperform humans across many tasks. However, if these systems pursue unintended goals, there could be catastrophic consequences. A key prerequisite for AI systems pursuing unintended goals is whether they will behave in a coherent and goal-directed manner in the first place, optimizing for some unknown goal; there exists significant research trying to evaluate systems for said behaviors. However, the most rigorous definitions of goal-directedness we currently have are difficult to compute in real-world settings. Drawing upon this previous literature, we explore policy goal-directedness within reinforcement learning (RL) environments. In our findings, we propose a different family of definitions of the goal-directedness of a policy that analyze whether it is well-modeled as near-optimal for many (sparse) reward functions. We operationalize this preliminary definition of goal-directedness and test it in toy Markov decision process (MDP) environments. Furthermore, we explore how goal-directedness could be measured in frontier large-language models (LLMs). Our contribution is a definition of goal-directedness that is simpler and more easily computable in order to approach the question of whether AI systems could pursue dangerous goals. We recommend further exploration of measuring coherence and goal-directedness, based on our findings.

Updated: 2024-10-07 01:34:42

标题: 朝着在人工智能系统中测量目标导向性的方向前进

摘要: 最近深度学习的进展引起了人们对创造优于人类在许多任务中的高级通用人工智能系统的可能性的关注。然而，如果这些系统追求了意外的目标，可能会造成灾难性后果。AI系统追求意外目标的一个关键前提是它们是否会在首先就以一种连贯和目标导向的方式行事，优化某个未知目标；目前存在大量研究试图评估系统的这种行为。然而，我们目前最严格的目标导向性定义在现实世界设置中难以计算。借鉴以前的文献，我们探讨了在强化学习（RL）环境中的策略目标导向性。在我们的研究结果中，我们提出了一个不同的家族定义策略目标导向性的定义，分析它是否可以被视为对于许多（稀疏）奖励函数近似最优。我们将这个初步的目标导向性定义操作化，并在玩具马尔可夫决策过程（MDP）环境中进行测试。此外，我们探索了如何在前沿大型语言模型（LLM）中测量目标导向性。我们提出了一个更简单和更容易计算的目标导向性定义，以接近AI系统是否可能追求危险目标的问题。基于我们的研究结果，我们建议进一步探索测量连贯性和目标导向性。

更新时间: 2024-10-07 01:34:42

领域: cs.LG,cs.AI,I.2.0

下载: http://arxiv.org/abs/2410.04683v1

SpinQuant: LLM quantization with learned rotations

Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce memory usage, latency, and power consumption of Large Language Models (LLMs), but may lead to large quantization errors when outliers are present. Rotating activation or weight matrices helps remove outliers and benefits quantization. In this work, we identify a collection of applicable rotation parameterizations that lead to identical outputs in full-precision Transformer architectures while enhancing quantization accuracy. In addition, we find that some random rotations lead to much better quantization than others, with an up to 13 points difference in downstream zero-shot reasoning performance. As a result, we propose SpinQuant, a novel approach that incorporates learned rotation matrices for optimal quantized network accuracy. With 4-bit quantization of weight, activation, and KV-cache, SpinQuant narrows the accuracy gap on zero-shot reasoning tasks with full precision to merely 2.9 points on the LLaMA-2 7B model, surpassing LLM-QAT by 19.1 points and SmoothQuant by 25.0 points. Furthermore, SpinQuant also outperforms concurrent work QuaRot, which applies random rotations to remove outliers. In particular, for LLaMA-3 8B models that are hard to quantize, SpinQuant reduces the gap to full precision by up to 45.1% relative to QuaRot.

Updated: 2024-10-07 01:27:59

标题: SpinQuant: 使用学习旋转的LLM量子化

摘要: 后训练量化（PTQ）技术应用于权重、激活和KV缓存，极大地减少了大型语言模型（LLMs）的内存使用、延迟和功耗，但在存在异常值时可能会导致较大的量化误差。旋转激活或权重矩阵有助于消除异常值并有益于量化。在这项工作中，我们确定了一系列适用的旋转参数化，可以在全精度Transformer架构中产生相同的输出，同时提高了量化精度。此外，我们发现一些随机旋转比其他旋转效果要好得多，下游零射推理性能相差高达13个点。因此，我们提出了SpinQuant，一种融合了学习旋转矩阵以实现最佳量化网络精度的新方法。通过对权重、激活和KV缓存进行4位量化，SpinQuant将零射推理任务的准确性差距缩小到仅为LLaMA-27B模型上的2.9个点，超过LLM-QAT 19.1个点和SmoothQuant 25.0个点。此外，SpinQuant还优于并发工作QuaRot，后者应用随机旋转来消除异常值。特别是对于难以量化的LLaMA-38B模型，SpinQuant将相对于QuaRot的全精度差距减少了高达45.1%。

更新时间: 2024-10-07 01:27:59

领域: cs.LG,cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2405.16406v3

FoodPuzzle: Developing Large Language Model Agents as Flavor Scientists

Flavor development in the food industry is increasingly challenged by the need for rapid innovation and precise flavor profile creation. Traditional flavor research methods typically rely on iterative, subjective testing, which lacks the efficiency and scalability required for modern demands. This paper presents three contributions to address the challenges. Firstly, we define a new problem domain for scientific agents in flavor science, conceptualized as the generation of hypotheses for flavor profile sourcing and understanding. To facilitate research in this area, we introduce the FoodPuzzle, a challenging benchmark consisting of 978 food items and 1,766 flavor molecules profiles. We propose a novel Scientific Agent approach, integrating in-context learning and retrieval augmented techniques to generate grounded hypotheses in the domain of food science. Experimental results indicate that our model significantly surpasses traditional methods in flavor profile prediction tasks, demonstrating its potential to transform flavor development practices.

Updated: 2024-10-07 01:26:23

标题: FoodPuzzle：将大型语言模型代理作为风味科学家进行开发

摘要: 在食品行业中，口味的发展越来越受到挑战，需要快速创新和精确的口味配置。传统的口味研究方法通常依赖于迭代的、主观的测试，这种方法缺乏现代需求所需的效率和可扩展性。本文提出了三个解决这些挑战的方法。首先，我们为口味科学中的科学代理定义了一个新的问题领域，概念化为口味配置和理解的假设生成。为了促进这一领域的研究，我们引入了一个具有挑战性的基准测试FoodPuzzle，其中包含978种食品和1,766种口味分子配置。我们提出了一种新颖的科学代理方法，将上下文学习和检索增强技术整合在一起，以在食品科学领域生成扎实的假设。实验结果表明，我们的模型在口味配置预测任务中显著超越了传统方法，展示了它改变口味开发实践的潜力。

更新时间: 2024-10-07 01:26:23

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.12832v3

PACE: marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization

Parameter-Efficient Fine-Tuning (PEFT) effectively adapts pre-trained vision transformers to downstream tasks. However, the optimization for tasks performance often comes at the cost of generalizability in fine-tuned models. To address this issue, we theoretically connect smaller weight gradient norms during training and larger datasets to the improved model generalization. Motivated by this connection, we propose reducing gradient norms for enhanced generalization and aligning fine-tuned model with the pre-trained counterpart to retain knowledge from large-scale pre-training data. Yet, naive alignment does not guarantee gradient reduction and can potentially cause gradient explosion, complicating efforts to manage gradients. To address such issues, we propose PACE, marrying generalization of PArameter-efficient fine-tuning with Consistency rEgularization. We perturb features learned from the adapter with the multiplicative noise and ensure the fine-tuned model remains consistent for same sample under different perturbations. Theoretical analysis shows that PACE not only implicitly regularizes gradients for enhanced generalization, but also implicitly aligns the fine-tuned and pre-trained models to retain knowledge. Experimental evidence supports our theories. PACE outperforms existing PEFT methods in four visual adaptation tasks: VTAB-1k, FGVC, few-shot learning and domain adaptation. Code will be available at https://github.com/MaxwellYaoNi/PACE

Updated: 2024-10-07 01:00:46

标题: PACE：将PArameter-efficient fine-tuning中的泛化与一致性正则化结合起来

摘要: Parameter-Efficient Fine-Tuning (PEFT)有效地将预训练的视觉transformers适应到下游任务中。然而，为了任务性能的优化往往会以微调模型的泛化能力为代价。为了解决这个问题，我们从理论上将训练过程中较小的权重梯度范数与较大的数据集连接起来，以提高模型的泛化性。在这种连接的启发下，我们提出减小梯度范数以增强泛化性，并将微调模型与预训练模型对齐，以保留来自大规模预训练数据的知识。然而，简单的对齐并不能保证梯度的减小，并且可能导致梯度爆炸，使得管理梯度变得更加复杂。为了解决这些问题，我们提出了PACE，将PArameter-efficient fine-tuning的泛化性与Consistency regularization相结合。我们通过对适配器学习的特征进行乘法噪声扰动，并确保微调模型在不同扰动下对相同样本保持一致。理论分析表明，PACE不仅隐含地对梯度进行正则化以提高泛化性，还隐含地对微调和预训练模型进行对齐以保留知识。实验证据支持了我们的理论。PACE在四项视觉适应任务中表现出色：VTAB-1k、FGVC、少样本学习和领域适应。代码将在https://github.com/MaxwellYaoNi/PACE 上提供。

更新时间: 2024-10-07 01:00:46

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2409.17137v2

The role of interface boundary conditions and sampling strategies for Schwarz-based coupling of projection-based reduced order models

This paper presents and evaluates a framework for the coupling of subdomain-local projection-based reduced order models (PROMs) using the Schwarz alternating method following a domain decomposition (DD) of the spatial domain on which a given problem of interest is posed. In this approach, the solution on the full domain is obtained via an iterative process in which a sequence of subdomain-local problems are solved, with information propagating between subdomains through transmission boundary conditions (BCs). We explore several new directions involving the Schwarz alternating method aimed at maximizing the method's efficiency and flexibility, and demonstrate it on three challenging two-dimensional nonlinear hyperbolic problems: the shallow water equations, Burgers' equation, and the compressible Euler equations. We demonstrate that, for a cell-centered finite volume discretization and a non-overlapping DD, it is possible to obtain a stable and accurate coupled model utilizing Dirichlet-Dirichlet (rather than Robin-Robin or alternating Dirichlet-Neumann) transmission BCs on the subdomain boundaries. We additionally explore the impact of boundary sampling when utilizing the Schwarz alternating method to couple subdomain-local hyper-reduced PROMs. Our numerical results suggest that the proposed methodology has the potential to improve PROM accuracy by enabling the spatial localization of these models via domain decomposition, and achieve up to two orders of magnitude speedup over equivalent coupled full order model solutions and moderate speedups over analogous monolithic solutions.

Updated: 2024-10-07 00:44:22

标题: 界面边界条件和采样策略在基于Schwarz耦合投影型减少模型中的作用

摘要: 本文提出并评估了一种框架，用于耦合基于子域局部投影的降阶模型（PROMs），采用施瓦茨交替方法在给定问题所在的空间域上进行域分解（DD）。在这种方法中，通过一个迭代过程获得全域的解，其中解决了一系列子域局部问题，信息通过传输边界条件（BCs）在子域之间传播。我们探索了几个新方向，涉及施瓦茨交替方法，旨在最大化该方法的效率和灵活性，并在三个具有挑战性的二维非线性双曲问题上进行了演示：浅水方程、Burgers方程和可压缩欧拉方程。我们证明，对于基于单元中心的有限体积离散化和非重叠的DD，可以利用子域边界上的迪利希特-迪利希特（而不是罗宾-罗宾或交替的迪利希特-诺伊曼）传输BCs获得稳定和准确的耦合模型。此外，当利用施瓦茨交替方法耦合子域局部超降阶PROMs时，我们还探讨了边界采样的影响。我们的数值结果表明，所提出的方法有潜力通过域分解实现这些模型的空间局部化，相比等效的耦合全阶模型解，可以实现高达两个数量级的加速，并且相比类似的单体解获得适度的加速。

更新时间: 2024-10-07 00:44:22

领域: math.NA,cs.LG,cs.NA,math-ph,math.MP

下载: http://arxiv.org/abs/2410.04668v1

Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates

This paper explores optimal architectures for evaluating the outputs of large language models (LLMs) using LLMs themselves. We propose a novel framework that interprets LLMs as advocates within an ensemble of interacting agents, allowing them to defend their answers and reach conclusions through a judge and jury system. This approach offers a more dynamic and comprehensive evaluation process compared to traditional human-based assessments or automated metrics. We discuss the motivation behind this framework, its key components, and comparative advantages. We also present a probabilistic model to evaluate the error reduction achieved by iterative advocate systems. Finally, we outline experiments to validate the effectiveness of multi-advocate architectures and discuss future research directions.

Updated: 2024-10-07 00:22:07

标题: 通过迭代辩论对大型语言模型进行敌对多智能体评估

摘要: 本文探讨了利用大型语言模型（LLMs）自身评估输出的最佳架构。我们提出了一个新颖的框架，将LLMs解释为在相互作用的代理人集合中的拥护者，使它们能够通过法官和陪审团系统捍卫他们的答案并得出结论。与传统基于人工评估或自动化指标相比，这种方法提供了更动态和全面的评估过程。我们讨论了这一框架背后的动机、其关键组成部分以及比较优势。我们还提出了一个概率模型来评估迭代拥护系统所实现的错误减少。最后，我们概述了用于验证多拥护者架构有效性的实验，并讨论了未来的研究方向。

更新时间: 2024-10-07 00:22:07

领域: cs.CL,cs.LG,cs.MA

下载: http://arxiv.org/abs/2410.04663v1

Federated Learning Nodes Can Reconstruct Peers' Image Data

Federated learning (FL) is a privacy-preserving machine learning framework that enables multiple nodes to train models on their local data and periodically average weight updates to benefit from other nodes' training. Each node's goal is to collaborate with other nodes to improve the model's performance while keeping its training data private. However, this framework does not guarantee data privacy. Prior work has shown that the gradient-sharing steps in FL can be vulnerable to data reconstruction attacks from an honest-but-curious central server. In this work, we show that an honest-but-curious node/client can also launch attacks to reconstruct peers' image data in a centralized system, presenting a severe privacy risk. We demonstrate that a single client can silently reconstruct other clients' private images using diluted information available within consecutive updates. We leverage state-of-the-art diffusion models to enhance the perceptual quality and recognizability of the reconstructed images, further demonstrating the risk of information leakage at a semantic level. This highlights the need for more robust privacy-preserving mechanisms that protect against silent client-side attacks during federated training.

Updated: 2024-10-07 00:18:35

标题: 联邦学习节点可以重建同行的图像数据

摘要: 联邦学习（FL）是一种保护隐私的机器学习框架，它使多个节点能够在本地数据上训练模型，并定期平均权重更新以从其他节点的训练中受益。每个节点的目标是与其他节点合作，以改善模型的性能，同时保持其训练数据私密。然而，这种框架并不能保证数据隐私。先前的研究表明，在FL中的梯度共享步骤可能会受到一个诚实但好奇的中央服务器的数据重构攻击的威胁。在本研究中，我们展示了一个诚实但好奇的节点/客户端也可以发动攻击，重构同行的图像数据在一个集中式系统中，呈现出严重的隐私风险。我们证明一个单个客户端可以利用连续更新中可用的稀疏信息，悄悄地重构其他客户端的私人图像。我们利用最先进的扩散模型来提高重构图像的感知质量和可识别性，进一步展示在语义级别上信息泄漏的风险。这突显了需要更强大的保护隐私机制，以防止在联邦训练期间发生悄悄的客户端攻击。

更新时间: 2024-10-07 00:18:35

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2410.04661v1

Knowledge Graph Based Agent for Complex, Knowledge-Intensive QA in Medicine

Biomedical knowledge is uniquely complex and structured, requiring distinct reasoning strategies compared to other scientific disciplines like physics or chemistry. Biomedical scientists do not rely on a single approach to reasoning; instead, they use various strategies, including rule-based, prototype-based, and case-based reasoning. This diversity calls for flexible approaches that accommodate multiple reasoning strategies while leveraging in-domain knowledge. We introduce KGARevion, a knowledge graph (KG) based agent designed to address the complexity of knowledge-intensive medical queries. Upon receiving a query, KGARevion generates relevant triplets by using the knowledge base of the LLM. These triplets are then verified against a grounded KG to filter out erroneous information and ensure that only accurate, relevant data contribute to the final answer. Unlike RAG-based models, this multi-step process ensures robustness in reasoning while adapting to different models of medical reasoning. Evaluations on four gold-standard medical QA datasets show that KGARevion improves accuracy by over 5.2%, outperforming 15 models in handling complex medical questions. To test its capabilities, we curated three new medical QA datasets with varying levels of semantic complexity, where KGARevion achieved a 10.4% improvement in accuracy.

Updated: 2024-10-07 00:17:37

标题: 基于知识图谱的医学领域复杂、知识密集型问答系统代理

摘要: 生物医学知识具有独特的复杂性和结构化特点，与物理或化学等其他科学学科相比，需要不同的推理策略。生物医学科学家不依赖于单一的推理方法；相反，他们使用各种策略，包括基于规则、基于原型和基于案例的推理。这种多样性需要灵活的方法，能够容纳多种推理策略，同时利用领域内的知识。我们介绍了KGARevion，这是一个基于知识图谱（KG）的代理程序，旨在解决知识密集型医学查询的复杂性。收到查询后，KGARevion使用LLM的知识库生成相关的三元组。然后，这些三元组通过一个基础的知识图谱进行验证，以过滤出错误信息，并确保只有准确、相关的数据对最终答案有贡献。与基于RAG的模型不同，这个多步骤过程确保了推理的稳健性，同时适应不同的医学推理模式。对四个黄金标准医学问答数据集的评估显示，KGARevion将准确率提高了超过5.2%，在处理复杂的医学问题方面胜过了15个模型。为了测试其能力，我们精选了三个具有不同语义复杂度水平的医学问答数据集，KGARevion在准确率上实现了10.4%的提升。

更新时间: 2024-10-07 00:17:37

领域: cs.AI

下载: http://arxiv.org/abs/2410.04660v1

Contrastive Learning to Improve Retrieval for Real-world Fact Checking

Recent work on fact-checking addresses a realistic setting where models incorporate evidence retrieved from the web to decide the veracity of claims. A bottleneck in this pipeline is in retrieving relevant evidence: traditional methods may surface documents directly related to a claim, but fact-checking complex claims requires more inferences. For instance, a document about how a vaccine was developed is relevant to addressing claims about what it might contain, even if it does not address them directly. We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for this setting. By leveraging the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents, we fine-tune Contriever with a contrastive objective based on multiple training signals, including distillation from GPT-4, evaluating subquestion answers, and gold labels in the dataset. We evaluate our model on both retrieval and end-to-end veracity judgments about claims. On the AVeriTeC dataset, we find a 6\% improvement in veracity classification accuracy. We also show our gains can be transferred to FEVER, ClaimDecomp, HotpotQA, and a synthetic dataset requiring retrievers to make inferences.

Updated: 2024-10-07 00:09:50

标题: 对比学习以提高现实世界事实核查的检索

摘要: 最近关于事实核查的研究涉及到一个现实设置，其中模型整合从网络中检索到的证据来决定声明的真实性。这一流程中的瓶颈在于检索相关证据：传统方法可能会提供与声明直接相关的文档，但核查复杂的声明需要更多推断。例如，关于疫苗开发的文档与处理有关其可能包含什么的声明相关，即使它并没有直接涉及这些声明。我们提出了Contrastive Fact-Checking Reranker (CFR)，这是一个针对这一设置的改进的检索器。通过利用AVeriTeC数据集，该数据集用人类编写的证据文档回答声明的子问题进行注释，我们使用对比目标对Contriever进行微调，基于多个训练信号，包括从GPT-4中蒸馏，评估子问题答案以及数据集中的金标签。我们对声明的检索和端到端真实性判断评估了我们的模型。在AVeriTeC数据集上，我们发现真实性分类准确度提高了6％。我们还展示了我们的收益可以转移到FEVER、ClaimDecomp、HotpotQA以及需要检索器进行推理的合成数据集。

更新时间: 2024-10-07 00:09:50

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.04657v1