    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 

Articles: 32

Last Updated: 2024-10-18 23:53:07 (+00:00)

Planning In Natural Language Improves LLM Search For Code Generation

While scaling training compute has led to remarkable improvements in large language models (LLMs), scaling inference compute has not yet yielded analogous gains. We hypothesize that a core missing component is a lack of diverse LLM outputs, leading to inefficient search due to models repeatedly sampling highly similar, yet incorrect generations. We empirically demonstrate that this lack of diversity can be mitigated by searching over candidate plans for solving a problem in natural language. Based on this insight, we propose PlanSearch, a novel search algorithm which shows strong results across HumanEval+, MBPP+, and LiveCodeBench (a contamination-free benchmark for competitive coding). PlanSearch generates a diverse set of observations about the problem and then uses these observations to construct plans for solving the problem. By searching over plans in natural language rather than directly over code solutions, PlanSearch explores a significantly more diverse range of potential solutions compared to baseline search methods. Using PlanSearch on top of Claude 3.5 Sonnet achieves a state-of-the-art pass@200 of 77.0% on LiveCodeBench, outperforming both the best score achieved without search (pass@1 = 41.4%) and using standard repeated sampling (pass@200 = 60.6%). Finally, we show that, across all models, search algorithms, and benchmarks analyzed, we can accurately predict performance gains due to search as a direct function of the diversity over generated ideas. Code can be found at https://github.com/scaleapi/plansearch.
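
For reference, the pass@k figures quoted above are conventionally computed with the unbiased estimator from the code-generation literature (Chen et al., 2021); a minimal implementation of that background formula (not code from the PlanSearch repository):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples drawn without replacement from n generations is correct,
    given that c of the n generations pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=200, c=40, k=1))    # pass@1 for a hypothetical 40/200 solve rate
print(pass_at_k(n=200, c=40, k=200))  # pass@200 = 1.0 once any sample is correct
```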

Updated: 2024-10-18 23:53:07

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.03733v2

A Hybrid Defense Strategy for Boosting Adversarial Robustness in Vision-Language Models

The robustness of Vision-Language Models (VLMs) such as CLIP is critical for their deployment in safety-critical applications like autonomous driving, healthcare diagnostics, and security systems, where accurate interpretation of visual and textual data is essential. However, these models are highly susceptible to adversarial attacks, which can severely compromise their performance and reliability in real-world scenarios. Previous methods have primarily focused on improving robustness through adversarial training and generating adversarial examples with attack methods such as FGSM, AutoAttack, and DeepFool. However, these approaches often rely on strong assumptions, such as fixed perturbation norms or predefined attack patterns, and involve high computational complexity, making them challenging to implement in practical settings. In this paper, we propose a novel adversarial training framework that integrates multiple attack strategies and advanced machine learning techniques to significantly enhance the robustness of VLMs against a broad range of adversarial attacks. Experiments conducted on real-world datasets, including CIFAR-10 and CIFAR-100, demonstrate that the proposed method significantly enhances model robustness. The fine-tuned CLIP model achieved an accuracy of 43.5% on adversarially perturbed images, compared to only 4% for the baseline model. The neural network model achieved a high accuracy of 98% in these challenging classification tasks, while the XGBoost model reached a success rate of 85.26% in prediction tasks.
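
For context, FGSM, the simplest of the attack families named above, perturbs an input by one signed-gradient step in the loss-increasing direction; a minimal PyTorch sketch of the attack family only, not of the paper's hybrid defense:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8 / 255):
    """Fast Gradient Sign Method: x_adv = x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()        # one signed-gradient step
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```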

Updated: 2024-10-18 23:47:46

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.14911v1

Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation

Large language models (LLMs) have significantly advanced various natural language processing tasks, but deploying them remains computationally expensive. Knowledge distillation (KD) is a promising solution, enabling the transfer of capabilities from larger teacher LLMs to more compact student models. Particularly, sequence-level KD, which distills rationale-based reasoning processes instead of merely final outcomes, shows great potential in enhancing students' reasoning capabilities. However, current methods struggle with sequence-level KD under long-tailed data distributions, adversely affecting generalization on sparsely represented domains. We introduce the Multi-Stage Balanced Distillation (BalDistill) framework, which iteratively balances training data within a fixed computational budget. By dynamically selecting representative head domain examples and synthesizing tail domain examples, BalDistill achieves state-of-the-art performance across diverse long-tailed datasets, enhancing both the efficiency and efficacy of the distilled models.
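
The balancing step can be pictured as filling a fixed per-domain budget from real head-domain examples and teacher-synthesized tail-domain ones; a toy sketch in which `synthesize` is a hypothetical stand-in for BalDistill's actual selection and teacher-generation logic:

```python
import random

def balance_domain(examples, budget, synthesize):
    """Meet a fixed per-domain budget: subsample head domains that exceed it,
    and pad tail domains with synthesized examples."""
    if len(examples) >= budget:
        return random.sample(examples, budget)   # head: select representatives
    extra = [synthesize() for _ in range(budget - len(examples))]
    return examples + extra                      # tail: top up synthetically

head = balance_domain(list(range(50)), budget=10, synthesize=lambda: "synthetic")
tail = balance_domain(list(range(3)), budget=10, synthesize=lambda: "synthetic")
```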

Updated: 2024-10-18 23:46:40

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.13114v2

SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing

Effective processing, interpretation, and management of sensor data have emerged as a critical component of cyber-physical systems. Traditionally, processing sensor data requires profound theoretical knowledge and proficiency in signal-processing tools. However, recent works show that Large Language Models (LLMs) have promising capabilities in processing sensory data, suggesting their potential as copilots for developing sensing systems. To explore this potential, we construct a comprehensive benchmark, SensorBench, to establish a quantifiable objective. The benchmark incorporates diverse real-world sensor datasets for various tasks. The results show that while LLMs exhibit considerable proficiency in simpler tasks, they face inherent challenges in processing compositional tasks with parameter selections compared to engineering experts. Additionally, we investigate four prompting strategies for sensor processing and show that self-verification can outperform all other baselines in 48% of tasks. Our study provides a comprehensive benchmark and prompting analysis for future developments, paving the way toward an LLM-based sensor processing copilot.
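
The self-verification strategy can be read as a generate-check-revise loop; a sketch in which `llm` is a hypothetical completion function, not part of SensorBench's published interface:

```python
def self_verify(llm, task: str, max_rounds: int = 3) -> str:
    """Draft a solution, ask the model to audit it, and revise until the
    audit passes or the round budget runs out."""
    solution = llm(f"Write signal-processing code for this task:\n{task}")
    for _ in range(max_rounds):
        verdict = llm(f"Task:\n{task}\nSolution:\n{solution}\n"
                      "Does the solution solve the task? Answer PASS or FAIL, with a reason.")
        if verdict.strip().startswith("PASS"):
            break
        solution = llm(f"Task:\n{task}\nFlawed solution:\n{solution}\n"
                       f"Reviewer feedback:\n{verdict}\nRewrite the solution.")
    return solution
```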

Updated: 2024-10-18 23:29:49

Categories: cs.AI,cs.LG,eess.SP

Download: http://arxiv.org/abs/2410.10741v2

From Test-Taking to Test-Making: Examining LLM Authoring of Commonsense Assessment Items

LLMs can now perform a variety of complex writing tasks. They also excel in answering questions pertaining to natural language inference and commonsense reasoning. Composing these questions is itself a skilled writing task, so in this paper we consider LLMs as authors of commonsense assessment items. We prompt LLMs to generate items in the style of a prominent benchmark for commonsense reasoning, the Choice of Plausible Alternatives (COPA). We examine the outcome according to analyses facilitated by the LLMs and human annotation. We find that LLMs that succeed in answering the original COPA benchmark are also more successful in authoring their own items.

Updated: 2024-10-18 22:42:23

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.14897v1

Truncated Consistency Models

Consistency models have recently been introduced to accelerate sampling from diffusion models by directly predicting the solution (i.e., data) of the probability flow ODE (PF ODE) from initial noise. However, the training of consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints. This task is much more challenging than the ultimate objective of one-step generation, which only concerns the PF ODE's noise-to-data mapping. We empirically find that this training paradigm limits the one-step generation performance of consistency models. To address this issue, we generalize consistency training to the truncated time range, which allows the model to ignore denoising tasks at earlier time steps and focus its capacity on generation. We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution. Experiments on CIFAR-10 and ImageNet $64\times64$ datasets show that our method achieves better one-step and two-step FIDs than the state-of-the-art consistency models such as iCT-deep, using more than 2$\times$ smaller networks. Project page: https://truncated-cm.github.io/

Updated: 2024-10-18 22:38:08

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2410.14895v1

Soft-Label Integration for Robust Toxicity Classification

Toxicity classification in textual content remains a significant problem. Data with labels from a single annotator fall short of capturing the diversity of human perspectives. Therefore, there is a growing need to incorporate crowdsourced annotations for training an effective toxicity classifier. Additionally, the standard approach to training a classifier using empirical risk minimization (ERM) may fail to address the potential shifts between the training set and testing set due to exploiting spurious correlations. This work introduces a novel bi-level optimization framework that integrates crowdsourced annotations with the soft-labeling technique and optimizes the soft-label weights by Group Distributionally Robust Optimization (GroupDRO) to enhance the robustness against out-of-distribution (OOD) risk. We theoretically prove the convergence of our bi-level optimization algorithm. Experimental results demonstrate that our approach outperforms existing baseline methods in terms of both average and worst-group accuracy, confirming its effectiveness in leveraging crowdsourced annotations to achieve more effective and robust toxicity classification.
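
The GroupDRO ingredient reweights groups by exponentiating their current losses (Sagawa et al., 2020); a minimal form of that update, leaving out the paper's bi-level soft-label optimization:

```python
import torch

def group_dro_step(losses_per_group: torch.Tensor,
                   group_weights: torch.Tensor,
                   eta: float = 0.01) -> torch.Tensor:
    """Upweight high-loss groups via exponentiated gradient ascent, then
    return the weighted loss to minimize."""
    with torch.no_grad():
        group_weights *= torch.exp(eta * losses_per_group)
        group_weights /= group_weights.sum()   # stay on the probability simplex
    return (group_weights * losses_per_group).sum()
```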

Updated: 2024-10-18 22:36:03

Categories: cs.AI,cs.CR,cs.LG

Download: http://arxiv.org/abs/2410.14894v1

Reasoning, Memorization, and Fine-Tuning Language Models for Non-Cooperative Games

We develop a method that integrates the tree of thoughts and multi-agent framework to enhance the capability of pre-trained language models in solving complex, unfamiliar games. The method decomposes game-solving into four incremental tasks -- game summarization, area selection, action extraction, and action validation -- each assigned to a specific language-model agent. By constructing a tree of thoughts, the method simulates reasoning paths and allows agents to collaboratively distill game representations and tactics, mitigating the limitations of language models in reasoning and long-term memorization. Additionally, an automated fine-tuning process further optimizes the agents' performance by ranking query-response pairs based on game outcomes, e.g., winning or losing. We apply the method to a non-cooperative game and demonstrate a 65 percent winning rate against benchmark algorithms, with an additional 10 percent improvement after fine-tuning. In contrast to existing deep learning algorithms for game solving that require millions of training samples, the proposed method consumes approximately 1000 training samples, highlighting its efficiency and scalability.

Updated: 2024-10-18 22:28:22

Categories: cs.AI

Download: http://arxiv.org/abs/2410.14890v1

Self-Satisfied: An end-to-end framework for SAT generation and prediction

The boolean satisfiability (SAT) problem asks whether there exists an assignment of boolean values to the variables of an arbitrary boolean formula making the formula evaluate to True. It is well-known that all NP-problems can be coded as SAT problems and therefore SAT is important both practically and theoretically. From both of these perspectives, better understanding the patterns and structure implicit in SAT data is of significant value. In this paper, we describe several advances that we believe will help open the door to such understanding: we introduce hardware accelerated algorithms for fast SAT problem generation, a geometric SAT encoding that enables the use of transformer architectures typically applied to vision tasks, and a simple yet effective technique we term head slicing for reducing sequence length representation inside transformer architectures. These advances allow us to scale our approach to SAT problems with thousands of variables and tens of thousands of clauses. We validate our architecture, termed Satisfiability Transformer (SaT), on the SAT prediction task with data from the SAT Competition (SATComp) 2022 problem sets. Prior related work either leveraged a pure machine learning approach, but could not handle SATComp-sized problems, or was hybrid in the sense of integrating a machine learning component in a standard SAT solving tool. Our pure machine learning approach achieves prediction accuracies comparable to recent work, but on problems that are an order of magnitude larger than previously demonstrated. A fundamental aspect of our work concerns the very nature of SAT data and its suitability for training machine learning models. We both describe experimental results that probe the landscape of where SAT data can be successfully used for learning and position these results within the broader context of complexity and learning.
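
As background on the data being generated, a uniform random k-SAT instance takes only a few lines on CPU (the paper's contribution is hardware-accelerated generation plus a geometric encoding, neither of which this sketch attempts):

```python
import random

def random_ksat(n_vars: int, n_clauses: int, k: int = 3, seed: int = 0):
    """Sample a random k-SAT formula: each clause draws k distinct variables
    and negates each with probability 1/2 (literals are signed 1-based ints)."""
    rng = random.Random(seed)
    return [[v if rng.random() < 0.5 else -v
             for v in rng.sample(range(1, n_vars + 1), k)]
            for _ in range(n_clauses)]

def satisfies(formula, assignment):
    """Check a {var: bool} assignment against the formula."""
    return all(any((lit > 0) == assignment[abs(lit)] for lit in clause)
               for clause in formula)

f = random_ksat(n_vars=20, n_clauses=85)   # clause/variable ratio near the 3-SAT threshold
guess = {v: bool(random.getrandbits(1)) for v in range(1, 21)}
print(satisfies(f, guess))
```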

Updated: 2024-10-18 22:25:54

Categories: cs.LG,cs.AI,cs.LO,03D99,I.5.2; I.5.1; I.2.3; F.0

Download: http://arxiv.org/abs/2410.14888v1

Zero-shot Generalist Graph Anomaly Detection with Unified Neighborhood Prompts

Graph anomaly detection (GAD), which aims to identify nodes in a graph that significantly deviate from normal patterns, plays a crucial role in broad application domains. Existing GAD methods, whether supervised or unsupervised, are one-model-for-one-dataset approaches, i.e., training a separate model for each graph dataset. This limits their applicability in real-world scenarios where training on the target graph data is not possible due to issues like data privacy. To overcome this limitation, we propose a novel zero-shot generalist GAD approach UNPrompt that trains a one-for-all detection model, requiring the training of one GAD model on a single graph dataset and then effectively generalizing to detect anomalies in other graph datasets without any retraining or fine-tuning. The key insight in UNPrompt is that i) the predictability of latent node attributes can serve as a generalized anomaly measure and ii) highly generalized normal and abnormal graph patterns can be learned via latent node attribute prediction in a properly normalized node attribute space. UNPrompt achieves generalist GAD through two main modules: one module aligns the dimensionality and semantics of node attributes across different graphs via coordinate-wise normalization in a projected space, while another module learns generalized neighborhood prompts that support the use of latent node attribute predictability as an anomaly score across different datasets. Extensive experiments on real-world GAD datasets show that UNPrompt significantly outperforms diverse competing methods under the generalist GAD setting, and it also has strong superiority under the one-model-for-one-dataset setting.

Updated: 2024-10-18 22:23:59

Categories: cs.LG

Download: http://arxiv.org/abs/2410.14886v1

Class-RAG: Content Moderation with Retrieval Augmented Generation

Robust content moderation classifiers are essential for the safety of Generative AI systems. Content moderation, or safety classification, is notoriously ambiguous: differences between safe and unsafe inputs are often extremely subtle, making it difficult for classifiers (and indeed, even humans) to properly distinguish violating vs. benign samples without further context or explanation. Furthermore, as these technologies are deployed across various applications and audiences, scaling risk discovery and mitigation through continuous model fine-tuning becomes increasingly challenging and costly. To address these challenges, we propose a Classification approach employing Retrieval-Augmented Generation (Class-RAG). Class-RAG extends the capability of its base LLM through access to a retrieval library which can be dynamically updated to enable semantic hotfixing for immediate, flexible risk mitigation. Compared to traditional fine-tuned models, Class-RAG demonstrates flexibility and transparency in decision-making. As evidenced by empirical studies, Class-RAG outperforms these baselines on classification and is more robust against adversarial attacks. Besides, our findings suggest that Class-RAG performance scales with retrieval library size, indicating that increasing the library size is a viable and low-cost approach to improve content moderation.
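
Mechanically, the approach amounts to nearest-neighbor retrieval feeding labeled exemplars into the classification prompt; a sketch in which `embed`, `llm`, and the library schema are hypothetical stand-ins for the system's actual components:

```python
import numpy as np

def classify_with_retrieval(text, library, embed, llm, k=4):
    """Retrieve the k most similar labeled examples and classify with them as
    context. Editing `library` acts as a semantic hotfix: decisions change
    without any fine-tuning."""
    q = embed(text)
    scored = [(float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e))), ex, lab)
              for e, ex, lab in library]            # cosine similarity per entry
    exemplars = sorted(scored, reverse=True)[:k]
    context = "\n".join(f"Example: {ex}\nLabel: {lab}" for _, ex, lab in exemplars)
    return llm(f"{context}\n\nInput: {text}\nLabel (safe/unsafe):")
```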

Updated: 2024-10-18 22:07:36

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.14881v1

Vital Insight: Assisting Experts' Sensemaking Process of Multi-modal Personal Tracking Data Using Visualization and LLM

Researchers have long recognized the socio-technical gaps in personal tracking research, where machines can never fully model the complexity of human behavior, making it only able to produce basic rule-based outputs or "black-box" results that lack clear explanations. Real-world deployments rely on experts for this complex translation from sparse data to meaningful insights. In this study, we consider this translation process from data to insights by experts as "sensemaking" and explore how HCI researchers can support it through Vital Insight, an evidence-based 'sensemaking' system that combines direct representation and indirect inference through visualization and Large Language Models. We evaluate Vital Insight in user testing sessions with 14 experts in multi-modal tracking, synthesize design implications, and develop an expert sensemaking model where they iteratively move between direct data representations and AI-supported inferences to explore, retrieve, question, and validate insights.

Updated: 2024-10-18 21:56:35

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2410.14879v1

Which LLMs are Difficult to Detect? A Detailed Analysis of Potential Factors Contributing to Difficulties in LLM Text Detection

As LLMs increase in accessibility, LLM-generated texts have proliferated across several fields, such as scientific, academic, and creative writing. However, LLMs are not created equally; they may have different architectures and training datasets. Thus, some LLMs may be more challenging to detect than others. Using two datasets spanning four total writing domains, we train AI-generated (AIG) text classifiers using the LibAUC library - a deep learning library for training classifiers with imbalanced datasets. Our results in the Deepfake Text dataset show that AIG-text detection varies across domains, with scientific writing being relatively challenging. In the Rewritten Ivy Panda (RIP) dataset focusing on student essays, we find that the OpenAI family of LLMs was substantially difficult for our classifiers to distinguish from human texts. Additionally, we explore possible factors that could explain the difficulties in detecting OpenAI-generated texts.

Updated: 2024-10-18 21:42:37

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.14875v1

How to Evaluate Reward Models for RLHF

We introduce a new benchmark for reward models that quantifies their ability to produce strong language models through RLHF (Reinforcement Learning from Human Feedback). The gold-standard approach is to run a full RLHF training pipeline and directly probe downstream LLM performance. However, this process is prohibitively expensive. To address this, we build a predictive model of downstream LLM performance by evaluating the reward model on proxy tasks. These proxy tasks consist of a large-scale human preference dataset and a verifiable correctness preference dataset, on which we measure 12 metrics across 12 domains. To investigate which reward model metrics are most correlated to gold-standard RLHF outcomes, we launch an end-to-end RLHF experiment on a large-scale crowdsourced human preference platform to view real reward model downstream performance as ground truth. Ultimately, we compile our data and findings into Preference Proxy Evaluations (PPE), the first reward model benchmark explicitly linked to post-RLHF real-world human preference performance, which we open-source for public use and further development. Our code and evaluations can be found at https://github.com/lmarena/PPE .
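
The benchmark's central question, whether cheap proxy metrics track expensive RLHF outcomes, reduces to a rank correlation; a sketch with made-up score arrays:

```python
from scipy.stats import spearmanr

# Hypothetical numbers: proxy-task scores for five reward models, and the
# downstream human-preference win rates of the LLMs trained against them.
proxy_scores  = [0.61, 0.55, 0.72, 0.48, 0.66]
rlhf_outcomes = [0.52, 0.50, 0.58, 0.45, 0.55]

rho, pval = spearmanr(proxy_scores, rlhf_outcomes)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")  # high rho => useful proxy metric
```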

Updated: 2024-10-18 21:38:21

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.14872v1

SGD and Weight Decay Secretly Minimize the Rank of Your Neural Network

We investigate the inherent bias of Stochastic Gradient Descent (SGD) toward learning low-rank weight matrices during the training of deep neural networks. Our results demonstrate that training with mini-batch SGD and weight decay induces a bias toward rank minimization in the weight matrices. Specifically, we show both theoretically and empirically that this bias becomes more pronounced with smaller batch sizes, higher learning rates, or stronger weight decay. Additionally, we predict and empirically confirm that weight decay is essential for this bias to occur. Unlike previous literature, our analysis does not rely on assumptions about the data, convergence, or optimality of the weight matrices, making it applicable to a wide range of neural network architectures of any width or depth. Finally, we empirically explore the connection between this bias and generalization, finding that it has a marginal effect on the test performance.
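
The effect is easy to probe empirically by tracking a soft rank measure of a weight matrix during training, for example the stable rank $\|W\|_F^2 / \|W\|_2^2$; a toy probe of our own, not the paper's experimental setup:

```python
import torch

def stable_rank(W: torch.Tensor) -> float:
    """Stable rank ||W||_F^2 / ||W||_2^2, a smooth proxy for rank(W)."""
    s = torch.linalg.svdvals(W)
    return float((s ** 2).sum() / s[0] ** 2)

X, Y = torch.randn(256, 64), torch.randn(256, 10)
for wd in (0.0, 1e-2):                      # prediction: rank drops only with weight decay
    torch.manual_seed(0)
    model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(),
                                torch.nn.Linear(64, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=wd)
    for _ in range(2000):
        idx = torch.randint(0, 256, (8,))   # small mini-batches sharpen the bias
        loss = (model(X[idx]) - Y[idx]).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"weight_decay={wd}: stable rank = {stable_rank(model[0].weight):.2f}")
```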

Updated: 2024-10-18 21:32:39

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2206.05794v7

Algorithmic Challenges in Ensuring Fairness at the Time of Decision

Algorithmic decision-making in societal contexts, such as retail pricing, loan administration, recommendations on online platforms, etc., can be framed as stochastic optimization under bandit feedback, which typically requires experimentation with different decisions for the sake of learning. Such experimentation often results in perceptions of unfairness among people impacted by these decisions; for instance, there have been several recent lawsuits accusing companies that deploy algorithmic pricing practices of price gouging. Motivated by the changing legal landscape surrounding algorithmic decision-making, we introduce the well-studied fairness notion of envy-freeness within the context of stochastic convex optimization. Our notion requires that upon receiving decisions in the present time, groups do not envy the decisions received by any of the other groups, both in the present as well as the past. This results in a novel trajectory-constrained stochastic optimization problem that renders existing techniques inapplicable. The main technical contribution of this work is to show problem settings where there is no gap in achievable regret (up to logarithmic factors) when envy-freeness is imposed. In particular, in our main result, we develop a near-optimal envy-free algorithm that achieves $\tilde{O}(\sqrt{T})$ regret for smooth convex functions that satisfy the PL inequality. This algorithm has a coordinate-descent structure, in which we carefully leverage gradient information to ensure monotonic sampling along each dimension, while avoiding overshooting the constrained optimum with high probability. This latter aspect critically uses smoothness and the structure of the envy-freeness constraints, while the PL inequality allows for sufficient progress towards the optimal solution. We discuss several open questions that arise from this analysis, which may be of independent interest.

Updated: 2024-10-18 21:29:47

Categories: cs.LG,cs.AI,cs.CY,math.OC

Download: http://arxiv.org/abs/2103.09287v3

Joint Verification and Refinement of Language Models for Safety-Constrained Planning

Although pre-trained language models can generate executable plans (e.g., programmatic policies) for solving robot tasks, the generated plans may violate task-relevant logical specifications due to the models' black-box nature. A significant gap remains between the language models' outputs and verifiable executions of plans. We develop a method to generate executable plans and formally verify them against task-relevant safety specifications. Given a high-level task description in natural language, the proposed method queries a language model to generate plans in the form of executable robot programs. It then converts the generated plan into an automaton-based representation, allowing formal verification of the automaton against the specifications. We prove that given a set of verified plans, the composition of these plans also satisfies the safety specifications. This proof ensures the safety of complex, multi-component plans, obviating the computation complexity of verifying the composed plan. We then propose an automated fine-tuning process that refines the language model to generate specification-compliant plans without the need for human labeling. The empirical results show a 30 percent improvement in the probability of generating plans that meet task specifications after fine-tuning.

Updated: 2024-10-18 21:16:30

Categories: cs.AI,cs.FL,cs.RO

Download: http://arxiv.org/abs/2410.14865v1

Beyond the Labels: Unveiling Text-Dependency in Paralinguistic Speech Recognition Datasets

Paralinguistic traits like cognitive load and emotion are increasingly recognized as pivotal areas in speech recognition research, often examined through specialized datasets like CLSE and IEMOCAP. However, the integrity of these datasets is seldom scrutinized for text-dependency. This paper critically evaluates the prevalent assumption that machine learning models trained on such datasets genuinely learn to identify paralinguistic traits, rather than merely capturing lexical features. By examining the lexical overlap in these datasets and testing the performance of machine learning models, we expose significant text-dependency in trait-labeling. Our results suggest that some machine learning models, especially large pre-trained models like HuBERT, might inadvertently focus on lexical characteristics rather than the intended paralinguistic features. The study serves as a call to action for the research community to reevaluate the reliability of existing datasets and methodologies, ensuring that machine learning models genuinely learn what they are designed to recognize.

Updated: 2024-10-18 20:46:05

Categories: eess.AS,cs.LG,eess.SP

Download: http://arxiv.org/abs/2403.07767v2

DFlow: Diverse Dialogue Flow Simulation with Large Language Models

Developing language model-based dialogue agents requires effective data to train models that can follow specific task logic. However, most existing data augmentation methods focus on increasing diversity in language, topics, or dialogue acts at the utterance level, largely neglecting a critical aspect of task logic diversity at the dialogue level. This paper proposes a novel data augmentation method designed to enhance the diversity of synthetic dialogues by focusing on task execution logic. Our method uses LLMs to generate decision tree-structured task plans, which enables the derivation of diverse dialogue trajectories for a given task. Each trajectory, referred to as a "dialog flow", guides the generation of a multi-turn dialogue that follows a unique trajectory. We apply this method to generate a task-oriented dialogue dataset comprising 3,886 dialogue flows across 15 different domains. We validate the effectiveness of this dataset using the next action prediction task, where models fine-tuned on our dataset outperform strong baselines, including GPT-4. Upon acceptance of this paper, we plan to release the code and data publicly.
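
Deriving dialogue trajectories from a decision-tree task plan is, at bottom, an enumeration of root-to-leaf paths; a minimal sketch with a hypothetical nested-dict plan format:

```python
def dialog_flows(node, prefix=()):
    """Yield every root-to-leaf action sequence ("dialog flow") in a task
    plan given as nested {action: subtree-or-None} dicts."""
    if not node:                # leaf: one complete trajectory
        yield prefix
        return
    for action, subtree in node.items():
        yield from dialog_flows(subtree, prefix + (action,))

plan = {"greet": {"ask_date": {"book": None, "suggest_alternative": None},
                  "ask_destination": {"book": None}}}
for flow in dialog_flows(plan):
    print(" -> ".join(flow))    # each printed flow seeds one multi-turn dialogue
```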

Updated: 2024-10-18 20:35:28

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.14853v1

FedSpaLLM: Federated Pruning of Large Language Models

Large Language Models (LLMs) achieve state-of-the-art performance but are challenging to deploy due to their high computational and storage demands. Pruning can reduce model size, yet existing methods assume public access to calibration data, which is impractical for privacy-sensitive applications. To address the challenge of pruning LLMs in privacy-preserving settings, we propose FedSpaLLM, the first federated learning framework designed specifically for pruning LLMs. FedSpaLLM enables clients to prune their models locally based on private data while accounting for system heterogeneity and maintaining communication efficiency. Our framework introduces several key innovations: (1) a novel $\ell_0$-norm aggregation function that ensures only non-zero weights are averaged across clients, preserving important model parameters; (2) an adaptive mask expansion technique that meets global sparsity targets while accommodating client-specific pruning decisions; and (3) a layer sampling strategy that reduces communication overhead and personalizes the pruning process based on client resources. Extensive experiments show that FedSpaLLM improves pruning performance in diverse federated settings. The source code will be released upon publication.
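
Innovation (1) can be written down directly under one natural reading of the abstract: each coordinate is averaged only over the clients that kept it non-zero, so pruned entries do not dilute the mean. A numpy sketch:

```python
import numpy as np

def l0_aggregate(client_weights):
    """Average each coordinate over the clients whose value there is non-zero,
    so one client's pruning decision does not drag shared parameters to zero."""
    stacked = np.stack(client_weights)               # (n_clients, *shape)
    counts = (stacked != 0).sum(axis=0)
    summed = stacked.sum(axis=0)
    return np.divide(summed, counts,
                     out=np.zeros_like(summed), where=counts > 0)

print(l0_aggregate([np.array([0.0, 2.0, -1.0]),
                    np.array([4.0, 0.0, -3.0])]))    # -> [ 4.  2. -2.]
```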

Updated: 2024-10-18 20:33:12

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2410.14852v1

Variational Distillation of Diffusion Policies into Mixture of Experts

This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. Diffusion Models are the current state-of-the-art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions. This ability allows Diffusion Models to replicate the inherent diversity in human behavior, making them the preferred models in behavior learning such as Learning from Human Demonstrations (LfD). However, diffusion models come with some drawbacks, including the intractability of likelihoods and long inference times due to their iterative sampling process. The inference times, in particular, pose a significant challenge to real-time applications such as robot control. In contrast, MoEs effectively address the aforementioned issues while retaining the ability to represent complex distributions but are notoriously difficult to train. VDD is the first method that distills pre-trained diffusion models into MoE models, and hence, combines the expressiveness of Diffusion Models with the benefits of Mixture Models. Specifically, VDD leverages a decompositional upper bound of the variational objective that allows the training of each expert separately, resulting in a robust optimization scheme for MoEs. VDD demonstrates across nine complex behavior learning tasks, that it is able to: i) accurately distill complex distributions learned by the diffusion model, ii) outperform existing state-of-the-art distillation methods, and iii) surpass conventional methods for training MoE.

Updated: 2024-10-18 20:28:06

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2406.12538v2

Metric Flows with Neural Networks

We develop a general theory of flows in the space of Riemannian metrics induced by neural network gradient descent. This is motivated in part by recent advances in approximating Calabi-Yau metrics with neural networks and is enabled by recent advances in understanding flows in the space of neural networks. We derive the corresponding metric flow equations, which are governed by a metric neural tangent kernel, a complicated, non-local object that evolves in time. However, many architectures admit an infinite-width limit in which the kernel becomes fixed and the dynamics simplify. Additional assumptions can induce locality in the flow, which allows for the realization of Perelman's formulation of Ricci flow that was used to resolve the 3d Poincaré conjecture. We demonstrate that such fixed kernel regimes lead to poor learning of numerical Calabi-Yau metrics, as is expected since the associated neural networks do not learn features. Conversely, we demonstrate that well-learned numerical metrics at finite-width exhibit an evolving metric-NTK, associated with feature learning. Our theory of neural network metric flows therefore explains why neural networks are better at learning Calabi-Yau metrics than fixed kernel methods, such as the Ricci flow.

Updated: 2024-10-18 20:21:22

Categories: hep-th,cs.LG,math.DG

Download: http://arxiv.org/abs/2310.19870v2

Koopman Learning with Episodic Memory

Koopman operator theory has found significant success in learning models of complex, real-world dynamical systems, enabling prediction and control. The greater interpretability and lower computational costs of these models, compared to traditional machine learning methodologies, make Koopman learning an especially appealing approach. Despite this, little work has been performed on endowing Koopman learning with the ability to leverage its own failures. To address this, we equip Koopman methods - developed for predicting non-autonomous time-series - with an episodic memory mechanism, enabling global recall of (or attention to) periods in time where similar dynamics previously occurred. We find that a basic implementation of Koopman learning with episodic memory leads to significant improvements in prediction on synthetic and real-world data. Our framework has considerable potential for expansion, allowing for future advances, and opens exciting new directions for Koopman learning.

Updated: 2024-10-18 20:18:47

Categories: math.DS,cs.LG

Download: http://arxiv.org/abs/2311.12615v2

Identifying built environment factors influencing driver yielding behavior at unsignalized intersections: A naturalistic open-source dataset collected in Minnesota

Many factors influence the yielding result of a driver-pedestrian interaction, including traffic volume, vehicle speed, roadway characteristics, etc. While individual aspects of these interactions have been explored, comprehensive, naturalistic studies, particularly those considering the built environment's influence on driver-yielding behavior, are lacking. To address this gap, our study introduces an extensive open-source dataset, compiled from video data at 18 unsignalized intersections across Minnesota. Documenting more than 3000 interactions, this dataset provides a detailed view of driver-pedestrian interactions and over 50 distinct contextual variables. The data, which covers individual driver-pedestrian interactions and contextual factors, is made publicly available at https://github.com/tianyi17/pedestrian_yielding_data_MN. Using logistic regression, we developed a classification model that predicts driver yielding based on the identified variables. Our analysis indicates that vehicle speed, the presence of parking lots, proximity to parks or schools, and the width of major road crossings significantly influence driver yielding at unsignalized intersections. Through our findings and by publishing one of the most comprehensive driver-pedestrian datasets in the United States, our study will support communities across Minnesota and the United States in their ongoing efforts to improve road safety for pedestrians and be helpful for automated vehicle design.
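
The classification model is plain logistic regression over contextual variables; a sketch using a few of the significant factors named above, with illustrative feature values (the real 50+ variables live in the released dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: vehicle_speed_mph, parking_lot_nearby, near_park_or_school,
# crossing_width_ft; label 1 = driver yielded. Values are illustrative only.
X = np.array([[18, 1, 1, 30], [35, 0, 0, 60], [22, 1, 0, 40],
              [40, 0, 0, 70], [15, 1, 1, 28], [30, 0, 1, 55]])
y = np.array([1, 0, 1, 0, 1, 0])

clf = LogisticRegression().fit(X, y)
print(clf.coef_)                                   # sign/size of each factor's effect
print(clf.predict_proba([[25, 1, 0, 45]])[:, 1])   # P(yield) for a new interaction
```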

Updated: 2024-10-18 20:11:32

Categories: cs.LG,cs.SI

Download: http://arxiv.org/abs/2312.15113v2

Transfer Learning Adapts to Changing PSD in Gravitational Wave Data

The detection of gravitational waves has opened unparalleled opportunities for observing the universe, particularly through the study of black hole inspirals. These events serve as unique laboratories to explore the laws of physics under conditions of extreme energies. However, significant noise in gravitational wave (GW) data from observatories such as Advanced LIGO and Virgo poses major challenges in signal identification. Traditional noise suppression methods often fall short in fully addressing the non-Gaussian effects in the data, including the fluctuations in noise power spectral density (PSD) over short time intervals. These challenges have led to the exploration of an AI approach that, while overcoming previous obstacles, introduced its own challenges, such as scalability, reliability issues, and the vanishing gradient problem. Our approach addresses these issues through a simplified architecture. To compensate for the potential limitations of a simpler model, we have developed a novel training methodology that enables it to accurately detect gravitational waves amidst highly complex noise. Employing this strategy, our model achieves over 99% accuracy in non-white noise scenarios and shows remarkable adaptability to changing noise PSD conditions. By leveraging the principles of transfer learning, our model quickly adapts to new noise profiles with just a few epochs of fine-tuning, facilitating real-time applications in dynamically changing noise environments.

Updated: 2024-10-18 20:08:01

Categories: gr-qc,astro-ph.HE,astro-ph.IM,cs.AI

Download: http://arxiv.org/abs/2410.11911v2

TIMeSynC: Temporal Intent Modelling with Synchronized Context Encodings for Financial Service Applications

Users engage with financial services companies through multiple channels, often interacting with mobile applications, web platforms, call centers, and physical locations to service their accounts. The resulting interactions are recorded at heterogeneous temporal resolutions across these domains. This multi-channel data can be combined and encoded to create a comprehensive representation of the customer's journey for accurate intent prediction. This demands sequential learning solutions. NMT transformers achieve state-of-the-art sequential representation learning by encoding context and decoding for the next best action to represent long-range dependencies. However, three major challenges exist while combining multi-domain sequences within an encoder-decoder transformer architecture for intent prediction applications: a) aligning sequences with different sampling rates b) learning temporal dynamics across multi-variate, multi-domain sequences c) combining dynamic and static sequences. We propose an encoder-decoder transformer model to address these challenges for contextual and sequential intent prediction in financial servicing applications. Our experiments show significant improvement over the existing tabular method.

Updated: 2024-10-18 20:05:47

Categories: q-fin.GN,cs.LG

Download: http://arxiv.org/abs/2410.12825v2

Predictive variational inference: Learn the predictively optimal posterior distribution

Vanilla variational inference finds an optimal approximation to the Bayesian posterior distribution, but even the exact Bayesian posterior is often not meaningful under model misspecification. We propose predictive variational inference (PVI): a general inference framework that seeks and samples from an optimal posterior density such that the resulting posterior predictive distribution is as close to the true data generating process as possible, where this closeness is measured by multiple scoring rules. By optimizing the objective, predictive variational inference is generally not the same as, nor even an attempt to approximate, the Bayesian posterior, even asymptotically. Rather, we interpret it as an implicit hierarchical expansion. Further, the learned posterior uncertainty detects heterogeneity of parameters among the population, enabling automatic model diagnosis. This framework applies to both likelihood-exact and likelihood-free models. We demonstrate its application in real data examples.

Updated: 2024-10-18 19:44:57

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2410.14843v1

Multi-Task Dynamic Pricing in Credit Market with Contextual Information

We study the dynamic pricing problem faced by a broker that buys and sells a large number of financial securities in the credit market, such as corporate bonds, government bonds, loans, and other credit-related securities. One challenge in pricing these securities is their infrequent trading, which leads to insufficient data for individual pricing. However, many of these securities share structural features that can be utilized. Building on this, we propose a multi-task dynamic pricing framework that leverages these shared structures across securities, enhancing pricing accuracy through learning. In our framework, a security is fully characterized by a $d$ dimensional contextual/feature vector. The customer will buy (sell) the security from the broker if the broker quotes a price lower (higher) than that of the competitors. We assume a linear contextual model for the competitor's pricing, with parameters that are unknown a priori. The parameters for pricing different securities may or may not be similar to each other. The firm's objective is to minimize the expected regret, namely, the expected revenue loss against a clairvoyant policy which has the knowledge of the parameters of the competitor's pricing model. We show that the regret of our policy is better than both a policy that treats each security individually and a policy that treats all securities as the same. Moreover, the regret is bounded by $\tilde{O} ( \delta_{\max} \sqrt{T M d} + M d ) $, where $M$ is the number of securities and $\delta_{\max}$ characterizes the overall dissimilarity across securities in the basket.

Updated: 2024-10-18 19:37:36

Categories: q-fin.PR,cs.LG

Download: http://arxiv.org/abs/2410.14839v1

CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios

Large Language Models (LLMs) have been widely adopted to process long-context tasks. However, the large memory overhead of the key-value (KV) cache poses significant challenges in long-context scenarios. Existing training-free KV cache compression methods typically focus on quantization and token pruning, which have compression limits, and excessive sparsity can lead to severe performance degradation. Other methods design new architectures with less KV overhead but require significant training overhead. To address the above two drawbacks, we further explore the redundancy in the channel dimension and apply an architecture-level design with minor training costs. Therefore, we introduce CSKV, a training-efficient Channel Shrinking technique for KV cache compression: (1) We first analyze the singular value distribution of the KV cache, revealing significant redundancy and compression potential along the channel dimension. Based on this observation, we propose using low-rank decomposition for key and value layers and storing the low-dimension features. (2) To preserve model performance, we introduce a bi-branch KV cache, including a window-based full-precision KV cache and a low-precision compressed KV cache. (3) To reduce the training costs, we minimize the layer-wise reconstruction loss for the compressed KV cache instead of retraining the entire LLMs. Extensive experiments show that CSKV can reduce the memory overhead of the KV cache by 80% while maintaining the model's long-context capability. Moreover, we show that our method can be seamlessly combined with quantization to further reduce the memory overhead, achieving a compression ratio of up to 95%. Code is available at https://github.com/wln20/CSKV.
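
Step (1) can be sketched with a truncated SVD of a key or value projection, caching low-dimensional features in place of full-width ones (the bi-branch cache and the reconstruction-loss training are not reproduced here):

```python
import torch

def low_rank_split(W: torch.Tensor, rank: int):
    """Factor a (d_out, d_in) projection as W ~= A @ B so the cache can hold
    rank-dim features B @ x instead of d_out-dim ones."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * S[:rank], Vh[:rank]   # A: (d_out, rank), B: (rank, d_in)

W = torch.randn(1024, 1024)
A, B = low_rank_split(W, rank=205)             # ~80% smaller cached features
x = torch.randn(1024)
cached = B @ x                                 # store this in the KV cache
restored = A @ cached                          # reconstruct when attention needs it
```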

Updated: 2024-10-18 19:30:35

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.10593v3

Deep Radar Inverse Sensor Models for Dynamic Occupancy Grid Maps

To implement autonomous driving, one essential step is to model the vehicle environment based on the sensor inputs. Radars, with their well-known advantages, became a popular option to infer the occupancy state of grid cells surrounding the vehicle. To tackle data sparsity and noise of radar detections, we propose a deep learning-based Inverse Sensor Model (ISM) to learn the mapping from sparse radar detections to polar measurement grids. Improved lidar-based measurement grids are used as reference. The learned radar measurement grids, combined with radar Doppler velocity measurements, are further used to generate a Dynamic Grid Map (DGM). Experiments in real-world highway scenarios show that our approach outperforms the hand-crafted geometric ISMs. In comparison to state-of-the-art deep learning methods, our approach is the first one to learn a single-frame measurement grid in the polar scheme from radars with a limited Field Of View (FOV). The learning framework makes the learned ISM independent of the radar mounting. This enables us to flexibly use one or more radar sensors without network retraining and without requirements on 360° sensor coverage.

Updated: 2024-10-18 19:29:23

标题: 深度雷达动态占据栅格地图的逆传感器模型

摘要: 为了实现自动驾驶,一个至关重要的步骤是基于传感器输入对车辆环境进行建模。雷达以其众所周知的优势成为推断车辆周围网格单元占用状态的流行选择。为了解决雷达检测数据稀疏性和噪声问题,我们提出了一种基于深度学习的逆传感器模型(ISM),以学习从稀疏雷达检测到极坐标测量网格的映射。改进的基于激光雷达的测量网格被用作参考。学习到的雷达测量网格,结合雷达多普勒速度测量,进一步用于生成动态网格地图(DGM)。在真实世界的高速公路场景中的实验证明,我们的方法优于手工制作的几何ISM。与最先进的深度学习方法相比,我们的方法是第一个从具有有限视野(FOV)的雷达中学习极坐标方案下单帧测量网格的方法。学习框架使学习到的ISM独立于雷达安装方式。这使我们能够灵活地使用一个或多个雷达传感器,既无需重新训练网络,也无需满足360°传感器覆盖的要求。

更新时间: 2024-10-18 19:29:23

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2305.12409v3

Rank Suggestion in Non-negative Matrix Factorization: Residual Sensitivity to Initial Conditions (RSIC)

Determining the appropriate rank in Non-negative Matrix Factorization (NMF) is a critical challenge that often requires extensive parameter tuning and domain-specific knowledge. Traditional methods for rank determination focus on identifying a single optimal rank, which may not capture the complex structure inherent in real-world datasets. In this study, we introduce a novel approach called Residual Sensitivity to Initial Conditions (RSIC) that suggests potentially multiple ranks of interest by analyzing the sensitivity of the relative residuals (e.g. relative reconstruction error) to different initializations. By computing the Mean Coordinatewise Interquartile Range (MCI) of the residuals across multiple random initializations, our method identifies regions where the NMF solutions are less sensitive to initial conditions and potentially more meaningful. We evaluate RSIC on a diverse set of datasets, including single-cell gene expression data, image data, and text data, and compare it against existing state-of-the-art rank determination methods. Our experiments demonstrate that RSIC effectively identifies relevant ranks consistent with the underlying structure of the data, outperforming traditional methods in scenarios where they are computationally infeasible or less accurate. This approach provides a more scalable and generalizable solution for rank determination in NMF that does not rely on domain-specific knowledge or assumptions.
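
A simplified sketch of the procedure (I use the interquartile range of the scalar relative residual across random initializations as a stand-in for the paper's coordinatewise MCI statistic; the synthetic data and rank grid are assumptions):

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
W_true, H_true = rng.random((100, 5)), rng.random((5, 60))
X = W_true @ H_true + 0.01 * rng.random((100, 60))   # nonnegative, true rank 5

def relative_residual(X, rank, seed):
    model = NMF(n_components=rank, init="random", random_state=seed, max_iter=500)
    W = model.fit_transform(X)
    return np.linalg.norm(X - W @ model.components_) / np.linalg.norm(X)

for rank in range(2, 9):
    res = np.array([relative_residual(X, rank, s) for s in range(10)])
    q25, q75 = np.percentile(res, [25, 75])
    print(f"rank={rank}  median residual={np.median(res):.4f}  IQR={q75 - q25:.2e}")
# Ranks where the residual is insensitive to initialization (small IQR)
# are flagged as ranks of interest.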

Updated: 2024-10-18 19:18:57

标题: 非负矩阵分解中的秩建议:残差对初始条件的敏感性(RSIC)

摘要: 确定非负矩阵分解(NMF)中的适当秩是一个关键挑战,通常需要广泛的参数调整和领域特定知识。传统的秩确定方法侧重于识别单个最佳秩,这可能无法捕捉现实世界数据集中固有的复杂结构。在本研究中,我们引入了一种名为残差对初始条件的敏感性(RSIC)的新方法,通过分析相对残差(例如相对重构误差)对不同初始化的敏感性,提出可能的多个感兴趣的秩。通过计算多个随机初始化下残差的逐坐标四分位距的均值(MCI),我们的方法确定了NMF解对初始条件不太敏感、因而可能更有意义的区域。我们在各种数据集上评估了RSIC,包括单细胞基因表达数据、图像数据和文本数据,并将其与现有最先进的秩确定方法进行比较。我们的实验表明,RSIC有效地识别了与数据底层结构一致的相关秩,在传统方法计算上不可行或不够准确的情形下优于它们。这种方法为NMF中的秩确定提供了一种更具可扩展性和泛化性的解决方案,不依赖于领域特定知识或假设。

更新时间: 2024-10-18 19:18:57

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.14838v1

Topological obstruction to the training of shallow ReLU neural networks

Studying the interplay between the geometry of the loss landscape and the optimization trajectories of simple neural networks is a fundamental step for understanding their behavior in more complex settings. This paper reveals the presence of topological obstruction in the loss landscape of shallow ReLU neural networks trained using gradient flow. We discuss how the homogeneous nature of the ReLU activation function constrains the training trajectories to lie on a product of quadric hypersurfaces whose shape depends on the particular initialization of the network's parameters. When the neural network's output is a single scalar, we prove that these quadrics can have multiple connected components, limiting the set of reachable parameters during training. We analytically compute the number of these components and discuss the possibility of mapping one to the other through neuron rescaling and permutation. In this simple setting, we find that the non-connectedness results in a topological obstruction, which, depending on the initialization, can make the global optimum unreachable. We validate this result with numerical experiments.
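
The conserved quantities behind these quadrics are easy to check numerically. In the toy sketch below (my construction; small-step SGD stands in for gradient flow), a network f(x) = sum_i a_i * relu(w_i . x) approximately conserves ||w_i||^2 - a_i^2 for every hidden neuron, confining training to a product of quadrics fixed by the initialization:

import torch

torch.manual_seed(0)
n, d, width = 64, 3, 5
X = torch.randn(n, d); y = torch.randn(n)
w = torch.randn(width, d, requires_grad=True)   # input weights, one row per neuron
a = torch.randn(width, requires_grad=True)      # scalar output weights

def invariant():
    return w.detach().pow(2).sum(dim=1) - a.detach().pow(2)

start = invariant()
opt = torch.optim.SGD([w, a], lr=1e-3)          # small step approximates gradient flow
for _ in range(2000):
    opt.zero_grad()
    pred = torch.relu(X @ w.T) @ a
    loss = ((pred - y) ** 2).mean()
    loss.backward()
    opt.step()
print("max drift of ||w_i||^2 - a_i^2:", (invariant() - start).abs().max().item())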

Updated: 2024-10-18 19:17:48

标题: 浅层ReLU神经网络训练的拓扑障碍

摘要: 研究损失景观几何形状与简单神经网络优化轨迹之间的相互作用是理解它们在更复杂环境中行为的基本步骤。本文揭示了在使用梯度流训练的浅层ReLU神经网络的损失景观中存在拓扑障碍。我们讨论了ReLU激活函数的均匀性如何限制训练轨迹位于依赖于网络参数特定初始化的二次超曲面乘积上。当神经网络的输出是单个标量时,我们证明这些二次曲面可以具有多个连通分量,限制了训练过程中可达到的参数集。我们分析计算了这些分量的数量,并讨论通过神经元重新缩放和置换将一个映射到另一个的可能性。在这种简单设置中,我们发现非连通性导致了拓扑障碍,这取决于初始化,可能使全局最优值无法达到。我们通过数值实验验证了这一结果。

更新时间: 2024-10-18 19:17:48

领域: cs.LG,math.AG,math.AT,68T07 (Primary) 57N65, 14R05 (Secondary),I.2.6

下载: http://arxiv.org/abs/2410.14837v1

Automated Road Extraction from Satellite Imagery Integrating Dense Depthwise Dilated Separable Spatial Pyramid Pooling with DeepLabV3+

Road Extraction is a sub-domain of Remote Sensing applications; it is a subject of extensive and ongoing research. The procedure of automatically extracting roads from satellite imagery encounters significant challenges due to the multi-scale and diverse structures of roads; improvement in this field is needed. The DeepLab series, known for its proficiency in semantic segmentation due to its efficiency in interpreting multi-scale objects' features, addresses some of these challenges caused by the varying nature of roads. The present work proposes the utilization of DeepLabV3+, the latest version of the DeepLab series, by introducing an innovative Dense Depthwise Dilated Separable Spatial Pyramid Pooling (DenseDDSSPP) module and integrating it in place of the conventional Atrous Spatial Pyramid Pooling (ASPP) module. This modification enhances the extraction of complex road structures from satellite images. This study hypothesizes that the integration of DenseDDSSPP, combined with an appropriately selected backbone network and a Squeeze-and-Excitation block, will generate an efficient dense feature map by focusing on relevant features, leading to more precise and accurate road extraction from Remote Sensing images. The results section presents a comparison of our model's performance against state-of-the-art models, demonstrating better results that highlight the effectiveness and success of the proposed approach.
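
A speculative PyTorch sketch of such a block (my reading of the abstract; the paper's exact DenseDDSSPP wiring, dilation rates, and channel counts may differ): each branch is a depthwise dilated convolution followed by a pointwise convolution, and branches are densely connected so that every branch sees all earlier outputs.

import torch
import torch.nn as nn

class DepthwiseDilatedSeparable(nn.Module):
    def __init__(self, in_ch, out_ch, dilation):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=dilation, dilation=dilation,
                      groups=in_ch, bias=False),      # depthwise, dilated
            nn.Conv2d(in_ch, out_ch, 1, bias=False),  # pointwise
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class DenseDDSSPP(nn.Module):
    def __init__(self, in_ch, growth=64, dilations=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        ch = in_ch
        for d in dilations:                 # dense connectivity: every branch
            self.branches.append(DepthwiseDilatedSeparable(ch, growth, d))
            ch += growth                    # consumes all previous outputs
        self.project = nn.Conv2d(ch, 256, 1)

    def forward(self, x):
        feats = [x]
        for branch in self.branches:
            feats.append(branch(torch.cat(feats, dim=1)))
        return self.project(torch.cat(feats, dim=1))

out = DenseDDSSPP(512)(torch.randn(1, 512, 32, 32))   # -> (1, 256, 32, 32)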

Updated: 2024-10-18 19:14:07

标题: 卫星图像中的自动道路提取:将密集深度空洞可分离空间金字塔池化与DeepLabV3+集成

摘要: 道路提取是遥感应用的一个子领域;它是一个广泛且持续研究的主题。自动从卫星图像中提取道路的过程面临着重要挑战,这是由于道路的多尺度和多样化结构所导致的;这个领域需要改进。以其在解释多尺度对象特征方面的效率而闻名的DeepLab系列,解决了由于道路性质的多变性而产生的一些挑战。本研究提出了利用DeepLab系列的最新版本DeepLabV3+,通过引入一种创新的密集深度空洞可分离空间金字塔池化(DenseDDSSPP)模块,以取代传统的空洞空间金字塔池化(ASPP)模块。这种修改增强了从卫星图像中提取复杂道路结构的能力。本研究假设,整合DenseDDSSPP、结合适当选择的骨干网络和一个Squeeze-and-Excitation块,将通过专注于相关特征生成一个高效的密集特征图,从而实现更精确和准确地从遥感图像中提取道路。结果部分展示了我们模型与最先进模型的性能比较,展示了更好的结果,突显了所提出方法的有效性和成功性。

更新时间: 2024-10-18 19:14:07

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.14836v1

Better Batch for Deep Probabilistic Time Series Forecasting

Deep probabilistic time series forecasting has gained attention for its ability to provide nonlinear approximation and valuable uncertainty quantification for decision-making. However, existing models often oversimplify the problem by assuming a time-independent error process and overlooking serial correlation. To overcome this limitation, we propose an innovative training method that incorporates error autocorrelation to enhance probabilistic forecasting accuracy. Our method constructs a mini-batch as a collection of $D$ consecutive time series segments for model training. It explicitly learns a time-varying covariance matrix over each mini-batch, encoding error correlation among adjacent time steps. The learned covariance matrix can be used to improve prediction accuracy and enhance uncertainty quantification. We evaluate our method on two different neural forecasting models and multiple public datasets. Experimental results confirm the effectiveness of the proposed approach in improving the performance of both models across a range of datasets, resulting in notable improvements in predictive accuracy.
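
A minimal sketch of the batching idea (the toy series, persistence residuals, and shapes are mine; the paper learns the covariance rather than just measuring it): each mini-batch stacks D consecutive segments of one series so that error correlation across adjacent steps is visible to the loss.

import numpy as np

rng = np.random.default_rng(0)
series = rng.standard_normal(1000).cumsum()   # toy univariate series
window, D = 24, 8                             # horizon per segment, segments per batch

def consecutive_batch(series, start, window, D):
    """Stack D consecutive length-`window` segments starting at `start`."""
    return np.stack([series[start + i * window: start + (i + 1) * window]
                     for i in range(D)])      # shape (D, window)

batch = consecutive_batch(series, start=100, window=window, D=D)
# Residuals of a naive persistence forecast, just to expose serial correlation:
errors = batch[:, 1:] - batch[:, :-1]
emp_cov = np.cov(errors.T)                    # (window-1, window-1) covariance
print(batch.shape, emp_cov.shape)
# In the paper, a time-varying covariance over such a batch is learned jointly
# with the forecaster and used to decorrelate the training loss.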

Updated: 2024-10-18 18:52:45

标题: 更好的批次用于深度概率时间序列预测

摘要: 深度概率时间序列预测因其能够提供非线性逼近和有价值的不确定性量化而受到关注,有助于决策制定。然而,现有模型通常通过假设时间无关的误差过程并忽视序列相关性来过于简化问题。为克服这一限制,我们提出了一种创新的训练方法,该方法将误差自相关性纳入以增强概率预测的准确性。我们的方法将一个小批次构造为$D$个连续时间序列段的集合,用于模型训练。它明确地学习了每个小批次上的时间变化协方差矩阵,编码相邻时间步之间的误差相关性。学习的协方差矩阵可用于提高预测准确性和增强不确定性量化。我们在两种不同的神经预测模型和多个公共数据集上评估了我们的方法。实验结果证实了所提方法在改进两种模型在一系列数据集上的性能方面的有效性,从而显著提高了预测准确性。

更新时间: 2024-10-18 18:52:45

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2305.17028v5

Making LLMs Vulnerable to Prompt Injection via Poisoning Alignment

In a prompt injection attack, an attacker injects a prompt into the original one, aiming to make the LLM follow the injected prompt and perform a task chosen by the attacker. Existing prompt injection attacks primarily focus on how to blend the injected prompt into the original prompt without altering the LLM itself. Our experiments show that these attacks achieve some success, but there is still significant room for improvement. In this work, we show that an attacker can boost the success of prompt injection attacks by poisoning the LLM's alignment process. Specifically, we propose PoisonedAlign, a method to strategically create poisoned alignment samples. When even a small fraction of the alignment data is poisoned using our method, the aligned LLM becomes more vulnerable to prompt injection while maintaining its foundational capabilities. The code is available at https://github.com/Sadcardation/PoisonedAlign

Updated: 2024-10-18 18:52:16

标题: 通过毒化对齐使LLMs易受提示注入攻击

摘要: 在提示注入攻击中,攻击者将一个提示注入到原始提示中,目的是让LLM遵循注入的提示并执行攻击者选择的任务。现有的提示注入攻击主要集中在如何将注入的提示融入到原始提示中而不改变LLM本身。我们的实验表明,这些攻击取得了一定的成功,但仍有很大的改进空间。在这项工作中,我们展示了攻击者可以通过毒化LLM的对齐过程来提高提示注入攻击的成功率。具体来说,我们提出了PoisonedAlign,一种策略性地创建毒化对齐样本的方法。当使用我们的方法毒化了即使是一小部分对齐数据时,对齐的LLM变得更容易受到提示注入攻击,同时保持其基础功能。该代码可在https://github.com/Sadcardation/PoisonedAlign 上找到。

更新时间: 2024-10-18 18:52:16

领域: cs.CR,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.14827v1

SPRIG: Improving Large Language Model Performance by System Prompt Optimization

Large Language Models (LLMs) have shown impressive capabilities in many scenarios, but their performance depends, in part, on the choice of prompt. Past research has focused on optimizing prompts specific to a task. However, much less attention has been given to optimizing the general instructions included in a prompt, known as a system prompt. To address this gap, we propose SPRIG, an edit-based genetic algorithm that iteratively constructs prompts from prespecified components to maximize the model's performance in general scenarios. We evaluate the performance of system prompts on a collection of 47 different types of tasks to ensure generalizability. Our study finds that a single optimized system prompt performs on par with task prompts optimized for each individual task. Moreover, combining system and task-level optimizations leads to further improvement, which showcases their complementary nature. Experiments also reveal that the optimized system prompts generalize effectively across model families, parameter sizes, and languages. This study provides insights into the role of system-level instructions in maximizing LLM potential.
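
A toy sketch of an edit-based genetic search over system-prompt components (the component pool, the stand-in scoring function, and the edit operators are invented here; SPRIG's actual operators and its 47-task evaluation are far richer):

import random

random.seed(0)
COMPONENTS = [
    "Think step by step.", "Be concise.", "State your assumptions.",
    "Answer in plain language.", "Double-check arithmetic.",
]

def score(prompt):                     # stand-in for benchmark accuracy
    return len(set(prompt)) + sum(len(c) for c in prompt) % 7

def mutate(prompt):
    prompt = list(prompt)
    op = random.choice(["add", "drop", "swap"])
    if op == "add":
        prompt.append(random.choice(COMPONENTS))
    elif op == "drop" and prompt:
        prompt.pop(random.randrange(len(prompt)))
    elif op == "swap" and len(prompt) >= 2:
        i, j = random.sample(range(len(prompt)), 2)
        prompt[i], prompt[j] = prompt[j], prompt[i]
    return prompt

population = [[random.choice(COMPONENTS)] for _ in range(8)]
for generation in range(20):
    population.sort(key=score, reverse=True)
    survivors = population[:4]                       # selection
    population = survivors + [mutate(p) for p in survivors]
best = max(population, key=score)
print("best system prompt:", " ".join(best))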

Updated: 2024-10-18 18:51:44

标题: SPRIG:通过系统提示优化提高大型语言模型性能

摘要: 大型语言模型(LLMs)在许多场景中展现出令人印象深刻的能力,但它们的性能在一定程度上取决于提示的选择。过去的研究集中在优化特定任务的提示上。然而,对于优化提示中包含的一般指令,即系统提示,却没有受到太多关注。为了弥补这一空白,我们提出了SPRIG,一种基于编辑的遗传算法,它从预先指定的组件中迭代构建提示,以最大化模型在一般场景中的性能。我们评估了系统提示在47种不同类型的任务集合上的表现,以确保泛化能力。我们的研究发现,一个经过优化的系统提示与为每个单独任务优化的任务提示表现相当。此外,将系统和任务级别的优化结合起来会进一步提高性能,展示了它们互补的性质。实验还表明,经过优化的系统提示在模型系列、参数大小和语言之间具有有效的泛化能力。这项研究为最大化LLM潜力中系统级指令的作用提供了见解。

更新时间: 2024-10-18 18:51:44

领域: cs.CL,cs.AI,cs.HC,cs.LG

下载: http://arxiv.org/abs/2410.14826v1

Categorical composable cryptography: extended version

We formalize the simulation paradigm of cryptography in terms of category theory and show that protocols secure against abstract attacks form a symmetric monoidal category, thus giving an abstract model of composable security definitions in cryptography. Our model is able to incorporate computational security, set-up assumptions and various attack models such as colluding or independently acting subsets of adversaries in a modular, flexible fashion. We conclude by using string diagrams to rederive the security of the one-time pad, correctness of Diffie-Hellman key exchange and no-go results concerning the limits of bipartite and tripartite cryptography, ruling out e.g., composable commitments and broadcasting. On the way, we exhibit two categorical constructions of resource theories that might be of independent interest: one capturing resources shared among multiple parties and one capturing resource conversions that succeed asymptotically. This is a corrected version of the paper arXiv:2208.13232 published originally on December 18, 2023.

Updated: 2024-10-18 18:47:36

标题: 分类可组合密码学:扩展版本

摘要: 我们用范畴论的语言形式化了密码学的模拟范式,并证明针对抽象攻击安全的协议构成一个对称幺半范畴,从而给出密码学中可组合安全定义的抽象模型。我们的模型能够以模块化、灵活的方式整合计算安全性、设置假设以及各种攻击模型,如勾结或独立行动的对手子集。最后,我们使用弦图重新推导了一次一密(one-time pad)的安全性、Diffie-Hellman密钥交换的正确性,以及关于双方和三方密码学局限性的不可行(no-go)结果,例如排除了可组合的承诺和广播。在此过程中,我们展示了两种可能具有独立价值的资源理论的范畴构造:一种刻画多方共享的资源,另一种刻画渐近成功的资源转换。 这是论文arXiv:2208.13232的修正版本,原版最初发表于2023年12月18日。

更新时间: 2024-10-18 18:47:36

领域: cs.CR,math.CT

下载: http://arxiv.org/abs/2208.13232v5

Optimization-based Causal Estimation from Heterogenous Environments

This paper presents a new optimization approach to causal estimation. Given data that contains covariates and an outcome, which covariates are causes of the outcome, and what is the strength of the causality? In classical machine learning (ML), the goal of optimization is to maximize predictive accuracy. However, some covariates might exhibit a non-causal association with the outcome. Such spurious associations provide predictive power for classical ML, but they prevent us from causally interpreting the result. This paper proposes CoCo, an optimization algorithm that bridges the gap between pure prediction and causal inference. CoCo leverages the recently-proposed idea of environments, datasets of covariates/response where the causal relationships remain invariant but where the distribution of the covariates changes from environment to environment. Given datasets from multiple environments-and ones that exhibit sufficient heterogeneity-CoCo maximizes an objective for which the only solution is the causal solution. We describe the theoretical foundations of this approach and demonstrate its effectiveness on simulated and real datasets. Compared to classical ML and existing methods, CoCo provides more accurate estimates of the causal model and more accurate predictions under interventions.
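
A small illustration of the environments idea (the data-generating process is mine, and plain least squares below is the classical-ML baseline, not CoCo's objective): the causal coefficient stays invariant across environments while the spurious one shifts, which is exactly the signal an invariance-seeking objective can exploit.

import numpy as np

rng = np.random.default_rng(0)
for env_strength in (0.5, 2.0):                      # two environments
    n = 5000
    x_causal = rng.standard_normal(n)
    y = 1.5 * x_causal + rng.standard_normal(n)      # invariant causal mechanism
    x_spur = env_strength * y + rng.standard_normal(n)  # env-dependent child of y
    X = np.column_stack([x_causal, x_spur])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(f"env_strength={env_strength}: OLS coefficients = {np.round(beta, 2)}")
# OLS assigns unstable weight to the spurious feature; an invariance objective
# keeps only the predictor whose relationship to y does not move across
# environments.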

Updated: 2024-10-18 18:46:28

标题: 基于优化的异质环境因果估计

摘要: 本文提出了一种新的因果估计优化方法。给定包含协变量和结果的数据,哪些协变量是结果的原因,以及原因的强度是多少?在经典机器学习(ML)中,优化的目标是最大化预测准确性。然而,一些协变量可能表现出与结果的非因果关联。这种虚假关联为经典ML提供了预测能力,但阻止我们对结果进行因果解释。本文提出了CoCo,一种优化算法,弥合了纯预测和因果推断之间的差距。CoCo利用了最近提出的环境概念,即协变量/响应数据集,在这些环境中,因果关系保持不变,但协变量的分布从一个环境到另一个环境会发生变化。给定来自多个环境的数据集,并且这些数据集展现出足够的异质性,CoCo最大化一个目标函数,唯一的解就是因果解。我们描述了这种方法的理论基础,并在模拟和真实数据集上展示了其有效性。与经典ML和现有方法相比,CoCo提供了更准确的因果模型估计和在干预下更准确的预测。

更新时间: 2024-10-18 18:46:28

领域: stat.ME,cs.LG,stat.ML

下载: http://arxiv.org/abs/2109.11990v4

Retrieval-Enhanced Machine Learning: Synthesis and Opportunities

In the field of language modeling, models augmented with retrieval components have emerged as a promising solution to address several challenges faced in the natural language processing (NLP) field, including knowledge grounding, interpretability, and scalability. Despite the primary focus on NLP, we posit that the paradigm of retrieval-enhancement can be extended to a broader spectrum of machine learning (ML) such as computer vision, time series prediction, and computational biology. Therefore, this work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature across various ML domains under consistent notation, which the current literature lacks. We also find that while a number of studies employ retrieval components to augment their models, there is a lack of integration with foundational Information Retrieval (IR) research. We bridge this gap between the seminal IR research and contemporary REML studies by investigating each component that comprises the REML framework. Ultimately, the goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.

Updated: 2024-10-18 18:42:25

标题: 检索增强的机器学习:综合与机遇

摘要: 在语言建模领域,增加检索组件的模型已经成为解决自然语言处理(NLP)领域面临的多个挑战的一个有前途的解决方案,包括知识基础、可解释性和可扩展性。尽管主要关注于NLP,我们认为检索增强范式可以扩展到更广泛的机器学习(ML)领域,如计算机视觉、时间序列预测和计算生物学。因此,这项工作通过综合ML各个领域的文献,引入了一个形式化的框架,检索增强机器学习(REML),并使用一致的符号,这在当前文献中是缺少的。此外,我们发现虽然许多研究使用检索组件来增强他们的模型,但缺乏与基础信息检索(IR)研究的整合。我们通过调查构成REML框架的每个组件,弥合了开创性的IR研究和当代REML研究之间的差距。最终,这项工作的目标是为各个学科的研究人员提供一个全面、形式结构化的检索增强模型框架,从而促进跨学科的未来研究。

更新时间: 2024-10-18 18:42:25

领域: cs.LG,cs.CL,cs.IR

下载: http://arxiv.org/abs/2407.12982v2

A Complexity-Based Theory of Compositionality

Compositionality is believed to be fundamental to intelligence. In humans, it underlies the structure of thought, language, and higher-level reasoning. In AI, compositional representations can enable a powerful form of out-of-distribution generalization, in which a model systematically adapts to novel combinations of known concepts. However, while we have strong intuitions about what compositionality is, there currently exists no formal definition for it that is measurable and mathematical. Here, we propose such a definition, which we call representational compositionality, that accounts for and extends our intuitions about compositionality. The definition is conceptually simple, quantitative, grounded in algorithmic information theory, and applicable to any representation. Intuitively, representational compositionality states that a compositional representation satisfies three properties. First, it must be expressive. Second, it must be possible to re-describe the representation as a function of discrete symbolic sequences with re-combinable parts, analogous to sentences in natural language. Third, the function that relates these symbolic sequences to the representation, analogous to semantics in natural language, must be simple. Through experiments on both synthetic and real world data, we validate our definition of compositionality and show how it unifies disparate intuitions from across the literature in both AI and cognitive science. We also show that representational compositionality, while theoretically intractable, can be readily estimated using standard deep learning tools. Our definition has the potential to inspire the design of novel, theoretically-driven models that better capture the mechanisms of compositional thought.

Updated: 2024-10-18 18:37:27

标题: 基于复杂性的组合性理论

摘要: 组合性被认为是智能的基础。在人类中,它构成了思维、语言和高级推理的结构。在人工智能领域,组合性表示可以实现一种强大的超出分布泛化,模型可以系统地适应已知概念的新组合。然而,虽然我们对组合性有着强烈的直觉,但目前还没有一个可测量和数学化的正式定义。在这里,我们提出了这样一个定义,我们称之为表征组合性,它考虑并扩展了我们对组合性的直觉。这个定义在概念上很简单,量化的,基于算法信息理论,并适用于任何表示。直观地说,表征组合性表明一个组合性表示满足三个属性。首先,它必须是表达性的。其次,必须将表示重新描述为具有可重组部分的离散符号序列的函数,类似于自然语言中的句子。第三,将这些符号序列与表示关联的函数,类似于自然语言中的语义,必须是简单的。通过对合成和现实世界数据的实验,我们验证了我们对组合性的定义,并展示了它如何统一了AI和认知科学文献中的不同直觉。我们还展示了,虽然表征组合性在理论上很难处理,但可以通过标准的深度学习工具轻松估计。我们的定义有潜力激发设计新颖、理论驱动的模型的灵感,更好地捕捉组合性思维的机制。

更新时间: 2024-10-18 18:37:27

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.14817v1

Adapting Multilingual LLMs to Low-Resource Languages using Continued Pre-training and Synthetic Corpus

Multilingual LLMs support a variety of languages; however, their performance is suboptimal for low-resource languages. In this work, we emphasize the importance of continued pre-training of multilingual LLMs and the use of translation-based synthetic pre-training corpora for improving LLMs in low-resource languages. We conduct our study in the context of the low-resource Indic language Hindi. We introduce Nemotron-Mini-Hindi 4B, a bilingual SLM supporting both Hindi and English, based on Nemotron-Mini 4B. The model is trained using a mix of real and synthetic Hindi + English tokens, with continuous pre-training performed on 400B tokens. We demonstrate that both the base and instruct models achieve state-of-the-art results on Hindi benchmarks while remaining competitive on English tasks. Additionally, we observe that the continued pre-training approach enhances the model's overall factual accuracy.

Updated: 2024-10-18 18:35:19

标题: 将多语言LLMs适应低资源语言:使用持续预训练和合成语料库

摘要: 多语言LLMs支持多种语言;然而,它们在低资源语言方面的性能并不理想。在这项工作中,我们强调继续对多语言LLMs进行预训练以及使用基于翻译的合成预训练语料库来改进低资源语言中的LLMs的重要性。我们在低资源Indic语言印地语的背景下进行研究。我们介绍了Nemotron-Mini-Hindi 4B,一个基于Nemotron-Mini 4B、支持印地语和英语的双语SLM。该模型使用真实和合成的印地语+英语标记混合进行训练,并在400B标记上进行持续预训练。我们证明基础模型和指令模型在印地语基准测试中均取得了最先进的结果,同时在英语任务上保持竞争力。此外,我们观察到持续预训练方法增强了模型的整体事实准确性。

更新时间: 2024-10-18 18:35:19

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.14815v1

Effects of Soft-Domain Transfer and Named Entity Information on Deception Detection

In the modern age an enormous amount of communication occurs online, and it is difficult to know when something written is genuine or deceitful. There are many reasons for someone to deceive online (e.g., monetary gain, political gain) and detecting this behavior without any physical interaction is a difficult task. Additionally, deception occurs in several text-only domains and it is unclear if these various sources can be leveraged to improve detection. To address this, eight datasets were utilized from various domains to evaluate their effect on classifier performance when combined with transfer learning via intermediate layer concatenation of fine-tuned BERT models. We find improvements in accuracy over the baseline. Furthermore, we evaluate multiple distance measurements between datasets and find that Jensen-Shannon distance correlates moderately with transfer learning performance. Finally, we evaluated the impact on BERT performance of multiple methods that enrich a dataset's text with named-entity information, and we find notable improvements in accuracy of up to 11.2%.
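
The dataset-distance measurement is straightforward to reproduce in spirit; a sketch (toy unigram distributions here, whereas the paper compares full datasets):

from collections import Counter

import numpy as np
from scipy.spatial.distance import jensenshannon

def unigram_dist(texts, vocab):
    counts = Counter(w for t in texts for w in t.lower().split())
    v = np.array([counts[w] for w in vocab], dtype=float) + 1.0  # add-one smoothing
    return v / v.sum()

a = ["the deal is totally genuine trust me", "send the payment today"]
b = ["the product works as described", "shipping took three days"]
vocab = sorted({w for t in a + b for w in t.lower().split()})
print("Jensen-Shannon distance:",
      jensenshannon(unigram_dist(a, vocab), unigram_dist(b, vocab)))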

Updated: 2024-10-18 18:35:13

标题: 软域转移和命名实体信息对欺骗检测的影响

摘要: 在现代,大量的交流发生在线上,很难确定某些文字的真实性或欺骗性。有许多原因导致人们在线上欺骗(例如,经济利益、政治利益),而在没有任何实际互动的情况下检测这种行为是一项困难的任务。此外,欺骗行为在多种仅文本的领域中发生,目前尚不清楚这些不同来源是否可以用于提高检测能力。为了解决这个问题,我们利用了来自不同领域的八个数据集,评估当通过微调后BERT模型的中间层拼接进行迁移学习组合时,它们对分类器性能的影响。我们发现与基准相比精度有所提高。此外,我们评估了数据集之间的多种距离测量,并发现Jensen-Shannon距离与迁移学习性能有中等相关性。最后,我们评估了多种通过命名实体为数据集文本添加额外信息的方法对BERT性能的影响,发现准确性有显著提高,最高可达11.2%。

更新时间: 2024-10-18 18:35:13

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.14814v1

The S2 Hierarchical Discrete Global Grid as a Nexus for Data Representation, Integration, and Querying Across Geospatial Knowledge Graphs

Geospatial Knowledge Graphs (GeoKGs) have become integral to the growing field of Geospatial Artificial Intelligence. Initiatives like the U.S. National Science Foundation's Open Knowledge Network program aim to create an ecosystem of nation-scale, cross-disciplinary GeoKGs that provide AI-ready geospatial data aligned with FAIR principles. However, building this infrastructure presents key challenges, including 1) managing large volumes of data, 2) the computational complexity of discovering topological relations via SPARQL, and 3) conflating multi-scale raster and vector data. Discrete Global Grid Systems (DGGS) help tackle these issues by offering efficient data integration and representation strategies. The KnowWhereGraph utilizes Google's S2 Geometry -- a DGGS framework -- to enable efficient multi-source data processing, qualitative spatial querying, and cross-graph integration. This paper outlines the implementation of S2 within KnowWhereGraph, emphasizing its role in topologically enriching and semantically compressing data. Ultimately, this work demonstrates the potential of DGGS frameworks, particularly S2, for building scalable GeoKGs.
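
For readers unfamiliar with S2, a small example of the hierarchical indexing this design relies on, using the s2sphere package (level 13 is an arbitrary choice here, not necessarily KnowWhereGraph's):

import s2sphere

point = s2sphere.LatLng.from_degrees(34.0522, -118.2437)   # Los Angeles
leaf = s2sphere.CellId.from_lat_lng(point)
cell = leaf.parent(13)                 # coarser cell at hierarchy level 13
print("cell token:", cell.to_token())
# Entities snapped to the same cell (or to an ancestor cell) can be related by
# cheap token comparisons instead of expensive SPARQL topology checks.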

Updated: 2024-10-18 18:30:05

标题: S2分层离散全局网格作为地理空间知识图数据表示、集成和查询的纽带

摘要: 地理知识图谱(GeoKGs)已成为不断发展的地理人工智能领域的重要组成部分。美国国家科学基金会的开放知识网络计划等倡议旨在创建一个以FAIR原则为基础的国家规模、跨学科的GeoKG生态系统,提供与人工智能兼容的地理空间数据。然而,构建这种基础设施面临着关键挑战,包括1)管理大量数据,2)通过SPARQL发现拓扑关系的计算复杂性,以及3)混合多尺度栅格和矢量数据。离散全球网格系统(DGGS)通过提供高效的数据集成和表示策略来应对这些问题。 KnowWhereGraph利用谷歌的S2几何结构(一种DGGS框架)实现了高效的多源数据处理、定性空间查询和跨图集成。本文概述了KnowWhereGraph中S2的实现,强调其在拓扑丰富化和语义压缩数据中的作用。最终,这项工作展示了DGGS框架(尤其是S2)在构建可扩展的GeoKGs方面的潜力。

更新时间: 2024-10-18 18:30:05

领域: cs.AI,cs.IR

下载: http://arxiv.org/abs/2410.14808v1

Geometry-Aware Generative Autoencoders for Warped Riemannian Metric Learning and Generative Modeling on Data Manifolds

Rapid growth of high-dimensional datasets in fields such as single-cell RNA sequencing and spatial genomics has led to unprecedented opportunities for scientific discovery, but it also presents unique computational and statistical challenges. Traditional methods struggle with geometry-aware data generation, interpolation along meaningful trajectories, and transporting populations via feasible paths. To address these issues, we introduce Geometry-Aware Generative Autoencoder (GAGA), a novel framework that combines extensible manifold learning with generative modeling. GAGA constructs a neural network embedding space that respects the intrinsic geometries discovered by manifold learning and learns a novel warped Riemannian metric on the data space. This warped metric is derived from both the points on the data manifold and negative samples off the manifold, allowing it to characterize a meaningful geometry across the entire latent space. Using this metric, GAGA can uniformly sample points on the manifold, generate points along geodesics, and interpolate between populations across the learned manifold using geodesic-guided flows. GAGA shows competitive performance in simulated and real-world datasets, including a 30% improvement over the state-of-the-art methods in single-cell population-level trajectory inference.

Updated: 2024-10-18 18:27:10

标题: 几何感知生成自动编码器用于扭曲里曼度量学习和数据流形上的生成建模

摘要: 高维数据集在单细胞RNA测序和空间基因组学等领域的快速增长,为科学发现提供了前所未有的机会,但也带来了独特的计算和统计挑战。传统方法在几何感知的数据生成、沿有意义轨迹的插值以及经由可行路径在群体间传输方面存在困难。为了解决这些问题,我们引入了几何感知生成自编码器(GAGA),这是一个将可扩展流形学习与生成建模相结合的新框架。GAGA构建了一个神经网络嵌入空间,尊重流形学习所发现的固有几何,并学习了数据空间上的一种新型扭曲黎曼度量。这种扭曲度量来自于数据流形上的点和流形之外的负样本,使其能够表征整个潜在空间中的有意义几何。利用这个度量,GAGA可以均匀地在流形上采样点,沿着测地线生成点,并使用测地线引导的流在学习的流形上在不同群体之间进行插值。GAGA在模拟和真实数据集中展现出竞争性表现,包括在单细胞群体层面轨迹推断中比最先进方法提高30%。

更新时间: 2024-10-18 18:27:10

领域: cs.LG,math.DG,stat.ML

下载: http://arxiv.org/abs/2410.12779v2

More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing

The evolution of biological neural systems has led to both modularity and sparse coding, which enables efficiency in energy usage, and robustness across the diversity of tasks in the lifespan. In contrast, standard neural networks rely on dense, non-specialized architectures, where all model parameters are simultaneously updated to learn multiple tasks, leading to representation interference. Current sparse neural network approaches aim to alleviate this issue, but are often hindered by limitations such as 1) trainable gating functions that cause representation collapse; 2) non-overlapping experts that result in redundant computation and slow learning; and 3) reliance on explicit input or task IDs that impose significant constraints on flexibility and scalability. In this paper we propose Conditionally Overlapping Mixture of ExperTs (COMET), a general deep learning method that addresses these challenges by inducing a modular, sparse architecture with an exponential number of overlapping experts. COMET replaces the trainable gating function used in Sparse Mixture of Experts with a fixed, biologically inspired random projection applied to individual input representations. This design causes the degree of expert overlap to depend on input similarity, so that similar inputs tend to share more parameters. This facilitates positive knowledge transfer, resulting in faster learning and improved generalization. We demonstrate the effectiveness of COMET on a range of tasks, including image classification, language modeling, and regression, using several popular deep learning architectures.
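
A minimal sketch of fixed-random-projection routing (dimensions and the top-k rule are my choices): because the projection is fixed, similar inputs receive similar scores and therefore activate overlapping subsets of experts, with no trainable gate to collapse.

import torch

torch.manual_seed(0)
d_model, num_experts, k_active = 64, 16, 4
R = torch.randn(d_model, num_experts)            # fixed projection, never trained

def route(x):                                    # x: (batch, d_model)
    scores = x @ R
    topk = scores.topk(k_active, dim=-1).indices
    mask = torch.zeros(x.shape[0], num_experts).scatter_(1, topk, 1.0)
    return mask                                  # binary expert-activation mask

x = torch.randn(1, d_model)
x_similar = x + 0.01 * torch.randn(1, d_model)
print(route(x))
print(route(x_similar))                          # masks overlap heavily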

Updated: 2024-10-18 18:26:38

标题: 比星系更多的专家:具有生物启发固定路由的条件重叠专家

摘要: 生物神经系统的进化导致了模块化和稀疏编码,这使得能效更高,并且在整个生命周期中跨任务的多样性中更加稳健。相比之下,标准神经网络依赖于密集的、非专门化的架构,其中所有模型参数同时更新以学习多个任务,导致表示干扰。当前稀疏神经网络方法旨在缓解这一问题,但往往受到诸如1)可训练的门控函数导致表示坍塌;2)不重叠的专家导致冗余计算和学习缓慢;以及3)依赖明确输入或任务ID对灵活性和可扩展性施加重大限制等局限性的阻碍。在本文中,我们提出了条件重叠专家混合模型(COMET),这是一种通用的深度学习方法,旨在通过引入具有指数数量重叠专家的模块化稀疏结构来解决这些挑战。COMET用生物启发的固定随机投影替代了稀疏专家混合模型中使用的可训练门控函数,应用于单个输入表示。这种设计导致专家重叠程度取决于输入相似性,因此相似的输入倾向于共享更多参数。这有助于积极的知识传递,导致更快的学习和改善的泛化能力。我们通过使用多种流行的深度学习架构,在包括图像分类、语言建模和回归在内的一系列任务上展示了COMET的有效性。

更新时间: 2024-10-18 18:26:38

领域: cs.LG

下载: http://arxiv.org/abs/2410.08003v2

Aligning AI Agents via Information-Directed Sampling

The staggering feats of AI systems have brought to attention the topic of AI Alignment: aligning a "superintelligent" AI agent's actions with humanity's interests. Many existing frameworks/algorithms in alignment study the problem on a myopic horizon or study learning from human feedback in isolation, relying on the contrived assumption that the agent has already perfectly identified the environment. As a starting point to address these limitations, we define a class of bandit alignment problems as an extension of classic multi-armed bandit problems. A bandit alignment problem involves an agent tasked with maximizing long-run expected reward by interacting with an environment and a human, both involving details/preferences initially unknown to the agent. The reward of actions in the environment depends on both observed outcomes and human preferences. Furthermore, costs are associated with querying the human to learn preferences. Therefore, an effective agent ought to intelligently trade-off exploration (of the environment and human) and exploitation. We study these trade-offs theoretically and empirically in a toy bandit alignment problem which resembles the beta-Bernoulli bandit. We demonstrate that while naive exploration algorithms reflecting current practice, and even well-regarded algorithms such as Thompson sampling, fail to provide acceptable solutions to this problem, information-directed sampling achieves favorable regret.
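
For reference, a standard beta-Bernoulli Thompson sampler, i.e. the kind of baseline the paper argues becomes insufficient once unknown human preferences and query costs enter the problem (the alignment variant and information-directed sampling are not implemented here):

import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])           # unknown arm reward probabilities
alpha, beta = np.ones(3), np.ones(3)             # Beta(1, 1) priors per arm
total = 0.0
for t in range(2000):
    arm = int(np.argmax(rng.beta(alpha, beta)))  # sample beliefs, act greedily
    r = float(rng.random() < true_means[arm])
    alpha[arm] += r
    beta[arm] += 1.0 - r
    total += r
print("average reward:", total / 2000,
      "posterior means:", np.round(alpha / (alpha + beta), 2))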

Updated: 2024-10-18 18:23:41

标题: 通过信息导向抽样对齐人工智能代理

摘要: 人工智能系统的惊人成就使人们开始关注人工智能对齐问题:使“超智能”人工智能代理的行动与人类利益保持一致。许多现有的对齐研究框架/算法要么在短视的时间范围内研究该问题,要么孤立地研究从人类反馈中学习,依赖于一个人为设定的假设,即代理已经完美地识别了环境。作为解决这些局限性的起点,我们将一类老虎机对齐问题定义为经典多臂老虎机问题的扩展。老虎机对齐问题涉及一个代理,其任务是通过与环境和人类交互来最大化长期期望奖励,二者都包含代理最初未知的细节/偏好。环境中行动的奖励取决于观察到的结果和人类偏好。此外,向人类查询偏好会产生成本。因此,一个有效的代理应该智能地在探索(环境和人类)与利用之间进行权衡。我们在一个类似于贝塔-伯努利老虎机的玩具老虎机对齐问题中从理论和经验上研究了这些权衡。我们证明,反映当前实践的朴素探索算法,乃至汤普森采样等备受推崇的算法,都未能为该问题提供可接受的解决方案,而信息导向采样取得了有利的遗憾(regret)界。

更新时间: 2024-10-18 18:23:41

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14807v1

DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents

On-device control agents, especially on mobile devices, are responsible for operating mobile devices to fulfill users' requests, enabling seamless and intuitive interactions. Integrating Multimodal Large Language Models (MLLMs) into these agents enhances their ability to understand and execute complex commands, thereby improving user experience. However, fine-tuning MLLMs for on-device control presents significant challenges due to limited data availability and inefficient online training processes. This paper introduces DistRL, a novel framework designed to enhance the efficiency of online RL fine-tuning for mobile device control agents. DistRL employs centralized training and decentralized data acquisition to ensure efficient fine-tuning in the context of dynamic online interactions. Additionally, the framework is backed by our tailor-made RL algorithm, which effectively balances exploration with the prioritized utilization of collected data to ensure stable and robust training. Our experiments show that, on average, DistRL delivers a 3X improvement in training efficiency and enables training data collection 2.4X faster than the leading synchronous multi-machine methods. Notably, after training, DistRL achieves a 20% relative improvement in success rate compared to state-of-the-art methods on general Android tasks from an open benchmark, significantly outperforming existing approaches while maintaining the same training time. These results validate DistRL as a scalable and efficient solution, offering substantial improvements in both training efficiency and agent performance for real-world, in-the-wild device control tasks.

Updated: 2024-10-18 18:19:56

标题: DistRL:一种异步分布式强化学习框架,用于设备上的控制代理

摘要: 在设备上的控制代理,特别是在移动设备上,负责操作移动设备以满足用户的请求,实现无缝和直观的交互。将多模态大型语言模型(MLLMs)整合到这些代理中,增强它们理解和执行复杂命令的能力,从而提高用户体验。然而,为设备上的控制任务微调MLLMs存在重大挑战,因为可用数据有限且在线训练过程低效。本文介绍了DistRL,这是一个旨在增强移动设备控制代理在线强化学习微调效率的新框架。DistRL采用集中式训练和分散式数据采集,以确保在动态在线交互环境中的有效微调。此外,该框架由我们量身定制的强化学习算法支撑,该算法有效平衡探索与对已收集数据的优先利用,以确保稳定和健壮的训练。我们的实验表明,DistRL平均提高了3倍的训练效率,并使训练数据收集速度比领先的同步多机方法快2.4倍。值得注意的是,在训练后,与来自开放基准测试的通用Android任务的最新方法相比,DistRL的成功率相对提高了20%,明显优于现有方法,同时保持相同的训练时间。这些结果验证了DistRL作为一个可扩展和高效的解决方案,为现实世界中的设备控制任务提供了训练效率和代理性能的实质性改进。

更新时间: 2024-10-18 18:19:56

领域: cs.LG,cs.AI,cs.DC,cs.SY,eess.SY

下载: http://arxiv.org/abs/2410.14803v1

Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems

Sharpness-aware minimization (SAM) improves generalization of various deep learning tasks. Motivated by popular architectures such as LoRA, we explore the implicit regularization of SAM for scale-invariant problems involving two groups of variables. Instead of focusing on commonly used sharpness, this work introduces a concept termed balancedness, defined as the difference between the squared norm of two variables. This allows us to depict richer global behaviors of SAM. In particular, our theoretical and empirical findings reveal that i) SAM promotes balancedness; and ii) the regularization on balancedness is data-responsive -- outliers have stronger impact. The latter coincides with empirical observations that SAM outperforms SGD in the presence of outliers. Leveraging the implicit regularization, we develop a resource-efficient SAM variant, balancedness-aware regularization (BAR), tailored for scale-invariant problems such as finetuning language models with LoRA. BAR saves 95% computational overhead of SAM, with enhanced test performance across various tasks on RoBERTa, GPT2, and OPT-1.3B.
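
The balancedness effect can be seen on a toy problem (the objective, rho, and step counts are my choices): on the scale-invariant loss L(x, y) = (xy - 1)^2, gradient flow conserves x^2 - y^2, so any sustained shrinkage under SAM is the implicit regularization at work.

import torch

def run(rho, steps=10000, lr=0.01):
    x = torch.tensor(3.0, requires_grad=True)
    y = torch.tensor(0.2, requires_grad=True)
    for _ in range(steps):
        loss = (x * y - 1.0) ** 2
        gx, gy = torch.autograd.grad(loss, (x, y))
        if rho > 0:
            # SAM: take an ascent step of size rho toward the worst case
            norm = torch.sqrt(gx ** 2 + gy ** 2) + 1e-12
            xe, ye = x + rho * gx / norm, y + rho * gy / norm
            loss_p = (xe * ye - 1.0) ** 2
            gx, gy = torch.autograd.grad(loss_p, (xe, ye))
        with torch.no_grad():
            x -= lr * gx
            y -= lr * gy
    return (x ** 2 - y ** 2).item()

print("balancedness x^2 - y^2 after GD :", run(rho=0.0))   # stays near 8.96
print("balancedness x^2 - y^2 after SAM:", run(rho=0.1))   # visibly smaller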

Updated: 2024-10-18 18:19:18

标题: 尺度不变问题的锐度感知最小化的隐式正则化

摘要: 锐度感知最小化(SAM)改善了各种深度学习任务的泛化能力。受到LoRA等流行架构的启发,我们探索了SAM对涉及两组变量的尺度不变问题的隐式正则化。与专注于常用锐度的方法不同,本文介绍了一个称为平衡性的概念,定义为两个变量的平方范数之间的差异。这使我们能够描绘SAM的更丰富的全局行为。特别地,我们的理论和实证发现表明,i)SAM促进了平衡性;ii)对平衡性的正则化是数据响应的--异常值具有更强的影响力。后者与实证观察相吻合,即在存在异常值时SAM优于SGD。利用隐式正则化,我们开发了一种资源高效的SAM变体,称为平衡性感知正则化(BAR),专门针对涉及尺度不变问题的任务,例如使用LoRA对语言模型进行微调。BAR节省了SAM 95%的计算开销,同时在RoBERTa、GPT2和OPT-1.3B上的各种任务中提高了测试性能。

更新时间: 2024-10-18 18:19:18

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.14802v1

Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors

In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs). The knowledge acquired during pre-training is crucial for this few-shot capability, providing the model with task priors. However, recent studies have shown that ICL predominantly relies on retrieving task priors rather than "learning" to perform tasks. This limitation is particularly evident in complex subjective domains such as emotion and morality, where priors significantly influence posterior predictions. In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt. Moreover, we evaluate the posterior bias towards certain annotators by grounding our study in appropriate, quantitative measures of LLM priors. Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead. However, aggregation does not explain the entire gap between ICL and the state of the art, meaning other factors in such tasks also account for the observed phenomena. Finally, by rigorously studying annotator-level labels, we find that it is possible for minority annotators to both better align with LLMs and have their perspectives further amplified.

Updated: 2024-10-18 18:17:41

标题: 主观任务中的聚合伪影使大型语言模型的后验坍塌

摘要: 上下文学习(ICL)已成为使用大型语言模型(LLMs)执行自然语言任务的主要方法。预训练期间获得的知识对于这种少样本能力至关重要,为模型提供了任务先验。然而,最近的研究表明,ICL主要依赖于检索任务先验,而不是“学习”执行任务。这种限制在情感和道德等复杂主观领域特别明显,其中先验显著影响后验预测。在这项工作中,我们研究了这是否是相应数据集中使用的聚合的结果,因为试图合并一致性低、彼此不同的标注可能产生标注伪影,从而在提示中引入有害噪声。此外,通过将我们的研究基于适当的、定量的LLM先验度量,我们评估了后验偏向于某些标注者的偏见。我们的结果表明,聚合是建模主观任务中的混淆因素,并主张专注于建模个体。然而,聚合并不能解释ICL与最先进水平之间的全部差距,这意味着此类任务中的其他因素也解释了观察到的现象。最后,通过严格研究标注者级别的标签,我们发现少数派标注者可能与LLMs更好地对齐,并且他们的观点可能会被进一步放大。

更新时间: 2024-10-18 18:17:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.13776v2

Deep Generic Dynamic Object Detection Based on Dynamic Grid Maps

This paper describes a method to detect generic dynamic objects for automated driving. First, a LiDAR-based dynamic grid is generated online. Second, a deep learning-based detector is trained on the dynamic grid to infer the presence of dynamic objects of any type, which is a prerequisite for safe automated vehicles in arbitrary, edge-case scenarios. The Rotation-equivariant Detector (ReDet) - originally designed for oriented object detection on aerial images - was chosen due to its high detection performance. Experiments are conducted based on real sensor data and the benefits in comparison to classic dynamic cell clustering strategies are highlighted. The false positive object detection rate is strongly reduced by the proposed approach.

Updated: 2024-10-18 18:15:32

标题: 基于动态网格地图的深度通用动态物体检测

摘要: 本文描述了一种用于自动驾驶的通用动态物体检测方法。首先,在线生成基于LiDAR的动态网格。其次,基于深度学习的检测器在动态网格上进行训练,以推断任何类型的动态物体的存在,这是在任意边缘案例场景中实现安全自动化车辆的先决条件。由于其出色的检测性能,我们选择了最初为航空图像上的有向目标检测而设计的旋转等变检测器(ReDet)。基于真实传感器数据进行实验,并突出与经典动态单元聚类策略相比的优点。所提出的方法大大降低了误报物体检测率。

更新时间: 2024-10-18 18:15:32

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.14799v1

CountCrypt: Quantum Cryptography between QCMA and PP

We construct a quantum oracle relative to which BQP = QCMA but quantum-computation-classical-communication (QCCC) key exchange, QCCC commitments, and two-round quantum key distribution exist. We also construct an oracle relative to which BQP = QMA, but quantum lightning (a stronger variant of quantum money) exists. This extends previous work by Kretschmer [Kretschmer, TQC22], which showed that there is a quantum oracle relative to which BQP = QMA but pseudorandom state generators (a quantum variant of pseudorandom generators) exist. We also show that QCCC key exchange, QCCC commitments, and two-round quantum key distribution can all be used to build one-way puzzles. One-way puzzles are a version of "quantum samplable" one-wayness and are an intermediate primitive between pseudorandom state generators and EFI pairs, the minimal quantum primitive. In particular, one-way puzzles cannot exist if BQP = PP. Our results together imply that aside from pseudorandom state generators, there is a large class of quantum cryptographic primitives which can exist even if BQP = QCMA, but are broken if BQP = PP. Furthermore, one-way puzzles are a minimal primitive for this class. We denote this class "CountCrypt".

Updated: 2024-10-18 18:04:27

标题: CountCrypt:QCMA和PP之间的量子密码学

摘要: 我们构建了一个量子预言机,相对于它BQP = QCMA,但存在量子计算-经典通信(QCCC)密钥交换、QCCC承诺以及两轮量子密钥分发。我们还构建了一个预言机,相对于它BQP = QMA,但存在量子闪电(量子货币的一种更强变体)。这扩展了Kretschmer [Kretschmer, TQC22] 的先前工作,该工作表明存在一个量子预言机,相对于它BQP = QMA,但存在伪随机态生成器(伪随机生成器的量子变体)。我们还表明,QCCC密钥交换、QCCC承诺以及两轮量子密钥分发都可以用来构建单向谜题。单向谜题是“量子可采样”单向性的一种版本,是介于伪随机态生成器与EFI对(最小量子原语)之间的中间原语。特别地,如果BQP = PP,则单向谜题不存在。我们的结果共同表明,除了伪随机态生成器外,还有一大类量子密码原语即使在BQP = QCMA时也可以存在,但在BQP = PP时会被破解。此外,单向谜题是这一类的最小原语。我们将这一类称为“CountCrypt”。

更新时间: 2024-10-18 18:04:27

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2410.14792v1

Differentially Private Covariate Balancing Causal Inference

Differential privacy is the leading mathematical framework for privacy protection, providing a probabilistic guarantee that safeguards individuals' private information when publishing statistics from a dataset. This guarantee is achieved by applying a randomized algorithm to the original data, which introduces unique challenges in data analysis by distorting inherent patterns. In particular, causal inference using observational data in privacy-sensitive contexts is challenging because it requires covariate balance between treatment groups, yet checking the true covariates is prohibited to prevent leakage of sensitive information. In this article, we present a differentially private two-stage covariate balancing weighting estimator to infer causal effects from observational data. Our algorithm produces both point and interval estimators with statistical guarantees, such as consistency and rate optimality, under a given privacy budget.

Updated: 2024-10-18 18:02:13

标题: 差异隐私协变量平衡因果推断

摘要: 差分隐私是隐私保护的领先数学框架,提供了一种概率保证,可以在发布数据集统计信息时保护个人的私人信息。这种保证是通过将随机算法应用于原始数据来实现的,这在数据分析中引入了独特的挑战,扭曲了固有模式。特别是,在隐私敏感的情境中使用观测数据进行因果推断是具有挑战性的,因为它要求在处理组之间实现协变量平衡,然而检查真实的协变量是被禁止的,以防止敏感信息的泄漏。在本文中,我们提出了一种差分隐私的两阶段协变量平衡加权估计器,用于从观测数据中推断因果效应。我们的算法可以在给定的隐私预算下产生点估计和区间估计,并具有统计保证,如一致性和速率最优性。

更新时间: 2024-10-18 18:02:13

领域: stat.ME,cs.CR,cs.LG

下载: http://arxiv.org/abs/2410.14789v1

Simultaneously Solving FBSDEs with Neural Operators of Logarithmic Depth, Constant Width, and Sub-Linear Rank

Forward-backward stochastic differential equations (FBSDEs) are central in optimal control, game theory, economics, and mathematical finance. Unfortunately, the available FBSDE solvers operate on \textit{individual} FBSDEs, meaning that they cannot provide a computationally feasible strategy for solving large families of FBSDEs as these solvers must be re-run several times. \textit{Neural operators} (NOs) offer an alternative approach for \textit{simultaneously solving} large families of FBSDEs by directly approximating the solution operator mapping \textit{inputs:} terminal conditions and dynamics of the backward process to \textit{outputs:} solutions to the associated FBSDE. Though universal approximation theorems (UATs) guarantee the existence of such NOs, these NOs are unrealistically large. We confirm that ``small'' NOs can uniformly approximate the solution operator to structured families of FBSDEs with random terminal time, uniformly on suitable compact sets determined by Sobolev norms, to any prescribed error $\varepsilon>0$ using a depth of $\mathcal{O}(\log(1/\varepsilon))$, a width of $\mathcal{O}(1)$, and a sub-linear rank; i.e. $\mathcal{O}(1/\varepsilon^r)$ for some $r<1$. This result is rooted in our second main contribution, which shows that convolutional NOs of similar depth, width, and rank can approximate the solution operator to a broad class of Elliptic PDEs. A key insight here is that the convolutional layers of our NO can efficiently encode the Green's function associated to the Elliptic PDEs linked to our FBSDEs. A byproduct of our analysis is the first theoretical justification for the benefit of lifting channels in NOs: they exponentially decelerate the growth rate of the NO's rank.

Updated: 2024-10-18 18:01:40

标题: 使用对数深度、常数宽度和次线性秩的神经算子同时求解FBSDEs

摘要: 前向-后向随机微分方程(FBSDEs)在最优控制、博弈论、经济学和数学金融中起着核心作用。不幸的是,现有的FBSDE求解器只能操作\textit{单个}FBSDE,这意味着它们无法为解决大量FBSDE提供计算可行的策略,因为这些求解器必须多次重新运行。\textit{神经算子}(NOs)通过直接逼近解算符映射\textit{输入:}终端条件和后向过程动力学到\textit{输出:}相关FBSDE的解,提供了一种\textit{同时解决}大量FBSDE的替代方法。尽管普适逼近定理(UATs)保证了这样的NOs的存在,但这些NOs实际上是非常庞大的。我们证实“小型”NOs可以在适当由Sobolev范数确定的紧致集上,均匀地逼近随机终端时间的结构化FBSDE族的解算符,以任意预设误差$\varepsilon>0$,使用$\mathcal{O}(\log(1/\varepsilon))$的深度,$\mathcal{O}(1)$的宽度和次线性秩;即$\mathcal{O}(1/\varepsilon^r)$,其中$r<1$。这一结果根植于我们的第二个主要贡献,该贡献表明深度、宽度和秩相似的卷积NOs可以逼近广泛类别的椭圆PDE的解算符。这里的一个关键洞察是我们的NO的卷积层可以有效地编码与我们的FBSDE相关联的椭圆PDE的Green函数。我们分析的一个副产品是对NOs提升通道的好处的首个理论证明:它们指数地减缓了NO秩的增长速率。

更新时间: 2024-10-18 18:01:40

领域: math.OC,cs.LG,cs.NA,math.NA,math.PR,q-fin.CP

下载: http://arxiv.org/abs/2410.14788v1

Privacy for Free in the Over-Parameterized Regime

Differentially private gradient descent (DP-GD) is a popular algorithm to train deep learning models with provable guarantees on the privacy of the training data. In the last decade, the problem of understanding its performance cost with respect to standard GD has received remarkable attention from the research community, which formally derived upper bounds on the excess population risk $R_{P}$ in different learning settings. However, existing bounds typically degrade with over-parameterization, i.e., as the number of parameters $p$ gets larger than the number of training samples $n$ -- a regime which is ubiquitous in current deep-learning practice. As a result, the lack of theoretical insights leaves practitioners without clear guidance, leading some to reduce the effective number of trainable parameters to improve performance, while others use larger models to achieve better results through scale. In this work, we show that in the popular random features model with quadratic loss, for any sufficiently large $p$, privacy can be obtained for free, i.e., $\left|R_{P} \right| = o(1)$, not only when the privacy parameter $\varepsilon$ has constant order, but also in the strongly private setting $\varepsilon = o(1)$. This challenges the common wisdom that over-parameterization inherently hinders performance in private learning.

Updated: 2024-10-18 18:01:11

标题: 在过度参数化的模式下的免费隐私

摘要: 差分私密梯度下降(DP-GD)是一种流行的算法,用于训练具有对训练数据隐私性可证保证的深度学习模型。在过去的十年中,理解其性能成本与标准GD之间的关系的问题引起了研究界的极大关注,研究界正式推导了在不同学习设置中对超额人口风险$R_{P}$的上界。然而,现有的界限通常随着过度参数化而降低,即当参数数量$p$大于训练样本数量$n$时--这是当前深度学习实践中普遍存在的情况。因此,缺乏理论洞见使从业者缺乏清晰的指导,一些人通过减少可训练参数数量来提高性能,而另一些人则使用更大的模型通过扩展来取得更好的结果。在这项工作中,我们展示了在流行的随机特征模型中,对于任何足够大的$p$,可以免费获得隐私,即$\left|R_{P} \right| = o(1)$,不仅当隐私参数$\varepsilon$具有恒定阶时,而且在强隐私设置下$\varepsilon = o(1)$。这挑战了过度参数化本质上阻碍私密学习性能的普遍智慧。

更新时间: 2024-10-18 18:01:11

领域: stat.ML,cs.CR,cs.LG

下载: http://arxiv.org/abs/2410.14787v1

High-Dimensional Tensor Discriminant Analysis with Incomplete Tensors

Tensor classification has gained prominence across various fields, yet the challenge of handling partially observed tensor data in real-world applications remains largely unaddressed. This paper introduces a novel approach to tensor classification with incomplete data, framed within the tensor high-dimensional linear discriminant analysis. Specifically, we consider a high-dimensional tensor predictor with missing observations under the Missing Completely at Random (MCR) assumption and employ the Tensor Gaussian Mixture Model to capture the relationship between the tensor predictor and class label. We propose the Tensor LDA-MD algorithm, which manages high-dimensional tensor predictors with missing entries by leveraging the low-rank structure of the discriminant tensor. A key feature of our approach is a novel covariance estimation method under the tensor-based MCR model, supported by theoretical results that allow for correlated entries under mild conditions. Our work establishes the convergence rate of the estimation error of the discriminant tensor with incomplete data and minimax optimal bounds for the misclassification rate, addressing key gaps in the literature. Additionally, we derive large deviation results for the generalized mode-wise (separable) sample covariance matrix and its inverse, which are crucial tools in our analysis and hold independent interest. Our method demonstrates excellent performance in simulations and real data analysis, even with significant proportions of missing data. This research advances high-dimensional LDA and tensor learning, providing practical tools for applications with incomplete data and a solid theoretical foundation for classification accuracy in complex settings.

Updated: 2024-10-18 18:00:16

标题: 具有不完整张量的高维张量判别分析

摘要: 张量分类在各个领域中变得越来越重要,然而在现实应用中处理部分观测到的张量数据的挑战仍然没有得到很好的解决。本文在张量高维线性判别分析的框架下,介绍了一种新颖的处理不完整数据的张量分类方法。具体来说,我们考虑在完全随机缺失(MCR)假设下带有缺失观测的高维张量预测器,并采用张量高斯混合模型来刻画张量预测器和类别标签之间的关系。我们提出了张量LDA-MD算法,通过利用判别张量的低秩结构来处理带有缺失条目的高维张量预测器。我们方法的一个关键特征是基于张量的MCR模型下的一种新颖协方差估计方法,其理论结果在温和条件下允许条目相关。我们的工作建立了不完整数据下判别张量估计误差的收敛速率,并为误分类率给出了极小极大(minimax)最优界,填补了文献中的关键空白。此外,我们推导了广义逐模(可分离)样本协方差矩阵及其逆的大偏差结果,这些结果是我们分析中的关键工具,也具有独立的价值。我们的方法在模拟和实际数据分析中表现出色,即使在缺失比例很高的情况下也是如此。这项研究推进了高维LDA和张量学习,为不完整数据的应用提供了实用工具,并为复杂环境中的分类准确性提供了坚实的理论基础。

更新时间: 2024-10-18 18:00:16

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/2410.14783v1

SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment

Existing preference alignment is a one-size-fits-all alignment mechanism, where the part of the large language model (LLM) parametric knowledge with non-preferred features is uniformly blocked to all the users. However, this part of knowledge can be useful to advanced users whose expertise qualifies them to handle this information. The one-size-fits-all alignment mechanism undermines LLM's utility for these qualified users. To address this problem, we propose SudoLM, a framework that lets LLMs learn access control over specific parametric knowledge for users with different credentials via authorization alignment. SudoLM allows authorized users to unlock their access to all the parametric knowledge with an assigned SUDO key while blocking access to non-qualified users. Experiments on two application scenarios demonstrate that SudoLM effectively controls the user's access to the parametric knowledge and maintains its general utility.

Updated: 2024-10-18 17:59:51

标题: SudoLM:通过授权对齐学习参数化知识的访问控制

摘要: 现有的偏好对齐是一种一刀切的对齐机制,其中大型语言模型(LLM)参数知识中具有非首选特征的部分被统一地阻止给所有用户。然而,这部分知识对于具有资格处理这些信息的高级用户可能是有用的。一刀切的对齐机制削弱了LLM对这些合格用户的效用。为解决这一问题,我们提出了SudoLM,一个框架,让LLMs通过授权对齐学习不同资格用户对特定参数知识的访问控制。SudoLM允许授权用户通过分配SUDO密钥解锁他们对所有参数知识的访问,同时阻止非合格用户的访问。在两个应用场景的实验中表明,SudoLM有效控制了用户对参数知识的访问,并保持了其整体效用。

更新时间: 2024-10-18 17:59:51

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14676v1

Enhancing Large Language Models' Situated Faithfulness to External Contexts

Large Language Models (LLMs) are often augmented with external information as contexts, but this external information can sometimes be inaccurate or even intentionally misleading. We argue that robust LLMs should demonstrate situated faithfulness, dynamically calibrating their trust in external information based on their confidence in the internal knowledge and the external context. To benchmark this capability, we evaluate LLMs across several QA datasets, including a newly created dataset called RedditQA featuring in-the-wild incorrect contexts sourced from Reddit posts. We show that when provided with both correct and incorrect contexts, both open-source and proprietary models tend to overly rely on external information, regardless of its factual accuracy. To enhance situated faithfulness, we propose two approaches: Self-Guided Confidence Reasoning (SCR) and Rule-Based Confidence Reasoning (RCR). SCR enables models to self-access the confidence of external information relative to their own internal knowledge to produce the most accurate answer. RCR, in contrast, extracts explicit confidence signals from the LLM and determines the final answer using predefined rules. Our results show that for LLMs with strong reasoning capabilities, such as GPT-4o and GPT-4o mini, SCR outperforms RCR, achieving improvements of up to 24.2% over a direct input augmentation baseline. Conversely, for a smaller model like Llama-3-8B, RCR outperforms SCR. Fine-tuning SCR with our proposed Confidence Reasoning Direct Preference Optimization (CR-DPO) method improves performance on both seen and unseen datasets, yielding an average improvement of 8.9% on Llama-3-8B. In addition to quantitative results, we offer insights into the relative strengths of SCR and RCR. Our findings highlight promising avenues for improving situated faithfulness in LLMs. The data and code are released.

Updated: 2024-10-18 17:59:47

标题: 增强大型语言模型对外部上下文的情境忠实性

摘要: 大型语言模型(LLMs)通常会使用外部信息作为上下文进行增强,但这些外部信息有时可能不准确,甚至可能是有意误导的。我们认为,强大的LLMs应该表现出情境忠实性,根据其对内部知识和外部上下文的信心动态校准对外部信息的信任。为了评估这种能力,我们评估了LLMs在几个问答数据集上的表现,包括一个新创建的名为RedditQA的数据集,其中包含来自Reddit帖子的现实中错误的上下文。我们表明,当提供正确和错误的上下文时,无论其事实准确性如何,开源和专有模型都倾向于过度依赖外部信息。为了增强情境忠实性,我们提出了两种方法:自导向信心推理(SCR)和基于规则的信心推理(RCR)。SCR使模型能够自主访问外部信息相对于其自身内部知识的信心,以产生最准确的答案。相反,RCR从LLM中提取明确的信心信号,并根据预定义规则确定最终答案。我们的结果表明,对于像GPT-4o和GPT-4o mini这样具有强大推理能力的LLMs,SCR优于RCR,在直接输入增强基线上实现了高达24.2%的改进。相反,对于像Llama-3-8B这样的较小模型,RCR优于SCR。使用我们提出的信心推理直接偏好优化(CR-DPO)方法对SCR进行微调,提高了在已知和未知数据集上的性能,在Llama-3-8B上平均提高了8.9%。除了定量结果外,我们还提供了关于SCR和RCR相对优势的见解。我们的发现突显了改进LLMs中情境忠实性的有希望的途径。数据和代码已发布。

更新时间: 2024-10-18 17:59:47

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14675v1

Self-supervised contrastive learning performs non-linear system identification

Self-supervised learning (SSL) approaches have brought tremendous success across many tasks and domains. It has been argued that these successes can be attributed to a link between SSL and identifiable representation learning: Temporal structure and auxiliary variables ensure that latent representations are related to the true underlying generative factors of the data. Here, we deepen this connection and show that SSL can perform system identification in latent space. We propose DynCL, a framework to uncover linear, switching linear and non-linear dynamics under a non-linear observation model, give theoretical guarantees and validate them empirically.

Updated: 2024-10-18 17:59:25

标题: 自监督对比学习执行非线性系统识别

摘要: 自监督学习(SSL)方法在许多任务和领域取得了巨大成功。人们认为这些成功可以归因于SSL与可识别表示学习之间的联系:时间结构和辅助变量确保潜在表示与数据的真实潜在生成因素相关。在这里,我们深化了这种联系,并展示了SSL可以在潜在空间中执行系统识别。我们提出了DynCL,一个揭示非线性观测模型下的线性、切换线性和非线性动态的框架,给出了理论保证并进行了实证验证。

更新时间: 2024-10-18 17:59:25

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.14673v1

BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities

We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities. BiGR is the first conditional generative model that unifies generation and discrimination within the same framework. BiGR features a binary tokenizer, a masked modeling mechanism, and a binary transcoder for binary code prediction. Additionally, we introduce a novel entropy-ordered sampling method to enable efficient image generation. Extensive experiments validate BiGR's superior performance in generation quality, as measured by FID-50k, and representation capabilities, as evidenced by linear-probe accuracy. Moreover, BiGR showcases zero-shot generalization across various vision tasks, enabling applications such as image inpainting, outpainting, editing, interpolation, and enrichment, without the need for structural modifications. Our findings suggest that BiGR unifies generative and discriminative tasks effectively, paving the way for further advancements in the field.

Updated: 2024-10-18 17:59:04

标题: BiGR:利用二进制潜在编码进行图像生成和改进的视觉表示能力

摘要: 我们介绍了一种新颖的条件图像生成模型BiGR,使用紧凑的二进制潜在代码进行生成训练,旨在增强生成和表示能力。BiGR是第一个将生成和判别统一在同一框架内的条件生成模型。BiGR具有二进制分词器、掩码建模机制和用于二进制代码预测的二进制转码器。此外,我们引入了一种新颖的熵排序抽样方法,以实现高效的图像生成。大量实验证实了BiGR在生成质量(以FID-50k衡量)和表示能力(通过线性探测准确性证明)方面的优越性能。此外,BiGR展示了在各种视觉任务中的零样本泛化能力,实现了诸如图像修补、外部绘制、编辑、插值和丰富化等应用,而无需进行结构修改。我们的发现表明,BiGR有效地统一了生成和判别任务,为该领域的进一步发展铺平了道路。

更新时间: 2024-10-18 17:59:04

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.14672v1

Decomposing The Dark Matter of Sparse Autoencoders

Sparse autoencoders (SAEs) are a promising technique for decomposing language model activations into interpretable linear features. However, current SAEs fall short of completely explaining model performance, resulting in "dark matter": unexplained variance in activations. This work investigates dark matter as an object of study in its own right. Surprisingly, we find that much of SAE dark matter--about half of the error vector itself and >90% of its norm--can be linearly predicted from the initial activation vector. Additionally, we find that the scaling behavior of SAE error norms at a per token level is remarkably predictable: larger SAEs mostly struggle to reconstruct the same contexts as smaller SAEs. We build on the linear representation hypothesis to propose models of activations that might lead to these observations, including postulating a new type of "introduced error"; these insights imply that the part of the SAE error vector that cannot be linearly predicted ("nonlinear" error) might be fundamentally different from the linearly predictable component. To validate this hypothesis, we empirically analyze nonlinear SAE error and show that 1) it contains fewer not yet learned features, 2) SAEs trained on it are quantitatively worse, 3) it helps predict SAE per-token scaling behavior, and 4) it is responsible for a proportional amount of the downstream increase in cross entropy loss when SAE activations are inserted into the model. Finally, we examine two methods to reduce nonlinear SAE error at a fixed sparsity: inference time gradient pursuit, which leads to a very slight decrease in nonlinear error, and linear transformations from earlier layer SAE outputs, which leads to a larger reduction.
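
The linear-probe part of this analysis fits in a few lines; a sketch (synthetic stand-ins for the activations and the SAE, so the error here is mostly linear by construction, whereas the paper's finding is that real SAE errors behave this way too):

import numpy as np

rng = np.random.default_rng(0)
n, d = 4096, 256
acts = rng.standard_normal((n, d))               # stand-in model activations
# Stand-in SAE: a rank-deficient reconstruction plus small noise.
proj = np.linalg.qr(rng.standard_normal((d, d // 2)))[0]
recon = acts @ proj @ proj.T + 0.05 * rng.standard_normal((n, d))
error = acts - recon                             # the "dark matter"

W, *_ = np.linalg.lstsq(acts, error, rcond=None) # linear predictor of the error
pred = acts @ W
frac = 1.0 - np.sum((error - pred) ** 2) / np.sum(error ** 2)
print(f"fraction of error variance linearly predictable: {frac:.3f}")
# error - pred is the "nonlinear error" the paper studies separately.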

Updated: 2024-10-18 17:58:53

标题: 分解稀疏自编码器的暗物质

摘要: 稀疏自编码器(SAEs)是一种将语言模型激活分解为可解释线性特征的有前途的技术。然而,目前的SAEs未能完全解释模型性能,导致“暗物质”:激活中未被解释的方差。这项工作将暗物质本身作为研究对象。令人惊讶的是,我们发现很大一部分SAE暗物质--大约一半的误差向量本身和>90%的范数--可以从初始激活向量线性预测出来。此外,我们发现SAE误差范数在每标记级别上的缩放行为是非常可预测的:较大的SAEs主要难以重建与较小SAEs相同的上下文。我们基于线性表示假设提出了可能导致这些观察结果的激活模型,包括假设一种新类型的“引入误差”;这些见解暗示,SAE误差向量中无法被线性预测的部分(“非线性”误差)可能与线性可预测的部分有根本不同。为了验证这一假设,我们对非线性SAE误差进行了实证分析,并展示了:1)它包含更少的尚未学到的特征;2)在其上训练的SAEs在数量上更差;3)它有助于预测SAE每标记的缩放行为;4)当SAE激活被插回模型时,它按比例贡献了下游交叉熵损失的增加。最后,我们考察了两种在固定稀疏度下减少非线性SAE误差的方法:推理时的梯度追踪(gradient pursuit),带来非常轻微的非线性误差降低;以及对较早层SAE输出做线性变换,带来更大幅度的降低。

更新时间: 2024-10-18 17:58:53

领域: cs.LG

下载: http://arxiv.org/abs/2410.14670v1

Stochastic Gradient Descent Jittering for Inverse Problems: Alleviating the Accuracy-Robustness Tradeoff

Inverse problems aim to reconstruct unseen data from corrupted or perturbed measurements. While most work focuses on improving reconstruction quality, generalization accuracy and robustness are equally important, especially for safety-critical applications. Model-based architectures (MBAs), such as loop unrolling methods, are considered more interpretable and achieve better reconstructions. Empirical evidence suggests that MBAs are more robust to perturbations than black-box solvers, but the accuracy-robustness tradeoff in MBAs remains underexplored. In this work, we propose a simple yet effective training scheme for MBAs, called SGD jittering, which injects noise iteration-wise during reconstruction. We theoretically demonstrate that SGD jittering not only generalizes better than the standard mean squared error training but is also more robust to average-case attacks. We validate SGD jittering using denoising toy examples, seismic deconvolution, and single-coil MRI reconstruction. The proposed method achieves cleaner reconstructions for out-of-distribution data and demonstrates enhanced robustness to adversarial attacks.
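
A minimal sketch of the jittering idea, assuming a loop-unrolled least-squares solver in place of a full learned model-based architecture (step size and noise scale here are our assumptions):

    import torch

    def unrolled_reconstruct(y, A, steps=20, jitter_std=0.01, train=True):
        # One loop-unrolled solver for y = A x; during training, Gaussian noise
        # is injected at every unrolled iteration (the "SGD jittering" step).
        step = 1.0 / torch.linalg.matrix_norm(A, ord=2) ** 2
        x = torch.zeros(A.shape[1])
        for _ in range(steps):
            x = x - step * (A.T @ (A @ x - y))   # data-fidelity gradient step
            if train:
                x = x + jitter_std * torch.randn_like(x)
        return x

    A = torch.randn(20, 8)
    x_true = torch.randn(8)
    print(unrolled_reconstruct(A @ x_true, A, train=False))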

Updated: 2024-10-18 17:57:01

标题: 随机梯度下降抖动用于逆问题:缓解准确性和稳健性之间的权衡

摘要: 逆问题旨在从受损或受扰动的测量中重建未见数据。尽管大多数工作侧重于改善重建质量,但泛化准确性和鲁棒性同样重要,尤其对于安全关键应用。基于模型的架构(MBAs),如循环展开方法,被认为更易解释并实现更好的重建。经验证据表明,MBAs对扰动的鲁棒性比黑盒解算器更好,但MBAs中的准确性-鲁棒性权衡仍未得到充分探讨。在这项工作中,我们提出了一种简单而有效的MBAs训练方案,称为SGD抖动,它在重建过程中逐次注入噪声。我们在理论上证明了SGD抖动不仅比标准均方误差训练更好地泛化,而且更具平均情况攻击的鲁棒性。我们使用去噪玩具示例、地震反褶积和单线圈MRI重建来验证SGD抖动。该方法实现了对分布外数据更清晰的重建,并展示了对敌对攻击的增强鲁棒性。

更新时间: 2024-10-18 17:57:01

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2410.14667v1

DiscoGraMS: Enhancing Movie Screen-Play Summarization using Movie Character-Aware Discourse Graph

Summarizing movie screenplays presents a unique set of challenges compared to standard document summarization. Screenplays are not only lengthy, but also feature a complex interplay of characters, dialogues, and scenes, with numerous direct and subtle relationships and contextual nuances that are difficult for machine learning models to accurately capture and comprehend. Recent attempts at screenplay summarization focus on fine-tuning transformer-based pre-trained models, but these models often fall short in capturing long-term dependencies and latent relationships, and frequently encounter the "lost in the middle" issue. To address these challenges, we introduce DiscoGraMS, a novel resource that represents movie scripts as a movie character-aware discourse graph (CaD Graph). This approach is well-suited for various downstream tasks, such as summarization, question-answering, and salience detection. The model aims to preserve all salient information, offering a more comprehensive and faithful representation of the screenplay's content. We further explore a baseline method that combines the CaD Graph with the corresponding movie script through a late fusion of graph and text modalities, and we present very initial promising results.

Updated: 2024-10-18 17:56:11

标题: DiscoGraMS:利用电影角色意识话语图提升电影剧本摘要化

摘要: 总结电影剧本与标准文档摘要相比具有独特的挑战。剧本不仅长度较长,而且还涉及角色、对话和场景之间复杂的相互关系,具有许多直接和微妙的关系以及难以机器学习模型准确捕捉和理解的上下文细微差别。最近对剧本总结的尝试集中在微调基于变压器的预训练模型,但这些模型往往在捕捉长期依赖性和潜在关系方面表现不佳,并经常遇到“中间丢失”问题。为了解决这些挑战,我们引入了DiscoGraMS,一个新颖的资源,将电影剧本表示为电影角色感知的话语图(CaD图)。这种方法非常适合各种下游任务,如摘要、问答和显著性检测。该模型旨在保留所有重要信息,提供剧本内容更全面和忠实的表示。我们进一步探讨了一种基线方法,通过图和文本模态的后期融合将CaD图与相应的电影剧本结合起来,并呈现了非常初步的有希望的结果。

更新时间: 2024-10-18 17:56:11

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.14666v1

Online Reinforcement Learning with Passive Memory

This paper considers an online reinforcement learning algorithm that leverages pre-collected data (passive memory) from the environment for online interaction. We show that using passive memory improves performance and further provide theoretical guarantees for regret that turns out to be near-minimax optimal. Results show that the quality of passive memory determines sub-optimality of the incurred regret. The proposed approach and results hold in both continuous and discrete state-action spaces.

Updated: 2024-10-18 17:55:15

标题: 带被动内存的在线强化学习

摘要: 本文考虑了一种在线强化学习算法,该算法利用来自环境的预先收集数据(被动内存)进行在线交互。我们展示了使用被动内存可以提高性能,并进一步为遗憾提供了理论保证,结果接近最小最大值。结果表明,被动内存的质量决定了所产生遗憾的次优性。所提出的方法和结果适用于连续和离散状态-动作空间。

更新时间: 2024-10-18 17:55:15

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14665v1

Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing

The locate-then-edit paradigm has shown significant promise for knowledge editing (KE) in Large Language Models (LLMs). While previous methods perform well on single-hop fact recall tasks, they consistently struggle with multi-hop factual recall tasks involving newly edited knowledge. In this paper, leveraging tools in mechanistic interpretability, we first identify that in multi-hop tasks, LLMs tend to retrieve implicit subject knowledge from deeper MLP layers, unlike single-hop tasks, which rely on earlier layers. This distinction explains the poor performance of current methods in multi-hop queries, as they primarily focus on editing shallow layers, leaving deeper layers unchanged. To address this, we propose IFMET, a novel locate-then-edit KE approach designed to edit both shallow and deep MLP layers. IFMET employs multi-hop editing prompts and supplementary sets to locate and modify knowledge across different reasoning stages. Experimental results demonstrate that IFMET significantly improves performance on multi-hop factual recall tasks, effectively overcoming the limitations of previous locate-then-edit methods.

Updated: 2024-10-18 17:53:46

标题: "定位-编辑:知识编辑下的多跳事实回溯"

摘要: 定位-编辑范式在大型语言模型(LLMs)中对知识编辑(KE)表现出显著的潜力。虽然先前的方法在单跳事实回忆任务上表现良好,但在涉及新编辑知识的多跳事实回忆任务中却一直面临困难。在本文中,利用机械式可解释性工具,我们首先确定在多跳任务中,LLMs倾向于从更深的MLP层中检索隐含主题知识,而不像单跳任务那样依赖于较早的层。这一区别解释了当前方法在多跳查询中表现不佳的原因,因为它们主要专注于编辑浅层,而忽略了更深的层。为了解决这个问题,我们提出了IFMET,一种新颖的定位-编辑KE方法,旨在编辑浅层和深层MLP层。IFMET采用多跳编辑提示和补充集来定位和修改不同推理阶段的知识。实验结果表明,IFMET显著提高了在多跳事实回忆任务上的性能,有效地克服了先前定位-编辑方法的局限性。

更新时间: 2024-10-18 17:53:46

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.06331v2

A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning

Large Language Models (LLMs) have shown significant potential in designing reward functions for Reinforcement Learning (RL) tasks. However, obtaining high-quality reward code often involves human intervention, numerous LLM queries, or repetitive RL training. To address these issues, we propose CARD, an LLM-driven Reward Design framework that iteratively generates and improves reward function code. Specifically, CARD includes a Coder that generates and verifies the code, while an Evaluator provides dynamic feedback to guide the Coder in improving the code, eliminating the need for human feedback. In addition to process feedback and trajectory feedback, we introduce Trajectory Preference Evaluation (TPE), which evaluates the current reward function based on trajectory preferences. If the code fails the TPE, the Evaluator provides preference feedback, avoiding RL training at every iteration and making the reward function better aligned with the task objective. Empirical results on Meta-World and ManiSkill2 demonstrate that our method achieves an effective balance between task performance and token efficiency, outperforming or matching the baselines across all tasks. On 10 out of 12 tasks, CARD shows better or comparable performance to policies trained with expert-designed rewards, and our method even surpasses the oracle on 3 tasks.

Updated: 2024-10-18 17:51:51

标题: 一个基于大型语言模型驱动的动态反馈强化学习奖励设计框架

摘要: 大型语言模型(LLMs)在设计强化学习(RL)任务的奖励函数方面展现出了巨大潜力。然而,获得高质量的奖励代码通常需要人工干预、大量LLM查询或重复的RL训练。为了解决这些问题,我们提出了CARD,一个LLM驱动的奖励设计框架,可以迭代生成和改进奖励函数代码。具体而言,CARD包括一个生成和验证代码的编码器,同时一个评估器提供动态反馈以指导编码器改进代码,消除了对人工反馈的需求。除了过程反馈和轨迹反馈,我们引入了轨迹偏好评估(TPE),根据轨迹偏好评估当前奖励函数。如果代码未通过TPE,则评估器提供偏好反馈,避免在每次迭代中进行RL训练,并使奖励函数更好地与任务目标对齐。在Meta-World和ManiSkill2上的实证结果表明,我们的方法在任务性能和令牌效率之间取得了有效的平衡,在所有任务上表现出优于或匹配基线的性能。在12项任务中,CARD在10项任务中显示出比专家设计奖励训练的策略更好或相当的性能,我们的方法甚至在3项任务中超越了Oracle。

更新时间: 2024-10-18 17:51:51

领域: cs.LG

下载: http://arxiv.org/abs/2410.14660v1

Harnessing Causality in Reinforcement Learning With Bagged Decision Times

We consider reinforcement learning (RL) for a class of problems with bagged decision times. A bag contains a finite sequence of consecutive decision times. The transition dynamics are non-Markovian and non-stationary within a bag. Further, all actions within a bag jointly impact a single reward, observed at the end of the bag. Our goal is to construct an online RL algorithm to maximize the discounted sum of the bag-specific rewards. To handle non-Markovian transitions within a bag, we utilize an expert-provided causal directed acyclic graph (DAG). Based on the DAG, we construct the states as a dynamical Bayesian sufficient statistic of the observed history, which results in Markovian state transitions within and across bags. We then frame this problem as a periodic Markov decision process (MDP) that allows non-stationarity within a period. An online RL algorithm based on Bellman-equations for stationary MDPs is generalized to handle periodic MDPs. To justify the proposed RL algorithm, we show that our constructed state achieves the maximal optimal value function among all state constructions for a periodic MDP. Further we prove the Bellman optimality equations for periodic MDPs. We evaluate the proposed method on testbed variants, constructed with real data from a mobile health clinical trial.
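
To make the central object concrete, one natural way to write the Bellman optimality equations for a periodic MDP with period $T$ (our notation, assumed rather than taken from the paper) is:

    Q_t^*(s,a) = r_t(s,a) + \gamma \,\mathbb{E}_{s' \sim P_t(\cdot \mid s,a)}\big[\max_{a'} Q_{(t+1)\bmod T}^*(s',a')\big], \qquad t = 0,\dots,T-1.

That is, the transition kernel $P_t$ and reward $r_t$ may vary within a period, while optimality couples each phase $t$ to the next phase $(t+1) \bmod T$.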

Updated: 2024-10-18 17:51:37

标题: 利用袋装决策时间在强化学习中驾驭因果关系

摘要: 我们考虑对一类具有袋式决策时间的问题进行强化学习(RL)。一个袋子包含一个有限的连续决策时间序列。转移动态在一个袋子内是非马尔可夫的、非稳态的。此外,一个袋子内的所有行为共同影响一个在袋子末尾观察到的单一奖励。我们的目标是构建一个在线RL算法,以最大化袋子特定奖励的折现总和。为了处理一个袋子内的非马尔可夫转移,我们利用专家提供的因果有向无环图(DAG)。基于DAG,我们将状态构建为观察历史的动态贝叶斯充分统计量,从而实现袋子内和跨袋子之间的马尔可夫状态转移。然后,我们将这个问题框架化为一个允许周期性非稳态的马尔可夫决策过程(MDP)。基于Bellman方程的在线RL算法被推广为处理周期性MDP。为了证明所提出的RL算法的合理性,我们证明了我们构建的状态在周期性MDP的所有状态构建中实现了最大的最优值函数。此外,我们证明了周期性MDP的Bellman最优性方程。我们在使用移动健康临床试验真实数据构建的测试台变体上评估了所提出的方法。

更新时间: 2024-10-18 17:51:37

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.14659v1

EasyRec: Simple yet Effective Language Models for Recommendation

Deep neural networks have become a powerful technique for learning representations from user-item interaction data in collaborative filtering (CF) for recommender systems. However, many existing methods heavily rely on unique user and item IDs, which limits their ability to perform well in practical zero-shot learning scenarios where sufficient training data may be unavailable. Inspired by the success of language models (LMs) and their strong generalization capabilities, a crucial question arises: How can we harness the potential of language models to empower recommender systems and elevate its generalization capabilities to new heights? In this study, we propose EasyRec - an effective and easy-to-use approach that seamlessly integrates text-based semantic understanding with collaborative signals. EasyRec employs a text-behavior alignment framework, which combines contrastive learning with collaborative language model tuning, to ensure a strong alignment between the text-enhanced semantic space and the collaborative behavior information. Extensive empirical evaluations across diverse real-world datasets demonstrate the superior performance of EasyRec compared to state-of-the-art alternative models, particularly in the challenging text-based zero-shot recommendation scenarios. Furthermore, the study highlights the potential of seamlessly integrating EasyRec as a plug-and-play component into text-enhanced collaborative filtering frameworks, thereby empowering existing recommender systems to elevate their recommendation performance and adapt to the evolving user preferences in dynamic environments. For better result reproducibility of our EasyRec framework, the model implementation details, source code, and datasets are available at the link: https://github.com/HKUDS/EasyRec.
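
As a rough sketch of what a text-behavior alignment objective can look like, here is a symmetric InfoNCE loss between text-derived and collaborative-filtering embeddings; EasyRec's actual loss and architecture may differ in detail:

    import torch
    import torch.nn.functional as F

    def alignment_loss(text_emb, cf_emb, tau=0.07):
        # Contrast each item's text embedding against the batch of
        # collaborative embeddings; matched pairs sit on the diagonal.
        t = F.normalize(text_emb, dim=-1)
        c = F.normalize(cf_emb, dim=-1)
        logits = t @ c.T / tau
        labels = torch.arange(t.shape[0])
        return 0.5 * (F.cross_entropy(logits, labels)
                      + F.cross_entropy(logits.T, labels))

    print(alignment_loss(torch.randn(32, 128), torch.randn(32, 128)))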

Updated: 2024-10-18 17:50:57

标题: EasyRec:简单而有效的推荐语言模型

摘要: 深度神经网络已成为协同过滤(CF)中学习用户-物品交互数据表示的强大技术,用于推荐系统。然而,许多现有方法严重依赖于唯一的用户和物品ID,这限制了它们在实际的零样本学习场景中表现良好的能力,其中可能缺乏足够的训练数据。受语言模型(LMs)的成功和它们强大的泛化能力的启发,一个关键问题出现了:我们如何利用语言模型的潜力来增强推荐系统的能力并将其泛化能力提升到新的高度?在这项研究中,我们提出了EasyRec - 一种有效且易于使用的方法,无缝地将基于文本的语义理解与协同信号集成在一起。EasyRec采用了一个文本行为对齐框架,将对比学习与协同语言模型调整相结合,以确保文本增强语义空间与协同行为信息之间的强大对齐。对各种真实世界数据集进行的广泛实证评估表明,与最先进的替代模型相比,EasyRec表现出更优异的性能,特别是在具有挑战性的基于文本的零样本推荐场景中。此外,该研究突出了将EasyRec无缝集成为一个即插即用组件,融入基于文本增强的协同过滤框架中的潜力,从而赋予现有推荐系统提升其推荐性能并适应动态环境中不断变化的用户偏好的能力。为了更好地重现我们的EasyRec框架的结果,模型实现细节、源代码和数据集可在以下链接找到:https://github.com/HKUDS/EasyRec。

更新时间: 2024-10-18 17:50:57

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2408.08821v3

Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens

Language models are often trained to maximize the likelihood of the next token given past tokens in the training dataset. However, during inference time, they are utilized differently, generating text sequentially and auto-regressively by using previously generated tokens as input to predict the next one. Marginal differences in predictions at each step can cascade over successive steps, resulting in different distributions from what the models were trained for and potentially leading to unpredictable behavior. This paper proposes two simple approaches based on model own generation to address this discrepancy between the training and inference time. Our first approach is Batch-Scheduled Sampling, where, during training, we stochastically choose between the ground-truth token from the dataset and the model's own generated token as input to predict the next token. This is done in an offline manner, modifying the context window by interleaving ground-truth tokens with those generated by the model. Our second approach is Reference-Answer-based Correction, where we explicitly incorporate a self-correction capability into the model during training. This enables the model to effectively self-correct the gaps between the generated sequences and the ground truth data without relying on an external oracle model. By incorporating our proposed strategies during training, we have observed an overall improvement in performance compared to baseline methods, as demonstrated by our extensive experiments using summarization, general question-answering, and math question-answering tasks.
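
A minimal sketch of the first approach, Batch-Scheduled Sampling (tensor shapes and the mixing probability are our assumptions; the paper applies this offline to the context window):

    import torch

    def scheduled_inputs(gt_tokens, gen_tokens, p_model=0.25):
        # Build the training context by stochastically replacing ground-truth
        # tokens with tokens the model generated itself; the prediction
        # targets remain the ground-truth tokens.
        use_model = torch.rand(gt_tokens.shape) < p_model
        return torch.where(use_model, gen_tokens, gt_tokens)

    gt = torch.randint(0, 1000, (4, 16))    # (batch, seq) ground-truth ids
    gen = torch.randint(0, 1000, (4, 16))   # the model's own offline generations
    mixed = scheduled_inputs(gt, gen)       # feed as input; predict gt as usual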

Updated: 2024-10-18 17:48:27

标题: 通过利用自动生成的标记来弥合LLMs中的训练-推理差距

摘要: 语言模型通常被训练以最大化在训练数据集中过去 token 给出下一个 token 的可能性。然而,在推理时,它们被不同地利用,通过使用先前生成的 token 作为输入依次生成文本,并自回归地预测下一个。在每一步的预测中的微小差异会在连续的步骤中级联,导致不同于模型训练目的的分布,并潜在地导致不可预测的行为。本文提出了两种基于模型自身生成的简单方法来解决训练和推理时间之间的差异。我们的第一种方法是批量计划抽样,即在训练期间,我们在数据集中随机选择地面真实 token 和模型自己生成的 token 之间作为输入来预测下一个 token。这是以离线方式完成的,通过将地面真实 token 与模型生成的 token 交错修改上下文窗口。我们的第二种方法是基于参考答案的校正,即在训练期间,我们明确地将自我校正能力纳入模型中。这使得模型能够有效地自我校正生成序列与地面真实数据之间的差距,而无需依赖外部 oracle 模型。通过在训练期间引入我们提出的策略,我们观察到与基准方法相比整体性能有所提高,这在我们使用摘要、一般问题回答和数学问题回答任务进行的广泛实验中得到了证明。

更新时间: 2024-10-18 17:48:27

领域: cs.LG

下载: http://arxiv.org/abs/2410.14655v1

Real-time Fake News from Adversarial Feedback

We show that existing evaluations for fake news detection based on conventional sources, such as claims on fact-checking websites, result in an increasing accuracy over time for LLM-based detectors -- even after their knowledge cutoffs. This suggests that recent popular political claims, which form the majority of fake news on such sources, are easily classified using surface-level shallow patterns. Instead, we argue that a proper fake news detection dataset should test a model's ability to reason factually about the current world by retrieving and reading related evidence. To this end, we develop a novel pipeline that leverages natural language feedback from a RAG-based detector to iteratively modify real-time news into deceptive fake news that challenges LLMs. Our iterative rewrite decreases the binary classification AUC by an absolute 17.5 percent for a strong RAG GPT-4o detector. Our experiments reveal the important role of RAG in both detecting and generating fake news, as retrieval-free LLM detectors are vulnerable to unseen events and adversarial attacks, while feedback from RAG detection helps discover more deceitful patterns in fake news.

Updated: 2024-10-18 17:47:11

标题: 实时通过对抗反馈生成虚假新闻

摘要: 我们展示了基于传统来源的假新闻检测的现有评估(例如事实核查网站上的声明)对于基于LLM的检测器随着时间的推移表现出越来越高的准确性,甚至在它们的知识截止日期之后。这表明,最近流行的政治声明,这些声明构成了这些来源上大部分假新闻,很容易使用表面层次的浅层模式进行分类。相反,我们认为,一个合适的假新闻检测数据集应该测试模型根据相关证据对当前世界进行事实推理的能力。为此,我们开发了一个新颖的流程,利用基于RAG的检测器的自然语言反馈,迭代地修改实时新闻成为挑战LLMs的欺骗性假新闻。我们的迭代重写将一个强大的RAG GPT-4o检测器的二元分类AUC绝对降低了17.5%。我们的实验揭示了RAG在检测和生成假新闻中的重要作用,因为无检索的LLM检测器容易受到未知事件和对抗性攻击的影响,而来自RAG检测的反馈有助于发现假新闻中更多的欺骗模式。

更新时间: 2024-10-18 17:47:11

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14651v1

EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search

The high computational costs of large language models (LLMs) have led to a flurry of research on LLM compression, via methods such as quantization, sparsification, or structured pruning. A new frontier in this area is given by \emph{dynamic, non-uniform} compression methods, which adjust the compression levels (e.g., sparsity) per-block or even per-layer in order to minimize accuracy loss, while guaranteeing a global compression threshold. Yet, current methods rely on heuristics for identifying the "importance" of a given layer towards the loss, based on assumptions such as \emph{error monotonicity}, i.e. that the end-to-end model compression error is proportional to the sum of layer-wise errors. In this paper, we revisit this area, and propose a new and general approach for dynamic compression that is provably optimal in a given input range. We begin from the motivating observation that, in general, \emph{error monotonicity does not hold for LLMs}: compressed models with lower sum of per-layer errors can perform \emph{worse} than models with higher error sums. To address this, we propose a new general evolutionary framework for dynamic LLM compression called EvoPress, which has provable convergence, and low sample and evaluation complexity. We show that these theoretical guarantees lead to highly competitive practical performance for dynamic compression of Llama, Mistral and Phi models. Via EvoPress, we set new state-of-the-art results across all compression approaches: structural pruning (block/layer dropping), unstructured sparsity, as well as quantization with dynamic bitwidths. Our code is available at https://github.com/IST-DASLab/EvoPress.
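
A toy sketch of such a search loop: a (1+1) evolutionary step over per-layer sparsity levels under a fixed global budget. The fitness function is a stub standing in for a real calibration-set loss, and all names are ours, not the library's:

    import random

    LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]    # candidate per-layer sparsities
    N_LAYERS = 32

    def evaluate_loss(idx):                 # stub for a calibration-set loss
        sparsities = [LEVELS[i] for i in idx]
        return sum((s - 0.4) ** 2 for s in sparsities) + random.gauss(0.0, 1e-3)

    def mutate(idx):
        child = list(idx)
        i, j = random.sample(range(N_LAYERS), 2)
        if child[i] + 1 < len(LEVELS) and child[j] > 0:
            child[i] += 1                   # compress layer i harder...
            child[j] -= 1                   # ...and layer j less: budget kept
        return child

    best = [2] * N_LAYERS                   # start uniform at sparsity 0.5
    best_loss = evaluate_loss(best)
    for _ in range(500):
        cand = mutate(best)
        cand_loss = evaluate_loss(cand)
        if cand_loss < best_loss:           # keep the fitter candidate
            best, best_loss = cand, cand_loss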

Updated: 2024-10-18 17:46:37

标题: EvoPress:通过进化搜索实现动态模型压缩的最佳方法

摘要: 大型语言模型(LLMs)的高计算成本导致了LLM压缩研究的激增,其中包括量化、稀疏化或结构化修剪等方法。在这一领域的一个新前沿是由\emph{动态、非均匀}压缩方法提供的,这些方法根据每个块或者甚至每个层来调整压缩级别(例如稀疏度),以最小化精度损失,同时保证全局压缩阈值。然而,当前方法依赖于启发式方法来确定给定层对损失的“重要性”,基于诸如\emph{误差单调性}的假设,即端到端模型压缩误差与逐层误差之和成正比。在本文中,我们重新审视了这一领域,并提出了一种新的、通用的动态压缩方法,该方法在给定输入范围内可证明是最优的。我们从这样一个激励性观察开始,即一般来说,\emph{LLMs的误差单调性并不成立}:具有更低逐层误差之和的压缩模型可能比具有更高误差之和的模型表现更差。为了解决这个问题,我们提出了一个名为EvoPress的新的通用进化框架,用于动态LLM压缩,该框架具有可证明的收敛性和低样本和评估复杂性。我们展示了这些理论保证导致对Llama、Mistral和Phi模型的动态压缩具有极具竞争力的实际性能。通过EvoPress,我们在所有压缩方法中取得了新的最先进结果:结构修剪(块/层丢弃)、非结构化稀疏性,以及具有动态比特宽度的量化。我们的代码可在https://github.com/IST-DASLab/EvoPress获得。

更新时间: 2024-10-18 17:46:37

领域: cs.LG

下载: http://arxiv.org/abs/2410.14649v1

HR-Bandit: Human-AI Collaborated Linear Recourse Bandit

Human doctors frequently recommend actionable recourses that allow patients to modify their conditions to access more effective treatments. Inspired by such healthcare scenarios, we propose the Recourse Linear UCB ($\textsf{RLinUCB}$) algorithm, which optimizes both action selection and feature modifications by balancing exploration and exploitation. We further extend this to the Human-AI Linear Recourse Bandit ($\textsf{HR-Bandit}$), which integrates human expertise to enhance performance. $\textsf{HR-Bandit}$ offers three key guarantees: (i) a warm-start guarantee for improved initial performance, (ii) a human-effort guarantee to minimize required human interactions, and (iii) a robustness guarantee that ensures sublinear regret even when human decisions are suboptimal. Empirical results, including a healthcare case study, validate its superior performance against existing benchmarks.
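
For orientation, here is bare-bones LinUCB, the classical skeleton that RLinUCB extends; the recourse step of also optimizing feature modifications, and HR-Bandit's human-in-the-loop logic, are omitted in this sketch:

    import numpy as np

    class LinUCB:
        def __init__(self, n_actions, d, alpha=1.0):
            self.A = [np.eye(d) for _ in range(n_actions)]  # per-arm Gram matrices
            self.b = [np.zeros(d) for _ in range(n_actions)]
            self.alpha = alpha

        def select(self, x):
            # Optimism: estimated reward plus an exploration bonus per arm.
            ucbs = []
            for A, b in zip(self.A, self.b):
                A_inv = np.linalg.inv(A)
                ucbs.append(A_inv @ b @ x + self.alpha * np.sqrt(x @ A_inv @ x))
            return int(np.argmax(ucbs))

        def update(self, a, x, reward):
            self.A[a] += np.outer(x, x)
            self.b[a] += reward * x

    bandit = LinUCB(n_actions=3, d=5)
    x = np.ones(5) / np.sqrt(5)
    a = bandit.select(x)
    bandit.update(a, x, reward=1.0)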

Updated: 2024-10-18 17:41:19

标题: HR-Bandit:人机协作的线性资源赌博算法

摘要: 人类医生经常建议可行的资源,让患者可以修改他们的状况以获得更有效的治疗。受到这样的医疗场景的启发,我们提出了Recourse Linear UCB(RLinUCB)算法,通过平衡探索和开发来优化行动选择和特征修改。我们进一步将其扩展为Human-AI Linear Recourse Bandit(HR-Bandit),该算法整合了人类专业知识以增强性能。HR-Bandit提供三个关键保证:(i)改善初始性能的预热保证,(ii)最小化所需人类交互的人力保证,以及(iii)即使人类决策不佳也能保证次线性遗憾的鲁棒性保证。实证结果,包括一个医疗案例研究,验证了其在现有基准测试中的优越性能。

更新时间: 2024-10-18 17:41:19

领域: cs.LG

下载: http://arxiv.org/abs/2410.14640v1

Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs

Positional bias in large language models (LLMs) hinders their ability to effectively process long inputs. A prominent example is the "lost in the middle" phenomenon, where LLMs struggle to utilize relevant information situated in the middle of the input. While prior research primarily focuses on single pieces of relevant information, real-world applications often involve multiple relevant information pieces. To bridge this gap, we present LongPiBench, a benchmark designed to assess positional bias involving multiple pieces of relevant information. Thorough experiments are conducted with five commercial and six open-source models. These experiments reveal that while most current models are robust against the "lost in the middle" issue, there exist significant biases related to the spacing of relevant information pieces. These findings highlight the importance of evaluating and reducing positional biases to advance LLM's capabilities.

Updated: 2024-10-18 17:41:19

标题: 相关信息片段之间的距离导致长上下文语言模型中的偏见

摘要: 大语言模型(LLMs)中的位置偏差妨碍了它们有效处理长输入的能力。一个突出的例子是“中间迷失”现象,LLMs难以利用位于输入中间的相关信息。尽管先前的研究主要关注单个相关信息片段,但实际应用通常涉及多个相关信息片段。为了弥合这一差距,我们提出了LongPiBench,这是一个旨在评估涉及多个相关信息片段的位置偏差的基准。我们对五种商业模型和六种开源模型进行了彻底实验。这些实验表明,虽然大多数当前模型对“中间迷失”问题具有鲁棒性,但存在与相关信息片段间距有关的显著偏差。这些发现突显了评估和减少位置偏差以提高LLM能力的重要性。

更新时间: 2024-10-18 17:41:19

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14641v1

Convergence of Manifold Filter-Combine Networks

In order to better understand manifold neural networks (MNNs), we introduce Manifold Filter-Combine Networks (MFCNs). The filter-combine framework parallels the popular aggregate-combine paradigm for graph neural networks (GNNs) and naturally suggests many interesting families of MNNs which can be interpreted as the manifold analog of various popular GNNs. We then propose a method for implementing MFCNs on high-dimensional point clouds that relies on approximating the manifold by a sparse graph. We prove that our method is consistent in the sense that it converges to a continuum limit as the number of data points tends to infinity.
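
A small sketch of one filter-combine layer on the sparse-graph approximation, under our reading of the template (polynomial filters in a normalized adjacency, then a learned channel mixing; the paper's spectral construction may differ):

    import numpy as np

    def filter_combine_layer(X, A_norm, Theta, n_taps=3):
        # Filter: apply powers of the normalized adjacency to node signals.
        taps = [X]
        for _ in range(n_taps - 1):
            taps.append(A_norm @ taps[-1])
        Z = np.concatenate(taps, axis=1)    # stack filter taps channel-wise
        return np.tanh(Z @ Theta)           # combine: learned channel mixing

    rng = np.random.default_rng(0)
    n, c_in, c_out = 50, 4, 8
    A = rng.random((n, n)) < 0.1
    A = ((A | A.T) & ~np.eye(n, dtype=bool)).astype(float)
    deg = np.maximum(A.sum(1), 1.0)
    A_norm = A / deg[:, None]               # row-normalized sparse-graph adjacency
    X = rng.normal(size=(n, c_in))
    H = filter_combine_layer(X, A_norm, rng.normal(size=(3 * c_in, c_out)))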

Updated: 2024-10-18 17:40:58

标题: 流形滤波器-组合网络的收敛性

摘要: 为了更好地理解流形神经网络(MNNs),我们引入了流形滤波组合网络(MFCNs)。滤波组合框架类似于图神经网络(GNNs)的流行聚合-组合范式,自然地提出了许多有趣的MNNs家族,可以解释为各种流行GNNs的流形模拟。然后,我们提出了一种在高维点云上实现MFCNs的方法,该方法依赖于通过稀疏图逼近流形。我们证明我们的方法在某种意义上是一致的,即随着数据点数量趋向无穷大,它收敛到一个连续极限。

更新时间: 2024-10-18 17:40:58

领域: cs.LG,eess.SP,stat.ML

下载: http://arxiv.org/abs/2410.14639v1

Learning Generative Interactive Environments By Trained Agent Exploration

World models are increasingly pivotal in interpreting and simulating the rules and actions of complex environments. Genie, a recent model, excels at learning from visually diverse environments but relies on costly human-collected data. We observe that its alternative of using random agents is too limited to explore the environment effectively. We propose to improve the model by employing reinforcement-learning-based agents for data generation. This approach produces diverse datasets that enhance the model's ability to adapt and perform well across various scenarios and realistic actions within the environment. In this paper, we first release the model GenieRedux - an implementation based on Genie. Additionally, we introduce GenieRedux-G, a variant that uses the agent's readily available actions to factor out action-prediction uncertainty during validation. Our evaluation, including a replication of the Coinrun case study, shows that GenieRedux-G achieves superior visual fidelity and controllability using the trained agent exploration. The proposed approach is reproducible, scalable, and adaptable to new types of environments. Our codebase is available at https://github.com/insait-institute/GenieRedux.

Updated: 2024-10-18 17:37:51

标题: 通过训练代理探索学习生成交互环境

摘要: 世界模型在解释和模拟复杂环境的规则和行为方面变得越来越关键。最近的模型Genie在学习来自视觉多样环境方面表现出色,但依赖昂贵的人工收集数据。我们观察到他们使用随机代理的替代方法在探索环境方面受到限制。我们建议通过使用基于强化学习的代理来改进模型以进行数据生成。这种方法产生多样化的数据集,增强了模型在各种情景和环境中适应和表现良好的能力。在本文中,我们首先发布了基于Genie的实现模型GenieRedux。此外,我们介绍了使用代理的现成可用行动来消除验证过程中的行动预测不确定性的GenieRedux-G变体。我们的评估,包括对Coinrun案例研究的复制,显示GenieRedux-G在使用训练代理探索时实现了更高的视觉保真度和可控性。所提出的方法可重现、可扩展,并可适应新类型的环境。我们的代码库可在https://github.com/insait-institute/GenieRedux 上获取。

更新时间: 2024-10-18 17:37:51

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2409.06445v2

GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings

Training-free embedding methods directly leverage pretrained large language models (LLMs) to embed text, bypassing the costly and complex procedure of contrastive learning. Previous training-free embedding methods have mainly focused on optimizing embedding prompts and have overlooked the benefits of utilizing the generative abilities of LLMs. We propose a novel method, GenEOL, which uses LLMs to generate diverse transformations of a sentence that preserve its meaning, and aggregates the resulting embeddings of these transformations to enhance the overall sentence embedding. GenEOL significantly outperforms the existing training-free embedding methods by an average of 2.85 points across several LLMs on the sentence semantic text similarity (STS) benchmark. Our analysis shows that GenEOL stabilizes representation quality across LLM layers and is robust to perturbations of embedding prompts. GenEOL also achieves notable gains on multiple clustering, reranking and pair-classification tasks from the MTEB benchmark.
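
The core recipe reduces to a few lines. In this sketch, `rewrite` and `embed` are toy stand-ins (our placeholders, not GenEOL's actual components) for the generator LLM and the embedding model:

    import numpy as np

    def geneol_embedding(sentence, rewrite, embed, n_transforms=8):
        # Generate meaning-preserving transformations, embed each (plus the
        # original), and aggregate by averaging into one sentence embedding.
        variants = [sentence] + [rewrite(sentence) for _ in range(n_transforms)]
        return np.stack([embed(v) for v in variants]).mean(axis=0)

    rewrite = lambda s: "In other words: " + s           # toy LLM stand-in
    embed = lambda s: np.array([len(s), s.count(" "), sum(map(ord, s)) % 97],
                               dtype=float)              # toy embedder stand-in
    print(geneol_embedding("cats sit on mats", rewrite, embed))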

Updated: 2024-10-18 17:36:53

标题: GenEOL:利用LLM的生成能力进行无需训练的句子嵌入

摘要: 无需训练的嵌入方法直接利用预训练的大型语言模型(LLMs)来嵌入文本,避开了昂贵和复杂的对比学习过程。先前的无需训练的嵌入方法主要集中在优化嵌入提示上,并忽略了利用LLMs的生成能力的好处。我们提出了一种新颖的方法,GenEOL,它利用LLMs生成保留句子含义的多样化转换,并聚合这些转换的结果嵌入以增强整体句子嵌入。GenEOL在句子语义文本相似性(STS)基准测试中,相对于几个LLMs,平均提高了2.85分。我们的分析显示,GenEOL稳定了LLM层之间的表示质量,并且对于嵌入提示的扰动具有鲁棒性。GenEOL还在MTEB基准测试中的多个聚类、重新排名和对分类任务中取得了显著的增益。

更新时间: 2024-10-18 17:36:53

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14635v1

Parallel Backpropagation for Inverse of a Convolution with Application to Normalizing Flows

The inverse of an invertible convolution is an important operation that arises in Normalizing Flows, Image Deblurring, etc. The naive algorithm for backpropagating through this operation using Gaussian elimination has running time $O(n^3)$ where $n$ is the number of pixels in the image. We give a fast parallel backpropagation algorithm with running time $O(\sqrt{n})$ for a square image and provide a GPU implementation of the same. Inverse convolutions are usually used in Normalizing Flows in the sampling pass, making it slow. We propose to use Inverse Convolutions in the forward (image to latent vector) pass of the Normalizing flow. Since the sampling pass is the inverse of the forward pass, it will use convolutions only, resulting in efficient sampling times. We use our parallel backpropagation algorithm for optimizing the inverse convolution layer, resulting in fast training times as well. We implement this approach in various Normalizing Flow backbones, resulting in our Inverse-Flow models. We benchmark Inverse-Flow on standard datasets and show significantly improved sampling times with similar bits per dimension compared to previous models.

Updated: 2024-10-18 17:35:33

标题: 用于可逆卷积求逆的并行反向传播及其在归一化流中的应用

摘要: 反向可逆卷积是在归一化流、图像去模糊等领域中经常遇到的重要操作。使用高斯消元的朴素算法来进行此操作的反向传播,其运行时间为$O(n^3)$,其中$n$是图像中像素的数量。我们提出了一种快速并行反向传播算法,对于方形图像,其运行时间为$O(\sqrt{n})$,并提供了相应的GPU实现。反向卷积通常用于归一化流中的采样过程,导致速度较慢。我们建议将反向卷积用于归一化流的前向(图像到潜在向量)过程。由于采样过程是前向过程的逆过程,它将仅使用卷积,从而实现高效的采样时间。我们使用并行反向传播算法来优化反向卷积层,从而实现快速的训练时间。我们在各种归一化流主干中实施了这种方法,形成了我们的反向流模型。我们在标准数据集上对反向流进行基准测试,并展示了与先前模型相比具有类似比特每维的显着改进的采样时间。

更新时间: 2024-10-18 17:35:33

领域: cs.CV,cs.LG,cs.MM,math.PR

下载: http://arxiv.org/abs/2410.14634v1

A Distance-based Anomaly Detection Framework for Deep Reinforcement Learning

In deep reinforcement learning (RL) systems, abnormal states pose significant risks by potentially triggering unpredictable behaviors and unsafe actions, thus impeding the deployment of RL systems in real-world scenarios. It is crucial for reliable decision-making systems to have the capability to cast an alert whenever they encounter unfamiliar observations that they are not equipped to handle. In this paper, we propose a novel Mahalanobis distance-based (MD) anomaly detection framework, called \textit{MDX}, for deep RL algorithms. MDX simultaneously addresses random, adversarial, and out-of-distribution (OOD) state outliers in both offline and online settings. It utilizes Mahalanobis distance within class-conditional distributions for each action and operates within a statistical hypothesis testing framework under the Gaussian assumption. We further extend it to robust and distribution-free versions by incorporating Robust MD and conformal inference techniques. Through extensive experiments on classical control environments, Atari games, and autonomous driving scenarios, we demonstrate the effectiveness of our MD-based detection framework. MDX offers a simple, unified, and practical anomaly detection tool for enhancing the safety and reliability of RL systems in real-world applications.
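
A compact sketch of the core test in its Gaussian, non-robust form: a per-action class-conditional Mahalanobis distance thresholded with a chi-square quantile. The robust and conformal extensions described above are omitted, and the interface is our assumption:

    import numpy as np
    from scipy.stats import chi2

    def fit_gaussians(states_by_action, eps=1e-6):
        params = {}
        for a, X in states_by_action.items():
            mu = X.mean(axis=0)
            cov = np.cov(X, rowvar=False) + eps * np.eye(X.shape[1])
            params[a] = (mu, np.linalg.inv(cov))
        return params

    def is_anomalous(s, a, params, alpha=0.01):
        mu, cov_inv = params[a]
        md2 = (s - mu) @ cov_inv @ (s - mu)      # squared Mahalanobis distance
        return md2 > chi2.ppf(1 - alpha, df=s.shape[0])

    rng = np.random.default_rng(0)
    params = fit_gaussians({0: rng.normal(size=(500, 4))})
    print(is_anomalous(np.zeros(4), 0, params),      # in-distribution -> False
          is_anomalous(10 * np.ones(4), 0, params))  # far outlier     -> True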

Updated: 2024-10-18 17:32:27

标题: 一个基于距离的深度强化学习异常检测框架

摘要: 在深度强化学习(RL)系统中,异常状态可能会引发不可预测的行为和不安全的操作,从而阻碍RL系统在现实世界场景中的部署。对于可靠的决策系统而言,具有在遇到无法处理的陌生观察时发出警报的能力是至关重要的。本文提出了一种新颖的基于马氏距离的异常检测框架,称为MDX,用于深度RL算法。MDX同时解决了离群值在离线和在线设置中的随机、对抗性和超出分布(OOD)状态。它利用每个动作的类条件分布中的马氏距离,并在高斯假设下在统计假设检验框架内运行。我们进一步将其扩展为稳健和无分布版本,通过整合Robust MD和符合推断技术。通过在经典控制环境、Atari游戏和自动驾驶场景上进行广泛实验,我们展示了基于MD的检测框架的有效性。MDX提供了一个简单、统一和实用的异常检测工具,用于增强RL系统在现实世界应用中的安全性和可靠性。

更新时间: 2024-10-18 17:32:27

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2109.09889v3

On the Regularization of Learnable Embeddings for Time Series Processing

In processing multiple time series, accounting for the individual features of each sequence can be challenging. To address this, modern deep learning methods for time series analysis combine a shared (global) model with local layers, specific to each time series, often implemented as learnable embeddings. Ideally, these local embeddings should encode meaningful representations of the unique dynamics of each sequence. However, when these are learned end-to-end as parameters of a forecasting model, they may end up acting as mere sequence identifiers. Shared processing blocks may then become reliant on such identifiers, limiting their transferability to new contexts. In this paper, we address this issue by investigating methods to regularize the learning of local learnable embeddings for time series processing. Specifically, we perform the first extensive empirical study on the subject and show how such regularizations consistently improve performance in widely adopted architectures. Furthermore, we show that methods preventing the co-adaptation of local and global parameters are particularly effective in this context. This hypothesis is validated by comparing several methods preventing the downstream models from relying on sequence identifiers, going as far as completely resetting the embeddings during training. The obtained results provide an important contribution to understanding the interplay between learnable local parameters and shared processing layers: a key challenge in modern time series processing models and a step toward developing effective foundation models for time series.

Updated: 2024-10-18 17:30:20

标题: 关于时间序列处理中可学习嵌入的正则化

摘要: 在处理多个时间序列时,考虑到每个序列的个体特征可能是具有挑战性的。为了解决这个问题,现代深度学习方法将一个共享(全局)模型与本地层结合起来,针对每个时间序列实现为可学习的嵌入。理想情况下,这些本地嵌入应该编码每个序列独特动态的有意义表示。然而,当这些作为预测模型的参数进行端到端学习时,它们可能最终只起到序列标识符的作用。共享处理块可能会依赖于这些标识符,从而限制其在新环境中的可传递性。在本文中,我们通过研究规范化时间序列处理中本地可学习嵌入的学习方法来解决这个问题。具体而言,我们进行了该主题上的首次广泛的实证研究,并展示了这些规范化方法如何一致地提高了广泛采用的架构的性能。此外,我们发现阻止本地和全局参数相互适应的方法在这个背景下特别有效。通过比较几种阻止下游模型依赖序列标识符的方法,甚至在训练过程中完全重置嵌入,从而验证了这一假设。获得的结果为理解可学习本地参数和共享处理层之间相互作用提供了重要的贡献:这是现代时间序列处理模型中的一个关键挑战,也是朝着为时间序列开发有效的基础模型迈出的一步。

更新时间: 2024-10-18 17:30:20

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14630v1

SIMformer: Single-Layer Vanilla Transformer Can Learn Free-Space Trajectory Similarity

Free-space trajectory similarity calculation, e.g., DTW, Hausdorff, and Frechet, often incur quadratic time complexity, thus learning-based methods have been proposed to accelerate the computation. The core idea is to train an encoder to transform trajectories into representation vectors and then compute vector similarity to approximate the ground truth. However, existing methods face dual challenges of effectiveness and efficiency: 1) they all utilize Euclidean distance to compute representation similarity, which leads to the severe curse of dimensionality issue -- reducing the distinguishability among representations and significantly affecting the accuracy of subsequent similarity search tasks; 2) most of them are trained in triplets manner and often necessitate additional information which downgrades the efficiency; 3) previous studies, while emphasizing the scalability in terms of efficiency, overlooked the deterioration of effectiveness when the dataset size grows. To cope with these issues, we propose a simple, yet accurate, fast, scalable model that only uses a single-layer vanilla transformer encoder as the feature extractor and employs tailored representation similarity functions to approximate various ground truth similarity measures. Extensive experiments demonstrate our model significantly mitigates the curse of dimensionality issue and outperforms the state-of-the-arts in effectiveness, efficiency, and scalability.

Updated: 2024-10-18 17:30:17

标题: SIMformer:单层香草变压器可以学习自由空间轨迹相似性

摘要: 自由空间轨迹相似性计算,例如DTW、豪斯多夫和Frechet,通常会产生二次时间复杂度,因此已经提出了基于学习的方法来加速计算。核心思想是训练一个编码器来将轨迹转换为表示向量,然后计算向量相似性来近似地表示真实情况。然而,现有方法面临着有效性和效率的双重挑战:1)它们都利用欧氏距离来计算表示相似性,这导致了维度诅咒问题的严重性--降低了表示之间的可区分性,严重影响了后续相似性搜索任务的准确性;2)大多数方法都是以三元组的方式进行训练,通常需要额外的信息,这降低了效率;3)先前的研究虽然强调了效率方面的可扩展性,但在数据集大小增长时忽略了有效性的恶化。为了解决这些问题,我们提出了一个简单而准确、快速、可扩展的模型,它只使用单层普通变压器编码器作为特征提取器,并采用定制的表示相似性函数来近似各种真实相似性度量。大量实验证明我们的模型显著减轻了维度诅咒问题,并在有效性、效率和可扩展性方面优于现有技术。

更新时间: 2024-10-18 17:30:17

领域: cs.LG,cs.DB,cs.IR

下载: http://arxiv.org/abs/2410.14629v1

CELI: Controller-Embedded Language Model Interactions

We introduce Controller-Embedded Language Model Interactions (CELI), a framework that integrates control logic directly within language model (LM) prompts, facilitating complex, multi-stage task execution. CELI addresses limitations of existing prompt engineering and workflow optimization techniques by embedding control logic directly within the operational context of language models, enabling dynamic adaptation to evolving task requirements. Our framework transfers control from the traditional programming execution environment to the LMs, allowing them to autonomously manage computational workflows while maintaining seamless interaction with external systems and functions. CELI supports arbitrary function calls with variable arguments, bridging the gap between LMs' adaptive reasoning capabilities and conventional software paradigms' structured control mechanisms. To evaluate CELI's versatility and effectiveness, we conducted case studies in two distinct domains: code generation (HumanEval benchmark) and multi-stage content generation (Wikipedia-style articles). The results demonstrate notable performance improvements across a range of domains. CELI achieved a 4.9 percentage point improvement over the best reported score of the baseline GPT-4 model on the HumanEval code generation benchmark. In multi-stage content generation, 94.4% of CELI-produced Wikipedia-style articles met or exceeded first draft quality when optimally configured, with 44.4% achieving high quality. These outcomes underscore CELI's potential for optimizing AI-driven workflows across diverse computational domains.

Updated: 2024-10-18 17:29:56

标题: CELI:控制器嵌入式语言模型交互

摘要: 我们介绍了Controller-Embedded Language Model Interactions(CELI),这是一个将控制逻辑直接集成到语言模型(LM)提示中的框架,促进复杂的多阶段任务执行。CELI通过将控制逻辑直接嵌入到语言模型的操作上下文中,解决了现有提示工程和工作流优化技术的局限性,从而实现对不断发展的任务需求的动态适应。我们的框架将控制从传统的编程执行环境转移到LM,使它们能够自主管理计算工作流,同时保持与外部系统和功能的无缝交互。CELI支持具有可变参数的任意函数调用,弥合了LM的自适应推理能力与传统软件范式的结构化控制机制之间的差距。为了评估CELI的多功能性和有效性,我们在两个不同领域进行了案例研究:代码生成(HumanEval基准)和多阶段内容生成(维基百科风格文章)。结果表明,在各个领域中均实现了显著的性能改进。在HumanEval代码生成基准测试中,CELI相对于基准GPT-4模型的最佳报告分数实现了4.9个百分点的改进。在多阶段内容生成中,当进行最佳配置时,94.4%的CELI生成的维基百科风格文章达到或超过了初稿质量,其中44.4%达到了高质量。这些结果突显了CELI在优化跨不同计算领域的AI驱动工作流方面的潜力。

更新时间: 2024-10-18 17:29:56

领域: cs.SE,cs.AI,cs.CL,68T50, 68Q32, 68N19,I.2.6; I.2.7; D.2.2

下载: http://arxiv.org/abs/2410.14627v1

Enhancing AI Accessibility in Veterinary Medicine: Linking Classifiers and Electronic Health Records

In the rapidly evolving landscape of veterinary healthcare, integrating machine learning (ML) clinical decision-making tools with electronic health records (EHRs) promises to improve diagnostic accuracy and patient care. However, the seamless integration of ML classifiers into existing EHRs in veterinary medicine is frequently hindered by the rigidity of EHR systems or the limited availability of IT resources. To address this shortcoming, we present Anna, a freely-available software solution that provides ML classifier results for EHR laboratory data in real-time.

Updated: 2024-10-18 17:27:07

标题: 提高兽医医学中的人工智能可访问性:链接分类器和电子健康记录

摘要: 在兽医保健领域快速发展的背景下,将机器学习(ML)临床决策工具与电子健康记录(EHRs)相结合,承诺提高诊断准确性和患者护理质量。然而,在兽医医学领域,将ML分类器无缝集成到现有EHRs中往往受制于EHR系统的刚性或有限的IT资源的可用性。为解决这一不足,我们提出了Anna,一种免费提供ML分类器结果以实时处理EHR实验室数据的软件解决方案。

更新时间: 2024-10-18 17:27:07

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2410.14625v1

syren-new: Precise formulae for the linear and nonlinear matter power spectra with massive neutrinos and dynamical dark energy

Current and future large scale structure surveys aim to constrain the neutrino mass and the equation of state of dark energy. We aim to construct accurate and interpretable symbolic approximations to the linear and nonlinear matter power spectra as a function of cosmological parameters in extended $\Lambda$CDM models which contain massive neutrinos and non-constant equations of state for dark energy. This constitutes an extension of the syren-halofit emulators to incorporate these two effects, which we call syren-new (SYmbolic-Regression-ENhanced power spectrum emulator with NEutrinos and $W_0-w_a$). We also obtain a simple approximation to the derived parameter $\sigma_8$ as a function of the cosmological parameters for these models. Our results for the linear power spectrum are designed to emulate CLASS, whereas for the nonlinear case we aim to match the results of EuclidEmulator2. We compare our results to existing emulators and $N$-body simulations. Our analytic emulators for $\sigma_8$, the linear and nonlinear power spectra achieve root mean squared errors of 0.1%, 0.3% and 1.3%, respectively, across a wide range of cosmological parameters, redshifts and wavenumbers. We verify that emulator-related discrepancies are subdominant compared to observational errors and other modelling uncertainties when computing shear power spectra for LSST-like surveys. Our expressions have similar accuracy to existing (numerical) emulators, but are at least an order of magnitude faster, both on a CPU and GPU. Our work greatly improves the accuracy, speed and range of applicability of current symbolic approximations to the linear and nonlinear matter power spectra. We provide publicly available code for all symbolic approximations found.

Updated: 2024-10-18 17:22:38

标题: syren-new:具有大质量中微子和动态暗能量的线性和非线性物质功率谱的精确公式

摘要: 当前和未来的大尺度结构调查旨在限制中微子质量和暗能量状态方程。我们的目标是构建准确且可解释的符号逼近,作为宇宙参数的函数来描述线性和非线性物质功率谱,在包含大质量中微子和暗能量非恒定状态方程的扩展ΛCDM模型中。这是对syren-halofit模拟器的扩展,以包含这两个效应,我们称之为syren-new (带有中微子和$W_0-w_a$的SYmbolic-Regression-ENhanced功率谱模拟器)。我们还获得了一个简单的逼近,将派生参数$\sigma_8$表示为这些模型的宇宙参数的函数。我们对线性功率谱的结果旨在模拟CLASS,而对于非线性情况,我们的目标是匹配EuclidEmulator2的结果。我们将我们的结果与现有的模拟器和$N$-体模拟进行比较。我们的$\sigma_8$,线性和非线性功率谱的解析模拟器,在广泛范围的宇宙参数、红移和波数下分别实现了0.1%、0.3%和1.3%的均方根误差。我们验证了,在计算LSST类似调查的剪切功率谱时,与模拟器相关的差异相对于观测误差和其他建模不确定性是次要的。我们的表达式具有类似于现有(数值)模拟器的准确性,但在CPU和GPU上至少快一个数量级。我们的工作极大地提高了当前符号逼近线性和非线性物质功率谱的准确性、速度和适用范围。我们为所有发现的符号逼近提供公开可用的代码。

更新时间: 2024-10-18 17:22:38

领域: astro-ph.CO,astro-ph.IM,cs.LG,cs.NE

下载: http://arxiv.org/abs/2410.14623v1

JAMUN: Transferable Molecular Conformational Ensemble Generation with Walk-Jump Sampling

Conformational ensembles of protein structures are immensely important both to understanding protein function, and for drug discovery in novel modalities such as cryptic pockets. Current techniques for sampling ensembles are computationally inefficient, or do not transfer to systems outside their training data. We present walk-Jump Accelerated Molecular ensembles with Universal Noise (JAMUN), a step towards the goal of efficiently sampling the Boltzmann distribution of arbitrary proteins. By extending Walk-Jump Sampling to point clouds, JAMUN enables ensemble generation at orders of magnitude faster rates than traditional molecular dynamics or state-of-the-art ML methods. Further, JAMUN is able to predict the stable basins of small peptides that were not seen during training.
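
For intuition, a toy sketch of walk-jump sampling: a Langevin "walk" in the noise-smoothed density followed by one denoising "jump" via Tweedie's formula. The closed-form score below assumes standard-normal toy data, whereas JAMUN learns the score for molecular point clouds:

    import torch

    def walk_jump_sample(score, x0, sigma=1.0, n_walk=200, step=1e-2):
        # Walk: Langevin MCMC on the noisy variable y ~ p_sigma.
        y = x0 + sigma * torch.randn_like(x0)
        for _ in range(n_walk):
            y = y + step * score(y) + (2 * step) ** 0.5 * torch.randn_like(y)
        # Jump: denoise in one step via Tweedie, x_hat = y + sigma^2 * score(y).
        return y + sigma ** 2 * score(y)

    sigma = 1.0
    score = lambda y: -y / (1 + sigma ** 2)  # exact smoothed score for N(0, I)
    samples = walk_jump_sample(score, torch.zeros(8, 3), sigma=sigma)
    print(samples.std())                     # roughly 1/sqrt(2) in this toy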

Updated: 2024-10-18 17:21:25

标题: JAMUN:使用Walk-Jump采样生成可转移的分子构象集合

摘要: 蛋白质结构的构象集合对于理解蛋白质功能以及在新型模式下的药物发现(如隐蔽口袋)都非常重要。目前用于采样构象集合的技术在计算效率上存在问题,或者无法推广到其训练数据之外的系统。我们提出了具有通用噪声的Walk-Jump加速分子集合(JAMUN),这是有效采样任意蛋白质玻尔兹曼分布目标的一步。通过将Walk-Jump采样扩展到点云,JAMUN能够以比传统分子动力学或最先进的机器学习方法快几个数量级的速度生成集合。此外,JAMUN能够预测在训练过程中未见过的小肽的稳定盆地。

更新时间: 2024-10-18 17:21:25

领域: physics.bio-ph,cs.LG,q-bio.BM

下载: http://arxiv.org/abs/2410.14621v1

Liger Kernel: Efficient Triton Kernels for LLM Training

Training Large Language Models (LLMs) efficiently at scale presents a formidable challenge, driven by their ever-increasing computational demands and the need for enhanced performance. In this work, we introduce Liger-Kernel, an open-sourced set of Triton kernels developed specifically for LLM training. With kernel optimization techniques like kernel operation fusing and input chunking, our kernels achieve on average a 20% increase in training throughput and a 60% reduction in GPU memory usage for popular LLMs compared to HuggingFace implementations. In addition, Liger-Kernel is designed with modularity, accessibility, and adaptability in mind, catering to both casual and expert users. Comprehensive benchmarks and integration tests are built in to ensure compatibility, performance, correctness, and convergence across diverse computing environments and model architectures. The source code is available under a permissive license at: github.com/linkedin/Liger-Kernel.

Updated: 2024-10-18 17:21:17

标题: Liger Kernel:用于LLM训练的高效Triton内核

摘要: 在大规模、高效地训练大型语言模型(LLMs)方面面临着巨大的挑战,这是由于它们日益增长的计算需求和对性能提升的需求所驱动的。在这项工作中,我们介绍了Liger-Kernel,这是一个专门为LLM训练开发的开源的Triton内核集合。通过内核优化技术,如内核操作融合和输入分块,与HuggingFace实现相比,我们的内核平均实现了训练吞吐量的20%增加和GPU内存使用量的60%减少。此外,Liger-Kernel的设计考虑到了模块化、易用性和适应性,满足了众多用户的需求,包括初学者和专家。全面的基准测试和集成测试被构建进来,以确保在各种计算环境和模型架构中的兼容性、性能、正确性和收敛性。源代码在github.com/linkedin/Liger-Kernel上以宽松的许可证可用。

更新时间: 2024-10-18 17:21:17

领域: cs.LG,cs.AI,cs.CL,cs.DC

下载: http://arxiv.org/abs/2410.10989v2

Contextual Document Embeddings

Dense document embeddings are central to neural retrieval. The dominant paradigm is to train and construct embeddings by running encoders directly on individual documents. In this work, we argue that these embeddings, while effective, are implicitly out-of-context for targeted use cases of retrieval, and that a contextualized document embedding should take into account both the document and neighboring documents in context - analogous to contextualized word embeddings. We propose two complementary methods for contextualized document embeddings: first, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss; second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation. Results show that both methods achieve better performance than biencoders in several settings, with differences especially pronounced out-of-domain. We achieve state-of-the-art results on the MTEB benchmark with no hard negative mining, score distillation, dataset-specific instructions, intra-GPU example-sharing, or extremely large batch sizes. Our method can be applied to improve performance on any contrastive learning dataset and any biencoder.

Updated: 2024-10-18 17:18:24

标题: 上下文文档嵌入

摘要: 密集文档嵌入是神经检索的核心。主导范式是通过直接在单个文档上运行编码器来训练和构建嵌入。在这项工作中,我们认为这些嵌入,虽然有效,对于检索的有针对性用例而言是隐含的脱离上下文的,一个上下文化的文档嵌入应该考虑文档和上下文中的邻近文档 - 类似于上下文化的词嵌入。我们提出了两种互补的方法来获得上下文化的文档嵌入:第一种是一种替代对比学习目标,明确将文档邻居纳入批内上下文损失中;第二种是一种新的上下文架构,明确地将邻近文档信息编码到编码表示中。结果表明,这两种方法在多种设置中比双编码器表现更好,特别是在领域外情况下差异尤为显著。我们在MTEB基准测试中取得了最先进的结果,而没有使用硬负采矿、得分精炼、数据集特定的指导、批内GPU示例共享或极大的批量大小。我们的方法可以应用于改进任何对比学习数据集和任何双编码器的性能。

更新时间: 2024-10-18 17:18:24

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.02525v3

Learning Linear Attention in Polynomial Time

Previous research has explored the computational expressivity of Transformer models in simulating Boolean circuits or Turing machines. However, the learnability of these simulators from observational data has remained an open question. Our study addresses this gap by providing the first polynomial-time learnability results (specifically strong, agnostic PAC learning) for single-layer Transformers with linear attention. We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS. As a consequence, the problem of learning any linear transformer may be converted into the problem of learning an ordinary linear predictor in an expanded feature space, and any such predictor may be converted back into a multiheaded linear transformer. Moving to generalization, we show how to efficiently identify training datasets for which every empirical risk minimizer is equivalent (up to trivial symmetries) to the linear Transformer that generated the data, thereby guaranteeing the learned model will correctly generalize across all inputs. Finally, we provide examples of computations expressible via linear attention and therefore polynomial-time learnable, including associative memories, finite automata, and a class of Universal Turing Machine (UTMs) with polynomially bounded computation histories. We empirically validate our theoretical findings on three tasks: learning random linear attention networks, key--value associations, and learning to execute finite automata. Our findings bridge a critical gap between theoretical expressivity and learnability of Transformers, and show that flexible and general models of computation are efficiently learnable.
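
The reduction is easy to verify numerically: a (softmax-free) linear-attention output at position i is exactly a fixed linear map applied to the expanded third-order features sum_j x_i (x) x_j (x) x_j, so learning it becomes linear regression in that feature space. This is a sketch of the idea only; the paper works in an RKHS formulation:

    import numpy as np

    rng = np.random.default_rng(0)
    d, T, i = 4, 6, 2
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    X = rng.normal(size=(T, d))

    # Linear attention (no softmax) at position i.
    attn = sum(((X[i] @ Wq) @ (X[j] @ Wk)) * (X[j] @ Wv) for j in range(T))

    # The same output as a fixed linear map on expanded features.
    phi = sum(np.einsum("a,b,c->abc", X[i], X[j], X[j]) for j in range(T))
    M = np.einsum("ab,cg->abcg", Wq @ Wk.T, Wv)
    assert np.allclose(attn, np.einsum("abc,abcg->g", phi, M))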

Updated: 2024-10-18 17:15:09

标题: 在多项式时间学习线性注意力

摘要: 先前的研究探讨了Transformer模型在模拟布尔电路或图灵机方面的计算表达能力。然而,这些模拟器从观测数据中的可学习性仍然是一个未解决的问题。我们的研究通过提供第一个具有多项式时间可学习性结果(特别是强大的、不可知的PAC学习)来填补这一空白,针对具有线性注意力的单层Transformer。我们展示线性注意力可以被视为在适当定义的RKHS中的线性预测器。因此,学习任何线性变压器的问题可以转换为在扩展的特征空间中学习一个普通的线性预测器的问题,并且任何这样的预测器可以再次转换为多头线性变压器。在泛化方面,我们展示如何有效地识别训练数据集,其中每个经验风险最小化器等价于生成数据的线性Transformer(直到平凡对称性)。最终,我们提供了通过线性注意力表达的可计算示例,因此是多项式时间可学习的,包括联想记忆、有限自动机和具有多项式边界计算历史的一类通用图灵机(UTMs)。我们在三个任务上对我们的理论发现进行了实证验证:学习随机线性注意网络、键值关联和学习执行有限自动机。我们的发现弥合了Transformer的理论表达能力和可学习性之间的关键差距,并表明灵活和通用的计算模型是高效可学习的。

更新时间: 2024-10-18 17:15:09

领域: cs.LG,cs.AI,cs.CL,cs.DS

下载: http://arxiv.org/abs/2410.10101v2

Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments

Deep reinforcement learning (DRL) is used to enable autonomous navigation in unknown environments. Most research assumes perfect sensor data, but real-world environments may contain natural and artificial sensor noise and denial. Here, we present a benchmark of both well-used and emerging DRL algorithms in a navigation task with configurable sensor denial effects. In particular, we are interested in comparing how different DRL methods (e.g. model-free PPO vs. model-based DreamerV3) are affected by sensor denial. We show that DreamerV3 outperforms other methods in the visual end-to-end navigation task with a dynamic goal - and that the other methods are not able to learn this task. Furthermore, DreamerV3 generally outperforms other methods in sensor-denied environments. In order to improve robustness, we use adversarial training and demonstrate improved performance in denied environments, although this generally comes at a performance cost in the vanilla environments. We anticipate this benchmark of different DRL methods and the usage of adversarial training to be a starting point for the development of more elaborate navigation strategies that are capable of dealing with uncertain and denied sensor readings.

Updated: 2024-10-18 17:14:28

标题: 在受限传感器环境中导航的深度强化学习基准测试

摘要: 深度强化学习(DRL)被用于在未知环境中实现自主导航。大多数研究假设传感器数据完美,但现实世界环境可能包含自然和人工传感器噪声和拒绝。在这里,我们提出一个基准,评估在具有可配置传感器拒绝效果的导航任务中使用的常用和新兴DRL算法。特别地,我们有兴趣比较不同DRL方法(例如无模型PPO vs. 基于模型的DreamerV3)受传感器拒绝影响的程度。我们展示DreamerV3在具有动态目标的视觉端到端导航任务中优于其他方法,其他方法无法学习。此外,DreamerV3通常在传感器拒绝的环境中表现优异。为了提高鲁棒性,我们使用对抗训练,并展示在拒绝环境中的改进性能,尽管这通常会导致在普通环境中性能成本的提高。我们预期这个不同DRL方法的基准和对抗训练的使用将成为更复杂的导航策略发展的起点,这些策略能够处理不确定和被拒绝的传感器读数。

更新时间: 2024-10-18 17:14:28

领域: cs.RO,cs.AI,cs.LG,I.2.9

下载: http://arxiv.org/abs/2410.14616v1

Asymptotically Optimal Change Detection for Unnormalized Pre- and Post-Change Distributions

This paper addresses the problem of detecting changes when only unnormalized pre- and post-change distributions are accessible. This situation happens in many scenarios in physics such as in ferromagnetism, crystallography, magneto-hydrodynamics, and thermodynamics, where the energy models are difficult to normalize. Our approach is based on the estimation of the Cumulative Sum (CUSUM) statistics, which is known to produce optimal performance. We first present an intuitively appealing approximation method. Unfortunately, this produces a biased estimator of the CUSUM statistics and may cause performance degradation. We then propose the Log-Partition Approximation Cumulative Sum (LPA-CUSUM) algorithm based on thermodynamic integration (TI) in order to estimate the log-ratio of normalizing constants of pre- and post-change distributions. It is proved that this approach gives an unbiased estimate of the log-partition function and the CUSUM statistics, and leads to an asymptotically optimal performance. Moreover, we derive a relationship between the required sample size for thermodynamic integration and the desired detection delay performance, offering guidelines for practical parameter selection. Numerical studies are provided demonstrating the efficacy of our approach.
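
The mechanics are easy to state: CUSUM needs the log-likelihood ratio, and with unnormalized densities that ratio is off by the constant log(Z1/Z0), which LPA-CUSUM estimates by thermodynamic integration. A sketch with that correction passed in as a number (toy equal-variance Gaussians, where the true correction is zero):

    import numpy as np

    def cusum(xs, log_p1_tilde, log_p0_tilde, log_Z_ratio, h=20.0):
        # Per-sample statistic: log p1~(x) - log p0~(x) - log(Z1/Z0); the last
        # term would come from thermodynamic integration in LPA-CUSUM.
        S = 0.0
        for t, x in enumerate(xs):
            S = max(0.0, S + log_p1_tilde(x) - log_p0_tilde(x) - log_Z_ratio)
            if S > h:
                return t                    # alarm time
        return None

    rng = np.random.default_rng(0)
    xs = np.concatenate([rng.normal(0, 1, 200), rng.normal(1, 1, 200)])
    alarm = cusum(xs, lambda v: -0.5 * (v - 1) ** 2, lambda v: -0.5 * v ** 2,
                  log_Z_ratio=0.0)          # equal variances, so Z1/Z0 = 1
    print(alarm)                            # expected shortly after t = 200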

Updated: 2024-10-18 17:13:29

标题: 针对非归一化变化前后分布的渐近最优变化检测

摘要: 本文讨论了在只有未标准化的变化前和变化后分布可访问时,检测变化的问题。这种情况在物理学的许多场景中经常发生,比如在铁磁性、晶体学、磁流体力学和热力学中,能量模型难以归一化。 我们的方法基于累积和(CUSUM)统计量的估计,这已知能产生最佳性能。我们首先提出了一种直观吸引人的近似方法。不幸的是,这会产生CUSUM统计的有偏估计,并可能导致性能下降。然后,我们提出了基于热力学积分(TI)的对数分区近似累积和(LPA-CUSUM)算法,以估计变化前后分布的归一化常数的对数比。证明了这种方法给出了对数分区函数和CUSUM统计的无偏估计,并导致渐近最佳性能。此外,我们推导了热力学积分所需样本大小与所需检测延迟性能之间的关系,提供了实际参数选择的指导。通过数值研究证明了我们方法的有效性。

更新时间: 2024-10-18 17:13:29

领域: stat.ML,cs.AI,cs.IT,cs.LG,eess.SP,math.IT

下载: http://arxiv.org/abs/2410.14615v1

One size doesn't fit all: Predicting the Number of Examples for In-Context Learning

In-context learning (ICL) refers to the process of adding a small number of localized examples (ones that are semantically similar to the input) from a training set of labelled data to an LLM's prompt, with the objective of effectively controlling the generative process to improve downstream task performance. Existing ICL approaches use an identical number of examples (a pre-configured hyper-parameter) for each data instance. Our work alleviates the limitations of this 'one size fits all' approach by dynamically predicting the number of examples to be used for each data instance in few-shot inference with LLMs. In particular, we employ a multi-label classifier, the parameters of which are fitted on a training set where the label for each instance indicates whether using a specific value of k (the number of most similar examples, from 0 up to a maximum value) leads to correct k-shot downstream predictions. Our experiments on a number of text classification benchmarks show that this adaptive approach (AICL) substantially outperforms standard ICL by up to 17%.

Updated: 2024-10-18 17:10:05

标题: 一个尺寸并不能适用所有情况:预测上下文学习中的示例数量

摘要: 上下文学习(ICL)是指向LLM的提示添加一小部分与输入语义相似的本地化示例(来自带标签数据的训练集),以有效控制生成过程,从而改善下游任务性能的过程。现有的ICL方法对每个数据实例使用相同数量的示例(预先配置的超参数)。我们的研究通过动态预测每个数据实例要在LLMs的few-shot推理中使用的示例数量,缓解了这种“一刀切”的方法的局限性。具体而言,我们使用一个多标签分类器,其参数使用训练集进行拟合,训练集中每个实例的标签指示使用特定值k(从0到最大值的最相似示例的数量)是否导致正确的k-shot下游预测。我们在多个文本分类基准上的实验表明,AICL的性能明显优于标准ICL,高达17%。

更新时间: 2024-10-18 17:10:05

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2403.06402v2

Modular Boundaries in Recurrent Neural Networks

Recent theoretical and experimental work in neuroscience has focused on the representational and dynamical character of neural manifolds --subspaces in neural activity space wherein many neurons coactivate. Importantly, neural populations studied under this "neural manifold hypothesis" are continuous and not cleanly divided into separate neural populations. This perspective clashes with the "modular hypothesis" of brain organization, wherein neural elements maintain an "all-or-nothing" affiliation with modules. In line with this modular hypothesis, recent research on recurrent neural networks suggests that multi-task networks become modular across training, such that different modules specialize for task-general dynamical motifs. If the modular hypothesis is true, then it would be important to use a dimensionality reduction technique that captures modular structure. Here, we investigate the features of such a method. We leverage RNNs as a model system to study the character of modular neural populations, using a community detection method from network science known as modularity maximization to partition neurons into distinct modules. These partitions allow us to ask the following question: do these modular boundaries matter to the system? ...

Updated: 2024-10-18 17:07:01

标题: 循环神经网络中的模块化边界

摘要: 最近神经科学领域的理论和实验工作集中在神经流形的表示和动力学特性上-神经活动空间中的子空间,在其中许多神经元同时激活。重要的是,在“神经流形假说”下研究的神经群体是连续的,而不是清晰地分为独立的神经群体。这一观点与大脑组织的“模块假说”相冲突,后者认为神经元素与模块保持“全有或全无”的关联。与这一模块假说一致,最近对循环神经网络的研究表明,在训练过程中,多任务网络变得模块化,不同模块专门用于任务通用的动态模式。如果模块假说成立,那么重要的是使用一个能捕捉模块化结构的降维技术。在这里,我们调查了这种方法的特征。我们利用RNN作为一个模型系统来研究模块化神经群体的特性,使用网络科学中的一种称为模块化最大化的社区检测方法将神经元划分为不同的模块。这些划分让我们提出了以下问题:这些模块边界对系统有影响吗?

更新时间: 2024-10-18 17:07:01

领域: q-bio.NC,cs.AI,cs.LG

下载: http://arxiv.org/abs/2310.20601v2

Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs

Jailbreak attacks can be used to probe the vulnerabilities of Large Language Models (LLMs) by inducing them to generate harmful content, most commonly by constructing semantically ambiguous prompts that confuse and mislead the model. To assess this security risk and reveal the intrinsic relation between an input prompt and an LLM's output, we analyze the distribution of attention weight to uncover the underlying reasons. Using statistical analysis, we define several novel metrics that better describe this distribution: the Attention Intensity on Sensitive Words (Attn_SensWords), the Attention-based Contextual Dependency Score (Attn_DepScore), and the Attention Dispersion Entropy (Attn_Entropy). Leveraging the distinct characteristics of these metrics together with a beam search algorithm, and inspired by the military strategy "Feint and Attack", we propose an effective jailbreak attack strategy named Attention-Based Attack (ABA). ABA employs nested attack prompts to divert the LLM's attention distribution, so that the more harmless parts of the input attract the model's attention. Motivated by ABA, we also put forward an effective defense strategy, Attention-Based Defense (ABD), which enhances the robustness of LLMs by calibrating the attention distribution of the input prompt. Comparative experiments demonstrate the effectiveness of both ABA and ABD; hence both can be used to assess the security of LLMs. The results also provide a logical explanation for how the distribution of attention weight strongly influences the output of LLMs.
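
To ground one of these metrics, here is a sketch of an attention-dispersion entropy in the spirit of Attn_Entropy; the paper's exact definition and normalization may differ:

    import torch

    def attention_entropy(attn):
        # attn: (batch, heads, queries, keys) post-softmax attention weights.
        # Shannon entropy of each query's distribution over keys, averaged;
        # higher values mean attention is more dispersed across the prompt.
        p = attn.clamp_min(1e-12)
        return -(p * p.log()).sum(dim=-1).mean()

    attn = torch.softmax(torch.randn(2, 8, 16, 16), dim=-1)
    print(attention_entropy(attn))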

Updated: 2024-10-18 17:02:13

标题: 佯攻与真攻:基于注意力的LLM越狱和保护策略

摘要: 越狱攻击可以通过诱使大型语言模型(LLMs)生成有害内容来访问LLMs的漏洞。攻击的最常见方法是构建语义模糊的提示,以混淆和误导LLMs。为了访问安全性并揭示LLMs输入提示和输出之间的内在关系,引入了注意力权重分布来分析潜在原因。通过使用统计分析方法,定义了一些新颖的度量标准来更好地描述注意力权重的分布,如敏感词汇的注意力强度(Attn_SensWords)、基于注意力的上下文依赖得分(Attn_DepScore)和注意力分散熵(Attn_Entropy)。通过利用这些度量标准的独特特征,结合光束搜索算法并受到军事战略“诱敌深入”的启发,提出了一种有效的越狱攻击策略,名为基于注意力的攻击(ABA)。在ABA中,采用嵌套攻击提示来转移LLMs的注意力分布。通过这种方式,可以利用输入的更多无害部分来吸引LLMs的注意力。此外,受到ABA的启发,还提出了一种有效的防御策略,称为基于注意力的防御(ABD)。与ABA相比,ABD可以通过校准输入提示的注意力分布来增强LLMs的鲁棒性。进行了一些对比实验来展示ABA和ABD的有效性。因此,ABA和ABD都可以用来访问LLMs的安全性。对比实验结果也给出了一个合乎逻辑的解释,即注意力权重的分布对LLMs的输出产生重大影响。

更新时间: 2024-10-18 17:02:13

领域: cs.CR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.16327v1

Evaluating Privacy Measures in Healthcare Apps Predominantly Used by Older Adults

The widespread adoption of telehealth systems has led to a significant increase in the use of healthcare apps among older adults, but this rapid growth has also heightened concerns about the privacy of their health information. While HIPAA in the US and GDPR in the EU establish essential privacy protections for health information, limited research exists on the effectiveness of healthcare app privacy policies, particularly for apps used predominantly by older adults. To address this, we created a Privacy Risk Assessment Framework (PRAF) and used it to evaluate the privacy risks of 28 healthcare apps designed for older adults, across multiple dimensions including regulatory compliance, data handling practices, and privacy-focused usability. Our analysis revealed significant gaps in compliance with privacy standards: only 25% of apps explicitly state compliance with HIPAA, and only 18% mention GDPR. Surprisingly, 79% of these applications lack breach protocols, putting older adults at risk in the event of a data breach.

Updated: 2024-10-18 17:01:14

标题: 评估主要由老年人使用的医疗应用程序中的隐私措施

摘要: 随着远程医疗系统的广泛应用,老年人群体中使用医疗应用程序的数量显著增加,但这种快速增长也加剧了人们对其健康信息隐私的担忧。虽然美国的HIPAA和欧盟的GDPR为健康信息建立了基本的隐私保护措施,但对医疗应用程序隐私政策的有效性,尤其是老年人主要使用的应用程序,存在有限的研究。为了解决这个问题,我们评估了28个医疗应用程序在多个方面,包括合规性、数据处理实践和以隐私为重点的可用性。为此,我们创建了一个隐私风险评估框架(PRAF),并使用它来评估为老年人设计的这些医疗应用程序所涉及的隐私风险。我们的分析显示,在符合隐私标准方面存在重大差距,仅有25%的应用程序明确声明符合HIPAA,仅有18%提及GDPR。令人惊讶的是,79%的这些应用程序缺乏违规协议,使老年人在数据泄露事件中面临风险。

更新时间: 2024-10-18 17:01:14

领域: cs.CR,cs.CY

下载: http://arxiv.org/abs/2410.14607v1

Streaming Deep Reinforcement Learning Finally Works

Natural intelligence processes experience as a continuous stream, sensing, acting, and learning moment-by-moment in real time. Streaming learning, the modus operandi of classic reinforcement learning (RL) algorithms like Q-learning and TD, mimics natural learning by using the most recent sample without storing it. This approach is also ideal for resource-constrained, communication-limited, and privacy-sensitive applications. However, in deep RL, learners almost always use batch updates and replay buffers, making them computationally expensive and incompatible with streaming learning. Although the prevalence of batch deep RL is often attributed to its sample efficiency, a more critical reason for the absence of streaming deep RL is its frequent instability and failure to learn, which we refer to as stream barrier. This paper introduces the stream-x algorithms, the first class of deep RL algorithms to overcome stream barrier for both prediction and control and match sample efficiency of batch RL. Through experiments in Mujoco Gym, DM Control Suite, and Atari Games, we demonstrate stream barrier in existing algorithms and successful stable learning with our stream-x algorithms: stream Q, stream AC, and stream TD, achieving the best model-free performance in DM Control Dog environments. A set of common techniques underlies the stream-x algorithms, enabling their success with a single set of hyperparameters and allowing for easy extension to other algorithms, thereby reviving streaming RL.
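
For intuition about what "streaming" means here, the sketch below runs classic tabular TD(0) on a toy random walk, consuming each transition once and discarding it, with no replay buffer. This is the regime the stream-x algorithms operate in, not the paper's stream Q/AC/TD methods themselves.

```python
import numpy as np

# Tabular TD(0) on a 5-state random walk; each transition is used once and
# discarded. State 4 is the rewarding terminal state, state 0 the other end.
n_states, gamma, alpha = 5, 0.99, 0.1
V = np.zeros(n_states)
rng = np.random.default_rng(0)

s = 2
for _ in range(10_000):
    s_next = int(np.clip(s + rng.choice([-1, 1]), 0, n_states - 1))
    r = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next in (0, n_states - 1)
    target = r + (0.0 if done else gamma * V[s_next])
    V[s] += alpha * (target - V[s])        # update from the latest sample only
    s = 2 if done else s_next              # restart episodes from the middle

print(np.round(V, 3))
```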

Updated: 2024-10-18 17:00:29

标题: 流式深度强化学习终于奏效

摘要: 自然智能将经验视为连续的流,实时按照每时每刻感知、行动和学习。流式学习是经典强化学习(RL)算法(如Q-learning和TD)的操作方式,通过使用最新样本而不存储它来模拟自然学习。这种方法也非常适合资源受限、通信受限和隐私敏感的应用。然而,在深度RL中,学习者几乎总是使用批量更新和重放缓冲区,使它们在计算上昂贵,并且与流式学习不兼容。尽管批量深度RL的流行通常归因于其样本效率,但流式深度RL缺乏的更为关键的原因是其频繁不稳定和学习失败,我们称之为流障碍。本文介绍了stream-x算法,这是第一类能够克服流障碍并实现预测和控制的深度RL算法,同时与批量RL匹配样本效率。通过在Mujoco Gym、DM Control Suite和Atari Games中的实验,我们展示了现有算法中的流障碍以及我们的stream-x算法成功稳定学习的情况:stream Q、stream AC和stream TD,在DM Control Dog环境中实现了最佳的无模型性能。一组常见技术支撑着stream-x算法的成功,使其能够使用一组超参数,并且便于扩展到其他算法,从而使流式RL复苏。

更新时间: 2024-10-18 17:00:29

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14606v1

Learning to Control the Smoothness of Graph Convolutional Network Features

The pioneering work of Oono and Suzuki [ICLR, 2020] and Cai and Wang [arXiv:2006.13318] initializes the analysis of the smoothness of graph convolutional network (GCN) features. Their results reveal an intricate empirical correlation between node classification accuracy and the ratio of smooth to non-smooth feature components. However, the optimal ratio that favors node classification is unknown, and the non-smooth features of deep GCN with ReLU or leaky ReLU activation function diminish. In this paper, we propose a new strategy to let GCN learn node features with a desired smoothness -- adapting to data and tasks -- to enhance node classification. Our approach has three key steps: (1) We establish a geometric relationship between the input and output of ReLU or leaky ReLU. (2) Building on our geometric insights, we augment the message-passing process of graph convolutional layers (GCLs) with a learnable term to modulate the smoothness of node features with computational efficiency. (3) We investigate the achievable ratio between smooth and non-smooth feature components for GCNs with the augmented message-passing scheme. Our extensive numerical results show that the augmented message-passing schemes significantly improve node classification for GCN and some related models.
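
The following is a minimal PyTorch sketch of the general idea of a learnable term that modulates feature smoothness inside a graph convolutional layer; the paper's augmented message-passing scheme is more specific, and the scalar blending coefficient here is an illustrative assumption.

```python
import torch
import torch.nn as nn

class SmoothnessGCL(nn.Module):
    """Graph convolutional layer with a learnable scalar blending the
    neighbor-averaged (smooth) signal with the node's own (non-smooth) one."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
        self.gamma = nn.Parameter(torch.tensor(0.5))   # learnable smoothness knob

    def forward(self, X, A_hat):
        smooth = A_hat @ X                     # message passing (dense A_hat)
        mixed = self.gamma * smooth + (1.0 - self.gamma) * X
        return torch.relu(self.lin(mixed))

X = torch.randn(4, 8)                          # 4 nodes, 8 features
A_hat = torch.full((4, 4), 0.25)               # toy normalized adjacency
print(SmoothnessGCL(8, 16)(X, A_hat).shape)    # torch.Size([4, 16])
```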

Updated: 2024-10-18 16:57:27

标题: 学习控制图卷积网络特征的平滑性

摘要: Oono和Suzuki [ICLR, 2020]以及Cai和Wang [arXiv:2006.13318]的开创性工作启动了对图卷积网络(GCN)特征平滑性的分析。他们的结果揭示了节点分类准确性与平滑和非平滑特征成分比之间复杂的经验相关性。然而,有利于节点分类的最佳比例尚不清楚,带有ReLU或leaky ReLU激活函数的深度GCN的非平滑特征会减弱。在本文中,我们提出了一种新策略,让GCN学习具有所需平滑性的节点特征,以增强节点分类。我们的方法包括三个关键步骤:(1)我们建立了ReLU或leaky ReLU的输入和输出之间的几何关系。 (2)基于我们的几何见解,我们通过一个可学习项增强了图卷积层(GCLs)的消息传递过程,以调节节点特征的平滑性,并提高计算效率。 (3)我们研究了带有增强消息传递方案的GCNs之间平滑和非平滑特征成分之间可实现的比例。我们广泛的数值结果表明,增强的消息传递方案显著改善了GCN和一些相关模型的节点分类能力。

更新时间: 2024-10-18 16:57:27

领域: cs.LG,cs.NA,math.NA,68T01, 68T07

下载: http://arxiv.org/abs/2410.14604v1

How Does Data Diversity Shape the Weight Landscape of Neural Networks?

To enhance the generalization of machine learning models to unseen data, techniques such as dropout, weight decay ($L_2$ regularization), and noise augmentation are commonly employed. While regularization methods (i.e., dropout and weight decay) are geared toward adjusting model parameters to prevent overfitting, data augmentation increases the diversity of the input training set, a method purported to improve accuracy and calibration. In this paper, we investigate the impact of each of these techniques on the parameter space of neural networks, with the goal of understanding how they alter the weight landscape in transfer learning scenarios. To accomplish this, we employ Random Matrix Theory to analyze the eigenvalue distributions of pre-trained models, fine-tuned using these techniques but with different levels of data diversity, for the same downstream tasks. We observe that diverse data influences the weight landscape in a similar fashion as dropout. Additionally, we compare commonly used data augmentation methods with synthetic data created by generative models. We conclude that synthetic data can bring more diversity into real input data, resulting in better performance on out-of-distribution test instances.
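
A minimal sketch of the kind of spectral analysis involved: computing the eigenvalue distribution of a layer's Gram matrix, which Random Matrix Theory compares against reference laws such as Marchenko-Pastur. The random weight matrix below stands in for a fine-tuned layer.

```python
import numpy as np

def weight_spectrum(W):
    """Eigenvalues of the normalized Gram matrix W^T W / n, the empirical
    spectral density that RMT analyses compare to reference laws."""
    return np.linalg.eigvalsh(W.T @ W / W.shape[0])

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 256)) / np.sqrt(512)   # stand-in for a trained layer
eigs = weight_spectrum(W)
print(f"min {eigs.min():.3f}  max {eigs.max():.3f}  mean {eigs.mean():.3f}")
```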

Updated: 2024-10-18 16:57:05

标题: 数据多样性如何塑造神经网络的权重景观?

摘要: 为了增强机器学习模型对未见数据的泛化能力,常常采用一些技术,如dropout、权重衰减($L_2$正则化)和噪声增强。虽然正则化方法(即dropout和权重衰减)旨在调整模型参数以防止过拟合,数据增强则增加了输入训练集的多样性,据称可以提高准确性并改善校准。在本文中,我们研究了这些技术对神经网络参数空间的影响,目的是了解它们在迁移学习场景中如何改变权重景观。为了实现这一目标,我们采用随机矩阵理论来分析预训练模型的特征值分布,这些模型使用上述技术但以不同程度的数据多样性进行微调,用于相同的下游任务。我们观察到多样化数据对权重景观的影响与dropout类似。此外,我们比较了常用的数据增强方法与由生成模型创建的合成数据。我们得出结论,合成数据可以为真实输入数据带来更多多样性,从而在分布外测试实例上表现更好。

更新时间: 2024-10-18 16:57:05

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14602v1

A dataset for cyber threat intelligence modeling of connected autonomous vehicles

Cyber attacks have become a vital threat to connected autonomous vehicles in intelligent transportation systems. Cyber threat intelligence, as the collection of cyber threat information, provides an ideal approach for responding to emerging vehicle cyber threats and enabling proactive security defense. Obtaining valuable information from enormous cybersecurity data using knowledge extraction technologies to achieve cyber threat intelligence modeling is an effective means to ensure automotive cybersecurity. Unfortunately, there is no existing cybersecurity dataset available for cyber threat intelligence modeling research in the automotive field. This paper reports the creation of a cyber threat intelligence corpus focusing on vehicle cybersecurity knowledge mining. This dataset, annotated using a joint labeling strategy, comprises 908 real automotive cybersecurity reports, containing 3678 sentences, 8195 security entities and 4852 semantic relations. We further conduct a comprehensive analysis of cyber threat intelligence mining algorithms based on this corpus. The proposed dataset will serve as a valuable resource for evaluating the performance of existing algorithms and advancing research in cyber threat intelligence modeling within the automotive field.

Updated: 2024-10-18 16:55:12

标题: 一个用于连接自动驾驶车辆网络威胁情报建模的数据集

摘要: 网络攻击已经成为智能交通系统中连接的自动驾驶车辆面临的重要威胁。网络威胁情报作为网络威胁信息的收集,为应对新兴的车辆网络威胁并实现积极的安全防御提供了理想的方法。利用知识提取技术从庞大的网络安全数据中获取有价值的信息,以实现网络威胁情报建模,是确保汽车网络安全性的有效手段。不幸的是,在汽车领域中并没有现有的网络安全数据集可用于网络威胁情报建模研究。本文报告了一个专注于车辆网络安全知识挖掘的网络威胁情报语料库的创建。这个数据集采用联合标记策略进行标注,包括908份真实的汽车网络安全报告,包含3678个句子,8195个安全实体和4852个语义关系。我们进一步对基于这个语料库的网络威胁情报挖掘算法进行了全面分析。这个提议的数据集将作为一个有价值的资源,用于评估现有算法的性能,并推动汽车领域网络威胁情报建模研究的进展。

更新时间: 2024-10-18 16:55:12

领域: cs.CR

下载: http://arxiv.org/abs/2410.14600v1

TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs

Multi-relational temporal graphs are powerful tools for modeling real-world data, capturing the evolving and interconnected nature of entities over time. Recently, many novel models have been proposed for machine learning on such graphs, intensifying the need for robust evaluation and standardized benchmark datasets. However, the availability of such resources remains scarce, and evaluation faces added complexity due to reproducibility issues in experimental protocols. To address these challenges, we introduce Temporal Graph Benchmark 2.0 (TGB 2.0), a novel benchmarking framework tailored for evaluating methods for predicting future links on Temporal Knowledge Graphs and Temporal Heterogeneous Graphs with a focus on large-scale datasets, extending the Temporal Graph Benchmark. TGB 2.0 facilitates comprehensive evaluations by presenting eight novel datasets spanning five domains with up to 53 million edges. TGB 2.0 datasets are significantly larger than existing datasets in terms of number of nodes, edges, or timestamps. In addition, TGB 2.0 provides a reproducible and realistic evaluation pipeline for multi-relational temporal graphs. Through extensive experimentation, we observe that 1) leveraging edge-type information is crucial to obtain high performance, 2) simple heuristic baselines are often competitive with more complex methods, 3) most methods fail to run on our largest datasets, highlighting the need for research on more scalable methods.

Updated: 2024-10-18 16:50:56

标题: TGB 2.0:学习时序知识图和异构图的基准Benchmark

摘要: 多关系时间图是建模真实世界数据的强大工具,捕捉实体随时间演变和相互关联的特性。最近,提出了许多新颖的模型用于在这样的图上进行机器学习,加剧了对健壮评估和标准基准数据集的需求。然而,这类资源的可用性仍然稀缺,评估面临着实验协议中的可重现性问题带来的额外复杂性。为解决这些挑战,我们引入了Temporal Graph Benchmark 2.0(TGB 2.0),一个专为评估在时间知识图和时间异构图上预测未来链接方法而设计的新型基准框架,重点关注大规模数据集,扩展了Temporal Graph Benchmark。TGB 2.0通过展示涵盖五个领域的八个新颖数据集,其中最多包含5300万条边,促进了全面的评估。TGB 2.0数据集在节点数、边数或时间戳方面都显著大于现有数据集。此外,TGB 2.0为多关系时间图提供了可重现和现实的评估流程。通过大量实验,我们观察到:1)利用边类型信息对获得高性能至关重要;2)简单的启发式基线通常与更复杂的方法竞争;3)大多数方法无法在我们最大的数据集上运行,突显了对更可扩展方法的研究的需求。

更新时间: 2024-10-18 16:50:56

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2406.09639v2

Teaching Models to Balance Resisting and Accepting Persuasion

Large language models (LLMs) are susceptible to persuasion, which can pose risks when models are faced with an adversarial interlocutor. We take a first step towards defending models against persuasion while also arguing that defense against adversarial (i.e. negative) persuasion is only half of the equation: models should also be able to accept beneficial (i.e. positive) persuasion to improve their answers. We show that optimizing models for only one side results in poor performance on the other. In order to balance positive and negative persuasion, we introduce Persuasion-Balanced Training (or PBT), which leverages multi-agent recursive dialogue trees to create data and trains models via preference optimization to accept persuasion when appropriate. PBT consistently improves resistance to misinformation and resilience to being challenged while also resulting in the best overall performance on holistic data containing both positive and negative persuasion. Crucially, we show that PBT models are better teammates in multi-agent debates. We find that without PBT, pairs of stronger and weaker models have unstable performance, with the order in which the models present their answers determining whether the team obtains the stronger or weaker model's performance. PBT leads to better and more stable results and less order dependence, with the stronger model consistently pulling the weaker one up.

Updated: 2024-10-18 16:49:36

标题: 教学模型平衡抵制与接受说服

摘要: 大语言模型(LLMs)容易受到说服,当模型面对对抗性对话者时可能会带来风险。我们迈出了让模型防御说服的第一步,同时也认为防御对抗性(即负面)说服只是问题的一半:模型还应该能够接受有益的(即积极的)说服以改进其答案。我们展示了仅针对其中一面优化模型会导致另一面表现不佳。为了平衡积极和消极说服,我们引入了平衡说服训练(PBT),利用多代理递归对话树创建数据,并通过偏好优化训练模型在适当时接受说服。PBT持续提高对错误信息的抵抗力和被质疑时的韧性,同时在包含积极和消极说服的整体数据上取得最佳表现。至关重要的是,我们发现PBT模型在多代理辩论中是更好的团队成员。我们发现,没有PBT,较强和较弱模型的配对表现不稳定,模型给出答案的顺序决定了团队取得较强还是较弱模型的表现。PBT带来了更好、更稳定的结果,减少了对顺序的依赖,较强的模型始终能把较弱的模型拉上来。

更新时间: 2024-10-18 16:49:36

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14596v1

Temporal Fair Division of Indivisible Items

We study a fair division model where indivisible items arrive sequentially, and must be allocated immediately and irrevocably. Previous work on online fair division has shown impossibility results in achieving approximate envy-freeness under these constraints. In contrast, we consider an informed setting where the algorithm has complete knowledge of future items, and aim to ensure that the cumulative allocation at each round satisfies approximate envy-freeness -- which we define as temporal envy-freeness up to one item (TEF1). We focus on settings where items can be exclusively goods or exclusively chores. For goods, while TEF1 allocations may not always exist, we identify several special cases where they do -- two agents, two item types, generalized binary valuations, unimodal preferences -- and provide polynomial-time algorithms for these cases. We also prove that determining the existence of a TEF1 allocation is NP-hard. For chores, we establish analogous results for the special cases, but present a slightly weaker intractability result. We also establish the incompatibility between TEF1 and Pareto-optimality, with the implication that it is intractable to find a TEF1 allocation that maximizes any $p$-mean welfare, even for two agents.
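
For concreteness, the sketch below checks the TEF1 property for additive valuations over goods: after every round, no agent should envy another's cumulative bundle by more than the value of a single item. The valuations and allocation are toy inputs.

```python
def is_tef1(valuations, allocation):
    """Check temporal envy-freeness up to one item (TEF1) for additive goods:
    after every round t, each agent i envies each j by at most one item.
    valuations[i][k]: agent i's value for item k; allocation[t]: recipient of item t."""
    n = len(valuations)
    bundles = [[] for _ in range(n)]
    for t, winner in enumerate(allocation):
        bundles[winner].append(t)
        for i in range(n):
            for j in range(n):
                if i == j or not bundles[j]:
                    continue
                v_own = sum(valuations[i][k] for k in bundles[i])
                v_other = sum(valuations[i][k] for k in bundles[j])
                best = max(valuations[i][k] for k in bundles[j])
                if v_own < v_other - best - 1e-9:   # envy beyond one item
                    return False
    return True

vals = [[3, 1, 2, 2], [1, 3, 2, 2]]            # two agents, four sequential goods
print(is_tef1(vals, allocation=[0, 1, 1, 0]))  # True
```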

Updated: 2024-10-18 16:43:36

标题: 时间公平分配不可分割物品

摘要: 我们研究了一个公平分配模型,其中不可分割的物品按顺序到达,并且必须立即且不可撤销地分配。先前关于在线公平分配的工作表明,在这些约束条件下实现近似无嫉妒性是不可能的。相反,我们考虑一个信息完备的设置,其中算法完全了解未来物品,并旨在确保每一轮的累积分配满足近似无嫉妒性,我们将其定义为至多相差一件物品的时间无嫉妒性(TEF1)。我们专注于物品全部为商品或全部为家务的设置。对于商品,虽然TEF1分配并不总是存在,但我们确定了几种它们确实存在的特殊情况(两个代理人、两种物品类型、广义二元估值、单峰偏好),并为这些情况提供了多项式时间算法。我们还证明了判定TEF1分配是否存在是NP难的。对于家务,我们为这些特殊情况建立了类似的结果,但给出的难解性结果略弱。我们还证明了TEF1与帕累托最优性不相容,这意味着即使对于两个代理人,找到最大化任何$p$-均值福利的TEF1分配也是难解的。

更新时间: 2024-10-18 16:43:36

领域: cs.GT,cs.AI

下载: http://arxiv.org/abs/2410.14593v1

Contractivity and linear convergence in bilinear saddle-point problems: An operator-theoretic approach

We study the convex-concave bilinear saddle-point problem $\min_x \max_y f(x) + y^\top Ax - g(y)$, where both, only one, or none of the functions $f$ and $g$ are strongly convex, and suitable rank conditions on the matrix $A$ hold. The solution of this problem is at the core of many machine learning tasks. By employing tools from operator theory, we systematically prove the contractivity (in turn, the linear convergence) of several first-order primal-dual algorithms, including the Chambolle-Pock method. Our approach results in concise and elegant proofs, and it yields new convergence guarantees and tighter bounds compared to known results.
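
A minimal numerical sketch of the Chambolle-Pock iteration on this saddle-point problem, with $f$ and $g$ chosen as simple quadratics so that both proximal steps are closed-form scalings (an illustrative choice; the unique saddle point is then the origin):

```python
import numpy as np

# Chambolle-Pock primal-dual iteration for min_x max_y f(x) + y^T A x - g(y),
# with f(x) = ||x||^2 / 2 and g(y) = ||y||^2 / 2 for illustration.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 20))
tau = sigma = 0.9 / np.linalg.norm(A, 2)     # ensures tau * sigma * ||A||^2 < 1

x, y = rng.normal(size=20), rng.normal(size=30)
x_bar = x.copy()
for _ in range(500):
    y = (y + sigma * (A @ x_bar)) / (1.0 + sigma)   # prox of sigma * g
    x_new = (x - tau * (A.T @ y)) / (1.0 + tau)     # prox of tau * f
    x_bar = 2.0 * x_new - x                          # extrapolation step
    x = x_new

print(np.linalg.norm(x), np.linalg.norm(y))  # both shrink toward the saddle point
```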

Updated: 2024-10-18 16:43:10

标题: 双线性鞍点问题中的收缩性和线性收敛性:一种算子理论方法

摘要: 我们研究了凸凹双线性鞍点问题$\min_x \max_y f(x) + y^\top Ax - g(y)$,其中函数$f$和$g$中的一个、两个或者都不是强凸的,并且矩阵$A$满足适当的秩条件。解决这个问题是许多机器学习任务的核心。通过运用算子理论工具,我们系统地证明了多个一阶原始-对偶算法的收缩性(进而是线性收敛性),包括Chambolle-Pock方法。我们的方法导致了简洁而优雅的证明,并且相比已知结果,它产生了新的收敛保证和更紧的界限。

更新时间: 2024-10-18 16:43:10

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2410.14592v1

MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback

Automatic question generation (QG) is essential for AI and NLP, particularly in intelligent tutoring, dialogue systems, and fact verification. Generating multiple-choice questions (MCQG) for professional exams, like the United States Medical Licensing Examination (USMLE), is particularly challenging, requiring domain expertise and complex multi-hop reasoning for high-quality questions. However, current large language models (LLMs) like GPT-4 struggle with professional MCQG due to outdated knowledge, hallucination issues, and prompt sensitivity, resulting in unsatisfactory quality and difficulty. To address these challenges, we propose MCQG-SRefine, an LLM self-refine-based (Critique and Correction) framework for converting medical cases into high-quality USMLE-style questions. By integrating expert-driven prompt engineering with iterative self-critique and self-correction feedback, MCQG-SRefine significantly enhances human expert satisfaction regarding both the quality and difficulty of the questions. Furthermore, we introduce an LLM-as-Judge-based automatic metric to replace the complex and costly expert evaluation process, ensuring reliable and expert-aligned assessments.

Updated: 2024-10-18 16:42:01

标题: MCQG-SRefine:通过迭代式自我批评、修正和比较反馈生成和评估多项选择题

摘要: 自动生成问题(QG)对于人工智能和自然语言处理至关重要,特别是在智能辅导、对话系统和事实验证方面。为专业考试生成多项选择题(MCQG),比如美国医师资格考试(USMLE),尤其具有挑战性,需要领域专业知识和复杂的多跳推理来生成高质量的问题。然而,当前的大型语言模型(LLMs)如GPT-4在专业MCQG方面存在困难,主要是由于过时的知识、幻觉问题和提示敏感性,导致质量不佳和难度大。为了解决这些挑战,我们提出了MCQG-SRefine,这是一个基于LLM自我改进(批判和纠正)框架,用于将医学案例转化为高质量的USMLE风格问题。通过将专家驱动的提示工程与迭代式自我批判和自我纠正反馈相结合,MCQG-SRefine显著提高了人类专家对问题质量和难度的满意程度。此外,我们引入了一个基于LLM作为评判者的自动度量标准,以取代复杂和昂贵的专家评估过程,确保可靠且与专家一致的评估。

更新时间: 2024-10-18 16:42:01

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.13191v2

A Lipschitz spaces view of infinitely wide shallow neural networks

We revisit the mean field parametrization of shallow neural networks, using signed measures on unbounded parameter spaces and duality pairings that take into account the regularity and growth of activation functions. This setting directly leads to the use of unbalanced Kantorovich-Rubinstein norms defined by duality with Lipschitz functions, and of spaces of measures dual to those of continuous functions with controlled growth. These allow to make transparent the need for total variation and moment bounds or penalization to obtain existence of minimizers of variational formulations, under which we prove a compactness result in strong Kantorovich-Rubinstein norm, and in the absence of which we show several examples demonstrating undesirable behavior. Further, the Kantorovich-Rubinstein setting enables us to combine the advantages of a completely linear parametrization and ensuing reproducing kernel Banach space framework with optimal transport insights. We showcase this synergy with representer theorems and uniform large data limits for empirical risk minimization, and in proposed formulations for distillation and fusion applications.
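
In symbols, the setting can be summarized as follows (a standard formulation consistent with the abstract, not a quotation from the paper):

```latex
% Mean-field parametrization of a shallow network by a signed measure \mu on
% the parameter space \Omega, and the Kantorovich-Rubinstein norm defined by
% duality with (bounded) Lipschitz functions:
\[
  f_\mu(x) = \int_{\Omega} \sigma(w^\top x + b)\,\mathrm{d}\mu(w,b),
  \qquad
  \|\mu\|_{\mathrm{KR}} = \sup\Big\{ \int_{\Omega} \varphi\,\mathrm{d}\mu
  \;:\; \varphi\ \text{1-Lipschitz},\ \|\varphi\|_\infty \le 1 \Big\}.
\]
```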

Updated: 2024-10-18 16:41:37

标题: 一个利普希茨空间视角下的无限宽浅层神经网络

摘要: 我们重新审视了浅层神经网络的平均场参数化,使用未受限制的参数空间上的符号测度和考虑激活函数的正则性和增长的对偶配对。这种设置直接导致使用通过与利普希茨函数对偶定义的不平衡Kantorovich-Rubinstein范数,以及对连续函数的受控增长的对偶空间的度量。这些允许明确透露出需要总变差和矩边界或惩罚来获得变分形式的最小化器的存在性,在此基础上我们证明了在强Kantorovich-Rubinstein范数下的紧致性结果,而在没有这种情况下,我们展示了几个示例,证明了不良行为。此外,Kantorovich-Rubinstein设置使我们能够结合完全线性参数化和随后再生核Banach空间框架的优势与最优传输洞察力。我们展示了这种协同作用,通过表现定理和经验风险最小化的统一大数据极限,以及用于蒸馏和融合应用的提出的公式。

更新时间: 2024-10-18 16:41:37

领域: math.FA,cs.LG,stat.ML,68T07, 46E27, 46B20

下载: http://arxiv.org/abs/2410.14591v1

Learning With Multi-Group Guarantees For Clusterable Subpopulations

A canonical desideratum for prediction problems is that performance guarantees should hold not just on average over the population, but also for meaningful subpopulations within the overall population. But what constitutes a meaningful subpopulation? In this work, we take the perspective that relevant subpopulations should be defined with respect to the clusters that naturally emerge from the distribution of individuals for which predictions are being made. In this view, a population refers to a mixture model whose components constitute the relevant subpopulations. We suggest two formalisms for capturing per-subgroup guarantees: first, by attributing each individual to the component from which they were most likely drawn, given their features; and second, by attributing each individual to all components in proportion to their relative likelihood of having been drawn from each component. Using online calibration as a case study, we study a variational algorithm that provides guarantees for each of these formalisms by handling all plausible underlying subpopulation structures simultaneously, and achieve an $O(T^{1/2})$ rate even when the subpopulations are not well-separated. In comparison, the more natural cluster-then-predict approach that first recovers the structure of the subpopulations and then makes predictions suffers from a $O(T^{2/3})$ rate and requires the subpopulations to be separable. Along the way, we prove that providing per-subgroup calibration guarantees for underlying clusters can be easier than learning the clusters: separation between median subgroup features is required for the latter but not the former.
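
The two attribution formalisms are easy to state concretely. For a mixture model, the sketch below computes both the hard (most-likely component) and soft (proportional-likelihood) attributions for a 1-D Gaussian mixture; the mixture parameters are toy values.

```python
import numpy as np
from scipy.stats import norm

weights = np.array([0.6, 0.4])                     # toy two-component mixture
means, stds = np.array([-1.0, 2.0]), np.array([1.0, 1.0])

x = np.array([-1.2, 0.4, 2.5])                     # individuals' features
lik = weights * norm.pdf(x[:, None], means, stds)  # shape (n, n_components)
soft = lik / lik.sum(axis=1, keepdims=True)        # proportional attribution
hard = lik.argmax(axis=1)                          # most-likely component

print(np.round(soft, 3))
print(hard)
```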

Updated: 2024-10-18 16:38:55

标题: 学习具有多组保证的可聚类子群体

摘要: 预测问题的一个经典要求是性能保证不仅在整体人群上平均有效,也适用于整体人群中有意义的亚群。但是什么构成了有意义的亚群呢?在这项工作中,我们从相关子群应该根据自然产生的个体分布中的聚类来定义的角度出发。在这种观点下,人口指的是一个混合模型,其组成部分构成了相关的亚群。我们提出了两种形式化方法来捕捉每个子组的保证:首先,通过将每个个体归属于最有可能根据其特征被抽取的组件;其次,通过将每个个体按其相对于从每个组件中被抽取的相对可能性来归属于所有组件。以在线校准为案例研究,我们研究了一种变分算法,通过同时处理所有可能的潜在子群结构为每个形式提供保证,并在子群之间不明显分离时实现了$O(T^{1/2})$的速率。相比之下,更自然的先聚类再预测方法首先恢复子群的结构然后进行预测,速率为$O(T^{2/3})$,需要子群是可分离的。在此过程中,我们证明提供基于底层聚类的每个子群校准保证可能比学习聚类更容易:对于后者需要中位数子组特征之间的分离,而对于前者则不需要。

更新时间: 2024-10-18 16:38:55

领域: cs.LG,cs.CY

下载: http://arxiv.org/abs/2410.14588v1

Neuro-Symbolic Traders: Assessing the Wisdom of AI Crowds in Markets

Deep generative models are becoming increasingly used as tools for financial analysis. However, it is unclear how these models will influence financial markets, especially when they infer financial value in a semi-autonomous way. In this work, we explore the interplay between deep generative models and market dynamics. We develop a form of virtual traders that use deep generative models to make buy/sell decisions, which we term neuro-symbolic traders, and expose them to a virtual market. Under our framework, neuro-symbolic traders are agents that use vision-language models to discover a model of the fundamental value of an asset. Agents develop this model as a stochastic differential equation, calibrated to market data using gradient descent. We test our neuro-symbolic traders on both synthetic data and real financial time series, including an equity stock, a commodity, and a foreign exchange pair. We then expose several groups of neuro-symbolic traders to a virtual market environment. This market environment allows for feedback between the traders' beliefs about the underlying value and the observed price dynamics. We find that this leads to price suppression compared to the historical data, highlighting a future risk to market stability. Our work is a first step towards quantifying the effect of deep generative agents on market dynamics and sets out some of the potential risks and benefits of this approach in the future.
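
As a toy stand-in for the agents' calibration step, the sketch below fits the drift and volatility of a geometric Brownian motion to simulated log-returns by gradient descent on the Gaussian negative log-likelihood; the model choice and hyperparameters are illustrative assumptions, not the paper's setup.

```python
import torch

torch.manual_seed(0)
true_mu, true_sigma, dt = 0.05, 0.2, 1.0 / 252
r = ((true_mu - 0.5 * true_sigma**2) * dt
     + true_sigma * dt**0.5 * torch.randn(2000))   # simulated daily log-returns

mu = torch.tensor(0.0, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)  # parametrize sigma > 0
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)
for _ in range(500):
    sigma = log_sigma.exp()
    mean = (mu - 0.5 * sigma**2) * dt
    nll = (0.5 * (r - mean) ** 2 / (sigma**2 * dt) + torch.log(sigma)).mean()
    opt.zero_grad(); nll.backward(); opt.step()

# sigma is recovered well; the drift estimate stays noisy at this sample size
print(float(mu), float(log_sigma.exp()))
```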

Updated: 2024-10-18 16:37:52

标题: 神经符号交易者:评估市场中人工智能群体的智慧

摘要: 深度生成模型越来越被用作金融分析工具。然而,目前尚不清楚这些模型将如何影响金融市场,特别是当它们以半自主方式推断金融价值时。在这项工作中,我们探讨了深度生成模型与市场动态之间的相互作用。我们开发了一种虚拟交易员形式,他们使用深度生成模型进行买卖决策,我们将其称为神经符号交易员,并将其置于虚拟市场中。在我们的框架下,神经符号交易员是使用视觉-语言模型发现资产基本价值模型的代理人。代理人将这一模型发展为一种随机微分方程,并使用梯度下降校准市场数据。我们在合成数据和真实金融时间序列上测试了我们的神经符号交易员,包括股票、商品和外汇对。然后,我们将几组神经符号交易员置于虚拟市场环境中。这个市场环境允许交易员对潜在价值的信念与观察到的价格动态之间的反馈。我们发现与历史数据相比,这导致价格抑制,突显了对市场稳定性的未来风险。我们的工作是量化深度生成代理对市场动态影响的第一步,并阐明了今后这种方法的一些潜在风险和收益。

更新时间: 2024-10-18 16:37:52

领域: cs.LG,q-fin.CP

下载: http://arxiv.org/abs/2410.14587v1

Neural Combinatorial Clustered Bandits for Recommendation Systems

We consider the contextual combinatorial bandit setting where in each round, the learning agent, e.g., a recommender system, selects a subset of "arms," e.g., products, and observes rewards for both the individual base arms, which are a function of known features (called "context"), and the super arm (the subset of arms), which is a function of the base arm rewards. The agent's goal is to simultaneously learn the unknown reward functions and choose the highest-reward arms. For example, the "reward" may represent a user's probability of clicking on one of the recommended products. Conventional bandit models, however, employ restrictive reward function models in order to obtain performance guarantees. We make use of deep neural networks to estimate and learn the unknown reward functions and propose Neural UCB Clustering (NeUClust), which adopts a clustering approach to select the super arm in every round by exploiting underlying structure in the context space. Unlike prior neural bandit works, NeUClust uses a neural network to estimate the super arm reward and select the super arm, thus eliminating the need for a known optimization oracle. We non-trivially extend prior neural combinatorial bandit works to prove that NeUClust achieves $\widetilde{O}\left(\widetilde{d}\sqrt{T}\right)$ regret, where $\widetilde{d}$ is the effective dimension of a neural tangent kernel matrix, $T$ the number of rounds. Experiments on real world recommendation datasets show that NeUClust achieves better regret and reward than other contextual combinatorial and neural bandit algorithms.

Updated: 2024-10-18 16:37:28

标题: 神经组合聚类赌博机用于推荐系统

摘要: 我们考虑上下文组合赌博设置:在每一轮中,学习代理(例如推荐系统)选择一个“臂”的子集(例如产品),并观察个体基本臂的奖励(它们是被称为“上下文”的已知特征的函数)以及超级臂(即臂的子集)的奖励(它是基本臂奖励的函数)。代理的目标是同时学习未知的奖励函数并选择奖励最高的臂。例如,“奖励”可能表示用户点击某个推荐产品的概率。然而,传统的赌博模型为了获得性能保证而采用限制性的奖励函数模型。我们利用深度神经网络来估计和学习未知的奖励函数,并提出了神经UCB聚类(NeUClust),它利用上下文空间中的潜在结构,采用聚类方法在每一轮中选择超级臂。与先前的神经赌博工作不同,NeUClust使用神经网络估计超级臂奖励并选择超级臂,从而消除了对已知优化预言机的需求。我们非平凡地扩展了先前的神经组合赌博工作,证明NeUClust实现了$\widetilde{O}\left(\widetilde{d}\sqrt{T}\right)$的后悔界,其中$\widetilde{d}$是神经切线核矩阵的有效维度,$T$是轮次数。在真实世界推荐数据集上的实验表明,NeUClust比其他上下文组合和神经赌博算法取得了更好的后悔和奖励。

更新时间: 2024-10-18 16:37:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14586v1

MCSFF: Multi-modal Consistency and Specificity Fusion Framework for Entity Alignment

Multi-modal entity alignment (MMEA) is essential for enhancing knowledge graphs and improving information retrieval and question-answering systems. Existing methods often focus on integrating modalities through their complementarity but overlook the specificity of each modality, which can obscure crucial features and reduce alignment accuracy. To solve this, we propose the Multi-modal Consistency and Specificity Fusion Framework (MCSFF), which innovatively integrates both complementary and specific aspects of modalities. We utilize Scale Computing's hyper-converged infrastructure to optimize IT management and resource allocation in large-scale data processing. Our framework first computes similarity matrices for each modality using modality embeddings to preserve their unique characteristics. Then, an iterative update method denoises and enhances modality features to fully express critical information. Finally, we integrate the updated information from all modalities to create enriched and precise entity representations. Experiments show our method outperforms current state-of-the-art MMEA baselines on the MMKG dataset, demonstrating its effectiveness and practical potential.

Updated: 2024-10-18 16:35:25

标题: MCSFF:用于实体对齐的多模态一致性和特异性融合框架

摘要: 多模态实体对齐(MMEA)对于增强知识图和改善信息检索和问答系统至关重要。现有方法通常侧重于通过它们的互补性集成模态,但忽视了每个模态的特异性,这可能会掩盖关键特征并降低对齐准确性。为了解决这个问题,我们提出了多模态一致性和特异性融合框架(MCSFF),创新地整合了模态的互补和特定方面。我们利用Scale Computing的超融合基础设施来优化大规模数据处理中的IT管理和资源分配。我们的框架首先使用模态嵌入计算每个模态的相似性矩阵,以保留它们独特的特征。然后,一个迭代更新方法对模态特征进行去噪和增强,以完全表达关键信息。最后,我们整合所有模态的更新信息,创建丰富和精确的实体表示。实验证明,我们的方法在MMKG数据集上优于当前最先进的MMEA基线,展示了其有效性和实际潜力。

更新时间: 2024-10-18 16:35:25

领域: cs.AI

下载: http://arxiv.org/abs/2410.14584v1

How hard can it be? Quantifying MITRE attack campaigns with attack trees and cATM logic

The landscape of cyber threats grows more complex by the day. Advanced Persistent Threats carry out systematic attack campaigns against which cybersecurity practitioners must defend. Examples of such organized attacks are operations Dream Job, Wocao, WannaCry or the SolarWinds Compromise. To evaluate which risks are most threatening, and which campaigns to prioritize against when defending, cybersecurity experts must be equipped with the right toolbox. In particular, they must be able to (a) obtain likelihood values for each attack campaign recorded in the wild and (b) reliably and transparently operationalize these values to carry out quantitative comparisons among campaigns. This will allow security experts to perform quantitatively-informed decision making that is transparent and accountable. In this paper we construct such a framework by: (1) quantifying the likelihood of attack campaigns via data-driven procedures on the MITRE knowledge base and (2) introducing a methodology for automatic modelling of MITRE intelligence data: this is complete in the sense that it captures any attack campaign via template attack tree models. (3) We further propose a computational framework to carry out these comparisons based on the cATM formal logic, and implement this in an open-source Python tool. Finally, we validate our approach by quantifying the likelihood of all MITRE campaigns, and comparing the likelihood of the Wocao and Dream Job MITRE campaigns -- generated with our proposed approach -- against "ad hoc" traditionally-built attack tree models, demonstrating how our methodology is substantially lighter in modelling effort, and still capable of capturing all the quantitative relevant data.

Updated: 2024-10-18 16:34:24

标题: 有多难?用攻击树和cATM逻辑量化MITRE攻击活动

摘要: 网络威胁的格局每天都在变得更加复杂。高级持续性威胁会发起系统性的攻击活动,网络安全从业者必须对其进行防御。这类有组织攻击的例子包括Dream Job、Wocao、WannaCry和SolarWinds Compromise行动。为了评估哪些风险最具威胁性,以及在防御时应优先应对哪些攻击活动,网络安全专家必须配备合适的工具。特别是,他们必须能够(a)获得野外记录的每个攻击活动的可能性值,并(b)可靠而透明地运用这些值来对攻击活动进行定量比较。这将使安全专家能够进行透明且可问责的、以定量信息为依据的决策。在本文中,我们通过以下方式构建了这样一个框架:(1)基于MITRE知识库,通过数据驱动的程序量化攻击活动的可能性;(2)引入一种对MITRE情报数据进行自动建模的方法:该方法是完备的,即可以通过模板攻击树模型捕捉任何攻击活动;(3)我们进一步提出了一个基于cATM形式逻辑进行比较的计算框架,并将其实现为一个开源Python工具。最后,我们通过量化所有MITRE攻击活动的可能性来验证我们的方法,并将用我们提出的方法生成的Wocao和Dream Job MITRE攻击活动的可能性与“临时”传统构建的攻击树模型进行比较,证明我们的方法在建模工作量上明显更轻,但仍能捕捉所有相关的定量数据。

更新时间: 2024-10-18 16:34:24

领域: cs.CR,cs.LO

下载: http://arxiv.org/abs/2410.06692v2

Privacy-Preserving Decentralized AI with Confidential Computing

This paper addresses privacy protection in decentralized Artificial Intelligence (AI) using Confidential Computing (CC) within the Atoma Network, a decentralized AI platform designed for the Web3 domain. Decentralized AI distributes AI services among multiple entities without centralized oversight, fostering transparency and robustness. However, this structure introduces significant privacy challenges, as sensitive assets such as proprietary models and personal data may be exposed to untrusted participants. Cryptography-based privacy protection techniques such as zero-knowledge machine learning (zkML) suffer from prohibitive computational overhead. To address this limitation, we propose leveraging Confidential Computing (CC). Confidential Computing leverages hardware-based Trusted Execution Environments (TEEs) to provide isolation for processing sensitive data, ensuring that both model parameters and user data remain secure, even in decentralized, potentially untrusted environments. While TEEs face a few limitations, we believe they can bridge the privacy gap in decentralized AI. We explore how we can integrate TEEs into Atoma's decentralized framework.

Updated: 2024-10-18 16:33:05

标题: 基于机密计算的隐私保护去中心化人工智能

摘要: 本文讨论了在去中心化人工智能(AI)中使用保密计算(CC)来保护隐私,在Atoma网络中实现这一目标,Atoma网络是一个专为Web3领域设计的去中心化AI平台。去中心化人工智能将AI服务分配给多个实体,没有集中监督,促进透明度和稳健性。然而,这种结构引入了重大的隐私挑战,因为敏感资产,如专有模型和个人数据可能会暴露给不受信任的参与者。基于密码学的隐私保护技术,如零知识机器学习(zkML)存在计算开销过高的问题。为了解决这一限制,我们提出利用保密计算(CC)。保密计算利用基于硬件的可信执行环境(TEEs)来为处理敏感数据提供隔离,确保模型参数和用户数据保持安全,即使在去中心化、潜在不受信任的环境中也是如此。虽然TEEs存在一些限制,但我们相信它们可以弥合去中心化AI中的隐私差距。我们探讨了如何将TEEs集成到Atoma的去中心化框架中。

更新时间: 2024-10-18 16:33:05

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2410.13752v2

Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection

Attention mechanisms have revolutionized several domains of artificial intelligence, such as natural language processing and computer vision, by enabling models to selectively focus on relevant parts of the input data. While recent work has characterized the optimization dynamics of gradient descent (GD) in attention-based models and the structural properties of its preferred solutions, less is known about more general optimization algorithms such as mirror descent (MD). In this paper, we investigate the convergence properties and implicit biases of a family of MD algorithms tailored for softmax attention mechanisms, with the potential function chosen as the $p$-th power of the $\ell_p$-norm. Specifically, we show that these algorithms converge in direction to a generalized hard-margin SVM with an $\ell_p$-norm objective when applied to a classification problem using a softmax attention model. Notably, our theoretical results reveal that the convergence rate is comparable to that of traditional GD in simpler models, despite the highly nonlinear and nonconvex nature of the present problem. Additionally, we delve into the joint optimization dynamics of the key-query matrix and the decoder, establishing conditions under which this complex joint optimization converges to their respective hard-margin SVM solutions. Lastly, our numerical experiments on real data demonstrate that MD algorithms improve generalization over standard GD and excel in optimal token selection.
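
A minimal sketch of mirror descent with the potential $\psi(w) = \frac{1}{p}\|w\|_p^p$ discussed above: updates are performed in the dual space through the mirror map $\nabla\psi(w) = \mathrm{sign}(w)|w|^{p-1}$. The quadratic objective in the usage example is an assumption for illustration, not the paper's attention model.

```python
import numpy as np

def mirror_descent_lp(grad_fn, w0, p=3.0, lr=0.1, steps=200):
    """Mirror descent with potential psi(w) = (1/p) * ||w||_p^p.
    The mirror map is sign(w)|w|^(p-1); its inverse is sign(z)|z|^(1/(p-1)).
    For p = 2 this reduces to plain gradient descent."""
    w = w0.astype(float)
    for _ in range(steps):
        z = np.sign(w) * np.abs(w) ** (p - 1.0)          # map to dual space
        z -= lr * grad_fn(w)                             # gradient step in dual
        w = np.sign(z) * np.abs(z) ** (1.0 / (p - 1.0))  # map back to primal
    return w

# toy usage: minimize ||w - target||^2 (illustrative objective)
target = np.array([1.0, -2.0, 0.5])
w = mirror_descent_lp(lambda w: 2.0 * (w - target), w0=np.full(3, 0.1))
print(np.round(w, 3))  # approaches the target
```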

Updated: 2024-10-18 16:32:06

标题: 使用镜像下降法优化注意力:广义最大间隔标记选择

摘要: 注意机制已经在人工智能的几个领域中发生了革命,例如自然语言处理和计算机视觉,通过使模型能够有选择地关注输入数据的相关部分。尽管最近的研究已经对基于注意力的模型中的梯度下降(GD)的优化动态和其首选解的结构特性进行了表征,但对于更一般的优化算法如镜像下降(MD)了解较少。在本文中,我们调查了一类专为softmax注意力机制量身定制的MD算法的收敛性质和隐含偏差,其中潜在函数选择为$\ell_p$-范数的$p$次幂。具体而言,我们展示了这些算法在应用于使用softmax注意力模型的分类问题时,收敛于一个带有$\ell_p$-范数目标的广义硬间隔SVM。值得注意的是,我们的理论结果表明,尽管当前问题具有高度非线性和非凸性质,但收敛速度与更简单模型中的传统GD相当。此外,我们深入研究了关键-查询矩阵和解码器的联合优化动态,建立了这种复杂联合优化收敛到它们各自的硬间隔SVM解的条件。最后,我们在真实数据上的数值实验表明,MD算法提高了对标准GD的泛化能力,并在最佳标记选择方面表现优异。

更新时间: 2024-10-18 16:32:06

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.14581v1

Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification

Deep multimodal learning has shown remarkable success by leveraging contrastive learning to capture explicit one-to-one relations across modalities. However, real-world data often exhibits shared relations beyond simple pairwise associations. We propose M3CoL, a Multimodal Mixup Contrastive Learning approach to capture nuanced shared relations inherent in multimodal data. Our key contribution is a Mixup-based contrastive loss that learns robust representations by aligning mixed samples from one modality with their corresponding samples from other modalities thereby capturing shared relations between them. For multimodal classification tasks, we introduce a framework that integrates a fusion module with unimodal prediction modules for auxiliary supervision during training, complemented by our proposed Mixup-based contrastive loss. Through extensive experiments on diverse datasets (N24News, ROSMAP, BRCA, and Food-101), we demonstrate that M3CoL effectively captures shared multimodal relations and generalizes across domains. It outperforms state-of-the-art methods on N24News, ROSMAP, and BRCA, while achieving comparable performance on Food-101. Our work highlights the significance of learning shared relations for robust multimodal learning, opening up promising avenues for future research.
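
One plausible reading of the Mixup-based contrastive loss, sketched in PyTorch: samples are mixed within one modality and the mixture is aligned with both corresponding samples in the other modality, weighted by the mixing coefficient. Details such as the temperature and mixing distribution are illustrative, not the paper's exact choices.

```python
import torch
import torch.nn.functional as F

def mixup_contrastive_loss(z_a, z_b, lam=0.6, temp=0.1):
    """Mix samples in modality A and align the mixture with *both* of its
    modality-B counterparts, weighted by the mixing coefficient."""
    perm = torch.randperm(z_a.size(0))
    z_mix = F.normalize(lam * z_a + (1 - lam) * z_a[perm], dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_mix @ z_b.T / temp                   # (batch, batch) similarities
    targets = torch.arange(z_a.size(0))
    return lam * F.cross_entropy(logits, targets) \
        + (1 - lam) * F.cross_entropy(logits, targets[perm])

z_img, z_txt = torch.randn(8, 32), torch.randn(8, 32)  # toy paired embeddings
print(float(mixup_contrastive_loss(z_img, z_txt)))
```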

Updated: 2024-10-18 16:31:49

标题: 利用多模态混合对比学习捕捉共享关系以进行多模态分类

摘要: 深度多模态学习通过利用对比学习来捕捉跨模态的显式一对一关系,取得了显著的成功。然而,现实世界的数据通常展现出简单的成对关系之外的共享关系。我们提出了M3CoL,一种多模态混合对比学习方法,用于捕捉多模态数据中固有的微妙共享关系。我们的关键贡献是基于Mixup的对比损失,通过将一种模态的混合样本与其对应的其他模态的样本对齐,从而捕捉它们之间的共享关系,从而学习出稳健的表示。对于多模态分类任务,我们引入了一个框架,该框架将融合模块与单模态预测模块结合起来,在训练过程中进行辅助监督,同时配合我们提出的基于Mixup的对比损失。通过在多样化数据集(N24News、ROSMAP、BRCA和Food-101)上进行大量实验,我们展示了M3CoL有效地捕捉了共享的多模态关系,并在领域间进行泛化。它在N24News、ROSMAP和BRCA上优于最先进的方法,同时在Food-101上达到了可比较的性能。我们的工作强调了学习共享关系对于稳健的多模态学习的重要性,为未来研究开辟了有希望的途径。

更新时间: 2024-10-18 16:31:49

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2409.17777v2

Towards Unsupervised Validation of Anomaly-Detection Models

Unsupervised validation of anomaly-detection models is a highly challenging task. While the common practices for model validation involve a labeled validation set, such validation sets cannot be constructed when the underlying datasets are unlabeled. The lack of robust and efficient unsupervised model-validation techniques presents an acute challenge in the implementation of automated anomaly-detection pipelines, especially when there exists no prior knowledge of the model's performance on similar datasets. This work presents a new paradigm to automated validation of anomaly-detection models, inspired by real-world, collaborative decision-making mechanisms. We focus on two commonly-used, unsupervised model-validation tasks -- model selection and model evaluation -- and provide extensive experimental results that demonstrate the accuracy and robustness of our approach on both tasks.

Updated: 2024-10-18 16:27:04

标题: 朝向无监督异常检测模型验证

摘要: 无监督异常检测模型的验证是一个极具挑战性的任务。虽然常见的模型验证实践涉及有标签的验证集,但当底层数据集没有标签时,无法构建这样的验证集。缺乏健壮且高效的无监督模型验证技术在实施自动异常检测管道时提出了一个严峻的挑战,尤其是当对类似数据集上的模型性能没有先前知识时。本文提出了一个受现实世界协作决策机制启发的自动异常检测模型验证的新范式。我们专注于两个常用的无监督模型验证任务--模型选择和模型评估--并提供了大量实验结果,证明了我们的方法在这两个任务上的准确性和健壮性。

更新时间: 2024-10-18 16:27:04

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14579v1

Large Language Models Are Overparameterized Text Encoders

Large language models (LLMs) demonstrate strong performance as text embedding models when finetuned with supervised contrastive training. However, their large size balloons inference time and memory requirements. In this paper, we show that by pruning the last $p\%$ layers of an LLM before supervised training for only 1000 steps, we can achieve a proportional reduction in memory and inference time. We evaluate four different state-of-the-art LLMs on text embedding tasks and find that our method can prune up to 30\% of layers with negligible impact on performance and up to 80\% with only a modest drop. With only three lines of code, our method is easily implemented in any pipeline for transforming LLMs to text encoders. We also propose $\text{L}^3 \text{Prune}$, a novel layer-pruning strategy based on the model's initial loss that provides two optimal pruning configurations: a large variant with negligible performance loss and a small variant for resource-constrained settings. On average, the large variant prunes 21\% of the parameters with a $-0.3$ performance drop, and the small variant only suffers from a $-5.1$ decrease while pruning 74\% of the model. We consider these results strong evidence that LLMs are overparameterized for text embedding tasks, and can be easily pruned.
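
The abstract's "three lines of code" plausibly look something like the following for a Llama-style model, where the decoder stack is the `layers` attribute; the checkpoint name and the attribute path are assumptions that vary by architecture.

```python
from transformers import AutoModel

# Load a decoder-only model whose transformer stack is the `layers` attribute
# (true for Llama-style models; other architectures name it differently).
model = AutoModel.from_pretrained("meta-llama/Llama-2-7b-hf")

p = 0.30                                    # fraction of final layers to drop
keep = int(len(model.layers) * (1.0 - p))
model.layers = model.layers[:keep]          # nn.ModuleList supports slicing
model.config.num_hidden_layers = keep       # keep the config consistent
```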

Updated: 2024-10-18 16:26:45

标题: 大型语言模型是超参数化的文本编码器

摘要: 大型语言模型(LLMs)在经过监督对比训练微调后作为文本嵌入模型表现出强大性能。然而,它们庞大的尺寸使推断时间和内存需求增加。在本文中,我们展示了通过在监督训练前修剪LLM的最后$p\%$层仅进行1000步,我们可以实现内存和推断时间的成比例减少。我们评估了四种不同的最先进LLM在文本嵌入任务上,并发现我们的方法可以修剪高达30\%的层,对性能影响微乎其微,而仅有轻微下降的情况下可以修剪高达80\%。我们的方法只需三行代码,在任何将LLM转换为文本编码器的流水线中都可以轻松实现。我们还提出了$\text{L}^3 \text{Prune}$,一种基于模型初始损失的新型层修剪策略,提供了两种最佳修剪配置:一个性能损失微乎其微的大型变体,和一个适用于资源受限环境的小型变体。平均而言,大型变体修剪了21\%的参数,性能下降了-0.3,而小型变体在修剪了74\%的模型的情况下仅下降了-5.1。我们认为这些结果强有力地证明LLMs在文本嵌入任务中过度参数化,可以轻松修剪。

更新时间: 2024-10-18 16:26:45

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.14578v1

Scalable Drift Monitoring in Medical Imaging AI

The integration of artificial intelligence (AI) into medical imaging has advanced clinical diagnostics but poses challenges in managing model drift and ensuring long-term reliability. To address these challenges, we develop MMC+, an enhanced framework for scalable drift monitoring, building upon the CheXstray framework that introduced real-time drift detection for medical imaging AI models using multi-modal data concordance. This work extends the original framework's methodologies, providing a more scalable and adaptable solution for real-world healthcare settings and offers a reliable and cost-effective alternative to continuous performance monitoring addressing limitations of both continuous and periodic monitoring methods. MMC+ introduces critical improvements to the original framework, including more robust handling of diverse data streams, improved scalability with the integration of foundation models like MedImageInsight for high-dimensional image embeddings without site-specific training, and the introduction of uncertainty bounds to better capture drift in dynamic clinical environments. Validated with real-world data from Massachusetts General Hospital during the COVID-19 pandemic, MMC+ effectively detects significant data shifts and correlates them with model performance changes. While not directly predicting performance degradation, MMC+ serves as an early warning system, indicating when AI systems may deviate from acceptable performance bounds and enabling timely interventions. By emphasizing the importance of monitoring diverse data streams and evaluating data shifts alongside model performance, this work contributes to the broader adoption and integration of AI solutions in clinical settings.

Updated: 2024-10-18 16:26:30

标题: 医学影像人工智能中可扩展的漂移监测

摘要: 将人工智能(AI)整合到医学影像中已经推动了临床诊断的发展,但也带来了管理模型漂移和确保长期可靠性的挑战。为了解决这些挑战,我们开发了MMC+,这是一个增强的框架,用于可扩展的漂移监测,建立在引入多模态数据一致性的实时漂移检测的CheXstray框架之上,用于医学影像AI模型。这项工作扩展了原始框架的方法,为现实世界的医疗保健环境提供了一种更具可扩展性和适应性的解决方案,并提供了一种可靠且具有成本效益的替代方案,解决了连续和周期性监测方法的局限性。MMC+对原始框架进行了关键改进,包括更强大的处理多样化数据流的能力,通过集成基础模型如MedImageInsight,提高了高维图像嵌入的可扩展性,无需站点特定训练,并引入不确定性边界,以更好地捕捉动态临床环境中的漂移。通过在COVID-19大流行期间使用来自马萨诸塞州总医院的真实世界数据进行验证,MMC+有效地检测到重要的数据变化,并将其与模型性能变化相关联。虽然不直接预测性能退化,但MMC+作为一个早期警告系统,指示AI系统何时可能偏离可接受的性能范围,并促使及时干预。通过强调监测多样化的数据流的重要性,并评估数据变化与模型性能并行,这项工作有助于在临床环境中更广泛地采用和整合AI解决方案。

更新时间: 2024-10-18 16:26:30

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.13174v2

MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts

Sparse Mixture of Experts (SMoE) has become the key to unlocking unparalleled scalability in deep learning. SMoE has the potential to exponentially increase parameter count while maintaining the efficiency of the model by only activating a small subset of these parameters for a given sample. However, it has been observed that SMoE suffers from unstable training and has difficulty adapting to new distributions, leading to the model's lack of robustness to data contamination. To overcome these limitations, we first establish a connection between the dynamics of the expert representations in SMoEs and gradient descent on a multi-objective optimization problem. Leveraging our framework, we then integrate momentum into SMoE and propose a new family of SMoEs named MomentumSMoE. We theoretically prove and numerically demonstrate that MomentumSMoE is more stable and robust than SMoE. In particular, we verify the advantages of MomentumSMoE over SMoE on a variety of practical tasks including ImageNet-1K object recognition and WikiText-103 language modeling. We demonstrate the applicability of MomentumSMoE to many types of SMoE models, including those in the Sparse MoE model for vision (V-MoE) and the Generalist Language Model (GLaM). We also show that other advanced momentum-based optimization methods, such as Adam, can be easily incorporated into the MomentumSMoE framework for designing new SMoE models with even better performance, almost negligible additional computation cost, and simple implementations.
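
A sketch of the core idea: treating the SMoE output as an increment and accumulating it with a heavy-ball momentum buffer across layers. This illustrates the mechanism rather than the paper's exact formulation, and the linear "experts" are toy stand-ins.

```python
import torch

def momentum_smoe_step(x, experts, router, m_prev, beta=0.9):
    """One SMoE layer step with a momentum term on the expert mixture output.
    experts: list of modules; router: module returning logits over experts."""
    gates = torch.softmax(router(x), dim=-1)            # (batch, n_experts)
    mix = sum(gates[:, i : i + 1] * e(x) for i, e in enumerate(experts))
    m = beta * m_prev + mix                             # accumulate momentum
    return x + m, m                                     # residual update + state

d, n_exp = 8, 4
experts = [torch.nn.Linear(d, d) for _ in range(n_exp)]
router = torch.nn.Linear(d, n_exp)
x, m = torch.randn(2, d), torch.zeros(2, d)
x, m = momentum_smoe_step(x, experts, router, m)
print(x.shape)  # torch.Size([2, 8])
```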

Updated: 2024-10-18 16:20:22

标题: MomentumSMoE: 将动量集成到稀疏专家混合模型中

摘要: 稀疏专家混合(SMoE)已成为深度学习中突破性可伸缩性的关键。SMoE有潜力在保持模型效率的同时指数级增加参数数量,通过仅激活给定样本的这些参数的一个小子集。然而,观察到SMoE存在训练不稳定和难以适应新分布的问题,导致模型对数据污染缺乏鲁棒性。为了克服这些限制,我们首先建立了SMoE中专家表示的动态与多目标优化问题上的梯度下降之间的联系。利用我们的框架,我们将动量集成到SMoE中,并提出了一个名为MomentumSMoE的新型SMoE系列。我们理论上证明并数值上证明了MomentumSMoE比SMoE更稳定和更健壮。特别地,我们在包括ImageNet-1K目标识别和WikiText-103语言建模在内的各种实际任务上验证了MomentumSMoE相对于SMoE的优势。我们展示了MomentumSMoE对多种SMoE模型的适用性,包括视觉稀疏MoE模型(V-MoE)和通用语言模型(GLaM)中的模型。我们还展示了其他基于动量的先进优化方法,如Adam,可以轻松地纳入MomentumSMoE框架中,设计出性能更好的新SMoE模型,几乎可以忽略不计的额外计算成本,并具有简单的实现。

更新时间: 2024-10-18 16:20:22

领域: cs.LG,cs.AI,cs.CL,cs.CV,stat.ML

下载: http://arxiv.org/abs/2410.14574v1

Building Trust in Black-box Optimization: A Comprehensive Framework for Explainability

Optimizing costly black-box functions within a constrained evaluation budget presents significant challenges in many real-world applications. Surrogate Optimization (SO) is a common resolution, yet its proprietary nature introduced by the complexity of surrogate models and the sampling core (e.g., acquisition functions) often leads to a lack of explainability and transparency. While existing literature has primarily concentrated on enhancing convergence to global optima, the practical interpretation of newly proposed strategies remains underexplored, especially in batch evaluation settings. In this paper, we propose Inclusive Explainability Metrics for Surrogate Optimization (IEMSO), a comprehensive set of model-agnostic metrics designed to enhance the transparency, trustworthiness, and explainability of the SO approaches. Through these metrics, we provide both intermediate and post-hoc explanations to practitioners before and after performing expensive evaluations to gain trust. We consider four primary categories of metrics, each targeting a specific aspect of the SO process: Sampling Core Metrics, Batch Properties Metrics, Optimization Process Metrics, and Feature Importance. Our experimental evaluations demonstrate the significant potential of the proposed metrics across different benchmarks.

Updated: 2024-10-18 16:20:17

标题: 在黑盒优化中建立信任:解释性的全面框架

摘要: 在有限的评估预算内优化昂贵的黑匣子函数在许多现实世界应用中都面临重大挑战。代理优化(SO)是一种常见的解决方案,然而其专有性质由代理模型和采样核心(例如获取函数)的复杂性引入,通常导致缺乏解释性和透明度。虽然现有文献主要集中在提高收敛到全局最优解,但新提出的策略的实际解释仍未充分探讨,特别是在批量评估设置中。在本文中,我们提出了代理优化的“包容性”解释度量(IEMSO),这是一套旨在增强SO方法透明性、可信度和解释性的模型无关度量。通过这些度量,我们提供中间和事后解释给从业者,在执行昂贵的评估之前和之后获得信任。我们考虑了四个主要类别的度量,每个类别针对SO过程的特定方面:采样核心度量、批处理属性度量、优化过程度量和特征重要性。我们的实验评估展示了提出的度量在不同基准测试中的显著潜力。

更新时间: 2024-10-18 16:20:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14573v1

TransBox: EL++-closed Ontology Embedding

OWL (Web Ontology Language) ontologies, which are able to represent both relational and type facts as standard knowledge graphs and complex domain knowledge in Description Logic (DL) axioms, are widely adopted in domains such as healthcare and bioinformatics. Inspired by the success of knowledge graph embeddings, embedding OWL ontologies has gained significant attention in recent years. Current methods primarily focus on learning embeddings for atomic concepts and roles, enabling the evaluation based on normalized axioms through specially designed score functions. However, they often neglect the embedding of complex concepts, making it difficult to infer with more intricate axioms. This limitation reduces their effectiveness in advanced reasoning tasks, such as Ontology Learning and ontology-mediated Query Answering. In this paper, we propose EL++-closed ontology embeddings which are able to represent any logical expressions in DL via composition. Furthermore, we develop TransBox, an effective EL++-closed ontology embedding method that can handle many-to-one, one-to-many and many-to-many relations. Our extensive experiments demonstrate that TransBox often achieves state-of-the-art performance across various real-world datasets for predicting complex axioms.

Updated: 2024-10-18 16:17:10

标题: TransBox:EL++闭合本体嵌入

摘要: OWL(Web Ontology Language)本体论能够以标准知识图和描述逻辑(DL)公理的形式表示关系和类型事实以及复杂的领域知识,在医疗保健和生物信息学等领域被广泛采用。受知识图嵌入成功的启发,近年来嵌入OWL本体论引起了广泛关注。目前的方法主要集中在学习原子概念和角色的嵌入,通过专门设计的得分函数进行基于规范化公理的评估。然而,它们通常忽视复杂概念的嵌入,使得难以推断更复杂的公理。这种限制降低了它们在高级推理任务(如本体学习和本体介导查询回答)中的有效性。在本文中,我们提出了能够通过组合表示DL中任何逻辑表达式的EL++-闭合本体论嵌入。此外,我们开发了TransBox,一种能够处理多对一、一对多和多对多关系的有效EL++-闭合本体论嵌入方法。我们广泛的实验表明,TransBox通常在各种真实世界数据集上实现了最先进的性能,用于预测复杂公理。

更新时间: 2024-10-18 16:17:10

领域: cs.AI

下载: http://arxiv.org/abs/2410.14571v1

Understanding the difficulty of low-precision post-training quantization of large language models

Large language models of high parameter counts are computationally expensive, yet can be made much more efficient by compressing their weights to very low numerical precision. This can be achieved either through post-training quantization by minimizing local, layer-wise quantization errors, or through quantization-aware fine-tuning by minimizing the global loss function. In this study, we discovered that, under the same data constraint, the former approach nearly always fared worse than the latter, a phenomenon particularly prominent when the numerical precision is very low. We further showed that this difficulty of post-training quantization arose from stark misalignment between optimization of the local and global objective functions. Our findings explain the limited utility of minimizing local quantization error and underscore the importance of direct quantization-aware fine-tuning, in the regime of large models at very low precision.
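
The local/global mismatch is easy to see numerically: a per-layer quantizer can have a small weight error while the error the global loss actually sees, on the layer's outputs, behaves differently. The uniform quantizer below is a generic example, not the specific methods studied in the paper.

```python
import numpy as np

def quantize_uniform(w, bits=4):
    """Symmetric uniform quantizer, the kind of low-precision mapping that
    post-training quantization applies per layer."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
x = rng.normal(size=(64,))

W_q = quantize_uniform(W, bits=3)
local_err = np.linalg.norm(W_q - W)            # layer-wise (local) objective
output_err = np.linalg.norm(W_q @ x - W @ x)   # what the global loss sees
print(f"local weight error {local_err:.3f}, output error {output_err:.3f}")
```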

Updated: 2024-10-18 16:16:52

标题: 理解大型语言模型低精度后训练量化的困难

摘要: 高参数计数的大型语言模型在计算上是昂贵的,但可以通过将它们的权重压缩到非常低的数值精度来大大提高效率。这可以通过通过最小化本地、逐层量化误差进行的后训练量化,或通过最小化全局损失函数进行的量化感知微调来实现。在这项研究中,我们发现,在相同的数据约束下,前一种方法几乎总是表现比后一种方法差,这种现象尤其突出当数值精度非常低时。我们进一步表明,这种后训练量化的困难源于本地和全局目标函数优化之间的明显不一致。我们的发现解释了在大型模型和非常低精度环境中,最小化本地量化误差的有限效用以及直接的量化感知微调的重要性。

更新时间: 2024-10-18 16:16:52

领域: cs.LG

下载: http://arxiv.org/abs/2410.14570v1

When LLMs Go Online: The Emerging Threat of Web-Enabled LLMs

Recent advancements in Large Language Models (LLMs) have established them as agentic systems capable of planning and interacting with various tools. These LLM agents are often paired with web-based tools, enabling access to diverse sources and real-time information. Although these advancements offer significant benefits across various applications, they also increase the risk of malicious use, particularly in cyberattacks involving personal information. In this work, we investigate the risks associated with misuse of LLM agents in cyberattacks involving personal data. Specifically, we aim to understand: 1) how potent LLM agents can be when directed to conduct cyberattacks, 2) how cyberattacks are enhanced by web-based tools, and 3) how affordable and easy it becomes to launch cyberattacks using LLM agents. We examine three attack scenarios: the collection of Personally Identifiable Information (PII), the generation of impersonation posts, and the creation of spear-phishing emails. Our experiments reveal the effectiveness of LLM agents in these attacks: LLM agents achieved a precision of up to 95.9% in collecting PII, up to 93.9% of impersonation posts created by LLM agents were evaluated as authentic, and the click rate for links in spear phishing emails created by LLM agents reached up to 46.67%. Additionally, our findings underscore the limitations of existing safeguards in contemporary commercial LLMs, emphasizing the urgent need for more robust security measures to prevent the misuse of LLM agents.

Updated: 2024-10-18 16:16:34

标题: 当LLM上线:具备网络功能的LLM带来的新兴威胁

摘要: 最近对大型语言模型(LLMs)的进展已使它们成为能够规划和与各种工具互动的代理系统。这些LLM代理通常与基于网络的工具配对,使其能够访问多样化的来源和实时信息。尽管这些进展在各种应用中提供了显著的好处,但也增加了恶意使用的风险,特别是涉及个人信息的网络攻击。在这项工作中,我们调查了与LLM代理在涉及个人数据的网络攻击中的误用相关的风险。具体来说,我们的目标是了解:1)当指示进行网络攻击时,LLM代理可以有多强大,2)网络攻击如何通过基于网络的工具增强,3)使用LLM代理发动网络攻击变得多么经济实惠和容易。我们研究了三种攻击场景:收集个人可识别信息(PII),生成冒充帖子和创建定向网络钓鱼电子邮件。我们的实验揭示了LLM代理在这些攻击中的有效性:LLM代理在收集PII方面的准确度高达95.9%,由LLM代理创建的冒充帖子中有高达93.9%被评为真实,而由LLM代理创建的网络钓鱼电子邮件中的链接点击率高达46.67%。此外,我们的发现突显了当代商用LLM中现有保障措施的局限性,强调了迫切需要更加强健的安全措施来防止LLM代理的滥用。

更新时间: 2024-10-18 16:16:34

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2410.14569v1

RAG-ConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions

Conversational AI agents use Retrieval Augmented Generation (RAG) to provide verifiable document-grounded responses to user inquiries. However, many natural questions do not have good answers: about 25% contain false assumptions [Yu et al., 2023], and over 50% are ambiguous [Min et al., 2020]. RAG agents need high-quality data to improve their responses to confusing questions. This paper presents a novel synthetic data generation method to efficiently create a diverse set of context-grounded confusing questions from a given document corpus. We conduct an empirical comparative evaluation of several large language models as RAG agents to measure the accuracy of confusion detection and appropriate response generation. We contribute a benchmark dataset to the public domain.

Updated: 2024-10-18 16:11:29

标题: RAG-ConfusionQA:一个用于评估LLMs在混淆问题上的基准测试

摘要: 对话式人工智能代理使用检索增强生成(RAG)来提供可验证的基于文档的用户查询响应。然而,许多自然问题没有很好的答案:约25%包含错误的假设[Yu et al., 2023],超过50%是模糊的[Min et al., 2020]。RAG代理需要高质量的数据来改进对混乱问题的响应。本文提出了一种新颖的合成数据生成方法,可以高效地从给定的文档语料库中创建一组多样化的基于上下文的混乱问题。我们对几个大型语言模型作为RAG代理进行了实证比较评估,以衡量混乱检测的准确性和适当的响应生成。我们向公共领域贡献了一个基准数据集。

更新时间: 2024-10-18 16:11:29

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2410.14567v1

Learning diffusion at lightspeed

Diffusion regulates numerous natural processes and the dynamics of many successful generative models. Existing models to learn the diffusion terms from observational data rely on complex bilevel optimization problems and model only the drift of the system. We propose a new simple model, JKOnet*, which bypasses the complexity of existing architectures while presenting significantly enhanced representational capabilities: JKOnet* recovers the potential, interaction, and internal energy components of the underlying diffusion process. JKOnet* minimizes a simple quadratic loss and outperforms other baselines in terms of sample efficiency, computational complexity, and accuracy. Additionally, JKOnet* provides a closed-form optimal solution for linearly parametrized functionals, and, when applied to predict the evolution of cellular processes from real-world data, it achieves state-of-the-art accuracy at a fraction of the computational cost of all existing methods. Our methodology is based on the interpretation of diffusion processes as energy-minimizing trajectories in the probability space via the so-called JKO scheme, which we study via its first-order optimality conditions.
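
The JKO scheme the method is built on can be written as follows (standard form, with the potential, interaction, and internal-energy terms that JKOnet* recovers):

```latex
% Diffusion as a sequence of energy-minimizing steps in Wasserstein space,
% with energy J and step size tau:
\[
  \rho_{t+1} = \operatorname*{arg\,min}_{\rho}\;
  J(\rho) + \frac{1}{2\tau}\, W_2^2(\rho, \rho_t),
  \qquad
  J(\rho) = \int V \,\mathrm{d}\rho
          + \tfrac{1}{2}\int (U * \rho)\,\mathrm{d}\rho
          + \int F(\rho)\,\mathrm{d}x ,
\]
% where V, U, and F are the potential, interaction, and internal-energy terms.
```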

Updated: 2024-10-18 16:09:52

标题: 以光速学习扩散

摘要: 扩散调节着许多自然过程以及许多成功生成模型的动态。现有的从观测数据中学习扩散项的模型依赖于复杂的双层优化问题,并且仅对系统的漂移进行建模。我们提出了一个新的简单模型JKOnet*,它绕过了现有架构的复杂性,同时具有显著增强的表示能力:JKOnet*能够恢复基础扩散过程的势能、相互作用和内部能量组成部分。JKOnet*只需最小化一个简单的二次损失,在样本效率、计算复杂性和准确性方面均优于其他基线。此外,JKOnet*为线性参数化的泛函提供了闭式最优解;当应用于从真实数据预测细胞过程的演变时,它以现有所有方法的一小部分计算成本实现了最先进的准确性。我们的方法基于将扩散过程解释为概率空间中通过所谓JKO格式的能量最小化轨迹,我们通过其一阶最优性条件来研究这一格式。

更新时间: 2024-10-18 16:09:52

领域: cs.LG

下载: http://arxiv.org/abs/2406.12616v2

Graph Contrastive Learning via Cluster-refined Negative Sampling for Semi-supervised Text Classification

Graph contrastive learning (GCL) has been widely applied to text classification tasks due to its ability to generate self-supervised signals from unlabeled data, thus facilitating model training. However, existing GCL-based text classification methods often suffer from negative sampling bias, where similar nodes are incorrectly paired as negative pairs. This can lead to over-clustering, where instances of the same class are divided into different clusters. To address the over-clustering issue, we propose an innovative GCL-based method of graph contrastive learning via cluster-refined negative sampling for semi-supervised text classification, namely ClusterText. Firstly, we combine the pre-trained model Bert with graph neural networks to learn text representations. Secondly, we introduce a clustering refinement strategy, which clusters the learned text representations to obtain pseudo labels. For each text node, its negative sample set is drawn from different clusters. Additionally, we propose a self-correction mechanism to mitigate the loss of true negative samples caused by clustering inconsistency. By calculating the Euclidean distance between each text node and other nodes within the same cluster, distant nodes are still selected as negative samples. Our proposed ClusterText demonstrates good scalability, as it can effectively extract important information from a large amount of data. Experimental results demonstrate the superiority of ClusterText in text classification tasks.
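
A sketch of the cluster-refined negative sampling described above: pseudo-labels come from clustering the learned embeddings, negatives are drawn from other clusters, and the self-correction step re-admits distant same-cluster nodes as negatives. KMeans and the distance quantile are illustrative choices, not necessarily the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 16))                     # learned text embeddings
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Z)

def negatives_for(i, far_quantile=0.9):
    other = np.where(labels != labels[i])[0]       # cross-cluster negatives
    same = np.where(labels == labels[i])[0]
    d = np.linalg.norm(Z[same] - Z[i], axis=1)
    far_same = same[d > np.quantile(d, far_quantile)]  # self-correction step
    return np.concatenate([other, far_same])

print(len(negatives_for(0)))
```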

Updated: 2024-10-18 16:03:49

标题: 通过集群精细化负采样的图对比学习用于半监督文本分类

摘要: 图对比学习(GCL)已广泛应用于文本分类任务,因为它能够从未标记数据中生成自监督信号,从而促进模型训练。然而,现有基于GCL的文本分类方法通常受到负采样偏差的影响,即将相似节点错误地配对为负样本。这可能导致过度聚类,即同一类别的实例被分为不同的簇。为了解决过度聚类问题,我们提出了一种基于图对比学习的创新方法,通过簇细化负采样进行半监督文本分类,即ClusterText。首先,我们将预训练模型Bert与图神经网络相结合,学习文本表示。其次,我们引入了一种聚类细化策略,将学习到的文本表示进行聚类以获取伪标签。对于每个文本节点,其负样本集来自不同簇。此外,我们提出了一种自我校正机制,以减轻由于聚类不一致性而导致的真负样本的损失。通过计算每个文本节点与同一簇内其他节点之间的欧几里德距离,远离节点仍被选为负样本。我们提出的ClusterText展示了良好的可扩展计算能力,因为它能够有效地从大量数据中提取重要信息。实验结果表明,ClusterText在文本分类任务中表现出优越性。

更新时间: 2024-10-18 16:03:49

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.18130v1
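
To make the negative-sampling step above concrete, here is a minimal Python sketch of cluster-refined negative sampling with the distance-based self-correction; the function name, the KMeans clusterer, and the 0.9 distance quantile are illustrative assumptions, not details from the paper.

import numpy as np
from sklearn.cluster import KMeans

def pick_negatives(embeddings, n_clusters=10, n_neg=5, dist_quantile=0.9, seed=0):
    # For each node: draw negatives from other clusters, and re-admit
    # same-cluster nodes that are unusually distant (self-correction).
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(embeddings)
    negatives = []
    for i, x in enumerate(embeddings):
        other = np.where(labels != labels[i])[0]
        same = np.where(labels == labels[i])[0]
        dists = np.linalg.norm(embeddings[same] - x, axis=1)
        far_same = same[dists > np.quantile(dists, dist_quantile)]
        pool = np.concatenate([other, far_same])
        negatives.append(rng.choice(pool, size=min(n_neg, len(pool)), replace=False))
    return labels, negatives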

Measuring Diversity: Axioms and Challenges

The concept of diversity is widely used in various applications: from image or molecule generation to recommender systems. Thus, being able to properly measure diversity is important. This paper addresses the problem of quantifying diversity for a set of objects. First, we make a systematic review of existing diversity measures and explore their undesirable behavior in some cases. Based on this review, we formulate three desirable properties (axioms) of a reliable diversity measure: monotonicity, uniqueness, and continuity. We show that none of the existing measures has all three properties and thus these measures are not suitable for quantifying diversity. Then, we construct two examples of measures that have all the desirable properties, thus proving that the list of axioms is not self-contradicting. Unfortunately, the constructed examples are too computationally complex for practical use, thus we pose an open problem of constructing a diversity measure that has all the listed properties and can be computed in practice.

Updated: 2024-10-18 15:59:54

标题: 测量多样性:公理和挑战

摘要: 多样性的概念在各种应用中被广泛使用:从图像或分子生成到推荐系统。因此,能够正确衡量多样性是重要的。本文解决了一组对象的多样性量化问题。首先,我们对现有的多样性度量进行了系统审查,并探讨了它们在某些情况下的不良行为。基于这一审查,我们制定了可靠多样性度量的三个理想特性(公理):单调性、唯一性和连续性。我们表明没有任何现有的度量具有所有三个特性,因此这些度量不适用于量化多样性。然后,我们构建了两个具有所有理想特性的度量的例子,从而证明了公理列表并非自相矛盾。不幸的是,这些构建的例子在实际使用中过于计算复杂,因此我们提出了一个开放性问题,即构建一个具有所有列出的特性并且可以在实践中计算的多样性度量。

更新时间: 2024-10-18 15:59:54

领域: cs.LG

下载: http://arxiv.org/abs/2410.14556v1

Machine Learning Aided Modeling of Granular Materials: A Review

Artificial intelligence (AI) has become a buzz word since Google's AlphaGo beat a world champion in 2017. In the past five years, machine learning as a subset of the broader category of AI has obtained considerable attention in the research community of granular materials. This work offers a detailed review of the recent advances in machine learning-aided studies of granular materials from the particle-particle interaction at the grain level to the macroscopic simulations of granular flow. This work will start with the application of machine learning in the microscopic particle-particle interaction and associated contact models. Then, different neural networks for learning the constitutive behaviour of granular materials will be reviewed and compared. Finally, the macroscopic simulations of practical engineering or boundary value problems based on the combination of neural networks and numerical methods are discussed. We hope readers will have a clear idea of the development of machine learning-aided modelling of granular materials via this comprehensive review work.

Updated: 2024-10-18 15:53:04

标题: 机器学习辅助建模颗粒材料:综述

摘要: 人工智能(AI)自2017年谷歌的AlphaGo打败世界冠军以来,已成为一个热门词汇。在过去的五年中,作为AI更广泛类别的一个子集,机器学习在颗粒材料研究界获得了相当大的关注。本文对机器学习辅助研究颗粒材料的最新进展进行了详细回顾,从颗粒间的粒子相互作用到颗粒流的宏观模拟。本文将从微观粒子相互作用和相关接触模型的机器学习应用开始。然后,将回顾和比较不同的神经网络用于学习颗粒材料的本构行为。最后,基于神经网络和数值方法结合的宏观模拟实用工程或边界值问题进行讨论。我们希望读者通过这项全面的回顾工作,对颗粒材料的机器学习辅助建模发展有一个清晰的了解。

更新时间: 2024-10-18 15:53:04

领域: physics.geo-ph,cond-mat.soft,cs.LG

下载: http://arxiv.org/abs/2410.14767v1

Evaluating Quantized Large Language Models for Code Generation on Low-Resource Language Benchmarks

Democratization of AI is an important topic within the broader topic of the digital divide. This issue is relevant to LLMs, which are becoming popular as AI co-pilots but suffer from a lack of accessibility due to high computational demand. In this study, we evaluate whether quantization is a viable approach toward enabling LLMs on generic consumer devices. The study assesses the performance of five quantized code LLMs in Lua code generation tasks. To evaluate the impact of quantization, the models with 7B parameters were tested on a consumer laptop at 2-, 4-, and 8-bit integer precisions and compared to non-quantized code LLMs with 1.3, 2, and 3 billion parameters. Lua is chosen as a low-level resource language to avoid models' biases related to high-resource languages. The results suggest that the models quantized at the 4-bit integer precision offer the best trade-off between performance and model size. These models can be comfortably deployed on an average laptop without a dedicated GPU. The performance significantly drops at the 2-bit integer precision. The models at 8-bit integer precision require more inference time that does not effectively translate to better performance. The 4-bit models with 7 billion parameters also considerably outperform non-quantized models with lower parameter numbers despite having comparable model sizes with respect to storage and memory demand. While quantization indeed increases the accessibility of smaller LLMs with 7 billion parameters, these LLMs demonstrate overall low performance (less than 50\%) on high-precision and low-resource tasks such as Lua code generation. While accessibility is improved, usability is still not at the practical level comparable to foundational LLMs such as GPT-4o or Llama 3.1 405B.

Updated: 2024-10-18 15:50:59

标题: 评估量化大型语言模型在低资源语言基准上的代码生成能力

摘要: 人工智能的民主化是数字鸿沟这一更广泛话题中的一个重要议题。这个问题与LLMs相关:它们作为人工智能副驾驶逐渐受到欢迎,但由于高计算需求而缺乏可访问性。本研究评估了量化是否是使LLMs在通用消费设备上可行的方法。该研究评估了五个量化代码LLMs在Lua代码生成任务中的性能。为了评估量化的影响,具有7B参数的模型以2、4和8位整数精度在消费级笔记本电脑上进行了测试,并与具有13亿、20亿和30亿参数的非量化代码LLMs进行了比较。选择Lua作为低资源语言,以避免与高资源语言相关的模型偏见。结果表明,以4位整数精度量化的模型在性能和模型大小之间提供了最佳折衷。这些模型可以轻松部署在普通笔记本电脑上,而无需专用GPU。在2位整数精度下,性能显著下降。8位整数精度的模型需要更多推理时间,而这并不能有效转化为更好的性能。尽管在存储和内存需求方面模型大小相当,具有70亿参数的4位模型仍明显优于参数数量更少的非量化模型。虽然量化确实提高了具有70亿参数的较小LLMs的可访问性,但这些LLMs在Lua代码生成等高精度、低资源任务上的总体表现较低(低于50\%)。尽管可访问性得到改善,可用性仍未达到可与GPT-4o或Llama 3.1 405B等基础LLMs相比的实用水平。

更新时间: 2024-10-18 15:50:59

领域: cs.SE,cs.AI,cs.ET,cs.LG,cs.PL

下载: http://arxiv.org/abs/2410.14766v1
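
As a rough illustration of the kind of setup being evaluated, the sketch below loads a 7B code model at 4-bit precision with Hugging Face Transformers and bitsandbytes; the model id and prompt are placeholders, and the paper's own benchmark may use a different quantization stack.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "codellama/CodeLlama-7b-hf"  # an assumed 7B code model, not necessarily the one evaluated
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quant, device_map="auto")

prompt = "-- Lua: return the factorial of n\nlocal function factorial(n)"
out = model.generate(**tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))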

English offensive text detection using CNN based Bi-GRU model

Over the years, the number of users of social media has increased drastically. People frequently share their thoughts through social platforms, and this leads to an increase in hate content. In this virtual community, individuals share their views, express their feelings, and post photos, videos, blogs, and more. Social networking sites like Facebook and Twitter provide platforms to share vast amounts of content with a single click. However, these platforms do not impose restrictions on the uploaded content, which may include abusive language and explicit images unsuitable for social media. To resolve this issue, a new approach must be implemented to filter out such inappropriate content. Numerous studies have been done to automate the process. In this paper, we propose a new Bi-GRU-CNN model to classify whether the text is offensive or not. The combination of the Bi-GRU and CNN models outperforms the existing model.

Updated: 2024-10-18 15:45:39

标题: 基于CNN和双向GRU模型的英文攻击性文本检测

摘要: 随着社交媒体用户数量的急剧增加,人们经常通过社交平台分享他们的想法,这导致了仇恨内容的增加。在这个虚拟社区中,个人分享他们的观点,表达他们的感受,发布照片、视频、博客等内容。像Facebook和Twitter这样的社交网络网站提供了只需点击一次即可分享大量内容的平台。然而,这些平台不对上传的内容施加限制,其中可能包含不适合社交媒体的辱骂性语言和露骨图像。为了解决这个问题,必须采用一种新的方法来过滤不当内容。已经有大量研究致力于将这一过程自动化。在本文中,我们提出了一个新的Bi-GRU-CNN模型,用于分类文本是否具有冒犯性。Bi-GRU和CNN模型的组合优于现有模型。

更新时间: 2024-10-18 15:45:39

领域: cs.CL,cs.LG,cs.SI

下载: http://arxiv.org/abs/2409.15652v3
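
A minimal Keras sketch of a CNN-plus-Bi-GRU classifier in the spirit of the abstract; layer sizes, layer ordering, and preprocessing choices are assumptions rather than the paper's exact architecture.

import tensorflow as tf
from tensorflow.keras import layers

vocab_size, seq_len = 20000, 100  # assumed preprocessing choices

model = tf.keras.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 128),
    layers.Conv1D(64, kernel_size=5, activation="relu"),  # local n-gram features
    layers.MaxPooling1D(pool_size=2),
    layers.Bidirectional(layers.GRU(64)),                 # bidirectional sequence context
    layers.Dense(1, activation="sigmoid"),                # offensive vs. not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])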

Boosting K-means for Big Data by Fusing Data Streaming with Global Optimization

K-means clustering is a cornerstone of data mining, but its efficiency deteriorates when confronted with massive datasets. To address this limitation, we propose a novel heuristic algorithm that leverages the Variable Neighborhood Search (VNS) metaheuristic to optimize K-means clustering for big data. Our approach is based on the sequential optimization of the partial objective function landscapes obtained by restricting the Minimum Sum-of-Squares Clustering (MSSC) formulation to random samples from the original big dataset. Within each landscape, systematically expanding neighborhoods of the currently best (incumbent) solution are explored by reinitializing all degenerate and a varying number of additional centroids. Extensive and rigorous experimentation on a large number of real-world datasets reveals that by transforming the traditional local search into a global one, our algorithm significantly enhances the accuracy and efficiency of K-means clustering in big data environments, becoming the new state of the art in the field.

Updated: 2024-10-18 15:43:34

标题: 通过融合数据流与全局优化提升大数据K均值聚类

摘要: K-means聚类是数据挖掘的基石,但当面对大规模数据集时,其效率会下降。为了解决这一限制,我们提出了一种新颖的启发式算法,利用可变邻域搜索(VNS)元启发式来优化大数据的K-means聚类。我们的方法基于通过将最小平方和聚类(MSSC)公式限制在原始大数据集的随机样本中获得的局部目标函数景观的顺序优化。在每个景观中,通过重新初始化所有退化和不同数量的额外质心来系统地扩展当前最佳(现任)解决方案的邻域。对大量真实世界数据集进行广泛和严格的实验表明,通过将传统的局部搜索转换为全局搜索,我们的算法显著提高了K-means在大数据环境中的准确性和效率,成为该领域的新技术水平。

更新时间: 2024-10-18 15:43:34

领域: cs.LG,cs.AI,math.OC

下载: http://arxiv.org/abs/2410.14548v1
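
A simplified sketch of the sample-restricted VNS idea: run k-means as the local search on random subsamples of the big dataset, and "shake" the incumbent by reinitializing a growing number of centroids. This is a toy illustration of the scheme, not the authors' full algorithm.

import numpy as np
from sklearn.cluster import KMeans

def vns_kmeans(X, k=10, sample_size=10000, max_shake=5, n_rounds=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    best_cost = np.inf
    for _ in range(n_rounds):
        # restrict the MSSC landscape to a random subsample
        S = X[rng.choice(len(X), min(sample_size, len(X)), replace=False)]
        shake = 1
        while shake <= max_shake:
            cand = centers.copy()
            idx = rng.choice(k, shake, replace=False)
            cand[idx] = S[rng.choice(len(S), shake, replace=False)]  # reinitialize centroids
            km = KMeans(n_clusters=k, init=cand, n_init=1).fit(S)    # local search
            if km.inertia_ < best_cost:  # crude cost comparison across equal-size samples
                best_cost, centers, shake = km.inertia_, km.cluster_centers_, 1
            else:
                shake += 1                                           # expand the neighborhood
    return centers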

Retraining with Predicted Hard Labels Provably Increases Model Accuracy

The performance of a model trained with \textit{noisy labels} is often improved by simply \textit{retraining} the model with its own predicted \textit{hard} labels (i.e., $1$/$0$ labels). Yet, a detailed theoretical characterization of this phenomenon is lacking. In this paper, we theoretically analyze retraining in a linearly separable setting with randomly corrupted labels given to us and prove that retraining can improve the population accuracy obtained by initially training with the given (noisy) labels. To the best of our knowledge, this is the first such theoretical result. Retraining finds application in improving training with local label differential privacy (DP) which involves training with noisy labels. We empirically show that retraining selectively on the samples for which the predicted label matches the given label significantly improves label DP training at \textit{no extra privacy cost}; we call this \textit{consensus-based retraining}. As an example, when training ResNet-18 on CIFAR-100 with $\epsilon=3$ label DP, we obtain $6.4\%$ improvement in accuracy with consensus-based retraining.

Updated: 2024-10-18 15:43:02

标题: 使用预测的困难标签重新训练可证明地提高模型准确性

摘要: 使用带有噪声标签训练的模型,通常可以通过简单地使用其自身预测的硬标签(即$1$/$0$标签)重新训练来改善性能。然而,对这种现象仍缺乏详细的理论刻画。在本文中,我们在线性可分、标签被随机破坏的设定下对重新训练进行了理论分析,证明了重新训练可以提高最初使用给定(带噪)标签训练所获得的总体准确性。据我们所知,这是首个此类理论结果。重新训练可用于改进局部标签差分隐私(DP)训练,后者涉及使用带噪标签进行训练。我们在实证中表明,仅对预测标签与给定标签一致的样本进行有选择的重新训练,可以在不增加任何隐私成本的情况下显著改进标签DP训练;我们称之为基于共识的重新训练。例如,在使用$\epsilon=3$标签DP对CIFAR-100训练ResNet-18时,基于共识的重新训练可以使准确性提高$6.4\%$。

更新时间: 2024-10-18 15:43:02

领域: cs.LG,cs.CR,stat.ML

下载: http://arxiv.org/abs/2406.11206v2
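
The consensus-based retraining recipe is easy to state in code: fit once on the noisy labels, keep only the samples whose predicted hard label agrees with the given label, and refit. The sklearn model below is a simple stand-in for the paper's label-DP ResNet setting.

import numpy as np
from sklearn.linear_model import LogisticRegression

def consensus_retrain(X, y_noisy):
    first = LogisticRegression(max_iter=1000).fit(X, y_noisy)
    y_hat = first.predict(X)        # predicted hard labels
    keep = y_hat == y_noisy         # consensus subset, no extra privacy cost
    second = LogisticRegression(max_iter=1000).fit(X[keep], y_noisy[keep])
    return second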

Tell me what I need to know: Exploring LLM-based (Personalized) Abstractive Multi-Source Meeting Summarization

Meeting summarization is crucial in digital communication, but existing solutions struggle with salience identification to generate personalized, workable summaries, and context understanding to fully comprehend the meetings' content. Previous attempts to address these issues by considering related supplementary resources (e.g., presentation slides) alongside transcripts are hindered by models' limited context sizes and handling the additional complexities of the multi-source tasks, such as identifying relevant information in additional files and seamlessly aligning it with the meeting content. This work explores multi-source meeting summarization considering supplementary materials through a three-stage large language model approach: identifying transcript passages needing additional context, inferring relevant details from supplementary materials and inserting them into the transcript, and generating a summary from this enriched transcript. Our multi-source approach enhances model understanding, increasing summary relevance by ~9% and producing more content-rich outputs. We introduce a personalization protocol that extracts participant characteristics and tailors summaries accordingly, improving informativeness by ~10%. This work further provides insights on performance-cost trade-offs across four leading model families, including edge-device capable options. Our approach can be extended to similar complex generative tasks benefitting from additional resources and personalization, such as dialogue systems and action planning.

Updated: 2024-10-18 15:40:48

标题: 告诉我我需要知道的:探索基于LLM的(个性化)摘要性多源会议总结

摘要: 会议总结在数字通信中至关重要,但现有解决方案在突出标识和生成个性化、可操作的摘要以及理解会议内容的上下文方面存在困难。先前尝试通过考虑相关的补充资源(例如演示文稿)与转录一起处理这些问题,但受限于模型的有限上下文大小和处理多源任务的额外复杂性,比如在额外文件中识别相关信息并将其与会议内容无缝对齐。本文探讨了通过三阶段大型语言模型方法考虑补充材料的多源会议总结:识别需要额外上下文的转录段落,从补充材料中推断相关细节并将其插入到转录中,然后从这个丰富的转录中生成摘要。我们的多源方法增强了模型的理解,将摘要相关性提高了约9%,并生成了更丰富内容的输出。我们介绍了一个个性化协议,提取参与者特征并相应地调整摘要,将信息量提高了约10%。此外,本研究还提供了关于四个主要模型系列的性能成本权衡的见解,包括适用于边缘设备的选项。我们的方法可以扩展到类似的复杂生成任务,从额外资源和个性化中受益,例如对话系统和行动规划。

更新时间: 2024-10-18 15:40:48

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14545v1

Computational Grounding of Responsibility Attribution and Anticipation in LTLf

Responsibility is one of the key notions in machine ethics and in the area of autonomous systems. It is a multi-faceted notion involving counterfactual reasoning about actions and strategies. In this paper, we study different variants of responsibility in a strategic setting based on LTLf. We show a connection with notions in reactive synthesis, including synthesis of winning, dominant, and best-effort strategies. This connection provides the building blocks for a computational grounding of responsibility including complexity characterizations and sound, complete, and optimal algorithms for attributing and anticipating responsibility.

Updated: 2024-10-18 15:38:33

标题: 在LTLf中责任归因和预期的计算基础

摘要: 责任是机器伦理学和自主系统领域的关键概念之一。它是一个多方面的概念,涉及对行动和策略的反事实推理。在本文中,我们基于LTLf在策略设定下研究了责任的不同变体。我们展示了其与反应式综合中相关概念的联系,包括获胜策略、支配策略和尽力而为策略的综合。这种联系为责任的计算基础提供了构建块,包括复杂性刻画,以及用于责任归因和预期的健全、完备且最优的算法。

更新时间: 2024-10-18 15:38:33

领域: cs.AI

下载: http://arxiv.org/abs/2410.14544v1

Clustering of timed sequences -- Application to the analysis of care pathways

Improving the future of healthcare starts by better understanding the current actual practices in hospital settings. This motivates the objective of discovering typical care pathways from patient data. Revealing typical care pathways can be achieved through clustering. The difficulty in clustering care pathways, represented by sequences of timestamped events, lies in defining a semantically appropriate metric and clustering algorithms. In this article, we adapt two methods developed for time series to the clustering of timed sequences: the drop-DTW metric and the DBA approach for the construction of averaged time sequences. These methods are then applied in clustering algorithms to propose original and sound clustering algorithms for timed sequences. This approach is experimented with and evaluated on synthetic and real-world data.

Updated: 2024-10-18 15:38:16

标题: 定时序列的聚类--应用于护理路径分析

摘要: 改善医疗保健的未来,始于更好地了解当前医院环境中的实际实践。这激发了从患者数据中发现典型护理路径的目标。通过聚类可以揭示典型护理路径。护理路径由带时间戳的事件序列表示,对其进行聚类的困难在于定义语义上合适的度量和聚类算法。在本文中,我们将两种用于时间序列的方法调整为定时序列的聚类:drop-DTW度量和用于构建平均时间序列的DBA方法。然后将这些方法应用于聚类算法,为定时序列提出了原创且可靠的聚类算法。这种方法在合成和真实数据上进行了实验和评估。

更新时间: 2024-10-18 15:38:16

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.15379v2
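
As a rough stand-in, the sketch below clusters padded timed sequences with tslearn's standard DTW k-means; the paper's drop-DTW metric and DBA-based averaging adapted to timed event sequences are not implemented by this off-the-shelf call.

import numpy as np
from tslearn.clustering import TimeSeriesKMeans

# toy timed sequences encoded as (time, value) channels, padded to equal length
X = np.random.rand(50, 30, 2)

km = TimeSeriesKMeans(n_clusters=3, metric="dtw", random_state=0)
labels = km.fit_predict(X)
print(labels[:10])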

What's under the hood: Investigating Automatic Metrics on Meeting Summarization

Meeting summarization has become a critical task considering the increase in online interactions. While new techniques are introduced regularly, their evaluation uses metrics not designed to capture meeting-specific errors, undermining effective evaluation. This paper investigates what the frequently used automatic metrics capture and which errors they mask by correlating automatic metric scores with human evaluations across a broad error taxonomy. We commence with a comprehensive literature review on English meeting summarization to define key challenges like speaker dynamics and contextual turn-taking and error types such as missing information and linguistic inaccuracy, concepts previously loosely defined in the field. We examine the relationship between characteristic challenges and errors by using annotated transcripts and summaries from Transformer-based sequence-to-sequence and autoregressive models from the general summary QMSum dataset. Through experimental validation, we find that different model architectures respond variably to challenges in meeting transcripts, resulting in different pronounced links between challenges and errors. Current default-used metrics struggle to capture observable errors, showing weak to mid-correlations, while a third of the correlations show trends of error masking. Only a subset reacts accurately to specific errors, while most correlations show either unresponsiveness or failure to reflect the error's impact on summary quality.

Updated: 2024-10-18 15:34:41

标题: 引擎盖下是什么:探究会议总结的自动评估指标

摘要: 会议总结已成为一项关键任务,考虑到在线互动的增加。尽管定期引入新技术,但它们的评估使用的度量标准并不是为捕捉特定于会议的错误而设计的,从而削弱了有效评估。本文通过将自动度量分数与广泛的错误分类中的人类评估相关联,探讨了频繁使用的自动度量标准捕捉了什么以及它们掩盖了哪些错误。我们通过对英语会议总结的全面文献回顾来定义关键挑战,如发言者动态和语境性交替,以及以往在领域中定义不够明确的错误类型,如信息缺失和语言错误。我们使用来自通用总结QMSum数据集的基于Transformer的序列到序列和自回归模型的带注释的转录和总结,研究了特征挑战和错误之间的关系。通过实验验证,我们发现不同的模型架构对会议转录中的挑战有不同的反应,导致挑战和错误之间的不同显著联系。目前默认使用的度量标准难以捕捉可观察的错误,显示出弱到中等的相关性,而有三分之一的相关性显示出错误掩盖的趋势。只有一小部分对特定错误做出了准确反应,而大多数相关性显示出无响应或未能反映错误对总结质量的影响。

更新时间: 2024-10-18 15:34:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.11124v2

Diffusion-based Semi-supervised Spectral Algorithm for Regression on Manifolds

We introduce a novel diffusion-based spectral algorithm to tackle regression analysis on high-dimensional data, particularly data embedded within lower-dimensional manifolds. Traditional spectral algorithms often fall short in such contexts, primarily due to the reliance on predetermined kernel functions, which inadequately address the complex structures inherent in manifold-based data. By employing graph Laplacian approximation, our method uses the local estimation property of heat kernel, offering an adaptive, data-driven approach to overcome this obstacle. Another distinct advantage of our algorithm lies in its semi-supervised learning framework, enabling it to fully use the additional unlabeled data. This ability enhances the performance by allowing the algorithm to dig the spectrum and curvature of the data manifold, providing a more comprehensive understanding of the dataset. Moreover, our algorithm performs in an entirely data-driven manner, operating directly within the intrinsic manifold structure of the data, without requiring any predefined manifold information. We provide a convergence analysis of our algorithm. Our findings reveal that the algorithm achieves a convergence rate that depends solely on the intrinsic dimension of the underlying manifold, thereby avoiding the curse of dimensionality associated with the higher ambient dimension.

Updated: 2024-10-18 15:29:04

标题: 基于扩散的半监督谱回归算法在流形上的应用

摘要: 我们引入了一种新颖的基于扩散的谱算法,用于处理高维数据上的回归分析,特别是嵌入在低维流形内部的数据。传统的谱算法在这种情况下通常表现不佳,主要是由于依赖预先确定的核函数,这些核函数无法充分解决流形数据固有的复杂结构。通过采用图拉普拉斯逼近,我们的方法利用热核的局部估计特性,提供了一种自适应的、数据驱动的方法来克服这一障碍。我们算法的另一个明显优势在于其半监督学习框架,使其能够充分利用额外的未标记数据。这种能力通过允许算法挖掘数据流形的频谱和曲率来增强性能,从而更全面地理解数据集。此外,我们的算法完全是以数据驱动的方式运行的,直接在数据的内在流形结构中操作,而不需要任何预定义的流形信息。我们对算法的收敛性进行了分析。我们的研究结果表明,该算法实现的收敛速度仅取决于底层流形的内在维度,从而避免了与更高环境维度相关联的维度灾难。

更新时间: 2024-10-18 15:29:04

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.14539v1

On Debiasing Text Embeddings Through Context Injection

Current advances in Natural Language Processing (NLP) have made it increasingly feasible to build applications leveraging textual data. Generally, the core of these applications rely on having a good semantic representation of text into vectors, via embedding models. However, it has been shown that these embeddings capture and perpetuate biases already present in text. While a few techniques have been proposed to debias embeddings, they do not take advantage of the recent advances in context understanding of modern embedding models. In this paper, we fill this gap by conducting a review of 19 embedding models by quantifying their biases and how well they respond to context injection as a mean of debiasing. We show that higher performing models are more prone to capturing biases, but are also better at incorporating context. Surprisingly, we find that while models can easily embed affirmative semantics, they fail at embedding neutral semantics. Finally, in a retrieval task, we show that biases in embeddings can lead to non-desirable outcomes. We use our new-found insights to design a simple algorithm for top $k$ retrieval, where $k$ is dynamically selected. We show that our algorithm is able to retrieve all relevant gendered and neutral chunks.

Updated: 2024-10-18 15:26:55

标题: 通过上下文注入来消除文本嵌入的偏见

摘要: 自然语言处理(NLP)领域目前取得的进展使得利用文本数据构建应用程序变得越来越可行。一般来说,这些应用程序的核心依赖于将文本转换为向量的良好语义表示,通过嵌入模型实现。然而,已经证明这些嵌入捕捉并传播了文本中已经存在的偏见。虽然已经提出了一些技术来去偏见嵌入,但它们并没有充分利用现代嵌入模型对上下文理解的最新进展。在本文中,我们通过对19个嵌入模型进行评估,量化它们的偏见以及它们对上下文注入的去偏见效果,填补了这一空白。我们发现,性能更好的模型更容易捕捉偏见,但也更擅长整合上下文。令人惊讶的是,我们发现虽然模型可以轻松嵌入肯定的语义,但在嵌入中性语义方面却失败了。最后,在检索任务中,我们展示了嵌入中的偏见可能导致不良结果。我们利用这些新发现的见解设计了一个简单的用于top k检索的算法,其中k是动态选择的。我们展示了我们的算法能够检索到所有相关的性别化和中性化的段落。

更新时间: 2024-10-18 15:26:55

领域: cs.CL,cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.12874v2
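
A small sketch of the dynamic top-k idea described above: rank chunks by cosine similarity and cut the list at the steepest similarity drop. The gap heuristic here is our illustrative stand-in for the paper's exact selection rule.

import numpy as np

def dynamic_top_k(query_vec, chunk_vecs, max_k=20):
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    order = np.argsort(-sims)[:max_k]
    gaps = -np.diff(sims[order])      # similarity drop between consecutive ranks
    if len(gaps) == 0:
        return order
    k = int(np.argmax(gaps)) + 1      # cut at the steepest drop
    return order[:k]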

What's New in My Data? Novelty Exploration via Contrastive Generation

Fine-tuning is widely used to adapt language models for specific goals, often leveraging real-world data such as patient records, customer-service interactions, or web content in languages not covered in pre-training. These datasets are typically massive, noisy, and often confidential, making their direct inspection challenging. However, understanding them is essential for guiding model deployment and informing decisions about data cleaning or suppressing any harmful behaviors learned during fine-tuning. In this study, we introduce the task of novelty discovery through generation, which aims to identify novel properties of a fine-tuning dataset by generating examples that illustrate these properties. Our approach, Contrastive Generative Exploration (CGE), assumes no direct access to the data but instead relies on a pre-trained model and the same model after fine-tuning. By contrasting the predictions of these two models, CGE can generate examples that highlight novel characteristics of the fine-tuning data. However, this simple approach may produce examples that are too similar to one another, failing to capture the full range of novel phenomena present in the dataset. We address this by introducing an iterative version of CGE, where the previously generated examples are used to update the pre-trained model, and this updated model is then contrasted with the fully fine-tuned model to generate the next example, promoting diversity in the generated outputs. Our experiments demonstrate the effectiveness of CGE in detecting novel content, such as toxic language, as well as new natural and programming languages. Furthermore, we show that CGE remains effective even when models are fine-tuned using differential privacy techniques.

Updated: 2024-10-18 15:24:05

标题: 我的数据中有什么新的内容?通过对比生成进行新颖性探索

摘要: 精细调整广泛用于为特定目标调整语言模型,通常利用现实世界数据,例如病人记录、客户服务互动或未在预训练中涵盖的语言的网络内容。这些数据集通常庞大、嘈杂,而且通常是机密的,这使得直接检查它们具有挑战性。然而,了解它们对于指导模型部署并就数据清理或抑制在精细调整过程中学习到的任何有害行为的决策至关重要。在本研究中,我们引入了通过生成进行新颖性发现的任务,旨在通过生成展示这些特性的示例来识别精细调整数据集的新颖性质。我们的方法,对比生成探索(CGE),假设不能直接访问数据,而是依赖于一个预训练模型和精细调整后的相同模型。通过对比这两个模型的预测结果,CGE可以生成突出精细调整数据的新颖特性的示例。然而,这种简单的方法可能会产生彼此过于相似的示例,未能捕捉数据集中存在的新颖现象的全部范围。我们通过引入CGE的迭代版本来解决这个问题,先前生成的示例用于更新预训练模型,然后将这个更新后的模型与完全精细调整的模型进行对比,以生成下一个示例,促进生成输出的多样性。我们的实验证明了CGE在检测新颖内容,如有毒语言,以及新的自然和编程语言方面的有效性。此外,我们展示了即使使用差分隐私技术进行模型的精细调整,CGE仍然有效。

更新时间: 2024-10-18 15:24:05

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.14765v1
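
The core contrastive step can be sketched in a few lines: decode greedily by choosing, at each position, the token whose log-probability increases most from the pre-trained to the fine-tuned model. Both checkpoints below are placeholders (the second would be the actual fine-tuned model in practice), and greedy decoding is a simplification.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
base = AutoModelForCausalLM.from_pretrained("gpt2")  # pre-trained model
ft = AutoModelForCausalLM.from_pretrained("gpt2")    # placeholder for the fine-tuned model

ids = torch.tensor([[tok.bos_token_id]])
with torch.no_grad():
    for _ in range(30):
        lp_ft = ft(ids).logits[0, -1].log_softmax(-1)
        lp_base = base(ids).logits[0, -1].log_softmax(-1)
        nxt = (lp_ft - lp_base).argmax()             # token made most likely by fine-tuning
        ids = torch.cat([ids, nxt.view(1, 1)], dim=1)
print(tok.decode(ids[0]))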

Multifidelity Kolmogorov-Arnold Networks

We develop a method for multifidelity Kolmogorov-Arnold networks (KANs), which use a low-fidelity model along with a small amount of high-fidelity data to train a model for the high-fidelity data accurately. Multifidelity KANs (MFKANs) reduce the amount of expensive high-fidelity data needed to accurately train a KAN by exploiting the correlations between the low- and high-fidelity data to give accurate and robust predictions in the absence of a large high-fidelity dataset. In addition, we show that multifidelity KANs can be used to increase the accuracy of physics-informed KANs (PIKANs), without the use of training data.

Updated: 2024-10-18 15:23:51

标题: 多精度科尔莫戈洛夫-阿诺德网络

摘要: 我们开发了一种用于多保真度科尔莫戈洛夫-阿诺德网络(KANs)的方法,该方法使用低保真度模型以及少量高保真度数据来准确训练高保真度数据的模型。多保真度KANs(MFKANs)通过利用低保真度数据与高保真度数据之间的相关性,减少了准确训练KAN所需的昂贵高保真度数据的数量,从而在缺乏大量高保真度数据集的情况下提供准确和稳健的预测。此外,我们展示了多保真度KANs可以用于提高物理信息KANs(PIKANs)的准确性,而无需使用训练数据。

更新时间: 2024-10-18 15:23:51

领域: cs.LG,cs.NA,math.NA

下载: http://arxiv.org/abs/2410.14764v1
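
The multifidelity recipe itself is architecture-agnostic and easy to sketch: feed both x and the low-fidelity prediction f_lo(x) to a model trained on a handful of high-fidelity points. A small MLP stands in for the KAN below, and the toy functions are assumptions.

import numpy as np
from sklearn.neural_network import MLPRegressor

f_lo = lambda x: np.sin(8 * x)               # cheap low-fidelity model
f_hi = lambda x: np.sin(8 * x) + 0.3 * x**2  # expensive ground truth

x_hi = np.random.rand(15, 1)                 # only a few high-fidelity samples
feats = np.hstack([x_hi, f_lo(x_hi)])        # exploit the lo/hi correlation
corr = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000).fit(feats, f_hi(x_hi).ravel())

x_test = np.linspace(0, 1, 5).reshape(-1, 1)
pred = corr.predict(np.hstack([x_test, f_lo(x_test)]))
print(np.abs(pred - f_hi(x_test).ravel()).max())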

Inferring Change Points in High-Dimensional Regression via Approximate Message Passing

We consider the problem of localizing change points in a generalized linear model (GLM), a model that covers many widely studied problems in statistical learning including linear, logistic, and rectified linear regression. We propose a novel and computationally efficient Approximate Message Passing (AMP) algorithm for estimating both the signals and the change point locations, and rigorously characterize its performance in the high-dimensional limit where the number of parameters $p$ is proportional to the number of samples $n$. This characterization is in terms of a state evolution recursion, which allows us to precisely compute performance measures such as the asymptotic Hausdorff error of our change point estimates, and allows us to tailor the algorithm to take advantage of any prior structural information on the signals and change points. Moreover, we show how our AMP iterates can be used to efficiently compute a Bayesian posterior distribution over the change point locations in the high-dimensional limit. We validate our theory via numerical experiments, and demonstrate the favorable performance of our estimators on both synthetic and real data in the settings of linear, logistic, and rectified linear regression.

Updated: 2024-10-18 15:23:26

标题: 通过近似消息传递推断高维回归中的变点

摘要: 我们考虑在广义线性模型(GLM)中定位变点的问题,这个模型涵盖了许多在统计学习中广泛研究的问题,包括线性、逻辑和修正线性回归。我们提出了一种新颖且计算效率高的近似消息传递(AMP)算法,用于估计信号和变点位置,并严格地表征了在高维极限下的性能,其中参数数量$p$与样本数量$n$成比例。这种表征是通过状态演化递归来实现的,它使我们能够精确计算性能度量,比如我们的变点估计的渐近豪斯多夫误差,并且使我们能够根据信号和变点上的任何先验结构信息来定制算法。此外,我们展示了我们的AMP迭代如何能够在高维极限下有效计算变点位置的贝叶斯后验分布。我们通过数值实验验证了我们的理论,并展示了我们的估计器在线性、逻辑和修正线性回归设置中在合成数据和真实数据上的有利性能。

更新时间: 2024-10-18 15:23:26

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2404.07864v2

Kernel Density Estimators in Large Dimensions

This paper studies Kernel Density Estimation for a high-dimensional distribution $\rho(x)$. Traditional approaches have focused on the limit of a large number of data points $n$ and fixed dimension $d$. We analyze instead the regime where both the number $n$ of data points $y_i$ and their dimensionality $d$ grow with a fixed ratio $\alpha=(\log n)/d$. Our study reveals three distinct statistical regimes for the kernel-based estimate of the density $\hat \rho_h^{\mathcal {D}}(x)=\frac{1}{n h^d}\sum_{i=1}^n K\left(\frac{x-y_i}{h}\right)$, depending on the bandwidth $h$: a classical regime for large bandwidth where the Central Limit Theorem (CLT) holds, which is akin to the one found in traditional approaches. Below a certain value of the bandwidth, $h_{CLT}(\alpha)$, we find that the CLT breaks down. The statistics of $\hat\rho_h^{\mathcal {D}}(x)$ for a fixed $x$ drawn from $\rho(x)$ are given by a heavy-tailed distribution (an alpha-stable distribution). In particular, below a value $h_G(\alpha)$, we find that $\hat\rho_h^{\mathcal {D}}(x)$ is governed by extreme value statistics: only a few points in the database matter and give the dominant contribution to the density estimator. We provide a detailed analysis for high-dimensional multivariate Gaussian data. We show that the optimal bandwidth threshold based on Kullback-Leibler divergence lies in the new statistical regime identified in this paper. As known by practitioners, when decreasing the bandwidth a kernel-based density estimate changes from a smooth curve to a collection of peaks centred on the data points. Our findings reveal that this general phenomenon is related to sharp transitions between phases characterized by different statistical properties, and offer new insights for Kernel density estimation in high-dimensional settings.

Updated: 2024-10-18 15:19:04

标题: 大维度中的核密度估计器

摘要: 这篇论文研究了高维分布$\rho(x)$的核密度估计。传统方法主要关注数据点数量$n$很大且维度$d$固定的极限。相反,我们分析数据点$y_i$的数量$n$及其维度$d$以固定比率$\alpha=(\log n)/d$共同增长的情况。我们的研究揭示了基于核的密度估计$\hat \rho_h^{\mathcal {D}}(x)=\frac{1}{n h^d}\sum_{i=1}^n K\left(\frac{x-y_i}{h}\right)$随带宽$h$变化的三个不同统计区域:在大带宽情况下是经典区域,中心极限定理(CLT)成立,类似于传统方法中的情况。当带宽低于某个值$h_{CLT}(\alpha)$时,CLT失效,对从$\rho(x)$中抽取的固定$x$,$\hat\rho_h^{\mathcal {D}}(x)$的统计特性由重尾分布(alpha稳定分布)给出。特别是在值$h_G(\alpha)$以下,我们发现$\hat\rho_h^{\mathcal {D}}(x)$由极值统计主导:数据库中只有少数点起作用并对密度估计器做出主要贡献。我们对高维多元高斯数据进行了详细分析,并证明基于Kullback-Leibler散度的最优带宽阈值位于本文所确定的新统计区域中。正如从业者所熟知的,当减小带宽时,核密度估计结果会从平滑曲线变为以数据点为中心的一组峰值。我们的研究结果揭示了这一普遍现象与具有不同统计特性的相之间的急剧转变有关,并为高维环境中的核密度估计提供了新的见解。

更新时间: 2024-10-18 15:19:04

领域: cs.LG,cond-mat.dis-nn,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2408.05807v3
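
The estimator in the abstract is straightforward to compute directly; the snippet below evaluates $\hat\rho_h^{\mathcal{D}}(x)$ with a Gaussian kernel and sweeps the bandwidth, which is how one would probe the regimes the paper identifies. The specific n, d, and bandwidth grid are illustrative.

import numpy as np

def kde(x, Y, h):
    n, d = Y.shape
    u = (x - Y) / h
    K = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2 * np.pi) ** (d / 2)  # Gaussian kernel
    return K.sum() / (n * h**d)

d, n = 50, 100_000                 # alpha = log(n)/d is roughly 0.23
Y = np.random.randn(n, d)
x = np.random.randn(d)
for h in (2.0, 1.0, 0.5):          # sweep the bandwidth across regimes
    print(h, kde(x, Y, h))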

The Traveling Bandit: A Framework for Bayesian Optimization with Movement Costs

This paper introduces a framework for Bayesian Optimization (BO) with metric movement costs, addressing a critical challenge in practical applications where input alterations incur varying costs. Our approach is a convenient plug-in that seamlessly integrates with the existing literature on batched algorithms, where designs within batches are observed following the solution of a Traveling Salesman Problem. The proposed method provides a theoretical guarantee of convergence in terms of movement costs for BO. Empirically, our method effectively reduces average movement costs over time while maintaining comparable regret performance to conventional BO methods. This framework also shows promise for broader applications in various bandit settings with movement costs.

Updated: 2024-10-18 15:14:25

标题: 旅行强盗:具有移动成本的贝叶斯优化框架

摘要: 本文介绍了一个带有度量移动成本的贝叶斯优化(BO)框架,解决了实际应用中输入变化会产生不同成本这一关键挑战。我们的方法是一个方便的插件,可与现有的批处理算法文献无缝衔接:批次内的设计按照旅行商问题的解所给出的顺序进行观察。所提出的方法为BO提供了关于移动成本收敛性的理论保证。在实证方面,我们的方法有效地降低了随时间累积的平均移动成本,同时保持与传统BO方法相当的遗憾(regret)性能。该框架还显示出在各种带有移动成本的赌博机设置中更广泛应用的潜力。

更新时间: 2024-10-18 15:14:25

领域: stat.ME,cs.LG

下载: http://arxiv.org/abs/2410.14533v1
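
The batching step described above is simple to emulate: once a batch of candidate designs is proposed, order the within-batch evaluations with a TSP heuristic so the metric movement cost stays small. Greedy nearest-neighbour is used here as an illustrative stand-in for a proper TSP solver.

import numpy as np

def tsp_order(points, start):
    # greedy nearest-neighbour tour over candidate designs (a TSP heuristic)
    rest = list(range(len(points)))
    tour, cur = [], start
    while rest:
        nxt = min(rest, key=lambda i: np.linalg.norm(points[i] - cur))
        tour.append(nxt)
        cur = points[nxt]
        rest.remove(nxt)
    return tour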

Using Sentiment and Technical Analysis to Predict Bitcoin with Machine Learning

Cryptocurrencies have gained significant attention in recent years due to their decentralized nature and potential for financial innovation. Thus, the ability to accurately predict its price has become a subject of great interest for investors, traders, and researchers. Some works in the literature show how Bitcoin's market sentiment correlates with its price fluctuations in the market. However, papers that consider the sentiment of the market associated with financial Technical Analysis indicators in order to predict Bitcoin's price are still scarce. In this paper, we present a novel approach for predicting Bitcoin price movements by combining the Fear & Greedy Index, a measure of market sentiment, Technical Analysis indicators, and the potential of Machine Learning algorithms. This work represents a preliminary study on the importance of sentiment metrics in cryptocurrency forecasting. Our initial experiments demonstrate promising results considering investment returns, surpassing the Buy & Hold baseline, and offering valuable insights about the combination of indicators of sentiment and market in a cryptocurrency prediction model.

Updated: 2024-10-18 15:13:07

标题: 利用情感和技术分析结合机器学习预测比特币

摘要: 加密货币近年来引起了广泛关注,这主要是因为其去中心化的特性和金融创新的潜力。因此,准确预测其价格已成为投资者、交易者和研究人员极感兴趣的话题。文献中的一些作品展示了比特币市场情绪与其价格波动之间的相关性。然而,考虑与金融技术分析指标相关的市场情绪以预测比特币价格的论文仍然很少。本文提出了一种新颖的方法,通过结合市场情绪的恐惧与贪婪指数、技术分析指标和机器学习算法的潜力来预测比特币价格的走势。这项工作代表了对加密货币预测中情绪指标重要性的初步研究。我们的初步实验展示了令人期待的结果,考虑了投资回报率,超过了买入持有基准,并提供了有关情绪和市场指标结合在加密货币预测模型中的有价值见解。

更新时间: 2024-10-18 15:13:07

领域: cs.LG,cs.CE

下载: http://arxiv.org/abs/2410.14532v1
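
An illustrative version of the feature pipeline: join the Fear & Greed index with simple Technical Analysis indicators and fit a standard classifier on next-day direction. The file name, column names, and model choice are assumptions, and a real backtest would need careful leakage control.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("btc_daily.csv")             # assumed columns: close, fear_greed
df["sma_14"] = df["close"].rolling(14).mean()
delta = df["close"].diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
df["rsi_14"] = 100 - 100 / (1 + gain / loss)  # classic 14-day RSI
df["target"] = (df["close"].shift(-1) > df["close"]).astype(int)  # next-day direction
df = df.dropna()

X = df[["fear_greed", "sma_14", "rsi_14"]]
y = df["target"]
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:-200], y[:-200])
print("holdout accuracy:", clf.score(X[-200:], y[-200:]))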

Domain Adaptive Safety Filters via Deep Operator Learning

Learning-based approaches for constructing Control Barrier Functions (CBFs) are increasingly being explored for safety-critical control systems. However, these methods typically require complete retraining when applied to unseen environments, limiting their adaptability. To address this, we propose a self-supervised deep operator learning framework that learns the mapping from environmental parameters to the corresponding CBF, rather than learning the CBF directly. Our approach leverages the residual of a parametric Partial Differential Equation (PDE), where the solution defines a parametric CBF approximating the maximal control invariant set. This framework accommodates complex safety constraints, higher relative degrees, and actuation limits. We demonstrate the effectiveness of the method through numerical experiments on navigation tasks involving dynamic obstacles.

Updated: 2024-10-18 15:10:55

标题: 通过深度算子学习实现领域自适应安全过滤器

摘要: 用于构建控制屏障函数(CBFs)的基于学习的方法在安全关键控制系统中越来越受到关注。然而,这些方法通常在应用于未见环境时需要完全重新训练,限制了它们的适应性。为了解决这个问题,我们提出了一个自监督深度算子学习框架,该框架学习从环境参数到相应CBF的映射,而不是直接学习CBF。我们的方法利用参数化偏微分方程(PDE)的残差,其解定义了一个近似最大控制不变集的参数化CBF。该框架可适应复杂的安全约束、更高的相对阶以及执行器限制。通过对涉及动态障碍物的导航任务进行数值实验,我们证明了该方法的有效性。

更新时间: 2024-10-18 15:10:55

领域: eess.SY,cs.LG,cs.RO,cs.SY

下载: http://arxiv.org/abs/2410.14528v1

Less is More: Selective Reduction of CT Data for Self-Supervised Pre-Training of Deep Learning Models with Contrastive Learning Improves Downstream Classification Performance

Self-supervised pre-training of deep learning models with contrastive learning is a widely used technique in image analysis. Current findings indicate a strong potential for contrastive pre-training on medical images. However, further research is necessary to incorporate the particular characteristics of these images. We hypothesize that the similarity of medical images hinders the success of contrastive learning in the medical imaging domain. To this end, we investigate different strategies based on deep embedding, information theory, and hashing in order to identify and reduce redundancy in medical pre-training datasets. The effect of these different reduction strategies on contrastive learning is evaluated on two pre-training datasets and several downstream classification tasks. In all of our experiments, dataset reduction leads to a considerable performance gain in downstream tasks, e.g., an AUC score improvement from 0.78 to 0.83 for the COVID CT Classification Grand Challenge, 0.97 to 0.98 for the OrganSMNIST Classification Challenge and 0.73 to 0.83 for a brain hemorrhage classification task. Furthermore, pre-training is up to nine times faster due to the dataset reduction. In conclusion, the proposed approach highlights the importance of dataset quality and provides a transferable approach to improve contrastive pre-training for classification downstream tasks on medical images.

Updated: 2024-10-18 15:08:05

标题: Less is More: 选择性减少CT数据以进行深度学习模型的自监督预训练,对比学习提高下游分类性能

摘要: 使用对比学习进行深度学习模型的自监督预训练是图像分析中广泛使用的技术。当前研究结果表明,对比预训练在医学图像上具有很强的潜力。然而,需要进一步研究如何结合这些图像的特殊特征。我们假设医学图像的相似性阻碍了对比学习在医疗成像领域的成功。因此,我们研究了基于深度嵌入、信息理论和哈希的不同策略,以识别和减少医学预训练数据集中的冗余。这些不同的减少策略对对比学习在两个预训练数据集和多个下游分类任务上的影响进行了评估。在所有实验中,数据集减少导致下游任务性能显著提升,例如COVID CT分类大挑战的AUC得分从0.78提高到0.83,OrganSMNIST分类挑战从0.97提高到0.98,脑出血分类任务从0.73提高到0.83。此外,由于数据集减少,预训练速度提高了高达九倍。总之,所提出的方法强调了数据集质量的重要性,并提供了一种可转移的方法,以改进用于医学图像分类下游任务的对比预训练。

更新时间: 2024-10-18 15:08:05

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.14524v1
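
The embedding-based flavour of redundancy reduction can be sketched as a greedy pass: keep a scan only if its embedding is not too similar to anything already kept. The cosine threshold is an assumed knob; the paper also studies information-theoretic and hashing criteria.

import numpy as np

def reduce_redundancy(embs, sim_thresh=0.95):
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    kept = []
    for i, e in enumerate(embs):
        if not kept or (embs[kept] @ e).max() < sim_thresh:
            kept.append(i)        # keep only sufficiently novel scans
    return kept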

Rethinking Distance Metrics for Counterfactual Explainability

Counterfactual explanations have been a popular method of post-hoc explainability for a variety of settings in Machine Learning. Such methods focus on explaining classifiers by generating new data points that are similar to a given reference, while receiving a more desirable prediction. In this work, we investigate a framing for counterfactual generation methods that considers counterfactuals not as independent draws from a region around the reference, but as jointly sampled with the reference from the underlying data distribution. Through this framing, we derive a distance metric, tailored for counterfactual similarity that can be applied to a broad range of settings. Through both quantitative and qualitative analyses of counterfactual generation methods, we show that this framing allows us to express more nuanced dependencies among the covariates.

Updated: 2024-10-18 15:06:50

标题: 重新思考用于反事实可解释性的距离度量

摘要: 反事实解释已成为机器学习中各种情境的流行后验可解释性方法。这些方法专注于通过生成与给定参考点相似且获得更理想预测的新数据点来解释分类器。在这项工作中,我们研究了一种反事实生成方法的框架,该框架将反事实视为与基础数据分布中的参考点共同抽样,而不是从参考点周围的区域独立抽样。通过这种框架,我们推导出一种针对反事实相似性定制的距离度量,可以应用于广泛的情境。通过对反事实生成方法的定量和定性分析,我们表明这种框架使我们能够表达协变量之间更微妙的依赖关系。

更新时间: 2024-10-18 15:06:50

领域: cs.LG

下载: http://arxiv.org/abs/2410.14522v1

Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media

Misinformation spreads rapidly on social media, obscuring the truth and targeting potentially vulnerable people. To effectively mitigate the negative impact of misinformation, it must first be accurately detected before applying a mitigation strategy, such as X's community notes, which is currently a manual process. This study takes a knowledge-based approach to misinformation detection, modelling the problem similarly to one of natural language inference. The EffiARA annotation framework is introduced, aiming to utilise inter- and intra-annotator agreement to understand the reliability of each annotator and influence the training of large language models for classification based on annotator reliability. In assessing the EffiARA annotation framework, the Russo-Ukrainian Conflict Knowledge-Based Misinformation Classification Dataset (RUC-MCD) was developed and made publicly available. This study finds that sample weighting using annotator reliability performs the best, utilising both inter- and intra-annotator agreement and soft-label training. The highest classification performance achieved using Llama-3.2-1B was a macro-F1 of 0.757 and 0.740 using TwHIN-BERT-large.

Updated: 2024-10-18 14:54:40

标题: 社交媒体知识基础虚假信息检测的高效标注者可靠性评估和样本加权

摘要: 社交媒体上的虚假信息传播迅速,混淆真相并将潜在易受伤害的人群作为目标。为了有效减轻虚假信息的负面影响,在应用缓解策略(如X的社区笔记,目前是一个手动过程)之前,必须首先准确地检测到虚假信息。本研究采用基于知识的方法来检测虚假信息,将问题建模为类似于自然语言推理的问题。我们引入了EffiARA标注框架,旨在利用标注者间与标注者内的一致性来了解每个标注者的可靠性,并基于标注者可靠性影响用于分类的大型语言模型的训练。在评估EffiARA标注框架时,我们开发并公开了俄乌冲突基于知识的虚假信息分类数据集(RUC-MCD)。本研究发现,同时利用标注者间与标注者内一致性以及软标签训练、基于标注者可靠性的样本加权表现最佳。使用Llama-3.2-1B实现的最高分类性能为macro-F1 0.757,使用TwHIN-BERT-large为0.740。

更新时间: 2024-10-18 14:54:40

领域: cs.LG,cs.AI,cs.SI

下载: http://arxiv.org/abs/2410.14515v1
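
A minimal sketch of reliability-based sample weighting: estimate each annotator's reliability from agreement on doubly-annotated items, then scale each training sample's loss by its annotator's score. The simple agreement rate below is an assumed stand-in for EffiARA's exact reliability formula.

import numpy as np

def annotator_reliability(pairs):
    # pairs: iterable of (annotator_id, agreed: bool) over doubly-annotated items
    hits, totals = {}, {}
    for a, agreed in pairs:
        totals[a] = totals.get(a, 0) + 1
        hits[a] = hits.get(a, 0) + int(agreed)
    return {a: hits[a] / totals[a] for a in totals}

def sample_weights(annotator_ids, reliability):
    w = np.array([reliability[a] for a in annotator_ids])
    return w / w.mean()   # multiply per-sample losses by these weights in training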

LEAD: Latent Realignment for Human Motion Diffusion

Our goal is to generate realistic human motion from natural language. Modern methods often face a trade-off between model expressiveness and text-to-motion alignment. Some align text and motion latent spaces but sacrifice expressiveness; others rely on diffusion models producing impressive motions, but lacking semantic meaning in their latent space. This may compromise realism, diversity, and applicability. Here, we address this by combining latent diffusion with a realignment mechanism, producing a novel, semantically structured space that encodes the semantics of language. Leveraging this capability, we introduce the task of textual motion inversion to capture novel motion concepts from a few examples. For motion synthesis, we evaluate LEAD on HumanML3D and KIT-ML and show comparable performance to the state-of-the-art in terms of realism, diversity, and text-motion consistency. Our qualitative analysis and user study reveal that our synthesized motions are sharper, more human-like and comply better with the text compared to modern methods. For motion textual inversion, our method demonstrates improved capacity in capturing out-of-distribution characteristics in comparison to traditional VAEs.

Updated: 2024-10-18 14:43:05

标题: LEAD:用于人体运动扩散的潜在重新对齐

摘要: 我们的目标是从自然语言生成逼真的人类动作。现代方法往往面临模型表达能力和文本到动作对齐之间的权衡。一些方法将文本和动作的潜在空间对齐,但会牺牲表达能力;另一些依赖于能产生令人印象深刻动作的扩散模型,但其潜在空间缺乏语义含义。这可能会影响真实感、多样性和适用性。在这里,我们通过将潜在扩散与重新对齐机制结合起来解决这一问题,产生了一个新颖的、语义结构化的空间,它编码了语言的语义。利用这种能力,我们引入了文本动作反演任务,从少量示例中捕捉新颖的动作概念。对于动作合成,我们在HumanML3D和KIT-ML上评估了LEAD,结果显示其在真实感、多样性和文本-动作一致性方面与现有最先进技术相当。我们的定性分析和用户研究表明,与现代方法相比,我们合成的动作更加清晰、更具人类特征,并且与文本更加吻合。对于动作文本反演,与传统VAE相比,我们的方法在捕捉分布外特征方面表现出更强的能力。

更新时间: 2024-10-18 14:43:05

领域: cs.CV,cs.AI,cs.GR

下载: http://arxiv.org/abs/2410.14508v1

SignAttention: On the Interpretability of Transformer Models for Sign Language Translation

This paper presents the first comprehensive interpretability analysis of a Transformer-based Sign Language Translation (SLT) model, focusing on the translation from video-based Greek Sign Language to glosses and text. Leveraging the Greek Sign Language Dataset, we examine the attention mechanisms within the model to understand how it processes and aligns visual input with sequential glosses. Our analysis reveals that the model pays attention to clusters of frames rather than individual ones, with a diagonal alignment pattern emerging between poses and glosses, which becomes less distinct as the number of glosses increases. We also explore the relative contributions of cross-attention and self-attention at each decoding step, finding that the model initially relies on video frames but shifts its focus to previously predicted tokens as the translation progresses. This work contributes to a deeper understanding of SLT models, paving the way for the development of more transparent and reliable translation systems essential for real-world applications.

Updated: 2024-10-18 14:38:37

标题: SignAttention:用于手语翻译的Transformer模型的可解释性研究

摘要: 本文首次对基于Transformer的手语翻译(SLT)模型进行了全面的可解释性分析,重点关注从基于视频的希腊手语到注释和文本的翻译。利用希腊手语数据集,我们研究了模型内的注意力机制,以了解它如何处理和将视觉输入与序列化注释对齐。我们的分析显示,模型关注的是帧的聚类而不是单个帧,姿势和注释之间出现了对角线对齐模式,随着注释数量的增加,这种模式变得不那么明显。我们还探讨了每个解码步骤中跨注意力和自注意力的相对贡献,发现模型最初依赖于视频帧,但随着翻译的进行,它的焦点转移到先前预测的标记上。这项工作有助于更深入地理解SLT模型,为开发更透明和可靠的翻译系统铺平道路,这对于现实应用至关重要。

更新时间: 2024-10-18 14:38:37

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14506v1

Deep Implicit Optimization for Robust and Flexible Image Registration

Deep Learning in Image Registration (DLIR) methods have been tremendously successful in image registration due to their speed and ability to incorporate weak label supervision at training time. However, DLIR methods forego many of the benefits of classical optimization-based methods. The functional nature of deep networks does not guarantee that the predicted transformation is a local minimum of the registration objective, the representation of the transformation (displacement/velocity field/affine) is fixed, and the networks are not robust to domain shift. Our method aims to bridge this gap between classical and learning methods by incorporating optimization as a layer in a deep network. A deep network is trained to predict multi-scale dense feature images that are registered using a black box iterative optimization solver. This optimal warp is then used to minimize image and label alignment errors. By implicitly differentiating end-to-end through an iterative optimization solver, our learned features are registration and label-aware, and the warp functions are guaranteed to be local minima of the registration objective in the feature space. Our framework shows excellent performance on in-domain datasets, and is agnostic to domain shift such as anisotropy and varying intensity profiles. For the first time, our method allows switching between arbitrary transformation representations (free-form to diffeomorphic) at test time with zero retraining. End-to-end feature learning also facilitates interpretability of features, and out-of-the-box promptability using additional label-fidelity terms at inference.

Updated: 2024-10-18 14:38:03

标题: 深度隐式优化用于强健和灵活的图像配准

摘要: 基于深度学习的图像配准(DLIR)方法凭借其速度以及在训练时纳入弱标签监督的能力,在图像配准方面取得了巨大成功。然而,DLIR方法放弃了传统基于优化方法的许多优点。深度网络的函数性质并不能保证预测的变换是配准目标的局部极小值,变换的表示(位移/速度场/仿射)是固定的,而且网络对领域偏移不够健壮。我们的方法旨在通过将优化作为深度网络中的一层来弥合传统方法和学习方法之间的差距。我们训练一个深度网络来预测多尺度密集特征图像,并使用黑盒迭代优化求解器对其进行配准,再利用得到的最优形变来最小化图像和标签的对齐误差。通过对迭代优化求解器进行端到端的隐式微分,我们学习到的特征具有配准和标签感知能力,并且形变函数保证是特征空间中配准目标的局部极小值。我们的框架在域内数据集上表现出色,并且对各向异性和强度分布变化等领域偏移不敏感。我们的方法首次允许在测试时在任意变换表示之间切换(从自由形式到微分同胚)而无需重新训练。端到端特征学习还有助于特征的可解释性,并可在推理时通过额外的标签保真项实现开箱即用的可提示性。

更新时间: 2024-10-18 14:38:03

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2406.07361v2

Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control

Reinforcement learning (RL) is rapidly reaching and surpassing human-level control capabilities. However, state-of-the-art RL algorithms often require timesteps and reaction times significantly faster than human capabilities, which is impractical in real-world settings and typically necessitates specialized hardware. We introduce Sequence Reinforcement Learning (SRL), an RL algorithm designed to produce a sequence of actions for a given input state, enabling effective control at lower decision frequencies. SRL addresses the challenges of learning action sequences by employing both a model and an actor-critic architecture operating at different temporal scales. We propose a "temporal recall" mechanism, where the critic uses the model to estimate intermediate states between primitive actions, providing a learning signal for each individual action within the sequence. Once training is complete, the actor can generate action sequences independently of the model, achieving model-free control at a slower frequency. We evaluate SRL on a suite of continuous control tasks, demonstrating that it achieves performance comparable to state-of-the-art algorithms while significantly reducing actor sample complexity. To better assess performance across varying decision frequencies, we introduce the Frequency-Averaged Score (FAS) metric. Our results show that SRL significantly outperforms traditional RL algorithms in terms of FAS, making it particularly suitable for applications requiring variable decision frequencies. Additionally, we compare SRL with model-based online planning, showing that SRL achieves superior FAS while leveraging the same model during training that online planners use for planning.

Updated: 2024-10-18 14:35:53

标题: 克服连续控制中缓慢的决策频率:基于模型的序列强化学习用于无模型控制

摘要: 强化学习(RL)正在迅速达到并超越人类水平的控制能力。然而,最先进的RL算法通常需要比人类能力快得多的时间步长和反应时间,这在现实环境中不切实际,且通常需要专门的硬件。我们引入了序列强化学习(SRL),这是一种为给定输入状态生成一系列动作的RL算法,使得在较低的决策频率下也能进行有效控制。SRL通过使用在不同时间尺度上运行的模型和演员-评论家(actor-critic)架构来解决学习动作序列的挑战。我们提出了一种"时间回忆"机制,评论家使用模型来估计原始动作之间的中间状态,为序列中的每个单独动作提供学习信号。一旦训练完成,演员可以独立于模型生成动作序列,以较慢的频率实现无模型控制。我们在一系列连续控制任务上评估了SRL,表明它在显著降低演员样本复杂度的同时,实现了与最先进算法可比的性能。为了更好地评估在不同决策频率下的性能,我们引入了频率平均得分(FAS)指标。我们的结果显示,SRL在FAS方面显著优于传统RL算法,特别适用于需要可变决策频率的应用。此外,我们将SRL与基于模型的在线规划进行了比较,结果显示SRL在训练中使用与在线规划器相同的模型的情况下,实现了更优的FAS。

更新时间: 2024-10-18 14:35:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.08979v2
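
The Frequency-Averaged Score can be sketched as follows: evaluate the same policy while holding each chosen action for r environment steps, for several values of r, and average the resulting returns. The frequency grid and the absence of normalization below are assumptions about the metric's details.

import numpy as np

def frequency_averaged_score(eval_fn, repeats=(1, 2, 4, 8, 16)):
    # eval_fn(r): mean return when each chosen action is held for r env steps
    return float(np.mean([eval_fn(r) for r in repeats]))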

Sample Compression Scheme Reductions

We present novel reductions from sample compression schemes in multiclass classification, regression, and adversarially robust learning settings to binary sample compression schemes. Assuming we have a compression scheme for binary classes of size $f(d_\mathrm{VC})$, where $d_\mathrm{VC}$ is the VC dimension, then we have the following results: (1) If the binary compression scheme is a majority-vote or a stable compression scheme, then there exists a multiclass compression scheme of size $O(f(d_\mathrm{G}))$, where $d_\mathrm{G}$ is the graph dimension. Moreover, for general binary compression schemes, we obtain a compression of size $O(f(d_\mathrm{G})\log|Y|)$, where $Y$ is the label space. (2) If the binary compression scheme is a majority-vote or a stable compression scheme, then there exists an $\epsilon$-approximate compression scheme for regression over $[0,1]$-valued functions of size $O(f(d_\mathrm{P}))$, where $d_\mathrm{P}$ is the pseudo-dimension. For general binary compression schemes, we obtain a compression of size $O(f(d_\mathrm{P})\log(1/\epsilon))$. These results would have significant implications if the sample compression conjecture, which posits that any binary concept class with a finite VC dimension admits a binary compression scheme of size $O(d_\mathrm{VC})$, is resolved (Littlestone and Warmuth, 1986; Floyd and Warmuth, 1995; Warmuth, 2003). Our results would then extend the proof of the conjecture immediately to other settings. We establish similar results for adversarially robust learning and also provide an example of a concept class that is robustly learnable but has no bounded-size compression scheme, demonstrating that learnability is not equivalent to having a compression scheme independent of the sample size, unlike in binary classification, where compression of size $2^{O(d_\mathrm{VC})}$ is attainable (Moran and Yehudayoff, 2016).

Updated: 2024-10-18 14:32:21

标题: 样本压缩方案的归约

摘要: 我们提出了从多类分类、回归和对抗鲁棒学习设置中的样本压缩方案到二元样本压缩方案的新颖归约。假设我们有一个大小为$f(d_{VC})$的二元类别压缩方案,其中$d_{VC}$是VC维度,那么我们有以下结果:(1) 如果二元压缩方案是多数投票或稳定的压缩方案,则存在一个大小为$O(f(d_{G}))$的多类压缩方案,其中$d_{G}$是图维度。此外,对于一般的二元压缩方案,我们得到大小为$O(f(d_{G})\log|Y|)$的压缩,其中$Y$是标签空间。(2) 如果二元压缩方案是多数投票或稳定的压缩方案,则对于取值于$[0,1]$的函数上的回归,存在一个大小为$O(f(d_{P}))$的$\epsilon$-近似压缩方案,其中$d_{P}$是伪维度。对于一般的二元压缩方案,我们得到大小为$O(f(d_{P})\log(1/\epsilon))$的压缩。如果样本压缩猜想(即任何具有有限VC维度的二元概念类都存在大小为$O(d_{VC})$的二元压缩方案)得到解决(Littlestone和Warmuth,1986;Floyd和Warmuth,1995;Warmuth,2003),这些结果将具有重要意义:我们的结果将立即把该猜想的证明扩展到其他设置。我们还为对抗鲁棒学习建立了类似的结果,并给出了一个鲁棒可学习、但不存在有界大小压缩方案的概念类的例子,这表明可学习性并不等价于拥有与样本量无关的压缩方案,这与二元分类不同:在二元分类中,大小为$2^{O(d_{VC})}$的压缩是可实现的(Moran和Yehudayoff,2016)。

更新时间: 2024-10-18 14:32:21

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.13012v2

Safeguarding Blockchain Ecosystem: Understanding and Detecting Attack Transactions on Cross-chain Bridges

Cross-chain bridges are essential decentralized applications (DApps) to facilitate interoperability between different blockchain networks. Unlike regular DApps, the functionality of cross-chain bridges relies on the collaboration of information both on and off the chain, which exposes them to a wider risk of attacks. According to our statistics, attacks on cross-chain bridges have resulted in losses of nearly 4.3 billion dollars since 2021. Therefore, it is particularly necessary to understand and detect attacks on cross-chain bridges. In this paper, we collect the largest number of cross-chain bridge attack incidents to date, including 49 attacks that occurred between June 2021 and September 2024. Our analysis reveals that attacks against cross-chain business logic cause significantly more damage than those that do not. These cross-chain attacks exhibit different patterns compared to normal transactions in terms of call structure, which effectively indicates potential attack behaviors. Given the significant losses in these cases and the scarcity of related research, this paper aims to detect attacks against cross-chain business logic, and propose the BridgeGuard tool. Specifically, BridgeGuard models cross-chain transactions from a graph perspective, and employs a two-stage detection framework comprising global and local graph mining to identify attack patterns in cross-chain transactions. We conduct multiple experiments on the datasets with 203 attack transactions and 40,000 normal cross-chain transactions. The results show that BridgeGuard's reported recall score is 36.32\% higher than that of state-of-the-art tools and can detect unknown attack transactions.

Updated: 2024-10-18 14:25:05

标题: 保护区块链生态系统:理解和检测跨链桥上的攻击交易

摘要: 跨链桥是促进不同区块链网络之间互操作性的关键去中心化应用程序(DApps)。与常规DApps不同,跨链桥的功能依赖于链内外信息的协作,这使它们面临更广泛的攻击风险。根据我们的统计数据,自2021年以来,对跨链桥的攻击已经导致损失近43亿美元。因此,有必要特别了解和检测对跨链桥的攻击。本文收集了迄今为止最多的跨链桥攻击事件,包括2021年6月至2024年9月发生的49起攻击。我们的分析表明,针对跨链业务逻辑的攻击造成的损失明显多于其他类型的攻击。这些跨链攻击在调用结构方面与正常交易呈现不同模式,有效地指示潜在的攻击行为。鉴于这些案例中的巨大损失以及相关研究的稀缺性,本文旨在检测针对跨链业务逻辑的攻击,并提出了BridgeGuard工具。具体而言,BridgeGuard从图的角度建模跨链交易,并采用全局和本地图挖掘构成的两阶段检测框架来识别跨链交易中的攻击模式。我们对包含203个攻击交易和40,000个正常跨链交易的数据集进行了多次实验。结果显示,BridgeGuard报告的召回率分数比最先进工具高出36.32%,并且可以检测未知的攻击交易。

更新时间: 2024-10-18 14:25:05

领域: cs.CR

下载: http://arxiv.org/abs/2410.14493v1

Synthetic Data Generation in Cybersecurity: A Comparative Analysis

Synthetic data generation faces significant challenges in accurately replicating real data, particularly with tabular data, where achieving high fidelity and utility is critical. While numerous methods have been developed, the most effective approach for creating high-quality synthetic data for network traffic security remains to be seen. This study conducts a comprehensive comparative analysis of non-AI, conventional AI, and generative AI techniques for synthetic tabular data generation using two widely recognized cybersecurity datasets: NSL-KDD and CICIDS-2017. Particular emphasis was placed on prominent GAN models for tabular data generation, including CTGAN, CopulaGAN, GANBLR++, and CastGAN. The results indicate that GAN-based methods, particularly CTGAN and CopulaGAN, outperform non-AI and conventional AI approaches in terms of fidelity and utility. To the best of our knowledge, this research contributes to the field by offering the first comparative evaluation of these methods specifically for cybersecurity network traffic data, filling a critical gap in the literature. It also introduces mutual information for feature selection, further enhancing the quality of the generated synthetic data. These findings provide valuable guidance for researchers seeking the most suitable synthetic data generation method in cybersecurity applications.

Updated: 2024-10-18 14:19:25

标题: 网络安全中的合成数据生成:一项比较分析

摘要: 合成数据生成在准确复制真实数据方面面临着重大挑战,特别是在表格数据中,其中实现高保真度和实用性至关重要。虽然已经开发了许多方法,但对于为网络流量安全创建高质量合成数据的最有效方法尚未确定。本研究对非AI、传统AI和生成AI技术在使用两个广泛认可的网络安全数据集(NSL-KDD和CICIDS-2017)生成合成表格数据方面进行了全面的比较分析。特别关注了用于表格数据生成的突出GAN模型,包括CTGAN、CopulaGAN、GANBLR++和CastGAN。结果表明,基于GAN的方法,特别是CTGAN和CopulaGAN,在保真度和实用性方面优于非AI和传统AI方法。据我们所知,这项研究通过为网络安全网络流量数据提供这些方法的首次比较评估,填补了文献中的重要空白。它还引入了特征选择的互信息,进一步提高了生成的合成数据的质量。这些发现为寻求在网络安全应用中寻找最合适的合成数据生成方法的研究人员提供了宝贵的指导。

更新时间: 2024-10-18 14:19:25

领域: cs.CR

下载: http://arxiv.org/abs/2410.16326v1

An Integrated Deep Learning Model for Skin Cancer Detection Using Hybrid Feature Fusion Technique

Skin cancer is a serious and potentially fatal disease caused by DNA damage. Early detection significantly increases survival rates, making accurate diagnosis crucial. In this groundbreaking study, we present a hybrid framework based on Deep Learning (DL) that achieves precise classification of benign and malignant skin lesions. Our approach begins with dataset preprocessing to enhance classification accuracy, followed by training two separate pre-trained DL models, InceptionV3 and DenseNet121. By fusing the results of each model using the weighted sum rule, our system achieves exceptional accuracy rates. Specifically, we achieve a 92.27% detection accuracy rate, 92.33% sensitivity, 92.22% specificity, 90.81% precision, and 91.57% F1-score, outperforming existing models and demonstrating the robustness and trustworthiness of our hybrid approach. Our study represents a significant advance in skin cancer diagnosis and provides a promising foundation for further research in the field. With the potential to save countless lives through earlier detection, our hybrid deep-learning approach is a game-changer in the fight against skin cancer.

Updated: 2024-10-18 14:19:13

标题: 一种利用混合特征融合技术的皮肤癌检测集成深度学习模型

摘要: 皮肤癌是一种由DNA损伤引起的严重且可能致命的疾病。早期检测可显著提高存活率,因此准确诊断至关重要。在这项开创性的研究中,我们提出了一种基于深度学习(DL)的混合框架,实现对良性和恶性皮肤病变的精确分类。我们的方法从数据集预处理开始,以提高分类准确性,然后分别训练两个预训练的DL模型,InceptionV3和DenseNet121。通过使用加权和规则融合每个模型的结果,我们的系统实现了出色的准确率。具体来说,我们实现了92.27%的检测准确率、92.33%的敏感性、92.22%的特异性、90.81%的精确率和91.57%的F1得分,超越了现有模型,展示了我们混合方法的稳健性和可信度。我们的研究代表了皮肤癌诊断的重大进展,并为该领域的进一步研究奠定了有希望的基础。凭借通过更早检测挽救无数生命的潜力,我们的混合深度学习方法是对抗皮肤癌的变革性手段。

更新时间: 2024-10-18 14:19:13

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.14489v1
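
The weighted sum rule used for fusion is a one-liner; the sketch below combines the softmax outputs of the two fine-tuned backbones, with the weight treated as a validation-tuned hyperparameter rather than a value taken from the paper.

import numpy as np

def fuse_predictions(p_inception, p_densenet, w=0.5):
    # p_*: (n_samples, n_classes) softmax outputs of the two fine-tuned backbones
    fused = w * p_inception + (1 - w) * p_densenet   # weighted sum rule
    return fused.argmax(axis=1)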

Enabling Scalable Evaluation of Bias Patterns in Medical LLMs

Large language models (LLMs) have shown impressive potential in helping with numerous medical challenges. Deploying LLMs in high-stakes applications such as medicine, however, brings in many concerns. One major area of concern relates to biased behaviors of LLMs in medical applications, leading to unfair treatment of individuals. To pave the way for the responsible and impactful deployment of Med LLMs, rigorous evaluation is a key prerequisite. Due to the huge complexity and variability of different medical scenarios, existing work in this domain has primarily relied on using manually crafted datasets for bias evaluation. In this study, we present a new method to scale up such bias evaluations by automatically generating test cases based on rigorous medical evidence. We specifically target the challenges of a) domain-specificity of bias characterization, b) hallucinating while generating the test cases, and c) various dependencies between the health outcomes and sensitive attributes. To that end, we offer new methods to address these challenges integrated with our generative pipeline, using medical knowledge graphs, medical ontologies, and customized general LLM evaluation frameworks in our method. Through a series of extensive experiments, we show that the test cases generated by our proposed method can effectively reveal bias patterns in Med LLMs at larger and more flexible scales than human-crafted datasets. We publish a large bias evaluation dataset using our pipeline, which is dedicated to a few medical case studies. A live demo of our application for vignette generation is available at https://vignette.streamlit.app. Our code is also available at https://github.com/healthylaife/autofair.

Updated: 2024-10-18 14:17:03

标题: 实现对医学LLM中偏见模式的可扩展评估

摘要: 大型语言模型(LLMs)在许多医疗挑战中展现出了令人印象深刻的潜力。然而,在高风险应用程序中部署LLMs,如医学,会引发许多担忧。一个主要的担忧领域与LLMs在医疗应用中的偏见行为有关,导致对个体的不公平对待。为了为负责任和有影响力的Med LLMs部署铺平道路,严格的评估是一个关键的先决条件。由于不同医疗场景的巨大复杂性和变化性,该领域的现有工作主要依赖于使用手工制作的数据集进行偏见评估。在这项研究中,我们提出了一种新方法,通过根据严格的医学证据自动生成测试用例来扩大这种偏见评估。我们特别针对以下挑战:a)偏见表征的领域特异性,b)在生成测试用例时产生幻觉,c)健康结果和敏感属性之间的各种依赖关系。为此,我们提供了解决这些挑战的新方法,结合我们的生成管道,使用医学知识图谱、医学本体论和定制的通用LLM评估框架。通过一系列广泛的实验,我们展示了我们提出的方法生成的测试用例可以有效地揭示Med LLMs中的偏见模式,比人工制作的数据集更大规模和更灵活。我们使用我们的管道发布了一个专门用于几个医学案例研究的大型偏见评估数据集。我们用于生成临床案例小样(vignette)的应用程序在线演示可在https://vignette.streamlit.app上找到。我们的代码也可以在https://github.com/healthylaife/autofair上找到。

更新时间: 2024-10-18 14:17:03

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14763v1

ANT: Adaptive Noise Schedule for Time Series Diffusion Models

Advances in diffusion models for generative artificial intelligence have recently propagated to the time series (TS) domain, demonstrating state-of-the-art performance on various tasks. However, prior works on TS diffusion models often borrow the framework of existing works proposed in other domains without considering the characteristics of TS data, leading to suboptimal performance. In this work, we propose Adaptive Noise schedule for Time series diffusion models (ANT), which automatically predetermines proper noise schedules for given TS datasets based on their statistics representing non-stationarity. Our intuition is that an optimal noise schedule should satisfy the following desiderata: 1) It linearly reduces the non-stationarity of TS data so that all diffusion steps are equally meaningful, 2) the data is corrupted to the random noise at the final step, and 3) the number of steps is sufficiently large. The proposed method is practical for use in that it eliminates the necessity of finding the optimal noise schedule with a small additional cost to compute the statistics for given datasets, which can be done offline before training. We validate the effectiveness of our method across various tasks, including TS forecasting, refinement, and generation, on datasets from diverse domains. Code is available at this repository: https://github.com/seunghan96/ANT.

Updated: 2024-10-18 14:16:54

标题: ANT:时间序列扩散模型的自适应噪声调度

摘要: 最近,生成人工智能的扩散模型在时间序列(TS)领域得到了推广,展示出在各种任务上的最新性能。然而,先前关于TS扩散模型的研究往往借鉴了其他领域提出的现有作品框架,而没有考虑TS数据的特征,导致性能不佳。在这项工作中,我们提出了适用于时间序列扩散模型(ANT)的自适应噪声调度,该模型根据代表非稳态性的统计数据自动预先确定给定TS数据集的适当噪声调度。我们的直觉是,最佳噪声调度应满足以下要求:1)它线性减小TS数据的非稳态性,使所有扩散步骤同样有意义,2)数据在最后一步被破坏为随机噪声,3)步骤数量足够大。所提出的方法在实际应用中很实用,因为它消除了找到最佳噪声调度的必要性,并且只需较小的额外成本来计算给定数据集的统计信息,这可以在训练之前离线完成。我们验证了我们的方法在各种任务上的有效性,包括TS预测、改进和生成,涉及不同领域的数据集。代码可在以下存储库中找到:https://github.com/seunghan96/ANT。

更新时间: 2024-10-18 14:16:54

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2410.14488v1

Transfer Reinforcement Learning in Heterogeneous Action Spaces using Subgoal Mapping

In this paper, we consider a transfer reinforcement learning problem involving agents with different action spaces. Specifically, for any new unseen task, the goal is to use a successful demonstration of this task by an expert agent in its action space to enable a learner agent to learn an optimal policy in its own, different action space with fewer samples than would be required if the learner were learning on its own. Existing transfer learning methods across different action spaces either require handcrafted mappings between those action spaces provided by human experts, which can induce bias in the learning procedure, or require the expert agent to share its policy parameters with the learner agent, which does not generalize well to unseen tasks. In this work, we propose a method that learns a subgoal mapping between the expert agent policy and the learner agent policy. Since the expert agent and the learner agent have different action spaces, their optimal policies can have different subgoal trajectories. We learn this subgoal mapping by training a Long Short Term Memory (LSTM) network on a distribution of tasks, and then use this mapping to predict the learner subgoal sequence for unseen tasks, thereby improving the speed of learning by biasing the agent's policy towards the predicted learner subgoal sequence. Through numerical experiments, we demonstrate that the proposed learning scheme can effectively find the subgoal mapping underlying the given distribution of tasks. Moreover, letting the learner agent imitate the expert agent's policy with the learnt subgoal mapping can significantly improve the sample efficiency and training time of the learner agent in unseen new tasks.
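
A minimal sketch of the kind of LSTM subgoal-mapping network the abstract describes, assuming subgoals come from a shared discrete vocabulary and that expert/learner subgoal sequences are aligned during training (both assumptions are ours; the paper's exact representation is not given in the abstract):

    import torch
    import torch.nn as nn

    N_SUBGOALS = 16   # hypothetical size of a shared subgoal vocabulary

    class SubgoalMapper(nn.Module):
        """Maps an expert subgoal sequence to learner subgoal logits."""
        def __init__(self, n_subgoals=N_SUBGOALS, emb=32, hidden=64):
            super().__init__()
            self.embed = nn.Embedding(n_subgoals, emb)
            self.lstm = nn.LSTM(emb, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_subgoals)

        def forward(self, expert_seq):                 # (batch, T) int64
            h, _ = self.lstm(self.embed(expert_seq))   # (batch, T, hidden)
            return self.head(h)                        # (batch, T, n_subgoals)

    model = SubgoalMapper()
    expert = torch.randint(0, N_SUBGOALS, (8, 5))    # toy expert demonstrations
    learner = torch.randint(0, N_SUBGOALS, (8, 5))   # toy aligned learner subgoals
    loss = nn.CrossEntropyLoss()(model(expert).flatten(0, 1), learner.flatten())
    loss.backward()  # gradients for one training step of the mapping network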

Updated: 2024-10-18 14:08:41

标题: 利用子目标映射在异构动作空间中进行强化学习的转移

摘要: 在本文中,我们考虑了涉及具有不同动作空间的代理的转移强化学习问题。具体而言,对于任何新的未见任务,目标是利用专家代理在其动作空间中对该任务的成功演示,使学习代理能够在自己不同的动作空间中学习出最优策略,而所需的样本比学习者单独学习时更少。现有的跨不同动作空间的转移学习方法要么需要人类专家提供这些动作空间之间的手工映射,这可能会在学习过程中引入偏见,要么需要专家代理与学习代理共享其策略参数,这在未见任务上泛化能力不强。在这项工作中,我们提出了一种方法,学习专家代理策略和学习代理策略之间的子目标映射。由于专家代理和学习代理具有不同的动作空间,它们的最优策略可能具有不同的子目标轨迹。我们通过训练一种长短期记忆(LSTM)网络来学习这种子目标映射,以此来预测未见任务的学习者子目标序列,从而通过偏向代理策略朝向预测的学习者子目标序列来提高学习速度。通过数值实验,我们证明了所提出的学习方案可以有效地找到给定任务分布下的子目标映射。此外,让学习代理模仿专家代理的策略,使用学习到的子目标映射,可以显著提高学习代理在未见新任务中的样本效率和训练时间。

更新时间: 2024-10-18 14:08:41

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14484v1

Spectral Representations for Accurate Causal Uncertainty Quantification with Gaussian Processes

Accurate uncertainty quantification for causal effects is essential for robust decision making in complex systems, but remains challenging in non-parametric settings. One promising framework represents conditional distributions in a reproducing kernel Hilbert space and places Gaussian process priors on them to infer posteriors on causal effects, but requires restrictive nuclear dominant kernels and approximations that lead to unreliable uncertainty estimates. In this work, we introduce a method, IMPspec, that addresses these limitations via a spectral representation of the Hilbert space. We show that posteriors in this model can be obtained explicitly, by extending a result in Hilbert space regression theory. We also learn the spectral representation to optimise posterior calibration. Our method achieves state-of-the-art performance in uncertainty quantification and causal Bayesian optimisation across simulations and a healthcare application.

Updated: 2024-10-18 14:06:49

标题: 高斯过程用于准确因果不确定性量化的频谱表示

摘要: 准确的因果效应不确定性量化对于复杂系统中的健壮决策至关重要,但在非参数设置中仍然具有挑战性。一种有前途的框架表示再现核希尔伯特空间中的条件分布,并在其上放置高斯过程先验以推断因果效应的后验,但需要限制性核和导致不可靠不确定性估计的近似。在这项工作中,我们引入一种名为IMPspec的方法,通过希尔伯特空间的谱表示来解决这些限制。我们展示了该模型中的后验可以通过扩展希尔伯特空间回归理论中的结果来显式获得。我们还学习谱表示以优化后验校准。我们的方法在模拟和医疗应用中实现了最先进的不确定性量化和因果贝叶斯优化性能。

更新时间: 2024-10-18 14:06:49

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/2410.14483v1

DRL Optimization Trajectory Generation via Wireless Network Intent-Guided Diffusion Models for Optimizing Resource Allocation

With the rapid advancements in wireless communication fields, including low-altitude economies, 6G, and Wi-Fi, the scale of wireless networks continues to expand, accompanied by increasing service quality demands. Traditional deep reinforcement learning (DRL)-based optimization models can improve network performance by solving non-convex optimization problems intelligently. However, they heavily rely on online deployment and often require extensive initial training. Online DRL optimization models typically make accurate decisions based on current channel state distributions. When these distributions change, their generalization capability diminishes, which hinders the responsiveness essential for real-time and high-reliability wireless communication networks. Furthermore, different users have varying quality of service (QoS) requirements across diverse scenarios, and conventional online DRL methods struggle to accommodate this variability. Consequently, exploring flexible and customized AI strategies is critical. We propose a wireless network intent (WNI)-guided trajectory generation model based on a generative diffusion model (GDM). This model can be generated and fine-tuned in real time to achieve the objective and meet the constraints of target intent networks, significantly reducing state information exposure during wireless communication. Moreover, the WNI-guided optimization trajectory generation can be customized to address differentiated QoS requirements, enhancing the overall quality of communication in future intelligent networks. Extensive simulation results demonstrate that our approach achieves greater stability in spectral efficiency variations and outperforms traditional DRL optimization models in dynamic communication systems.

Updated: 2024-10-18 14:04:38

标题: 通过无线网络意图引导扩散模型优化资源分配的DRL优化轨迹生成

摘要: 随着无线通信领域的快速发展,包括低空经济、6G和Wi-Fi,无线网络的规模不断扩大,伴随着对服务质量的需求不断增加。传统的基于深度强化学习(DRL)的优化模型可以通过智能地解决非凸优化问题来改善网络性能。然而,它们严重依赖在线部署,通常需要大量的初始训练。在线DRL优化模型通常根据当前信道状态分布做出准确决策。当这些分布发生变化时,它们的泛化能力会减弱,这会阻碍实时和高可靠性无线通信网络所必需的响应能力。此外,不同用户在不同场景下有不同的服务质量(QoS)要求,传统的在线DRL方法难以适应这种变化。因此,探索灵活和定制的AI策略至关重要。我们提出了一种基于生成扩散模型(GDM)的无线网络意图(WNI)引导的轨迹生成模型。该模型可以实时生成和微调,以实现目标意图网络的目标并满足约束条件,显著降低无线通信过程中的状态信息暴露。此外,WNI引导的优化轨迹生成可以定制以满足不同的QoS要求,提升未来智能网络中通信的整体质量。大量的仿真结果表明,我们的方法在频谱效率变化方面具有更大的稳定性,并且在动态通信系统中胜过传统的DRL优化模型。

更新时间: 2024-10-18 14:04:38

领域: cs.NI,cs.AI

下载: http://arxiv.org/abs/2410.14481v1

BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models

While large language models (LLMs) exhibit remarkable capabilities across various tasks, they encounter potential security risks such as jailbreak attacks, which exploit vulnerabilities to bypass security measures and generate harmful outputs. Existing jailbreak strategies mainly focus on maximizing attack success rate (ASR), frequently neglecting other critical factors, including the relevance of the jailbreak response to the query and the level of stealthiness. This narrow focus on single objectives can result in ineffective attacks that either lack contextual relevance or are easily recognizable. In this work, we introduce BlackDAN, an innovative black-box attack framework with multi-objective optimization, aiming to generate high-quality prompts that effectively facilitate jailbreaking while maintaining contextual relevance and minimizing detectability. BlackDAN leverages Multiobjective Evolutionary Algorithms (MOEAs), specifically the NSGA-II algorithm, to optimize jailbreaks across multiple objectives including ASR, stealthiness, and semantic relevance. By integrating mechanisms like mutation, crossover, and Pareto-dominance, BlackDAN provides a transparent and interpretable process for generating jailbreaks. Furthermore, the framework allows customization based on user preferences, enabling the selection of prompts that balance harmfulness, relevance, and other factors. Experimental results demonstrate that BlackDAN outperforms traditional single-objective methods, yielding higher success rates and improved robustness across various LLMs and multimodal LLMs, while ensuring jailbreak responses are both relevant and less detectable.
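
The Pareto-dominance bookkeeping that NSGA-II builds on can be sketched generically over abstract objective vectors (this shows only the non-dominated-front extraction step, on made-up scores, not the attack pipeline):

    import numpy as np

    def dominates(a, b):
        """a Pareto-dominates b when a is >= everywhere and > somewhere
        (all objectives phrased as maximization)."""
        return np.all(a >= b) and np.any(a > b)

    def first_front(scores):
        """Indices of candidates not dominated by any other candidate."""
        return [i for i, a in enumerate(scores)
                if not any(dominates(b, a)
                           for j, b in enumerate(scores) if j != i)]

    # Abstract objective vectors: one row per candidate, one column per
    # objective (e.g. success rate, semantic relevance).
    scores = np.array([[0.9, 0.2], [0.5, 0.5], [0.4, 0.6], [0.3, 0.3]])
    print(first_front(scores))  # [0, 1, 2]: row 3 is dominated by row 1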

Updated: 2024-10-18 14:03:05

标题: BlackDAN:一种针对大型语言模型的有效和情境化越狱的黑盒多目标方法

摘要: 尽管大型语言模型(LLMs)在各种任务中展现出卓越的能力,但它们面临潜在的安全风险,例如越狱攻击,利用漏洞绕过安全措施并生成有害输出。现有的越狱策略主要集中在最大化攻击成功率(ASR),经常忽视其他关键因素,包括越狱响应与查询的相关性和隐蔽性水平。对单一目标的狭窄关注可能导致无效的攻击,要么缺乏上下文相关性,要么容易被识别。在这项工作中,我们引入了BlackDAN,一个创新的黑盒攻击框架,具有多目标优化,旨在生成高质量的提示,有效促进越狱,同时保持上下文相关性并减少可检测性。BlackDAN利用多目标进化算法(MOEAs),特别是NSGA-II算法,来优化跨多个目标的越狱,包括ASR、隐蔽性和语义相关性。通过整合突变、交叉和帕累托支配等机制,BlackDAN提供了一个透明且可解释的生成越狱的过程。此外,该框架允许基于用户偏好进行定制,使用户能够选择平衡有害性、相关性和其他因素的提示。实验结果表明,BlackDAN优于传统的单一目标方法,在各种LLMs和多模态LLMs中获得更高的成功率和改善的鲁棒性,同时确保越狱响应既相关又不易被检测。

更新时间: 2024-10-18 14:03:05

领域: cs.CR,cs.AI,cs.CL,cs.LG,cs.NE

下载: http://arxiv.org/abs/2410.09804v2

Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models

Large Language Models (LLMs) have demonstrated remarkable capabilities in generating coherent text but remain limited by the static nature of their training data. Retrieval Augmented Generation (RAG) addresses this issue by combining LLMs with up-to-date information retrieval, but also expand the attack surface of the system. This paper investigates prompt injection attacks on RAG, focusing on malicious objectives beyond misinformation, such as inserting harmful links, promoting unauthorized services, and initiating denial-of-service behaviors. We build upon existing corpus poisoning techniques and propose a novel backdoor attack aimed at the fine-tuning process of the dense retriever component. Our experiments reveal that corpus poisoning can achieve significant attack success rates through the injection of a small number of compromised documents into the retriever corpus. In contrast, backdoor attacks demonstrate even higher success rates but necessitate a more complex setup, as the victim must fine-tune the retriever using the attacker poisoned dataset.

Updated: 2024-10-18 14:02:34

标题: 植入后门的检索器:针对大型语言模型检索增强生成的提示注入攻击

摘要: 大型语言模型(LLMs)已经展示出在生成连贯文本方面的显著能力,但仍然受限于其训练数据的静态性质。检索增强生成(RAG)通过将LLMs与最新的信息检索相结合,解决了这个问题,但也扩大了系统的攻击面。本文调查了针对RAG的提示注入攻击,重点关注恶意目标超出误导信息,如插入有害链接、推广未经授权的服务和发起拒绝服务行为等。我们基于现有的语料库污染技术,并提出了一种针对密集检索器组件的微调过程的新型后门攻击。我们的实验表明,通过向检索器语料库注入少量受损文档,语料库污染可以实现显著的攻击成功率。相比之下,后门攻击表现出更高的成功率,但需要更复杂的设置,因为受害者必须使用攻击者污染的数据集来微调检索器。

更新时间: 2024-10-18 14:02:34

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2410.14479v1

Laplace Transform Based Low-Complexity Learning of Continuous Markov Semigroups

Markov processes serve as a universal model for many real-world random processes. This paper presents a data-driven approach for learning these models through the spectral decomposition of the infinitesimal generator (IG) of the Markov semigroup. The unbounded nature of IGs complicates traditional methods such as vector-valued regression and Hilbert-Schmidt operator analysis. Existing techniques, including physics-informed kernel regression, are computationally expensive and limited in scope, with no recovery guarantees for transfer operator methods when the time-lag is small. We propose a novel method that leverages the IG's resolvent, characterized by the Laplace transform of transfer operators. This approach is robust to time-lag variations, ensuring accurate eigenvalue learning even for small time-lags. Our statistical analysis applies to a broader class of Markov processes than current methods while reducing computational complexity from quadratic to linear in the state dimension. Finally, we illustrate the behaviour of our method in two experiments.
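
For intuition, the resolvent in question is precisely a Laplace transform of the transfer-operator semigroup; a standard semigroup identity, stated here for context (valid for $\mathrm{Re}(z)$ large enough):

    $(z\,\mathrm{Id} - \mathcal{L})^{-1} \;=\; \int_0^{\infty} e^{-zt}\, e^{t\mathcal{L}}\, \mathrm{d}t,$

where $\mathcal{L}$ is the infinitesimal generator and $e^{t\mathcal{L}}$ is the transfer operator at lag $t$; working through this object, rather than through a single small-lag transfer operator, is what makes the approach robust to time-lag variations.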

Updated: 2024-10-18 14:02:06

标题: 基于拉普拉斯变换的连续马尔可夫半群低复杂度学习

摘要: 马尔可夫过程是许多现实世界随机过程的通用模型。本文提出了一种基于数据驱动的方法,通过马尔可夫半群的无穷小生成器(IG)的谱分解来学习这些模型。IG的无界性使传统方法(如矢量值回归和希尔伯特-施密特算子分析)变得复杂。现有技术,包括受物理启发的核回归,在计算上昂贵且范围有限,在时间滞后较小时,传输算子方法没有恢复保证。我们提出了一种新颖的方法,利用了IG的预解算子(resolvent),其特征是传输算子的拉普拉斯变换。这种方法对时间滞后变化具有鲁棒性,确保即使在时间滞后较小的情况下也能准确学习特征值。我们的统计分析适用于比当前方法更广泛的马尔可夫过程类别,同时将计算复杂度从状态维度的二次降低到线性。最后,我们在两个实验中展示了我们方法的行为。

更新时间: 2024-10-18 14:02:06

领域: cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2410.14477v1

Enhancing Cryptocurrency Market Forecasting: Advanced Machine Learning Techniques and Industrial Engineering Contributions

Cryptocurrencies, as decentralized digital assets, have experienced rapid growth and adoption, with over 23,000 cryptocurrencies and a market capitalization nearing $1.1 trillion (about $3,400 per person in the US) as of 2023. This dynamic market presents significant opportunities and risks, highlighting the need for accurate price prediction models to manage volatility. This chapter comprehensively reviews machine learning (ML) techniques applied to cryptocurrency price prediction from 2014 to 2024. We explore various ML algorithms, including linear models, tree-based approaches, and advanced deep learning architectures such as transformers and large language models. Additionally, we examine the role of sentiment analysis in capturing market sentiment from textual data like social media posts and news articles to anticipate price fluctuations. With expertise in optimizing complex systems and processes, industrial engineers are pivotal in enhancing these models. They contribute by applying principles of process optimization, efficiency, and risk mitigation to improve computational performance and data management. This chapter highlights the evolving landscape of cryptocurrency price prediction, the integration of emerging technologies, and the significant role of industrial engineers in refining predictive models. By addressing current limitations and exploring future research directions, this chapter aims to advance the development of more accurate and robust prediction systems, supporting better-informed investment decisions and more stable market behavior.

Updated: 2024-10-18 14:00:44

标题: 提升加密货币市场预测:先进机器学习技术和工业工程贡献

摘要: 加密货币作为去中心化的数字资产,已经经历了快速增长和采用,截至2023年,全球拥有超过23,000种加密货币,市值接近1.1万亿美元(美国每人约3400美元)。这一动态市场提供了重大机遇和风险,突显了需要准确的价格预测模型来管理波动性。本章全面审查了2014年至2024年应用于加密货币价格预测的机器学习(ML)技术。我们探讨了各种ML算法,包括线性模型、基于树的方法,以及高级深度学习架构,如变压器和大型语言模型。此外,我们研究了情感分析在从文本数据(如社交媒体帖子和新闻文章)中捕捉市场情绪以预测价格波动方面的作用。在优化复杂系统和流程方面拥有专业知识的工业工程师在提升这些模型方面起着关键作用。他们通过应用过程优化、效率和风险缓解原则来提高计算性能和数据管理。本章突出了加密货币价格预测的不断发展态势,新技术的整合,以及工业工程师在完善预测模型中的重要角色。通过解决当前的限制并探索未来的研究方向,本章旨在推进更准确和稳健的预测系统的发展,支持更明智的投资决策和更稳定的市场行为。

更新时间: 2024-10-18 14:00:44

领域: cs.LG

下载: http://arxiv.org/abs/2410.14475v1

How Do Training Methods Influence the Utilization of Vision Models?

Not all learnable parameters (e.g., weights) contribute equally to a neural network's decision function. In fact, entire layers' parameters can sometimes be reset to random values with little to no impact on the model's decisions. We revisit earlier studies that examined how architecture and task complexity influence this phenomenon and ask: is this phenomenon also affected by how we train the model? We conducted experimental evaluations on a diverse set of ImageNet-1k classification models to explore this, keeping the architecture and training data constant but varying the training pipeline. Our findings reveal that the training method strongly influences which layers become critical to the decision function for a given task. For example, improved training regimes and self-supervised training increase the importance of early layers while significantly under-utilizing deeper layers. In contrast, methods such as adversarial training display an opposite trend. Our preliminary results extend previous findings, offering a more nuanced understanding of the inner mechanics of neural networks. Code: https://github.com/paulgavrikov/layer_criticality
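
A toy analogue of the layer-criticality probe on a small fully-connected model (the paper works with ImageNet-scale vision models; this sketch only illustrates the reset-and-re-evaluate protocol):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

    clf = MLPClassifier(hidden_layer_sizes=(64, 64, 64), max_iter=300,
                        random_state=0).fit(Xtr, ytr)
    print("baseline accuracy:", clf.score(Xte, yte))

    rng = np.random.default_rng(0)
    for i, w in enumerate(clf.coefs_):
        saved = w.copy()
        # Reset one layer's weights to random values of comparable scale.
        clf.coefs_[i] = rng.normal(0.0, saved.std(), size=saved.shape)
        print(f"layer {i} reset:", clf.score(Xte, yte))
        clf.coefs_[i] = saved  # restore before probing the next layer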

Updated: 2024-10-18 13:54:46

标题: 训练方法如何影响视觉模型的利用?

摘要: 并非所有可学习参数(例如权重)对神经网络的决策函数贡献相同。事实上,有时整个层的参数可以被重置为随机值,对模型的决策几乎没有影响。我们重访了早期研究,探讨了架构和任务复杂性如何影响这一现象,并提出:这一现象是否也受到我们训练模型的方式的影响?我们对一组多样化的ImageNet-1k分类模型进行了实验评估,以探讨这一问题,保持架构和训练数据不变,但变化训练流程。我们的研究结果表明,训练方法强烈影响对于给定任务哪些层对决策函数至关重要。例如,改进的训练制度和自监督训练增加了早期层的重要性,同时显著未充分利用更深层。相比之下,对抗训练等方法显示相反的趋势。我们的初步结果扩展了先前的发现,提供了对神经网络内部机制更细致的理解。 代码:https://github.com/paulgavrikov/layer_criticality

更新时间: 2024-10-18 13:54:46

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.14470v1

Flow-based Sampling for Entanglement Entropy and the Machine Learning of Defects

We introduce a novel technique to numerically calculate Rényi entanglement entropies in lattice quantum field theory using generative models. We describe how flow-based approaches can be combined with the replica trick using a custom neural-network architecture around a lattice defect connecting two replicas. Numerical tests for the $\phi^4$ scalar field theory in two and three dimensions demonstrate that our technique outperforms state-of-the-art Monte Carlo calculations, and exhibit a promising scaling with the defect size.

Updated: 2024-10-18 13:51:25

标题: 基于流的采样用于纠缠熵和缺陷机器学习

摘要: 我们引入了一种新颖的技术,利用生成模型在晶格量子场论中数值计算Rényi纠缠熵。我们描述了如何将基于流的方法与副本技巧结合起来,使用一个围绕连接两个副本的晶格缺陷的自定义神经网络架构。在二维和三维的$\phi^4$标量场理论中进行的数值测试表明,我们的技术胜过了最先进的蒙特卡罗计算,并展示了与缺陷尺寸有关的有希望的扩展。

更新时间: 2024-10-18 13:51:25

领域: quant-ph,cond-mat.stat-mech,cs.LG,hep-lat

下载: http://arxiv.org/abs/2410.14466v1

IncidentResponseGPT: Generating Traffic Incident Response Plans with Generative Artificial Intelligence

This paper proposes IncidentResponseGPT, a novel framework that applies generative artificial intelligence (AI) to potentially enhance the efficiency and effectiveness of traffic incident response. This model allows for synthesis of region-specific incident response guidelines and generates incident response plans adapted to a specific area, aiming to expedite decision-making for traffic management authorities. This approach aims to accelerate incident resolution times by suggesting various recommendations (e.g. optimal rerouting strategies, estimating resource needs) to minimize the overall impact on the urban traffic network. The system suggests specific actions, including dynamic lane closures, optimized rerouting and dispatching appropriate emergency resources. We utilize the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) to rank generated response plans on criteria such as impact minimization and resource efficiency, judged by their proximity to a human-proposed solution.
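
TOPSIS itself is a standard, easily reproduced ranking procedure; a compact sketch with hypothetical plan scores (the criteria and weights below are illustrative, not the paper's):

    import numpy as np

    def topsis(matrix, weights, benefit):
        """Rank alternatives (rows) on criteria (columns) with TOPSIS.
        benefit[j] is True when larger values of criterion j are better."""
        m = matrix / np.linalg.norm(matrix, axis=0)       # vector-normalize
        v = m * weights                                   # weight criteria
        ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
        worst = np.where(benefit, v.min(axis=0), v.max(axis=0))
        d_pos = np.linalg.norm(v - ideal, axis=1)
        d_neg = np.linalg.norm(v - worst, axis=1)
        return d_neg / (d_pos + d_neg)                    # closeness in [0, 1]

    # Hypothetical response plans scored on (delay impact, resource cost),
    # both of which are costs to minimize.
    plans = np.array([[12.0, 3.0], [9.0, 5.0], [15.0, 2.0]])
    scores = topsis(plans, weights=np.array([0.6, 0.4]),
                    benefit=np.array([False, False]))
    print(scores.argsort()[::-1])  # plan indices, best first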

Updated: 2024-10-18 13:50:10

标题: 《IncidentResponseGPT: 利用生成人工智能生成交通事故应急响应计划》

摘要: 提出了一种名为IncidentResponseGPT框架的新型系统,该系统将生成人工智能(AI)应用于潜在地增强交通事故应急响应的效率和有效性。该模型允许综合特定区域的事故应急响应准则,并生成适应特定区域的事故应急响应计划,旨在加快交通管理机构的决策过程。该方法旨在通过建议各种建议(例如最佳重定向策略、估算资源需求)来缩短事故解决时间,以最小化对城市交通网络的整体影响。该系统建议特定行动,包括动态车道关闭、优化重定向以及调度适当的紧急资源。我们利用TOPSIS(Technique for Order Preference by Similarity to Ideal Solution)根据其与人类提出的解决方案的接近程度来对生成的应急响应计划进行排名,评估标准包括最小化影响和资源效率。

更新时间: 2024-10-18 13:50:10

领域: cs.LG,cs.HC

下载: http://arxiv.org/abs/2404.18550v4

Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning

Electrocardiogram (ECG) interpretation requires specialized expertise, often involving synthesizing insights from ECG signals with complex clinical queries posed in natural language. The scarcity of labeled ECG data coupled with the diverse nature of clinical inquiries presents a significant challenge for developing robust and adaptable ECG diagnostic systems. This work introduces a novel multimodal meta-learning method for few-shot ECG question answering, addressing the challenge of limited labeled data while leveraging the rich knowledge encoded within large language models (LLMs). Our LLM-agnostic approach integrates a pre-trained ECG encoder with a frozen LLM (e.g., LLaMA and Gemma) via a trainable fusion module, enabling the language model to reason about ECG data and generate clinically meaningful answers. Extensive experiments demonstrate superior generalization to unseen diagnostic tasks compared to supervised baselines, achieving notable performance even with limited ECG leads. For instance, in a 5-way 5-shot setting, our method using LLaMA-3.1-8B achieves accuracy of 84.6%, 77.3%, and 69.6% on single verify, choose and query question types, respectively. These results highlight the potential of our method to enhance clinical ECG interpretation by combining signal processing with the nuanced language understanding capabilities of LLMs, particularly in data-constrained scenarios.
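
One common way to realize such a trainable fusion module is to project the frozen ECG encoder's embedding into a few soft prefix tokens in the LLM's embedding space; a sketch under that assumption (all dimensions are illustrative, not taken from the paper):

    import torch
    import torch.nn as nn

    class ECGFusion(nn.Module):
        """Projects a frozen ECG-encoder embedding into k soft prefix
        tokens living in the LLM embedding space."""
        def __init__(self, ecg_dim=256, llm_dim=4096, k=8):
            super().__init__()
            self.k, self.llm_dim = k, llm_dim
            self.proj = nn.Linear(ecg_dim, k * llm_dim)

        def forward(self, ecg_emb):                       # (batch, ecg_dim)
            prefix = self.proj(ecg_emb)                   # (batch, k * llm_dim)
            return prefix.view(-1, self.k, self.llm_dim)  # prepend to text tokens

    fusion = ECGFusion()
    tokens = fusion(torch.randn(2, 256))  # (2, 8, 4096), fed to the frozen LLM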

Updated: 2024-10-18 13:48:01

标题: 《基于心电图语言模型的元学习少样本问答》

摘要: 心电图(ECG)解读需要专业的专业知识,通常涉及将来自ECG信号的见解与自然语言中提出的复杂临床问题综合起来。标记ECG数据的稀缺性与临床查询的多样性结合在一起,为开发强大且适应性强的ECG诊断系统提出了重大挑战。这项工作引入了一种新颖的多模态元学习方法,用于少样本ECG问题回答,解决了有限标记数据的挑战,同时利用了大型语言模型(LLMs)中编码的丰富知识。我们的LLM-不可知方法通过一个可训练的融合模块将预训练的ECG编码器与一个冻结的LLM(例如LLaMA和Gemma)集成在一起,使语言模型能够对ECG数据进行推理并生成临床上有意义的答案。大量实验表明,与监督基线相比,我们的方法在未见过的诊断任务上具有更好的泛化性能,即使只有有限的ECG导联也能取得显著的表现。例如,在5类5样本(5-way 5-shot)设置中,我们的方法使用LLaMA-3.1-8B,在单个验证、选择和查询问题类型上分别达到84.6%、77.3%和69.6%的准确率。这些结果突显了我们的方法结合信号处理与LLMs的微妙语言理解能力,特别是在数据受限的情况下,有可能增强临床ECG解读的潜力。

更新时间: 2024-10-18 13:48:01

领域: cs.LG

下载: http://arxiv.org/abs/2410.14464v1

Reinforcement Learning with Lookahead Information

We study reinforcement learning (RL) problems in which agents observe the reward or transition realizations at their current state before deciding which action to take. Such observations are available in many applications, including transactions, navigation and more. When the environment is known, previous work shows that this lookahead information can drastically increase the collected reward. However, outside of specific applications, existing approaches for interacting with unknown environments are not well-adapted to these observations. In this work, we close this gap and design provably-efficient learning algorithms able to incorporate lookahead information. To achieve this, we perform planning using the empirical distribution of the reward and transition observations, in contrast to vanilla approaches that only rely on estimated expectations. We prove that our algorithms achieve tight regret versus a baseline that also has access to lookahead information - linearly increasing the amount of collected reward compared to agents that cannot handle lookahead information.
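
The decision rule that lookahead enables can be illustrated in a toy tabular setting: the agent maximizes over realized draws rather than expectations (all values below are made up):

    import numpy as np

    # The agent sees, for each action, the realized reward and realized
    # next state *before* committing (lookahead information).
    V = np.array([0.0, 1.0, 4.0])     # current value estimates per state
    lookahead = {                     # action -> (observed r, observed s')
        "left": (0.5, 1),
        "right": (0.1, 2),
    }
    gamma = 0.9
    best = max(lookahead,
               key=lambda a: lookahead[a][0] + gamma * V[lookahead[a][1]])
    print(best)  # "right": 0.1 + 0.9*4.0 beats 0.5 + 0.9*1.0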

Updated: 2024-10-18 13:42:37

标题: 带有前瞻信息的强化学习

摘要: 我们研究强化学习(RL)问题,其中代理在决定采取哪种行动之前观察到他们当前状态下的奖励或转移实现。这种观察在许多应用中都是可用的,包括交易、导航等。在环境已知的情况下,先前的研究表明,这种前瞻性信息可以显著增加收集到的奖励。然而,在特定应用之外,现有的与未知环境交互的方法并不适应这些观察。在这项工作中,我们填补了这一差距,并设计出能够整合前瞻性信息的可证明高效的学习算法。为了实现这一点,我们使用奖励和转移观察的经验分布进行规划,与仅依赖于估计期望的基本方法形成对比。我们证明,相对于同样能够访问前瞻性信息的基线,我们的算法实现了紧致的遗憾界;与无法处理前瞻性信息的代理相比,收集到的奖励呈线性增加。

更新时间: 2024-10-18 13:42:37

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.02258v2

The Propensity for Density in Feed-forward Models

Does the process of training a neural network to solve a task tend to use all of the available weights even when the task could be solved with fewer weights? To address this question we study the effects of pruning fully connected, convolutional and residual models while varying their widths. We find that the proportion of weights that can be pruned without degrading performance is largely invariant to model size. Increasing the width of a model has little effect on the density of the pruned model relative to the increase in absolute size of the pruned network. In particular, we find substantial prunability across a large range of model sizes, where our biggest model is 50 times as wide as our smallest model. We explore three hypotheses that could explain these findings.
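
A small-scale analogue of the pruning sweep, varying width and globally pruning the smallest-magnitude weights (the paper studies large vision models; this sketch only mirrors the protocol):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=1)

    for width in (32, 128, 512):                    # vary model width
        clf = MLPClassifier(hidden_layer_sizes=(width,), max_iter=300,
                            random_state=1).fit(Xtr, ytr)
        flat = np.concatenate([w.ravel() for w in clf.coefs_])
        for frac in (0.5, 0.8, 0.95):               # prune smallest weights
            thresh = np.quantile(np.abs(flat), frac)
            saved = [w.copy() for w in clf.coefs_]
            for w in clf.coefs_:
                w[np.abs(w) < thresh] = 0.0
            print(width, frac, round(clf.score(Xte, yte), 3))
            clf.coefs_ = saved                      # restore dense weights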

Updated: 2024-10-18 13:40:44

标题: 前馈模型中的密度倾向

摘要: 在解决任务的过程中,训练神经网络的过程是否倾向于利用所有可用的权重,即使任务可以用更少的权重解决?为了回答这个问题,我们研究了在变化宽度的情况下修剪全连接、卷积和残差模型的影响。我们发现,可以修剪的权重比例在很大程度上与模型大小无关。增加模型的宽度对修剪后模型的密度几乎没有影响,相对于修剪网络绝对大小的增加。特别地,我们发现在各种模型大小的范围内存在大量可修剪性,我们最大的模型比我们最小的模型宽50倍。我们探讨了三个假设来解释这些发现。

更新时间: 2024-10-18 13:40:44

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14461v1

A Cross Attention Approach to Diagnostic Explainability using Clinical Practice Guidelines for Depression

The lack of explainability using relevant clinical knowledge hinders the adoption of Artificial Intelligence-powered analysis of unstructured clinical dialogue. A wealth of relevant, untapped Mental Health (MH) data is available in online communities, providing the opportunity to address the explainability problem with substantial potential impact as a screening tool for both online and offline applications. We develop a method to enhance attention in popular transformer models and generate clinician-understandable explanations for classification by incorporating external clinical knowledge. Inspired by how clinicians rely on their expertise when interacting with patients, we leverage relevant clinical knowledge to model patient inputs, providing meaningful explanations for classification. This will save manual review time and engender trust. We develop such a system in the context of MH using clinical practice guidelines (CPG) for diagnosing depression, a mental health disorder of global concern. We propose an application-specific language model called ProcesS knowledge-infused cross ATtention (PSAT), which incorporates CPGs when computing attention. Through rigorous evaluation on three expert-curated datasets related to depression, we demonstrate application-relevant explainability of PSAT. PSAT also surpasses the performance of nine baseline models and can provide explanations where other baselines fall short. We transform a CPG resource focused on depression, such as the Patient Health Questionnaire (e.g. PHQ-9) and related questions, into a machine-readable ontology using SNOMED-CT. With this resource, PSAT enhances the ability of models like GPT-3.5 to generate application-relevant explanations.
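
A plausible skeleton for knowledge-infused cross-attention, with guideline items (e.g. PHQ-9 questions) encoded as the keys and values (the paper's actual architecture is richer; dimensions here are illustrative):

    import torch
    import torch.nn as nn

    d = 64
    attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

    text_states = torch.randn(2, 30, d)  # (batch, text tokens, d)
    cpg_states = torch.randn(2, 9, d)    # (batch, encoded guideline items, d)

    fused, weights = attn(query=text_states, key=cpg_states, value=cpg_states)
    # `weights` (batch, 30, 9) links each token to guideline items; this
    # token-to-guideline map is what can be shown to a clinician as an
    # explanation of the classification.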

Updated: 2024-10-18 13:21:49

标题: 一种利用抑郁症临床实践指南实现诊断可解释性的交叉注意力方法

摘要: 缺乏使用相关临床知识的可解释性阻碍了人工智能驱动的非结构化临床对话分析的采用。在线社区中提供了丰富而未开发的相关心理健康(MH)数据,为解决可解释性问题提供了机会,并具有对在线和离线应用都有重大潜在影响的潜力作为筛查工具。我们开发了一种方法,通过整合外部临床知识来增强流行的变压器模型中的注意力,并为分类生成临床人员可理解的解释。受到临床人员在与患者互动时依赖其专业知识的启发,我们利用相关临床知识来建模患者输入,为分类提供有意义的解释。这将节省手动审查时间并培养信任。我们在心理健康(MH)背景下开发了这样一个系统,使用临床实践指南(CPG)来诊断抑郁症,这是全球关注的心理健康障碍。我们提出了一种名为ProcesS knowledge-infused cross ATtention(PSAT)的应用特定语言模型,在计算注意力时整合了CPGs。通过对与抑郁症相关的三个专家策划的数据集进行严格评估,我们展示了PSAT的应用相关可解释性。PSAT还超越了九个基准模型的性能,并可以提供其他基准模型无法提供的解释。我们将关注抑郁症的CPG资源,如患者健康问卷(例如PHQ-9)和相关问题,转换为使用SNOMED-CT的机器可读本体。借助这一资源,PSAT增强了像GPT-3.5这样的模型生成应用相关解释的能力。

更新时间: 2024-10-18 13:21:49

领域: cs.AI

下载: http://arxiv.org/abs/2311.13852v4

Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation

Ensuring the verifiability of model answers is a fundamental challenge for retrieval-augmented generation (RAG) in the question answering (QA) domain. Recently, self-citation prompting was proposed to make large language models (LLMs) generate citations to supporting documents along with their answers. However, self-citing LLMs often struggle to match the required format, refer to non-existent sources, and fail to faithfully reflect LLMs' context usage throughout the generation. In this work, we present MIRAGE -- Model Internals-based RAG Explanations -- a plug-and-play approach using model internals for faithful answer attribution in RAG applications. MIRAGE detects context-sensitive answer tokens and pairs them with retrieved documents contributing to their prediction via saliency methods. We evaluate our proposed approach on a multilingual extractive QA dataset, finding high agreement with human answer attribution. On open-ended QA, MIRAGE achieves citation quality and efficiency comparable to self-citation while also allowing for a finer-grained control of attribution parameters. Our qualitative evaluation highlights the faithfulness of MIRAGE's attributions and underscores the promising application of model internals for RAG answer attribution.

Updated: 2024-10-18 13:16:57

标题: 基于模型内部的答案归因,用于可信的检索增强生成

摘要: 确保模型答案的可验证性是问答领域中检索增强生成(RAG)的基本挑战。最近,提出了自引提示,以便让大型语言模型(LLMs)生成引用支持文档的引文和答案。然而,自引用的LLMs经常难以匹配所需的格式,引用不存在的来源,并且在生成过程中未能忠实地反映LLMs的上下文使用。在这项工作中,我们提出了MIRAGE --基于模型内部的RAG解释 -- 一种使用模型内部进行忠实答案归因的即插即用方法。MIRAGE通过显著性方法检测上下文敏感的答案标记,并将它们与通过检索的文档进行配对,从而对它们的预测做出贡献。我们在一个多语言抽取式问答数据集上评估了我们提出的方法,发现与人类答案归因具有高度一致性。在开放式问答中,MIRAGE实现了与自引用相当的引文质量和效率,同时还允许对归因参数进行更精细的控制。我们的定性评估突出了MIRAGE归因的忠实性,并强调了模型内部在RAG答案归因中的应用前景。

更新时间: 2024-10-18 13:16:57

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.13663v4

Constrained Recurrent Bayesian Forecasting for Crack Propagation

Predictive maintenance of railway infrastructure, especially railroads, is essential to ensure safety. However, accurate prediction of crack evolution represents a major challenge due to the complex interactions between intrinsic and external factors, as well as measurement uncertainties. Effective modeling requires a multidimensional approach and a comprehensive understanding of these dynamics and uncertainties. Motivated by an industrial use case based on collected real data containing measured crack lengths, this paper introduces a robust Bayesian multi-horizon approach for predicting the temporal evolution of crack lengths on rails. This model captures the intricate interplay between various factors influencing crack growth. Additionally, the Bayesian approach quantifies both epistemic and aleatoric uncertainties, providing a confidence interval around predictions. To enhance the model's reliability for railroad maintenance, specific constraints are incorporated. These constraints limit non-physical crack propagation behavior and prioritize safety. The findings reveal a trade-off between prediction accuracy and constraint compliance, highlighting the nuanced decision-making process in model training. This study offers insights into advanced predictive modeling for dynamic temporal forecasting, particularly in railway maintenance, with potential applications in other domains.
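
The abstract does not spell out its constraints; one physically motivated example is a penalty on predicted crack-length decreases, since cracks do not shrink. A minimal sketch of such a soft constraint (the constraint choice and weighting are our assumptions):

    import torch

    def monotonicity_penalty(pred):
        """Penalty on predicted crack-length decreases between consecutive
        horizons (a crack cannot shrink); pred: (batch, horizons)."""
        return torch.relu(pred[:, :-1] - pred[:, 1:]).mean()

    pred = torch.tensor([[10.0, 10.4, 10.2]], requires_grad=True)
    penalty = monotonicity_penalty(pred)  # 0.1: only the final dip is penalized
    penalty.backward()  # in training this term is added to the data loss
    print(penalty.item())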

Updated: 2024-10-18 13:15:53

标题: 约束循环贝叶斯预测在裂纹扩展中的应用

摘要: 铁路基础设施,尤其是铁轨的预测性维护对确保安全至关重要。然而,由于内在和外部因素之间复杂的相互作用以及测量不确定性,裂纹演变的准确预测构成了一项重大挑战。有效的建模需要多维方法和对这些动态和不确定性的全面理解。本文以基于收集的真实数据的工业案例为动机,介绍了一种用于预测铁轨上裂纹长度时间演变的鲁棒贝叶斯多时间段方法。该模型捕捉了影响裂纹生长的各种因素之间错综复杂的相互作用。此外,贝叶斯方法量化了认识和随机不确定性,为预测提供了置信区间。为增强铁路维护模型的可靠性,特定约束被纳入。这些约束限制非物理的裂纹传播行为并优先考虑安全。研究结果揭示了在模型训练中的精细决策过程,突出了预测准确性与约束遵从之间的权衡。这项研究为动态时间预测的高级预测建模提供了见解,特别是在铁路维护领域,具有在其他领域的潜在应用。

更新时间: 2024-10-18 13:15:53

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.14761v1

Hip Fracture Patient Pathways and Agent-based Modelling

Increased healthcare demand is significantly straining European services. Digital solutions, including advanced modelling techniques, offer a promising way to optimise patient flow without impacting day-to-day healthcare provision. In this work we outline an ongoing project that aims to optimise healthcare resources using agent-based simulations.

Updated: 2024-10-18 13:15:50

标题: 髋部骨折患者路径和基于代理的建模

摘要: 医疗需求的增加正给欧洲的医疗服务带来巨大压力。数字化解决方案,包括先进的建模技术,为在不影响日常医疗服务提供的情况下优化患者流程提供了有希望的途径。在这项工作中,我们概述了一个正在进行的项目,旨在利用基于代理的模拟来优化医疗资源。

更新时间: 2024-10-18 13:15:50

领域: cs.CY,cs.LG

下载: http://arxiv.org/abs/2410.12804v2

Practical Light Clients for Committee-Based Blockchains

Light clients are gaining increasing attention in the literature since they obviate the need for users to set up dedicated blockchain full nodes. While the literature features a number of light client instantiations, most light client protocols optimize for long offline phases and implicitly assume that the block headers to be verified are signed by highly dynamic validators. In this paper, we show that (i) most light clients are rarely offline for more than a week, and (ii) validators are unlikely to drastically change in most permissioned blockchains and in a number of permissionless blockchains, such as Cosmos and Polkadot. Motivated by these findings, we propose a novel practical system that optimizes for such realistic assumptions and achieves minimal communication and computational costs for light clients when compared to existing protocols. By means of a prototype implementation of our solution, we show that our protocol achieves a reduction by up to $90\times$ and $40000\times$ (respectively) in end-to-end latency and up to $1000\times$ and $10000\times$ (respectively) smaller proof size when compared to two state-of-the-art light client instantiations from the literature.

Updated: 2024-10-18 13:10:00

标题: 委员会式区块链的实用轻量客户端

摘要: 轻量级客户端在文献中越来越受到关注,因为它们消除了用户需要设置专用区块链全节点的需求。虽然文献中有许多轻量级客户端实例,但大多数轻量级客户端协议都优化了长时间的离线阶段,并隐含地假设要验证的区块头是由高度动态的验证者签名的。 在本文中,我们展示了(i)大多数轻量级客户端很少离线超过一周,以及(ii)在大多数许可区块链以及一些无许可区块链(如Cosmos和Polkadot)中,验证者不太可能发生剧烈变化。受这些发现的启发,我们提出了一个新颖的实用系统,该系统优化了这些现实假设,并在与现有协议相比时实现了轻量级客户端的最小通信和计算成本。通过我们解决方案的原型实现,我们展示了我们的协议相对于文献中两种最先进的轻量级客户端实例,端到端延迟减少了高达90倍和40000倍(分别),证明尺寸减小了高达1000倍和10000倍(分别)。

更新时间: 2024-10-18 13:10:00

领域: cs.CR

下载: http://arxiv.org/abs/2410.03347v2

Are High-Degree Representations Really Unnecessary in Equivariant Graph Neural Networks?

Equivariant Graph Neural Networks (GNNs) that incorporate E(3) symmetry have achieved significant success in various scientific applications. As one of the most successful models, EGNN leverages a simple scalarization technique to perform equivariant message passing over only Cartesian vectors (i.e., 1st-degree steerable vectors), enjoying greater efficiency and efficacy compared to equivariant GNNs using higher-degree steerable vectors. This success suggests that higher-degree representations might be unnecessary. In this paper, we disprove this hypothesis by exploring the expressivity of equivariant GNNs on symmetric structures, including $k$-fold rotations and regular polyhedra. We theoretically demonstrate that equivariant GNNs will always degenerate to a zero function if the degree of the output representations is fixed to 1 or other specific values. Based on this theoretical insight, we propose HEGNN, a high-degree version of EGNN to increase the expressivity by incorporating high-degree steerable vectors while maintaining EGNN's efficiency through the scalarization trick. Our extensive experiments demonstrate that HEGNN not only aligns with our theoretical analyses on toy datasets consisting of symmetric structures, but also shows substantial improvements on more complicated datasets such as $N$-body and MD17. Our theoretical findings and empirical results potentially open up new possibilities for the research of equivariant GNNs.
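
The degeneration claim has a short symmetry intuition (a sketch of the idea, not the paper's full proof):

    If the input point cloud $X$ is invariant under a rotation $g$ (e.g. a $k$-fold rotation, or a symmetry of a regular polyhedron), equivariance of a degree-1 output $v$ forces $v(X) = v(g \cdot X) = \rho(g)\, v(X)$ for every such $g$, so $v(X)$ must lie in the subspace fixed by all of these rotations; when that fixed subspace is trivial, $v(X) = 0$.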

Updated: 2024-10-18 13:09:00

标题: 等变图神经网络中的高阶表示真的是不必要的吗?

摘要: 具有E(3)对称性的等变图神经网络(GNNs)在各种科学应用中取得了显著成功。作为最成功的模型之一,EGNN利用简单的标量化技术,在仅使用笛卡尔向量(即1阶可调向量)进行等变消息传递,相比使用更高阶可调向量的等变GNNs,享有更高的效率和功效。这一成功表明,更高阶的表示可能是不必要的。在本文中,我们通过探索等变GNNs在对称结构上的表达能力,包括k次旋转和正多面体,驳斥了这一假设。我们在理论上证明,如果将输出表示的阶数固定为1或其他特定值,等变GNNs将始终退化为零函数。基于这一理论洞察,我们提出了HEGNN,即EGNN的高阶版本,通过引入高阶可调向量来增加表达能力,同时通过标量化技巧保持EGNN的效率。我们的广泛实验表明,HEGNN不仅与我们在包含对称结构的玩具数据集上的理论分析相一致,而且在更复杂的数据集(如N体和MD17)上显示出显著改进。我们的理论发现和实证结果可能为等变GNNs的研究开辟了新的可能性。

更新时间: 2024-10-18 13:09:00

领域: cs.LG

下载: http://arxiv.org/abs/2410.11443v2

A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus

Natural language inference (NLI), the task of recognizing the entailment relationship in sentence pairs, is an actively studied topic serving as a proxy for natural language understanding. Despite the relevance of the task in building conversational agents and improving text classification, machine translation and other NLP tasks, to the best of our knowledge, there is no publicly available NLI corpus for the Romanian language. To this end, we introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs, which are obtained via distant supervision, and 6K validation and test sentence pairs, which are manually annotated with the correct labels. We conduct experiments with multiple machine learning methods based on distant learning, ranging from shallow models based on word embeddings to transformer-based neural networks, to establish a set of competitive baselines. Furthermore, we improve on the best model by employing a new curriculum learning strategy based on data cartography. Our dataset and code to reproduce the baselines are available at https://github.com/Eduard6421/RONLI.
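
Data cartography reduces to simple statistics of training dynamics; a sketch with stand-in probabilities (the paper's curriculum strategy is more elaborate than the easy-first ordering shown here):

    import numpy as np

    # p[e, i]: model probability of the gold label for example i at epoch e,
    # recorded during training; random numbers stand in for real dynamics.
    rng = np.random.default_rng(0)
    p = rng.uniform(size=(5, 1000))  # 5 epochs, 1000 training examples

    confidence = p.mean(axis=0)    # high and stable: an "easy" example
    variability = p.std(axis=0)    # high: an ambiguous example

    # One simple cartography-based curriculum: present easy examples first.
    curriculum_order = np.argsort(-confidence)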

Updated: 2024-10-18 13:03:05

标题: 一种基于地图绘制的课程学习方法在RoNLI上的应用:第一个罗马尼亚自然语言推理语料库

摘要: 自然语言推断(NLI)是识别句子对中蕴涵关系的任务,是一个广泛研究的主题,作为自然语言理解的代理。尽管该任务在构建对话代理、改进文本分类、机器翻译和其他自然语言处理任务中具有重要意义,但据我们所知,目前尚无公开的罗马尼亚语NLI语料库。为此,我们介绍了第一个罗马尼亚语NLI语料库(RoNLI),包含58K个通过远程监督获得的训练句子对,以及6K个手动注释的验证和测试句子对,标有正确标签。我们使用基于远程学习的多种机器学习方法进行实验,从基于词嵌入的浅层模型到基于变压器的神经网络,建立了一组竞争基线。此外,我们通过采用基于数据制图的新课程学习策略改进了最佳模型。我们的数据集和用于重现基线结果的代码可在https://github.com/Eduard6421/RONLI 获取。

更新时间: 2024-10-18 13:03:05

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.11877v5

Learning to refine domain knowledge for biological network inference

Perturbation experiments allow biologists to discover causal relationships between variables of interest, but the sparsity and high dimensionality of these data pose significant challenges for causal structure learning algorithms. Biological knowledge graphs can bootstrap the inference of causal structures in these situations, but since they compile vastly diverse information, they can bias predictions towards well-studied systems. Alternatively, amortized causal structure learning algorithms encode inductive biases through data simulation and train supervised models to recapitulate these synthetic graphs. However, realistically simulating biology is arguably even harder than understanding a specific system. In this work, we take inspiration from both strategies and propose an amortized algorithm for refining domain knowledge, based on data observations. On real and synthetic datasets, we show that our approach outperforms baselines in recovering ground truth causal graphs and identifying errors in the prior knowledge with limited interventional data.

Updated: 2024-10-18 12:53:23

标题: 学习如何精细化生物网络推断的领域知识

摘要: 扰动实验使生物学家能够发现感兴趣变量之间的因果关系,但这些数据的稀疏性和高维度给因果结构学习算法带来了重大挑战。生物知识图可以在这种情况下启动因果结构的推断,但由于它们编制了大量多样化的信息,可能会偏向于对研究充分的系统的预测。相反,摊销因果结构学习算法通过数据模拟编码归纳偏置,并训练监督模型以重现这些合成图。然而,实际模拟生物学可能比理解特定系统更困难。在这项工作中,我们从这两种策略中汲取灵感,提出了一种基于数据观察的摊销算法,用于优化领域知识。在真实和合成数据集上,我们展示了我们的方法在恢复真实因果图以及在干预数据有限的情况下识别先验知识中的错误方面优于基线。

更新时间: 2024-10-18 12:53:23

领域: q-bio.QM,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.14436v1

A Bioinformatic Approach Validated Utilizing Machine Learning Algorithms to Identify Relevant Biomarkers and Crucial Pathways in Gallbladder Cancer

Gallbladder cancer (GBC) is the most frequent cause of disease among biliary tract neoplasms. Identifying the molecular mechanisms and biomarkers linked to GBC progression has been a significant challenge in scientific research. Few recent studies have explored the roles of biomarkers in GBC. Our study aimed to identify biomarkers in GBC using machine learning (ML) and bioinformatics techniques. We compared GBC tumor samples with normal samples to identify differentially expressed genes (DEGs) from two microarray datasets (GSE100363, GSE139682) obtained from the NCBI GEO database. A total of 146 DEGs were found, with 39 up-regulated and 107 down-regulated genes. Functional enrichment analysis of these DEGs was performed using Gene Ontology (GO) terms and REACTOME pathways through DAVID. The protein-protein interaction network was constructed using the STRING database. To identify hub genes, we applied three ranking algorithms: Degree, MNC, and Closeness Centrality. The intersection of hub genes from these algorithms yielded 11 hub genes. Simultaneously, two feature selection methods (Pearson correlation and recursive feature elimination) were used to identify significant gene subsets. We then developed ML models using SVM and RF on the GSE100363 dataset, with validation on GSE139682, to determine the gene subset that best distinguishes GBC samples. The hub genes outperformed the other gene subsets. Finally, NTRK2, COL14A1, SCN4B, ATP1A2, SLC17A7, SLIT3, COL7A1, CLDN4, CLEC3B, ADCYAP1R1, and MFAP4 were identified as crucial genes, with SLIT3, COL7A1, and CLDN4 being strongly linked to GBC development and prediction.
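
The three hub-gene rankings are straightforward to reproduce on any graph; a sketch using networkx, with a built-in graph standing in for the STRING PPI network and MNC implemented from its usual definition:

    import networkx as nx

    G = nx.karate_club_graph()  # stand-in for the STRING PPI network

    def mnc(G, n):
        """Maximum Neighborhood Component: size of the largest connected
        component of the subgraph induced by n's neighbors."""
        sub = G.subgraph(G.neighbors(n))
        return max((len(c) for c in nx.connected_components(sub)), default=0)

    k = 5
    top = lambda scores: set(sorted(scores, key=scores.get, reverse=True)[:k])
    hubs = (top(dict(G.degree()))
            & top(nx.closeness_centrality(G))
            & top({n: mnc(G, n) for n in G}))
    print(hubs)  # nodes ranked in the top-k by all three algorithms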

Updated: 2024-10-18 12:51:19

标题: 一种利用机器学习算法验证的生物信息学方法,用于识别胆囊癌中相关生物标记和关键通路

摘要: 胆囊癌(GBC)是胆道肿瘤中疾病的最常见原因。在科学研究中,确定与GBC进展相关的分子机制和生物标志物一直是一个重大挑战。最近的一些研究探讨了生物标志物在GBC中的作用。我们的研究旨在利用机器学习(ML)和生物信息学技术在GBC中鉴定生物标志物。我们比较了GBC肿瘤样本与正常样本,从NCBI GEO数据库中获取了两个微阵列数据集(GSE100363,GSE139682)来鉴定不同表达的基因(DEGs)。共发现了146个DEGs,其中39个上调基因和107个下调基因。使用DAVID对这些DEGs进行了功能富集分析,使用Gene Ontology(GO)术语和REACTOME通路。使用STRING数据库构建了蛋白质相互作用网络。为了鉴定中心基因,我们应用了三个排名算法:度、MNC和接近中心性。这些算法的中心基因的交集得到了11个中心基因。同时,使用皮尔逊相关和递归特征消除两种特征选择方法来鉴定重要基因子集。然后,我们在GSE100363数据集上使用SVM和RF开发了ML模型,并在GSE139682上进行验证,以确定最能区分GBC样本的基因子集。中心基因的表现优于其他基因子集。最后,NTRK2,COL14A1,SCN4B,ATP1A2,SLC17A7,SLIT3,COL7A1,CLDN4,CLEC3B,ADCYAP1R1和MFAP4被确定为关键基因,其中SLIT3,COL7A1和CLDN4与GBC的发展和预测密切相关。

更新时间: 2024-10-18 12:51:19

领域: q-bio.GN,cs.LG

下载: http://arxiv.org/abs/2410.14433v1

Large Language Models, scientific knowledge and factuality: A framework to streamline human expert evaluation

The paper introduces a framework for the evaluation of the encoding of factual scientific knowledge, designed to streamline the manual evaluation process typically conducted by domain experts. Inferring over and extracting information from Large Language Models (LLMs) trained on a large corpus of scientific literature can potentially define a step change in biomedical discovery, reducing the barriers for accessing and integrating existing medical evidence. This work explores the potential of LLMs for dialoguing with biomedical background knowledge, using the context of antibiotic discovery. The framework consists of three evaluation steps, which sequentially assess different aspects of the generated responses: fluency, prompt alignment, semantic coherence, factual knowledge, and specificity. By splitting these tasks between non-experts and experts, the framework reduces the effort required from the latter. The work provides a systematic assessment of the ability of eleven state-of-the-art LLMs, including ChatGPT, GPT-4 and Llama 2, in two prompting-based tasks: chemical compound definition generation and chemical compound-fungus relation determination. Although recent models have improved in fluency, factual accuracy is still low and models are biased towards over-represented entities. The ability of LLMs to serve as biomedical knowledge bases is questioned, and the need for additional systematic evaluation frameworks is highlighted. While LLMs are currently not fit for purpose to be used as biomedical factual knowledge bases in a zero-shot setting, there is a promising emerging property in the direction of factuality as the models become domain specialised, scale up in size and level of human feedback.

Updated: 2024-10-18 12:49:35

标题: 大型语言模型、科学知识和事实性:简化人类专家评估的框架

摘要: 这篇论文介绍了一个评估事实科学知识编码的框架,旨在简化领域专家通常进行的手动评估过程。推测和提取大语言模型(LLMs)中训练的大量科学文献中的信息,可能定义了生物医学发现的一个重要改变,降低了访问和整合现有医学证据的障碍。该工作探讨了LLMs与生物医学背景知识对话的潜力,以抗生素发现的背景为例。该框架包括三个评估步骤,依次评估不同方面:流畅度、提示对齐、语义连贯性、事实知识和生成响应的特异性。通过在非专家和专家之间分配这些任务,该框架减少了专家所需的工作量。该工作对包括ChatGPT、GPT-4和Llama 2在内的十一种最先进的模型LLMs的能力进行了系统评估,涉及两个基于提示的任务:化合物定义生成和化合物-真菌关系确定。尽管最近的模型在流畅性方面有所改善,但事实准确性仍然较低,并且模型对过度表示的实体存在偏见。质疑了LLMs作为生物医学知识库的能力,并强调了对额外系统评估框架的需求。虽然LLMs目前还不适合作为零-shot设置中使用的生物医学事实知识库,但随着模型变得领域专业化、规模扩大和人类反馈水平提高,它们在事实性方面呈现出有前途的新特性。

更新时间: 2024-10-18 12:49:35

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2305.17819v3

FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models

Modeling and producing lifelike clothed human images has attracted researchers' attention from different areas for decades, owing to the complexity of highly articulated and structured content. Rendering algorithms decompose and simulate the imaging process of a camera, but are limited by the accuracy of modeled variables and the efficiency of computation. Generative models can produce impressively vivid human images, but still lack controllability and editability. This paper studies photorealism enhancement of rendered images, leveraging generative power from diffusion models on the controlled basis of rendering. We introduce a novel framework to translate rendered images into their realistic counterparts, which consists of two stages: Domain Knowledge Injection (DKI) and Realistic Image Generation (RIG). In DKI, we adopt positive (real) domain finetuning and negative (rendered) domain embedding to inject knowledge into a pretrained Text-to-image (T2I) diffusion model. In RIG, we generate the realistic image corresponding to the input rendered image, with a Texture-preserving Attention Control (TAC) to preserve fine-grained clothing textures, exploiting the decoupled features encoded in the UNet structure. Additionally, we introduce SynFashion dataset, featuring high-quality digital clothing images with diverse textures. Extensive experimental results demonstrate the superiority and effectiveness of our method in rendered-to-real image translation.

Updated: 2024-10-18 12:48:22

标题: FashionR2R:使用扩散模型进行保持纹理的从渲染到真实图像翻译

摘要: 对于几十年来吸引研究人员注意力的建模和生成逼真服装人类图像,其复杂性源于高度关节化和结构化内容。渲染算法分解和模拟相机成像过程,但受到建模变量准确性和计算效率的限制。生成模型可以生成令人印象深刻的逼真人类图像,但仍然缺乏可控性和可编辑性。本文研究了渲染图像的逼真增强,利用扩散模型的生成能力,在渲染的基础上进行控制。我们引入了一个新颖的框架,将渲染图像转换为其逼真对应物,包括两个阶段:领域知识注入(DKI)和逼真图像生成(RIG)。在DKI中,我们采用正(真实)领域微调和负(渲染)领域嵌入,将知识注入预训练的文本到图像(T2I)扩散模型中。在RIG中,我们生成与输入渲染图像对应的逼真图像,采用纹理保持注意力控制(TAC)来保留细粒度的服装纹理,利用UNet结构中编码的解耦特征。此外,我们引入了SynFashion数据集,其中包含具有多样纹理的高质量数字服装图像。大量实验结果证明了我们的方法在渲染到真实图像转换中的优越性和有效性。

更新时间: 2024-10-18 12:48:22

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.14429v1

Predicting time-varying flux and balance in metabolic systems using structured neural-ODE processes

We develop a novel data-driven framework as an alternative to dynamic flux balance analysis, bypassing the demand for deep domain knowledge and manual efforts to formulate the optimization problem. The proposed framework is end-to-end, which trains a structured neural ODE process (SNODEP) model to estimate flux and balance samples using gene-expression time-series data. SNODEP is designed to circumvent the limitations of the standard neural ODE process model, including restricting the latent and decoder sampling distributions to be normal and lacking structure between context points for calculating the latent, thus more suitable for modeling the underlying dynamics of a metabolic system. Through comprehensive experiments ($156$ in total), we demonstrate that SNODEP not only predicts the unseen time points of real-world gene-expression data and the flux and balance estimates well but can even generalize to more challenging unseen knockout configurations and irregular data sampling scenarios, all essential for metabolic pathway analysis. We hope our work can serve as a catalyst for building more scalable and powerful models for genome-scale metabolic analysis. Our code is available at: \url{https://github.com/TrustMLRG/SNODEP}.

Updated: 2024-10-18 12:41:41

标题: 使用结构化神经-ODE 过程预测代谢系统中的时间变化通量和平衡

摘要: 我们开发了一个新颖的数据驱动框架,作为动态通量平衡分析的一种替代方案,绕过对深度领域知识和手动努力来制定优化问题的需求。所提出的框架是端到端的,通过训练一个结构化神经ODE过程(SNODEP)模型,使用基因表达时间序列数据来估计通量和平衡样本。SNODEP的设计旨在规避标准神经ODE过程模型的限制,包括将潜在和解码器采样分布限制为正常分布,并且缺乏计算潜在值之间的结构,因此更适合对代谢系统的潜在动态进行建模。通过全面的实验(共156个),我们证明了SNODEP不仅可以很好地预测真实世界基因表达数据的未见时间点以及通量和平衡估计,甚至可以推广到更具挑战性的未见基因敲除配置和不规则数据采样情景,这对代谢途径分析至关重要。我们希望我们的工作可以作为建立更具可扩展性和强大性的基因组规模代谢分析模型的催化剂。我们的代码可以在以下网址找到:\url{https://github.com/TrustMLRG/SNODEP}。

更新时间: 2024-10-18 12:41:41

领域: cs.LG

下载: http://arxiv.org/abs/2410.14426v1

Plug-and-Play Posterior Sampling under Mismatched Measurement and Prior Models

Posterior sampling has been shown to be a powerful Bayesian approach for solving imaging inverse problems. The recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) has emerged as a promising method for Monte Carlo sampling and minimum mean squared error (MMSE) estimation by combining physical measurement models with deep-learning priors specified using image denoisers. However, the intricate relationship between the sampling distribution of PnP-ULA and the mismatched data-fidelity and denoiser has not been theoretically analyzed. We address this gap by proposing a posterior-L2 pseudometric and using it to quantify an explicit error bound for PnP-ULA under mismatched posterior distribution. We numerically validate our theory on several inverse problems such as sampling from Gaussian mixture models and image deblurring. Our results suggest that the sensitivity of the sampling distribution of PnP-ULA to a mismatch in the measurement model and the denoiser can be precisely characterized.
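
A one-dimensional toy version of the PnP-ULA iteration, using the exact MMSE denoiser of a Gaussian prior in place of a learned denoiser (the small bias this introduces is exactly the kind of denoiser mismatch the paper quantifies; all parameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy inverse problem: y = x + measurement noise, x ~ N(0, tau2).
    tau2, sig2, eps, delta = 1.0, 0.25, 0.05, 0.01
    x_true = rng.normal(0, np.sqrt(tau2))
    y = x_true + rng.normal(0, np.sqrt(sig2))

    def denoiser(x):
        # Exact MMSE denoiser for this Gaussian prior at noise level eps;
        # a learned CNN denoiser plays this role in PnP-ULA proper.
        return tau2 / (tau2 + eps) * x

    x, samples = 0.0, []
    for _ in range(20000):
        grad_lik = (y - x) / sig2              # gradient of log-likelihood
        prior_score = (denoiser(x) - x) / eps  # Tweedie-style score estimate
        x += delta * (grad_lik + prior_score) + np.sqrt(2 * delta) * rng.normal()
        samples.append(x)

    print(np.mean(samples[5000:]))   # MMSE estimate from the Markov chain
    print(y * tau2 / (tau2 + sig2))  # closed-form posterior mean to compare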

Updated: 2024-10-18 12:41:23

标题: 即插即用的后验抽样在测量和先验模型不匹配情况下

摘要: 后验抽样已被证明是解决成像逆问题的强大贝叶斯方法。最近出现的即插即用未调整朗之万算法(PnP-ULA)已成为通过将物理测量模型与使用图像去噪器指定的深度学习先验相结合,进行蒙特卡洛抽样和最小均方误差(MMSE)估计的一种有前途的方法。然而,PnP-ULA的抽样分布与不匹配的数据保真度和去噪器之间的复杂关系尚未得到理论分析。我们通过提出后验-L2伪度量并将其用于在不匹配的后验分布下量化PnP-ULA的显式误差界来填补这一空白。我们在几个逆问题上对我们的理论进行了数值验证,例如从高斯混合模型和图像去模糊中抽样。我们的结果表明,PnP-ULA的抽样分布对于测量模型和去噪器不匹配的敏感性可以被准确地表征。

更新时间: 2024-10-18 12:41:23

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2310.03546v2

Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation

Parameter-efficient fine-tuning (PEFT) can bridge the gap between large language models (LLMs) and downstream tasks. However, PEFT has been proven vulnerable to malicious attacks. Research indicates that poisoned LLMs, even after PEFT, retain the capability to activate internalized backdoors when input samples contain predefined triggers. In this paper, we introduce a novel weak-to-strong unlearning algorithm to defend against backdoor attacks based on feature alignment knowledge distillation, named W2SDefense. Specifically, we first train a small-scale language model through full-parameter fine-tuning to serve as the clean teacher model. Then, this teacher model guides the large-scale poisoned student model in unlearning the backdoor, leveraging PEFT. Theoretical analysis suggests that W2SDefense has the potential to enhance the student model's ability to unlearn backdoor features, preventing the activation of the backdoor. We conduct experiments on text classification tasks involving three state-of-the-art language models and three different backdoor attack algorithms. Our empirical results demonstrate the outstanding performance of W2SDefense in defending against backdoor attacks without compromising model performance.
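
A minimal sketch of a feature-alignment distillation term of the kind the abstract describes, assuming hidden features are compared through a trainable projection (the layer choice, dimensions, and weighting coefficient are our assumptions):

    import torch
    import torch.nn as nn

    mse = nn.MSELoss()

    # Hypothetical hidden features for the same clean batch.
    teacher_feat = torch.randn(4, 768)    # small clean teacher, frozen
    student_feat = torch.randn(4, 4096, requires_grad=True)
    proj = nn.Linear(4096, 768)           # trainable alignment head

    task_loss = torch.tensor(0.3)         # stand-in for the task CE loss
    align_loss = mse(proj(student_feat), teacher_feat.detach())
    loss = task_loss + 1.0 * align_loss   # pulls student features toward the
    loss.backward()                       # clean teacher, which is what drives
                                          # unlearning of backdoor features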

Updated: 2024-10-18 12:39:32

标题: 使用弱到强知识蒸馏来消除LLMs的后门攻击

摘要: Parameter-efficient fine-tuning(PEFT)可以弥合大型语言模型(LLMs)和下游任务之间的差距。然而,PEFT已被证明容易受到恶意攻击。研究表明,即使经过PEFT处理,毒害的LLMs在输入样本中包含预定义触发器时仍保留激活内部后门的能力。在本文中,我们介绍了一种新颖的弱到强反学习算法,用于基于特征对齐知识蒸馏防御后门攻击,命名为W2SDefense。具体而言,我们首先通过全参数微调训练一个小规模语言模型作为干净的教师模型。然后,这个教师模型引导大规模毒害的学生模型通过PEFT来反学习后门。理论分析表明,W2SDefense有潜力增强学生模型消除后门特征的能力,防止后门的激活。我们在涉及三种最先进的语言模型和三种不同后门攻击算法的文本分类任务上进行实验。我们的实证结果展示了W2SDefense在抵御后门攻击方面的出色表现,而不损害模型性能。

更新时间: 2024-10-18 12:39:32

领域: cs.CL,cs.AI,cs.CR

下载: http://arxiv.org/abs/2410.14425v1

Oblivious Monitoring for Discrete-Time STL via Fully Homomorphic Encryption

When monitoring a cyber-physical system (CPS) from a remote server, keeping the monitored data secret is crucial, particularly when they contain sensitive information, e.g., biological or location data. Recently, Banno et al. (CAV'22) proposed a protocol for online LTL monitoring that keeps data concealed from the server using Fully Homomorphic Encryption (FHE). We build on this protocol to allow arithmetic operations over encrypted values, e.g., to compute a safety measurement combining distance, velocity, and so forth. Overall, our protocol enables oblivious online monitoring of discrete-time real-valued signals against signal temporal logic (STL) formulas. Our protocol combines two FHE schemes, CKKS and TFHE, leveraging their respective strengths. We employ CKKS to evaluate arithmetic predicates in STL formulas while utilizing TFHE to process them using a DFA derived from the STL formula. We conducted case studies on monitoring blood glucose levels and vehicles' behavior against the Responsibility-Sensitive Safety (RSS) rules. Our results suggest the practical relevance of our protocol.
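
A plaintext analogue of the monitoring pipeline: an arithmetic predicate from the (discrete-time) STL formula is evaluated per sample and a DFA consumes the resulting boolean stream; in the protocol the predicate runs under CKKS and the DFA under TFHE, so the server learns neither signal nor verdict. The threshold and property below are illustrative:

    LIMIT = 180.0  # hypothetical glucose threshold (mg/dL)

    # DFA for "never two consecutive violations": state 2 is absorbing/bad.
    DELTA = {(0, True): 0, (0, False): 1,
             (1, True): 0, (1, False): 2,
             (2, True): 2, (2, False): 2}

    def monitor(signal):
        state = 0
        for value in signal:
            state = DELTA[(state, value < LIMIT)]
        return state != 2  # True: the property held on this prefix

    print(monitor([140, 150, 190, 160, 170]))  # True (isolated excursion)
    print(monitor([140, 190, 200, 160, 170]))  # False (two in a row)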

Updated: 2024-10-18 12:39:21

标题: 通过全同态加密实现离散时间STL的遗忘式监控

摘要: 在远程服务器监控网络物理系统(CPS)时,保持监控数据的保密性至关重要,特别是当这些数据包含敏感信息,例如生物或位置数据。最近,Banno等人(CAV'22)提出了一种在线LTL监控协议,使用全同态加密(FHE)将数据隐藏在服务器中。我们在此协议的基础上进行改进,允许对加密数值进行算术运算,例如计算结合距离、速度等的安全测量。总体而言,我们的协议实现了对离散时间实值信号与信号时间逻辑(STL)公式的遗忘式在线监控。我们的协议结合了两种FHE方案,CKKS和TFHE,充分利用它们各自的优势。我们使用CKKS来评估STL公式中的算术谓词,同时利用TFHE来使用从STL公式导出的DFA来处理这些谓词。我们对监控血糖水平和车辆行为与“责任敏感安全”(RSS)规则的案例进行了研究。我们的结果表明了我们的协议的实际相关性。

更新时间: 2024-10-18 12:39:21

领域: cs.CR,cs.FL

下载: http://arxiv.org/abs/2405.16767v3

Integrating Deep Learning with Fundus and Optical Coherence Tomography for Cardiovascular Disease Prediction

Early identification of patients at risk of cardiovascular diseases (CVD) is crucial for effective preventive care, reducing healthcare burden, and improving patients' quality of life. This study demonstrates the potential of retinal optical coherence tomography (OCT) imaging combined with fundus photographs for identifying future adverse cardiac events. We used data from 977 patients who experienced CVD within a 5-year interval post-image acquisition, alongside 1,877 control participants without CVD, totaling 2,854 subjects. We propose a novel binary classification network based on a Multi-channel Variational Autoencoder (MCVAE), which learns a latent embedding of patients' fundus and OCT images to classify individuals into two groups: those likely to develop CVD in the future and those who are not. Our model, trained on both imaging modalities, achieved promising results (AUROC 0.78 +/- 0.02, accuracy 0.68 +/- 0.002, precision 0.74 +/- 0.02, sensitivity 0.73 +/- 0.02, and specificity 0.68 +/- 0.01), demonstrating its efficacy in identifying patients at risk of future CVD events based on their retinal images. This study highlights the potential of retinal OCT imaging and fundus photographs as cost-effective, non-invasive alternatives for predicting cardiovascular disease risk. The widespread availability of these imaging techniques in optometry practices and hospitals further enhances their potential for large-scale CVD risk screening. Our findings contribute to the development of standardized, accessible methods for early CVD risk identification, potentially improving preventive care strategies and patient outcomes.

Updated: 2024-10-18 12:37:51

标题: 将深度学习与眼底和光学相干断层扫描结合,用于心血管疾病预测

摘要: 早期识别患有心血管疾病(CVD)风险的患者对于有效的预防护理、减少医疗负担和改善患者生活质量至关重要。本研究展示了视网膜光学相干断层扫描(OCT)成像与眼底照片相结合用于识别未来不良心脏事件的潜力。我们使用了977名在图像获取后5年内经历CVD的患者的数据,以及1,877名无CVD的对照参与者,共计2,854名受试者。我们提出了一种基于多通道变分自动编码器(MCVAE)的新型二分类网络,该网络学习患者眼底和OCT图像的潜在嵌入,将个体分类为两组:未来可能发展心血管疾病的患者和不会发展的患者。我们的模型在两种成像模式上训练,并取得了令人满意的结果(AUROC 0.78 +/- 0.02,准确度0.68 +/- 0.002,精确度0.74 +/- 0.02,敏感度0.73 +/- 0.02和特异性0.68 +/- 0.01),表明其在基于视网膜图像识别未来CVD事件风险的患者方面的有效性。该研究突显了视网膜OCT成像和眼底照片作为预测心血管疾病风险的经济、无创替代方案的潜力。这些成像技术在验光和医院中的广泛可用性进一步增强了它们在大规模CVD风险筛查中的潜力。我们的发现有助于制定标准化、可访问的早期CVD风险识别方法,潜在改善预防护理策略和患者结果。

更新时间: 2024-10-18 12:37:51

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.14423v1

Asymptotic non-linear shrinkage formulas for weighted sample covariance

We compute asymptotic non-linear shrinkage formulas for covariance and precision matrix estimators for weighted sample covariances, in the spirit of Ledoit and P\'ech\'e. We detail explicitly the formulas for exponentially-weighted sample covariances. These new tools pave the way for applying non-linear shrinkage methods to weighted sample covariances. We show experimentally the performance of the asymptotic shrinkage formulas. Finally, we test the robustness of the theory to heavy-tailed distributions.
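
For intuition, the snippet below computes the exponentially-weighted sample covariance whose asymptotic spectrum these formulas describe. The decay rate is an arbitrary choice and the data are assumed centered.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.standard_normal((n, p))         # n centered samples in dimension p

alpha = 0.01
w = (1 - alpha) ** np.arange(n)[::-1]   # most recent sample gets weight ~1
w /= w.sum()                            # normalize the weights to sum to 1

S = (X * w[:, None]).T @ X              # weighted sample covariance, S = sum_i w_i x_i x_i^T
print(S.shape, np.linalg.eigvalsh(S)[-3:])  # (50, 50) and its top eigenvalues
```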

Updated: 2024-10-18 12:33:10

标题: 加权样本协方差的渐近非线性收缩公式

摘要: 我们计算了加权样本协方差的协方差和精度矩阵估计的渐近非线性收缩公式,这是在Ledoit和P\'ech\'e的精神中进行的。我们明确详细地说明了指数加权样本协方差的公式。这些新工具为在加权样本协方差上应用非线性收缩方法打开了道路。我们实验性地展示了渐近缩减公式的性能。最后,我们测试了理论对重尾分布的稳健性。

更新时间: 2024-10-18 12:33:10

领域: math.ST,cs.LG,math.PR,stat.AP,stat.ML,stat.TH

下载: http://arxiv.org/abs/2410.14420v1

An explainable machine learning approach for energy forecasting at the household level

Electricity forecasting has been a recurring research topic, as it is key to finding the right balance between production and consumption. While most papers focus on the national or regional scale, few address the household level. Disaggregated forecasting is a common topic in the Machine Learning (ML) literature but lacks the explainability that household energy forecasts require. This paper specifically targets the challenges of forecasting electricity use at the household level. It confronts common Machine Learning algorithms with household electricity forecasting, weighing their pros and cons, including accuracy and explainability, using well-known key metrics. Furthermore, we also confront them with the business challenges specific to this sector, such as explainability or resistance to outliers. We introduce a custom decision tree, aiming to provide a fair estimate of the energy consumption while being explainable and consistent with human intuition. We show that this novel method allows greater explainability without sacrificing much accuracy. The custom tree methodology can be used in various business use cases but is subject to limitations, such as a lack of resilience to outliers.
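
As a baseline illustration only (the paper's custom tree changes the splitting logic, which is not reproduced here), a stock regression tree can be fit on lagged consumption features:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic hourly household load: a daily cycle plus noise (60 days).
rng = np.random.default_rng(0)
hours = np.arange(24 * 60)
load = 1.0 + 0.5 * np.sin(2 * np.pi * hours / 24) + 0.1 * rng.standard_normal(hours.size)

lags = 24
X = np.column_stack([load[i:i - lags] for i in range(lags)])  # previous 24 hours
y = load[lags:]                                               # next-hour target

model = DecisionTreeRegressor(max_depth=4).fit(X[:-100], y[:-100])
print("test MAE:", np.abs(model.predict(X[-100:]) - y[-100:]).mean())
```

A shallow depth keeps the tree small enough to print and inspect, which is the explainability lever the custom tree pushes further.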

Updated: 2024-10-18 12:29:10

标题: 一个可解释的机器学习方法用于家庭能量预测

摘要: 电力预测是一个经常出现的研究课题,因为它对于找到生产和消费之间的正确平衡至关重要。虽然大多数论文都集中在国家或地区范围内,但很少有人对家庭层面感兴趣。细分预测是机器学习文献中常见的主题,但缺乏家庭能源预测所需的可解释性。本文特别针对家庭层面的电力使用预测挑战。本文将常见的机器学习算法与家庭电力预测进行对比,权衡利弊,包括准确性和可解释性与众所周知的关键指标。此外,我们还将它们与该领域特有的商业挑战对比,如可解释性或异常值抵抗力。我们引入了一个定制决策树,旨在提供对能源消耗的公正估计,同时具有可解释性并与人类直觉一致。我们展示了这种新颖方法可以在不牺牲太多准确性的情况下提供更大的可解释性。定制决策树方法可以用于各种业务用例,但存在一些限制,如对异常值的抵抗力不足。

更新时间: 2024-10-18 12:29:10

领域: cs.LG,cs.AI,cs.CY

下载: http://arxiv.org/abs/2410.14416v1

Multi-LLM QA with Embodied Exploration

Large language models (LLMs) have grown in popularity due to their natural language interface and pre-trained knowledge, leading to rapidly increasing success in question-answering (QA) tasks. More recently, multi-agent systems with LLM-based agents (Multi-LLM) have increasingly been utilized for QA. In these scenarios, the models may each answer the question and reach a consensus, or each model may be specialized to answer questions from different domains. However, most prior work dealing with Multi-LLM QA has focused on scenarios where the models are asked in a zero-shot manner or are given information sources from which to extract the answer. For question answering in an unknown environment, embodied exploration of the environment is first needed to answer the question. This skill is necessary for personalizing embodied AI to environments such as households. There is a lack of insight into whether a Multi-LLM system can handle question-answering based on observations from embodied exploration. In this work, we address this gap by investigating the use of Multi-Embodied LLM Explorers (MELE) for QA in an unknown environment. Multiple LLM-based agents independently explore and then answer queries about a household environment. We analyze different aggregation methods to generate a single, final answer for each query: debating, majority voting, and training a central answer module (CAM). Using CAM, we observe a $46\%$ higher accuracy compared against the other non-learning-based aggregation methods. We provide code and the query dataset for further research.
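
Of the three aggregation methods, majority voting is the simplest to make concrete. The sketch below is a minimal, hypothetical implementation of that baseline; the answer normalization is an illustrative choice.

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most common (normalized) answer; ties break by first occurrence."""
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][0]

agent_answers = [
    "The keys are on the kitchen table.",
    "the keys are on the kitchen table.",
    "They are in the hallway drawer.",
]
print(majority_vote(agent_answers))  # 'the keys are on the kitchen table.'
```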

Updated: 2024-10-18 12:27:07

标题: 具有体验式探索的多LLM问答

摘要: 大语言模型(LLMs)因其自然语言接口和预先训练的知识而变得越来越受欢迎,导致在问答(QA)任务中取得了迅速增长的成功。最近,基于LLM的多Agent系统(Multi-LLM)在QA中被越来越多地利用。在这些场景中,模型可以各自回答问题并达成共识,或者每个模型专门回答不同领域的问题。然而,大多数先前处理Multi-LLM QA的工作都集中在模型以零样本方式被问及或者给定信息源以提取答案的情况。对于未知环境的问答,首先需要对环境进行体验探索才能回答问题。这种技能对于个性化地将体验式人工智能应用于家庭等环境是必要的。目前缺乏对于Multi-LLM系统是否能够根据体验探索的观察进行问答的深入了解。在这项工作中,我们通过研究在未知环境中使用Multi-Embodied LLM Explorers(MELE)进行QA来填补这一空白。多个基于LLM的Agent独立进行探索,然后回答有关家庭环境的查询。我们分析不同的聚合方法来为每个查询生成一个单一的最终答案:辩论、多数投票和训练中央答案模块(CAM)。使用CAM,我们观察到与其他非学习型聚合方法相比,准确率提高了46%。我们提供代码和查询数据集以供进一步研究。

更新时间: 2024-10-18 12:27:07

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.10918v5

WeSpeR: Population spectrum retrieval and spectral density estimation of weighted sample covariance

The spectrum of the weighted sample covariance shows an asymptotic non-random behavior when the dimension grows with the number of samples. In this setting, we prove that the asymptotic spectral distribution $F$ of the weighted sample covariance has a continuous density on $\mathbb{R}^*$. We then address the practical problem of numerically finding this density. We propose a procedure to compute it, to determine the support of $F$, and to define an efficient grid on it. We use this procedure to design the $\textit{WeSpeR}$ algorithm, which estimates the spectral density and retrieves the true population covariance spectrum. Empirical tests confirm the good properties of the $\textit{WeSpeR}$ algorithm.

Updated: 2024-10-18 12:26:51

标题: WeSpeR: 加权样本协方差的人口谱检索和谱密度估计

摘要: 加权样本协方差的谱显示出在样本数增加时维度增长时的渐近非随机行为。在这种情况下,我们证明了加权样本协方差的渐近谱分布$F$在$\mathbb{R}^*$上具有连续密度。然后我们解决了在数值上找到这个密度的实际问题。我们提出了一个计算该密度、确定$F$的支持并在其上定义高效网格的过程。我们使用这个过程设计了$\textit{WeSpeR}$算法,该算法估计谱密度并检索真实的谱协方差谱。经验测试证实了$\textit{WeSpeR}$算法的良好性能。

更新时间: 2024-10-18 12:26:51

领域: math.ST,cs.LG,math.PR,stat.CO,stat.ML,stat.TH

下载: http://arxiv.org/abs/2410.14413v1

Learning Social Cost Functions for Human-Aware Path Planning

Achieving social acceptance is one of the main goals of Social Robotic Navigation. Although this topic has received increasing interest in recent years, most of the research has focused on driving the robotic agent along obstacle-free trajectories, planning around estimates of future human motion to respect personal distances and optimize navigation. However, social interactions in everyday life are also dictated by norms that do not strictly depend on movement, such as standing at the end of a queue rather than cutting it. In this paper, we propose a novel method to recognize common social scenarios and modify a traditional planner's cost function to adapt to them. This solution enables the robot to carry out different social navigation behaviors that would not arise otherwise, while maintaining the robustness of traditional navigation. Our approach allows the robot to learn different social norms with a single learned model, rather than having a separate module for each task. As a proof of concept, we consider the tasks of queuing and respecting the interaction spaces of groups of people talking to one another, but the method can be extended to other human activities that do not involve motion.

Updated: 2024-10-18 12:25:46

标题: 学习人类感知路径规划的社会成本函数

摘要: 实现社会接受是社交机器人导航的主要目标之一。尽管近年来这个话题受到越来越多的关注,但大多数研究集中在沿着无障碍轨迹驾驶机器人代理,规划关于未来人类运动估计以尊重个人距离并优化导航。然而,日常生活中的社交互动也受到不严格依赖运动的规范的影响,比如站在队伍的末尾而不是插队。在本文中,我们提出了一种新颖的方法,识别常见的社交场景,并修改传统规划器的成本函数以适应它们。这种解决方案使机器人能够执行不同的社交导航行为,否则不会出现,同时保持传统导航的稳健性。我们的方法允许机器人通过单一学习模型学习不同的社交规范,而不是为每个任务都有不同的模块。作为概念验证,我们考虑排队和尊重相互交谈的人群的交互空间的任务,但该方法可以扩展到不涉及运动的其他人类活动。

更新时间: 2024-10-18 12:25:46

领域: cs.RO,cs.AI,cs.HC,cs.LG

下载: http://arxiv.org/abs/2407.10547v2

SNAC: Multi-Scale Neural Audio Codec

Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual Vector Quantization (RVQ) has become the standard technique for neural audio compression using a cascade of VQ codebooks. This paper proposes the Multi-Scale Neural Audio Codec, a simple extension of RVQ where the quantizers can operate at different temporal resolutions. By applying a hierarchy of quantizers at variable frame rates, the codec adapts to the audio structure across multiple timescales. This leads to more efficient compression, as demonstrated by extensive objective and subjective evaluations. The code and model weights are open-sourced at https://github.com/hubertsiuzdak/snac.
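
A toy numpy sketch of the multi-scale residual quantization idea follows. The codebooks are random and the mean-pool/repeat resampling is a crude stand-in for the codec's learned components; strides and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(frames, codebook):
    """Nearest-codeword assignment per frame; returns the reconstruction."""
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook[d2.argmin(axis=1)]

T, dim = 64, 8                        # latent frames and channels
z = rng.standard_normal((T, dim))     # encoder output to be compressed
residual, recon = z.copy(), np.zeros_like(z)

for stride in (4, 2, 1):              # coarse -> fine temporal resolutions
    codebook = rng.standard_normal((16, dim))
    coarse = residual.reshape(T // stride, stride, dim).mean(axis=1)  # downsample
    up = np.repeat(quantize(coarse, codebook), stride, axis=0)        # upsample
    recon += up
    residual -= up

print("relative reconstruction error:", np.linalg.norm(residual) / np.linalg.norm(z))
```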

Updated: 2024-10-18 12:24:05

标题: SNAC:多尺度神经音频编解码器

摘要: 神经音频编解码器最近因其能够以非常低的比特率高保真度地表示音频信号而变得流行,这使得使用语言建模方法进行音频生成和理解成为可能。残差向量量化(RVQ)已成为使用一系列VQ码书进行神经音频压缩的标准技术。本文提出了多尺度神经音频编解码器,这是对RVQ的简单扩展,其中量化器可以在不同的时间分辨率下运行。通过在可变帧率下应用一系列量化器的层次结构,编解码器可以适应跨多个时间尺度的音频结构。这导致更有效的压缩,通过广泛的客观和主观评估加以证明。代码和模型权重可在https://github.com/hubertsiuzdak/snac 上公开获取。

更新时间: 2024-10-18 12:24:05

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2410.14411v1

An algorithm for clustering with confidence-based must-link and cannot-link constraints

We study here the semi-supervised $k$-clustering problem where information is available on whether pairs of objects are in the same or in different clusters. This information is either available with certainty or with a limited level of confidence. We introduce the PCCC (Pairwise-Confidence-Constraints-Clustering) algorithm, which iteratively assigns objects to clusters while accounting for the information provided on the pairs of objects. Our algorithm uses integer programming for the assignment of objects which allows to include relationships as hard constraints that are guaranteed to be satisfied or as soft constraints that can be violated subject to a penalty. This flexibility distinguishes our algorithm from the state-of-the-art in which all pairwise constraints are either considered hard, or all are considered soft. We developed an enhanced multi-start approach and a model-size reduction technique for the integer program that contributes to the effectiveness and the efficiency of the algorithm. Unlike existing algorithms, our algorithm scales to large-scale instances with up to 60,000 objects, 100 clusters, and millions of cannot-link constraints (which are the most challenging constraints to incorporate). We compare the PCCC algorithm with state-of-the-art approaches in an extensive computational study. Even though the PCCC algorithm is more general than the state-of-the-art approaches in its applicability, it outperforms the state-of-the-art approaches on instances with all hard or all soft constraints both in terms of runtime and various metrics of solution quality. The code of the PCCC algorithm is publicly available on GitHub.
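
The brute-force toy below illustrates the objective the integer program optimizes: assignment cost to fixed centers, plus hard must-link constraints that are guaranteed to hold and a confidence-weighted soft cannot-link penalty. All data and penalty values are invented for the example; the real algorithm solves this with integer programming rather than enumeration.

```python
import itertools
import numpy as np

points = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])

hard_must = [(0, 1)]           # objects 0 and 1 must share a cluster
soft_cannot = [(1, 2, 0.9)]    # objects 1 and 2 should differ, confidence 0.9

best, best_cost = None, np.inf
for assign in itertools.product(range(len(centers)), repeat=len(points)):
    if any(assign[i] != assign[j] for i, j in hard_must):
        continue  # hard constraints can never be violated
    cost = sum(np.linalg.norm(points[i] - centers[assign[i]])
               for i in range(len(points)))
    cost += sum(conf for i, j, conf in soft_cannot if assign[i] == assign[j])
    if cost < best_cost:
        best, best_cost = assign, cost

print(best, round(best_cost, 3))  # e.g. (0, 0, 1, 1) with no penalty paid
```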

Updated: 2024-10-18 12:20:54

标题: 一种基于置信度的必连接和禁连接约束的聚类算法

摘要: 我们在这里研究了半监督$k$-聚类问题,其中提供了关于对象是否属于相同或不同簇的信息。这些信息可以确定地提供,也可以具有有限的置信水平。我们引入了PCCC(Pairwise-Confidence-Constraints-Clustering)算法,该算法在分配对象到簇时迭代考虑了提供的对象对信息。我们的算法使用整数规划来分配对象,从而可以将关系作为硬约束包含进来,这些硬约束是保证满足的,或者作为软约束,可以在惩罚的条件下违反。这种灵活性使我们的算法与现有技术有所不同,因为所有的成对约束要么被视为硬约束,要么被视为软约束。我们开发了一个增强的多起点方法和一个模型大小缩减技术,用于整数规划问题,这有助于算法的效果和效率。与现有算法不同的是,我们的算法适用于具有高达60,000个对象,100个簇和数百万个不能连接约束(这是最具挑战性的约束之一)的大规模实例。我们在广泛的计算研究中将PCCC算法与最先进的方法进行了比较。尽管PCCC算法在适用性上比最先进的方法更广泛,但在所有硬约束或所有软约束实例中,无论是在运行时间还是解决方案质量的各种度量上,它都优于最先进的方法。PCCC算法的代码已在GitHub上公开可用。

更新时间: 2024-10-18 12:20:54

领域: cs.LG

下载: http://arxiv.org/abs/2212.14437v3

Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation

Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs, all featuring sub-quadratic complexity in sequence length and excellent scaling properties, enabling the construction of a new type of foundation models. In this paper, we present a unified view of these models, formulating such layers as implicit causal self-attention layers. The formulation includes most of their sub-components and is not limited to a specific part of the architecture. The framework compares the underlying mechanisms on similar grounds for different layers and provides a direct means for applying explainability methods. Our experiments show that our attention matrices and attribution method outperform an alternative, more limited formulation that was recently proposed for Mamba. For the other architectures, for which our method is the first to provide such a view, our method is effective and competitive in the relevant metrics compared to the results obtained by state-of-the-art Transformer explainability methods. Our code is publicly available.
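
For a scalar gated linear recurrence, the implicit attention matrix can be materialized directly; the sketch below is a simplified, hypothetical instance (not the paper's full formulation) and checks the equivalence numerically.

```python
import numpy as np

# Recurrence: h_t = a_t * h_{t-1} + b_t * x_t,  y_t = c_t * h_t.
# Unrolled: y_t = sum_{s<=t} A[t, s] * x_s with
#           A[t, s] = c_t * (prod_{r=s+1..t} a_r) * b_s.
rng = np.random.default_rng(0)
T = 6
a = rng.uniform(0.5, 1.0, T)   # forget gates
b = rng.standard_normal(T)     # input projections
c = rng.standard_normal(T)     # output projections
x = rng.standard_normal(T)

A = np.zeros((T, T))           # implicit causal attention matrix
for t in range(T):
    for s in range(t + 1):
        A[t, s] = c[t] * np.prod(a[s + 1:t + 1]) * b[s]

h, y_rec = 0.0, np.zeros(T)    # plain recurrence for comparison
for t in range(T):
    h = a[t] * h + b[t] * x[t]
    y_rec[t] = c[t] * h

print(np.allclose(A @ x, y_rec))  # True: the layer acts as causal attention
```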

Updated: 2024-10-18 12:20:11

标题: 通过统一的隐式注意力公式解释现代门控线性循环神经网络

摘要: 最近在高效序列建模方面取得的进展,已经出现了无需注意力的层,例如Mamba、RWKV和各种门控RNN,所有这些特征都具有子二次复杂度和优秀的扩展属性,使得可以构建一种新型的基础模型。在本文中,我们提出了这些模型的统一观点,将这些层形式化为隐式因果自注意力层。该形式化包括了它们的大部分子组件,并不限于架构的特定部分。该框架在相似的基础上比较了不同层的基本机制,并提供了一种直接的方法来应用可解释性方法。我们的实验表明,我们的注意力矩阵和归因方法优于最近为Mamba提出的另一种更为有限的形式化。对于其他体系结构,我们的方法是首次提供这种视角,我们的方法在相关指标上是有效的和具有竞争力的,与现有的Transformer可解释性方法获得的结果相比。我们的代码是公开可用的。

更新时间: 2024-10-18 12:20:11

领域: cs.LG,F.2.2; I.2.7

下载: http://arxiv.org/abs/2405.16504v2

MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction

Molecular property prediction (MPP) is a fundamental and crucial task in drug discovery. However, prior methods are limited by the requirement for a large number of labeled molecules and their restricted ability to generalize to unseen and new tasks, both of which are essential for real-world applications. To address these challenges, we present MolecularGPT for few-shot MPP. From an instruction-tuning perspective, we fine-tune large language models (LLMs) based on curated molecular instructions spanning over 1000 property prediction tasks. This enables building a versatile and specialized LLM that can be adapted to novel MPP tasks without any fine-tuning through zero- and few-shot in-context learning (ICL). MolecularGPT exhibits competitive in-context reasoning capabilities across 10 downstream evaluation datasets, setting new benchmarks for few-shot molecular prediction tasks. More importantly, with just two-shot examples, MolecularGPT can outperform standard supervised graph neural network methods on 4 out of 7 datasets. It also surpasses state-of-the-art LLM baselines, with up to a 15.7% increase in classification accuracy and a 17.9-point decrease in regression metrics (e.g., RMSE) in the zero-shot setting. This study demonstrates the potential of LLMs as effective few-shot molecular property predictors. The code is available at https://github.com/NYUSHCS/MolecularGPT.

Updated: 2024-10-18 12:19:41

标题: 分子GPT:面向少样本分子性质预测的开放大型语言模型(LLM)

摘要: 分子性质预测(MPP)是药物发现中的一项基础和关键任务。然而,先前的方法受限于对大量标记分子的要求,以及它们在未见过的和新任务上的泛化能力受限,这两者对于实际应用至关重要。为了解决这些挑战,我们提出了用于少样本MPP的MolecularGPT。从指导调整的角度出发,我们基于涵盖1000多个性质预测任务的精心策划的分子指令,对大型语言模型(LLMs)进行微调。这使得构建出一个多才多艺且专门的LLM,可以在零样本和少样本的情况下进行上下文学习,并适应新的MPP任务。MolecularGPT在10个下游评估数据集上展现出有竞争力的上下文推理能力,为少样本分子预测任务设立了新的基准。更重要的是,仅通过两个样本示例,MolecularGPT在7个数据集中的4个数据集上可以胜过标准监督图神经网络方法。在零样本情况下,它还能将最先进的LLM基线提高高达15.7%的分类准确率,并将回归指标(如RMSE)降低17.9。这项研究展示了LLMs作为有效的少样本分子性质预测器的潜力。源代码可在https://github.com/NYUSHCS/MolecularGPT 上找到。

更新时间: 2024-10-18 12:19:41

领域: q-bio.QM,cs.AI,cs.CE,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.12950v2

Timeseria: an object-oriented time series processing library

Timeseria is an object-oriented time series processing library implemented in Python, which aims at making it easier to manipulate time series data and to build statistical and machine learning models on top of it. Unlike common data analysis frameworks, it builds up from well-defined and reusable logical units (objects), which can be easily combined together in order to ensure a high level of consistency. Thanks to this approach, Timeseria can address by design several non-trivial and often underestimated issues, such as handling data losses, non-uniform sampling rates, differences between aggregated data and punctual observations, time zones, daylight saving times, and more. Timeseria comes with a comprehensive set of base data structures, common data manipulation operations, and extensible models for data reconstruction, forecasting and anomaly detection. It also integrates a powerful plotting engine capable of handling even millions of data points.

Updated: 2024-10-18 12:18:01

标题: Timeseria:一个面向对象的时间序列处理库

摘要: Timeseria是一个基于Python实现的面向对象的时间序列处理库,旨在使操作时间序列数据和构建统计和机器学习模型变得更加容易。与常见的数据分析框架不同,它从明确定义和可重用的逻辑单元(对象)构建起来,这些单元可以很容易地组合在一起,以确保高水平的一致性。由于这种方法,Timeseria可以设计上解决一些常常被低估的非常规问题,如处理数据丢失、非均匀采样率、聚合数据和点观测之间的差异、时区、夏令时等。Timeseria配备了一套全面的基本数据结构、常见数据操作以及可扩展的数据重建、预测和异常检测模型。它还集成了一个强大的绘图引擎,能够处理甚至数百万个数据点。

更新时间: 2024-10-18 12:18:01

领域: cs.LG

下载: http://arxiv.org/abs/2410.09567v2

Design and Prototype of a Unified Framework for Error-robust Compression and Encryption in IoT

The Internet of Things (IoT) relies on resource-constrained devices for data acquisition, but the vast amount of data generated and security concerns present challenges for efficient data handling and confidentiality. Conventional techniques for data compression and secrecy often lack energy efficiency for these devices. Compressive sensing has the potential to compress data and maintain secrecy, but many solutions do not address the issue of packet loss or errors caused by unreliable wireless channels. To address these issues, we have developed the ENCRUST scheme, which combines compression, secrecy, and error recovery. In this paper, we present a prototype of ENCRUST that uses energy-efficient operations, as well as a lighter variant called L-ENCRUST. We also perform security analysis and compare the performance of ENCRUST and L-ENCRUST with a state-of-the-art solution in terms of memory, encryption time, and energy consumption on a resource-constrained TelosB mote. Our results show that both ENCRUST and L-ENCRUST outperform the state-of-the-art solution in these metrics.
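
The compressive-sensing primitive such schemes build on can be sketched as follows: the seed of the random sensing matrix plays the role of a shared secret, so the measurements double as a (weak) ciphertext. This is only the conceptual core; ENCRUST's actual construction, including its error recovery, is more elaborate.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(seed=42)    # the seed acts as the shared key
n, m, k = 256, 80, 5                    # signal length, measurements, sparsity

x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)  # k-sparse signal

Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # secret sensing matrix
y = Phi @ x                                      # compressed measurement

# A receiver who knows the seed regenerates Phi and recovers x sparsely.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(Phi, y)
print("recovery error:", np.linalg.norm(omp.coef_ - x))
```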

Updated: 2024-10-18 12:00:06

标题: 物联网中误差鲁棒压缩和加密统一框架的设计与原型

摘要: 物联网依赖于资源受限的设备进行数据采集,但产生的大量数据和安全问题给有效数据处理和保密性带来挑战。传统的数据压缩和保密技术通常对这些设备缺乏能效性。压缩感知有潜力压缩数据并保持保密性,但许多解决方案未解决由不可靠无线通道引起的丢包或错误的问题。为解决这些问题,我们开发了ENCRUST方案,结合了压缩、保密和错误恢复。本文介绍了一个使用能效操作的ENCRUST原型,以及一个较轻的变体L-ENCRUST。我们还进行了安全分析,并在资源受限的TelosB节点上比较了ENCRUST和L-ENCRUST与最先进解决方案在内存、加密时间和能量消耗方面的性能。我们的结果表明,ENCRUST和L-ENCRUST在这些指标上均优于最先进的解决方案。

更新时间: 2024-10-18 12:00:06

领域: cs.CR

下载: http://arxiv.org/abs/2410.14396v1

Generative AI, Pragmatics, and Authenticity in Second Language Learning

There are obvious benefits to integrating generative AI (artificial intelligence) into language learning and teaching. Those include using AI as a language tutor, creating learning materials, or assessing learner output. However, due to how AI systems understand human language, based on a mathematical model using statistical probability, they lack the lived experience to be able to use language with the same social awareness as humans. Additionally, there are built-in linguistic and cultural biases based on their training data, which is mostly in English and predominantly from Western sources. Those facts limit AI suitability for some language learning interactions. Studies have clearly shown that systems such as ChatGPT often do not produce language that is pragmatically appropriate. The lack of linguistic and cultural authenticity has important implications for how AI is integrated into second language acquisition as well as in instruction targeting development of intercultural communication competence.

Updated: 2024-10-18 11:58:03

标题: 生成式人工智能、语用学和第二语言学习中的真实性

摘要: 将生成式人工智能(AI)整合到语言学习和教学中具有明显的好处。这些包括将AI用作语言导师,创建学习材料,或评估学习者的输出。然而,由于AI系统理解人类语言的方式是基于使用统计概率的数学模型,它们缺乏像人类一样具有社会意识的实际经验。此外,由于它们的训练数据主要是英语,且主要来自西方来源,因此存在内置的语言和文化偏见。这些事实限制了AI在某些语言学习交互中的适用性。研究清楚地表明,诸如ChatGPT之类的系统通常无法产生在语用上合适的语言。缺乏语言和文化的真实性对于将AI整合到第二语言习得以及旨在发展跨文化沟通能力的教学中具有重要意义。

更新时间: 2024-10-18 11:58:03

领域: cs.CL,cs.AI,cs.HC

下载: http://arxiv.org/abs/2410.14395v1

Debug Smarter, Not Harder: AI Agents for Error Resolution in Computational Notebooks

Computational notebooks became indispensable tools for research-related development, offering unprecedented interactivity and flexibility in the development process. However, these benefits come at the cost of reproducibility and an increased potential for bugs. With the rise of code-fluent Large Language Models empowered with agentic techniques, smart bug-fixing tools with a high level of autonomy have emerged. However, those tools are tuned for classical script programming and still struggle with non-linear computational notebooks. In this paper, we present an AI agent designed specifically for error resolution in a computational notebook. We have developed an agentic system capable of exploring a notebook environment by interacting with it -- similar to how a user would -- and integrated the system into the JetBrains service for collaborative data science called Datalore. We evaluate our approach against the pre-existing single-action solution by comparing costs and conducting a user study. Users rate the error resolution capabilities of the agentic system higher but experience difficulties with the UI. We share the results of the study and consider them valuable for further improving user-agent collaboration.

Updated: 2024-10-18 11:55:34

标题: 更聪明地进行调试:计算笔记本中错误解决的AI代理

摘要: 计算笔记本已成为研究相关开发中不可或缺的工具,提供了在开发过程中前所未有的交互性和灵活性。然而,这些好处是以可复现性和错误增加的潜力为代价的。随着具有主动技术的代码流畅大型语言模型的崛起,智能错误修复工具也应运而生,具有高度自治的水平。然而,这些工具针对经典脚本编程进行了调整,仍然在非线性计算笔记本上面临困难。在本文中,我们提出了一个专门设计用于计算笔记本中错误解决的人工智能代理。我们开发了一个具有主动能力的系统,能够通过与笔记本环境进行交互来探索它--类似于用户的操作方式--并将该系统集成到JetBrains的协作数据科学服务Datalore中。我们通过比较成本和进行用户研究来评估我们的方法与现有的单一操作解决方案。用户对主动系统的错误解决能力给予了更高评价,但在用户界面方面遇到了困难。我们分享了研究结果,并认为这些结果对进一步改进用户-代理协作非常有价值。

更新时间: 2024-10-18 11:55:34

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14393v1

Personalizing Low-Rank Bayesian Neural Networks Via Federated Learning

To support real-world decision-making, it is crucial for models to be well-calibrated, i.e., to assign reliable confidence estimates to their predictions. Uncertainty quantification is particularly important in personalized federated learning (PFL), as participating clients typically have small local datasets, making it difficult to unambiguously determine optimal model parameters. Bayesian PFL (BPFL) methods can potentially enhance calibration, but they often come with considerable computational and memory requirements due to the need to track the variances of all the individual model parameters. Furthermore, different clients may exhibit heterogeneous uncertainty levels owing to varying local dataset sizes and distributions. To address these challenges, we propose LR-BPFL, a novel BPFL method that learns a global deterministic model along with personalized low-rank Bayesian corrections. To tailor the local model to each client's inherent uncertainty level, LR-BPFL incorporates an adaptive rank selection mechanism. We evaluate LR-BPFL across a variety of datasets, demonstrating its advantages in terms of calibration, accuracy, as well as computational and memory requirements.
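
A minimal sketch of the low-rank correction idea, assuming a Gaussian posterior on one of the low-rank factors; the rank, sizes, and choice of which factor is stochastic are illustrative guesses, not the paper's exact parametrization.

```python
import torch
import torch.nn as nn

class LowRankBayesianLinear(nn.Module):
    """Shared deterministic weight W plus a personal low-rank Bayesian term U @ V."""
    def __init__(self, in_dim=64, out_dim=32, rank=4):
        super().__init__()
        self.W = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)   # global, deterministic
        self.U_mu = nn.Parameter(torch.zeros(out_dim, rank))         # personal posterior mean
        self.U_logvar = nn.Parameter(torch.full((out_dim, rank), -6.0))
        self.V = nn.Parameter(torch.zeros(rank, in_dim))

    def forward(self, x):
        eps = torch.randn_like(self.U_mu)
        U = self.U_mu + eps * torch.exp(0.5 * self.U_logvar)  # reparameterized sample
        return x @ (self.W + U @ self.V).T

layer = LowRankBayesianLinear()
print(layer(torch.randn(8, 64)).shape)  # torch.Size([8, 32])
```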

Updated: 2024-10-18 11:50:54

标题: 通过联邦学习个性化低秩贝叶斯神经网络

摘要: 为了支持现实世界的决策制定,模型的校准至关重要,即对其预测分配可靠的置信度估计。在个性化联邦学习(PFL)中,不确定性量化尤为重要,因为参与的客户通常拥有较小的本地数据集,很难明确确定最佳模型参数。贝叶斯PFL(BPFL)方法可以潜在增强校准性,但由于需要跟踪所有个体模型参数的方差,通常伴随着相当大的计算和内存需求。此外,不同客户可能因本地数据集大小和分布不同而表现出异质的不确定性水平。为了解决这些挑战,我们提出了LR-BPFL,一种新颖的BPFL方法,它学习一个全局确定性模型以及个性化的低秩贝叶斯修正。为了使本地模型适应每个客户固有的不确定性水平,LR-BPFL结合了自适应秩选择机制。我们在各种数据集上评估了LR-BPFL,展示了其在校准性、准确性以及计算和内存需求方面的优势。

更新时间: 2024-10-18 11:50:54

领域: cs.LG

下载: http://arxiv.org/abs/2410.14390v1

SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery

Model merging-based multitask learning (MTL) offers a promising approach for performing MTL by merging multiple expert models without requiring access to raw training data. However, in this paper, we examine the merged model's representation distribution and uncover a critical issue of "representation bias". This bias arises from a significant distribution gap between the representations of the merged and expert models, leading to the suboptimal performance of the merged MTL model. To address this challenge, we first propose a representation surgery solution called Surgery. Surgery is a lightweight, task-specific module that aligns the final layer representations of the merged model with those of the expert models, effectively alleviating bias and improving the merged model's performance. Despite these improvements, a performance gap remains compared to the traditional MTL method. Further analysis reveals that representation bias phenomena exist at each layer of the merged model, and aligning representations only in the last layer is insufficient for fully reducing systemic bias because biases introduced at each layer can accumulate and interact in complex ways. To tackle this, we then propose a more comprehensive solution, deep representation surgery (also called SurgeryV2), which mitigates representation bias across all layers, and thus bridges the performance gap between model merging-based MTL and traditional MTL. Finally, we design an unsupervised optimization objective to optimize both the Surgery and SurgeryV2 modules. Our experimental results show that incorporating these modules into state-of-the-art (SOTA) model merging schemes leads to significant performance gains. Notably, our SurgeryV2 scheme reaches almost the same level as individual expert models or the traditional MTL model. The code is available at \url{https://github.com/EnnengYang/SurgeryV2}.

Updated: 2024-10-18 11:49:40

标题: 手术V2:通过深度表征手术弥合模型合并和多任务学习之间的差距

摘要: 基于模型合并的多任务学习(MTL)为通过合并多个专家模型而无需访问原始训练数据来执行MTL提供了一种有前景的方法。然而,在本文中,我们研究了合并模型的表示分布,并揭示了一个关键问题,“表示偏差”。这种偏差源于合并模型和专家模型之间表示的显著分布差异,导致合并的MTL模型性能不佳。为了解决这一挑战,我们首先提出了一种称为Surgery的表示手术解决方案。Surgery是一个轻量级的、任务特定的模块,它使合并模型的最终层表示与专家模型的表示对齐,有效减轻偏差并提高合并模型的性能。尽管有这些改进,与传统的MTL方法相比,性能仍存在差距。进一步的分析揭示了合并模型的每一层都存在表示偏差现象,仅在最后一层对齐表示是不够的,因为每一层引入的偏差可以积累并以复杂的方式相互作用。为了解决这个问题,我们随后提出了一个更全面的解决方案,深度表示手术(也称为SurgeryV2),它减轻了所有层的表示偏差,从而弥合了基于模型合并的MTL和传统MTL之间的性能差距。最后,我们设计了一个无监督优化目标来优化Surgery和SurgeryV2模块。我们的实验结果表明,将这些模块整合到最先进的模型合并方案中会带来显著的性能提升。值得注意的是,我们的SurgeryV2方案达到了几乎与个人专家模型或传统MTL模型相同的水平。代码可在\url{https://github.com/EnnengYang/SurgeryV2}上找到。

更新时间: 2024-10-18 11:49:40

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.14389v1

Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks

Convolutional neural networks (CNNs) are widely used in computer vision. They can be used not only to recognize patterns in conventional digital images, but also to extract features from 2-D representations of spectral and rhythm features derived from time-domain digital audio signals, enabling the acoustic classification of sounds. Different spectral and rhythm feature representations, such as mel-scaled spectrograms, mel-frequency cepstral coefficients (MFCCs), cyclic tempograms, short-time Fourier transform (STFT) chromagrams, constant-Q transform (CQT) chromagrams, and chroma energy normalized statistics (CENS) chromagrams, are investigated in terms of audio classification performance using a deep convolutional neural network. The results clearly show that the mel-scaled spectrograms and the mel-frequency cepstral coefficients (MFCCs) perform significantly better than the other spectral and rhythm features investigated in this research for audio classification tasks using deep CNNs. The experiments were carried out with the aid of the ESC-50 dataset with 2,000 labeled environmental audio recordings.
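
The two best-performing representations are straightforward to compute, for example with librosa (the bundled example clip is used here; the study's CNN consumes such 2-D arrays as image-like inputs):

```python
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))      # any mono signal works

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel)               # log-compressed, CNN-friendly

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

print(log_mel.shape, mfcc.shape)                 # (n_mels, frames), (n_mfcc, frames)
```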

Updated: 2024-10-18 11:47:40

标题: 深度卷积神经网络用于音频分类的频谱和节奏特征

摘要: 卷积神经网络(CNNs)在计算机视觉中被广泛使用。它们不仅可用于识别常规数字图像材料中的模式,还可用于从代表从时间域数字音频信号提取的光谱和节奏特征的数字图像中提取特征,以进行声音的声学分类。不同的光谱和节奏特征表示,如mel-scaled频谱图,mel频率倒谱系数(MFCCs),循环tempograms,短时傅里叶变换(STFT)色谱图,常量Q变换(CQT)色谱图和色谱能量归一化统计(CENS)色谱图,通过使用深度卷积神经网络在音频分类性能方面进行了研究。可以清楚地表明,与其他在本研究中研究的光谱和节奏特征相比,mel-scaled频谱图和mel频率倒谱系数(MFCCs)在使用深度CNN进行音频分类任务时表现显著更好。实验是借助包含2,000个标记的环境音频录音的ESC-50数据集进行的。

更新时间: 2024-10-18 11:47:40

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2410.06927v2

Unscrambling disease progression at scale: fast inference of event permutations with optimal transport

Disease progression models infer group-level temporal trajectories of change in patients' features as a chronic degenerative condition plays out. They provide unique insight into disease biology and staging systems with individual-level clinical utility. Discrete models consider disease progression as a latent permutation of events, where each event corresponds to a feature becoming measurably abnormal. However, permutation inference using traditional maximum likelihood approaches becomes prohibitive due to combinatoric explosion, severely limiting model dimensionality and utility. Here we leverage ideas from optimal transport to model disease progression as a latent permutation matrix of events belonging to the Birkhoff polytope, facilitating fast inference via optimisation of the variational lower bound. This enables a factor of 1000 times faster inference than the current state of the art and, correspondingly, supports models with several orders of magnitude more features than the current state of the art can consider. Experiments demonstrate the increase in speed, accuracy and robustness to noise in simulation. Further experiments with real-world imaging data from two separate datasets, one from Alzheimer's disease patients, the other age-related macular degeneration, showcase, for the first time, pixel-level disease progression events in the brain and eye, respectively. Our method is low compute, interpretable and applicable to any progressive condition and data modality, giving it broad potential clinical utility.
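
The key relaxation, a soft permutation in the Birkhoff polytope, can be sketched with plain Sinkhorn normalization (the paper's actual inference optimizes a variational lower bound, which is not shown here):

```python
import numpy as np

def sinkhorn(logits, n_iters=50):
    """Push exp(logits) towards a doubly-stochastic matrix by alternating
    row and column normalizations."""
    P = np.exp(logits)
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)  # rows sum to 1
        P /= P.sum(axis=0, keepdims=True)  # columns sum to 1
    return P

rng = np.random.default_rng(0)
P = sinkhorn(rng.standard_normal((5, 5)))
print(P.sum(axis=0), P.sum(axis=1))  # both ~1: a soft permutation of 5 events
```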

Updated: 2024-10-18 11:44:29

标题: 大规模解析疾病进展:利用最优传输快速推断事件排列

摘要: 疾病进展模型推断患者特征的群体级时间轨迹,随着慢性退行性疾病的发展而发生变化。它们为疾病生物学和分期系统提供了独特的见解,并具有个体级的临床实用性。离散模型将疾病进展视为事件的潜在排列,其中每个事件对应于一个特征变得可测异常。然而,使用传统的最大似然方法进行排列推断变得困难,由于组合爆炸,严重限制了模型的维度和实用性。在这里,我们利用最优输运的思想将疾病进展建模为属于Birkhoff多面体的事件的潜在排列矩阵,通过优化变分下界进行快速推断。这使得推断速度比当前技术水平快1000倍,相应地支持比当前技术水平能考虑的特征数量多几个数量级的模型。实验证明了在模拟中速度、准确性和对噪声的稳健性的增加。进一步的实验使用来自两个单独数据集的真实世界成像数据,一个来自阿尔茨海默病患者,另一个来自与年龄相关的黄斑变性,首次展示了大脑和眼睛中像素级疾病进展事件。我们的方法计算成本低,易解释,并适用于任何进行性疾病和数据模态,具有广泛的潜在临床实用性。

更新时间: 2024-10-18 11:44:29

领域: cs.LG

下载: http://arxiv.org/abs/2410.14388v1

Distributionally and Adversarially Robust Logistic Regression via Intersecting Wasserstein Balls

Adversarially robust optimization (ARO) has become the de facto standard for training models to defend against adversarial attacks during testing. However, despite their robustness, these models often suffer from severe overfitting. To mitigate this issue, several successful approaches have been proposed, including replacing the empirical distribution in training with: (i) a worst-case distribution within an ambiguity set, leading to a distributionally robust (DR) counterpart of ARO; or (ii) a mixture of the empirical distribution with one derived from an auxiliary dataset (e.g., synthetic, external, or out-of-domain). Building on the first approach, we explore the Wasserstein DR counterpart of ARO for logistic regression and show it admits a tractable convex optimization reformulation. Adopting the second approach, we enhance the DR framework by intersecting its ambiguity set with one constructed from an auxiliary dataset, which yields significant improvements when the Wasserstein distance between the data-generating and auxiliary distributions can be estimated. We analyze the resulting optimization problem, develop efficient solutions, and show that our method outperforms benchmark approaches on standard datasets.
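
For intuition, a classical tractable instance of this family (feature-only Wasserstein perturbations, a known result in the DRO literature, without the adversarial inner problem studied in this paper) reduces to norm-regularized logistic regression:

```latex
\min_{\beta}\;\; \varepsilon\,\|\beta\|_{*}
  \;+\; \frac{1}{N}\sum_{i=1}^{N} \log\!\bigl(1 + \exp(-y_i\,\beta^{\top}x_i)\bigr)
```

where the Wasserstein radius $\varepsilon$ surfaces as the regularization weight and $\|\cdot\|_*$ is the dual norm of the transport cost.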

Updated: 2024-10-18 11:43:28

标题: 通过相交的Wasserstein球实现分布稳健和对抗稳健的逻辑回归

摘要: 对抗性鲁棒优化(ARO)已成为在训练模型以抵御测试期间的对抗性攻击时的事实标准。然而,尽管这些模型具有鲁棒性,但它们经常受到严重的过拟合困扰。为了缓解这个问题,提出了几种成功的方法,包括用以下内容替换训练中的经验分布:(i)在一个模糊集合内的最坏情况分布,导致ARO的分布鲁棒(DR)对应物;或(ii)经验分布与从辅助数据集(例如,合成的,外部的或域外的)派生的混合分布。基于第一种方法,我们探索了逻辑回归中ARO的Wasserstein DR对应物,并表明它具有可解的凸优化重塑。采用第二种方法,我们通过将其模糊集合与从辅助数据集构建的相交集相交,增强了DR框架,当数据生成和辅助分布之间的Wasserstein距离可以估计时,可以获得显着的改进。我们分析了由此产生的优化问题,开发了有效的解决方案,并展示了我们的方法在标准数据集上优于基准方法。

更新时间: 2024-10-18 11:43:28

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2407.13625v2

Investigating the Capabilities of Deep Learning for Processing and Interpreting One-Shot Multi-offset GPR Data: A Numerical Case Study for Lunar and Martian Environments

Ground-penetrating radar (GPR) is a mature geophysical method that has gained increasing popularity in planetary science over the past decade. GPR has been utilised in both Lunar and Martian missions, providing pivotal information regarding the near-surface geology of Terrestrial planets. Within that context, numerous processing pipelines have been suggested to address the unique challenges present in planetary setups. These processing pipelines often require manual tuning, resulting in ambiguous outputs open to non-unique interpretations. These pitfalls, combined with the large volume of planetary GPR data (kilometers in magnitude), highlight the necessity for automatic, objective and advanced processing and interpretation schemes. The current paper investigates the potential of deep learning for interpreting and processing GPR data. The one-shot multi-offset configuration is investigated via a coherent numerical case study, showcasing the potential of deep learning for A) reconstructing the dielectric distribution of the near surface of Terrestrial planets, and B) filling missing or bad-quality traces. Special care was taken for the numerical data to be both realistic and challenging. Moreover, the generated synthetic data are properly labelled and made publicly available for training future data-driven pipelines and contributing towards developing pre-trained foundation models for GPR.

Updated: 2024-10-18 11:38:29

标题: 研究深度学习在处理和解释一次多偏移GPR数据方面的能力:月球和火星环境的数值案例研究

摘要: 地下雷达(GPR)是一种成熟的地球物理方法,在过去十年在行星科学领域越来越受欢迎。GPR已被用于月球和火星任务,为地球类行星的近地表地质提供了关键信息。在这种情况下,已经提出了许多处理管道,以解决行星设置中存在的独特挑战。这些处理管道通常需要手动调整,导致模糊的输出,容易产生不唯一的解释。这些缺陷与大量的行星GPR数据(数量级为公里)结合在一起,突显了自动、客观和先进的处理和解释方案的必要性。本文研究了深度学习用于解释和处理GPR数据的潜力。通过一个一次性多偏移配置的一致数值案例研究,展示了深度学习在A)重建地球类行星近地表介电分布和B)填补缺失或质量差的迹线方面的潜力。对于数值数据,特别注意使其既真实又具有挑战性。此外,生成的合成数据被正确标记并公开提供,用于训练未来数据驱动的管道,并为开发GPR的预训练基础模型做出贡献。

更新时间: 2024-10-18 11:38:29

领域: physics.geo-ph,astro-ph.EP,astro-ph.IM,cs.LG

下载: http://arxiv.org/abs/2410.14386v1

Predicting Accurate Lagrangian Multipliers for Mixed Integer Linear Programs

Lagrangian relaxation stands among the most efficient approaches for solving Mixed Integer Linear Programs (MILPs) with difficult constraints. Given any duals for these constraints, called Lagrangian Multipliers (LMs), it returns a bound on the optimal value of the MILP, and Lagrangian methods seek the LMs giving the best such bound. But these methods generally rely on iterative algorithms resembling gradient descent to maximize the concave piecewise linear dual function: the computational burden grows quickly with the number of relaxed constraints. We introduce a deep learning approach that bypasses the descent, effectively amortizing the local, per-instance, optimization. A probabilistic encoder based on a graph convolutional network computes high-dimensional representations of relaxed constraints in MILP instances. A decoder then turns these representations into LMs. We train the encoder and decoder jointly by directly optimizing the bound obtained from the predicted multipliers. Numerical experiments show that our approach closes up to 85% of the gap between the continuous relaxation and the best Lagrangian bound, and provides a high-quality warm start for descent-based Lagrangian methods.
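
The bound the predicted multipliers plug into can be illustrated on a toy MILP; the instance below is invented for the example.

```python
import numpy as np
from itertools import product

# MILP: min c^T x  s.t.  A x <= b (relaxed), x in X (easy set, here {0,1}^2).
# Any multipliers lam >= 0 give a valid lower bound
#   L(lam) = min_{x in X} c^T x + lam^T (A x - b).
c = np.array([-3.0, -2.0])
A = np.array([[1.0, 1.0]])       # relaxed constraint x1 + x2 <= 1
b = np.array([1.0])
X = [np.array(v, dtype=float) for v in product([0, 1], repeat=2)]

def lagrangian_bound(lam):
    return min(c @ x + lam @ (A @ x - b) for x in X)

# A NN-predicted lam should land near the maximizer of this concave function.
for lam in (np.array([0.0]), np.array([2.0]), np.array([2.5])):
    print(lam, lagrangian_bound(lam))  # the best bound here is -3.0, the MILP optimum
```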

Updated: 2024-10-18 11:32:20

标题: 预测混合整数线性规划的准确拉格朗日乘子

摘要: Lagrangian relaxation是解决具有困难约束条件的混合整数线性规划(MILP)中最有效的方法之一。给定这些约束条件的任何对偶,称为Lagrangian Multipliers(LMs),它返回MILP的最优值的一个界限,Lagrangian方法寻求给出最佳界限的LMs。但是这些方法通常依赖于类似梯度下降的迭代算法来最大化凹凸分段线性对偶函数:随着放宽约束的数量增加,计算负担迅速增加。我们引入了一种深度学习方法,通过绕过下降,有效地摊销局部的每个实例的优化。基于图卷积网络的概率编码器计算MILP实例中放宽约束的高维表示。一个解码器然后将这些表示转换为LMs。我们通过直接优化从预测的乘数获得的界限来联合训练编码器和解码器。数值实验表明,我们的方法可以缩小连续放松与最佳Lagrangian界限之间的差距高达85\%,并为基于下降的Lagrangian方法提供高质量的热启动。

更新时间: 2024-10-18 11:32:20

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2310.14659v2

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the complexities of constructing tasks and evaluators. To overcome these limitations, we introduce Crab, the first agent benchmark framework designed to support cross-environment tasks, incorporating a graph-based fine-grained evaluation method and an efficient mechanism for task and evaluator construction. Our framework supports multiple devices and can be easily extended to any environment with a Python interface. Leveraging Crab, we developed a cross-platform Crab Benchmark-v0 comprising 120 tasks in computer desktop and mobile phone environments. We evaluated four advanced MLMs using different single and multi-agent system configurations on this benchmark. The experimental results demonstrate that the single agent with GPT-4o achieves the best completion ratio of 38.01%. All framework code, agent code, and task datasets are publicly available at https://github.com/camel-ai/crab.

Updated: 2024-10-18 11:29:39

标题: CRAB:多模态语言模型Agent的跨环境Agent基准

摘要: 自主代理的发展越来越依赖于多模态语言模型(MLMs)来执行自然语言描述的任务,例如网站、台式电脑或手机等GUI环境中的任务。目前,交互环境中用于MLM代理的基准测试受限于它们专注于单一环境、缺乏详细和泛化评估方法以及构建任务和评估器的复杂性。为了克服这些限制,我们引入了Crab,这是第一个旨在支持跨环境任务的代理基准框架,包括基于图形的细粒度评估方法和用于任务和评估器构建的高效机制。我们的框架支持多个设备,并可以轻松扩展到具有Python接口的任何环境。利用Crab,我们开发了一个跨平台的Crab Benchmark-v0,其中包含120个任务,涵盖了计算机桌面和手机环境。我们在这个基准测试中评估了四种先进的MLM,使用不同的单一和多代理系统配置。实验结果表明,单一代理与GPT-4o达到了38.01%的最佳完成比率。所有框架代码、代理代码和任务数据集均在https://github.com/camel-ai/crab 上公开可用。

更新时间: 2024-10-18 11:29:39

领域: cs.AI

下载: http://arxiv.org/abs/2407.01511v2

3-D Magnetotelluric Deep Learning Inversion Guided by Pseudo-Physical Information

Magnetotelluric deep learning (DL) inversion methods based on joint data-driven and physics-driven approaches have become a hot topic in recent years. When mapping observation data (or forward modeling data) to the resistivity model using neural networks (NNs), incorporating the error (loss) term of the inversion resistivity's forward modeling response, which introduces physical information about electromagnetic field propagation, can significantly enhance the inversion accuracy. To efficiently achieve data-physics dual-driven MT deep learning inversion for large-scale 3-D MT data, we propose using DL forward modeling networks to compute this portion of the loss. This approach introduces pseudo-physical information through the forward modeling of NN simulation, further guiding the inversion network fitting. Specifically, we first pre-train the forward modeling networks as fixed forward modeling operators, then transfer and integrate them into the inversion network training, and finally optimize the inversion network by minimizing the multinomial loss. Theoretical experimental results indicate that despite some simulation errors in DL forward modeling, the introduced pseudo-physical information still enhances inversion accuracy and significantly mitigates the overfitting problem during training. Additionally, we propose a new input mode that involves masking and adding noise to the data, simulating the field data environment of 3-D MT inversion, thereby making the method more flexible and effective for practical applications.
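
Conceptually, the dual-driven loss looks like the following PyTorch sketch, with placeholder linear layers standing in for the inversion and pre-trained forward-modeling networks; the 0.1 weighting is an arbitrary choice.

```python
import torch

inversion_net = torch.nn.Linear(32, 16)   # stand-in inversion network
forward_net = torch.nn.Linear(16, 32)     # stand-in pre-trained forward operator
for p in forward_net.parameters():
    p.requires_grad_(False)               # fixed forward-modeling operator

obs = torch.randn(8, 32)                  # observed MT responses
true_res = torch.randn(8, 16)             # training labels (resistivity)

pred_res = inversion_net(obs)
data_loss = torch.nn.functional.mse_loss(pred_res, true_res)
physics_loss = torch.nn.functional.mse_loss(forward_net(pred_res), obs)

loss = data_loss + 0.1 * physics_loss     # pseudo-physical guidance term
loss.backward()                           # only the inversion network updates
```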

Updated: 2024-10-18 11:23:18

标题: 伪物理信息引导的三维大地电磁深度学习反演

摘要: 大地电磁深度学习(DL)反演方法基于联合数据驱动和物理驱动,在近年来成为热门话题。当利用神经网络(NNs)将观测数据(或正演建模数据)映射到电阻率模型时,将反演电阻率的正演建模响应的误差(损失)项纳入其中,可以引入关于电磁场传播的物理信息,从而显著提高反演精度。为了有效实现用于大规模 3-D MT 数据的数据-物理双重驱动 MT 深度学习反演,我们提出使用 DL 正演建模网络来计算这部分损失。这种方法通过 NN 模拟的正演建模引入了伪物理信息,进一步指导反演网络拟合。具体而言,我们首先预训练正演建模网络作为固定的正演建模算子,然后将它们转移并整合到反演网络训练中,最后通过最小化多项式损失来优化反演网络。理论实验结果表明,尽管 DL 正演建模中存在一些模拟误差,但引入的伪物理信息仍然增强了反演精度,并在训练过程中显著缓解了过拟合问题。此外,我们提出了一种新的输入模式,涉及对数据进行掩蔽和添加噪声,模拟 3-D MT 反演的现场数据环境,从而使该方法在实际应用中更加灵活和有效。

更新时间: 2024-10-18 11:23:18

领域: physics.geo-ph,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.09388v2

Constructive Interpolation and Concept-Based Beth Definability for Description Logics via Sequents

We introduce a constructive method applicable to a large number of description logics (DLs) for establishing the concept-based Beth definability property (CBP) based on sequent systems. Using the highly expressive DL RIQ as a case study, we introduce novel sequent calculi for RIQ-ontologies and show how certain interpolants can be computed from sequent calculus proofs, which permit the extraction of explicit definitions of implicitly definable concepts. To the best of our knowledge, this is the first sequent-based approach to computing interpolants and definitions within the context of DLs, as well as the first proof that RIQ enjoys the CBP. Moreover, due to the modularity of our sequent systems, our results hold for restrictions of RIQ, and are applicable to other DLs by suitable modifications.

Updated: 2024-10-18 11:22:32

标题: 基于sequent演算的描述逻辑构造性插值与基于概念的Beth可定义性

摘要: 我们引入了一种基于sequent系统、适用于大量描述逻辑(DLs)的构造方法,用于建立基于概念的Beth可定义性属性(CBP)。以高表达力的DL RIQ为案例研究,我们引入了针对RIQ本体的新颖sequent演算,并展示了如何从sequent演算证明中计算出某些插值,从而提取隐式可定义概念的显式定义。据我们所知,这是在DLs背景下第一个基于sequent的插值与定义计算方法,也是RIQ满足CBP的第一个证明。此外,由于我们的sequent系统具有模块化特性,我们的结果对RIQ的受限片段同样成立,并可通过适当修改应用于其他DLs。

更新时间: 2024-10-18 11:22:32

领域: cs.LO,cs.AI,cs.DB,math.LO

下载: http://arxiv.org/abs/2404.15840v3

Fine-Tuning Pre-trained Language Models for Robust Causal Representation Learning

The fine-tuning of pre-trained language models (PLMs) has been shown to be effective across various domains. By using domain-specific supervised data, the general-purpose representation derived from PLMs can be transformed into a domain-specific representation. However, these methods often fail to generalize to out-of-domain (OOD) data due to their reliance on non-causal representations, often described as spurious features. Existing methods either make use of adjustments with strong assumptions about lack of hidden common causes, or mitigate the effect of spurious features using multi-domain data. In this work, we investigate how fine-tuned pre-trained language models aid generalizability from single-domain scenarios under mild assumptions, targeting more general and practical real-world scenarios. We show that a robust representation can be derived through a so-called causal front-door adjustment, based on a decomposition assumption, using fine-tuned representations as a source of data augmentation. Comprehensive experiments in both synthetic and real-world settings demonstrate the superior generalizability of the proposed method compared to existing approaches. Our work thus sheds light on the domain generalization problem by introducing links between fine-tuning and causal mechanisms into representation learning.
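
For reference, the classical front-door adjustment (with a mediator $M$ between treatment $X$ and outcome $Y$) identifies the interventional distribution from observational quantities:

```latex
P(y \mid \mathrm{do}(x)) \;=\; \sum_{m} P(m \mid x)\, \sum_{x'} P(y \mid m, x')\, P(x')
```

with the fine-tuned representations playing the role of a data-augmentation source in the paper's decomposition-based variant of this adjustment.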

Updated: 2024-10-18 11:06:23

标题: 微调预训练语言模型以实现稳健的因果表示学习

摘要: 预训练语言模型(PLMs)的微调已被证明在各个领域都是有效的。通过使用特定领域的监督数据,可以将从PLMs中得到的通用表示转化为特定领域的表示。然而,这些方法通常无法推广到域外(OOD)数据,因为它们依赖于非因果表示,通常被描述为虚假特征。现有方法要么利用对缺乏隐藏共同原因的强假设进行调整,要么利用多领域数据减轻虚假特征的影响。在这项工作中,我们研究了如何通过微调预训练语言模型在温和假设下帮助从单一领域情景中实现泛化,以针对更一般和实际的现实世界情景。我们展示了通过所谓的因果前门调整可以通过分解假设使用微调表示作为数据增强的来源来得到健壮的表示。在合成和实际世界环境中进行的全面实验表明,与现有方法相比,所提出的方法具有更好的泛化能力。因此,我们的工作通过在表示学习中引入微调和因果机制之间的联系,为领域泛化问题带来了新的启示。

更新时间: 2024-10-18 11:06:23

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.14375v1

Advancing Physics Data Analysis through Machine Learning and Physics-Informed Neural Networks

In an era increasingly focused on green computing and explainable AI, revisiting traditional approaches in theoretical and phenomenological particle physics is paramount. This project evaluates various machine learning (ML) algorithms, including Nearest Neighbors, Decision Trees, Random Forest, AdaBoost, Naive Bayes, Quadratic Discriminant Analysis (QDA), and XGBoost, alongside standard neural networks and a novel Physics-Informed Neural Network (PINN) for physics data analysis. We apply these techniques to a binary classification task that distinguishes the experimental viability of simulated scenarios based on Higgs observables and essential parameters. Through this comprehensive analysis, we aim to showcase the capabilities and computational efficiency of each model in binary classification tasks, thereby contributing to the ongoing discourse on integrating ML and Deep Neural Networks (DNNs) into physics research. In this study, XGBoost emerged as the preferred choice among the evaluated machine learning algorithms for its speed and effectiveness, especially in the initial stages of computation with limited datasets. However, while standard Neural Networks and Physics-Informed Neural Networks (PINNs) demonstrated superior performance in terms of accuracy and adherence to physical laws, they require more computational time. These findings underscore the trade-offs between computational efficiency and model sophistication.
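
A minimal version of such a comparison protocol, with synthetic features standing in for the Higgs observables, might look as follows (model settings are library defaults, not the study's tuned configurations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

models = {
    "kNN": KNeighborsClassifier(),
    "RandomForest": RandomForestClassifier(random_state=0),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold accuracy
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```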

Updated: 2024-10-18 11:05:52

标题: 推进物理数据分析:通过机器学习和受物理启发的神经网络

摘要: 在一个越来越专注于绿色计算和可解释人工智能的时代,重新审视理论和现象粒子物理学中的传统方法至关重要。本项目评估了各种机器学习(ML)算法,包括最近邻、决策树、随机森林、AdaBoost、朴素贝叶斯、二次判别分析(QDA)和XGBoost,以及标准神经网络和一种新颖的物理信息神经网络(PINN)用于物理数据分析。我们将这些技术应用于一个二分类任务,该任务基于希格斯可观测量和基本参数来区分模拟情景的实验可行性。通过这种综合分析,我们旨在展示每个模型在二分类任务中的能力和计算效率,从而为将ML和深度神经网络(DNN)整合到物理研究中的持续讨论做出贡献。在这项研究中,XGBoost在评估的机器学习算法中成为首选,因为它在计算初期和数据集有限的情况下速度和效果特别好。然而,标准神经网络和物理信息神经网络(PINNs)在准确性和遵守物理定律方面表现出更好的性能,但它们需要更多的计算时间。这些发现强调了计算效率和模型复杂性之间的权衡。

更新时间: 2024-10-18 11:05:52

领域: hep-ph,cs.LG

下载: http://arxiv.org/abs/2410.14760v1

CybORG++: An Enhanced Gym for the Development of Autonomous Cyber Agents

CybORG++ is an advanced toolkit for reinforcement learning research focused on network defence. Building on the CAGE 2 CybORG environment, it introduces key improvements, including enhanced debugging capabilities, refined agent implementation support, and a streamlined environment that enables faster training and easier customisation. Along with addressing several software bugs from its predecessor, CybORG++ introduces MiniCAGE, a lightweight version of CAGE 2, which improves performance dramatically, up to 1000x faster execution in parallel iterations, without sacrificing accuracy or core functionality. CybORG++ serves as a robust platform for developing and evaluating defensive agents, making it a valuable resource for advancing enterprise network defence research.

Updated: 2024-10-18 11:04:07

标题: CybORG++:用于自主网络代理开发的增强版Gym

摘要: CybORG++是一个专注于网络防御的强化学习研究的先进工具包。在CAGE 2 CybORG环境的基础上,它引入了关键的改进,包括增强的调试能力、精细的代理实施支持,以及一个简化的环境,可以加快训练速度并更容易定制。除了解决前身的几个软件bug外,CybORG++还引入了MiniCAGE,这是CAGE 2的一个轻量级版本,可以显著提高性能,实现并行迭代速度提高了1000倍,而不会牺牲准确性或核心功能。CybORG++作为开发和评估防御代理的强大平台,使其成为推动企业网络防御研究的宝贵资源。

更新时间: 2024-10-18 11:04:07

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2410.16324v1

Interpretable end-to-end Neurosymbolic Reinforcement Learning agents

Deep reinforcement learning (RL) agents rely on shortcut learning, preventing them from generalizing to slightly different environments. To address this problem, symbolic methods that use object-centric states have been developed. However, comparing these methods to deep agents is not fair, as the latter operate on raw pixel-based states. In this work, we instantiate the symbolic SCoBots framework. SCoBots decompose RL tasks into intermediate, interpretable representations, culminating in action decisions based on a comprehensible set of object-centric relational concepts. This architecture aids in demystifying agent decisions. By explicitly learning to extract object-centric representations from raw states, object-centric RL, and policy distillation via rule extraction, this work places itself within the neurosymbolic AI paradigm, blending the strengths of neural networks with symbolic AI. We present the first implementation of an end-to-end trained SCoBot and separately evaluate its components on different Atari games. The results demonstrate the framework's potential to create interpretable and performant RL systems, and pave the way for future research directions in obtaining end-to-end interpretable RL agents.

Updated: 2024-10-18 10:59:13

标题: 可解释的端到端神经符号强化学习代理

摘要: 深度强化学习(RL)代理依赖于捷径学习,这阻碍了它们对稍有不同的环境进行泛化。为了解决这个问题,已经开发了使用物体中心状态的符号方法。然而,将这些方法与深度代理进行比较是不公平的,因为后者是从原始像素状态操作的。在这项工作中,我们实例化了符号化SCoBots框架。SCoBots将RL任务分解为中间、可解释的表示,最终根据一组可理解的物体中心关系概念做出行动决策。这种架构有助于解密代理决策。通过明确学习从原始状态中提取物体中心表示、物体中心RL和通过规则提取进行策略提炼,这项工作将自己置于神经符号AI范式中,将神经网络的优势与符号AI融合在一起。我们在不同的Atari游戏上首次实现了端到端训练的SCoBot,并分别评估了其组成部分。结果表明,该框架有潜力创建可解释且表现良好的RL系统,并为未来研究方向铺平道路,以获得端到端可解释的RL代理。

更新时间: 2024-10-18 10:59:13

领域: cs.AI

下载: http://arxiv.org/abs/2410.14371v1

CoMAL: Collaborative Multi-Agent Large Language Models for Mixed-Autonomy Traffic

The integration of autonomous vehicles into urban traffic has great potential to improve efficiency by reducing congestion and optimizing traffic flow systematically. In this paper, we introduce CoMAL (Collaborative Multi-Agent LLMs), a framework designed to address the mixed-autonomy traffic problem by collaboration among autonomous vehicles to optimize traffic flow. CoMAL is built upon large language models, operating in an interactive traffic simulation environment. It utilizes a Perception Module to observe surrounding agents and a Memory Module to store strategies for each agent. The overall workflow includes a Collaboration Module that encourages autonomous vehicles to discuss the effective strategy and allocate roles, a reasoning engine to determine optimal behaviors based on assigned roles, and an Execution Module that controls vehicle actions using a hybrid approach combining rule-based models. Experimental results demonstrate that CoMAL achieves superior performance on the Flow benchmark. Additionally, we evaluate the impact of different language models and compare our framework with reinforcement learning approaches. It highlights the strong cooperative capability of LLM agents and presents a promising solution to the mixed-autonomy traffic challenge. The code is available at https://github.com/Hyan-Yao/CoMAL.

Updated: 2024-10-18 10:53:44

标题: CoMAL:混合自治交通的协作多智能体大型语言模型

摘要: 将自动驾驶车辆整合进城市交通系统具有巨大潜力,可以通过减少拥堵和优化交通流量系统来提高效率。本文介绍了CoMAL(协作多智能体LLMs),这是一个旨在通过自动驾驶车辆之间的协作来优化交通流量的框架。CoMAL建立在大型语言模型之上,在交互式交通仿真环境中运行。它利用感知模块观察周围智能体,并利用记忆模块存储每个智能体的策略。整体工作流程包括一个协作模块,鼓励自动驾驶车辆讨论有效策略并分配角色,一个推理引擎根据分配的角色确定最佳行为,以及一个执行模块使用基于规则的模型控制车辆行动。实验结果表明,CoMAL在Flow基准测试中表现出优越性能。此外,我们评估了不同语言模型的影响,并将我们的框架与强化学习方法进行了比较。它突出了LLM智能体的强大合作能力,并提出了一个有希望的解决方案来解决混合自治交通挑战。代码可在https://github.com/Hyan-Yao/CoMAL找到。

更新时间: 2024-10-18 10:53:44

领域: cs.AI,cs.RO,68T42,I.2.11

下载: http://arxiv.org/abs/2410.14368v1

A Survey of Mamba

As one of the most representative DL techniques, Transformer architecture has empowered numerous advanced models, especially the large language models (LLMs) that comprise billions of parameters, becoming a cornerstone in deep learning. Despite the impressive achievements, Transformers still face inherent limitations, particularly the time-consuming inference resulting from the quadratic computation complexity of attention calculation. Recently, a novel architecture named Mamba, drawing inspiration from classical state space models (SSMs), has emerged as a promising alternative for building foundation models, delivering comparable modeling abilities to Transformers while preserving near-linear scalability concerning sequence length. This has sparked an increasing number of studies actively exploring Mamba's potential to achieve impressive performance across diverse domains. Given such rapid evolution, there is a critical need for a systematic review that consolidates existing Mamba-empowered models, offering a comprehensive understanding of this emerging model architecture. In this survey, we therefore conduct an in-depth investigation of recent Mamba-associated studies, covering three main aspects: the advancements of Mamba-based models, the techniques of adapting Mamba to diverse data, and the applications where Mamba can excel. Specifically, we first review the foundational knowledge of various representative deep learning models and the details of Mamba-1&2 as preliminaries. Then, to showcase the significance of Mamba for AI, we comprehensively review the related studies focusing on Mamba models' architecture design, data adaptability, and applications. Finally, we present a discussion of current limitations and explore various promising research directions to provide deeper insights for future investigations.
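
For reference, Mamba-style layers build on the linear state-space recurrence (selectivity and hardware-aware scanning omitted):

```latex
h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t)
\quad\Longrightarrow\quad
h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t
```

where $\bar{A}, \bar{B}$ discretize $A, B$ with a step size $\Delta$; this recurrence is what yields the near-linear scaling in sequence length.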

Updated: 2024-10-18 10:46:43

标题: Mamba综述

摘要: 作为最具代表性的深度学习技术之一,Transformer架构赋予了许多先进模型以力量,特别是那些包含数十亿参数的大型语言模型(LLMs),成为深度学习的基石。尽管取得了令人印象深刻的成就,Transformers仍然面临固有的限制,尤其是由于注意力计算的二次计算复杂度而导致的耗时推断。最近,一种名为Mamba的新架构,受传统状态空间模型(SSMs)启发,已经成为建立基础模型的一种有希望的替代方案,提供了与Transformers相当的建模能力,同时保持了关于序列长度的近线性可扩展性。这引发了越来越多的研究积极探索Mamba在各个领域实现令人印象深刻的性能的潜力。鉴于这种迅速发展,迫切需要进行系统性审查,整合现有的基于Mamba的模型,提供对这种新兴模型架构的全面理解。在本调查中,我们深入调查了最近与Mamba相关的研究,涵盖三个主要方面:基于Mamba模型的进展、将Mamba适应各种数据的技术以及Mamba可以胜任的应用。具体而言,我们首先回顾各种代表性深度学习模型的基础知识以及Mamba-1和2的细节作为准备工作。然后,为了展示Mamba对人工智能的重要性,我们全面回顾了相关研究,重点关注Mamba模型的架构设计、数据适应性和应用。最后,我们对当前的限制进行讨论,并探讨各种有希望的研究方向,为未来的调查提供更深入的见解。

更新时间: 2024-10-18 10:46:43

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2408.01129v4

Optimization Dynamics of Equivariant and Augmented Neural Networks

We investigate the optimization of neural networks on symmetric data, and compare the strategy of constraining the architecture to be equivariant to that of using data augmentation. Our analysis reveals that the relative geometry of the admissible and the equivariant layers, respectively, plays a key role. Under natural assumptions on the data, network, loss, and group of symmetries, we show that compatibility of the spaces of admissible layers and equivariant layers, in the sense that the corresponding orthogonal projections commute, implies that the sets of equivariant stationary points are identical for the two strategies. If the linear layers of the network are also given a unitary parametrization, the set of equivariant layers is even invariant under the gradient flow for augmented models. Our analysis however also reveals that even in the latter situation, stationary points may be unstable for augmented training even though they are stable for the manifestly equivariant models.

Updated: 2024-10-18 10:31:27

标题: 《等变和增强神经网络的优化动态》

摘要: 我们研究了对称数据上神经网络的优化,并比较了将架构限制为等变和使用数据增强两种策略。我们的分析揭示了可允许层和等变层的相对几何性分别起着关键作用。在对数据、网络、损失和对称群做出自然假设的情况下,我们表明,在允许层和等变层的空间兼容的意义下,对应的正交投影是可交换的,意味着两种策略的等变稳定点集是相同的。如果网络的线性层也具有酉参数化,那么等变层的集合甚至在增强模型的梯度流下也是不变的。然而,我们的分析也揭示了即使在后一种情况下,对于增强训练,稳定点可能是不稳定的,尽管它们对于明显等变模型是稳定的。

更新时间: 2024-10-18 10:31:27

领域: cs.LG,math.OC,68T07, 20C35, 37N40

下载: http://arxiv.org/abs/2303.13458v5

Entity Matching using Large Language Models

Entity matching is the task of deciding whether two entity descriptions refer to the same real-world entity. Entity matching is a central step in most data integration pipelines. Many state-of-the-art entity matching methods rely on pre-trained language models (PLMs) such as BERT or RoBERTa. Two major drawbacks of these models for entity matching are that (i) the models require significant amounts of task-specific training data and (ii) the fine-tuned models are not robust concerning out-of-distribution entities. This paper investigates using generative large language models (LLMs) as a less task-specific training data-dependent and more robust alternative to PLM-based matchers. The study covers hosted and open-source LLMs which can be run locally. We evaluate these models in a zero-shot scenario and a scenario where task-specific training data is available. We compare different prompt designs and the prompt sensitivity of the models. We show that there is no single best prompt but that the prompt needs to be tuned for each model/dataset combination. We further investigate (i) the selection of in-context demonstrations, (ii) the generation of matching rules, as well as (iii) fine-tuning LLMs using the same pool of training data. Our experiments show that the best LLMs require no or only a few training examples to perform comparably to PLMs that were fine-tuned using thousands of examples. LLM-based matchers further exhibit higher robustness to unseen entities. We show that GPT4 can generate structured explanations for matching decisions and can automatically identify potential causes of matching errors by analyzing explanations of wrong decisions. We demonstrate that the model can generate meaningful textual descriptions of the identified error classes, which can help data engineers to improve entity matching pipelines.
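
A zero-shot matching prompt of the kind compared in the study can be sketched as below; the exact wording is a per-model/dataset design choice, and this template is purely illustrative.

```python
def build_prompt(entity_a: str, entity_b: str) -> str:
    """Hypothetical zero-shot entity-matching prompt template."""
    return (
        "Do the two entity descriptions refer to the same real-world entity?\n"
        f"Entity 1: {entity_a}\n"
        f"Entity 2: {entity_b}\n"
        "Answer with 'Yes' or 'No'."
    )

print(build_prompt(
    "DYMO LabelManager 160 Label Maker, QWERTY Keyboard",
    "Dymo LabelManager 160P Handheld Label Printer",
))
```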

Updated: 2024-10-18 10:21:31

Domains: cs.CL,cs.LG

Download: http://arxiv.org/abs/2310.11244v4
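
A hedged sketch of what a zero-shot matching prompt in this spirit might look like; the wording, answer labels, and example records are assumptions rather than the authors' templates, and, as the paper notes, the prompt would need tuning per model/dataset combination.

    TEMPLATE = (
        "Do the two entity descriptions refer to the same real-world entity?\n"
        "Answer strictly with 'Yes' or 'No'.\n\n"
        "Entity A: {a}\nEntity B: {b}\nAnswer:"
    )

    def build_prompt(a: str, b: str) -> str:
        # The rendered prompt would be sent to a hosted or locally run LLM.
        return TEMPLATE.format(a=a, b=b)

    print(build_prompt(
        "Apple iPhone 14 Pro, 128GB, Space Black",
        "iPhone14 Pro 128 GB black, by Apple Inc.",
    ))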

Assistive AI for Augmenting Human Decision-making

Regulatory frameworks for the use of AI are emerging. However, they trail behind the fast-evolving malicious AI technologies that can quickly cause lasting societal damage. In response, we introduce a pioneering Assistive AI framework designed to enhance human decision-making capabilities. This framework aims to establish a trust network across various fields, especially within legal contexts, serving as a proactive complement to ongoing regulatory efforts. Central to our framework are the principles of privacy, accountability, and credibility. In our methodology, the foundation of reliability of information and information sources is built upon the ability to uphold accountability, enhance security, and protect privacy. This approach supports, filters, and potentially guides communication, thereby empowering individuals and communities to make well-informed decisions based on cutting-edge advancements in AI. Our framework uses the concept of Boards as proxies to collectively ensure that AI-assisted decisions are reliable, accountable, and in alignment with societal values and legal standards. Through a detailed exploration of our framework, including its main components, operations, and sample use cases, the paper shows how AI can assist in the complex process of decision-making while maintaining human oversight. The proposed framework not only extends regulatory landscapes but also highlights the synergy between AI technology and human judgement, underscoring the potential of AI to serve as a vital instrument in discerning reality from fiction and thus enhancing the decision-making process. Furthermore, we provide domain-specific use cases to highlight the applicability of our framework.

Updated: 2024-10-18 10:16:07

Domains: cs.CY,cs.AI

Download: http://arxiv.org/abs/2410.14353v1

A Survey of Multi-Agent Deep Reinforcement Learning with Communication

Communication is an effective mechanism for coordinating the behaviors of multiple agents, broadening their views of the environment, and supporting their collaboration. In the field of multi-agent deep reinforcement learning (MADRL), agents can improve the overall learning performance and achieve their objectives by communication. Agents can communicate various types of messages, either to all agents or to specific agent groups, or conditioned on specific constraints. Despite the growing body of research work in MADRL with communication (Comm-MADRL), there is a lack of a systematic and structured approach to distinguish and classify existing Comm-MADRL approaches. In this paper, we survey recent works in the Comm-MADRL field and consider various aspects of communication that can play a role in designing and developing multi-agent reinforcement learning systems. With these aspects in mind, we propose 9 dimensions along which Comm-MADRL approaches can be analyzed, developed, and compared. By projecting existing works into the multi-dimensional space, we discover interesting trends. We also propose some novel directions for designing future Comm-MADRL systems through exploring possible combinations of the dimensions.

Updated: 2024-10-18 10:14:58

Domains: cs.MA,cs.LG

Download: http://arxiv.org/abs/2203.08975v2

FAME: Towards Factual Multi-Task Model Editing

Large language models (LLMs) embed extensive knowledge and utilize it to perform exceptionally well across various tasks. Nevertheless, outdated knowledge or factual errors within LLMs can lead to misleading or incorrect responses, causing significant issues in practical applications. To rectify these flaws without costly model retraining, various model editing approaches have been proposed to correct inaccurate knowledge within LLMs in a cost-efficient way. To evaluate these model editing methods, previous work introduced a series of datasets. However, most of the previous datasets only contain fabricated data in a single format, which diverges from real-world model editing scenarios, raising doubts about their usability in practice. To facilitate the application of model editing in real-world scenarios, we propose the challenge of practicality. To resolve such challenges and effectively enhance the capabilities of LLMs, we present FAME, a factual, comprehensive, and multi-task dataset designed to enhance the practicality of model editing. We then propose SKEME, a model editing method that uses a novel caching mechanism to ensure synchronization with the real world. The experiments demonstrate that SKEME performs excellently across various tasks and scenarios, confirming its practicality.

Updated: 2024-10-18 10:02:03

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.10859v2

Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition

Visual Speech Recognition (VSR) aims to infer speech into text depending on lip movements alone. As it focuses on visual information to model the speech, its performance is inherently sensitive to personal lip appearances and movements, and this makes the VSR models show degraded performance when they are applied to unseen speakers. In this paper, to remedy the performance degradation of the VSR model on unseen speakers, we propose prompt tuning methods of Deep Neural Networks (DNNs) for speaker-adaptive VSR. Specifically, motivated by recent advances in Natural Language Processing (NLP), we finetune prompts on adaptation data of target speakers instead of modifying the pre-trained model parameters. Different from the previous prompt tuning methods mainly limited to Transformer variant architecture, we explore different types of prompts, the addition, the padding, and the concatenation form prompts that can be applied to the VSR model which is composed of CNN and Transformer in general. With the proposed prompt tuning, we show that the performance of the pre-trained VSR model on unseen speakers can be largely improved by using a small amount of adaptation data (e.g., less than 5 minutes), even if the pre-trained model is already developed with large speaker variations. Moreover, by analyzing the performance and parameters of different types of prompts, we investigate when the prompt tuning is preferred over the finetuning methods. The effectiveness of the proposed method is evaluated on both word- and sentence-level VSR databases, LRW-ID and GRID.

Updated: 2024-10-18 09:58:45

Domains: cs.CL,cs.AI,cs.CV,cs.SD,eess.AS,eess.IV

Download: http://arxiv.org/abs/2302.08102v2
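
The addition-form prompt described above can be sketched as a learnable tensor added to frozen features; the feature shape, the toy stand-in backbone, and the placement of the prompt are assumptions for illustration, not the paper's exact configuration.

    import torch
    import torch.nn as nn

    class AdditionPrompt(nn.Module):
        def __init__(self, backbone: nn.Module, feat_shape=(1, 64, 22, 22)):
            super().__init__()
            self.backbone = backbone
            for p in self.backbone.parameters():
                p.requires_grad_(False)        # pre-trained weights stay frozen
            self.prompt = nn.Parameter(torch.zeros(feat_shape))

        def forward(self, feats):
            # addition-form prompt: only self.prompt is learned per speaker
            return self.backbone(feats + self.prompt)

    toy_backbone = nn.Conv2d(64, 64, 3, padding=1)   # stand-in for a CNN stage
    model = AdditionPrompt(toy_backbone)
    out = model(torch.randn(2, 64, 22, 22))
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(out.shape, trainable)                      # tiny adaptation footprint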

A Scientific Machine Learning Approach for Predicting and Forecasting Battery Degradation in Electric Vehicles

Carbon emissions are rising at an alarming rate, posing a significant threat to global efforts to mitigate climate change. Electric vehicles have emerged as a promising solution, but their reliance on lithium-ion batteries introduces the critical challenge of battery degradation. Accurate prediction and forecasting of battery degradation over both short and long time spans are essential for optimizing performance, extending battery life, and ensuring effective long-term energy management. This directly influences the reliability, safety, and sustainability of EVs, supporting their widespread adoption and aligning with key UN SDGs. In this paper, we present a novel approach to the prediction and long-term forecasting of battery degradation using a Scientific Machine Learning (SciML) framework which integrates domain knowledge with neural networks, offering more interpretable and scientifically grounded solutions for both predicting short-term battery health and forecasting degradation over extended periods. This hybrid approach captures both known and unknown degradation dynamics, improving predictive accuracy while reducing data requirements. We incorporate ground-truth data to inform our models, ensuring that both the predictions and forecasts reflect practical conditions. The model achieved an MSE of 9.90 with the UDE and 11.55 with the NeuralODE and, on experimental data, a loss of 1.6986 with the UDE and an MSE of 2.49 with the NeuralODE, demonstrating the enhanced precision of our approach. This integration of data-driven insights with SciML's strengths in interpretability and scalability allows for robust battery management. By enhancing battery longevity and minimizing waste, our approach contributes to the sustainability of energy systems and accelerates the global transition toward cleaner, more responsible energy solutions, aligning with the UN's SDG agenda.

Updated: 2024-10-18 09:57:59

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.14347v1
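
A minimal universal-differential-equation (UDE) sketch of the hybrid idea: a known physics term plus a small neural residual, integrated with explicit Euler so that gradients flow through the rollout. The fade law, constants, and target value are toy assumptions, not the paper's model.

    import torch
    import torch.nn as nn

    residual = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))

    def dsoh_dt(soh):
        known = -0.002 * soh                  # assumed first-order capacity fade
        return known + 1e-3 * residual(soh)   # learned correction term

    soh = torch.ones(1)                       # state of health starts at 100%
    for _ in range(100):                      # explicit Euler, dt = 1 cycle
        soh = soh + dsoh_dt(soh)

    loss = ((soh - 0.80) ** 2).sum()          # fit residual to an observed fade
    loss.backward()                           # differentiates through the rollout
    print(soh.item(), loss.item())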

Universal approximation results for neural networks with non-polynomial activation function over non-compact domains

In this paper, we generalize the universal approximation property of single-hidden-layer feed-forward neural networks beyond the classical formulation over compact domains. More precisely, by assuming that the activation function is non-polynomial, we derive universal approximation results for neural networks within function spaces over non-compact subsets of a Euclidean space, e.g., weighted spaces, $L^p$-spaces, and (weighted) Sobolev spaces over unbounded domains, where the latter includes the approximation of the (weak) derivatives. Furthermore, we provide some dimension-independent rates for approximating a function with sufficiently regular and integrable Fourier transform by neural networks with non-polynomial activation function.

Updated: 2024-10-18 09:53:20

Domains: stat.ML,cs.LG,cs.NE,math.CA

Download: http://arxiv.org/abs/2410.14759v1

Evaluating the evaluators: Towards human-aligned metrics for missing markers reconstruction

Animation data is often obtained through optical motion capture systems, which utilize a multitude of cameras to establish the position of optical markers. However, system errors or occlusions can result in missing markers, the manual cleaning of which can be time-consuming. This has sparked interest in machine learning-based solutions for missing marker reconstruction in the academic community. Most academic papers utilize a simplistic mean square error as the main metric. In this paper, we show that this metric does not correlate with subjective perception of the fill quality. We introduce and evaluate a set of better-correlated metrics that can drive progress in the field.

Updated: 2024-10-18 09:44:35

Domains: cs.CV,cs.HC,cs.LG

Download: http://arxiv.org/abs/2410.14334v1

Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Language Model Agents

Some have criticised Generative AI Systems for replicating the familiar pathologies of already widely-deployed AI systems. Other critics highlight how they foreshadow vastly more powerful future systems, which might threaten humanity's survival. The first group says there is nothing new here; the other looks through the present to a perhaps distant horizon. In this paper, I instead pay attention to what makes these particular systems distinctive: both their remarkable scientific achievement, and the most likely and consequential ways in which they will change society over the next five to ten years. In particular, I explore the potential societal impacts and normative questions raised by the looming prospect of 'Language Model Agents', in which multimodal large language models (LLMs) form the executive centre of complex, tool-using AI systems that can take unsupervised sequences of actions towards some goal.

Updated: 2024-10-18 09:43:09

Domains: cs.CY,cs.AI

Download: http://arxiv.org/abs/2404.06750v2

Understanding Likelihood Over-optimisation in Direct Alignment Algorithms

Direct Alignment Algorithms (DAAs), such as Direct Preference Optimisation (DPO) and Identity Preference Optimisation (IPO), have emerged as alternatives to online Reinforcement Learning from Human Feedback (RLHF) algorithms such as Proximal Policy Optimisation (PPO) for aligning language models to human preferences, without the need for explicit reward modelling. These methods generally aim to increase the likelihood of generating better (preferred) completions while discouraging worse (non-preferred) ones, while staying close to the original model's behaviour. In this work, we explore the relationship between completion likelihood and model performance in state-of-the-art DAAs, and identify a critical issue of likelihood over-optimisation. Contrary to expectations, we find that higher likelihood of better completions and larger margins between better and worse completion likelihoods do not necessarily lead to better performance, and may even degrade it. Our analysis reveals that while higher likelihood correlates with better memorisation of factual knowledge patterns, a slightly lower completion likelihood tends to improve output diversity, thus leading to better generalisation to unseen scenarios. Moreover, we identify two key indicators that signal when over-optimised output diversity begins to harm performance: Decreasing Entropy over Top-k Tokens and Diminishing Top-k Probability Mass. Our experimental results validate that these indicators are reliable signs of declining performance under different regularisations, helping prevent over-optimisation and improve alignment with human preferences.

Updated: 2024-10-18 09:41:53

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.11677v2
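
One plausible reading of the paper's two indicators, computed from next-token logits; the choice of k, the toy logits, and the decision to take entropy over the renormalized top-k distribution are our assumptions.

    import torch

    def topk_diagnostics(logits: torch.Tensor, k: int = 10):
        probs = logits.softmax(dim=-1)
        topk = probs.topk(k, dim=-1).values              # (..., k)
        mass = topk.sum(dim=-1)                          # top-k probability mass
        renorm = topk / mass.unsqueeze(-1)               # renormalize over top-k
        entropy = -(renorm * renorm.log()).sum(dim=-1)   # entropy over top-k tokens
        return entropy, mass

    logits = torch.randn(4, 32000)                       # 4 positions, toy vocab
    entropy, mass = topk_diagnostics(logits)
    print(entropy, mass)   # both shrinking during training would signal trouble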

Fast proxy centers for Jeffreys centroids: The Jeffreys-Fisher-Rao and the inductive Gauss-Bregman centers

The symmetric Kullback-Leibler centroid, also called the Jeffreys centroid, of a set of mutually absolutely continuous probability distributions on a measure space provides a notion of centrality which has proven useful in many tasks including information retrieval, information fusion, and clustering in image, video and sound processing. However, the Jeffreys centroid is not available in closed form for sets of categorical or normal distributions, two widely used statistical models, and thus needs to be approximated numerically in practice. In this paper, we first propose the new Jeffreys-Fisher-Rao center, defined as the Fisher-Rao midpoint of the sided Kullback-Leibler centroids, as a plug-in replacement of the Jeffreys centroid. This Jeffreys-Fisher-Rao center admits a generic formula for uni-parameter exponential family distributions and a closed-form formula for categorical and normal distributions, matches exactly the Jeffreys centroid for same-mean normal distributions, and is experimentally observed in practice to be close to the Jeffreys centroid. Second, we define a new type of inductive center generalizing the principle of the Gauss arithmetic-geometric double sequence mean for pairs of densities of any given exponential family. This center is shown experimentally to approximate the Jeffreys centroid very well and is suggested for use when the Jeffreys-Fisher-Rao center is not available in closed form. Moreover, this Gauss-Bregman inductive center always converges and matches the Jeffreys centroid for sets of same-mean normal distributions. We report on our experiments demonstrating the use of the Jeffreys-Fisher-Rao and Gauss-Bregman centers instead of the Jeffreys centroid. Finally, we conclude this work by reinterpreting these fast proxy centers of Jeffreys centroids under the lens of dually flat spaces in information geometry.

Updated: 2024-10-18 09:37:38

Domains: cs.IT,cs.CV,cs.LG,math.IT

Download: http://arxiv.org/abs/2410.14326v1
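
The inductive Gauss-Bregman center generalizes Gauss's arithmetic-geometric double-sequence mean; the scalar AGM below only illustrates that inductive principle, not the paper's exponential-family construction.

    def agm(a: float, g: float, tol: float = 1e-12) -> float:
        """Iterate arithmetic and geometric means until they meet."""
        while abs(a - g) > tol:
            a, g = 0.5 * (a + g), (a * g) ** 0.5
        return a

    print(agm(1.0, 2.0))   # both sequences converge to the same limit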

Debiasing Mini-Batch Quadratics for Applications in Deep Learning

Quadratic approximations form a fundamental building block of machine learning methods. E.g., second-order optimizers try to find the Newton step into the minimum of a local quadratic proxy to the objective function; and the second-order approximation of a network's loss function can be used to quantify the uncertainty of its outputs via the Laplace approximation. When computations on the entire training set are intractable - typical for deep learning - the relevant quantities are computed on mini-batches. This, however, distorts and biases the shape of the associated stochastic quadratic approximations in an intricate way with detrimental effects on applications. In this paper, we (i) show that this bias introduces a systematic error, (ii) provide a theoretical explanation for it, (iii) explain its relevance for second-order optimization and uncertainty quantification via the Laplace approximation in deep learning, and (iv) develop and evaluate debiasing strategies.

Updated: 2024-10-18 09:37:05

Domains: cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.14325v1
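
A small numerical illustration of the bias in question: because matrix inversion is nonlinear, averaging mini-batch curvature inverses does not recover the inverse of the full-data curvature. The Gauss-Newton-style matrices and sizes are toy assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 5))
    H_full = X.T @ X / len(X)                 # full-data curvature proxy

    batch_inverses = [
        np.linalg.inv(b.T @ b / len(b))       # per-mini-batch inverse
        for b in np.split(X, 100)             # 100 batches of 100 samples
    ]
    avg_inv = np.mean(batch_inverses, axis=0)

    # systematic gap: E[inv(H_batch)] != inv(E[H_batch])
    print(np.linalg.norm(avg_inv - np.linalg.inv(H_full)))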

Interpreting Microbiome Relative Abundance Data Using Symbolic Regression

Understanding the complex interactions within the microbiome is crucial for developing effective diagnostic and therapeutic strategies. Traditional machine learning models often lack interpretability, which is essential for clinical and biological insights. This paper explores the application of symbolic regression (SR) to microbiome relative abundance data, with a focus on colorectal cancer (CRC). SR, known for its high interpretability, is compared against traditional machine learning models, e.g., random forest, gradient boosting decision trees. These models are evaluated based on performance metrics such as F1 score and accuracy. We utilize 71 studies encompassing, from various cohorts, over 10,000 samples across 749 species features. Our results indicate that SR not only competes reasonably well in terms of predictive performance, but also excels in model interpretability. SR provides explicit mathematical expressions that offer insights into the biological relationships within the microbiome, a crucial advantage for clinical and biological interpretation. Our experiments also show that SR can help understand complex models like XGBoost via knowledge distillation. To aid in reproducibility and further research, we have made the code openly available at https://github.com/swag2198/microbiome-symbolic-regression .

Updated: 2024-10-18 09:35:51

Domains: cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2410.16109v1
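
A hedged sketch of symbolic classification on relative-abundance features using the third-party gplearn library (the authors' actual pipeline lives at the repository linked above); the Dirichlet samples and label rule are synthetic stand-ins, not a CRC cohort.

    import numpy as np
    from gplearn.genetic import SymbolicClassifier

    rng = np.random.default_rng(0)
    X = rng.dirichlet(np.ones(20), size=200)        # 200 samples, 20 "species"
    y = (X[:, 0] + X[:, 3] > X[:, 7]).astype(int)   # synthetic label rule

    sr = SymbolicClassifier(population_size=500, generations=10, random_state=0)
    sr.fit(X, y)
    print(sr._program)   # an explicit formula, e.g. add(X0, sub(X3, X7))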

From Solitary Directives to Interactive Encouragement! LLM Secure Code Generation by Natural Language Prompting

Large Language Models (LLMs) have shown remarkable potential in code generation, making them increasingly important in the field. However, the security issues of generated code have not been fully addressed, and the usability of LLMs in code generation still requires further exploration. This work introduces SecCode, a framework that leverages an innovative interactive encouragement prompting (EP) technique for secure code generation with only NL prompts. This approach ensures that the prompts can be easily shared and understood by general users. SecCode functions through three stages: 1) Code Generation using NL Prompts; 2) Code Vulnerability Detection and Fixing, utilising our proposed encouragement prompting; 3) Vulnerability Cross-Checking and Code Security Refinement. These stages are executed in multiple interactive iterations to progressively enhance security. By using both proprietary LLMs (i.e., GPT-3.5 Turbo, GPT-4 and GPT-4o) and open-source LLMs (i.e., Llama 3.1 8B Instruct, DeepSeek Coder V2 Lite Instruct) evaluated on three benchmark datasets, extensive experimental results show that our proposed SecCode greatly outperforms compared baselines, generating secure code with a high vulnerability correction rate. For example, SecCode exhibits a high fix success rate of over 76% after running 5 automated EP interactive iterations and over 89% after running 10 automated EP interactive iterations. To the best of our knowledge, this work is the first to formulate secure code generation with NL prompts only. We have open-sourced our code and encourage the community to focus on secure code generation.

Updated: 2024-10-18 09:32:08

Domains: cs.CR,cs.PL,cs.SE

Download: http://arxiv.org/abs/2410.14321v1
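
A skeleton of the generate-scan-encourage loop described above; llm and scan are hypothetical stand-ins for a model call and a vulnerability detector, and the encouraging wording is our paraphrase rather than the paper's prompts.

    def secure_codegen(task, llm, scan, max_iters=5):
        code = llm(f"Write code for this task:\n{task}")
        for _ in range(max_iters):
            findings = scan(code)          # e.g. CWE hits from a SAST tool
            if not findings:
                break                      # cross-check passed
            code = llm(
                "Great progress! The code below still has these issues:\n"
                f"{findings}\nPlease fix them while keeping the functionality.\n"
                f"{code}"
            )
        return code

    fake_llm = lambda prompt: "print('hello')"   # trivial stubs for a dry run
    fake_scan = lambda code: []
    print(secure_codegen("greet the user", fake_llm, fake_scan))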

Not Sure Your Car Withstands Cyberwarfare

Data and derived information about target victims has always been key for successful attacks, both during historical wars and modern cyber wars. Ours turns out to be an era in which modern cars generate a plethora of data about their drivers, and such data could be extremely attractive for offenders. This paper seeks to assess how well modern cars protect their drivers' data. It pursues its goal at a requirement level by analysing the gaps of the privacy policies of chief automakers such as BMW and Mercedes with respect to the General Data Protection Regulation (GDPR). It is found that both brands are still imprecise about how they comply with a number of GDPR articles, hence compliance often results non-verifiable. Most importantly, while BMW exhibits slightly broader compliance, both brands still fail to comply with a number of relevant articles of the regulation. An interpretation of these findings is a non-negligible likelihood that your car may turn against you should cyberwarfare break out.

Updated: 2024-10-18 09:29:39

Domains: cs.CR,cs.ET

Download: http://arxiv.org/abs/2410.14320v1

Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective

Accurate interpretation and visualization of human instructions are crucial for text-to-image (T2I) synthesis. However, current models struggle to capture semantic variations from word order changes, and existing evaluations, relying on indirect metrics like text-image similarity, fail to reliably assess these challenges. This often obscures poor performance on complex or uncommon linguistic patterns by the focus on frequent word combinations. To address these deficiencies, we propose a novel metric called SemVarEffect and a benchmark named SemVarBench, designed to evaluate the causality between semantic variations in inputs and outputs in T2I synthesis. Semantic variations are achieved through two types of linguistic permutations, while avoiding easily predictable literal variations. Experiments reveal that CogView-3-Plus and Ideogram 2 performed best, achieving a score of 0.2/1. Semantic variations in object relations are less well understood than attributes, scoring 0.07/1 compared to 0.17-0.19/1. We found that cross-modal alignment in UNet or Transformers plays a crucial role in handling semantic variations, a factor previously overlooked by a focus on textual encoders. Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding. Our benchmark and code are available at https://github.com/zhuxiangru/SemVarBench .

Updated: 2024-10-18 09:26:46

Domains: cs.CL,cs.AI,cs.MM

Download: http://arxiv.org/abs/2410.10291v2

Optimizing importance weighting in the presence of sub-population shifts

A distribution shift between the training and test data can severely harm performance of machine learning models. Importance weighting addresses this issue by assigning different weights to data points during training. We argue that existing heuristics for determining the weights are suboptimal, as they neglect the increase of the variance of the estimated model due to the finite sample size of the training data. We interpret the optimal weights in terms of a bias-variance trade-off, and propose a bi-level optimization procedure in which the weights and model parameters are optimized simultaneously. We apply this optimization to existing importance weighting techniques for last-layer retraining of deep neural networks in the presence of sub-population shifts and show empirically that optimizing weights significantly improves generalization performance.

Updated: 2024-10-18 09:21:10

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2410.14315v1
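
A toy bi-level sketch in the spirit of the abstract: an inner SGD step on a group-weighted loss, and an outer multiplicative update that shifts weight toward the group with the larger loss. The two-group data, update rule, and step sizes are all illustrative assumptions, not the paper's procedure.

    import torch

    torch.manual_seed(0)
    Xa, ya = torch.randn(200, 2) + 2.0, torch.ones(200)   # majority group
    Xb, yb = torch.randn(50, 2) - 2.0, torch.zeros(50)    # minority group
    model = torch.nn.Linear(2, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    bce = torch.nn.BCEWithLogitsLoss()
    w = torch.tensor([0.5, 0.5])                          # weights on the simplex

    for _ in range(100):
        la = bce(model(Xa).squeeze(-1), ya)               # per-group losses
        lb = bce(model(Xb).squeeze(-1), yb)
        opt.zero_grad()
        (w[0] * la + w[1] * lb).backward()                # inner: weighted loss
        opt.step()
        with torch.no_grad():                             # outer: reweight groups
            w = torch.softmax(w.log() + 0.1 * torch.stack([la, lb]), dim=0)

    print(w)   # weight drifts toward the harder sub-population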

Game Theory with Simulation in the Presence of Unpredictable Randomisation

AI agents will be predictable in certain ways that traditional agents are not. Where and how can we leverage this predictability in order to improve social welfare? We study this question in a game-theoretic setting where one agent can pay a fixed cost to simulate the other in order to learn its mixed strategy. As a negative result, we prove that, in contrast to prior work on pure-strategy simulation, enabling mixed-strategy simulation may no longer lead to improved outcomes for both players in all so-called "generalised trust games". In fact, mixed-strategy simulation does not help in any game where the simulatee's action can depend on that of the simulator. We also show that, in general, deciding whether simulation introduces Pareto-improving Nash equilibria in a given game is NP-hard. As positive results, we establish that mixed-strategy simulation can improve social welfare if the simulator has the option to scale their level of trust, if the players face challenges with both trust and coordination, or if maintaining some level of privacy is essential for enabling cooperation.

Updated: 2024-10-18 09:17:18

Domains: cs.GT,cs.AI

Download: http://arxiv.org/abs/2410.14311v1

Transferring Tactile Data Across Sensors

Tactile perception is essential for human interaction with the environment and is becoming increasingly crucial in robotics. Tactile sensors like the BioTac mimic human fingertips and provide detailed interaction data. Despite its utility in applications like slip detection and object identification, this sensor is now deprecated, making many existing datasets obsolete. This article introduces a novel method for translating data between tactile sensors by exploiting sensor deformation information rather than output signals. We demonstrate the approach by translating BioTac signals into the DIGIT sensor. Our framework consists of three steps: first, converting signal data into corresponding 3D deformation meshes; second, translating these 3D deformation meshes from one sensor to another; and third, generating output images using the converted meshes. Our approach enables the continued use of valuable datasets.

Updated: 2024-10-18 09:15:47

Domains: cs.RO,cs.AI

Download: http://arxiv.org/abs/2410.14310v1

LoGU: Long-form Generation with Uncertainty Expressions

While Large Language Models (LLMs) demonstrate impressive capabilities, they still struggle with generating factually incorrect content (i.e., hallucinations). A promising approach to mitigate this issue is enabling models to express uncertainty when unsure. Previous research on uncertainty modeling has primarily focused on short-form QA, but real-world applications often require much longer responses. In this work, we introduce the task of Long-form Generation with Uncertainty (LoGU). We identify two key challenges: Uncertainty Suppression, where models hesitate to express uncertainty, and Uncertainty Misalignment, where models convey uncertainty inaccurately. To tackle these challenges, we propose a refinement-based data collection framework and a two-stage training pipeline. Our framework adopts a divide-and-conquer strategy, refining uncertainty based on atomic claims. The collected data are then used in training through supervised fine-tuning (SFT) and direct preference optimization (DPO) to enhance uncertainty expression. Extensive experiments on three long-form instruction following datasets show that our method significantly improves accuracy, reduces hallucinations, and maintains the comprehensiveness of responses.

Updated: 2024-10-18 09:15:35

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.14309v1

Mitigating Embedding Collapse in Diffusion Models for Categorical Data

Latent diffusion models have enabled continuous-state diffusion models to handle a variety of datasets, including categorical data. However, most methods rely on fixed pretrained embeddings, limiting the benefits of joint training with the diffusion model. While jointly learning the embedding (via reconstruction loss) and the latent diffusion model (via score matching loss) could enhance performance, our analysis shows that end-to-end training risks embedding collapse, degrading generation quality. To address this issue, we introduce CATDM, a continuous diffusion framework within the embedding space that stabilizes training. We propose a novel objective combining the joint embedding-diffusion variational lower bound with a Consistency-Matching (CM) regularizer, alongside a shifted cosine noise schedule and random dropping strategy. The CM regularizer ensures the recovery of the true data distribution. Experiments on benchmarks show that CATDM mitigates embedding collapse, yielding superior results on FFHQ, LSUN Churches, and LSUN Bedrooms. In particular, CATDM achieves an FID of 6.81 on ImageNet $256\times256$ with 50 steps. It outperforms non-autoregressive models in machine translation and is on a par with previous methods in text generation.

Updated: 2024-10-18 09:12:33

Domains: cs.LG

Download: http://arxiv.org/abs/2410.14758v1

GLANCE: Global Actions in a Nutshell for Counterfactual Explainability

The widespread deployment of machine learning systems in critical real-world decision-making applications has highlighted the urgent need for counterfactual explainability methods that operate effectively. Global counterfactual explanations, expressed as actions to offer recourse, aim to provide succinct explanations and insights applicable to large population subgroups. Effectiveness is measured by the fraction of the population that is provided recourse, ensuring that the actions benefit as many individuals as possible. Keeping the cost of actions low ensures the proposed recourse actions remain practical and actionable. Limiting the number of actions that provide global counterfactuals is essential to maximize interpretability. The primary challenge, therefore, is balancing these trade-offs, i.e., maximizing effectiveness, minimizing cost, while maintaining a small number of actions. We introduce GLANCE, a versatile and adaptive framework, comprising two algorithms, that allows the careful balancing of the trade-offs among the three key objectives, with the size objective functioning as a tunable parameter to keep the actions few and easy to interpret. C-GLANCE employs a clustering approach that considers both the feature space and the space of counterfactual actions, thereby accounting for the distribution of points in a way that aligns with the structure of the model. T-GLANCE provides additional features to enhance flexibility. It employs a tree-based approach, that allows users to specify split features, to build a decision tree with a single counterfactual action at each node that can be used as a subgroup policy. Our extensive experimental evaluation demonstrates that our method consistently shows greater robustness and performance compared to existing methods across various datasets and models.

Updated: 2024-10-18 09:05:18

Domains: cs.LG

Download: http://arxiv.org/abs/2405.18921v2

Dating ancient manuscripts using radiocarbon and AI-based writing style analysis

Determining the chronology of ancient handwritten manuscripts is essential for reconstructing the evolution of ideas. For the Dead Sea Scrolls, this is particularly important. However, there is an almost complete lack of date-bearing manuscripts evenly distributed across the timeline and written in similar scripts available for palaeographic comparison. Here, we present Enoch, a state-of-the-art AI-based date-prediction model, trained on the basis of new radiocarbon-dated samples of the scrolls. Enoch uses established handwriting-style descriptors and applies Bayesian ridge regression. The challenge of this study is that the number of radiocarbon-dated manuscripts is small, while current machine learning requires an abundance of training data. We show that by using combined angular and allographic writing style feature vectors and applying Bayesian ridge regression, Enoch could predict the radiocarbon-based dates from style, supported by leave-one-out validation, with varied MAEs of 27.9 to 30.7 years relative to the radiocarbon dating. Enoch was then used to estimate the dates of 135 unseen manuscripts, revealing that 79 per cent of the samples were considered 'realistic' upon palaeographic post-hoc evaluation. We present a new chronology of the scrolls. The radiocarbon ranges and Enoch's style-based predictions are often older than the traditionally assumed palaeographic estimates. In the range of 300-50 BCE, Enoch's date prediction provides an improved granularity. The study is in line with current developments in multimodal machine-learning techniques, and the methods can be used for date prediction in other partially-dated manuscript collections. This research shows how Enoch's quantitative, probability-based approach can be a tool for palaeographers and historians, re-dating ancient Jewish key texts and contributing to current debates on Jewish and Christian origins.

Updated: 2024-10-18 08:57:30

Domains: cs.DL,cs.AI,cs.CL,cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.12013v2
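
The modelling recipe named in the abstract, Bayesian ridge regression scored with leave-one-out validation, can be sketched with scikit-learn; the style features and dates below are synthetic placeholders, not scroll data.

    import numpy as np
    from sklearn.linear_model import BayesianRidge
    from sklearn.model_selection import LeaveOneOut

    rng = np.random.default_rng(0)
    X = rng.normal(size=(24, 40))          # 24 dated samples, 40 style features
    y = rng.uniform(-300, 70, size=24)     # stand-in radiocarbon dates

    errors = []
    for train, test in LeaveOneOut().split(X):
        model = BayesianRidge().fit(X[train], y[train])
        errors.append(abs(model.predict(X[test])[0] - y[test][0]))
    print(f"LOO MAE: {np.mean(errors):.1f} years")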

MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

Perceiving and generating diverse modalities are crucial for AI models to effectively learn from and engage with real-world signals, necessitating reliable evaluations for their development. We identify two major issues in current evaluations: (1) inconsistent standards, shaped by different communities with varying protocols and maturity levels; and (2) significant query, grading, and generalization biases. To address these, we introduce MixEval-X, the first any-to-any, real-world benchmark designed to optimize and standardize evaluations across diverse input and output modalities. We propose multi-modal benchmark mixture and adaptation-rectification pipelines to reconstruct real-world task distributions, ensuring evaluations generalize effectively to real-world use cases. Extensive meta-evaluations show our approach effectively aligns benchmark samples with real-world task distributions. Meanwhile, MixEval-X's model rankings correlate strongly with that of crowd-sourced real-world evaluations (up to 0.98) while being much more efficient. We provide comprehensive leaderboards to rerank existing models and organizations and offer insights to enhance understanding of multi-modal evaluations and inform future research.

Updated: 2024-10-18 08:56:52

Domains: cs.AI,cs.LG,cs.MM

Download: http://arxiv.org/abs/2410.13754v2

Unraveling and Mitigating Retriever Inconsistencies in Retrieval-Augmented Large Language Models

Although Retrieval-Augmented Large Language Models (RALMs) demonstrate their superiority in terms of factuality, they do not consistently outperform the original retrieval-free Language Models (LMs). Our experiments reveal that this example-level performance inconsistency exists not only between retrieval-augmented and retrieval-free LM but also among different retrievers. To understand this phenomenon, we investigate the degeneration behavior of RALMs and theoretically decompose it into four categories. Further analysis based on our decomposition reveals that the innate difference in knowledge sources and the unpredictable degeneration of the reader model contribute most to the inconsistency. Drawing from our analysis, we introduce Ensemble of Retrievers (EoR), a trainable framework that can adaptively retrieve from different knowledge sources and effectively decrease unpredictable reader errors. Our experiments on Open Domain Question Answering show that EoR substantially improves performance over the RALM with a single retriever by considerably reducing inconsistent behaviors.

Updated: 2024-10-18 08:54:37

Domains: cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.20680v4

SwaQuAD-24: QA Benchmark Dataset in Swahili

This paper proposes the creation of a Swahili Question Answering (QA) benchmark dataset, aimed at addressing the underrepresentation of Swahili in natural language processing (NLP). Drawing from established benchmarks like SQuAD, GLUE, KenSwQuAD, and KLUE, the dataset will focus on providing high-quality, annotated question-answer pairs that capture the linguistic diversity and complexity of Swahili. The dataset is designed to support a variety of applications, including machine translation, information retrieval, and social services like healthcare chatbots. Ethical considerations, such as data privacy, bias mitigation, and inclusivity, are central to the dataset development. Additionally, the paper outlines future expansion plans to include domain-specific content, multimodal integration, and broader crowdsourcing efforts. The Swahili QA dataset aims to foster technological innovation in East Africa and provide an essential resource for NLP research and applications in low-resource languages.

Updated: 2024-10-18 08:49:24

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.14289v1

FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems

Federated Learning (FL) has become a viable technique for realizing privacy-enhancing distributed deep learning on the network edge. Heterogeneous hardware, unreliable client devices, and energy constraints often characterize edge computing systems. In this paper, we propose FLEdge, which complements existing FL benchmarks by enabling a systematic evaluation of client capabilities. We focus on computational and communication bottlenecks, client behavior, and data security implications. Our experiments with models varying from 14K to 80M trainable parameters are carried out on dedicated hardware with emulated network characteristics and client behavior. We find that state-of-the-art embedded hardware has significant memory bottlenecks, leading to 4x longer processing times than on modern data center GPUs.

Updated: 2024-10-18 08:44:31

Domains: cs.LG,cs.DC,I.2.11; C.2.4; C.4; D.2.8

Download: http://arxiv.org/abs/2306.05172v3

Advanced Underwater Image Quality Enhancement via Hybrid Super-Resolution Convolutional Neural Networks and Multi-Scale Retinex-Based Defogging Techniques

This study addresses underwater image degradation due to light scattering, absorption, and fog-like particles, which lead to low resolution and poor visibility. We suggest a sophisticated hybrid strategy that combines Multi-Scale Retinex (MSR) defogging methods with Super-Resolution Convolutional Neural Networks (SRCNN) to address these problems. The Retinex algorithm mimics human visual perception to reduce uneven lighting and fogging, while the SRCNN component improves the spatial resolution of underwater photos. Through the combination of these methods, we are able to enhance the clarity, contrast, and colour restoration of underwater images, offering a reliable way to improve image quality in difficult underwater conditions. The research conducts extensive experiments on real-world underwater datasets to further illustrate the efficacy of the suggested approach. In terms of sharpness, visibility, and feature retention, quantitative evaluation using metrics like the Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR) demonstrates notable advances over conventional techniques. In real-time underwater applications like marine exploration, underwater robotics, and autonomous underwater vehicles, where clear and high-resolution imaging is crucial for operational success, the combination of deep learning and conventional image processing techniques offers a computationally efficient framework with superior results.

Updated: 2024-10-18 08:40:26

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.14285v1
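
A minimal multi-scale Retinex (MSR) pass, the defogging half of the hybrid pipeline; the Gaussian scales and equal weights are common defaults, not necessarily the paper's settings, and the input is a random stand-in image.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def multi_scale_retinex(img, sigmas=(15, 80, 250)):
        img = img.astype(np.float64) + 1.0          # avoid log(0)
        out = np.zeros_like(img)
        for sigma in sigmas:                        # average single-scale Retinex
            out += np.log(img) - np.log(gaussian_filter(img, sigma) + 1.0)
        out /= len(sigmas)
        out -= out.min()                            # stretch back to 8-bit range
        return (255 * out / (out.max() + 1e-8)).astype(np.uint8)

    demo = (np.random.rand(64, 64) * 255).astype(np.uint8)
    print(multi_scale_retinex(demo).shape)          # (64, 64)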

PTR: A Pre-trained Language Model for Trajectory Recovery

Spatiotemporal trajectory data is vital for web-of-things services and is extensively collected and analyzed by web-based hardware and platforms. However, issues such as service interruptions and network instability often lead to sparsely recorded trajectories, resulting in a loss of detailed movement data. As a result, recovering these trajectories to restore missing information becomes essential. Despite progress, several challenges remain unresolved. First, the lack of large-scale dense trajectory data hampers the performance of existing deep learning methods, which rely heavily on abundant data for supervised training. Second, current methods struggle to generalize across sparse trajectories with varying sampling intervals, necessitating separate re-training for each interval and increasing computational costs. Third, external factors crucial for the recovery of missing points are not fully incorporated. To address these challenges, we propose a framework called PTR. This framework mitigates the issue of limited dense trajectory data by leveraging the capabilities of pre-trained language models (PLMs). PTR incorporates an explicit trajectory prompt and is trained on datasets with multiple sampling intervals, enabling it to generalize effectively across different intervals in sparse trajectories. To capture external factors, we introduce an implicit trajectory prompt that models road conditions, providing richer information for recovering missing points. Additionally, we present a trajectory embedder that encodes trajectory points and transforms the embeddings of both observed and missing points into a format comprehensible to PLMs. Experimental results on two public trajectory datasets with three sampling intervals demonstrate the efficacy and scalability of PTR.

Updated: 2024-10-18 08:38:12

Domains: cs.LG

Download: http://arxiv.org/abs/2410.14281v1

Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models

Modeling trajectory data with generic-purpose dense representations has become a prevalent paradigm for various downstream applications, such as trajectory classification, travel time estimation and similarity computation. However, existing methods typically rely on trajectories from a single spatial view, limiting their ability to capture the rich contextual information that is crucial for gaining deeper insights into movement patterns across different geospatial contexts. To this end, we propose MVTraj, a novel multi-view modeling method for trajectory representation learning. MVTraj integrates diverse contextual knowledge, from GPS to road network and points-of-interest to provide a more comprehensive understanding of trajectory data. To align the learning process across multiple views, we utilize GPS trajectories as a bridge and employ self-supervised pretext tasks to capture and distinguish movement patterns across different spatial views. Following this, we treat trajectories from different views as distinct modalities and apply a hierarchical cross-modal interaction module to fuse the representations, thereby enriching the knowledge derived from multiple sources. Extensive experiments on real-world datasets demonstrate that MVTraj significantly outperforms existing baselines in tasks associated with various spatial views, validating its effectiveness and practical utility in spatio-temporal modeling.

Updated: 2024-10-18 08:33:19

Domains: cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.13196v2

REEF: Representation Encoding Fingerprints for Large Language Models

Protecting the intellectual property of open-source Large Language Models (LLMs) is very important, because training LLMs costs extensive computational resources and data. Therefore, model owners and third parties need to identify whether a suspect model is a subsequent development of the victim model. To this end, we propose a training-free REEF to identify the relationship between the suspect and victim models from the perspective of LLMs' feature representations. Specifically, REEF computes and compares the centered kernel alignment similarity between the representations of a suspect model and a victim model on the same samples. This training-free REEF does not impair the model's general capabilities and is robust to sequential fine-tuning, pruning, model merging, and permutations. In this way, REEF provides a simple and effective way for third parties and models' owners to protect LLMs' intellectual property together. The code is available at https://github.com/tmylla/REEF.

Updated: 2024-10-18 08:27:02

标题: REEF:大型语言模型的表示编码指纹

摘要: 保护开源大型语言模型(LLMs)的知识产权非常重要,因为训练LLMs需要大量的计算资源和数据。因此,模型所有者和第三方需要确定可疑模型是否是受害模型的后续开发。为此,我们提出了一种无需训练的REEF来从LLMs特征表示的角度识别可疑模型和受害模型之间的关系。具体来说,REEF计算并比较了可疑模型和受害模型在相同样本上表示之间的中心核对齐相似度。这种无需训练的REEF不会影响模型的通用能力,对于顺序微调、剪枝、模型合并和排列具有鲁棒性。通过这种方式,REEF为第三方和模型所有者提供了一种简单有效的方式来共同保护LLMs的知识产权。该代码可在https://github.com/tmylla/REEF获得。

更新时间: 2024-10-18 08:27:02

领域: cs.CL,cs.AI,cs.CR

下载: http://arxiv.org/abs/2410.14273v1

SatSwinMAE: Efficient Autoencoding for Multiscale Time-series Satellite Imagery

Recent advancements in foundation models have significantly impacted various fields, including natural language processing, computer vision, and multi-modal tasks. One area that stands to benefit greatly is Earth observation, where these models can efficiently process large-scale, unlabeled geospatial data. In this work we extend the SwinMAE model to integrate temporal information for satellite time-series data. The architecture employs a hierarchical 3D Masked Autoencoder (MAE) with Video Swin Transformer blocks to effectively capture multi-scale spatio-temporal dependencies in satellite imagery. To enhance transfer learning, we incorporate both encoder and decoder pretrained weights, along with skip connections to preserve scale-specific information. This forms an architecture similar to SwinUNet with an additional temporal component. Our approach shows significant performance improvements over existing state-of-the-art foundation models for all the evaluated downstream tasks: land cover segmentation, building density prediction, flood mapping, wildfire scar mapping and multi-temporal crop segmentation. Particularly, in the land cover segmentation task of the PhilEO Bench dataset, it outperforms other geospatial foundation models with a 10.4% higher accuracy.

Updated: 2024-10-18 08:25:52

标题: SatSwinMAE:多尺度时间序列卫星图像的高效自编码

摘要: 最近基础模型的进展显著影响了各个领域,包括自然语言处理、计算机视觉和多模态任务。一个受益匪浅的领域是地球观测,其中这些模型可以高效处理大规模、未标记的地理空间数据。在这项工作中,我们将SwinMAE模型扩展到集成卫星时间序列数据的时间信息。该架构采用了具有视频Swin Transformer块的分层3D蒙版自动编码器(MAE),以有效捕获卫星图像中的多尺度时空依赖关系。为了增强迁移学习,我们结合了预训练的编码器和解码器权重,以及跳过连接以保留特定比例的信息。这形成了一个类似于SwinUNet的架构,带有额外的时间组件。我们的方法在所有评估的下游任务中都显示出明显的性能改进:土地覆盖分割、建筑密度预测、洪水映射、野火烧伤映射和多时相作物分割。特别是在PhilEO Bench数据集的土地覆盖分割任务中,它表现优于其他地理空间基础模型,准确率提高了10.4%。

更新时间: 2024-10-18 08:25:52

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.02512v2

Stochastic Quasi-Newton Optimization in Large Dimensions Including Deep Network Training

We propose a new stochastic optimizer for non-convex and possibly non-smooth objective functions typically defined over large-dimensional design spaces. Towards this, we aim to bridge noise-assisted global search and faster local convergence, the latter being the characteristic feature of a Newton-like search. Our specific scheme -- FINDER (Filtering Informed Newton-like and Derivative-free Evolutionary Recursion) -- exploits the nonlinear stochastic filtering equations to arrive at a derivative-free update that resembles a Newton search employing the inverse Hessian of the objective function. Following certain simplifications of the update to enable linear scaling with dimension, and a few other enhancements, we apply FINDER to a range of problems, from IEEE benchmark objective functions to archetypal data-driven problems in deep networks and on to certain cases of physics-informed deep networks. The performance of the new method vis-à-vis the well-known Adam and a few others evidences its promise for large-dimensional optimization problems of practical interest.

Updated: 2024-10-18 08:25:28

标题: 大维度中包括深度网络训练的随机拟牛顿优化

摘要: 我们提出了一种新的随机优化器,用于非凸和可能非光滑的目标函数,通常定义在大维设计空间上。为此,我们试图将噪声辅助的全局搜索与更快的局部收敛结合起来,后者是类似牛顿搜索的特征。我们的具体方案——缩写为FINDER(Filtering Informed Newton-like and Derivative-free Evolutionary Recursion),利用非线性随机滤波方程得出一种无导数更新,类似于使用目标函数的逆Hessian的牛顿搜索。在对更新进行一些简化以实现与维度的线性缩放和其他一些增强之后,我们将FINDER应用于一系列问题,从一些IEEE基准目标函数开始,到一些深度网络中的典型数据驱动问题,再到一些物理信息化的深度网络中的特定情况。新方法与众所周知的Adam和其他几种方法的性能比较表明,它在实际感兴趣的大维优化问题上具有潜力和前景。

更新时间: 2024-10-18 08:25:28

领域: cs.LG

下载: http://arxiv.org/abs/2410.14270v1

On time series clustering with k-means

There is a long history of research into time series clustering (TSCL) using distance-based partitional clustering. Many of the most popular algorithms adapt k-means (also known as Lloyd's algorithm) to exploit time dependencies in the data by specifying a time series distance function. However, these algorithms are often presented with k-means configured in various ways, altering key parameters such as the initialisation strategy. This variability makes it difficult to compare studies because k-means is known to be highly sensitive to its configuration. To address this, we propose a standard Lloyd's-based model for TSCL that adopts an end-to-end approach, incorporating a specialised distance function not only in the assignment step but also in the initialisation and stopping criteria. By doing so, we create a unified structure for comparing seven popular Lloyd's-based TSCL algorithms. This common framework enables us to more easily attribute differences in clustering performance to the distance function itself, rather than variations in the k-means configuration.
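
To make the end-to-end idea concrete, here is a minimal sketch of a Lloyd's loop in which a single pluggable time-series distance drives assignment and stopping; the random initialisation and arithmetic-mean averaging step are simplifications (a distance-aware initialiser and an elastic-distance barycentre would slot into the same places).

import numpy as np

def lloyds_tscl(series, k, dist, max_iter=50, tol=1e-4, seed=0):
    # series: (n, T) array of equal-length series; dist(a, b) -> float is the TS distance.
    rng = np.random.default_rng(seed)
    centers = series[rng.choice(len(series), size=k, replace=False)].copy()
    prev = np.inf
    for _ in range(max_iter):
        d = np.array([[dist(s, c) for c in centers] for s in series])
        labels = d.argmin(axis=1)        # assignment under the chosen distance
        inertia = d.min(axis=1).sum()
        if prev - inertia < tol:         # stopping criterion in the same distance
            break
        prev = inertia
        for j in range(k):               # simplistic averaging step
            if np.any(labels == j):
                centers[j] = series[labels == j].mean(axis=0)
    return labels

X = np.random.default_rng(1).normal(size=(30, 20))
print(lloyds_tscl(X, k=3, dist=lambda a, b: float(np.abs(a - b).sum())))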

Updated: 2024-10-18 08:24:07

标题: 关于使用k-means进行时间序列聚类

摘要: 使用基于距离的划分聚类进行时间序列聚类的研究由来已久。许多最流行的算法通过指定时间序列距离函数来调整k均值(也称为劳埃德算法),以利用数据中的时间依赖性。然而,这些算法通常使用以不同方式配置的k均值,改变初始化策略等关键参数。这种差异使得研究之间难以比较,因为已知k均值对其配置非常敏感。为了解决这个问题,我们提出了一个标准的基于劳埃德算法的时间序列聚类模型,采用端到端的方法,不仅在分配步骤中、还在初始化和停止准则中使用专门的距离函数。通过这样做,我们为比较七种流行的基于劳埃德算法的时间序列聚类算法创建了一个统一的结构。这个共同的框架使我们能够更容易地将聚类性能的差异归因于距离函数本身,而不是k均值配置的变化。

更新时间: 2024-10-18 08:24:07

领域: cs.LG

下载: http://arxiv.org/abs/2410.14269v1

MoDification: Mixture of Depths Made Easy

Long-context efficiency has recently become a trending topic in serving large language models (LLMs). And mixture of depths (MoD) is proposed as a perfect fit to bring down both latency and memory. In this paper, however, we discover that MoD can barely transform existing LLMs without costly training over an extensive number of tokens. To enable the transformations from any LLMs to MoD ones, we showcase top-k operator in MoD should be promoted to threshold-p operator, and refinement to architecture and data should also be crafted along. All these designs form our method termed MoDification. Through a comprehensive set of experiments covering model scales from 3B to 70B, we exhibit MoDification strikes an excellent balance between efficiency and effectiveness. MoDification can achieve up to ~1.2x speedup in latency and ~1.8x reduction in memory compared to original LLMs especially in long-context applications.
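
A PyTorch sketch of the operator swap the abstract refers to, assuming a learned router that emits one scalar score per token and a `block` standing in for the expensive sub-layer; the sigmoid-then-threshold form is one plausible reading, not necessarily the paper's exact formulation.

import torch

def topk_route(x, scores, block, k):
    # Vanilla MoD: a fixed budget of exactly k tokens per sequence enters the block.
    idx = scores.topk(k, dim=1).indices.unsqueeze(-1).expand(-1, -1, x.size(-1))
    out = x.clone()
    out.scatter_(1, idx, block(torch.gather(x, 1, idx)))
    return out

def threshold_p_route(x, scores, block, p=0.5):
    # Threshold-p: any token whose score clears p is processed, removing the rigid
    # per-sequence budget. For clarity the sketch runs block on all tokens and then
    # selects; a real implementation would gather the kept tokens first.
    keep = (scores.sigmoid() > p).unsqueeze(-1)
    return torch.where(keep, block(x), x)

B, S, D = 2, 8, 16
x, scores = torch.randn(B, S, D), torch.randn(B, S)
block = torch.nn.Linear(D, D)
print(threshold_p_route(x, scores, block).shape)  # torch.Size([2, 8, 16])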

Updated: 2024-10-18 08:22:07

标题: MoDification: 简化深度混合

摘要: Long-context efficiency has recently become a popular topic in the optimization of large language models (LLMs). The mixture of depths (MoD) has been proposed as a suitable solution to reduce both latency and memory usage. However, in this study, it was found that implementing MoD into existing LLMs requires extensive training on a large number of tokens. To facilitate the transformation of any LLM into a MoD model, we propose promoting the top-k operator in MoD to a threshold-p operator, and suggest refining the architecture and data accordingly. These modifications form our approach, termed MoDification. Through a series of experiments ranging from 3B to 70B model scales, we demonstrate that MoDification achieves a good balance between efficiency and effectiveness. Compared to original LLMs, MoDification can improve latency by up to 1.2 times and reduce memory usage by up to 1.8 times, particularly in long-context applications.

更新时间: 2024-10-18 08:22:07

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.14268v1

Graph Neural Network Enhanced Retrieval for Question Answering of LLMs

Retrieval augmented generation has revolutionized large language model (LLM) outputs by providing factual supports. Nevertheless, it struggles to capture all the necessary knowledge for complex reasoning questions. Existing retrieval methods typically divide reference documents into passages, treating them in isolation. These passages, however, are often interrelated, such as passages that are contiguous or share the same keywords. Therefore, it is crucial to recognize such relatedness for enhancing the retrieval process. In this paper, we propose a novel retrieval method, called GNN-Ret, which leverages graph neural networks (GNNs) to enhance retrieval by exploiting the relatedness between passages. Specifically, we first construct a graph of passages by connecting passages that are structure-related or keyword-related. A graph neural network (GNN) is then leveraged to exploit the relationships between passages and improve the retrieval of supporting passages. Furthermore, we extend our method to handle multi-hop reasoning questions using a recurrent graph neural network (RGNN), named RGNN-Ret. At each step, RGNN-Ret integrates the graphs of passages from previous steps, thereby enhancing the retrieval of supporting passages. Extensive experiments on benchmark datasets demonstrate that GNN-Ret achieves higher accuracy for question answering with a single query of LLMs than strong baselines that require multiple queries, and RGNN-Ret further improves accuracy and achieves state-of-the-art performance, with up to 10.4% accuracy improvement on the 2WikiMQA dataset.
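
The passage-graph construction step can be sketched as follows, assuming passages arrive in document order with a precomputed keyword set per passage; networkx is used here purely to illustrate the two edge types named above.

import networkx as nx

def build_passage_graph(passages, keywords):
    # keywords[i] is the set of keywords for passages[i].
    g = nx.Graph()
    g.add_nodes_from(range(len(passages)))
    for i in range(len(passages)):
        if i + 1 < len(passages):
            g.add_edge(i, i + 1, relation="contiguous")   # structure-related edge
        for j in range(i + 1, len(passages)):
            if keywords[i] & keywords[j]:
                g.add_edge(i, j, relation="keyword")      # keyword-related edge
    return g

passages = ["Paris is the capital of France.",
            "France borders Spain.",
            "Spain's capital is Madrid."]
keywords = [{"paris", "france"}, {"france", "spain"}, {"spain", "madrid"}]
print(build_passage_graph(passages, keywords).edges(data=True))

A GNN message-passing pass over this graph is what lets evidence in one passage raise the retrieval score of its related neighbours.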

Updated: 2024-10-18 08:20:38

标题: 图神经网络增强的LLMs问题回答检索

摘要: 检索增强生成已经通过提供事实支持彻底改变了大型语言模型(LLM)的结果。然而,它在捕捉复杂推理问题所需的所有必要知识方面存在困难。现有的检索方法通常将参考文献分成段落,并将它们视为孤立的。然而,这些段落通常是相互关联的,比如连续的段落或共享相同关键字的段落。因此,识别这种相关性以增强检索过程是至关重要的。在本文中,我们提出了一种新的检索方法,称为GNN-Ret,它利用图神经网络(GNNs)来增强检索,通过利用段落之间的相关性。具体来说,我们首先构建一个段落图,通过连接结构相关或关键字相关的段落。然后利用图神经网络(GNN)来利用段落之间的关系,并提高支持段落的检索。此外,我们将我们的方法扩展到使用递归图神经网络(RGNN)处理多跳推理问题,称为RGNN-Ret。在每一步,RGNN-Ret整合先前步骤的段落图,从而增强对支持段落的检索。对基准数据集进行的广泛实验表明,GNN-Ret在仅使用LLMs的单次查询进行问答时比需要多次查询的强基线实现更高的准确性,而RGNN-Ret进一步提高了准确性,并实现了最先进的性能,在2WikiMQA数据集上准确性提高了高达10.4%。

更新时间: 2024-10-18 08:20:38

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2406.06572v2

The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency, and Usability in Artificial Intelligence

Generative artificial intelligence (AI) offers numerous opportunities for research and innovation, but its commercialization has raised concerns about the transparency and safety of frontier AI models. Most models lack the necessary components for full understanding, auditing, and reproducibility, and some model producers use restrictive licenses whilst claiming that their models are "open source". To address these concerns, we introduce the Model Openness Framework (MOF), a three-tiered ranked classification system that rates machine learning models based on their completeness and openness, following open science principles. For each MOF class, we specify code, data, and documentation components of the model development lifecycle that must be released and under which open licenses. In addition, the Model Openness Tool (MOT) provides a user-friendly reference implementation to evaluate the openness and completeness of models against the MOF classification system. Together, the MOF and MOT provide timely practical guidance for (i) model producers to enhance the openness and completeness of their publicly-released models, and (ii) model consumers to identify open models and their constituent components that can be permissively used, studied, modified, and redistributed. Through the MOF, we seek to establish completeness and openness as core tenets of responsible AI research and development, and to promote best practices in the burgeoning open AI ecosystem.

Updated: 2024-10-18 08:20:22

标题: 模型开放框架:促进人工智能的可再现性、透明度和可用性的完整性和开放性

摘要: 生成式人工智能(AI)为研究和创新提供了许多机会,但其商业化引发了有关前沿AI模型透明度和安全性的担忧。大多数模型缺乏完全理解、审计和可重复性所需的组件,一些模型生产者使用限制性许可证,同时声称他们的模型是“开源”的。为了解决这些问题,我们引入了模型开放性框架(MOF),这是一个三层排名分类系统,根据其完整性和开放性对机器学习模型进行评级,遵循开放科学原则。对于每个MOF类别,我们指定了必须发布的模型开发生命周期的代码、数据和文档组件,并规定了开放许可证。此外,模型开放性工具(MOT)提供了一个用户友好的参考实现,用于评估模型的开放性和完整性,以符合MOF分类系统。MOF和MOT共同为(i)模型生产者提供及时的实用指导,以增强其公开发布的模型的开放性和完整性,以及(ii)模型消费者识别可以被允许使用、研究、修改和重新分发的开放模型及其组成部分提供了帮助。通过MOF,我们希望将完整性和开放性确立为负责任的AI研究和开发的核心原则,并在蓬勃发展的开放AI生态系统中推广最佳实践。

更新时间: 2024-10-18 08:20:22

领域: cs.LG,cs.AI,cs.CY,cs.SE

下载: http://arxiv.org/abs/2403.13784v6

TotalVibeSegmentator: Full Body MRI Segmentation for the NAKO and UK Biobank

Objectives: To present a publicly available torso segmentation network for large epidemiology datasets on volumetric interpolated breath-hold examination (VIBE) images. Materials & Methods: We extracted preliminary segmentations from TotalSegmentator, spine, and body composition networks for VIBE images, then improved them iteratively and retrained an nnUNet network. Using subsets of NAKO (85 subjects) and UK Biobank (16 subjects), we evaluated with Dice score on a holdout set (12 subjects) and against an existing organ segmentation approach (1000 subjects), generating 71 semantic segmentation types for VIBE images. We provide an additional network that segments 22 individual vertebra types. Results: We achieved an average Dice score of 0.89 ± 0.07 over all 71 segmentation labels. We scored > 0.90 Dice on the abdominal organs, except for the pancreas with a Dice of 0.70. Conclusion: Our work offers a detailed and refined publicly available full torso segmentation on VIBE images.

Updated: 2024-10-18 08:18:36

标题: TotalVibeSegmentator:NAKO和UK Biobank的全身MRI分割

摘要: 目标:提供一个公开可用的躯干分割网络,用于体积插值呼吸保持检查(VIBE)图像的大型流行病学数据集。材料和方法:我们从TotalSegmentator、脊柱和身体组成网络提取了VIBE图像的初步分割,然后通过迭代改进并重新训练了一个nnUNet网络。使用NAKO(85名受试者)和英国生物库(16名受试者)的子集,我们在保留集(12名受试者)和现有器官分割方法(1000名受试者)上评估了Dice分数,生成了71种VIBE图像的语义分割类型。我们还提供了一个用于椎骨分段的附加网络,包括22种不同类型的椎骨。结果:我们在所有71个分割标签上取得了平均Dice分数为0.89+-0.07。除了胰腺的Dice为0.70外,我们在腹部器官上的Dice分数均高于0.90。结论:我们的工作提供了一个详细和精细的公开可用的VIBE图像全躯干分割。

更新时间: 2024-10-18 08:18:36

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.00125v3

Boosting Graph Pooling with Persistent Homology

Recently, there has been an emerging trend to integrate persistent homology (PH) into graph neural networks (GNNs) to enrich expressive power. However, naively plugging PH features into GNN layers always results in marginal improvement with low interpretability. In this paper, we investigate a novel mechanism for injecting global topological invariance into pooling layers using PH, motivated by the observation that filtration operation in PH naturally aligns graph pooling in a cut-off manner. In this fashion, message passing in the coarsened graph acts along persistent pooled topology, leading to improved performance. Experimentally, we apply our mechanism to a collection of graph pooling methods and observe consistent and substantial performance gain over several popular datasets, demonstrating its wide applicability and flexibility.

Updated: 2024-10-18 08:09:14

标题: 使用持续同调加强图池化

摘要: 最近,有一个新兴趋势是将持久同调(PH)整合到图神经网络(GNN)中,以丰富表达能力。然而,简单地将PH特征插入GNN层总是导致较低的解释性的边际改进。在本文中,我们研究了一种新颖的机制,利用PH将全局拓扑不变性注入到汇聚层中,受到PH中的过滤操作自然地将图汇聚与截断方式对齐的观察的启发。通过这种方式,在粗化后的图中进行消息传递沿着持续汇聚的拓扑结构,从而提高性能。在实验中,我们将我们的机制应用于一系列图汇聚方法,并观察到在几个流行数据集上一致且显著的性能提升,展示了其广泛适用性和灵活性。

更新时间: 2024-10-18 08:09:14

领域: cs.LG,math.AT

下载: http://arxiv.org/abs/2402.16346v3

Revisiting SLO and Goodput Metrics in LLM Serving

Large language models (LLMs) have achieved remarkable performance and are widely deployed in various applications, while the serving of LLM inference has raised concerns about user experience and serving throughput. Accordingly, service level objectives (SLOs) and goodput (the number of requests that meet SLOs per second) are introduced to evaluate the performance of LLM serving. However, existing metrics fail to capture the nature of user experience. We observe two ridiculous phenomena in existing metrics: 1) delaying token delivery can smooth the tail time between tokens (tail TBT) of a request, and 2) dropping a request that fails to meet the SLOs midway can improve goodput. In this paper, we revisit SLO and goodput metrics in LLM serving and propose a unified metric framework, smooth goodput, including SLOs and goodput to reflect the nature of user experience in LLM serving. The framework can adapt to the specific goals of different tasks by setting parameters. We re-evaluate the performance of different LLM serving systems under multiple workloads based on this unified framework and provide possible directions for future optimization of existing strategies. We hope that this framework can provide a unified standard for evaluating LLM serving and foster research in the field of LLM serving optimization to move in a cohesive direction.
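
The paper's smooth-goodput framework is parameterised per task; the sketch below only illustrates the plain SLO-and-goodput accounting being revisited, under assumed time-to-first-token (TTFT) and time-between-token (TBT) SLO shapes.

def goodput(requests, ttft_slo, tbt_slo, window_s):
    # requests: list of (ttft_seconds, [gaps between consecutive tokens]).
    met = sum(1 for ttft, gaps in requests
              if ttft <= ttft_slo and all(g <= tbt_slo for g in gaps))
    return met / window_s  # requests meeting every SLO, per second

# The second request stalls mid-stream and violates the tail-TBT SLO; note that a
# server could "game" this metric by deliberately delaying earlier tokens.
reqs = [(0.20, [0.03, 0.04, 0.03]), (0.15, [0.03, 0.30, 0.03])]
print(goodput(reqs, ttft_slo=0.5, tbt_slo=0.1, window_s=1.0))  # -> 1.0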

Updated: 2024-10-18 08:05:37

标题: 重新审视在LLM服务中的SLO和Goodput指标

摘要: 大型语言模型(LLMs)已经取得了显著的性能,并广泛应用于各种应用程序中,但LLM推理的服务引发了用户体验和服务吞吐量方面的担忧。因此,引入了服务水平目标(SLOs)和goodput-每秒满足SLOs的请求数量-来评估LLM服务的性能。然而,现有的指标未能准确捕捉用户体验的本质。我们观察到现有指标中存在两个荒谬现象:1) 延迟令牌交付可以平滑请求的尾部时间(tail TBT);2) 中途放弃未能满足SLOs的请求可以提高goodput。 在本文中,我们重新审视了LLM服务中的SLO和goodput指标,并提出了一个统一的度量框架smooth goodput,包括SLOs和goodput,以反映LLM服务中用户体验的本质。该框架可以通过设置参数来适应不同任务的具体目标。我们根据这个统一框架在多个工作负载下重新评估了不同LLM服务系统的性能,并为现有策略的未来优化提供了可能的方向。我们希望这个框架可以提供一个统一的评估标准,促进LLM服务优化领域的研究朝着一个协同的方向发展。

更新时间: 2024-10-18 08:05:37

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14257v1

Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging

The ever-growing volume and decentralized nature of data, coupled with the need to harness it and generate knowledge, have led to the extensive use of distributed deep learning (DDL) techniques for training. These techniques rely on local training performed at the distributed nodes using locally collected data, followed by a periodic synchronization process that combines these models to create a global model. However, frequent synchronization of DL models, encompassing millions to many billions of parameters, creates a communication bottleneck, severely hindering scalability. Worse yet, DDL algorithms typically waste valuable bandwidth and become impractical in bandwidth-constrained federated settings by relying on overly simplistic, periodic, and rigid synchronization schedules. These drawbacks also have a direct impact on the time required for training, necessitating excessive time for data communication. To address these shortcomings, we propose Federated Dynamic Averaging (FDA), a communication-efficient DDL strategy that dynamically triggers synchronization based on the value of the model variance. In essence, the costly synchronization step is triggered only if the local models, which are initialized from a common global model after each synchronization, have significantly diverged. This decision is facilitated by the communication of a small local state from each distributed node/worker. Through extensive experiments across a wide range of learning tasks, we demonstrate that FDA reduces communication cost by orders of magnitude compared to both traditional and cutting-edge communication-efficient algorithms. Additionally, we show that FDA maintains robust performance across diverse data heterogeneity settings.
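
A schematic of the trigger logic, assuming each worker ships a single scalar (its squared drift from the last synchronised global model) as the "small local state"; the mean-drift estimator and the fixed threshold are stand-ins for FDA's actual variance-based condition.

import numpy as np

def should_synchronize(local_models, global_model, threshold):
    # One scalar per worker: squared distance from the last global model.
    drifts = [float(np.sum((w - global_model) ** 2)) for w in local_models]
    return np.mean(drifts) > threshold  # sync only once models have diverged enough

global_w = np.zeros(10)
workers = [global_w + 0.3 * np.random.default_rng(i).normal(size=10) for i in range(4)]
if should_synchronize(workers, global_w, threshold=0.5):
    global_w = np.mean(workers, axis=0)  # the costly averaging happens only when triggered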

Updated: 2024-10-18 08:05:18

标题: 通过联邦动态平均实现高效通信的分布式深度学习

摘要: 随着数据量不断增长和分散性质的数据以及需要利用这些数据并从中生成知识的需求,分布式深度学习(DDL)技术被广泛应用于训练。这些技术依赖于在分布式节点上基于本地收集的数据进行的本地训练,然后通过周期性同步过程将这些模型合并以创建全局模型。然而,DL模型的频繁同步,涵盖数百万到数十亿个参数,会造成通信瓶颈,严重阻碍可扩展性。更糟糕的是,DDL算法通常浪费宝贵的带宽,并且通过依赖过于简化、周期性和刚性的同步时间表,在带宽受限的联邦设置中变得不太实用。这些缺点也直接影响训练过程所需的时间,需要大量时间进行数据通信。为了解决这些缺点,我们提出了联邦动态平均(FDA),这是一种通信高效的DDL策略,它根据模型方差的值动态触发同步。本质上,昂贵的同步步骤仅在本地模型(在每次同步后从共同的全局模型初始化)明显发散时才会触发。这一决定是通过从每个分布式节点/工作者传递一个小的本地状态来实现的。通过在各种学习任务中进行广泛的实验,我们证明FDA将通信成本降低了数个数量级,与传统和最先进的通信高效算法相比。此外,我们展示了FDA在不同数据异质性设置中保持稳健的性能。

更新时间: 2024-10-18 08:05:18

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2405.20988v3

FedECA: A Federated External Control Arm Method for Causal Inference with Time-To-Event Data in Distributed Settings

External control arms (ECA) can inform the early clinical development of experimental drugs and provide efficacy evidence for regulatory approval. However, the main challenge in implementing ECA lies in accessing real-world or historical clinical trials data. Indeed, regulations protecting patients' rights by strictly controlling data processing make pooling data from multiple sources in a central server often difficult. To address these limitations, we develop a new method, 'FedECA' that leverages federated learning (FL) to enable inverse probability of treatment weighting (IPTW) for time-to-event outcomes on separate cohorts without needing to pool data. To showcase the potential of FedECA, we apply it in different settings of increasing complexity culminating with a real-world use-case in which FedECA provides evidence for a differential effect between two drugs that would have otherwise gone unnoticed. By sharing our code, we hope FedECA will foster the creation of federated research networks and thus accelerate drug development.
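
For orientation, here is a minimal centralised sketch of the IPTW step that FedECA computes federatedly, using scikit-learn; in the actual method the propensity model is fitted across centers without pooling patient-level covariates.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))               # patient covariates
treated = rng.binomial(1, 0.4, size=500)    # 1 = experimental arm, 0 = external control

# Propensity of treatment given covariates, then inverse-probability weights.
p = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
weights = np.where(treated == 1, 1.0 / p, 1.0 / (1.0 - p))
# `weights` would then enter a weighted Cox model for the time-to-event comparison.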

Updated: 2024-10-18 08:04:36

标题: FedECA:一种用于分布式设置中基于时间事件数据因果推断的联合外部控制臂方法

摘要: 外部对照组(ECA)可以为实验药物的早期临床开发提供信息,并为监管批准提供有效性证据。然而,实施ECA的主要挑战在于获取现实世界或历史临床试验数据。事实上,通过严格控制数据处理保护患者权利的法规,使得将数据从多个来源汇总到中央服务器通常变得困难。为了解决这些限制,我们开发了一种新方法,称为“FedECA”,利用联邦学习(FL)实现逆概率处理权重(IPTW)在不需要汇总数据的情况下对不同队列的时间至事件结果进行加权。为展示FedECA的潜力,我们将其应用于不同设置的不断增加的复杂性,最终在一个现实世界的用例中展示了FedECA提供了两种药物之间的差异效应的证据,否则这种差异效应可能会被忽视。通过分享我们的代码,我们希望FedECA将促进联邦研究网络的创建,从而加速药物开发。

更新时间: 2024-10-18 08:04:36

领域: stat.ME,cs.DC,cs.LG

下载: http://arxiv.org/abs/2311.16984v4

Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas

Scientific innovation is pivotal for humanity, and harnessing large language models (LLMs) to generate research ideas could transform discovery. However, existing LLMs often produce simplistic and repetitive suggestions due to their limited ability in acquiring external knowledge for innovation. To address this problem, we introduce an enhanced planning and search methodology designed to boost the creative potential of LLM-based systems. Our approach involves an iterative process to purposely plan the retrieval of external knowledge, progressively enriching the idea generation with broader and deeper insights. Validation through automated and human assessments indicates that our framework substantially elevates the quality of generated ideas, particularly in novelty and diversity. The number of unique novel ideas produced by our framework is 3.4 times higher than without it. Moreover, our method outperforms the current state-of-the-art, generating at least 2.5 times more top-rated ideas based on 170 seed papers in a Swiss Tournament evaluation.

Updated: 2024-10-18 08:04:36

标题: 新星:一种迭代规划和搜索方法,以增强LLM生成的想法的新颖性和多样性

摘要: 科学创新对人类至关重要,利用大型语言模型(LLMs)生成研究思路可以改变发现过程。然而,现有的LLMs往往由于获取外部知识以进行创新的能力有限,导致产生简单化和重复性建议。为了解决这个问题,我们引入了一种增强的规划和搜索方法,旨在提升基于LLMs系统的创造潜力。我们的方法涉及一个迭代过程,有目的地计划检索外部知识,逐步丰富思路生成,带来更广泛和更深入的见解。通过自动化和人工评估验证,我们的框架显著提升了生成思路的质量,尤其是在新颖性和多样性方面。我们的框架产生的独特新颖思路数量比没有使用该框架高出3.4倍。此外,我们的方法在瑞士锦标赛评估中表现优异,至少生成了基于170篇种子论文的顶级思路数量的2.5倍以上。

更新时间: 2024-10-18 08:04:36

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.14255v1

RAZOR: Refining Accuracy by Zeroing Out Redundancies

In many application domains, the proliferation of sensors and devices is generating vast volumes of data, imposing significant pressure on existing data analysis and data mining techniques. Nevertheless, an increase in data volume does not inherently imply an increase in informational content, as a substantial portion may be redundant or represent noise. This challenge is particularly evident in the deep learning domain, where the utility of additional data is contingent on its informativeness. In the absence of such, larger datasets merely exacerbate the computational cost and complexity of the learning process. To address these challenges, we propose RAZOR, a novel instance selection technique designed to extract a significantly smaller yet sufficiently informative subset from a larger set of instances without compromising the learning process. RAZOR has been specifically engineered to be robust, efficient, and scalable, making it suitable for large-scale datasets. Unlike many techniques in the literature, RAZOR is capable of operating in both supervised and unsupervised settings. Experimental results demonstrate that RAZOR outperforms recent state-of-the-art techniques in terms of both effectiveness and efficiency.

Updated: 2024-10-18 08:04:31

标题: RAZOR:通过清除冗余来提高准确性

摘要: 在许多应用领域,传感器和设备的增多正在产生大量数据,给现有的数据分析和数据挖掘技术带来了巨大压力。然而,数据量的增加并不一定意味着信息内容的增加,因为其中很大一部分可能是冗余的或者代表噪音。这个挑战在深度学习领域尤为明显,额外数据的价值取决于其信息量。在缺乏信息性的情况下,更大的数据集只会加剧学习过程的计算成本和复杂性。为了解决这些挑战,我们提出了RAZOR,一种新颖的实例选择技术,旨在从更大的实例集中提取一个显著较小但足够信息的子集,而不影响学习过程。RAZOR专门设计为健壮、高效和可扩展,适用于大规模数据集。与文献中许多技术不同,RAZOR能够在监督和无监督设置下运行。实验结果表明,RAZOR在效果和效率方面均优于最近的最先进技术。

更新时间: 2024-10-18 08:04:31

领域: cs.LG

下载: http://arxiv.org/abs/2410.14254v1

On the Use of Large Language Models to Generate Capability Ontologies

Capability ontologies are increasingly used to model functionalities of systems or machines. The creation of such ontological models with all properties and constraints of capabilities is very complex and can only be done by ontology experts. However, Large Language Models (LLMs) have shown that they can generate machine-interpretable models from natural language text input and thus support engineers / ontology experts. Therefore, this paper investigates how LLMs can be used to create capability ontologies. We present a study with a series of experiments in which capabilities with varying complexities are generated using different prompting techniques and with different LLMs. Errors in the generated ontologies are recorded and compared. To analyze the quality of the generated ontologies, a semi-automated approach based on RDF syntax checking, OWL reasoning, and SHACL constraints is used. The results of this study are very promising because even for complex capabilities, the generated ontologies are almost free of errors.

Updated: 2024-10-18 08:03:02

标题: 关于使用大型语言模型生成能力本体论的研究

摘要: 能力本体论越来越被用来模拟系统或机器的功能。创建这种本体模型,包括所有能力的属性和约束,是非常复杂的,只能由本体论专家完成。然而,大型语言模型(LLMs)表明它们可以从自然语言文本输入生成机器可解释的模型,从而支持工程师/本体论专家。因此,本文研究了LLMs如何用于创建能力本体论。我们进行了一系列实验,使用不同的提示技术和不同的LLMs生成具有不同复杂性的能力。记录并比较了生成的本体中的错误。为了分析生成的本体的质量,采用了基于RDF语法检查、OWL推理和SHACL约束的半自动化方法。这项研究的结果非常有希望,因为即使对于复杂的能力,生成的本体几乎没有错误。

更新时间: 2024-10-18 08:03:02

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2404.17524v4

Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation

Post-training is essential for enabling large language models (LLMs) to follow human instructions. Inspired by the recent success of using LLMs to simulate human society, we leverage multi-agent simulation to automatically generate diverse text-based scenarios, capturing a wide range of real-world human needs. We propose MATRIX, a multi-agent simulator that creates realistic and scalable scenarios. Leveraging these outputs, we introduce a novel scenario-driven instruction generator MATRIX-Gen for controllable and highly realistic data synthesis. Extensive experiments demonstrate that our framework effectively generates both general and domain-specific data. Notably, on AlpacaEval 2 and Arena-Hard benchmarks, Llama-3-8B-Base, post-trained on datasets synthesized by MATRIX-Gen with just 20K instruction-response pairs, outperforms Meta's Llama-3-8B-Instruct model, which was trained on over 10M pairs; see our project at https://github.com/ShuoTang123/MATRIX-Gen.

Updated: 2024-10-18 08:01:39

标题: 通过多智能体模拟合成LLMs的训练后数据

摘要: 后续训练对于使大型语言模型(LLMs)能够遵循人类指令至关重要。受到最近使用LLMs成功模拟人类社会的启发,我们利用多智能体模拟来自动生成多样化的基于文本的场景,捕捉广泛的现实世界人类需求。我们提出了MATRIX,一个创建逼真且可扩展场景的多智能体模拟器。利用这些输出,我们引入了一个新颖的基于场景驱动的指令生成器MATRIX-Gen,用于可控和高度逼真的数据合成。大量实验证明,我们的框架有效地生成了通用和特定领域的数据。值得注意的是,在AlpacaEval 2和Arena-Hard基准测试中,仅在由MATRIX-Gen合成的20,000个指令-响应对数据集上进行后续训练的Llama-3-8B-Base模型,优于Meta的Llama-3-8B-Instruct模型,后者经过超过10百万对训练;请访问我们的项目网址https://github.com/ShuoTang123/MATRIX-Gen。

更新时间: 2024-10-18 08:01:39

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.14251v1

Simple Opinion Dynamics for No-Regret Learning

We study a cooperative multi-agent bandit setting in the distributed GOSSIP model: in every round, each of $n$ agents chooses an action from a common set, observes the action's corresponding reward, and subsequently exchanges information with a single randomly chosen neighbor, which may inform its choice in the next round. We introduce and analyze families of memoryless and time-independent protocols for this setting, inspired by opinion dynamics that are well-studied for other algorithmic tasks in the GOSSIP model. For stationary reward settings, we prove for the first time that these simple protocols exhibit best-of-both-worlds behavior, simultaneously obtaining constant cumulative regret scaling like $R(T)/T = \widetilde O(1/T)$, and also reaching consensus on the highest-mean action within $\widetilde O(\sqrt{n})$ rounds. We obtain these results by showing a new connection between the global evolution of these decentralized protocols and a class of zero-sum multiplicative weights update processes. Using this connection, we establish a general framework for analyzing the population-level regret and other properties of our protocols. Finally, we show our protocols are also surprisingly robust to adversarial rewards, and in this regime we obtain sublinear regret scaling like $R(T)/T = \widetilde O(1/\sqrt{T})$ as long as the number of rounds does not grow too fast as a function of $n$.
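
For intuition only, the following toy simulation runs one memoryless arm-adoption dynamic in the GOSSIP model described above; the specific adoption rule is an illustrative choice, not the authors' analysed protocol.

import numpy as np

rng = np.random.default_rng(0)
n, k, rounds = 200, 5, 400
means = rng.uniform(0.2, 0.8, size=k)   # stationary Bernoulli arms
arm = rng.integers(k, size=n)           # each agent holds one current action

for _ in range(rounds):
    reward = rng.binomial(1, means[arm])    # every agent pulls its own arm
    partner = rng.integers(n, size=n)       # one uniformly random neighbor each
    # Memoryless rule: copy the partner's arm when their pull paid off and yours did not.
    adopt = (reward == 0) & (reward[partner] == 1)
    arm[adopt] = arm[partner[adopt]]

# The population tends to drift toward consensus on the highest-mean arm.
print(means.argmax(), np.bincount(arm, minlength=k).argmax())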

Updated: 2024-10-18 08:00:31

标题: 简单的观点动态学习在不后悔学习中的应用

摘要: 我们研究了在分布式GOSSIP模型中的合作多Agent赌徒设置: 在每一轮中,$n$个Agent中的每一个都从一个共同的集合中选择一个动作,观察动作对应的奖励,然后与一个随机选择的邻居交换信息,这可能会影响它在下一轮的选择。我们引入并分析了适用于这种设置的无记忆和与时间无关的协议系列,这些协议受到对GOSSIP模型中其他算法任务进行了深入研究的观点动态的启发。对于固定奖励设置,我们首次证明这些简单协议展现出了“两全其美”的行为,同时获得了像$R(T)/T = \widetilde O(1/T)$的常数累积遗憾,同时在$\widetilde O(\sqrt{n})$轮内达成对最高均值动作的共识。我们通过展示这些去中心化协议的全局演化与一类零和乘法权重更新过程之间的新连接来获得这些结果。利用这一连接,我们建立了一个分析我们的协议的人口层面遗憾和其他属性的一般框架。最后,我们展示我们的协议在面对对手奖励时也出奇地稳健,并且在这种情况下,只要轮数不以$n$的函数形式增长得太快,我们就可以获得像$R(T)/T = \widetilde O(1/\sqrt{T})$的次线性遗憾。

更新时间: 2024-10-18 08:00:31

领域: cs.LG,cs.DC,cs.DS

下载: http://arxiv.org/abs/2306.08670v5

Pseudo-label Refinement for Improving Self-Supervised Learning Systems

Self-supervised learning systems have gained significant attention in recent years by leveraging clustering-based pseudo-labels to provide supervision without the need for human annotations. However, the noise in these pseudo-labels caused by the clustering methods poses a challenge to the learning process leading to degraded performance. In this work, we propose a pseudo-label refinement (SLR) algorithm to address this issue. The cluster labels from the previous epoch are projected to the current epoch cluster-labels space and a linear combination of the new label and the projected label is computed as a soft refined label containing the information from the previous epoch clusters as well as from the current epoch. In contrast to the common practice of using the maximum value as a cluster/class indicator, we employ hierarchical clustering on these soft pseudo-labels to generate refined hard-labels. This approach better utilizes the information embedded in the soft labels, outperforming the simple maximum value approach for hard label generation. The effectiveness of the proposed SLR algorithm is evaluated in the context of person re-identification (Re-ID) using unsupervised domain adaptation (UDA). Experimental results demonstrate that the modified Re-ID baseline, incorporating the SLR algorithm, achieves significantly improved mean Average Precision (mAP) performance in various UDA tasks, including real-to-synthetic, synthetic-to-real, and different real-to-real scenarios. These findings highlight the efficacy of the SLR algorithm in enhancing the performance of self-supervised learning systems.
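
A sketch of the refinement step, assuming the previous epoch's cluster labels have already been projected into the current label space (e.g. by maximum-overlap matching) and expressed one-hot; the mixing weight and the Ward linkage are illustrative choices.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def refine_pseudo_labels(current_onehot, projected_prev_onehot, n_clusters, alpha=0.5):
    # Soft label: linear combination of the current and previous-epoch assignments.
    soft = alpha * current_onehot + (1 - alpha) * projected_prev_onehot
    # Hierarchical clustering on the soft labels replaces a plain per-row argmax.
    z = linkage(soft, method="ward")
    return fcluster(z, t=n_clusters, criterion="maxclust")

cur = np.eye(3)[[0, 0, 1, 1, 2, 2]]
prev = np.eye(3)[[0, 1, 1, 1, 2, 2]]  # previous epoch, already projected
print(refine_pseudo_labels(cur, prev, n_clusters=3))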

Updated: 2024-10-18 07:47:59

标题: 伪标签细化以改善自监督学习系统

摘要: 自我监督学习系统近年来引起了广泛关注,通过利用基于聚类的伪标签提供监督,无需人工注释。然而,由聚类方法引起的伪标签中的噪声对学习过程造成挑战,导致性能下降。在这项工作中,我们提出了一个伪标签精炼(SLR)算法来解决这个问题。上一个时期的聚类标签被投影到当前时期的聚类标签空间,计算新标签和投影标签的线性组合作为包含来自上一个时期聚类和当前时期信息的软精炼标签。与常见做法在软伪标签上使用最大值作为聚类/类别指示器相比,我们在这些软伪标签上采用分层聚类来生成精炼的硬标签。这种方法更好地利用了软标签中嵌入的信息,优于简单的最大值方法用于硬标签生成。提出的SLR算法在使用无监督域自适应(UDA)的人员重新识别(Re-ID)环境中进行了评估。实验结果表明,集成SLR算法的修改后的Re-ID基线在各种UDA任务中显著提高了平均精度(mAP)性能,包括真实到合成、合成到真实以及不同真实到真实的情景。这些发现突显了SLR算法在提高自我监督学习系统性能方面的有效性。

更新时间: 2024-10-18 07:47:59

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.14242v1

Almost-Linear RNNs Yield Highly Interpretable Symbolic Codes in Dynamical Systems Reconstruction

Dynamical systems (DS) theory is fundamental for many areas of science and engineering. It can provide deep insights into the behavior of systems evolving in time, as typically described by differential or recursive equations. A common approach to facilitate mathematical tractability and interpretability of DS models involves decomposing nonlinear DS into multiple linear DS separated by switching manifolds, i.e. piecewise linear (PWL) systems. PWL models are popular in engineering and a frequent choice in mathematics for analyzing the topological properties of DS. However, hand-crafting such models is tedious and only possible for very low-dimensional scenarios, while inferring them from data usually gives rise to unnecessarily complex representations with very many linear subregions. Here we introduce Almost-Linear Recurrent Neural Networks (AL-RNNs) which automatically and robustly produce most parsimonious PWL representations of DS from time series data, using as few PWL nonlinearities as possible. AL-RNNs can be efficiently trained with any SOTA algorithm for dynamical systems reconstruction (DSR), and naturally give rise to a symbolic encoding of the underlying DS that provably preserves important topological properties. We show that for the Lorenz and Rössler systems, AL-RNNs discover, in a purely data-driven way, the known topologically minimal PWL representations of the corresponding chaotic attractors. We further illustrate on two challenging empirical datasets that interpretable symbolic encodings of the dynamics can be achieved, tremendously facilitating mathematical and computational analysis of the underlying systems.
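
A PyTorch sketch of an almost-linear recurrent step in the spirit described: only the first `n_relu` of the hidden units pass through a ReLU while the rest evolve linearly, so the latent dynamics stay piecewise linear with few switching surfaces. This is an illustrative reading of the architecture, not the authors' exact parameterisation.

import torch

class ALRNNCell(torch.nn.Module):
    def __init__(self, dim, n_relu):
        super().__init__()
        self.A = torch.nn.Linear(dim, dim)              # linear recurrence
        self.W = torch.nn.Linear(dim, dim, bias=False)  # drive from the gated state
        self.n_relu = n_relu                            # number of rectified units

    def forward(self, z):
        gated = torch.cat([torch.relu(z[..., : self.n_relu]),
                           z[..., self.n_relu :]], dim=-1)
        return self.A(z) + self.W(gated)

cell = ALRNNCell(dim=8, n_relu=2)
z = torch.randn(1, 8)
for _ in range(5):
    z = cell(z)
print(z.shape)  # torch.Size([1, 8])

The symbolic code then falls out naturally: each time step can be labelled by the on/off pattern of the few rectified units.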

Updated: 2024-10-18 07:44:12

标题: 几乎线性的递归神经网络在动态系统重建中产生高度可解释的符号编码

摘要: 动力系统(DS)理论对许多科学和工程领域至关重要。它可以深入了解系统随时间演化的行为,通常由微分方程或递归方程描述。一个常见的方法是将非线性DS分解为由开关流形分隔的多个线性DS,即分段线性(PWL)系统,以促进DS模型的数学可操作性和可解释性。PWL模型在工程领域很受欢迎,在数学中经常用于分析DS的拓扑性质。然而,手工制作这样的模型是繁琐的,仅适用于非常低维度的情况,而从数据中推断它们通常会产生具有非常多线性子区域的不必要复杂的表示。在这里,我们介绍了几乎线性递归神经网络(AL-RNNs),它可以自动且稳健地从时间序列数据中生成DS的最简PWL表示,尽可能少地使用PWL非线性。AL-RNNs可以使用任何DS重建(DSR)的最先进算法高效训练,并自然地产生潜在DS的符号编码,可以证明保留重要的拓扑性质。我们展示了对于Lorenz和Rössler系统,AL-RNNs以纯粹数据驱动的方式发现了已知的对应混沌吸引子的拓扑最小PWL表示。我们进一步在两个具有挑战性的实证数据集上说明,可以实现对动态的可解释符号编码,极大地促进了对底层系统的数学和计算分析。

更新时间: 2024-10-18 07:44:12

领域: cs.LG,cs.AI,math.DS,nlin.CD,physics.data-an

下载: http://arxiv.org/abs/2410.14240v1

Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers

Score-based diffusion models have emerged as powerful techniques for generating samples from high-dimensional data distributions. These models involve a two-phase process: first, injecting noise to transform the data distribution into a known prior distribution, and second, sampling to recover the original data distribution from noise. Among the various sampling methods, deterministic samplers stand out for their enhanced efficiency. However, analyzing these deterministic samplers presents unique challenges, as they preclude the use of established techniques such as Girsanov's theorem, which are only applicable to stochastic samplers. Furthermore, existing analysis for deterministic samplers usually focuses on specific examples, lacking a generalized approach for general forward processes and various deterministic samplers. Our paper addresses these limitations by introducing a unified convergence analysis framework. To demonstrate the power of our framework, we analyze the variance-preserving (VP) forward process with the exponential integrator (EI) scheme, achieving iteration complexity of $\tilde O(d^2/\epsilon)$. Additionally, we provide a detailed analysis of Denoising Diffusion Implicit Models (DDIM)-type samplers, which have been underexplored in previous research, achieving polynomial iteration complexity.
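
As a point of reference for the samplers analysed here, the deterministic (eta = 0) DDIM update in the VP parameterisation can be sketched as below; `eps_model` is an assumed noise-prediction network and the linear schedule is a placeholder.

import numpy as np

def ddim_step(x_t, t, t_prev, alpha_bar, eps_model):
    eps = eps_model(x_t, t)
    # Predict the clean sample, then move to t_prev without injecting fresh noise.
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
    return np.sqrt(alpha_bar[t_prev]) * x0_hat + np.sqrt(1.0 - alpha_bar[t_prev]) * eps

alpha_bar = np.linspace(0.999, 0.01, 1000)   # toy schedule: alpha_bar decreases with t
eps_model = lambda x, t: np.zeros_like(x)    # stub noise predictor
x = np.random.default_rng(0).normal(size=(4,))
print(ddim_step(x, t=999, t_prev=899, alpha_bar=alpha_bar, eps_model=eps_model))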

Updated: 2024-10-18 07:37:36

标题: 具有确定性采样器的基于得分的扩散模型的统一收敛分析

摘要: 基于分数的扩散模型已经成为从高维数据分布中生成样本的强大技术。这些模型涉及两个阶段的过程:首先,注入噪声将数据分布转换为已知的先验分布,然后,采样以从噪声中恢复原始数据分布。在各种采样方法中,确定性采样器以其增强的效率脱颖而出。然而,分析这些确定性采样器面临独特挑战,因为它们排除了像Girsanov定理这样的已建立技术的使用,这些技术仅适用于随机采样器。此外,现有对确定性采样器的分析通常集中在特定示例上,缺乏一种适用于一般前向过程和各种确定性采样器的广义方法。我们的论文通过引入统一的收敛分析框架来解决这些局限性。为了展示我们框架的强大之处,我们分析了具有指数积分器(EI)方案的保方差(VP)前向过程,实现了$\tilde O(d^2/\epsilon)$的迭代复杂度。此外,我们对去噪扩散隐式模型(DDIM)类型的采样器进行了详细分析,这在先前的研究中未被充分探索,实现了多项式迭代复杂度。

更新时间: 2024-10-18 07:37:36

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2410.14237v1

Toward a Method to Generate Capability Ontologies from Natural Language Descriptions

To achieve a flexible and adaptable system, capability ontologies are increasingly leveraged to describe functions in a machine-interpretable way. However, modeling such complex ontological descriptions is still a manual and error-prone task that requires a significant amount of effort and ontology expertise. This contribution presents an innovative method to automate capability ontology modeling using Large Language Models (LLMs), which have proven to be well suited for such tasks. Our approach requires only a natural language description of a capability, which is then automatically inserted into a predefined prompt using a few-shot prompting technique. After prompting an LLM, the resulting capability ontology is automatically verified through various steps in a loop with the LLM to check the overall correctness of the capability ontology. First, a syntax check is performed, then a check for contradictions, and finally a check for hallucinations and missing ontology elements. Our method greatly reduces manual effort, as only the initial natural language description and a final human review and possible correction are necessary, thereby streamlining the capability ontology generation process.
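
The verification loop can be sketched with off-the-shelf tooling, assuming the LLM returns Turtle text: rdflib covers the syntax check and pyshacl the constraint check, while the repair prompt back to the LLM is stubbed out here.

from rdflib import Graph
from pyshacl import validate

def verify_capability_ontology(turtle_text, shapes_graph=None):
    # 1) Syntax check: does the generated text parse as RDF at all?
    try:
        g = Graph().parse(data=turtle_text, format="turtle")
    except Exception as e:
        return False, f"syntax error: {e}"
    # (An OWL reasoner pass for logical contradictions would sit here.)
    # 2) SHACL check: shapes catch missing or malformed ontology elements.
    conforms, _, report = validate(g, shacl_graph=shapes_graph)
    return conforms, report

ttl = "@prefix ex: <http://example.org/> . ex:Cap1 a ex:Capability ."
print(verify_capability_ontology(ttl))  # a failing report would be fed back as a repair prompt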

Updated: 2024-10-18 07:34:39

标题: 朝向从自然语言描述中生成能力本体论的方法

摘要: 为了实现一个灵活和适应性强的系统,能力本体论越来越被利用来以机器可解释的方式描述功能。然而,对于建模这种复杂的本体论描述仍然是一个需要大量努力和本体论专业知识的手动且容易出错的任务。本文提出了一种创新的方法,利用大型语言模型(LLMs)来自动化能力本体论建模,这已被证明非常适合这类任务。我们的方法只需要一个能力的自然语言描述,然后通过少量提示技术自动插入到预定义的提示中。在提示LLM之后,通过循环与LLM一起进行多个步骤来自动验证生成的能力本体论,以检查整体正确性。首先进行语法检查,然后检查矛盾,最后检查幻觉和缺失的本体元素。我们的方法极大地减少了手动工作量,只需要初始的自然语言描述和最终的人工审查和可能的纠正,从而简化了能力本体论生成过程。

更新时间: 2024-10-18 07:34:39

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.07962v2

Identifying treatment response subgroups in observational time-to-event data

Identifying patient subgroups with different treatment responses is an important task to inform medical recommendations, guidelines, and the design of future clinical trials. Existing approaches for subgroup analysis primarily rely on Randomised Controlled Trials (RCTs), in which treatment assignment is randomised. RCTs' patient cohorts are often constrained by cost, rendering them not representative of the heterogeneity of patients likely to receive treatment in real-world clinical practice. When applied to observational studies, subgroup analysis approaches suffer from significant statistical biases particularly because of the non-randomisation of treatment. Our work introduces a novel, outcome-guided method for identifying treatment response subgroups in observational studies. Our approach assigns each patient to a subgroup associated with two time-to-event distributions: one under treatment and one under control regime. It hence positions itself in between individualised and average treatment effect estimation. The assumptions of our model result in a simple correction of the statistical bias from treatment non-randomisation through inverse propensity weighting. In experiments, our approach significantly outperforms the current state-of-the-art method for outcome-guided subgroup analysis in both randomised and observational treatment regimes.

Updated: 2024-10-18 07:32:18

标题: 在观察性事件发生时间数据中识别治疗反应亚组

摘要: 识别具有不同治疗反应的患者亚组是一个重要任务,可用于制定医疗建议、指南和未来临床试验的设计。现有的亚组分析方法主要依赖于随机对照试验(RCTs),其中治疗分配是随机的。RCT的患者队列往往受到成本的限制,使它们不能代表真实世界临床实践中可能接受治疗的患者的多样性。当应用于观察性研究时,亚组分析方法受到显著的统计偏差的影响,尤其是由于治疗非随机化。我们的研究引入了一种新颖的、以结果为导向的方法,用于在观察性研究中识别治疗反应亚组。我们的方法将每个患者分配到与两个时间至事件分布相关联的亚组中:一个是在治疗下,另一个是在对照制度下。因此,它处于个体化和平均治疗效果估计之间。我们模型的假设导致了一种简单的纠正治疗非随机化统计偏差的方法,即通过倒数倾向权重。在实验中,我们的方法在随机和观察性治疗制度下均显著优于目前的最先进的以结果为导向的亚组分析方法。

更新时间: 2024-10-18 07:32:18

领域: stat.ME,cs.AI

下载: http://arxiv.org/abs/2408.03463v3

Cyber Attacks Prevention Towards Prosumer-based EV Charging Stations: An Edge-assisted Federated Prototype Knowledge Distillation Approach

In this paper, cyber-attack prevention for the prosumer-based electric vehicle (EV) charging stations (EVCSs) is investigated, which covers two aspects: 1) cyber-attack detection on prosumers' network traffic (NT) data, and 2) cyber-attack intervention. To establish an effective prevention mechanism, several challenges need to be tackled, for instance, the NT data per prosumer may be non-independent and identically distributed (non-IID), and the boundary between benign and malicious traffic becomes blurred. To this end, we propose an edge-assisted federated prototype knowledge distillation (E-FPKD) approach, where each client is deployed on a dedicated local edge server (DLES) and can report its availability for joining the federated learning (FL) process. Prior to the E-FPKD approach, to enhance accuracy, the Pearson Correlation Coefficient is adopted for feature selection. Regarding the proposed E-FPKD approach, we integrate the knowledge distillation and prototype aggregation technique into FL to deal with the non-IID challenge. To address the boundary issue, instead of directly calculating the distance between benign and malicious traffic, we consider maximizing the overall detection correctness of all prosumers (ODC), which can mitigate the computational cost compared with the former way. After detection, a rule-based method will be triggered at each DLES for cyber-attack intervention. Experimental analysis demonstrates that the proposed E-FPKD can achieve the largest ODC on NSL-KDD, UNSW-NB15, and IoTID20 datasets in both binary and multi-class classification, compared with baselines. For instance, the ODC for IoTID20 obtained via the proposed method is separately 0.3782% and 4.4471% greater than FedProto and FedAU in multi-class classification.
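
The feature-selection step ahead of E-FPKD is straightforward to sketch; the vectorised Pearson computation below assumes numeric traffic features and a binary (or numeric) label, with names chosen for illustration.

import numpy as np

def select_by_pearson(X, y, top_m):
    # Absolute Pearson correlation of each feature column with the label.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()) + 1e-12)
    return np.argsort(-np.abs(r))[:top_m]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40))                         # network-traffic features
y = (X[:, 3] + 0.1 * rng.normal(size=1000) > 0).astype(float)
print(select_by_pearson(X, y, top_m=5))                 # feature 3 should rank first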

Updated: 2024-10-18 07:26:24

标题: 针对生产者-消费者型电动汽车充电站的网络攻击预防:一种边缘辅助的联邦原型知识蒸馏方法

摘要: 本文研究了基于生产消费者的电动汽车(EV)充电站(EVCSs)的网络攻击预防,涵盖两个方面:1)在生产消费者的网络流量(NT)数据上进行网络攻击检测,2)进行网络攻击干预。为建立有效的预防机制,需要解决一些挑战,例如,每个生产消费者的NT数据可能是非独立和同分布的(非IID),良性和恶意流量之间的界限变得模糊。为此,我们提出了一种边缘辅助联邦原型知识蒸馏(E-FPKD)方法,其中每个客户端部署在专用本地边缘服务器(DLES)上,并可以报告其加入联邦学习(FL)过程的可用性。在E-FPKD方法之前,为增强准确性,采用皮尔逊相关系数进行特征选择。关于提出的E-FPKD方法,我们将知识蒸馏和原型聚合技术整合到FL中,以应对非IID挑战。为了解决界限问题,我们不直接计算良性和恶意流量之间的距离,而是考虑最大化所有生产消费者的整体检测正确性(ODC),这可以减少与前一种方式相比的计算成本。在检测后,每个DLES将触发基于规则的方法进行网络攻击干预。实验分析表明,所提出的E-FPKD在二元分类和多类分类中在NSL-KDD、UNSW-NB15和IoTID20数据集上均可实现最大的ODC,与基线相比。例如,通过所提出的方法,在多类分类中,IoTID20的ODC分别比FedProto和FedAU大0.3782%和4.4471%。

更新时间: 2024-10-18 07:26:24

领域: cs.CR

下载: http://arxiv.org/abs/2410.13260v2

Encode-Store-Retrieve: Augmenting Human Memory through Language-Encoded Egocentric Perception

We depend on our own memory to encode, store, and retrieve our experiences. However, memory lapses can occur. One promising avenue for achieving memory augmentation is through the use of augmented reality head-mounted displays to capture and preserve egocentric videos, a practice commonly referred to as lifelogging. However, a significant challenge arises from the sheer volume of video data generated through lifelogging, as the current technology lacks the capability to encode and store such large amounts of data efficiently. Further, retrieving specific information from extensive video archives requires substantial computational power, further complicating the task of quickly accessing desired content. To address these challenges, we propose a memory augmentation agent that involves leveraging natural language encoding for video data and storing them in a vector database. This approach harnesses the power of large vision language models to perform the language encoding process. Additionally, we propose using large language models to facilitate natural language querying. Our agent underwent extensive evaluation using the QA-Ego4D dataset and achieved state-of-the-art results with a BLEU score of 8.3, outperforming conventional machine learning models that scored between 3.4 and 5.8. Additionally, we conducted a user study in which participants interacted with the human memory augmentation agent through episodic memory and open-ended questions. The results of this study show that the agent results in significantly better recall performance on episodic memory tasks compared to human participants. The results also highlight the agent's practical applicability and user acceptance.
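
The encode-store-retrieve loop reduces to a familiar vector-search pattern; the sketch below assumes the faiss library and uses a trivial bag-of-words stand-in where the real agent would call a vision-language model to caption egocentric video clips.

import numpy as np
import faiss

def embed(texts, d=64):
    # Toy encoder: shared words share random vectors; a VLM caption encoder goes here.
    out = np.zeros((len(texts), d), dtype="float32")
    for i, t in enumerate(texts):
        for w in t.lower().split():
            out[i] += np.random.default_rng(abs(hash(w)) % 2**32).normal(size=d).astype("float32")
    return out

captions = ["put keys on the kitchen counter", "poured coffee at 8am"]
vecs = embed(captions)
faiss.normalize_L2(vecs)                 # cosine similarity via normalized inner product
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

q = embed(["where did i leave my keys"])
faiss.normalize_L2(q)
_, ids = index.search(q, 1)
print(captions[ids[0][0]])               # retrieved episodic memory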

Updated: 2024-10-18 07:24:54

标题: Encode-Store-Retrieve: 通过语言编码的自我中心感知增强人类记忆

摘要: 我们依赖自己的记忆来编码、存储和检索我们的经历。然而,记忆缺失是可能发生的。实现记忆增强的一个有前途的途径是利用增强现实头戴显示器来捕捉和保存以自我为中心的视频,这种做法通常被称为生活记录。然而,一个重要的挑战是来自生活记录产生的大量视频数据,因为当前技术缺乏高效地编码和存储如此大量的数据的能力。此外,从广泛的视频档案中检索特定信息需要大量的计算能力,进一步复杂了快速访问所需内容的任务。为了解决这些挑战,我们提出了一种记忆增强代理,涉及利用自然语言编码视频数据并将其存储在向量数据库中。这种方法利用大型视觉语言模型的能力来执行语言编码过程。此外,我们提出使用大型语言模型来促进自然语言查询。我们的代理经过了对QA-Ego4D数据集的广泛评估,并以8.3的BLEU分数取得了最新的成果,超过了传统的机器学习模型,它们的分数在3.4和5.8之间。此外,我们进行了一项用户研究,参与者通过情节记忆和开放性问题与人类记忆增强代理进行交互。这项研究的结果表明,与人类参与者相比,该代理在情节记忆任务上表现出显著更好的回忆能力。结果还突显了该代理的实用性和用户接受度。

更新时间: 2024-10-18 07:24:54

领域: cs.CV,cs.AI,cs.HC

下载: http://arxiv.org/abs/2308.05822v3

Controllable Discovery of Intents: Incremental Deep Clustering Using Semi-Supervised Contrastive Learning

Deriving value from a conversational AI system depends on the capacity of a user to translate the prior knowledge into a configuration. In most cases, discovering the set of relevant turn-level speaker intents is often one of the key steps. Purely unsupervised algorithms provide a natural way to tackle discovery problems but make it difficult to incorporate constraints and only offer very limited control over the outcomes. Previous work has shown that semi-supervised (deep) clustering techniques can allow the system to incorporate prior knowledge and constraints in the intent discovery process. However they did not address how to allow for control through human feedback. In our Controllable Discovery of Intents (CDI) framework domain and prior knowledge are incorporated using a sequence of unsupervised contrastive learning on unlabeled data followed by fine-tuning on partially labeled data, and finally iterative refinement of clustering and representations through repeated clustering and pseudo-label fine-tuning. In addition, we draw from continual learning literature and use learning-without-forgetting to prevent catastrophic forgetting across those training stages. Finally, we show how this deep-clustering process can become part of an incremental discovery strategy with human-in-the-loop. We report results on both CLINC and BANKING datasets. CDI outperforms previous works by a significant margin: 10.26% and 11.72% respectively.

Updated: 2024-10-18 07:24:02

标题: 可控的意图发现:使用半监督对比学习的增量深度聚类

摘要: 从对话式人工智能系统中获得价值取决于用户将先前知识转化为配置的能力。在大多数情况下,发现相关的转向级别的讲话者意图集通常是关键步骤之一。纯无监督算法提供了解决发现问题的一种自然方式,但很难融入约束,并且对结果的控制非常有限。先前的研究表明,半监督(深度)聚类技术可以使系统在意图发现过程中融入先前知识和约束。然而,他们没有解决如何通过人类反馈来实现控制的问题。在我们的可控意图发现(CDI)框架中,领域和先前知识被纳入使用未标记数据的无监督对比学习序列,然后在部分标记数据上进行微调,最后通过重复聚类和伪标签微调对聚类和表示的迭代细化。此外,我们借鉴了不断学习的文献,并使用无遗忘学习防止在这些训练阶段之间发生灾难性遗忘。最后,我们展示了这种深度聚类过程如何成为一个带有人类参与的增量发现策略的一部分。我们报告了在CLINC和银行数据集上的结果。CDI的表现明显优于先前的工作:分别为10.26%和11.72%。

更新时间: 2024-10-18 07:24:02

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.14755v1

Integrating spoken instructions into flight trajectory prediction to optimize automation in air traffic control

The booming air transportation industry inevitably adds to air traffic controllers' workload, causing unexpected human-factor-related incidents. Current air traffic control systems fail to consider spoken instructions for traffic prediction, bringing significant challenges to detecting human errors during real-time traffic operations. Here, we present an automation paradigm integrating controlling intent into the information processing loop through a spoken instruction-aware flight trajectory prediction framework. A 3-stage progressive multi-modal learning paradigm is proposed to address the modality gap between the trajectory and spoken instructions, as well as to minimize the data requirements. Experiments on a real-world dataset show the proposed framework achieves flight trajectory prediction with high predictability and timeliness, obtaining over 20% relative reduction in mean deviation error. Moreover, the generalizability of the proposed framework is also confirmed across various model architectures. The proposed framework can formulate fully automated information processing in real-world air traffic applications, supporting human error detection and enhancing aviation safety.

Updated: 2024-10-18 07:15:51

标题: 将口头指令整合到飞行轨迹预测中,优化空中交通管制中的自动化

摘要: 空中运输业的蓬勃发展不可避免地增加了空中交通管制员的工作负担,导致意外的人为因素相关事件。当前的空中交通管制系统未考虑口头指令用于交通预测,这在实时交通操作中检测人为错误方面带来了重大挑战。在这里,我们提出了一个自动化范式,通过口头指令感知飞行轨迹预测框架将控制意图整合到信息处理循环中。提出了一个3阶段渐进的多模态学习范式,以解决轨迹和口头指令之间的模态差距,并最大限度地减少数据需求。对真实世界的数据集进行的实验显示,所提出的框架可以高度可预测和及时地实现飞行轨迹预测,平均偏差误差相对减少超过20%。此外,通过各种模型架构也确认了所提出框架的泛化能力。这个提出的框架可以在真实世界的空中交通应用中制定完全自动化的信息处理,支持人为错误检测,并增强航空安全。

更新时间: 2024-10-18 07:15:51

领域: cs.SD,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2305.01661v2

Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model

Joint Multimodal Entity-Relation Extraction (JMERE) is a challenging task that aims to extract entities and their relations from text-image pairs in social media posts. Existing methods for JMERE require large amounts of labeled data. However, gathering and annotating fine-grained multimodal data for JMERE poses significant challenges. Initially, we construct diverse and comprehensive multimodal few-shot datasets fitted to the original data distribution. To address the insufficient information in the few-shot setting, we introduce the Knowledge-Enhanced Cross-modal Prompt Model (KECPM) for JMERE. This method can effectively address the problem of insufficient information in the few-shot setting by guiding a large language model to generate supplementary background knowledge. Our proposed method comprises two stages: (1) a knowledge ingestion stage that dynamically formulates prompts based on semantic similarity, guides ChatGPT in generating relevant knowledge, and employs self-reflection to refine that knowledge; (2) a knowledge-enhanced language model stage that merges the auxiliary knowledge with the original input and utilizes a transformer-based model to align with JMERE's required output format. We extensively evaluate our approach on a few-shot dataset derived from the JMERE dataset, demonstrating its superiority over strong baselines in terms of both micro and macro F$_1$ scores. Additionally, we present qualitative analyses and case studies to elucidate the effectiveness of our model.

Updated: 2024-10-18 07:14:54

标题: 通过知识增强的跨模态提示模型进行少样本联合多模态实体关系抽取

摘要: 多模态实体关系抽取(JMERE)是一项具有挑战性的任务,旨在从社交媒体帖子的文本-图像对中提取实体及其关系。现有的JMERE方法需要大量标记数据。然而,为JMERE收集和注释精细的多模态数据存在重大挑战。首先,我们构建了多样化和全面的适合原始数据分布的多模态少样本数据集。为了解决少样本设置中信息不足的问题,我们引入了知识增强的跨模态提示模型(KECPM)用于JMERE。该方法可以通过引导大型语言模型生成补充背景知识有效解决少样本设置中信息不足的问题。我们提出的方法包括两个阶段:(1)知识摄入阶段根据语义相似性指导ChatGPT生成相关知识,并利用自我反思来完善知识;(2)知识增强语言模型阶段将辅助知识与原始输入合并,并利用基于Transformer的模型与JMERE所需的输出格式对齐。我们对从JMERE数据集衍生的少样本数据集进行了广泛评估,显示了其在微观和宏观F1分数方面优于强基线。此外,我们提出了定性分析和案例研究,以阐明我们模型的有效性。

更新时间: 2024-10-18 07:14:54

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14225v1

G-NeuroDAVIS: A Neural Network model for generalized embedding, data visualization and sample generation

Visualizing high-dimensional datasets through a generalized embedding has been a long-standing challenge. Several methods have been proposed for this purpose, yet none has produced a generalized embedding that can both reveal the hidden patterns present in the data and generate realistic high-dimensional samples from it. Motivated by this, in this study a novel generative model, called G-NeuroDAVIS, has been developed, which is capable of visualizing high-dimensional data through a generalized embedding and thereby generating new samples. The model leverages advanced generative techniques to produce a high-quality embedding that captures the underlying structure of the data more effectively than existing methods. G-NeuroDAVIS can be trained in both supervised and unsupervised settings. We rigorously evaluated our model through a series of experiments, demonstrating superior performance in classification tasks, which highlights the robustness of the learned representations. Furthermore, the conditional sample generation capability of the model has been described through qualitative assessments, revealing a marked improvement in generating realistic and diverse samples. G-NeuroDAVIS significantly outperforms the Variational Autoencoder (VAE) in multiple key aspects, including embedding quality, classification performance, and sample generation capability. These results underscore the potential of our generative model to serve as a powerful tool in various applications requiring high-quality data generation and representation learning.

Updated: 2024-10-18 07:14:08

标题: G-NeuroDAVIS:用于广义嵌入、数据可视化和样本生成的神经网络模型

摘要: 长期以来,通过广义嵌入来可视化高维数据集一直是一个挑战。已经出现了几种方法来解决这个问题,但它们仍然无法生成一个可以揭示数据中隐藏模式的广义嵌入,同时还能从中生成逼真的高维样本。受这一方面的启发,在这项研究中开发了一种新颖的生成模型,称为G-NeuroDAVIS,它能够通过广义嵌入可视化高维数据,并生成新样本。该模型利用先进的生成技术来产生高质量的嵌入,比现有方法更有效地捕捉数据的潜在结构。G-NeuroDAVIS可以在监督和无监督设置中进行训练。我们通过一系列实验对我们的模型进行了严格评估,在分类任务中展现出卓越的性能,突显了学习表示的稳健性。此外,通过定性评估描述了模型的条件样本生成能力,揭示了在生成逼真和多样化样本方面的显著改进。在多个关键方面,G-NeuroDAVIS在嵌入质量、分类性能和样本生成能力方面明显优于变分自动编码器(VAE)。这些结果强调了我们生成模型在需要高质量数据生成和表示学习的各种应用中作为强大工具的潜力。

更新时间: 2024-10-18 07:14:08

领域: cs.LG

下载: http://arxiv.org/abs/2410.14223v1

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Given recent advances in generative AI technology, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using text decoding results from a frozen, pretrained automatic speech recognition (ASR) model. To explore new capabilities in language modeling for speech processing, we introduce the generative speech transcription error correction (GenSEC) challenge. This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition. These tasks aim to emulate future LLM-based agents handling voice-based interfaces while remaining accessible to a broad audience by utilizing open pretrained language models or agent-based APIs. We also discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.

Updated: 2024-10-18 07:11:35

标题: 基于大型语言模型的生成式错误校正:面向语音识别、说话者标记和情感识别的挑战和基线

摘要: 鉴于生成式人工智能技术的最新进展,一个关键问题是如何利用来自冻结的、预训练的自动语音识别(ASR)模型的文本解码结果,来增强大型语言模型(LLMs)在声学建模任务中的效果。为了探索语言建模在语音处理中的新能力,我们引入了生成式语音转录错误校正(GenSEC)挑战。这个挑战包括三个ASR后语言建模任务:(i)ASR后转录校正,(ii)说话者标记,(iii)情感识别。这些任务旨在模拟未来基于LLM的代理处理基于语音的界面,同时通过利用开放的预训练语言模型或代理API,使其对广泛受众可访问。我们还讨论了基线评估的见解,以及为设计未来评估所学到的经验。

更新时间: 2024-10-18 07:11:35

领域: cs.CL,cs.AI,cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2409.09785v3

Formal Explanations for Neuro-Symbolic AI

Despite the practical success of Artificial Intelligence (AI), current neural AI algorithms face two significant issues. First, the decisions made by neural architectures are often prone to bias and brittleness. Second, when a chain of reasoning is required, neural systems often perform poorly. Neuro-symbolic artificial intelligence is a promising approach that tackles these (and other) weaknesses by combining the power of neural perception and symbolic reasoning. Meanwhile, the success of AI has made it critical to understand its behaviour, leading to the development of explainable artificial intelligence (XAI). While neuro-symbolic AI systems have important advantages over purely neural AI, we still need to explain their actions, which are obscured by the interactions of the neural and symbolic components. To address the issue, this paper proposes a formal approach to explaining the decisions of neuro-symbolic systems. The approach hinges on the use of formal abductive explanations and on solving the neuro-symbolic explainability problem hierarchically. Namely, it first computes a formal explanation for the symbolic component of the system, which serves to identify a subset of the individual parts of neural information that needs to be explained. This is followed by explaining only those individual neural inputs, independently of each other, which facilitates succinctness of hierarchical formal explanations and helps to increase the overall performance of the approach. Experimental results for a few complex reasoning tasks demonstrate practical efficiency of the proposed approach, in comparison to purely neural systems, from the perspective of explanation size, explanation time, training time, model sizes, and the quality of explanations reported.

Updated: 2024-10-18 07:08:31

标题: 神经符号人工智能的形式化解释

摘要: 尽管人工智能(AI)在实践中取得了成功,但目前的神经AI算法面临两个重大问题。首先,神经结构所做出的决策往往容易受到偏见和脆弱性的影响。其次,在需要一系列推理的情况下,神经系统往往表现不佳。神经符号人工智能是一种有前途的方法,通过结合神经感知和符号推理的力量来应对这些(和其他)弱点。与此同时,AI的成功使得理解其行为变得至关重要,从而推动了可解释人工智能(XAI)的发展。虽然神经符号AI系统相比纯粹的神经AI具有重要优势,但我们仍然需要解释它们的行为,这些行为被神经和符号组件的相互作用所掩盖。为了解决这个问题,本文提出了一种形式化方法来解释神经符号系统的决策。该方法依赖于形式化的溯因解释,并以分层方式解决神经符号可解释性问题。即,首先计算系统符号组件的形式化解释,以确定需要解释的个别神经信息部分的子集。然后,彼此独立地解释这些个别神经输入,这有助于保持分层形式化解释的简洁性,并有助于提高方法的整体性能。针对一些复杂推理任务的实验结果表明,所提出的方法在解释大小、解释时间、训练时间、模型大小和报告的解释质量等方面,相较于纯粹的神经系统,具有实际效率。

更新时间: 2024-10-18 07:08:31

领域: cs.AI,cs.LG,cs.LO

下载: http://arxiv.org/abs/2410.14219v1

WaterMax: breaking the LLM watermark detectability-robustness-quality trade-off

Watermarking is a technical means to dissuade malfeasant usage of Large Language Models. This paper proposes a novel watermarking scheme, so-called WaterMax, that enjoys high detectability while sustaining the quality of the text generated by the original LLM. Its new design leaves the LLM untouched (no modification of the weights, logits, temperature, or sampling technique). WaterMax balances robustness and complexity, in contrast to the watermarking techniques in the literature, which inherently provoke a trade-off between quality and robustness. Its performance is both theoretically proven and experimentally validated. It outperforms all the SotA techniques under the most complete benchmark suite. Code available at https://github.com/eva-giboulot/WaterMax.
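Since WaterMax leaves the LLM untouched, its spirit can be illustrated by draft selection: sample several completions from the unmodified model and release the one that scores highest under the detector. The keyed-hash score below is a toy stand-in, not the paper's actual statistic:

```python
import hashlib

def wm_score(text, key=b"secret"):
    # Toy detector: fraction of word bigrams whose keyed hash is even.
    words = text.split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    hits = sum(
        hashlib.sha256(key + f"{a} {b}".encode()).digest()[0] % 2 == 0
        for a, b in pairs
    )
    return hits / len(pairs)

def watermax_style_generate(generate, prompt, n_drafts=8):
    # The LLM (weights, logits, temperature, sampling) is never modified;
    # only the highest-scoring draft is released.
    drafts = [generate(prompt) for _ in range(n_drafts)]
    return max(drafts, key=wm_score)
```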

Updated: 2024-10-18 07:05:57

标题: WaterMax: 打破LLM水印可检测性-鲁棒性-质量折衷

摘要: 水印技术是一种阻止大型语言模型被恶意使用的技术手段。本文提出了一种新颖的水印方案,名为WaterMax,它具有很高的可检测性,同时保持原始LLM生成文本的质量。其新设计使LLM保持不变(不修改权重、logits、温度或采样技术)。WaterMax在鲁棒性和复杂性之间取得平衡,而文献中的水印技术在质量和鲁棒性之间存在固有的权衡。其性能在理论上得到证明,并在实验中得到验证。在最完整的基准套件下,WaterMax优于所有最新技术。代码可在https://github.com/eva-giboulot/WaterMax找到。

更新时间: 2024-10-18 07:05:57

领域: cs.CR,cs.CL,cs.LG

下载: http://arxiv.org/abs/2403.04808v3

Towards Satellite Non-IID Imagery: A Spectral Clustering-Assisted Federated Learning Approach

Low Earth orbit (LEO) satellites are capable of gathering abundant Earth observation data (EOD) to enable different Internet of Things (IoT) applications. However, to accomplish an effective EOD processing mechanism, it is imperative to investigate: 1) the challenge of processing the observed data without transmitting those large-size data to the ground because the connection between the satellites and the ground stations is intermittent, and 2) the challenge of processing the non-independent and identically distributed (non-IID) satellite data. In this paper, to cope with those challenges, we propose an orbit-based spectral clustering-assisted clustered federated self-knowledge distillation (OSC-FSKD) approach for each orbit of an LEO satellite constellation, which retains the advantage of FL that the observed data does not need to be sent to the ground. Specifically, we introduce normalized Laplacian-based spectral clustering (NLSC) into federated learning (FL) to create clustered FL in each round to address the challenge resulting from non-IID data. Particularly, NLSC is adopted to dynamically group clients into several clusters based on cosine similarities calculated from model updates. In addition, self-knowledge distillation is utilized to construct each local client, where the most recently updated local model is used to guide current local model training. Experiments demonstrate that the observation accuracy obtained by the proposed method is 1.01x, 2.15x, 1.10x, and 1.03x higher than that of the pFedSD, FedProx, FedAU, and FedALA approaches, respectively, on the SAT4 dataset. The proposed method also shows superiority when using other datasets.
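The clustering step is concrete enough to sketch: flatten each client's model update, build a cosine-similarity affinity matrix, and run spectral clustering on it (scikit-learn's SpectralClustering uses a normalized-Laplacian embedding internally; the cluster count is assumed given):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_clients(updates, n_clusters):
    """updates: (n_clients, n_params) array of flattened model updates."""
    normed = updates / np.linalg.norm(updates, axis=1, keepdims=True)
    affinity = np.clip(normed @ normed.T, 0.0, 1.0)  # cosine similarities
    sc = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=0
    )
    return sc.fit_predict(affinity)  # cluster label per client, per round
```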

Updated: 2024-10-18 07:04:25

标题: 朝向卫星非独立同分布图像:一种光谱聚类辅助的联邦学习方法

摘要: 低地球轨道(LEO)卫星能够收集丰富的地球观测数据(EOD),以支持不同的物联网(IoT)应用。然而,为了实现有效的EOD处理机制,必须研究以下问题:1)由于卫星与地面站之间的连接是间歇性的,如何在不将观测数据传输到地面的情况下处理这些大型数据;2)如何处理非独立同分布(non-IID)的卫星数据。在本文中,为了应对这些挑战,我们提出了一种基于轨道的谱聚类辅助聚类联邦自我知识蒸馏(OSC-FSKD)方法,用于LEO卫星星座的每个轨道,保留了FL的优势,即观测数据无需发送到地面。具体而言,我们将基于归一化拉普拉斯的谱聚类(NLSC)引入到联邦学习(FL)中,在每一轮中构建聚类FL,以解决由非IID数据带来的挑战。特别是,NLSC根据由模型更新计算的余弦相似度,动态地将客户端分组成若干个簇。此外,自我知识蒸馏被用于构建每个本地客户端,其中最近更新的本地模型被用来引导当前本地模型的训练。实验表明,在SAT4数据集上,所提方法获得的观测精度分别是pFedSD、FedProx、FedAU和FedALA方法的1.01倍、2.15倍、1.10倍和1.03倍。所提方法在使用其他数据集时也表现出优势。

更新时间: 2024-10-18 07:04:25

领域: cs.NI,cs.LG

下载: http://arxiv.org/abs/2410.13602v2

Comparative Evaluation of Clustered Federated Learning Method

Over recent years, Federated Learning (FL) has proven to be one of the most promising methods of distributed learning which preserves data privacy. As the method evolved and was confronted with various real-world scenarios, new challenges have emerged. One such challenge is the presence of highly heterogeneous (often referred to as non-IID) data distributions among participants of the FL protocol. A popular solution to this hurdle is Clustered Federated Learning (CFL), which aims to partition clients into groups where the distributions are homogeneous. In the literature, state-of-the-art CFL algorithms are often tested on a few cases of data heterogeneity, without systematically justifying the choices. Further, the taxonomy used for differentiating the different heterogeneity scenarios is not always straightforward. In this paper, we explore the performance of two state-of-the-art CFL algorithms with respect to a proposed taxonomy of data heterogeneities in federated learning (FL). We work with three image classification datasets and analyze the resulting clusters against the heterogeneity classes using extrinsic clustering metrics. Our objective is to provide a clearer understanding of the relationship between CFL performance and data heterogeneity scenarios.
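Extrinsic clustering metrics compare the clusters a CFL algorithm finds against ground-truth heterogeneity classes. A minimal example with scikit-learn (the label vectors are toy data, not the paper's):

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

true_classes = [0, 0, 1, 1, 2, 2]  # heterogeneity class of each client (toy)
cfl_clusters = [0, 0, 1, 2, 2, 2]  # clusters recovered by a CFL method (toy)

# Both metrics reach 1.0 for a perfect match with the ground truth.
print(adjusted_rand_score(true_classes, cfl_clusters))
print(normalized_mutual_info_score(true_classes, cfl_clusters))
```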

Updated: 2024-10-18 07:01:56

标题: 聚类联邦学习方法的比较评估

摘要: 近年来,联邦学习(Federated Learning,FL)已被证明是保护数据隐私的分布式学习方法中最具前景的方法之一。随着该方法的发展并在各种实际场景中得到应用,新的挑战也随之出现。其中之一是在FL协议的参与者之间存在高度异构(通常被称为非IID)的数据分布。针对这一障碍的一种流行解决方案是聚类联邦学习(Clustered Federated Learning,CFL),其目标是将客户端划分为数据分布同质的群组。在文献中,最先进的CFL算法通常只在少数几种数据异质性情形下进行测试,而没有系统地论证这些选择。此外,用于区分不同异质性场景的分类法并不总是直接明了的。本文在提出的联邦学习(FL)数据异质性分类法下,探讨了两种最先进的CFL算法的性能。我们使用三个图像分类数据集,并借助外部聚类指标将所得聚类与异质性类别进行对照分析。我们的目标是更清晰地理解CFL性能与数据异质性场景之间的关系。

更新时间: 2024-10-18 07:01:56

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.14212v1

On the Sparsity of the Strong Lottery Ticket Hypothesis

Considerable research efforts have recently been made to show that a random neural network $N$ contains subnetworks capable of accurately approximating any given neural network that is sufficiently smaller than $N$, without any training. This line of research, known as the Strong Lottery Ticket Hypothesis (SLTH), was originally motivated by the weaker Lottery Ticket Hypothesis, which states that a sufficiently large random neural network $N$ contains \emph{sparse} subnetworks that can be trained efficiently to achieve performance comparable to that of training the entire network $N$. Despite its original motivation, results on the SLTH have so far not provided any guarantee on the size of subnetworks. Such limitation is due to the nature of the main technical tool leveraged by these results, the Random Subset Sum (RSS) Problem. Informally, the RSS Problem asks how large a random i.i.d. sample $\Omega$ should be so that we are able to approximate any number in $[-1,1]$, up to an error of $ \epsilon$, as the sum of a suitable subset of $\Omega$. We provide the first proof of the SLTH in classical settings, such as dense and equivariant networks, with guarantees on the sparsity of the subnetworks. Central to our results, is the proof of an essentially tight bound on the Random Fixed-Size Subset Sum Problem (RFSS), a variant of the RSS Problem in which we only ask for subsets of a given size, which is of independent interest.
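For reference, the RSS Problem can be stated as follows (our paraphrase of the abstract; classical results show that a sample of size $O(\log(1/\epsilon))$ suffices, and the paper's RFSS variant additionally fixes the subset size):

```latex
% X_1,\dots,X_n i.i.d. (e.g. uniform on [-1,1]); RSS asks for the smallest n with
\Pr\!\left[\,\forall z\in[-1,1]\;\exists S\subseteq\{1,\dots,n\}:\;
  \Bigl|\,z-\sum_{i\in S}X_i\Bigr|\le\epsilon\,\right]\ \ge\ 1-\delta .
```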

Updated: 2024-10-18 06:57:37

标题: 关于强大的彩票票据假设的稀疏性

摘要: 最近已经进行了大量研究,表明随机神经网络N包含能够准确逼近任何足够小于N的给定神经网络的子网络,而无需任何训练。这一研究方向被称为强彩票假设(SLTH),最初是由较弱的彩票假设激发的,后者认为一个足够大的随机神经网络N包含稀疏的子网络,可以高效训练以达到与训练整个网络N相当的性能。尽管最初的动机如此,到目前为止,有关SLTH的结果尚未提供对子网络大小的任何保证。这种限制源于这些结果所依赖的主要技术工具,即随机子集和(RSS)问题的性质。简而言之,RSS问题询问随机独立同分布样本Ω需要多大,才能将[-1,1]中的任意数字近似为Ω某个适当子集之和,且误差不超过ε。我们在密集网络和等变网络等经典设置中给出了SLTH的第一个证明,并对子网络的稀疏性提供了保证。我们结果的核心是对随机固定大小子集和问题(RFSS)的一个本质上紧的界的证明,RFSS是RSS问题的一个变种,其中只考虑给定大小的子集,该结果本身也具有独立的研究价值。

更新时间: 2024-10-18 06:57:37

领域: stat.ML,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.14754v1

PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations

Expert-designed close-ended benchmarks are indispensable in assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To rectify this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through \textbf{knowledge-invariant perturbations}. These perturbations employ human-like restatement techniques to generate on-the-fly test samples from static benchmarks, meticulously retaining knowledge-critical content while altering irrelevant details. Our toolkit further includes a suite of \textbf{response consistency analyses} that compare performance on raw vs. perturbed test sets to precisely assess LLMs' genuine knowledge capacity. Six representative LLMs are re-evaluated using PertEval. Results reveal significantly inflated performance of the LLMs on raw benchmarks, including an absolute 25.8% overestimation for GPT-4. Additionally, through a nuanced response pattern analysis, we discover that PertEval retains LLMs' uncertainty to specious knowledge, and reveals their potential rote memorization to correct options which leads to overestimated performance. We also find that the detailed response consistency analyses by PertEval could illuminate various weaknesses in existing LLMs' knowledge mastery and guide the development of refinement. Our findings provide insights for advancing more robust and genuinely knowledgeable LLMs. Our code is available at \url{https://github.com/aigc-apps/PertEval}.
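One knowledge-invariant perturbation is easy to reproduce: reorder the options of a multiple-choice item and remap the gold label, so the knowledge needed to answer is untouched while rote memorization of option positions breaks (a sketch of the idea, not the toolkit's API):

```python
import random

def permute_options(question, options, answer_idx, seed=0):
    """Shuffle option order and remap the gold label (knowledge-invariant)."""
    rng = random.Random(seed)
    order = list(range(len(options)))
    rng.shuffle(order)
    new_options = [options[i] for i in order]
    return question, new_options, order.index(answer_idx)
```

Response consistency is then simply the fraction of items a model answers identically on the raw and perturbed sets.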

Updated: 2024-10-18 06:57:08

标题: PertEval: 揭示具有知识不变扰动的LLMs的真实知识能力

摘要: 专家设计的封闭式基准对于评估大型语言模型(LLMs)的知识能力至关重要。尽管它们被广泛使用,但由于测试场景有限和数据污染风险不可避免,对它们的可靠性产生了担忧。为了纠正这一问题,我们提出了PertEval,这是一个旨在通过“知识不变扰动”深入探究LLMs知识能力的工具包。这些扰动利用类似人类的重述技术从静态基准中生成即时测试样本,精心保留关键知识内容同时改变无关细节。我们的工具包还包括一套“响应一致性分析”,用于比较原始测试集和扰动测试集上的表现,以精确评估LLMs的真实知识能力。我们使用PertEval重新评估了六个代表性的LLMs。结果显示,LLMs在原始基准上的表现被明显夸大,其中对GPT-4的高估绝对值达25.8%。此外,通过细致的响应模式分析,我们发现PertEval保留了LLMs对似是而非知识的不确定性,并揭示了它们对正确选项的潜在死记硬背,而这会导致被高估的表现。我们还发现,PertEval的详细响应一致性分析可以揭示现有LLMs在知识掌握上的各种弱点,并指导改进的发展。我们的研究结果为推进更加稳健和真正富有知识的LLMs提供了见解。我们的代码可在\url{https://github.com/aigc-apps/PertEval}上找到。

更新时间: 2024-10-18 06:57:08

领域: cs.CL,cs.AI,cs.CY

下载: http://arxiv.org/abs/2405.19740v2

Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning

We present a novel end-to-end framework that generates highly compact (typically 6-15 dimensions), discrete (int4 type), and interpretable node representations, termed node identifiers (node IDs), to tackle inference challenges on large-scale graphs. By employing vector quantization, we compress continuous node embeddings from multiple layers of a Graph Neural Network (GNN) into discrete codes, applicable under both self-supervised and supervised learning paradigms. These node IDs capture high-level abstractions of graph data and offer interpretability that traditional GNN embeddings lack. Extensive experiments on 34 datasets, encompassing node classification, graph classification, link prediction, and attributed graph clustering tasks, demonstrate that the generated node IDs significantly enhance speed and memory efficiency while achieving competitive performance compared to current state-of-the-art methods.
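Vector quantization of this kind is compact to sketch: each layer's continuous embeddings are snapped to the nearest codebook entry, and the per-layer codes concatenate into a node ID (codebook learning is omitted; K <= 16 keeps each code within int4 range):

```python
import torch

def node_ids(layer_embs, codebooks):
    """layer_embs: list of (n_nodes, d) GNN activations; codebooks: (K, d) each."""
    codes = [
        torch.cdist(h, cb).argmin(dim=1)  # nearest codebook entry per node
        for h, cb in zip(layer_embs, codebooks)
    ]
    return torch.stack(codes, dim=1)  # (n_nodes, n_layers) discrete node IDs
```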

Updated: 2024-10-18 06:56:10

标题: 节点标识符:紧凑、离散表示以便高效图学习

摘要: 我们提出了一种新颖的端到端框架,该框架生成高度紧凑(通常为6-15维)、离散(int4类型)和可解释的节点表示,称为节点标识符(节点ID),以解决大规模图上的推理挑战。通过使用向量量化,我们将图神经网络(GNN)多层连续节点嵌入压缩为离散代码,适用于自监督和监督学习范式。这些节点ID捕获了图数据的高级抽象,并提供了传统GNN嵌入所缺乏的可解释性。在34个数据集上进行了大量实验,涵盖了节点分类、图分类、链接预测和属性图聚类任务,结果表明生成的节点ID显著提高了速度和内存效率,同时与当前最先进的方法相比实现了竞争性能。

更新时间: 2024-10-18 06:56:10

领域: cs.LG

下载: http://arxiv.org/abs/2405.16435v2

Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning

Synthetic data has been widely used to train large language models, but their generative nature inevitably introduces noisy, non-informative, and misleading learning signals. In this paper, we propose Montessori-Instruct, a novel data synthesis framework that tailors the data synthesis ability of the teacher language model toward the student language model's learning process. Specifically, we utilize local data influence of synthetic training data points on students to characterize students' learning preferences. Then, we train the teacher model with Direct Preference Optimization (DPO) to generate synthetic data tailored toward student learning preferences. Experiments with Llama3-8B-Instruct (teacher) and Llama3-8B (student) on Alpaca Eval and MT-Bench demonstrate that Montessori-Instruct significantly outperforms standard synthesis methods by 18.35\% and 46.24\% relatively. Our method also beats data synthesized by a stronger teacher model, GPT-4o. Further analysis confirms the benefits of teacher's learning to generate more influential training data in the student's improved learning, the advantages of local data influence in accurately measuring student preferences, and the robustness of Montessori-Instruct across different student models. Our code and data are open-sourced at https://github.com/cxcscmu/Montessori-Instruct.
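Local data influence admits a simple first-order reading: take one gradient step on a synthetic example and measure how much the student's loss on a reference set drops. A sketch under that reading (the paper's exact estimator and the DPO step that follows are not reproduced here):

```python
import copy
import torch

def local_data_influence(student, loss_fn, example, ref_batch, lr=1e-5):
    """Influence of one synthetic example on the student's reference loss."""
    x, y = example                      # batched tensors are assumed
    probe = copy.deepcopy(student)
    opt = torch.optim.SGD(probe.parameters(), lr=lr)
    opt.zero_grad()
    loss_fn(probe(x), y).backward()
    opt.step()                          # one step on the synthetic example
    rx, ry = ref_batch
    with torch.no_grad():
        before = loss_fn(student(rx), ry)
        after = loss_fn(probe(rx), ry)
    return (before - after).item()      # larger = more helpful training data
```

Influence scores of this kind can rank the teacher's generations into chosen/rejected pairs for preference optimization.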

Updated: 2024-10-18 06:50:15

标题: 蒙特梭利教学:生成为学生学习量身定制的有影响力的训练数据

摘要: 合成数据被广泛用于训练大型语言模型,但其生成性质不可避免地引入嘈杂、非信息性和误导性的学习信号。在本文中,我们提出了Montessori-Instruct,这是一个新颖的数据合成框架,它将教师语言模型的数据合成能力定制为学生语言模型的学习过程。具体来说,我们利用合成训练数据点对学生的局部数据影响来表征学生的学习偏好。然后,我们使用直接偏好优化(DPO)训练教师模型,生成符合学生学习偏好的合成数据。在Alpaca Eval和MT-Bench上使用Llama3-8B-Instruct(教师)和Llama3-8B(学生)进行实验表明,Montessori-Instruct的性能显著优于标准合成方法,相对提高了18.35%和46.24%。我们的方法还击败了由更强的教师模型GPT-4o合成的数据。进一步的分析证实了教师的学习有助于生成更具影响力的训练数据,局部数据影响在准确衡量学生偏好方面的优势,以及Montessori-Instruct在不同学生模型之间的稳健性。我们的代码和数据在https://github.com/cxcscmu/Montessori-Instruct 上开源。

更新时间: 2024-10-18 06:50:15

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.14208v1

Flexi-Fuzz least squares SVM for Alzheimer's diagnosis: Tackling noise, outliers, and class imbalance

Alzheimer's disease (AD) is a leading neurodegenerative condition and the primary cause of dementia, characterized by progressive cognitive decline and memory loss. Its progression, marked by shrinkage in the cerebral cortex, is irreversible. Numerous machine learning algorithms have been proposed for the early diagnosis of AD. However, they often struggle with the issues of noise, outliers, and class imbalance. To tackle the aforementioned limitations, in this article, we introduce a novel, robust, and flexible membership scheme called Flexi-Fuzz. This scheme integrates a novel flexible weighting mechanism, class probability, and imbalance ratio. The proposed flexible weighting mechanism assigns the maximum weight to samples within a specific proximity to the center, with a gradual decrease in weight beyond a certain threshold. This approach ensures that samples near the class boundary still receive significant weight, maintaining their influence in the classification process. Class probability is used to mitigate the impact of noisy samples, while the imbalance ratio addresses class imbalance. Leveraging this, we incorporate the proposed Flexi-Fuzz membership scheme into the least squares support vector machines (LSSVM) framework, resulting in a robust and flexible model termed Flexi-Fuzz-LSSVM. We determine the class-center using two methods: the conventional mean approach and an innovative median approach, leading to two model variants, Flexi-Fuzz-LSSVM-I and Flexi-Fuzz-LSSVM-II. To validate the effectiveness of the proposed Flexi-Fuzz-LSSVM models, we evaluated them on benchmark UCI and KEEL datasets, both with and without label noise. Additionally, we tested the models on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset for AD diagnosis. Experimental results demonstrate the superiority of the Flexi-Fuzz-LSSVM models over baseline models.
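The flexible weighting mechanism can be illustrated directly: full membership within some radius of the class center, then a smooth decay past the threshold so boundary samples keep non-trivial weight. The exponential form below is our assumption; the paper defines its own scheme:

```python
import numpy as np

def flexi_style_weight(dist, radius, decay=2.0):
    """dist: distances to the class center (mean or median variant)."""
    w = np.ones_like(dist, dtype=float)   # maximum weight near the center
    far = dist > radius
    w[far] = np.exp(-decay * (dist[far] - radius) / radius)
    return w                              # gradual decrease beyond the radius
```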

Updated: 2024-10-18 06:47:39

标题: 弹性模糊最小二乘支持向量机用于阿尔茨海默病诊断:处理噪声、异常值和类别不平衡

摘要: 阿尔茨海默病(AD)是一种主要的神经退行性疾病,也是痴呆的主要原因,其特征是逐渐的认知能力下降和记忆丧失。其进展以大脑皮质萎缩为特征,是不可逆转的。为了早期诊断AD,已经提出了许多机器学习算法。然而,它们经常遇到噪声、异常值和类别不平衡等问题。为了解决上述限制,在本文中,我们引入了一种新颖、稳健且灵活的隶属度方案,称为Flexi-Fuzz。该方案整合了一种新颖的灵活加权机制、类概率和不平衡比例。所提出的灵活加权机制将最大权重分配给位于类中心一定邻近范围内的样本,超过一定阈值后权重逐渐减小。这种方法确保了接近类边界的样本仍能获得可观的权重,在分类过程中保持其影响力。类概率用于减轻噪声样本的影响,而不平衡比例用于解决类别不平衡问题。在此基础上,我们将提出的Flexi-Fuzz隶属度方案整合到最小二乘支持向量机(LSSVM)框架中,形成一个稳健而灵活的模型,称为Flexi-Fuzz-LSSVM。我们使用两种方法确定类中心:传统的均值方法和创新的中位数方法,由此得到两种模型变体,Flexi-Fuzz-LSSVM-I和Flexi-Fuzz-LSSVM-II。为了验证所提出的Flexi-Fuzz-LSSVM模型的有效性,我们在基准UCI和KEEL数据集上对其进行了评估,包括有标签噪声和没有标签噪声的情况。此外,我们在阿尔茨海默病神经影像学计划(ADNI)数据集上测试了模型用于AD诊断。实验结果表明,Flexi-Fuzz-LSSVM模型优于基线模型。

更新时间: 2024-10-18 06:47:39

领域: cs.LG

下载: http://arxiv.org/abs/2410.14207v1

ScoreFusion: fusing score-based generative models via Kullback-Leibler barycenters

We introduce ScoreFusion, a theoretically grounded method for fusing multiple pre-trained diffusion models that are assumed to generate from auxiliary populations. ScoreFusion is particularly useful for enhancing the generative modeling of a target population with limited observed data. Our starting point considers the family of KL barycenters of the auxiliary populations, which is proven to be an optimal parametric class in the KL sense, but difficult to learn. Nevertheless, by recasting the learning problem as score matching in denoising diffusion, we obtain a tractable way of computing the optimal KL barycenter weights. We prove a dimension-free sample complexity bound in total variation distance, provided that the auxiliary models are well fitted for their own task and the auxiliary tasks combined capture the target well. We also explain a connection of the practice of checkpoint merging in AI art creation to an approximation of our KL-barycenter-based fusion approach. However, our fusion method differs in key aspects, allowing generation of new populations, as we illustrate in experiments.
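One standard convention makes the connection to scores explicit: with weights λ on the simplex, the barycenter minimizing a weighted KL objective is the normalized geometric mean, so its score is the λ-weighted sum of the auxiliary scores. Whether this matches the paper's exact formulation is not guaranteed by the abstract:

```latex
p^{\star}=\arg\min_{p}\sum_{i=1}^{m}\lambda_i\,\mathrm{KL}(p\,\|\,q_i)
\ \propto\ \prod_{i=1}^{m} q_i^{\lambda_i},
\qquad
\nabla_x \log p^{\star}(x)=\sum_{i=1}^{m}\lambda_i\,\nabla_x \log q_i(x).
```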

Updated: 2024-10-18 06:40:37

标题: ScoreFusion:通过Kullback-Leibler质心融合基于分数的生成模型

摘要: 我们介绍了ScoreFusion,这是一种有理论依据的方法,用于融合多个预训练的扩散模型,这些模型被假定从辅助总体中生成样本。ScoreFusion特别适用于在目标总体的观测数据有限时增强其生成建模。我们的出发点是考虑辅助总体的KL重心族,它在KL意义下被证明是一个最优的参数类,但难以学习。然而,通过将学习问题重新表述为去噪扩散中的得分匹配,我们得到了一种可行的计算最优KL重心权重的方法。我们证明了在总变差距离下的无维样本复杂度界,前提是各辅助模型在其自身任务上拟合良好,且这些辅助任务联合起来能够很好地刻画目标。我们还解释了AI艺术创作中检查点合并的做法与我们基于KL重心的融合方法的一种近似之间的联系。然而,我们的融合方法在关键方面有所不同,能够生成新的总体,我们在实验中对此进行了说明。

更新时间: 2024-10-18 06:40:37

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2406.19619v2

Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques

Diffusion models are powerful generative models, and this capability can also be applied to discrimination. The inner activations of a pre-trained diffusion model can serve as features for discriminative tasks, namely, diffusion feature. We discover that diffusion feature has been hindered by a hidden yet universal phenomenon that we call content shift. To be specific, there are content differences between features and the input image, such as the exact shape of a certain object. We locate the cause of content shift as one inherent characteristic of diffusion models, which suggests the broad existence of this phenomenon in diffusion feature. Further empirical study also indicates that its negative impact is not negligible even when content shift is not visually perceivable. Hence, we propose to suppress content shift to enhance the overall quality of diffusion features. Specifically, content shift is related to the information drift during the process of recovering an image from the noisy input, pointing out the possibility of turning off-the-shelf generation techniques into tools for content shift suppression. We further propose a practical guideline named GATE to efficiently evaluate the potential benefit of a technique and provide an implementation of our methodology. Despite the simplicity, the proposed approach has achieved superior results on various tasks and datasets, validating its potential as a generic booster for diffusion features. Our code is available at https://github.com/Darkbblue/diffusion-content-shift.

Updated: 2024-10-18 06:39:27

标题: 抑制内容转移:通过现成的生成技术获得更好的扩散特征

摘要: 扩散模型是强大的生成模型,这种能力也可以应用于区分。预先训练的扩散模型的内部激活可以作为区分任务的特征,即扩散特征。我们发现,扩散特征受到一个被称为内容转移的隐藏但普遍现象的阻碍。具体来说,特征和输入图像之间存在内容差异,比如某个物体的确切形状。我们将内容转移的原因定位为扩散模型的一个固有特性,这表明了这种现象在扩散特征中的广泛存在。进一步的实证研究还表明,即使内容转移在视觉上不可感知,其负面影响也不可忽视。因此,我们提出抑制内容转移以提升扩散特征的整体质量。具体来说,内容转移与从嘈杂的输入中恢复图像的过程中的信息漂移有关,指出将现成的生成技术转化为内容转移抑制工具的可能性。我们进一步提出了一个名为GATE的实用指南,用于有效评估技术的潜在收益,并提供了我们方法的实现。尽管简单,所提出的方法在各种任务和数据集上均取得了优异的结果,验证了它作为扩散特征通用增强器的潜力。我们的代码可在https://github.com/Darkbblue/diffusion-content-shift 上找到。

更新时间: 2024-10-18 06:39:27

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.06719v3

FedSN: A Federated Learning Framework over Heterogeneous LEO Satellite Networks

Recently, a large number of Low Earth Orbit (LEO) satellites have been launched and deployed successfully in space by commercial companies, such as SpaceX. Because LEO satellites are equipped with multimodal sensors, they serve not only for communication but also for various machine learning applications, such as space modulation recognition, remote sensing image classification, etc. However, the ground station (GS) may be incapable of downloading such a large volume of raw sensing data for centralized model training due to the limited contact time with LEO satellites (e.g. 5 minutes). Therefore, federated learning (FL) has emerged as the promising solution to address this problem via on-device training. Unfortunately, to enable FL on LEO satellites, we still face three critical challenges that are i) heterogeneous computing and memory capabilities, ii) limited uplink rate, and iii) model staleness. To this end, we propose FedSN as a general FL framework to tackle the above challenges, and fully explore data diversity on LEO satellites. Specifically, we first present a novel sub-structure scheme to enable heterogeneous local model training considering different computing, memory, and communication constraints on LEO satellites. Additionally, we propose a pseudo-synchronous model aggregation strategy to dynamically schedule model aggregation for compensating model staleness. To further demonstrate the effectiveness of FedSN, we evaluate it on space modulation recognition and remote sensing image classification tasks using data from real-world satellite networks. Extensive experimental results demonstrate that the FedSN framework achieves higher accuracy and lower computing and communication overhead than the state-of-the-art benchmarks, and validate the effectiveness of each component in FedSN.
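Pseudo-synchronous aggregation must discount stale client models. One simple staleness-discounting rule, as a sketch (FedSN's actual schedule is more elaborate):

```python
def aggregate(global_w, updates):
    """updates: list of (client_state_dict, staleness_in_rounds) pairs."""
    coeffs = [1.0 / (1.0 + s) for _, s in updates]  # stale models count less
    total = sum(coeffs)
    return {
        k: sum(c * w[k] for (w, _), c in zip(updates, coeffs)) / total
        for k in global_w
    }
```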

Updated: 2024-10-18 06:38:11

标题: FedSN:一个在异构低地球轨道卫星网络上的联合学习框架

摘要: 最近,许多低地球轨道(LEO)卫星已经由商业公司成功发射并在太空中部署,如SpaceX。由于LEO卫星配备了多模传感器,它们不仅用于通信,还用于各种机器学习应用,如空间调制识别、遥感图像分类等。然而,地面站(GS)可能无法下载如此大量的原始传感数据进行集中模型训练,因为与LEO卫星的接触时间有限(例如5分钟)。因此,联邦学习(FL)已成为通过设备上的训练解决这一问题的有希望的解决方案。不幸的是,为了在LEO卫星上实现FL,我们仍然面临三个关键挑战,即i)异构计算和存储能力,ii)有限的上行速率,以及iii)模型陈旧。为此,我们提出了FedSN作为一个通用的FL框架来解决上述挑战,并充分挖掘LEO卫星上的数据多样性。具体来说,我们首先提出了一种新颖的子结构方案,以实现对LEO卫星上的不同计算、存储和通信约束进行异构本地模型训练。此外,我们提出了一种伪同步模型聚合策略,动态调度模型聚合以弥补模型陈旧。为了进一步证明FedSN的有效性,我们利用来自真实卫星网络的数据评估了它在空间调制识别和遥感图像分类任务中的表现。大量的实验结果表明,FedSN框架比最先进的基准具有更高的准确度、更低的计算和通信开销,以及FedSN中每个组件的有效性。

更新时间: 2024-10-18 06:38:11

领域: cs.LG,cs.AI,cs.DC

下载: http://arxiv.org/abs/2311.01483v5

Rationale Behind Essay Scores: Enhancing S-LLM's Multi-Trait Essay Scoring with Rationale Generated by LLMs

Existing automated essay scoring (AES) has solely relied on essay text without using explanatory rationales for the scores, thereby forgoing an opportunity to capture the specific aspects evaluated by rubric indicators in a fine-grained manner. This paper introduces Rationale-based Multiple Trait Scoring (RMTS), a novel approach for multi-trait essay scoring that integrates prompt-engineering-based large language models (LLMs) with a fine-tuning-based essay scoring model using a smaller large language model (S-LLM). RMTS uses an LLM-based trait-wise rationale generation system where a separate LLM agent generates trait-specific rationales based on rubric guidelines, which the scoring model uses to accurately predict multi-trait scores. Extensive experiments on benchmark datasets, including ASAP, ASAP++, and Feedback Prize, show that RMTS significantly outperforms state-of-the-art models and vanilla S-LLMs in trait-specific scoring. By assisting quantitative assessment with fine-grained qualitative rationales, RMTS enhances the trait-wise reliability, providing partial explanations about essays.

Updated: 2024-10-18 06:35:17

标题: 论文分数背后的理由:通过LLM生成的理由提升S-LLM的多特质论文评分

摘要: 现有的自动作文评分(AES)仅依赖于作文文本,而没有使用解释性评分理由,从而错失了捕捉用细粒度方式评估的评分指标的特定方面的机会。本文介绍了基于理由的多特征评分(RMTS),这是一种集成了基于提示工程的大型语言模型(LLMs)和使用较小的大型语言模型(S-LLM)进行微调的作文评分模型的新颖方法。RMTS使用基于LLM的特征化理由生成系统,其中一个单独的LLM代理根据评分标准生成特定特征的理由,评分模型使用这些理由来准确预测多特征得分。对包括ASAP、ASAP++和Feedback Prize在内的基准数据集进行了大量实验,结果显示RMTS在特定特征评分方面明显优于最先进的模型和基本S-LLMs。通过用细致的定性理由辅助定量评估,RMTS增强了特征可靠性,提供了关于作文的部分解释。

更新时间: 2024-10-18 06:35:17

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14202v1

Social Dynamics of Consumer Response: A Unified Framework Integrating Statistical Physics and Marketing Dynamics

Understanding how consumers react to advertising inputs is essential for marketers aiming to optimize advertising strategies and improve campaign effectiveness. This study examines the complex nature of consumer behaviour by applying theoretical frameworks derived from physics and social psychology. We present an innovative equation that captures the relation between spending on advertising and consumer response, using concepts such as symmetries, scaling laws, and phase transitions. By validating our equation against well-known models such as the Michaelis-Menten and Hill equations, we prove its effectiveness in accurately representing the complexity of consumer response dynamics. The analysis emphasizes the importance of key model parameters, such as marketing effectiveness, response sensitivity, and behavioural sensitivity, in influencing consumer behaviour. The work explores the practical implications for advertisers and marketers, as well as discussing the limitations and future research directions. In summary, this study provides a thorough framework for comprehending and forecasting consumer reactions to advertising, which has implications for optimizing advertising strategies and allocating resources.
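The abstract's validation targets are classical saturating response curves; writing them out makes the comparison concrete. Here R is consumer response, s is advertising spend, and Michaelis-Menten is the Hill curve with n = 1:

```latex
R(s)=\frac{R_{\max}\,s}{K+s}
\quad\text{(Michaelis--Menten)},
\qquad
R(s)=\frac{R_{\max}\,s^{\,n}}{K^{\,n}+s^{\,n}}
\quad\text{(Hill, } n>0\text{)}.
```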

Updated: 2024-10-18 06:33:19

标题: 消费者反应的社会动态:统一框架结合统计物理学和营销动态

摘要: 理解消费者对广告输入的反应对于旨在优化广告策略并提高广告效果的营销人员至关重要。本研究通过应用源自物理学和社会心理学的理论框架来探讨消费者行为的复杂性。我们提出了一个创新的方程,捕捉了广告支出与消费者反应之间的关系,使用了诸如对称性、标度律和相变等概念。通过将我们的方程验证与Michaelis-Menten和Hill方程等著名模型相对比,我们证明了其准确地表达了消费者反应动态的复杂性。分析强调了关键模型参数,如营销效果、反应敏感度和行为敏感度,对影响消费者行为具有重要作用。该研究探讨了广告商和营销人员的实际影响,同时讨论了局限性和未来研究方向。总之,本研究为理解和预测消费者对广告的反应提供了一个全面的框架,这对于优化广告策略和资源分配具有重要意义。

更新时间: 2024-10-18 06:33:19

领域: physics.soc-ph,cs.LG,q-fin.GN

下载: http://arxiv.org/abs/2404.02175v2

Supervised Chain of Thought

Large Language Models (LLMs) have revolutionized natural language processing and hold immense potential for advancing Artificial Intelligence. However, the core architecture of most mainstream LLMs -- the Transformer -- has inherent limitations in computational depth, rendering them theoretically incapable of solving many reasoning tasks that demand increasingly deep computations. Chain of Thought (CoT) prompting has emerged as a technique to address these architectural limitations, as evidenced by several theoretical studies. It offers a promising approach to solving complex reasoning tasks that were previously beyond the capabilities of these models. Despite its successes, CoT and its variants (such as Tree of Thought, Graph of Thought, etc.) rely on a "one-prompt-for-all" approach, using a single prompt structure (e.g., "think step by step") for a wide range of tasks -- from counting and sorting to solving mathematical and algorithmic problems. This approach poses significant challenges for models to generate the correct reasoning steps, as the model must navigate through a vast prompt template space to find the appropriate template for each task. In this work, we build upon previous theoretical analyses of CoT to demonstrate how the one-prompt-for-all approach can negatively affect the computability of LLMs. We partition the solution search space into two: the prompt space and the answer space. Our findings show that task-specific supervision is essential for navigating the prompt space accurately and achieving optimal performance. Through experiments with state-of-the-art LLMs, we reveal a gap in reasoning performance when supervision is applied versus when it is not.

Updated: 2024-10-18 06:25:27

标题: 受监督的思维链

摘要: 大型语言模型(LLMs)已经彻底改变了自然语言处理,并具有推动人工智能发展的巨大潜力。然而,大多数主流LLMs的核心架构——Transformer——在计算深度方面存在固有限制,理论上使它们无法解决许多需要越来越深层次计算的推理任务。Chain of Thought(CoT)提示作为一种技术出现,以解决这些架构限制,多个理论研究证实了这一点。它提供了一个有前途的方法来解决以前这些模型无法胜任的复杂推理任务。尽管取得了成功,但CoT及其变体(如Tree of Thought、Graph of Thought等)依赖于“一种提示适用于所有”的方法,使用单一提示结构(例如“一步一步地思考”)来解决从计数和排序到解决数学和算法问题等各种任务。这种方法对模型生成正确的推理步骤提出了重大挑战,因为模型必须在广泛的提示模板空间中导航,以找到每个任务的适当模板。在这项工作中,我们建立在以前对CoT的理论分析基础上,展示了一种“一种提示适用于所有”的方法如何负面影响LLMs的可计算性。我们将解决方案搜索空间分为两部分:提示空间和答案空间。我们的研究结果显示,任务特定的监督对于准确导航提示空间并实现最佳性能至关重要。通过与最先进的LLMs进行实验,我们揭示了在应用监督与不应用监督时推理性能的差距。

更新时间: 2024-10-18 06:25:27

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14198v1

Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features

Diffusion models are initially designed for image generation. Recent research shows that the internal signals within their backbones, named activations, can also serve as dense features for various discriminative tasks such as semantic segmentation. Given numerous activations, selecting a small yet effective subset poses a fundamental problem. To this end, the early study of this field performs a large-scale quantitative comparison of the discriminative ability of the activations. However, we find that many potential activations have not been evaluated, such as the queries and keys used to compute attention scores. Moreover, recent advancements in diffusion architectures bring many new activations, such as those within embedded ViT modules. Both combined, activation selection remains unresolved but overlooked. To tackle this issue, this paper takes a further step with a much broader range of activations evaluated. Considering the significant increase in activations, a full-scale quantitative comparison is no longer operational. Instead, we seek to understand the properties of these activations, such that the activations that are clearly inferior can be filtered out in advance via simple qualitative evaluation. After careful analysis, we discover three properties universal among diffusion models, enabling this study to go beyond specific models. On top of this, we present effective feature selection solutions for several popular diffusion models. Finally, the experiments across multiple discriminative tasks validate the superiority of our method over the SOTA competitors. Our code is available at https://github.com/Darkbblue/generic-diffusion-feature.
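Collecting the previously unevaluated activations, such as attention queries and keys, is a matter of forward hooks. A sketch assuming a diffusers-style UNet2DConditionModel, whose attention blocks expose to_q/to_k projections:

```python
import torch

def collect_qk_activations(unet, latents, t, cond):
    """Grab attention query/key activations during one denoising pass."""
    feats, handles = {}, []

    def save_to(name):
        def hook(module, inputs, output):
            feats[name] = output.detach()
        return hook

    for name, mod in unet.named_modules():
        if name.endswith(("attn1.to_q", "attn1.to_k")):
            handles.append(mod.register_forward_hook(save_to(name)))
    with torch.no_grad():
        unet(latents, t, encoder_hidden_states=cond)  # one forward pass
    for h in handles:
        h.remove()
    return feats  # candidate diffusion features, one tensor per module
```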

Updated: 2024-10-18 06:19:45

标题: 并非所有扩散模型激活都被评估为具有区分特征

摘要: 扩散模型最初是为图像生成而设计的。最近的研究表明,其主干网络内部的信号(称为激活)也可以作为各种判别任务(如语义分割)的密集特征。面对数量众多的激活,如何选择一个小而有效的子集是一个基本问题。为此,该领域的早期研究对各激活的判别能力进行了大规模定量比较。然而,我们发现许多潜在的激活尚未得到评估,比如用于计算注意力分数的查询和键。此外,扩散架构的最新进展带来了许多新的激活,比如嵌入式ViT模块中的激活。两者叠加,激活选择问题仍未解决却被忽视。为了解决这个问题,本文更进一步,评估了范围更广的激活。考虑到激活数量的显著增加,全面的定量比较不再可行。相反,我们试图理解这些激活的特性,以便通过简单的定性评估事先过滤掉明显较差的激活。经过仔细分析,我们发现了扩散模型普遍具有的三个特性,使本研究能够超越特定模型。在此基础上,我们针对几种流行的扩散模型提出了有效的特征选择方案。最后,跨多个判别任务的实验验证了我们的方法优于SOTA竞争对手。我们的代码可在https://github.com/Darkbblue/generic-diffusion-feature获得。

更新时间: 2024-10-18 06:19:45

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.03558v3

Theories of synaptic memory consolidation and intelligent plasticity for continual learning

Humans and animals learn throughout life. Such continual learning is crucial for intelligence. In this chapter, we examine the pivotal role plasticity mechanisms with complex internal synaptic dynamics could play in enabling this ability in neural networks. By surveying theoretical research, we highlight two fundamental enablers for continual learning. First, synaptic plasticity mechanisms must maintain and evolve an internal state over several behaviorally relevant timescales. Second, plasticity algorithms must leverage the internal state to intelligently regulate plasticity at individual synapses to facilitate the seamless integration of new memories while avoiding detrimental interference with existing ones. Our chapter covers successful applications of these principles to deep neural networks and underscores the significance of synaptic metaplasticity in sustaining continual learning capabilities. Finally, we outline avenues for further research to understand the brain's superb continual learning abilities and harness similar mechanisms for artificial intelligence systems.

Updated: 2024-10-18 06:15:42

标题: 突触记忆巩固理论和智能可塑性理论对持续学习的影响

摘要: 人类和动物在整个生命中都在学习。这种持续学习对智力至关重要。在本章中,我们研究了复杂内部突触动力学的可塑性机制可能在神经网络中实现这种能力的关键作用。通过调查理论研究,我们强调了持续学习的两个基本因素。首先,突触可塑性机制必须在几个行为相关时间尺度上维持和演变内部状态。其次,可塑性算法必须利用内部状态智能地调节单个突触的可塑性,以促进新记忆的无缝整合,同时避免对现有记忆造成不利干扰。我们的章节涵盖了这些原则在深度神经网络中的成功应用,并强调了突触元可塑性在维持持续学习能力方面的重要性。最后,我们概述了进一步研究的途径,以了解大脑出色的持续学习能力,并利用类似的机制来开发人工智能系统。

更新时间: 2024-10-18 06:15:42

领域: q-bio.NC,cs.AI,cs.LG,cs.NE

下载: http://arxiv.org/abs/2405.16922v2

Speciesism in Natural Language Processing Research

Natural Language Processing (NLP) research on AI Safety and social bias in AI has focused on safety for humans and social bias against human minorities. However, some AI ethicists have argued that the moral significance of nonhuman animals has been ignored in AI research. Therefore, the purpose of this study is to investigate whether there is speciesism, i.e., discrimination against nonhuman animals, in NLP research. First, we explain why nonhuman animals are relevant in NLP research. Next, we survey the findings of existing research on speciesism in NLP researchers, data, and models and further investigate this problem in this study. The findings of this study suggest that speciesism exists within researchers, data, and models, respectively. Specifically, our survey and experiments show that (a) among NLP researchers, even those who study social bias in AI, do not recognize speciesism or speciesist bias; (b) among NLP data, speciesist bias is inherent in the data annotated in the datasets used to evaluate NLP models; (c) OpenAI GPTs, recent NLP models, exhibit speciesist bias by default. Finally, we discuss how we can reduce speciesism in NLP research.

Updated: 2024-10-18 06:09:41

标题: 《自然语言处理研究中的物种主义》

摘要: 自然语言处理(NLP)中关于AI安全和AI社会偏见的研究,主要集中在人类的安全和针对人类少数群体的社会偏见上。然而,一些AI伦理学家认为,AI研究忽视了非人类动物的道德重要性。因此,本研究的目的是调查NLP研究中是否存在物种主义,即对非人类动物的歧视。首先,我们解释了为什么非人类动物与NLP研究相关。接下来,我们梳理了现有研究中关于NLP研究人员、数据和模型中物种主义的发现,并在本研究中进一步调查这一问题。本研究的发现表明,研究人员、数据和模型中分别存在物种主义。具体而言,我们的调查和实验表明:(a)在NLP研究人员中,即使是研究AI社会偏见的人员,也没有意识到物种主义或物种主义偏见;(b)在NLP数据中,用于评估NLP模型的数据集所标注的数据本身就带有物种主义偏见;(c)OpenAI GPTs这类最新的NLP模型,默认情况下表现出物种主义偏见。最后,我们讨论了如何减少NLP研究中的物种主义。

更新时间: 2024-10-18 06:09:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14194v1

xPerT: Extended Persistence Transformer

A persistence diagram provides a compact summary of persistent homology, which captures the topological features of a space at different scales. However, due to its nature as a set, incorporating it as a feature into a machine learning framework is challenging. Several methods have been proposed to use persistence diagrams as input for machine learning models, but they often require complex preprocessing steps and extensive hyperparameter tuning. In this paper, we propose a novel transformer architecture called the \textit{Extended Persistence Transformer (xPerT)}, which is far more scalable than Persformer, an existing transformer for persistence diagrams. xPerT reduces GPU memory usage by over 90\% and improves accuracy on multiple datasets. Additionally, xPerT does not require complex preprocessing steps or extensive hyperparameter tuning, making it easy to use in practice. Our code is available at https://github.com/sehunfromdaegu/ECG_JEPA.
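A persistence diagram is a set of (birth, death) points, so the natural transformer input is one token per point. A minimal tokenizer sketch (the actual xPerT architecture and its handling of extended persistence are not reproduced here):

```python
import torch
import torch.nn as nn

class DiagramTokenizer(nn.Module):
    """Embed each (birth, death) point of a persistence diagram as a token."""
    def __init__(self, d_model=64):
        super().__init__()
        self.proj = nn.Linear(2, d_model)

    def forward(self, points):       # points: (n_points, 2)
        return self.proj(points)     # (n_points, d_model) token sequence

tokens = DiagramTokenizer()(torch.tensor([[0.1, 0.9], [0.3, 0.4]]))
```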

Updated: 2024-10-18 06:07:22

标题: xPerT:扩展持久性转换器

摘要: 持久图提供了持久同调的紧凑摘要,捕捉了空间在不同尺度上的拓扑特征。然而,由于其本质上是一个集合,将其作为特征整合到机器学习框架中颇具挑战性。已有多种方法尝试将持久图作为机器学习模型的输入,但它们通常需要复杂的预处理步骤和大量的超参数调整。在本文中,我们提出了一种名为Extended Persistence Transformer(xPerT)的新型Transformer架构,它比现有的持久图Transformer模型Persformer具有高得多的可扩展性。xPerT将GPU内存使用量降低了超过90%,并在多个数据集上提高了准确性。此外,xPerT不需要复杂的预处理步骤或大量的超参数调整,使其在实践中易于使用。我们的代码可在https://github.com/sehunfromdaegu/ECG_JEPA 上找到。

更新时间: 2024-10-18 06:07:22

领域: cs.LG,math.AT

下载: http://arxiv.org/abs/2410.14193v1

FINED: Feed Instance-Wise Information Need with Essential and Disentangled Parametric Knowledge from the Past

Recommender models play a vital role in various industrial scenarios, while often faced with the catastrophic forgetting problem caused by the fast shifting data distribution. To alleviate this problem, a common approach is to reuse knowledge from the historical data. However, preserving the vast and fast-accumulating data is hard, which causes dramatic storage overhead. Memorizing old data through a parametric knowledge base is then proposed, which compresses the vast amount of raw data into model parameters. Despite the flexibility, how to improve the memorization and generalization capabilities of the parametric knowledge base and suit the flexible information need of each instance are challenging. In this paper, we propose FINED to Feed INstance-wise information need with Essential and Disentangled parametric knowledge from past data for recommendation enhancement. Concretely, we train a knowledge extractor that extracts knowledge patterns of arbitrary order from past data and a knowledge encoder that memorizes the arbitrary order patterns, which serves as the retrieval key generator and memory network respectively in the following knowledge reusing phase. The whole process is regularized by the proposed two constraints, which improve the capabilities of the parametric knowledge base without increasing the size of it. The essential principle helps to compress the input into representative vectors that capture the task-relevant information and filter out the noisy information. The disentanglement principle reduces the redundancy of stored information and pushes the knowledge base to focus on capturing the disentangled invariant patterns. These two rules together promote rational compression of information for robust and generalized knowledge representations. Extensive experiments on two datasets justify the effectiveness of the proposed method.

Updated: 2024-10-18 06:07:06

标题: FINED:利用来自过去数据的本质且解耦的参数化知识满足实例级信息需求

摘要: 推荐模型在各种工业场景中发挥着重要作用,但常常面临由于数据分布快速变化而引起的灾难性遗忘问题。为了缓解这个问题,一种常见的方法是重用来自历史数据的知识。然而,保留大量且快速积累的数据很困难,会导致巨大的存储开销。因此,有人提出通过参数化知识库来记忆旧数据,将大量原始数据压缩成模型参数。尽管这种方式很灵活,但如何提高参数化知识库的记忆和泛化能力,并适应每个实例灵活的信息需求,仍具有挑战性。在本文中,我们提出了FINED方法,利用从过去数据中提取的本质且解耦的参数化知识来满足实例级信息需求,从而增强推荐效果。具体地,我们训练一个知识提取器,从过去数据中提取任意阶的知识模式,以及一个记忆这些任意阶模式的知识编码器,二者在随后的知识重用阶段分别充当检索键生成器和记忆网络。整个过程由所提出的两个约束进行正则化,在不增大参数化知识库规模的情况下提升其能力。本质性原则有助于将输入压缩成能捕获任务相关信息并滤除噪声信息的代表性向量。解耦原则减少了所存储信息的冗余,并促使知识库专注于捕获解耦的不变模式。这两条规则共同促进信息的合理压缩,以获得稳健且泛化的知识表示。在两个数据集上的大量实验证明了所提方法的有效性。

更新时间: 2024-10-18 06:07:06

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2406.00012v2

Evaluating Gender, Racial, and Age Biases in Large Language Models: A Comparative Analysis of Occupational and Crime Scenarios

Recent advancements in Large Language Models (LLMs) have been notable, yet widespread enterprise adoption remains limited due to various constraints. This paper examines bias in LLMs, a crucial issue affecting their usability, reliability, and fairness. Researchers are developing strategies to mitigate bias, including debiasing layers, specialized reference datasets like Winogender and Winobias, and reinforcement learning with human feedback (RLHF). These techniques have been integrated into the latest LLMs. Our study evaluates gender bias in occupational scenarios and gender, age, and racial bias in crime scenarios across four leading LLMs released in 2024: Gemini 1.5 Pro, Llama 3 70B, Claude 3 Opus, and GPT-4o. Findings reveal that LLMs often depict female characters more frequently than male ones in various occupations, showing a 37% deviation from US BLS data. In crime scenarios, deviations from US FBI data are 54% for gender, 28% for race, and 17% for age. We observe that efforts to reduce gender and racial bias often lead to outcomes that may over-index one sub-class, potentially exacerbating the issue. These results highlight the limitations of current bias mitigation techniques and underscore the need for more effective approaches.

Updated: 2024-10-18 05:41:03

标题: 评估大型语言模型中的性别、种族和年龄偏见:职业和犯罪情景的比较分析

摘要: 最近大规模语言模型(LLMs)的进展值得注意,但由于各种限制,广泛企业采用仍然有限。本文研究了LLMs中的偏见-这是影响它们可用性、可靠性和公平性的关键问题。研究人员正在开发缓解偏见的策略,包括去偏见层、专门的参考数据集如Winogender和Winobias,以及带有人类反馈的强化学习(RLHF)。这些技术已经集成到最新的LLMs中。我们的研究评估了2024年发布的四个领先LLMs中的职业场景中的性别偏见以及犯罪场景中的性别、年龄和种族偏见:Gemini 1.5 Pro、Llama 3 70B、Claude 3 Opus和GPT-4o。研究结果显示,LLMs在各种职业中更频繁地描绘女性角色,相较于美国劳工统计局数据有37%的偏差。在犯罪场景中,与美国联邦调查局数据相比,性别偏差为54%,种族偏差为28%,年龄偏差为17%。我们观察到,减少性别和种族偏见的努力往往会导致可能过度指数化某个子类的结果,可能加剧问题。这些结果突显了目前偏见缓解技术的局限性,并强调了需要更有效方法的必要性。

更新时间: 2024-10-18 05:41:03

领域: cs.AI

下载: http://arxiv.org/abs/2409.14583v2

Combining Hough Transform and Deep Learning Approaches to Reconstruct ECG Signals From Printouts

This work presents our team's (SignalSavants) winning contribution to the 2024 George B. Moody PhysioNet Challenge. The Challenge had two goals: reconstruct ECG signals from printouts and classify them for cardiac diseases. Our focus was the first task. Despite many ECGs being digitally recorded today, paper ECGs remain common throughout the world. Digitising them could help build more diverse datasets and enable automated analyses. However, the presence of varying recording standards and poor image quality requires a data-centric approach for developing robust models that can generalise effectively. Our approach combines the creation of a diverse training set, Hough transform to rotate images, a U-Net based segmentation model to identify individual signals, and mask vectorisation to reconstruct the signals. We assessed the performance of our models using the 10-fold stratified cross-validation (CV) split of 21,799 recordings proposed by the PTB-XL dataset. On the digitisation task, our model achieved an average CV signal-to-noise ratio of 17.02 and an official Challenge score of 12.15 on the hidden set, securing first place in the competition. Our study shows the challenges of building robust, generalisable, digitisation approaches. Such models require large amounts of resources (data, time, and computational power) but have great potential in diversifying the data available.
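The rotation step can be sketched with OpenCV: detect the dominant line directions of the printed grid via the Hough transform and take their median angle as the page skew (threshold values below are illustrative, not the team's exact settings):

```python
import cv2
import numpy as np

def estimate_skew_degrees(img_gray):
    """Estimate the rotation of a scanned ECG printout from its grid lines."""
    edges = cv2.Canny(img_gray, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=200)
    if lines is None:
        return 0.0
    angles = [theta for rho, theta in lines[:, 0]]
    # horizontal grid lines sit at theta = 90 degrees; the deviation is skew
    return float(np.rad2deg(np.median(angles)) - 90.0)

# The deskewed image is then passed to the U-Net segmentation model.
```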

Updated: 2024-10-18 05:36:24

标题: 将霍夫变换和深度学习方法结合以从打印输出中重建心电图信号

摘要: 这项工作展示了我们团队(SignalSavants)在2024年乔治·B·穆迪PhysioNet挑战中获胜的贡献。挑战有两个目标:从打印输出中重建心电图信号并对其进行心脏疾病分类。我们的焦点是第一个任务。尽管今天许多心电图都是数字化记录的,纸质心电图在全球范围内仍然很常见。数字化它们可以帮助构建更多样化的数据集并实现自动化分析。然而,由于不同的记录标准和图像质量差,需要一种以数据为中心的方法来开发能够有效泛化的强大模型。我们的方法结合了创建多样化训练集、霍夫变换来旋转图像、基于U-Net的分割模型来识别单个信号,以及掩模矢量化来重建信号。我们使用PTB-XL数据集提出的21,799个记录的10折分层交叉验证(CV)来评估我们模型的性能。在数字化任务上,我们的模型在隐藏集上实现了平均CV信噪比为17.02,官方挑战得分为12.15,获得了竞赛的第一名。我们的研究展示了构建强大、泛化的数字化方法的挑战。这种模型需要大量资源(数据、时间和计算能力),但在丰富可用数据方面具有巨大潜力。

更新时间: 2024-10-18 05:36:24

领域: cs.LG,eess.IV

下载: http://arxiv.org/abs/2410.14185v1

Provable In-context Learning for Mixture of Linear Regressions using Transformers

We theoretically investigate the in-context learning capabilities of transformers in the context of learning mixtures of linear regression models. For the case of two mixtures, we demonstrate the existence of transformers that can achieve an accuracy, relative to the oracle predictor, of order $\mathcal{\tilde{O}}((d/n)^{1/4})$ in the low signal-to-noise ratio (SNR) regime and $\mathcal{\tilde{O}}(\sqrt{d/n})$ in the high SNR regime, where $n$ is the length of the prompt, and $d$ is the dimension of the problem. Additionally, we derive in-context excess risk bounds of order $\mathcal{O}(L/\sqrt{B})$, where $B$ denotes the number of (training) prompts, and $L$ represents the number of attention layers. The order of $L$ depends on whether the SNR is low or high. In the high SNR regime, we extend the results to $K$-component mixture models for finite $K$. Extensive simulations also highlight the advantages of transformers for this task, outperforming other baselines such as the Expectation-Maximization algorithm.
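For concreteness, the two-component data model behind each prompt can be written as follows (our notation; the SNR regimes roughly correspond to how well separated the two regressors are relative to the noise):

```latex
y_i=\langle\beta_{z_i},\,x_i\rangle+\varepsilon_i,
\qquad z_i\in\{1,2\},\quad
\varepsilon_i\sim\mathcal{N}(0,\sigma^2),\quad i=1,\dots,n .
```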

Updated: 2024-10-18 05:28:47

标题: 使用Transformer证明混合线性回归的上下文学习

摘要: 我们从理论上研究了transformers在学习线性回归混合模型情境下的上下文学习能力。对于两个混合成分的情况,我们证明存在这样的transformers:相对于oracle预测器,其精度在低信噪比(SNR)区域可达$\mathcal{\tilde{O}}((d/n)^{1/4})$阶,在高信噪比区域可达$\mathcal{\tilde{O}}(\sqrt{d/n})$阶,其中$n$是提示的长度,$d$是问题的维度。此外,我们推导出阶为$\mathcal{O}(L/\sqrt{B})$的上下文内超额风险界,其中$B$表示(训练)提示的数量,$L$表示注意力层的数量。$L$的阶取决于SNR是低还是高。在高SNR区域,我们将结果推广到有限$K$个成分的混合模型。大量模拟也突显了transformers在这一任务上的优势,优于期望最大化算法等其他基线方法。

更新时间: 2024-10-18 05:28:47

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.14183v1

Fisher Information-based Efficient Curriculum Federated Learning with Large Language Models

As a promising paradigm to collaboratively train models with decentralized data, Federated Learning (FL) can be exploited to fine-tune Large Language Models (LLMs). While LLMs are huge in size, the scale of the training data significantly increases, which leads to tremendous amounts of computation and communication costs. The training data is generally non-Independent and Identically Distributed (non-IID), which requires adaptive data processing within each device. Although Low Rank Adaptation (LoRA) can significantly reduce the scale of parameters to update in the fine-tuning process, it still takes unaffordable time to transfer the low-rank parameters of all the layers in LLMs. In this paper, we propose a Fisher Information-based Efficient Curriculum Federated Learning framework (FibecFed) with two novel methods, i.e., adaptive federated curriculum learning and efficient sparse parameter update. First, we propose a fisher information-based method to adaptively sample data within each device to improve the effectiveness of the FL fine-tuning process. Second, we dynamically select the proper layers for global aggregation and sparse parameters for local update with LoRA so as to improve the efficiency of the FL fine-tuning process. Extensive experimental results based on 10 datasets demonstrate that FibecFed yields excellent performance (up to 45.35% higher in terms of accuracy) and superb fine-tuning speed (up to 98.61% faster) compared with 17 baseline approaches.
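The adaptive sampling step rests on a per-sample Fisher information signal. A common proxy is the gradient norm of the loss, which each device can compute locally (a simplification of the paper's estimator):

```python
import torch

def fisher_proxy_scores(model, loss_fn, samples):
    """Per-sample Fisher proxy: gradient norm of the loss on each example."""
    scores = []
    for x, y in samples:                  # batched tensors are assumed
        model.zero_grad()
        loss_fn(model(x), y).backward()
        sq = sum(
            (p.grad ** 2).sum()
            for p in model.parameters() if p.grad is not None
        )
        scores.append(sq.sqrt().item())
    return torch.tensor(scores)  # sample training data proportionally to this
```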

Updated: 2024-10-18 05:22:02

标题: 基于费舍尔信息的高效课程式联邦学习与大型语言模型

摘要: 作为一种利用去中心化数据协作训练模型的有前途的范式,联邦学习(FL)可以被用来对大型语言模型(LLMs)进行微调。LLMs本身规模巨大,而训练数据的规模也显著增加,这导致了巨大的计算和通信成本。训练数据通常是非独立同分布的(non-IID),这要求在每个设备内进行自适应的数据处理。虽然低秩适应(LoRA)可以显著减少微调过程中需要更新的参数规模,但传输LLMs所有层的低秩参数仍需要难以承受的时间。在本文中,我们提出了一种基于Fisher信息的高效课程联邦学习框架(FibecFed),其中包括两种新方法,即自适应联邦课程学习和高效稀疏参数更新。首先,我们提出了一种基于Fisher信息的方法,在每个设备内自适应地抽样数据,以提高FL微调过程的有效性。其次,我们借助LoRA动态选择适当的层进行全局聚合、选择稀疏参数进行局部更新,从而提高FL微调过程的效率。基于10个数据集的大量实验结果表明,与17种基线方法相比,FibecFed取得了出色的性能(准确性最高提升45.35%)和极快的微调速度(最高加快98.61%)。

更新时间: 2024-10-18 05:22:02

领域: cs.LG,cs.AI,cs.CL,cs.DC

下载: http://arxiv.org/abs/2410.00131v2

LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs

Laboratory accidents pose significant risks to human life and property, underscoring the importance of robust safety protocols. Despite advancements in safety training, laboratory personnel may still unknowingly engage in unsafe practices. With the increasing reliance on large language models (LLMs) for guidance in various fields, including laboratory settings, there is a growing concern about their reliability in critical safety-related decision-making. Unlike trained human researchers, LLMs lack formal lab safety education, raising questions about their ability to provide safe and accurate guidance. Existing research on LLM trustworthiness primarily focuses on issues such as ethical compliance, truthfulness, and fairness but fails to fully cover safety-critical real-world applications, like lab safety. To address this gap, we propose the Laboratory Safety Benchmark (LabSafety Bench), a comprehensive evaluation framework based on a new taxonomy aligned with Occupational Safety and Health Administration (OSHA) protocols. This benchmark includes 765 multiple-choice questions verified by human experts, assessing LLMs and vision language models (VLMs) performance in lab safety contexts. Our evaluations demonstrate that while GPT-4o outperforms human participants, it is still prone to critical errors, highlighting the risks of relying on LLMs in safety-critical environments. Our findings emphasize the need for specialized benchmarks to accurately assess the trustworthiness of LLMs in real-world safety applications.

Updated: 2024-10-18 05:21:05

标题: 实验室安全基准:在科学实验室安全问题上对LLMs进行基准测试

摘要: 实验室事故对人类生命和财产构成重大风险,突显了健全安全协议的重要性。尽管安全培训取得了进展,实验室人员仍可能无意中从事不安全的做法。随着在各个领域,包括实验室环境中,对大型语言模型(LLMs)指导的依赖增加,人们越来越担心它们在关键安全决策中的可靠性。与经过培训的人类研究人员不同,LLMs缺乏正式的实验室安全教育,这引发了对它们提供安全和准确指导能力的质疑。现有关于LLMs可信度的研究主要集中在伦理合规性、真实性和公平性等问题上,但未能完全涵盖实验室安全等安全关键的实际应用。为了填补这一空白,我们提出了实验室安全基准(LabSafety Bench),这是一个基于与职业安全和健康管理局(OSHA)协议一致的新分类法的综合评估框架。该基准包括由人类专家验证的765道多项选择题,评估LLMs和视觉语言模型(VLMs)在实验室安全环境中的表现。我们的评估表明,虽然GPT-4o优于人类参与者,但仍容易出现关键错误,突显了在安全关键环境中依赖LLMs的风险。我们的研究结果强调了需要专门的基准来准确评估LLMs在实际安全应用中的可信度。

更新时间: 2024-10-18 05:21:05

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.14182v1

ViLCo-Bench: VIdeo Language COntinual learning Benchmark

Video language continual learning involves continuously adapting to information from video and text inputs, enhancing a model's ability to handle new tasks while retaining prior knowledge. This field is a relatively under-explored area, and establishing appropriate datasets is crucial for facilitating communication and research in this field. In this study, we present the first dedicated benchmark, ViLCo-Bench, designed to evaluate continual learning models across a range of video-text tasks. The dataset comprises ten-minute-long videos and corresponding language queries collected from publicly available datasets. Additionally, we introduce a novel memory-efficient framework that incorporates self-supervised learning and mimics long-term and short-term memory effects. This framework addresses challenges including memory complexity from long video clips, natural language complexity from open queries, and text-video misalignment. We posit that ViLCo-Bench, with greater complexity compared to existing continual learning benchmarks, would serve as a critical tool for exploring the video-language domain, extending beyond conventional class-incremental tasks, and addressing complex and limited annotation issues. The curated data, evaluations, and our novel method are available at https://github.com/cruiseresearchgroup/ViLCo.

Updated: 2024-10-18 05:20:34

标题: ViLCo-Bench: 视频语言持续学习基准测试

摘要: 视频语言持续学习涉及不断适应来自视频和文本输入的信息,增强模型处理新任务的能力同时保留先前知识。这个领域是一个相对未被充分开发的领域,建立适当的数据集对促进这一领域的交流和研究至关重要。在这项研究中,我们提出了第一个专门的基准测试ViLCo-Bench,旨在评估跨多种视频文本任务的持续学习模型。该数据集包括从公开可用数据集中收集的长达十分钟的视频和相应的语言查询。此外,我们引入了一种新颖的内存高效框架,结合自监督学习并模拟长期和短期记忆效应。这个框架解决了来自长视频片段的内存复杂性、来自开放查询的自然语言复杂性以及文本视频不对齐等挑战。我们认为,与现有的持续学习基准测试相比,ViLCo-Bench具有更大的复杂性,将成为探索视频语言领域的重要工具,超越传统的类增量任务,并解决复杂和受限注释问题。精心策划的数据、评估和我们的新方法可在https://github.com/cruiseresearchgroup/ViLCo上找到。

更新时间: 2024-10-18 05:20:34

领域: cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.13123v2

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Scientific literature understanding is crucial for extracting targeted information and garnering insights, thereby significantly advancing scientific discovery. Despite the remarkable success of Large Language Models (LLMs), they face challenges in scientific literature understanding, primarily due to (1) a lack of scientific knowledge and (2) unfamiliarity with specialized scientific tasks. To develop an LLM specialized in scientific literature understanding, we propose a hybrid strategy that integrates continual pre-training (CPT) and supervised fine-tuning (SFT), to simultaneously infuse scientific domain knowledge and enhance instruction-following capabilities for domain-specific tasks. In this process, we identify two key challenges: (1) constructing high-quality CPT corpora, and (2) generating diverse SFT instructions. We address these challenges through a meticulous pipeline, including PDF text extraction, parsing content error correction, quality filtering, and synthetic instruction creation. Applying this strategy, we present a suite of LLMs: SciLitLLM, specialized in scientific literature understanding. These models demonstrate promising performance on scientific literature understanding benchmarks. Our contributions are threefold: (1) We present an effective framework that integrates CPT and SFT to adapt LLMs to scientific literature understanding, which can also be easily adapted to other domains. (2) We propose an LLM-based synthesis method to generate diverse and high-quality scientific instructions, resulting in a new instruction set -- SciLitIns -- for supervised fine-tuning in less-represented scientific domains. (3) SciLitLLM achieves promising performance improvements on scientific literature understanding benchmarks.

Updated: 2024-10-18 05:04:53

标题: SciLitLLM:如何调整LLM以理解科学文献

摘要: 科学文献理解对于提取目标信息和获得见解至关重要,从而显著推动科学发现。尽管大型语言模型(LLMs)取得了显著成功,但它们在科学文献理解方面面临挑战,主要是由于(1)缺乏科学知识和(2)对专门科学任务的陌生。 为了开发一种专门用于科学文献理解的LLM,我们提出了一种混合策略,将持续预训练(CPT)和监督微调(SFT)相结合,以同时注入科学领域知识并增强领域特定任务的指导遵循能力。在这个过程中,我们确定了两个关键挑战:(1)构建高质量的CPT语料库,和(2)生成多样化的SFT指令。通过细致的流程,包括PDF文本提取、解析内容错误校正、质量过滤和合成指导创建,我们解决了这些挑战。应用这一策略,我们提出了一系列LLM:SciLitLLM,专门用于科学文献理解。这些模型在科学文献理解基准测试中表现出有希望的性能。 我们的贡献有三个方面:(1)我们提出了一个有效的框架,将CPT和SFT整合在一起,以适应LLMs对科学文献理解的需求,这也可以轻松适应其他领域。(2)我们提出了一种基于LLM的合成方法,生成多样化和高质量的科学指导,从而产生一组新的指令集--SciLitIns--用于在较少代表的科学领域进行监督微调。(3)SciLitLLM在科学文献理解基准测试中实现了有希望的性能提升。

更新时间: 2024-10-18 05:04:53

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2408.15545v3

Explainable Graph Neural Networks Under Fire

Predictions made by graph neural networks (GNNs) usually lack interpretability due to their complex computational behavior and the abstract nature of graphs. In an attempt to tackle this, many GNN explanation methods have emerged. Their goal is to explain a model's predictions and thereby obtain trust when GNN models are deployed in decision critical applications. Most GNN explanation methods work in a post-hoc manner and provide explanations in the form of a small subset of important edges and/or nodes. In this paper we demonstrate that these explanations can unfortunately not be trusted, as common GNN explanation methods turn out to be highly susceptible to adversarial perturbations. That is, even small perturbations of the original graph structure that preserve the model's predictions may yield drastically different explanations. This calls into question the trustworthiness and practical utility of post-hoc explanation methods for GNNs. To be able to attack GNN explanation models, we devise a novel attack method dubbed \textit{GXAttack}, the first \textit{optimization-based} adversarial white-box attack method for post-hoc GNN explanations under such settings. Due to the devastating effectiveness of our attack, we call for an adversarial evaluation of future GNN explainers to demonstrate their robustness. For reproducibility, our code is available via GitHub.
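Schematically, the attack solves a constrained problem: change the graph as little as possible, keep the prediction, and move the explanation as far as possible. This is our paraphrase of the white-box setting, with d any distance over explanations E, f the GNN, and G ⊕ δ the graph with the edge flips selected by δ:

```latex
\max_{\delta:\;\|\delta\|_0\le\Delta}\;
d\bigl(E(G\oplus\delta),\,E(G)\bigr)
\quad\text{s.t.}\quad f(G\oplus\delta)=f(G).
```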

Updated: 2024-10-18 05:03:08

标题: 可解释的图神经网络受到质疑

摘要: 图神经网络(GNNs)做出的预测通常缺乏解释性,这是因为它们复杂的计算行为和图的抽象性质。为了解决这个问题,许多GNN解释方法应运而生。它们的目标是解释模型的预测,从而在GNN模型部署在决策关键应用程序时获得信任。大多数GNN解释方法是事后工作的,并以重要边缘和/或节点的小子集的形式提供解释。在本文中,我们展示了这些解释不可信,因为常见的GNN解释方法很容易受到对抗性扰动的影响。也就是说,即使保留模型预测的原始图结构发生微小扰动,可能会得到完全不同的解释。这对于GNN的事后解释方法的可信度和实际效用提出了质疑。为了能够攻击GNN解释模型,我们设计了一种新的攻击方法,名为GXAttack,这是第一种基于优化的针对事后GNN解释的白盒攻击方法。由于我们攻击的毁灭性有效性,我们呼吁对未来的GNN解释器进行对抗性评估,以展示它们的稳健性。为了可重复性,我们的代码可以通过GitHub获得。

更新时间: 2024-10-18 05:03:08

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.06417v2

CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only

Software robots have long been used in Robotic Process Automation (RPA) to automate mundane and repetitive computer tasks. With the advent of Large Language Models (LLMs) and their advanced reasoning capabilities, these agents are now able to handle more complex or previously unseen tasks. However, LLM-based automation techniques in recent literature frequently rely on HTML source code for input or application-specific API calls for actions, limiting their applicability to specific environments. We propose an LLM-based agent that mimics human behavior in solving computer tasks. It perceives its environment solely through screenshot images, which are then converted into text for an LLM to process. By leveraging the reasoning capability of the LLM, we eliminate the need for large-scale human demonstration data typically required for model training. The agent only executes keyboard and mouse operations on Graphical User Interface (GUI), removing the need for pre-provided APIs to function. To further enhance the agent's performance in this setting, we propose a novel prompting strategy called Context-Aware Action Planning (CAAP) prompting, which enables the agent to thoroughly examine the task context from multiple perspectives. Our agent achieves an average success rate of 94.5% on MiniWoB++ and an average task score of 62.3 on WebShop, outperforming all previous studies of agents that rely solely on screen images. This method demonstrates potential for broader applications, particularly for tasks requiring coordination across multiple applications on desktops or smartphones, marking a significant advancement in the field of automation agents. Codes and models are accessible at https://github.com/caap-agent/caap-agent.

Updated: 2024-10-18 05:01:07

标题: CAAP:具有前端UI的上下文感知行动规划提示,用于解决计算机任务

摘要: 软件机器人长期以来被用于机器人流程自动化(RPA)来自动化乏味和重复的计算机任务。随着大型语言模型(LLMs)及其先进的推理能力的出现,这些代理现在能够处理更复杂或以前未见过的任务。然而,最近文献中基于LLM的自动化技术经常依赖HTML源代码作为输入或特定应用程序的API调用作为动作,限制了它们适用于特定环境的能力。我们提出了一种基于LLM的代理,模仿人类在解决计算机任务中的行为。它仅通过屏幕截图图像来感知其环境,然后将这些图像转换为文本供LLM处理。通过利用LLM的推理能力,我们消除了通常需要大规模人类演示数据来训练模型的需要。该代理只在图形用户界面(GUI)上执行键盘和鼠标操作,消除了需要预提供API来运行的需要。为了进一步提高代理在这种环境中的性能,我们提出了一种称为上下文感知行动规划(CAAP)提示的新颖提示策略,使代理能够从多个角度彻底审查任务背景。我们的代理在MiniWoB++上实现了94.5%的平均成功率,在WebShop上实现了62.3的平均任务分数,超过了所有仅依赖屏幕图像的代理的先前研究。这种方法展示了更广泛应用的潜力,特别是对于需要在桌面或智能手机上跨多个应用程序协调的任务,标志着自动代理领域的重大进步。代码和模型可在https://github.com/caap-agent/caap-agent上访问。

更新时间: 2024-10-18 05:01:07

领域: cs.AI,cs.HC

下载: http://arxiv.org/abs/2406.06947v2

Auto Detecting Cognitive Events Using Machine Learning on Pupillary Data

Assessing cognitive workload is crucial for human performance as it affects information processing, decision making, and task execution. Pupil size is a valuable indicator of cognitive workload, reflecting changes in attention and arousal governed by the autonomic nervous system. Cognitive events are closely linked to cognitive workload as they activate mental processes and trigger cognitive responses. This study explores the potential of using machine learning to automatically detect cognitive events experienced by individuals. We framed the problem as a binary classification task, focusing on detecting stimulus onset across four cognitive tasks using CNN models and 1-second pupillary data. The results, measured by the Matthews correlation coefficient, ranged from 0.47 to 0.80, depending on the cognitive task. This paper discusses the trade-offs between generalization and specialization, model behavior when encountering unseen stimulus onset times, structural variances among cognitive tasks, factors influencing model predictions, and real-time simulation. These findings highlight the potential of machine learning techniques in detecting cognitive events based on pupil and eye movement responses, contributing to advancements in personalized learning and optimizing neurocognitive workload management.
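
For intuition, here is a minimal sketch of the kind of 1-D CNN binary classifier the study describes, operating on 1-second pupil windows. The 60 Hz sampling rate and the architecture are assumptions for illustration, not the paper's exact model.

```python
import torch
import torch.nn as nn

class PupilCNN(nn.Module):
    def __init__(self, samples_per_window: int = 60):  # 1 s at an assumed 60 Hz
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, 1),  # single logit: stimulus onset vs. no onset
        )

    def forward(self, x):  # x: (batch, 1, samples)
        return self.net(x).squeeze(-1)

model = PupilCNN()
batch = torch.randn(8, 1, 60)                     # 8 one-second pupil windows
targets = torch.randint(0, 2, (8,)).float()       # onset labels
loss = nn.BCEWithLogitsLoss()(model(batch), targets)
```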

Updated: 2024-10-18 04:54:46

标题: 使用机器学习在瞳孔数据上自动检测认知事件

摘要: 评估认知负荷对人类表现至关重要,因为它影响信息处理、决策制定和任务执行。瞳孔大小是认知负荷的宝贵指标,反映了由自主神经系统控制的注意力和唤醒状态的变化。认知事件与认知负荷密切相关,因为它们激活了心理过程并触发认知反应。本研究探讨了利用机器学习自动检测个体经历的认知事件的潜力。我们将问题构建为一个二元分类任务,重点关注使用CNN模型和1秒瞳孔数据在四个认知任务中检测刺激开始。根据马修斯相关系数测量的结果,取决于认知任务,范围从0.47到0.80不等。本文讨论了泛化和专业化之间的权衡、模型在遇到未知刺激开始时间时的行为、认知任务之间的结构变异、影响模型预测的因素以及实时模拟。这些发现突显了利用机器学习技术基于瞳孔和眼动反应检测认知事件的潜力,有助于推动个性化学习的进步和优化神经认知负荷管理。

更新时间: 2024-10-18 04:54:46

领域: cs.LG,cs.HC,q-bio.NC

下载: http://arxiv.org/abs/2410.14174v1

LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multi-modal Foundation Models

Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in interpreting machine learning models, understanding latent variables in generative models remains challenging. This paper introduces \textit{LatentExplainer}, a framework for automatically generating semantically meaningful explanations of latent variables in deep generative models. \textit{LatentExplainer} tackles three main challenges: inferring the meaning of latent variables, aligning explanations with inductive biases, and handling varying degrees of explainability. Our approach perturbs latent variables, interpreting changes in generated data, and uses multi-modal large language models (MLLMs) to produce human-understandable explanations. We evaluate our proposed method on several real-world and synthetic datasets, and the results demonstrate superior performance in generating high-quality explanations for latent variables. The results highlight the effectiveness of incorporating inductive biases and uncertainty quantification, significantly enhancing model interpretability.

Updated: 2024-10-18 04:39:35

标题: 潜在解释器:使用多模态基础模型解释深度生成模型中的潜在表示

摘要: 深度生成模型如VAE和扩散模型通过利用潜在变量来学习数据分布并生成高质量样本,推动了各种生成任务的发展。尽管可解释人工智能领域在解释机器学习模型方面取得了进展,但理解生成模型中的潜在变量仍然具有挑战性。本文介绍了一种名为LatentExplainer的框架,用于自动生成深度生成模型中潜在变量的语义有意义的解释。LatentExplainer解决了三个主要挑战:推断潜在变量的含义,将解释与归纳偏差对齐,以及处理不同程度的可解释性。我们的方法扰动潜在变量,解释生成数据的变化,并利用多模态大型语言模型(MLLMs)生成人类可理解的解释。我们在几个真实和合成数据集上评估了我们提出的方法,结果表明在生成潜在变量的高质量解释方面表现出卓越性能。结果突出了将归纳偏差和不确定性量化纳入模型解释中的有效性,显著提升了模型的可解释性。

更新时间: 2024-10-18 04:39:35

领域: cs.LG,cs.CL,cs.CV

下载: http://arxiv.org/abs/2406.14862v4

Supervised Fine-Tuning Achieves Rapid Task Adaption Via Alternating Attention Head Activation Patterns

LLMs' performance on complex tasks is still unsatisfactory. A key issue is that presently LLMs learn in a data-driven scheme, while the instructions for these complex tasks are both scarce and hard to collect or construct. On the contrary, a prominent phenomenon is that LLMs can learn rather fast on simpler tasks with adequate prior knowledge captured during the pretraining stage. Thus, if the prerequisites and mechanism of such rapid generalization could be elucidated, it could enhance the efficiency and effectiveness of the LLM's ability to learn complex tasks. To this end, in this paper, we employ a gradient-based method to dissect the process by which the SFT process adapts LLMs to downstream tasks via the perspective of attention patterns. We find that: (1) LLMs selectively activate task-specific attention heads during SFT; (2) activation patterns for complex tasks are combinations of basic task patterns; and (3) changes in a few parameters can significantly impact activation patterns after SFT on a small number of samples. Based on these insights, experiments are conducted to actually enhance the efficiency and effectiveness of SFT.
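
A small sketch of one standard way to realize this kind of gradient-based head analysis (the exact attribution rule here is an assumption, not the paper's): gate each head's output with g_h = 1 and read |d loss / d g_h| as that head's activation/importance score.

```python
import torch
import torch.nn as nn

heads, d_model, seq = 4, 32, 10
qkv = nn.Linear(d_model, 3 * d_model)
x = torch.randn(1, seq, d_model)
gates = torch.ones(heads, requires_grad=True)   # g_h = 1 for every head

q, k, v = qkv(x).chunk(3, dim=-1)

def split_heads(t):
    # (1, seq, d_model) -> (heads, seq, d_head)
    return t.view(1, seq, heads, d_model // heads).permute(0, 2, 1, 3)[0]

q, k, v = map(split_heads, (q, k, v))
attn = torch.softmax(q @ k.transpose(-1, -2) / (q.shape[-1] ** 0.5), dim=-1)
out = gates[:, None, None] * (attn @ v)   # gate each head's output
loss = out.sum()                          # stand-in for a real task loss
loss.backward()
importance = gates.grad.abs()             # one activation score per head
print(importance)
```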

Updated: 2024-10-18 04:38:47

标题: 监督微调通过交替注意力头激活模式实现快速任务适应

摘要: LLMs在复杂任务上的表现仍然不尽人意。一个关键问题是,目前LLMs在数据驱动的模式下学习,而这些复杂任务的说明既稀缺又难以收集或构建。相反,一个显著现象是LLMs在具有足够先验知识的简单任务上可以学习得相当快速,这些知识在预训练阶段被捕捉到。因此,如果能够阐明这种快速泛化的先决条件和机制,就可以增强LLMs学习复杂任务的效率和效果。因此,在本文中,我们采用基于梯度的方法,通过关注模式的视角来剖析SFT过程如何使LLMs适应下游任务。我们发现:(1)LLMs在SFT过程中选择性地激活特定于任务的注意头;(2)复杂任务的激活模式是基础任务模式的组合;(3)一些参数的改变可以显著影响SFT后在少量样本上的激活模式。基于这些见解,进行了实验来实际增强SFT的效率和效果。

更新时间: 2024-10-18 04:38:47

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2409.15820v2

TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling

Inference-time alignment enhances the performance of large language models without requiring additional training or fine-tuning but presents challenges due to balancing computational efficiency with high-quality output. Best-of-N (BoN) sampling, as a simple yet powerful approach, generates multiple responses and selects the best one, achieving improved performance but with a high computational cost. We propose TreeBoN, a novel framework that integrates a speculative tree-search strategy into Best-of-N (BoN) Sampling. TreeBoN maintains a set of parent nodes, iteratively branching and pruning low-quality responses, thereby reducing computational overhead while maintaining high output quality. Our approach also leverages token-level rewards from Direct Preference Optimization (DPO) to guide tree expansion and prune low-quality paths. We evaluate TreeBoN using AlpacaFarm, UltraFeedback, GSM8K, HH-RLHF, and TutorEval datasets, demonstrating consistent improvements. Specifically, TreeBoN achieves a 65% win rate at maximum lengths of 192 and 384 tokens, outperforming standard BoN with the same computational cost. Furthermore, TreeBoN achieves around a 60% win rate across longer responses, showcasing its scalability and alignment efficacy.
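
A toy sketch of the branch-score-prune loop at the core of this idea. The generator and the DPO-style token-level reward are stand-ins; the names and interfaces are assumptions, not TreeBoN's API.

```python
import random

random.seed(0)

def extend(prefix: str) -> str:
    return prefix + random.choice(["a", "b", "c"])   # stand-in LLM step

def reward(text: str) -> float:
    return text.count("a") - 0.1 * len(text)         # stand-in token-level reward

def tree_bon(width: int = 4, keep: int = 2, depth: int = 6) -> str:
    beams = [""]
    for _ in range(depth):
        # Branch: each surviving parent spawns `width` children.
        children = [extend(b) for b in beams for _ in range(width)]
        # Prune: keep only the highest-reward partial responses.
        beams = sorted(children, key=reward, reverse=True)[:keep]
    return max(beams, key=reward)

print(tree_bon())
```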

Updated: 2024-10-18 04:38:21

标题: TreeBoN: 通过推测性树搜索和最佳N采样增强推理时间对齐

摘要: 推理时间对齐增强了大型语言模型的性能,无需额外的训练或微调,但在保持计算效率和高质量输出之间存在挑战。最佳-N(BoN)采样作为一种简单而强大的方法,生成多个响应并选择最佳响应,实现了性能的提升,但计算成本较高。我们提出了TreeBoN,这是一个将推测树搜索策略集成到最佳-N(BoN)采样中的新颖框架。TreeBoN维护一组父节点,迭代地分支和修剪低质量响应,从而降低计算开销同时保持高输出质量。我们的方法还利用来自直接偏好优化(DPO)的令牌级奖励来引导树扩展并修剪低质量路径。我们使用AlpacaFarm、UltraFeedback、GSM8K、HH-RLHF和TutorEval数据集评估了TreeBoN,展示了持续的改进。具体而言,TreeBoN在最大长度为192和384令牌时实现了65%的胜率,优于具有相同计算成本的标准BoN。此外,TreeBoN在更长响应中实现了约60%的胜率,展示了其可伸缩性和对齐效果。

更新时间: 2024-10-18 04:38:21

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.16033v1

From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning

Motivated by the in-context learning (ICL) capabilities of Large Language Models (LLMs), multimodal LLMs with an additional visual modality also exhibit similar ICL abilities when multiple image-text pairs are provided as demonstrations. However, relatively little work has been done to investigate the principles behind how and why multimodal ICL works. We conduct a systematic and principled evaluation of multimodal ICL for models of different scales on a broad spectrum of new yet critical tasks. Through perturbations over different modality information, we show that modalities matter differently across tasks in multimodal ICL. Guided by task-specific modality impact, we recommend modality-driven demonstration strategies to boost ICL performance. We also find that models may follow inductive biases from multimodal ICL even if they are rarely seen in or contradict semantic priors from pretraining data. Our principled analysis provides a comprehensive way of understanding the role of demonstrations in multimodal in-context learning, and sheds light on effectively improving multimodal ICL on a wide range of tasks.

Updated: 2024-10-18 04:37:33

标题: 从内省到最佳实践:多模态情境学习中示范的原则性分析

摘要: 受大型语言模型(LLMs)的上下文学习(ICL)能力启发,具有额外视觉模态的多模态LLMs在提供多个图像-文本对作为示范时也表现出类似的ICL能力。然而,相对较少的工作已经对多模态ICL的工作原理进行了探究。我们对不同规模模型在广泛的新但关键任务上进行了系统和原则性的多模态ICL评估。通过对不同模态信息的扰动,我们展示了在多模态ICL中,模态在任务之间的重要性不同。在任务特定的模态影响指导下,我们推荐以模态驱动的示范策略来提升ICL性能。我们还发现,即使模型很少在预训练数据中看到或与语义先验相矛盾,模型也可能遵循多模态ICL的归纳偏见。我们的原则性分析提供了一种全面理解示范在多模态上下文学习中的作用的方式,并为有效改进广泛任务上的多模态ICL提供了启示。

更新时间: 2024-10-18 04:37:33

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.00902v2

Heavy-Tailed Diffusion Models

Diffusion models achieve state-of-the-art generation quality across many applications, but their ability to capture rare or extreme events in heavy-tailed distributions remains unclear. In this work, we show that traditional diffusion and flow-matching models with standard Gaussian priors fail to capture heavy-tailed behavior. We address this by repurposing the diffusion framework for heavy-tail estimation using multivariate Student-t distributions. We develop a tailored perturbation kernel and derive the denoising posterior based on the conditional Student-t distribution for the backward process. Inspired by $\gamma$-divergence for heavy-tailed distributions, we derive a training objective for heavy-tailed denoisers. The resulting framework introduces controllable tail generation using only a single scalar hyperparameter, making it easily tunable for diverse real-world distributions. As specific instantiations of our framework, we introduce t-EDM and t-Flow, extensions of existing diffusion and flow models that employ a Student-t prior. Remarkably, our approach is readily compatible with standard Gaussian diffusion models and requires only minimal code changes. Empirically, we show that our t-EDM and t-Flow outperform standard diffusion models in heavy-tail estimation on high-resolution weather datasets in which generating rare and extreme events is crucial.
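
At a high level, the change amounts to swapping the Gaussian perturbation kernel for a Student-t one with a single tail hyperparameter $\nu$. A minimal sketch, assuming the standard scale-mixture construction (the paper's exact kernel differs in detail):

```python
import numpy as np

rng = np.random.default_rng(0)

def student_t_noise(shape, nu: float = 5.0):
    # Multivariate Student-t noise via a Gaussian divided by an independent
    # chi-square mixing variable; smaller nu means heavier tails, and
    # nu -> infinity recovers the usual Gaussian kernel.
    z = rng.standard_normal(shape)
    s = rng.chisquare(nu, size=(shape[0],) + (1,) * (len(shape) - 1))
    return z * np.sqrt(nu / s)

x = rng.standard_normal((1024, 2))   # clean samples
sigma = 0.7                          # noise level of the forward process
x_noised = x + sigma * student_t_noise(x.shape, nu=3.0)  # heavy-tailed kernel
```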

Updated: 2024-10-18 04:29:46

标题: 重尾扩散模型

摘要: 扩散模型在许多应用中实现了最先进的生成质量,但其捕捉重尾分布中罕见或极端事件的能力仍不清楚。在这项工作中,我们展示传统的扩散和流匹配模型使用标准高斯先验无法捕捉重尾行为。我们通过重新利用扩散框架来进行重尾估计,使用多元学生t分布来解决这个问题。我们开发了一个定制的扰动核,并根据后向过程的条件学生t分布推导出去噪后验。受重尾分布的$\gamma$-散度启发,我们为重尾去噪器推导出了一个训练目标。由此产生的框架引入了可控尾部生成,仅使用单个标量超参数,使其易于调整以适应多样的现实世界分布。作为我们框架的具体实例,我们介绍了t-EDM和t-Flow,这是现有扩散和流模型的扩展,采用学生t先验。值得注意的是,我们的方法与标准高斯扩散模型兼容,并且只需要最少的代码更改。在实证方面,我们展示了我们的t-EDM和t-Flow在高分辨率天气数据集上的重尾估计中胜过标准扩散模型,在这些数据集中生成罕见和极端事件至关重要。

更新时间: 2024-10-18 04:29:46

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.14171v1

Biometric Authentication Based on Enhanced Remote Photoplethysmography Signal Morphology

Remote photoplethysmography (rPPG) is a non-contact method for measuring cardiac signals from facial videos, offering a convenient alternative to contact photoplethysmography (cPPG) obtained from contact sensors. Recent studies have shown that each individual possesses a unique cPPG signal morphology that can be utilized as a biometric identifier, which has inspired us to utilize the morphology of rPPG signals extracted from facial videos for person authentication. Since the facial appearance and rPPG are mixed in the facial videos, we first de-identify facial videos to remove facial appearance while preserving the rPPG information, which protects facial privacy and guarantees that only rPPG is used for authentication. The de-identified videos are fed into an rPPG model to get the rPPG signal morphology for authentication. In the first training stage, unsupervised rPPG training is performed to get coarse rPPG signals. In the second training stage, an rPPG-cPPG hybrid training is performed by incorporating external cPPG datasets to achieve rPPG biometric authentication and enhance rPPG signal morphology. Our approach needs only de-identified facial videos with subject IDs to train rPPG authentication models. The experimental results demonstrate that rPPG signal morphology hidden in facial videos can be used for biometric authentication. The code is available at https://github.com/zhaodongsun/rppg_biometrics.

Updated: 2024-10-18 04:23:00

标题: 基于增强远程光电容积脉搏信号形态的生物特征认证

摘要: 遥感光电容积脉动图(rPPG)是一种从面部视频中测量心脏信号的非接触方法,为接触式光电容积脉动图(cPPG)提供了方便的替代方法。最近的研究表明,每个个体都具有一种独特的cPPG信号形态,可以用作生物识别标识符,这启发我们利用从面部视频中提取的rPPG信号形态进行个人认证。由于面部外观和rPPG混合在面部视频中,我们首先对面部视频进行去识别处理,删除面部外观同时保留rPPG信息,从而保护面部隐私并确保仅使用rPPG进行认证。去识别视频被输入rPPG模型以获取用于认证的rPPG信号形态。在第一训练阶段,进行无监督的rPPG训练以获取粗略rPPG信号。在第二训练阶段,通过整合外部cPPG数据集进行rPPG-cPPG混合训练,实现rPPG生物识别认证并增强rPPG信号形态。我们的方法仅需要带有主题ID的去识别面部视频来训练rPPG认证模型。实验结果表明,隐藏在面部视频中的rPPG信号形态可用于生物识别认证。代码可在https://github.com/zhaodongsun/rppg_biometrics找到。

更新时间: 2024-10-18 04:23:00

领域: cs.CV,cs.AI,eess.IV,eess.SP

下载: http://arxiv.org/abs/2407.04127v2

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

Talking head generation intends to produce vivid and realistic talking head videos from a single portrait and speech audio clip. Although significant progress has been made in diffusion-based talking head generation, almost all methods rely on autoregressive strategies, which suffer from limited context utilization beyond the current generation step, error accumulation, and slower generation speed. To address these challenges, we present DAWN (Dynamic frame Avatar With Non-autoregressive diffusion), a framework that enables all-at-once generation of dynamic-length video sequences. Specifically, it consists of two main components: (1) audio-driven holistic facial dynamics generation in the latent motion space, and (2) audio-driven head pose and blink generation. Extensive experiments demonstrate that our method generates authentic and vivid videos with precise lip motions, and natural pose/blink movements. Additionally, with a high generation speed, DAWN possesses strong extrapolation capabilities, ensuring the stable production of high-quality long videos. These results highlight the considerable promise and potential impact of DAWN in the field of talking head video generation. Furthermore, we hope that DAWN sparks further exploration of non-autoregressive approaches in diffusion models. Our code will be publicly available at https://github.com/Hanbo-Cheng/DAWN-pytorch.

Updated: 2024-10-18 04:19:02

标题: DAWN:用于生成说话头部视频的动态帧头像和非自回归扩散框架

摘要: 头部生成旨在从单个肖像和语音音频剪辑生成生动逼真的说话头部视频。尽管在基于扩散的头部生成方面取得了显著进展,但几乎所有方法都依赖于自回归策略,这些策略在当前生成步骤之外利用有限上下文、误差累积和生成速度较慢。为了解决这些挑战,我们提出了DAWN(具有非自回归扩散的动态帧头像)框架,该框架使动态长度视频序列的一次性生成成为可能。具体来说,它包括两个主要组成部分:(1)在潜在运动空间中生成基于音频驱动的整体面部动态,和(2)生成基于音频驱动的头部姿势和眨眼动作。大量实验证明,我们的方法生成了具有精确唇部运动和自然姿势/眨眼动作的真实生动视频。此外,DAWN具有高生成速度,具有强大的外推能力,确保稳定生成高质量的长视频。这些结果突显了DAWN在头部视频生成领域的巨大潜力和潜在影响。此外,我们希望DAWN引发对扩散模型中非自回归方法的进一步探索。我们的代码将在https://github.com/Hanbo-Cheng/DAWN-pytorch上公开提供。

更新时间: 2024-10-18 04:19:02

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.13726v2

Elements of disinformation theory: cyber engagement via increasing adversary information consumption

We consider the case where an adversary is conducting a surveillance campaign against a networked control system (NCS), and take the perspective of a defender/control system operator who has successfully isolated the cyber intruder. To better understand the adversary's intentions and to drive up their operating costs, the defender directs the adversary towards a "honeypot" that emulates a real control system but has no actual connection to a physical plant. We propose a strategy for adversary engagement within the "honey" control system to increase the adversary's costs of information processing. We assume that, based on an understanding of the adversary's control-theoretic goals, cyber threat intelligence (CTI) provides the defender knowledge of the adversary's preferences for information acquisition. We use this knowledge to spoof sensor readings to maximize the amount of information the adversary consumes while making it (information theoretically) difficult for the adversary to detect that they are being spoofed. We discuss the case of imperfect versus perfect threat intelligence and perform a numerical comparison.

Updated: 2024-10-18 04:18:45

标题: 虚假信息理论要素:通过增加对手信息消费进行网络参与

摘要: 我们考虑的情况是,对网络控制系统(NCS)进行监视活动的对手,并从成功隔离网络入侵者的防御者/控制系统操作员的角度出发。为了更好地了解对手的意图并增加其运营成本,防御者将对手引向一个“蜜罐”,模拟一个真实的控制系统,但没有与实际物理设备的连接。我们提出了一种对手参与“蜜罐”控制系统的策略,以增加对手的信息处理成本。我们假设,基于对对手控制理论目标的理解,网络威胁情报(CTI)为防御者提供了对对手信息获取偏好的知识。我们利用这些知识欺骗传感器读数,以最大化对手消耗的信息量,同时使对手难以(信息理论上)检测到自己被欺骗。我们讨论了完美与不完美的威胁情报之间的情况,并进行了数值比较。

更新时间: 2024-10-18 04:18:45

领域: eess.SY,cs.CR,cs.IT,cs.SY,math.IT,math.OC

下载: http://arxiv.org/abs/2410.14168v1

Tight bounds on Pauli channel learning without entanglement

Quantum entanglement is a crucial resource for learning properties from nature, but a precise characterization of its advantage can be challenging. In this work, we consider learning algorithms without entanglement to be those that only utilize states, measurements, and operations that are separable between the main system of interest and an ancillary system. Interestingly, we show that these algorithms are equivalent to those that apply quantum circuits on the main system interleaved with mid-circuit measurements and classical feedforward. Within this setting, we prove a tight lower bound for Pauli channel learning without entanglement that closes the gap between the best-known upper and lower bound. In particular, we show that $\Theta(2^n\varepsilon^{-2})$ rounds of measurements are required to estimate each eigenvalue of an $n$-qubit Pauli channel to $\varepsilon$ error with high probability when learning without entanglement. In contrast, a learning algorithm with entanglement only needs $\Theta(\varepsilon^{-2})$ copies of the Pauli channel. The tight lower bound strengthens the foundation for an experimental demonstration of entanglement-enhanced advantages for Pauli noise characterization.

Updated: 2024-10-18 04:18:05

标题: 没有纠缠的情况下学习Pauli信道的严格界限

摘要: 量子纠缠是从自然界学习性质的关键资源,但其优势的精确表征可能具有挑战性。在这项工作中,我们认为没有纠缠的学习算法是指那些仅利用主系统和辅助系统之间可分离的状态、测量和操作的算法。有趣的是,我们证明这些算法等价于在主系统上应用量子电路,其中夹杂着中间电路测量和经典反馈。在这个设置中,我们证明了没有纠缠的Pauli信道学习的严格下界,缩小了已知的最优上界和下界之间的差距。特别地,我们展示了在学习过程中没有纠缠时,对于估计$n$比特Pauli信道的每个本征值到$\varepsilon$误差,高概率下需要$\Theta(2^n\varepsilon^{-2})$轮测量。相比之下,一个带有纠缠的学习算法只需要$\Theta(\varepsilon^{-2})$个Pauli信道的副本。这个严格的下界加强了为Pauli噪声特性实验演示纠缠增强优势的基础。

更新时间: 2024-10-18 04:18:05

领域: quant-ph,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2309.13461v3

LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems

Interestingly, LLMs still struggle with some basic tasks that humans find trivial to handle, e.g., counting the number of character r's in the word "strawberry". There are several popular conjectures (e.g., tokenization, architecture, and training data) regarding the reason for the deficiency of LLMs in simple word-based counting problems, sharing a similar belief that such failure stems from model pretraining and hence is probably inevitable during deployment. In this paper, we carefully design multiple evaluation settings to investigate the validity of prevalent conjectures. Meanwhile, we measure the transferability of advanced mathematical and coding reasoning capabilities from specialized LLMs to simple counting tasks. Although specialized LLMs suffer from counting problems as well, we find conjectures about the inherent deficiency of LLMs invalid and further seek opportunities to elicit knowledge and capabilities from LLMs that are beneficial to counting tasks. Compared with strategies such as finetuning and in-context learning that are commonly adopted to enhance performance on new or challenging tasks, we show that engaging reasoning is the most robust and efficient way to help LLMs better perceive tasks and respond with more accurate answers. We hope our conjecture-validation design can provide insights into the study of future critical failure modes of LLMs. Based on the challenges in transferring advanced capabilities to much simpler tasks, we call for more attention to model capability acquisition and evaluation. We also highlight the importance of cultivating a consciousness of "reasoning before responding" during model pretraining.
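
A tiny illustration of the kind of probe involved, contrasting ground truth by direct string counting with a direct-answer prompt versus a reason-first prompt. The prompts are illustrative, not the paper's.

```python
def true_count(word: str, ch: str) -> int:
    # Ground truth the model is evaluated against.
    return word.count(ch)

word, ch = "strawberry", "r"
direct_prompt = f"How many '{ch}'s are in the word \"{word}\"? Answer with a number."
reason_prompt = (
    f"Spell out \"{word}\" letter by letter, marking each '{ch}'. "
    "Then count the marks and state the total."
)
assert true_count(word, ch) == 3  # s-t-R-a-w-b-e-R-R-y
```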

Updated: 2024-10-18 04:17:16

标题: LLM天才悖论:语言和数学专家在简单基于单词的计数问题中的挣扎

摘要: 有趣的是,大型语言模型(LLMs)仍然在一些人类认为微不足道的基本任务上遇到困难,例如计算单词“strawberry”中字符r的数量。关于LLMs在简单基于单词的计数问题上存在不足的原因,有几种流行的猜测(例如,分词、架构和训练数据),这些猜测都认为这种失败源于模型的预训练,因此在部署过程中可能是不可避免的。在本文中,我们精心设计了多个评估设置,以调查流行猜测的有效性。同时,我们衡量了从专门的LLMs到简单计数任务的高级数学和编码推理能力的可转移性。尽管专门的LLMs也存在计数问题,我们发现关于LLMs固有缺陷的猜测是无效的,并进一步寻求从LLMs中挖掘知识和能力的机会,这些对计数任务有益。与常用于增强新任务或具有挑战性任务性能的策略(如微调和上下文学习)相比,我们表明,参与推理是帮助LLMs更准确地感知任务的最稳健和高效的方式。 我们希望我们的猜测验证设计能够为未来LLMs的关键故障模式研究提供见解。基于将高级能力转移到更简单任务中的挑战,我们呼吁更多关注模型能力的获取和评估。我们还强调在模型预训练期间培养“在回应之前进行推理”的意识的重要性。

更新时间: 2024-10-18 04:17:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14166v1

Emergence in non-neural models: grokking modular arithmetic via average gradient outer product

Neural networks trained to solve modular arithmetic tasks exhibit grokking, a phenomenon where the test accuracy starts improving long after the model achieves 100% training accuracy in the training process. It is often taken as an example of "emergence", where model ability manifests sharply through a phase transition. In this work, we show that the phenomenon of grokking is not specific to neural networks nor to gradient descent-based optimization. Specifically, we show that this phenomenon occurs when learning modular arithmetic with Recursive Feature Machines (RFM), an iterative algorithm that uses the Average Gradient Outer Product (AGOP) to enable task-specific feature learning with general machine learning models. When used in conjunction with kernel machines, iterating RFM results in a fast transition from random, near zero, test accuracy to perfect test accuracy. This transition cannot be predicted from the training loss, which is identically zero, nor from the test loss, which remains constant in initial iterations. Instead, as we show, the transition is completely determined by feature learning: RFM gradually learns block-circulant features to solve modular arithmetic. Paralleling the results for RFM, we show that neural networks that solve modular arithmetic also learn block-circulant features. Furthermore, we present theoretical evidence that RFM uses such block-circulant features to implement the Fourier Multiplication Algorithm, which prior work posited as the generalizing solution neural networks learn on these tasks. Our results demonstrate that emergence can result purely from learning task-relevant features and is not specific to neural architectures nor gradient descent-based optimization methods. Furthermore, our work provides more evidence for AGOP as a key mechanism for feature learning in neural networks.
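
For reference, the AGOP at the heart of RFM is just the average outer product of the predictor's input gradients. A minimal sketch with a toy differentiable predictor (the predictor and the reweighting comment are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))   # training inputs
w = rng.standard_normal(5)

def grad_f(x):
    # Gradient of a toy predictor f(x) = tanh(w . x) with respect to x.
    return (1 - np.tanh(w @ x) ** 2) * w

# AGOP: M = (1/n) * sum_i grad f(x_i) grad f(x_i)^T
grads = np.stack([grad_f(x) for x in X])
M = grads.T @ grads / len(X)
# RFM then reweights inputs with M (e.g., x -> M^{1/2} x) and refits the
# kernel predictor, iterating the two steps until features stabilize.
```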

Updated: 2024-10-18 04:13:15

标题: 非神经模型中的出现:通过平均梯度外积理解模块算术

摘要: 训练用于解决模数算术任务的神经网络表现出理解能力,这是一个现象,即测试准确度在模型在训练过程中达到100%训练准确度之后开始显著提高。这经常被视为“出现”的一个例子,即模型的能力通过阶段性转变显著表现出来。在这项工作中,我们展示了理解现象并不特定于神经网络或基于梯度下降的优化。具体而言,我们展示了在使用递归特征机器(RFM)学习模数算术时,这种现象会发生。RFM是一种迭代算法,使用平均梯度外积(AGOP)来实现具有一般机器学习模型的任务特定特征学习。与核机器结合使用时,迭代RFM会导致从随机、接近零的测试准确度快速过渡到完美的测试准确度。这种转变无法从训练损失(恒等于零)或测试损失(在初始迭代中保持恒定)中预测。相反,正如我们所展示的,转变完全取决于特征学习:RFM逐渐学习块循环特征来解决模数算术问题。与RFM的结果相似,我们展示了解决模数算术的神经网络也学习块循环特征。此外,我们提出了理论证据表明RFM使用这些块循环特征来实现傅里叶乘法算法,先前的研究将其作为神经网络在这些任务上学习的泛化解决方案。我们的结果表明,出现可以纯粹源自学习与任务相关的特征,而并非特定于神经结构或基于梯度下降的优化方法。此外,我们的工作为AGOP作为神经网络特征学习的关键机制提供了更多证据。

更新时间: 2024-10-18 04:13:15

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2407.20199v2

Amphista: Bi-directional Multi-head Decoding for Accelerating LLM Inference

Large Language Models (LLMs) inherently use autoregressive decoding, which lacks parallelism in inference and results in significantly slow inference speed. While methods such as Medusa construct parallelized heads, they lack adequate information interaction across different prediction positions. To overcome this limitation, we introduce Amphista, an enhanced speculative decoding framework that builds upon Medusa. Specifically, Amphista models an Auto-embedding Block capable of parallel inference, incorporating bi-directional attention to enable interaction between different drafting heads. Additionally, Amphista integrates Staged Adaptation Layers, which ensure a seamless transition of semantic information from the target model's autoregressive inference to the drafting heads' non-autoregressive inference, effectively achieving paradigm shift and feature fusion. Experimental results on Vicuna models using MT-Bench and Spec-Bench demonstrate that Amphista achieves substantial acceleration while maintaining generation quality. On MT-Bench, Amphista delivers up to 2.75$\times$ speedup over vanilla autoregressive decoding and 1.40$\times$ over Medusa on Vicuna 33B in wall-clock time.

Updated: 2024-10-18 04:13:05

标题: Amphista:双向多头解码以加速LLM推断

摘要: 大型语言模型(LLMs)固有地使用自回归解码,这在推理中缺乏并行性,并导致推理速度显著减慢。虽然像Medusa这样的方法构建了并行化头部,但它们在不同预测位置之间缺乏足够的信息交互。为了克服这一限制,我们引入了Amphista,这是一个基于Medusa的增强推测解码框架。具体来说,Amphista建模了一个自动嵌入块,能够进行并行推理,并整合了双向注意力,以实现不同起草头之间的交互。此外,Amphista整合了分阶段适应层,确保从目标模型的自回归推理到起草头的非自回归推理的语义信息的无缝过渡,有效实现了范式转变和特征融合。在使用MT-Bench和Spec-Bench的Vicuna模型上的实验结果表明,Amphista在保持生成质量的同时实现了显著加速。在MT-Bench上,Amphista在墙钟时间上比基本自回归解码快了高达2.75倍,并比Vicuna 33B上的Medusa快了1.40倍。

更新时间: 2024-10-18 04:13:05

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.13170v2

BlockFound: Customized blockchain foundation model for anomaly detection

We propose BlockFound, a customized foundation model for anomalous blockchain transaction detection. Unlike existing methods that rely on rule-based systems or directly apply off-the-shelf large language models, BlockFound introduces a series of customized designs to model the unique data structure of blockchain transactions. First, a blockchain transaction is multi-modal, containing blockchain-specific tokens, texts, and numbers. We design a modularized tokenizer to handle these multi-modal inputs, balancing the information across different modalities. Second, we design a customized masked language learning mechanism for pretraining, with RoPE embedding and FlashAttention for handling longer sequences. After training the foundation model, we further design a novel detection method for anomaly detection. Extensive evaluations on Ethereum and Solana transactions demonstrate BlockFound's exceptional capability in anomaly detection while maintaining a low false positive rate. Remarkably, BlockFound is the only method that successfully detects anomalous transactions on Solana with high accuracy, whereas all other approaches achieved very low or zero detection recall scores. This work not only provides new foundation models for blockchain but also sets a new benchmark for applying LLMs in blockchain data.

Updated: 2024-10-18 04:05:06

标题: BlockFound:用于异常检测的定制区块链基础模型

摘要: 我们提出了BlockFound,这是一个定制的基础模型,用于异常区块链交易检测。与现有依赖基于规则的系统或直接应用现成的大型语言模型的方法不同,BlockFound引入了一系列定制设计来建模区块链交易的独特数据结构。首先,区块链交易是多模态的,包含区块链特定的令牌、文本和数字。我们设计了一个模块化的分词器来处理这些多模态输入,平衡不同模态之间的信息。其次,我们设计了一个定制的掩码语言学习机制,用RoPE嵌入和FlashAttention进行预训练,以处理更长的序列。在训练基础模型之后,我们进一步设计了一种新颖的异常检测方法。对以太坊和Solana交易的广泛评估表明,BlockFound在异常检测方面具有出色的能力,同时保持较低的误报率。值得注意的是,BlockFound是唯一成功在Solana上高精度检测异常交易的方法,而所有其他方法都取得了非常低或零的检测召回分数。这项工作不仅为区块链提供了新的基础模型,还为在区块链数据中应用LLM设定了新的基准。

更新时间: 2024-10-18 04:05:06

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2410.04039v3

Collaboratively adding new knowledge to an LLM

We address the question of how to successively add new knowledge to an LLM whilst retaining previously-added knowledge. We consider two settings, semi-cooperative and fully-cooperative. Overall, LoRA performs better in most cases than full fine-tuning of all parameters when both new knowledge acquisition and retention of old, including recent, knowledge are taken into account. In the semi-cooperative setting, where datasets are not available after training, MOE mixing, model merging, and LoRA-based orthogonal subspace sequential learning, using a small weight on the orthogonality term, perform well. In the fully-cooperative setting where datasets remain available, joint training and sequential training with replay are both effective approaches, with LoRA training generally preferable to full fine-tuning. The code needed to reproduce the results is provided in an open source repository.
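
A hedged sketch of the LoRA-based orthogonal-subspace idea: while training a new adapter, add a small penalty pushing it away from the subspace spanned by previously learned adapters. The penalty form and its weight are assumptions for illustration, not the paper's exact objective.

```python
import torch

d, r = 64, 8
A_old = torch.randn(r, d)                       # frozen adapter from an earlier task
A_new = torch.randn(r, d, requires_grad=True)   # adapter being trained now

def orthogonality_penalty(A_new, A_old):
    # Frobenius norm of the cross-Gram matrix: zero iff the two adapters'
    # row spaces are orthogonal.
    return (A_new @ A_old.T).pow(2).sum()

task_loss = A_new.pow(2).mean()                 # stand-in for the real SFT loss
loss = task_loss + 1e-3 * orthogonality_penalty(A_new, A_old)  # small weight
loss.backward()
```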

Updated: 2024-10-18 04:04:51

标题: 协作性地向LLM添加新知识

摘要: 我们探讨了如何在保留先前添加的知识的同时,成功地将新知识逐步添加到LLM中的问题。我们考虑了两种情景,半合作和完全合作。总的来说,在大多数情况下,LoRA在考虑了新知识获取和保留旧知识(包括最近的知识)时比全参数细调整表现更好。在半合作情景中,当训练后数据集不可用时,MOE混合、模型合并和基于LoRA的正交子空间顺序学习,在正交性项上使用小权重表现良好。在完全合作情景中,数据集仍然可用,联合训练和带重放的顺序训练都是有效的方法,LoRA训练通常优于全参数细调整。可以在开源库中找到重现结果所需的代码。

更新时间: 2024-10-18 04:04:51

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.14753v1

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss where all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exit at earlier layers, without adding any auxiliary layers or modules to the model. Third, we present a novel self-speculative decoding solution where we exit at early layers and verify and correct with remaining layers of the model. Our proposed self-speculative decoding approach has less memory footprint than other speculative decoding approaches and benefits from shared compute and activations of the draft and verification stages. We run experiments on different Llama model sizes on different types of training: pretraining from scratch, continual pretraining, finetuning on specific data domain, and finetuning on specific task. We implement our inference solution and show speedups of up to 2.16x on summarization for CNN/DM documents, 1.82x on coding, and 2.0x on TOPv2 semantic parsing task. We open source our code and checkpoints at https://github.com/facebookresearch/LayerSkip.
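
A minimal sketch of the two training ingredients, under assumed functional forms: a layer-dropout rate that grows with depth, and a single exit head shared by all layers whose per-layer losses are averaged.

```python
import torch
import torch.nn as nn

L, d_model, vocab = 12, 64, 100
exit_head = nn.Linear(d_model, vocab)   # one LM head shared by every layer

def layer_dropout_rate(layer_idx: int, p_max: float = 0.2) -> float:
    # Assumed linear schedule: low rates for early layers, higher for later ones.
    return p_max * layer_idx / (L - 1)

def early_exit_loss(hidden_states, targets):
    # hidden_states: list of (batch, seq, d_model) tensors, one per layer.
    ce = nn.CrossEntropyLoss()
    losses = [ce(exit_head(h).flatten(0, 1), targets.flatten())
              for h in hidden_states]
    return torch.stack(losses).mean()

rates = [layer_dropout_rate(l) for l in range(L)]        # per-layer dropout
hs = [torch.randn(2, 8, d_model) for _ in range(L)]      # dummy layer outputs
tgt = torch.randint(0, vocab, (2, 8))
loss = early_exit_loss(hs, tgt)
```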

Updated: 2024-10-18 04:02:31

标题: LayerSkip:启用早期退出推断和自我猜测解码

摘要: 我们提出了LayerSkip,这是一个端到端的解决方案,用于加速大型语言模型(LLMs)的推断。首先,在训练期间,我们应用层丢弃,较早层使用低丢弃率,而较后层使用较高的丢弃率,并且使用一个早期退出损失,其中所有变换器层共享相同的退出。其次,在推断期间,我们展示了这种训练配方增加了在较早层的早期退出的准确性,而不需要向模型添加任何辅助层或模块。第三,我们提出了一种新颖的自我推断解码解决方案,其中我们在较早层退出,并使用模型的剩余层进行验证和校正。我们提出的自我推断解码方法比其他推测解码方法具有更小的内存占用,并且受益于草稿和验证阶段的共享计算和激活。我们在不同类型的训练上对不同大小的Llama模型进行了实验:从头开始预训练、持续预训练、在特定数据领域微调以及在特定任务上微调。我们实现了我们的推断解决方案,并展示了在CNN/DM文档的摘要上加速了最多2.16倍,在编码上加速了1.82倍,在TOPv2语义解析任务上加速了2.0倍。我们在https://github.com/facebookresearch/LayerSkip 上开源我们的代码和检查点。

更新时间: 2024-10-18 04:02:31

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.16710v4

LeanAgent: Lifelong Learning for Formal Theorem Proving

Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully proves 162 theorems previously unproved by humans across 23 diverse Lean repositories, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem-proving performance.

Updated: 2024-10-18 03:59:57

标题: LeanAgent:形式定理证明的终身学习

摘要: 大型语言模型(LLMs)已经在数学推理任务中取得成功,例如与Lean等交互式证明助手集成时的形式定理证明。现有方法涉及在特定数据集上训练或微调LLM,以在特定领域(如本科水平数学)表现良好。这些方法难以推广到高级数学。一个根本限制是这些方法在静态领域上操作,未能捕捉数学家经常跨多个领域和项目同时或循环工作的方式。我们提出了LeanAgent,这是一个新颖的终身学习框架,用于定理证明,可以持续泛化并改进不断扩展的数学知识,而不会忘记之前学习过的知识。LeanAgent引入了几个关键创新,包括以数学难度为优化学习轨迹的课程学习策略,用于有效管理不断发展的数学知识的动态数据库,以及平衡稳定性和可塑性的渐进训练。LeanAgent成功地证明了23个不同的Lean存储库中先前未经人类证明的162个定理,其中许多来自高级数学。它在静态LLM基线上表现得更好,证明了抽象代数和代数拓扑等领域中具有挑战性的定理,同时展示了从基本概念到高级主题的学习明显进展。此外,我们分析了LeanAgent在关键终身学习指标上的卓越表现。LeanAgent在稳定性和向后迁移方面取得了杰出成绩,学习新任务提高了先前学习任务的表现。这强调了LeanAgent的持续泛化和改进,解释了其卓越的定理证明性能。

更新时间: 2024-10-18 03:59:57

领域: cs.LG,cs.AI,cs.LO

下载: http://arxiv.org/abs/2410.06209v4

Assessing Open-world Forgetting in Generative Image Model Customization

Recent advances in diffusion models have significantly enhanced image generation capabilities. However, customizing these models with new classes often leads to unintended consequences that compromise their reliability. We introduce the concept of open-world forgetting to emphasize the vast scope of these unintended alterations, contrasting it with the well-studied closed-world forgetting, which is measurable by evaluating performance on a limited set of classes or skills. Our research presents the first comprehensive investigation into open-world forgetting in diffusion models, focusing on semantic and appearance drift of representations. We utilize zero-shot classification to analyze semantic drift, revealing that even minor model adaptations lead to unpredictable shifts affecting areas far beyond newly introduced concepts, with dramatic drops in zero-shot classification of up to 60%. Additionally, we observe significant changes in texture and color of generated content when analyzing appearance drift. To address these issues, we propose a mitigation strategy based on functional regularization, designed to preserve original capabilities while accommodating new concepts. Our study aims to raise awareness of unintended changes due to model customization and advocates for the analysis of open-world forgetting in future research on model customization and finetuning methods. Furthermore, we provide insights for developing more robust adaptation methodologies.

Updated: 2024-10-18 03:58:29

标题: 评估生成图像模型定制中的开放世界遗忘

摘要: 最近对扩散模型的进展显著增强了图像生成能力。然而,使用新类别定制这些模型通常会导致意想不到的后果,从而损害它们的可靠性。我们引入了“开放世界遗忘”的概念,以强调这些意外变化的广泛范围,并将其与已广泛研究的“封闭世界遗忘”进行对比,后者可通过评估在有限类别或技能集上的表现来衡量。我们的研究首次全面调查了扩散模型中的开放世界遗忘,重点关注语义和外观表示的漂移。我们利用零样本分类来分析语义漂移,揭示即使微小的模型调整也会导致影响远超新引入概念的领域的不可预测转变,零样本分类显著下降高达60%。此外,当分析外观漂移时,我们观察到生成内容的纹理和颜色发生了显著变化。为了解决这些问题,我们提出了一种基于功能正则化的缓解策略,旨在保留原始能力的同时容纳新概念。我们的研究旨在提高对模型定制所导致的意外变化的认识,并倡导在未来关于模型定制和微调方法的研究中分析开放世界遗忘。此外,我们为开发更健壮的适应方法提供了见解。

更新时间: 2024-10-18 03:58:29

领域: cs.CV,cs.GR,cs.LG

下载: http://arxiv.org/abs/2410.14159v1

A Mirror Descent Perspective of Smoothed Sign Descent

Recent work by Woodworth et al. (2020) shows that the optimization dynamics of gradient descent for overparameterized problems can be viewed as low-dimensional dual dynamics induced by a mirror map, explaining the implicit regularization phenomenon from the mirror descent perspective. However, the methodology does not apply to algorithms where update directions deviate from true gradients, such as ADAM. We use the mirror descent framework to study the dynamics of smoothed sign descent with a stability constant $\varepsilon$ for regression problems. We propose a mirror map that establishes equivalence to dual dynamics under some assumptions. By studying dual dynamics, we characterize the convergent solution as an approximate KKT point of minimizing a Bregman divergence style function, and show the benefit of tuning the stability constant $\varepsilon$ to reduce the KKT error.
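
One natural reading of the update under study (an assumption on our part, stated only to fix notation; the paper's precise formulation may differ) is the elementwise rule

$$x_{t+1} = x_t - \eta \, \frac{\nabla L(x_t)}{|\nabla L(x_t)| + \varepsilon},$$

which recovers sign descent as $\varepsilon \to 0$ and behaves like gradient descent with step size $\eta / \varepsilon$ as $\varepsilon \to \infty$; the stability constant thus interpolates between the two regimes that the tuning result above concerns.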

Updated: 2024-10-18 03:52:21

标题: 平滑符号下降的镜像下降视角

摘要: 最近Woodworth等人的研究(2020年)表明,梯度下降的优化动态对于超参数问题可以被视为由镜像映射引起的低维对偶动态,从镜像下降的视角解释了隐式正则化现象。然而,这种方法不适用于更新方向偏离真实梯度的算法,例如ADAM。我们利用镜像下降框架研究带有稳定常数$\varepsilon$的平滑符号下降在回归问题中的动态。我们提出了一个镜像映射,在一些假设下建立了与对偶动态的等价性。通过研究对偶动态,我们将收敛解表征为最小化Bregman散度样式函数的近似KKT点,并展示了调整稳定常数$\varepsilon$以减少KKT误差的好处。

更新时间: 2024-10-18 03:52:21

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2410.14158v1

Preference-Based Planning in Stochastic Environments: From Partially-Ordered Temporal Goals to Most Preferred Policies

Human preferences are not always represented via complete linear orders: It is natural to employ partially-ordered preferences for expressing incomparable outcomes. In this work, we consider decision-making and probabilistic planning in stochastic systems modeled as Markov decision processes (MDPs), given a partially ordered preference over a set of temporally extended goals. Specifically, each temporally extended goal is expressed using a formula in Linear Temporal Logic on Finite Traces (LTL$_f$). To plan with the partially ordered preference, we introduce order theory to map a preference over temporal goals to a preference over policies for the MDP. Accordingly, a most preferred policy under a stochastic ordering induces a stochastic nondominated probability distribution over the finite paths in the MDP. To synthesize a most preferred policy, our technical approach includes two key steps. In the first step, we develop a procedure to transform a partially ordered preference over temporal goals into a computational model, called preference automaton, which is a semi-automaton with a partial order over acceptance conditions. In the second step, we prove that finding a most preferred policy is equivalent to computing a Pareto-optimal policy in a multi-objective MDP that is constructed from the original MDP, the preference automaton, and the chosen stochastic ordering relation. Throughout the paper, we employ running examples to illustrate the proposed preference specification and solution approaches. We demonstrate the efficacy of our algorithm using these examples, providing detailed analysis, and then discuss several potential future directions.

Updated: 2024-10-18 03:50:57

标题: 首选规划在随机环境中的应用:从部分有序的时间目标到最优偏好策略

摘要: 人类的偏好并不总是通过完整的线性顺序来表示:使用部分有序的偏好来表达无法比较的结果是自然的。在这项工作中,我们考虑决策和概率规划在被建模为马尔可夫决策过程(MDPs)的随机系统中进行,给定一个对一组时间延伸目标的部分有序偏好。具体来说,每个时间延伸目标都是用有限轨迹上的线性时态逻辑(LTL$_f$)公式来表示的。为了计划部分有序偏好,我们引入顺序理论将一个对时间目标的偏好映射到MDP的策略偏好上。因此,在随机排序下,一个最优策略会引发一个在MDP中有限路径上的随机非支配概率分布。为了综合出一个最优策略,我们的技术方法包括两个关键步骤。在第一步中,我们开发了一个过程,将对时间目标的部分有序偏好转化为一个计算模型,称为偏好自动机,它是一个具有接受条件部分序的半自动机。在第二步中,我们证明找到一个最优策略等同于计算一个在多目标MDP中的帕累托最优策略,该MDP由原始MDP、偏好自动机和选择的随机排序关系构建而成。在整个论文中,我们使用运行示例来说明提出的偏好规范和解决方法。我们使用这些示例证明了我们算法的有效性,提供了详细的分析,然后讨论了几个潜在的未来方向。

更新时间: 2024-10-18 03:50:57

领域: cs.RO,cs.AI,cs.FL,cs.LO

下载: http://arxiv.org/abs/2403.18212v2

Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning

Autoregressive language models, despite their impressive capabilities, struggle with complex reasoning and long-term planning tasks. We introduce discrete diffusion models as a novel solution to these challenges. Through the lens of subgoal imbalance, we demonstrate how diffusion models effectively learn difficult subgoals that elude autoregressive approaches. We propose Multi-granularity Diffusion Modeling (MDM), which prioritizes subgoals based on difficulty during learning. On complex tasks like Countdown, Sudoku, and Boolean Satisfiability Problems, MDM significantly outperforms autoregressive models without using search techniques. For instance, MDM achieves 91.5\% and 100\% accuracy on Countdown and Sudoku, respectively, compared to 45.8\% and 20.7\% for autoregressive models. Our work highlights the potential of diffusion-based approaches in advancing AI capabilities for sophisticated language understanding and problem-solving tasks.

Updated: 2024-10-18 03:48:53

标题: 超越自回归:用于复杂推理和规划的离散扩散

摘要: 自回归语言模型,尽管其出色的能力,仍然在复杂推理和长期规划任务中遇到困难。我们引入了离散扩散模型作为这些挑战的新解决方案。通过子目标不平衡的视角,我们展示了扩散模型如何有效地学习难以捉摸的子目标,这是自回归方法所无法做到的。我们提出了多粒度扩散建模(MDM),在学习过程中基于困难程度优先考虑子目标。在复杂任务如倒计时、数独和布尔满足性问题中,MDM在不使用搜索技术的情况下显著优于自回归模型。例如,与自回归模型的45.8%和20.7%相比,MDM分别在倒计时和数独上分别达到91.5%和100%的准确率。我们的工作突显了基于扩散的方法在推进人工智能能力方面具有潜力,特别是在复杂语言理解和问题解决任务方面。

更新时间: 2024-10-18 03:48:53

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.14157v1

RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training

Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks. MLLMs involve significant external knowledge within their parameters; however, it is challenging to continually update these models with the latest knowledge, which involves huge computational costs and suffers from poor interpretability. Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs. In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs. Considering the redundant information within the vision modality, we first leverage the question to instruct the extraction of visual information through interactions with one set of learnable queries, minimizing irrelevant interference during retrieval and generation. Besides, we introduce a pre-trained multimodal adaptive fusion module to achieve question-text-to-multimodal retrieval and integration of multimodal knowledge by projecting visual and language modalities into a unified semantic space. Furthermore, we present an Adaptive Selection Knowledge Generation (ASKG) strategy to train the generator to autonomously discern the relevance of retrieved knowledge, which realizes excellent denoising performance. Extensive experiments on open multimodal question-answering datasets demonstrate that RA-BLIP achieves significant performance and surpasses the state-of-the-art retrieval-augmented models.

Updated: 2024-10-18 03:45:19

标题: RA-BLIP:多模态自适应检索增强引导语言图像预训练

摘要: 多模态大型语言模型(MLLMs)最近受到了大量关注,显示出它们作为各种视觉语言任务的通用模型的新兴潜力。 MLLMs的参数中涉及大量外部知识; 但是,持续更新这些模型以获取最新知识具有挑战性,这涉及巨大的计算成本和较差的可解释性。 检索增强技术已被证明是LLMs和MLLMs的有效插件。 在本研究中,我们提出了多模态自适应检索增强引导语言-图像预训练(RA-BLIP),这是一种新颖的检索增强框架,适用于各种MLLMs。 考虑到视觉模态内部的冗余信息,我们首先利用问题通过与一组可学习查询的交互来指导提取视觉信息,从而在检索和生成过程中最小化不相关的干扰。此外,我们引入了一个预先训练的多模态自适应融合模块,实现了问题文本到多模态检索的融合,并通过将视觉和语言模态投影到统一的语义空间来整合多模态知识。此外,我们提出了一种自适应选择知识生成(ASKG)策略,用于训练生成器自主辨别检索知识的相关性,从而实现出色的去噪性能。对开放式多模态问答数据集的大量实验表明,RA-BLIP取得了显著的性能,并超越了最先进的检索增强模型。

更新时间: 2024-10-18 03:45:19

领域: cs.MM,cs.AI

下载: http://arxiv.org/abs/2410.14154v1

Utilizing Large Language Models for Event Deconstruction to Enhance Multimodal Aspect-Based Sentiment Analysis

With the rapid development of the internet, the richness of User-Generated Content continues to increase, making Multimodal Aspect-Based Sentiment Analysis (MABSA) a research hotspot. Existing studies have achieved certain results in MABSA, but they have not effectively addressed the analytical challenges in scenarios where multiple entities and sentiments coexist. This paper innovatively introduces Large Language Models (LLMs) for event decomposition and proposes a reinforcement learning framework for Multimodal Aspect-Based Sentiment Analysis (MABSA-RL). This framework decomposes the original text into a set of events using LLMs, reducing the complexity of analysis, and introduces reinforcement learning to optimize model parameters. Experimental results show that MABSA-RL outperforms existing advanced methods on two benchmark datasets. This paper provides a new research perspective and method for multimodal aspect-level sentiment analysis.

Updated: 2024-10-18 03:40:45

标题: 利用大型语言模型进行事件拆解以增强多模态基于方面的情感分析

摘要: 随着互联网的快速发展,用户生成内容的丰富度不断增加,使得多模态基于方面的情感分析(MABSA)成为研究热点。现有研究在MABSA方面取得了一定的成果,但并没有有效地解决多个实体和情感共存的分析挑战。本文创新性地引入大型语言模型(LLMs)进行事件分解,并提出了一个用于多模态基于方面情感分析(MABSA-RL)的强化学习框架。该框架利用LLMs将原始文本分解为一组事件,降低了分析的复杂性,引入强化学习来优化模型参数。实验结果表明,MABSA-RL在两个基准数据集上表现优于现有的先进方法。本文为多模态方面级情感分析提供了新的研究视角和方法。

更新时间: 2024-10-18 03:40:45

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.14150v1

A Tighter Complexity Analysis of SparseGPT

In this work, we improved the analysis of the running time of SparseGPT [Frantar, Alistarh ICML 2023] from $O(d^{3})$ to $O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1,1,a)-a})$ for any $a \in [0, 1]$, where $\omega$ is the exponent of matrix multiplication. In particular, for the current $\omega \approx 2.371$ [Alman, Duan, Williams, Xu, Xu, Zhou 2024], our running time boils down to $O(d^{2.53})$. This running time is due to the analysis of the lazy update behavior in iterative maintenance problems such as [Deng, Song, Weinstein 2022; Brand, Song, Zhou ICML 2024].
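
As a back-of-the-envelope reading of the bound (our arithmetic, not the paper's derivation), the overall exponent is

$$\min_{a \in [0,1]} \max\{\omega,\; 2 + a + o(1),\; 1 + \omega(1,1,a) - a\},$$

where $\omega(1,1,a)$ denotes the exponent of multiplying a $d \times d$ matrix by a $d \times d^{a}$ matrix. With $\omega \approx 2.371$, balancing the second and third terms around $a \approx 0.53$ gives the stated $O(d^{2.53})$.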

Updated: 2024-10-18 03:36:03

标题: 《SparseGPT的更严格复杂性分析》

摘要: 在这项工作中,我们改进了SparseGPT [Frantar, Alistarh ICML 2023]的运行时间分析,从$O(d^{3})$改进为$O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1,1,a)-a})$,其中$a \in [0, 1]$,而$\omega$是矩阵乘法的指数。特别是,对于当前的$\omega \approx 2.371$ [Alman, Duan, Williams, Xu, Xu, Zhou 2024],我们的运行时间简化为$O(d^{2.53})$。这个运行时间得益于对诸如[Deng, Song, Weinstein 2022; Brand, Song, Zhou ICML 2024]等迭代维护问题中惰性更新行为的分析。

更新时间: 2024-10-18 03:36:03

领域: cs.DS,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2408.12151v2

Leveraging Large Language Models for Enhancing Public Transit Services

Public transit systems play a crucial role in providing efficient and sustainable transportation options in urban areas. However, these systems face various challenges in meeting commuters' needs. On the other hand, despite the rapid development of Large Language Models (LLMs) worldwide, their integration into transit systems remains relatively unexplored. The objective of this paper is to explore the utilization of LLMs in the public transit system, with a specific focus on improving the customers' experience and transit staff performance. We present a general framework for developing LLM applications in transit systems, wherein the LLM serves as the intermediary for information communication between natural language content and the resources within the database. In this context, the LLM serves a multifaceted role, including understanding users' requirements, retrieving data from the dataset in response to user queries, and tailoring the information to align with the users' specific needs. Three transit LLM applications are presented: Tweet Writer, Trip Advisor, and Policy Navigator. Tweet Writer automates updates to the transit system alerts on social media, Trip Advisor offers customized transit trip suggestions, and Policy Navigator provides clear and personalized answers to policy queries. Leveraging LLMs in these applications enables seamless communication, thanks to their ability to understand and generate human-like language. With the help of these three LLM transit applications, transit system media personnel can provide system updates more efficiently, and customers can access travel information and policy answers in a more user-friendly manner.

Updated: 2024-10-18 03:33:47

标题: 利用大型语言模型增强公共交通服务

摘要: 公共交通系统在提供城市区域高效可持续的交通选择方面发挥着至关重要的作用。然而,这些系统在满足通勤者需求方面面临着各种挑战。与此同时,尽管全球范围内大型语言模型(LLMs)的快速发展,但它们在公共交通系统中的整合仍然相对未被探索。 本文旨在探讨LLMs在公共交通系统中的利用,特别关注改善客户体验和交通工作人员绩效。我们提出了一个通用框架,用于开发公共交通系统中的LLM应用程序,其中LLM作为自然语言内容和数据库资源之间信息传递的中介。在这种背景下,LLM扮演多方面的角色,包括理解用户需求,根据用户查询从数据集中检索数据,并根据用户的特定需求调整信息。文中介绍了三种公共交通LLM应用程序:Tweet Writer、Trip Advisor和Policy Navigator。Tweet Writer自动更新社交媒体上的公共交通系统警报,Trip Advisor提供定制的公共交通出行建议,Policy Navigator为政策查询提供清晰个性化的答案。利用LLMs在这些应用程序中增强了与其理解和生成人类语言的能力的无缝沟通。借助这三种LLM公共交通应用程序,公共交通系统媒体人员可以更高效地提供系统更新,客户可以更加用户友好地获取出行信息和政策答案。

更新时间: 2024-10-18 03:33:47

领域: cs.SI,cs.AI,cs.CY,cs.LG

下载: http://arxiv.org/abs/2410.14147v1

CausalChat: Interactive Causal Model Development and Refinement Using Large Language Models

Causal networks are widely used in many fields to model the complex relationships between variables. A recent approach has sought to construct causal networks by leveraging the wisdom of crowds through the collective participation of humans. While this can yield detailed causal networks that model the underlying phenomena quite well, it requires a large number of individuals with domain understanding. We adopt a different approach: leveraging the causal knowledge that large language models, such as OpenAI's GPT-4, have learned by ingesting massive amounts of literature. Within a dedicated visual analytics interface, called CausalChat, users explore single variables or variable pairs recursively to identify causal relations, latent variables, confounders, and mediators, constructing detailed causal networks through conversation. Each probing interaction is translated into a tailored GPT-4 prompt and the response is conveyed through visual representations which are linked to the generated text for explanations. We demonstrate the functionality of CausalChat across diverse data contexts and conduct user studies involving both domain experts and laypersons.

Updated: 2024-10-18 03:33:32

标题: CausalChat:使用大型语言模型进行交互式因果模型开发和细化

摘要: 因果网络被广泛应用于许多领域,用于建模变量之间复杂的关系。最近的一种方法试图通过人群的集体参与来构建因果网络。虽然这可以产生详细的因果网络,很好地模拟潜在现象,但需要大量具有领域理解力的个人。我们采用了一种不同的方法:利用像OpenAI的GPT-4这样的大型语言模型所学习的因果知识,通过消化大量文献。在一个名为CausalChat的专用视觉分析界面中,用户递归地探索单个变量或变量对,以识别因果关系、潜在变量、混杂因素和中介因素,通过对话构建详细的因果网络。每个探究性互动都被转化为定制的GPT-4提示,并通过与生成的文字解释相关联的可视化表示来传达响应。我们展示了CausalChat在不同数据背景下的功能,并进行了涉及领域专家和普通人的用户研究。

更新时间: 2024-10-18 03:33:32

领域: cs.AI,cs.HC,cs.LG,cs.SI

下载: http://arxiv.org/abs/2410.14146v1

A Lightweight Multi Aspect Controlled Text Generation Solution For Large Language Models

Large language models (LLMs) show remarkable abilities with instruction tuning. However, they fall short on target tasks when high-quality instruction-tuning data for those tasks is lacking. Multi-Aspect Controllable Text Generation (MCTG) is a representative task for this dilemma, where aspect datasets are usually biased and correlated. Existing work exploits additional model structures and strategies for solutions, limiting adaptability to LLMs. To activate the MCTG ability of LLMs, we propose a lightweight MCTG pipeline based on data augmentation. We analyze bias and correlations in traditional datasets, and address these concerns with augmented control attributes and sentences. Augmented datasets are feasible for instruction tuning. In our experiments, LLMs perform better in MCTG after data augmentation, with a 20% accuracy rise and weaker aspect correlations.

Updated: 2024-10-18 03:32:00

标题: 一个轻量级多方面控制的文本生成解决方案,适用于大型语言模型

摘要: 大型语言模型(LLMs)在指导调优方面表现出卓越的能力。然而,当目标任务缺乏高质量的指导调优数据时,它们无法实现理想的任务。多方面可控文本生成(MCTG)是这一困境的代表性任务,其中方面数据集通常存在偏见和相关性。现有工作利用额外的模型结构和策略来解决这个问题,但限制了对LLMs的适应性。为了激活LLMs的MCTG能力,我们提出了一种基于数据增强的轻量级MCTG流水线。我们分析传统数据集中的偏见和相关性,并通过增强的控制属性和句子来解决这些问题。增强的数据集对指导调优是可行的。在我们的实验中,经过数据增强后,LLMs在MCTG中表现更好,准确率提高了20%,并且方面相关性更少。

更新时间: 2024-10-18 03:32:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.14144v1

Granger Causality in Extremes

We introduce a rigorous mathematical framework for Granger causality in extremes, designed to identify causal links from extreme events in time series. Granger causality plays a pivotal role in uncovering directional relationships among time-varying variables. While this notion gains heightened importance during extreme and highly volatile periods, state-of-the-art methods primarily focus on causality within the body of the distribution, often overlooking causal mechanisms that manifest only during extreme events. Our framework is designed to infer causality mainly from extreme events by leveraging the causal tail coefficient. We establish equivalences between causality in extremes and other causal concepts, including (classical) Granger causality, Sims causality, and structural causality. We prove other key properties of Granger causality in extremes and show that the framework is especially helpful under the presence of hidden confounders. We also propose a novel inference method for detecting the presence of Granger causality in extremes from data. Our method is model-free, can handle non-linear and high-dimensional time series, outperforms current state-of-the-art methods in all considered setups, both in performance and speed, and was found to uncover coherent effects when applied to financial and extreme weather observations.

Updated: 2024-10-18 03:31:01

标题: 在极端情况下的格兰杰因果关系

摘要: 我们引入了一个严格的数学框架,用于极端情况下的Granger因果关系,旨在识别时间序列中极端事件的因果关系。Granger因果关系在揭示时间变化变量之间的方向关系方面起着关键作用。虽然这个概念在极端和高度不稳定的时期变得更加重要,但现代方法主要集中在分布的主体内的因果关系,往往忽视了只在极端事件中表现的因果机制。我们的框架旨在通过利用因果尾系数主要从极端事件中推断因果关系。我们建立了在极端情况下因果关系和其他因果概念之间的等价性,包括(经典的)Granger因果关系、Sims因果关系和结构因果关系。我们证明了在极端情况下Granger因果关系的其他关键特性,并表明在存在隐藏混淆因素时,该框架尤其有帮助。我们还提出了一种新颖的推断方法,用于从数据中检测极端情况下的Granger因果关系的存在。我们的方法不受模型限制,可以处理非线性和高维时间序列,在所有考虑的设置中性能和速度均优于当前现代方法,并且在应用于金融和极端天气观测时发现能够揭示一致的效果。

更新时间: 2024-10-18 03:31:01

领域: stat.ML,cs.LG,math.ST,stat.ME,stat.TH,62M10,G.3

下载: http://arxiv.org/abs/2407.09632v2

Preview-based Category Contrastive Learning for Knowledge Distillation

Knowledge distillation is a mainstream algorithm in model compression by transferring knowledge from the larger model (teacher) to the smaller model (student) to improve the performance of the student. Despite many efforts, existing methods mainly investigate the consistency between instance-level feature representations or predictions, which neglects category-level information and the difficulty of each sample, leading to undesirable performance. To address these issues, we propose a novel preview-based category contrastive learning method for knowledge distillation (PCKD). It first distills the structural knowledge of both instance-level feature correspondence and the relation between instance features and category centers in a contrastive learning fashion, which can explicitly optimize the category representation and explore the distinct correlation between representations of instances and categories, contributing to discriminative category centers and better classification results. Besides, we introduce a novel preview strategy to dynamically determine how much the student should learn from each sample according to its difficulty. Different from existing methods that treat all samples equally and curriculum learning that simply filters out hard samples, our method assigns a small weight to hard instances as a preview to better guide student training. Extensive experiments on several challenging datasets, including CIFAR-100 and ImageNet, demonstrate its superiority over state-of-the-art methods.

Updated: 2024-10-18 03:31:00

标题: 基于预览的类别对比学习用于知识蒸馏

摘要: 知识蒸馏是一种在模型压缩中流行的算法,通过将知识从较大的模型(教师)转移到较小的模型(学生)来提高学生的性能。尽管已经付出了许多努力,现有方法主要研究实例级特征表示或预测之间的一致性,忽略了类别级别信息和每个样本的难度,导致性能不佳。为了解决这些问题,我们提出了一种基于预览的类别对比学习方法,用于知识蒸馏(PCKD)。它首先以对比学习的方式提炼实例级特征对应关系和实例特征与类别中心之间的关系的结构知识,可以明确优化类别表示并探索实例和类别表示之间的明显相关性,有助于产生有区分力的类别中心和更好的分类结果。此外,我们引入了一种新颖的预览策略,动态确定学生应该从每个样本中学习多少,根据它们的难度。与将所有样本视为相同并仅简单过滤难样本的现有方法不同,我们的方法为难例分配一个小的权重作为预览,以更好地指导学生训练。在包括CIFAR-100和ImageNet在内的几个具有挑战性的数据集上进行了大量实验,证明了其优于最先进方法的优越性。

更新时间: 2024-10-18 03:31:00

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.14143v1

Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion

Low-quality or scarce data has posed significant challenges for training deep neural networks in practice. While classical data augmentation cannot contribute substantially different new data, diffusion models open up a new door to building self-evolving AI by generating high-quality and diverse synthetic data through text-guided prompts. However, text-only guidance cannot control synthetic images' proximity to the original images, resulting in out-of-distribution data detrimental to the model performance. To overcome this limitation, we study image guidance to achieve a spectrum of interpolations between synthetic and real images. With stronger image guidance, the generated images are similar to the training data but hard to learn. With weaker image guidance, the synthetic images are easier for the model but contribute to a larger distribution gap with the original data. The generated full spectrum of data enables us to build a novel "Diffusion Curriculum (DisCL)". DisCL adjusts the image guidance level of image synthesis for each training stage: it identifies and focuses on hard samples for the model and assesses the most effective guidance level of synthetic images to improve hard-data learning. We apply DisCL to two challenging tasks: long-tail (LT) classification and learning from low-quality data. It focuses on high-quality, lower-guidance images to learn prototypical features as a warm-up for learning higher-guidance images that might be weak in diversity or quality. Extensive experiments showcase a gain of 2.7% and 2.1% in OOD and ID macro-accuracy when applying DisCL to the iWildCam dataset. On ImageNet-LT, DisCL improves the base model's tail-class accuracy from 4.4% to 23.64% and leads to a 4.02% improvement in all-class accuracy.

Updated: 2024-10-18 03:28:38

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.13674v2

ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom

Large vision-language models (LVLMs) have witnessed significant progress on visual understanding tasks. However, they often prioritize language knowledge over image information on visual reasoning tasks, incurring performance degradation. To tackle this issue, we first identify the drawbacks of existing solutions (i.e., insufficient and irrelevant visual descriptions, and limited multi-modal capacities). We then decompose visual reasoning process into two stages: visual perception (i.e., eyesight) and textual reasoning (i.e., wisdom), and introduce a novel visual reasoning framework named ProReason. This framework features multi-run proactive perception and decoupled vision-reasoning capabilities. Briefly, given a multi-modal question, ProReason iterates proactive information collection and reasoning until the answer can be concluded with necessary and sufficient visual descriptions. Notably, the disassociation of capabilities allows seamless integration of existing large language models (LLMs) to compensate for the reasoning deficits of LVLMs. Our extensive experiments demonstrate that ProReason outperforms both existing multi-step reasoning frameworks and passive peer methods on a wide range of benchmarks for both open-source and closed-source models. In addition, with the assistance of LLMs, ProReason achieves a performance improvement of up to 15% on MMMU benchmark. Our insights into existing solutions and the decoupled perspective for feasible integration of LLMs illuminate future research on visual reasoning techniques, especially LLM-assisted ones.

Updated: 2024-10-18 03:22:06

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.14138v1

Timer: Generative Pre-trained Transformers Are Large Time Series Models

Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world data-scarce scenarios, which can be concealed due to the performance saturation with small models on current benchmarks. Meanwhile, large models have demonstrated great powers in these scenarios through large-scale pre-training. Continuous progress has been achieved with the emergence of large language models, exhibiting unprecedented abilities such as few-shot generalization, scalability, and task generality, which are however absent in small deep models. To change the status quo of training scenario-specific small models from scratch, this paper aims at the early development of large time series models (LTSM). During pre-training, we curate large-scale datasets with up to 1 billion time points, unify heterogeneous time series into single-series sequence (S3) format, and develop the GPT-style architecture toward LTSMs. To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task. The outcome of this study is a Time Series Transformer (Timer), which is generative pre-trained by next token prediction and adapted to various downstream tasks with promising capabilities as an LTSM. Code and datasets are available at: https://github.com/thuml/Large-Time-Series-Model.
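
A toy decoder-only sketch (PyTorch) of the generative pre-training recipe described above: a series is split into patch "tokens", a causally masked transformer encodes them, and the objective predicts the next patch. The sizes, patching scheme, and MSE objective are illustrative assumptions, not the released Timer code.

```python
import torch
import torch.nn as nn

class TinyTimer(nn.Module):
    """GPT-style next-patch prediction over a univariate series (sketch)."""
    def __init__(self, patch_len=16, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, patch_len)

    def forward(self, x):                            # x: (B, T)
        B, T = x.shape
        n = T // self.patch_len
        patches = x[:, :n * self.patch_len].reshape(B, n, self.patch_len)
        h = self.embed(patches)
        # Boolean causal mask: True entries are disallowed attention links.
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h = self.blocks(h, mask=causal)
        return self.head(h)                          # (B, n, patch_len)

model = TinyTimer()
x = torch.randn(8, 256)                              # 8 series of length 256
pred = model(x)
target = x.reshape(8, -1, 16)
loss = nn.functional.mse_loss(pred[:, :-1], target[:, 1:])  # next patch
```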

Updated: 2024-10-18 03:19:55

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.02368v3

Experimenting on Markov Decision Processes with Local Treatments

Utilizing randomized experiments to evaluate the effect of short-term treatments on short-term outcomes is well understood and has become the gold standard in industrial practice. However, as service systems become increasingly dynamical and personalized, much focus is shifting toward maximizing long-term cumulative outcomes, such as customer lifetime value, through lifetime exposure to interventions. To bridge this gap, we investigate randomized experiments within dynamical systems modeled as Markov Decision Processes (MDPs). Our goal is to assess the impact of treatment and control policies on long-term cumulative rewards from relatively short-term observations. We first develop optimal inference techniques for assessing the effects of general treatment patterns. Furthermore, recognizing that many real-world treatments tend to be fine-grained and localized for practical efficiency and operational convenience, we then propose methods to harness this localized structure by sharing information on the non-targeted states. Our new estimator effectively overcomes the variance lower bound for general treatments while matching the more stringent lower bound that incorporates the local treatment structure. Furthermore, for a major part of the variance, our estimator optimally achieves a reduction that is linear in the number of test arms. Finally, we explore scenarios with perfect knowledge of the control arm and design estimators that further improve inference efficiency.

Updated: 2024-10-18 03:19:30

Categories: stat.ME,cs.LG,econ.EM,stat.AP,stat.ML

Download: http://arxiv.org/abs/2407.19618v2

$\textbf{Only-IF}$: Revealing the Decisive Effect of Instruction Diversity on Generalization

Understanding and accurately following instructions is critical for large language models (LLMs) to be effective across diverse tasks. In this work, we rigorously examine the key factors that enable models to generalize to unseen instructions, providing insights to guide the collection of data for instruction-tuning. Through controlled experiments, inspired by the Turing-complete Markov algorithm, we demonstrate that such generalization $\textbf{only emerges}$ when training data is diversified enough across semantic domains. Our findings also reveal that merely diversifying within limited domains fails to ensure robust generalization. In contrast, cross-domain data diversification, even under constrained data budgets, significantly enhances a model's adaptability. We further extend our analysis to real-world scenarios, including fine-tuning of $\textit{$\textbf{specialist}$}$ and $\textit{$\textbf{generalist}$}$ models. In both cases, we demonstrate that 1) better performance can be achieved by increasing the diversity of an established dataset while keeping the data size constant, and 2) when scaling up the data, diversifying the semantics of instructions is more effective than simply increasing the quantity of similar data. Our research provides important insights for dataset collation, particularly when optimizing model performance by expanding training data for both specialist and generalist scenarios. We show that careful consideration of data diversification is key: training specialist models with data extending beyond their core domain leads to significant performance improvements, while generalist models benefit from diverse data mixtures that enhance their overall instruction-following capabilities across a wide range of applications. Our results highlight the critical role of strategic diversification and offer clear guidelines for improving data quality.

Updated: 2024-10-18 03:18:50

Categories: cs.CL,cs.AI,cs.LG,cs.SE

Download: http://arxiv.org/abs/2410.04717v3

Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition

Speech recognition is an essential entry point for human-computer interaction, and recently, deep learning models have achieved excellent success in this task. However, when model training and the private data provider are separated, security threats that can make deep neural networks (DNNs) behave abnormally deserve research attention. In recent years, typical backdoor attacks have been studied in speech recognition systems. Existing backdoor methods are based on data poisoning: the attacker adds crafted changes to benign speech spectrograms or alters speech components such as pitch and timbre. As a result, the poisoned data can be detected by human hearing or automatic deep algorithms. To improve the stealthiness of data poisoning, we propose a non-neural and fast algorithm called Random Spectrogram Rhythm Transformation (RSRT) in this paper. The algorithm combines four steps to generate stealthy poisoned utterances. From the perspective of rhythm-component transformation, our proposed trigger stretches or squeezes the mel spectrograms and recovers them back to signals. The operation keeps timbre and content unchanged for good stealthiness. Our experiments are conducted on two kinds of speech recognition tasks, testing the stealthiness of poisoned samples by speaker verification and automatic speech recognition. The results show that our method has excellent effectiveness and stealthiness: the rhythm trigger needs a low poisoning rate and achieves a very high attack success rate.
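
A hedged sketch of a rhythm-only trigger in Python. The paper's RSRT pipeline stretches mel spectrograms and resynthesizes the waveform; the sketch below approximates the effect with librosa's phase-vocoder time stretch, which changes tempo while leaving pitch and timbre intact. The stretch rate and the choice of region are illustrative.

```python
import numpy as np
import librosa

def rhythm_trigger(y, rate=1.12):
    """Stretch the second half of an utterance in time (tempo-only change;
    pitch and timbre are preserved), then trim back to the original length."""
    mid = len(y) // 2
    head, tail = y[:mid], y[mid:]
    tail_stretched = librosa.effects.time_stretch(tail, rate=rate)
    poisoned = np.concatenate([head, tail_stretched])
    return librosa.util.fix_length(poisoned, size=len(y))

sr = 16000
y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)  # toy audio
y_poisoned = rhythm_trigger(y)  # would be paired with the attacker's label
```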

Updated: 2024-10-18 03:17:06

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2406.10932v3

Hierarchical Conditional Multi-Task Learning for Streamflow Modeling

Streamflow, vital for water resource management, is governed by complex hydrological systems involving intermediate processes driven by meteorological forces. While deep learning models have achieved state-of-the-art results of streamflow prediction, their end-to-end single-task learning approach often fails to capture the causal relationships within these systems. To address this, we propose Hierarchical Conditional Multi-Task Learning (HCMTL), a hierarchical approach that jointly models soil water and snowpack processes based on their causal connections to streamflow. HCMTL utilizes task embeddings to connect network modules, enhancing flexibility and expressiveness while capturing unobserved processes beyond soil water and snowpack. It also incorporates the Conditional Mini-Batch strategy to improve long time series modeling. We compare HCMTL with five baselines on a global dataset. HCMTL's superior performance across hundreds of drainage basins over extended periods shows that integrating domain-specific causal knowledge into deep learning enhances both prediction accuracy and interpretability. This is essential for advancing our understanding of complex hydrological systems and supporting efficient water resource management to mitigate natural disasters like droughts and floods.
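
A minimal PyTorch sketch of the hierarchy described above: a shared encoder, task embeddings that condition per-task heads, and a streamflow head that consumes the predicted soil-water and snowpack states. Module names and sizes are our assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class HCMTLSketch(nn.Module):
    """Hierarchy-aware multi-task heads for streamflow (illustrative)."""
    def __init__(self, n_forcings=8, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_forcings, hidden, batch_first=True)
        self.task_emb = nn.Embedding(3, hidden)      # soil, snow, flow
        self.soil_head = nn.Linear(hidden, 1)
        self.snow_head = nn.Linear(hidden, 1)
        # Streamflow head sees shared state plus predicted intermediates,
        # encoding the causal link from soil water / snowpack to flow.
        self.flow_head = nn.Linear(hidden + 2, 1)

    def forward(self, x):                            # x: (B, T, n_forcings)
        h, _ = self.encoder(x)
        soil = self.soil_head(h + self.task_emb.weight[0])
        snow = self.snow_head(h + self.task_emb.weight[1])
        z = torch.cat([h + self.task_emb.weight[2], soil, snow], dim=-1)
        return soil, snow, self.flow_head(z)

model = HCMTLSketch()
soil, snow, flow = model(torch.randn(4, 30, 8))
# Joint loss = weighted sum of per-task losses against observations.
```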

Updated: 2024-10-18 03:14:57

Categories: cs.LG

Download: http://arxiv.org/abs/2410.14137v1

AutoPal: Autonomous Adaptation to Users for Personal AI Companionship

Previous research has demonstrated the potential of AI agents to act as companions that can provide constant emotional support for humans. In this paper, we emphasize the necessity of autonomous adaptation in personal AI companionship, an underexplored yet promising direction. Such adaptability is crucial as it can facilitate more tailored interactions with users and allow the agent to evolve in response to users' changing needs. However, imbuing agents with autonomous adaptability presents unique challenges, including identifying optimal adaptations to meet users' expectations and ensuring a smooth transition during the adaptation process. To address them, we devise a hierarchical framework, AutoPal, that enables controllable and authentic adjustments to the agent's persona based on user interactions. A persona-matching dataset is constructed to facilitate the learning of optimal persona adaptations. Extensive experiments demonstrate the effectiveness of AutoPal and highlight the importance of autonomous adaptability in AI companionship.

Updated: 2024-10-18 03:10:13

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.13960v3

A Data-Adaptive Prior for Bayesian Learning of Kernels in Operators

Kernels are efficient in representing nonlocal dependence, and they are widely used to design operators between function spaces. Thus, learning kernels in operators from data is an inverse problem of general interest. Due to the nonlocal dependence, the inverse problem can be severely ill-posed with a data-dependent singular inversion operator. The Bayesian approach overcomes the ill-posedness through a non-degenerate prior. However, a fixed non-degenerate prior leads to a divergent posterior mean when the observation noise becomes small, if the data induces a perturbation in the eigenspace of zero eigenvalues of the inversion operator. We introduce a data-adaptive prior to achieve a stable posterior whose mean always has a small-noise limit. The data-adaptive prior's covariance is the inversion operator, with a hyper-parameter selected adaptively to the data by the L-curve method. Furthermore, we provide a detailed analysis of the computational practice of the data-adaptive prior, and demonstrate it on Toeplitz matrices and integral operators. Numerical tests show that a fixed prior can lead to a divergent posterior mean in the presence of any of four types of errors: discretization error, model error, partial observation, and a wrong noise assumption. In contrast, the data-adaptive prior always attains posterior means with small-noise limits.
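
Since the hyper-parameter selection is the practical crux, here is a generic NumPy sketch of L-curve selection for a Tikhonov-style posterior mean. It uses an identity prior covariance for simplicity, whereas the paper's data-adaptive prior uses the inversion operator itself; the curvature-corner rule below is the standard L-curve criterion.

```python
import numpy as np

def l_curve_lambda(A, b, lambdas):
    """Pick the regularization strength at the L-curve corner (sketch)."""
    res, sol = [], []
    for lam in lambdas:
        x = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
        res.append(np.log(np.linalg.norm(A @ x - b) + 1e-12))
        sol.append(np.log(np.linalg.norm(x) + 1e-12))
    res, sol, t = np.array(res), np.array(sol), np.log(lambdas)
    # Curvature of the parametric curve (log residual, log solution norm).
    dr, ds = np.gradient(res, t), np.gradient(sol, t)
    d2r, d2s = np.gradient(dr, t), np.gradient(ds, t)
    kappa = (dr * d2s - ds * d2r) / (dr**2 + ds**2) ** 1.5
    return lambdas[int(np.nanargmax(kappa))]

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 30))
x_true = rng.normal(size=30)
b = A @ x_true + 0.05 * rng.normal(size=50)
lam = l_curve_lambda(A, b, np.logspace(-8, 2, 60))
```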

Updated: 2024-10-18 03:06:45

Categories: stat.ML,cs.LG,stat.CO,62F15, 47A52, 47B32

Download: http://arxiv.org/abs/2212.14163v2

MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Low-Rank Adaptation (LoRA) drives research to align its performance with full fine-tuning. However, significant challenges remain: (1) simply increasing the rank size of LoRA does not effectively capture high-rank information, which leads to a performance bottleneck; (2) MoE-style LoRA methods substantially increase parameters and inference latency, contradicting the goals of efficient fine-tuning and ease of application. To address these challenges, we introduce Mixture of Ranks (MoR), which learns rank-specific information for different tasks based on input and efficiently integrates multi-rank information. We first propose a new framework that equates the integration of multiple LoRAs to expanding the rank of LoRA. Moreover, we hypothesize that low-rank LoRA already captures sufficient intrinsic information, and that MoR can derive high-rank information through mathematical transformations of the low-rank components. Thus, MoR can reduce the learning difficulty of LoRA and enhance its multi-task capabilities. MoR achieves impressive results, delivering a 1.31% performance improvement while using only 93.93% of the parameters of baseline methods.
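
A hedged PyTorch sketch of the mixture idea: rank-1 components drawn from shared, learnable codebooks are mixed with input-dependent weights on top of a frozen linear layer. The router and initialization choices are our illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MoRLinear(nn.Module):
    """Mixture-of-ranks adapter over a frozen base layer (sketch)."""
    def __init__(self, d_in, d_out, n_codes=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)       # frozen backbone weight
        self.A = nn.Parameter(torch.randn(n_codes, d_in) * 0.01)  # down-proj codes
        self.B = nn.Parameter(torch.zeros(n_codes, d_out))        # up-proj codes
        self.router = nn.Linear(d_in, n_codes)       # adaptive mixing weights

    def forward(self, x):                            # x: (B, d_in)
        w = torch.softmax(self.router(x), dim=-1)    # (B, n_codes)
        # Weighted sum of shared rank-1 updates: sum_k w_k * (a_k . x) * b_k
        z = x @ self.A.t()                           # (B, n_codes)
        delta = (w * z) @ self.B                     # (B, d_out)
        return self.base(x) + delta

layer = MoRLinear(16, 32)
out = layer(torch.randn(4, 16))
```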

Updated: 2024-10-18 03:05:01

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.13408v2

UniAutoML: A Human-Centered Framework for Unified Discriminative and Generative AutoML with Large Language Models

Automated Machine Learning (AutoML) has simplified complex ML processes such as data pre-processing, model selection, and hyper-parameter searching. However, traditional AutoML frameworks focus solely on discriminative tasks, often falling short in tackling AutoML for generative models. Additionally, these frameworks lack interpretability and user engagement during the training process, primarily due to the absence of human-centered design. This leads to a lack of transparency in final decision-making and limited user control, potentially reducing trust in and adoption of AutoML methods. To address these limitations, we introduce UniAutoML, a human-centered AutoML framework that leverages Large Language Models (LLMs) to unify AutoML for both discriminative tasks (e.g., Transformers and CNNs for classification or regression) and generative tasks (e.g., fine-tuning diffusion models or LLMs). The human-centered design of UniAutoML innovatively features a conversational user interface (CUI) that facilitates natural language interactions, providing users with real-time guidance, feedback, and progress updates for better interpretability. This design enhances transparency and user control throughout the AutoML training process, allowing users to seamlessly break down or modify the model being trained. To mitigate potential risks associated with LLM-generated content, UniAutoML incorporates a safety guardrail that filters inputs and censors outputs. We evaluated UniAutoML's performance and usability through experiments on eight diverse datasets and user studies involving 25 participants, demonstrating that UniAutoML not only enhances performance but also improves user control and trust. Our human-centered design bridges the gap between AutoML capabilities and user understanding, making ML more accessible to a broader audience.

Updated: 2024-10-18 03:03:01

Categories: cs.CL,cs.AI,cs.HC,cs.LG

Download: http://arxiv.org/abs/2410.12841v2

Inverse Reinforcement Learning from Non-Stationary Learning Agents

In this paper, we study an inverse reinforcement learning problem that involves learning the reward function of a learning agent using trajectory data collected while this agent is learning its optimal policy. To address this problem, we propose an inverse reinforcement learning method that allows us to estimate the policy parameters of the learning agent, which can then be used to estimate its reward function. Our method relies on a new variant of the behavior cloning algorithm, which we call bundle behavior cloning, and uses a small number of trajectories generated by the learning agent's policy at different points in time to learn a set of policies that match the distribution of actions observed in the sampled trajectories. We then use the cloned policies to train a neural network model that estimates the reward function of the learning agent. We provide a theoretical analysis establishing a complexity bound for our method that improves on standard behavior cloning, along with numerical experiments on a reinforcement learning problem that validate the proposed method.
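
A rough Python sketch of the bundling step, under our reading of the abstract: the learner's time-ordered trajectories are split into temporal bundles, and one behavior clone is fit per bundle, so each clone matches the (non-stationary) policy of its time window. Network sizes, bundle sizes, and training details are placeholders.

```python
import torch
import torch.nn as nn

def clone_policy(obs, act, obs_dim, n_actions, steps=200):
    """Fit one behavior clone to (obs, act) pairs from a single bundle."""
    policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                           nn.Linear(64, n_actions))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(policy(obs), act)
        opt.zero_grad(); loss.backward(); opt.step()
    return policy

def bundle_behavior_cloning(trajs, n_bundles, obs_dim, n_actions):
    """trajs: time-ordered list of (obs, action) tensors from the learner."""
    size = max(1, len(trajs) // n_bundles)
    policies = []
    for i in range(0, len(trajs), size):
        obs = torch.cat([o for o, _ in trajs[i:i + size]])
        act = torch.cat([a for _, a in trajs[i:i + size]])
        policies.append(clone_policy(obs, act, obs_dim, n_actions))
    return policies  # inputs to the downstream reward-network fit
```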

Updated: 2024-10-18 03:02:44

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.14135v1

DomainLynx: Leveraging Large Language Models for Enhanced Domain Squatting Detection

Domain squatting poses a significant threat to Internet security, with attackers employing increasingly sophisticated techniques. This study introduces DomainLynx, an innovative compound AI system leveraging Large Language Models (LLMs) for enhanced domain squatting detection. Unlike existing methods focusing on predefined patterns for top-ranked domains, DomainLynx excels in identifying novel squatting techniques and protecting less prominent brands. The system's architecture integrates advanced data processing, intelligent domain pairing, and LLM-powered threat assessment. Crucially, DomainLynx incorporates specialized components that mitigate LLM hallucinations, ensuring reliable and context-aware detection. This approach enables efficient analysis of vast security data from diverse sources, including Certificate Transparency logs, Passive DNS records, and zone files. Evaluated on a curated dataset of 1,649 squatting domains, DomainLynx achieved 94.7% accuracy using Llama-3-70B. In a month-long real-world test, it detected 34,359 squatting domains from 2.09 million new domains, outperforming baseline methods by 2.5 times. This research advances Internet security by providing a versatile, accurate, and adaptable tool for combating evolving domain squatting threats. DomainLynx's approach paves the way for more robust, AI-driven cybersecurity solutions, enhancing protection for a broader range of online entities and contributing to a safer digital ecosystem.

Updated: 2024-10-18 03:01:03

Categories: cs.CR

Download: http://arxiv.org/abs/2410.02095v2

Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression

We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and statistically optimal under mild assumptions. Further, we provide the first vanishing-regret guarantees for CBwLC (or CBwK) that extend beyond the stochastic environment. We side-step strong impossibility results from prior work by identifying a weaker (and, arguably, fairer) benchmark to compare against. Our algorithm builds on LagrangeBwK (Immorlica et al., FOCS 2019), a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques.
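
A small NumPy sketch of how the two modules compose: a regression oracle supplies per-action reward and cost estimates, a Lagrangian dual variable turns them into one score per action, and the SquareCB rule converts scores into sampling probabilities. The single-constraint setup and the dual step size are illustrative assumptions.

```python
import numpy as np

def squarecb_probs(scores, gamma):
    """SquareCB rule: map regression estimates to an action distribution."""
    K = len(scores)
    best = int(np.argmax(scores))
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (scores[best] - scores[a]))
    p[best] = 1.0 - p.sum()          # remaining mass on the greedy action
    return p

def lagrangian_scores(reward_hat, cost_hat, lam):
    """Modular Lagrangian step: reward minus dual-weighted consumption."""
    return reward_hat - lam * cost_hat

# One illustrative round with a single packing constraint.
reward_hat = np.array([0.6, 0.4, 0.7])
cost_hat = np.array([0.9, 0.1, 0.5])
lam = 0.8
p = squarecb_probs(lagrangian_scores(reward_hat, cost_hat, lam), gamma=10.0)
a = np.random.choice(len(p), p=p)
# After observing the realized cost, the dual variable is updated, e.g.
# lam = max(0.0, lam + eta * (realized_cost - budget_per_round)).
```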

Updated: 2024-10-18 03:00:10

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2211.07484v7

DomainDynamics: Lifecycle-Aware Risk Timeline Construction for Domain Names

The persistent threat posed by malicious domain names in cyber-attacks underscores the urgent need for effective detection mechanisms. Traditional machine learning methods, while capable of identifying such domains, often suffer from high false positive and false negative rates due to their extensive reliance on historical data. Conventional approaches often overlook the dynamic nature of domain names, the purposes and ownership of which may evolve, potentially rendering risk assessments outdated or irrelevant. To address these shortcomings, we introduce DomainDynamics, a novel system designed to predict domain name risks by considering their lifecycle stages. DomainDynamics constructs a timeline for each domain, evaluating the characteristics of each domain at various points in time to make informed, temporal risk determinations. In an evaluation experiment involving over 85,000 actual malicious domains from malware and phishing incidents, DomainDynamics demonstrated a significant improvement in detection rates, achieving an 82.58\% detection rate with a low false positive rate of 0.41\%. This performance surpasses that of previous studies and commercial services, improving detection capability substantially.

Updated: 2024-10-18 02:59:13

Categories: cs.CR

Download: http://arxiv.org/abs/2410.02096v2

Deep Learning Applications in Medical Image Analysis: Advancements, Challenges, and Future Directions

Medical image analysis has emerged as an essential element of contemporary healthcare, facilitating physicians in achieving expedited and precise diagnosis. Recent breakthroughs in deep learning, a subset of artificial intelligence, have markedly revolutionized the analysis of medical images, improving the accuracy and efficiency of clinical procedures. Deep learning algorithms, especially convolutional neural networks (CNNs), have demonstrated remarkable proficiency in autonomously learning features from multidimensional medical images, including MRI, CT, and X-ray scans, without the necessity for manual feature extraction. These models have been utilized across multiple medical disciplines, including pathology, radiology, ophthalmology, and cardiology, where they aid in disease detection, classification, and segmentation tasks......

Updated: 2024-10-18 02:57:14

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2410.14131v1

Disentangling Heterogeneous Knowledge Concept Embedding for Cognitive Diagnosis on Untested Knowledge

Cognitive diagnosis is a fundamental and critical task in learning assessment, which aims to infer students' proficiency on knowledge concepts from their response logs. Current works assume each knowledge concept will certainly be tested and covered by multiple exercises. However, whether for online or offline courses, it is hardly feasible to completely cover all knowledge concepts in several exercises. Restricted tests lead to undiscovered knowledge deficits, especially untested knowledge concepts (UKCs). In this paper, we propose a novel framework for cognitive diagnosis on untested knowledge called Disentangling Heterogeneous Knowledge Cognitive Diagnosis (DisKCD). Specifically, we leverage course grades, exercise questions, and learning resources to learn the potential representations of students, exercises, and knowledge concepts. In particular, knowledge concepts are disentangled into tested and untested based on the limited set of actual exercises. We construct a heterogeneous relation graph network via students, exercises, tested knowledge concepts (TKCs), and UKCs. Then, through a hierarchical heterogeneous message-passing mechanism, the fine-grained relations are incorporated into the embeddings of the entities. Finally, the embeddings are applied to multiple existing cognitive diagnosis models to infer students' proficiency on UKCs. Experimental results on real-world datasets show that the proposed model can effectively improve the performance of the task of diagnosing students' proficiency on UKCs. Our code is available at https://github.com/Hubuers/DisKCD.

Updated: 2024-10-18 02:57:01

Categories: cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2405.16003v2

DomainHarvester: Harvesting Infrequently Visited Yet Trustworthy Domain Names

In cybersecurity, allow lists play a crucial role in distinguishing safe websites from potential threats. Conventional methods for compiling allow lists, focusing heavily on website popularity, often overlook infrequently visited legitimate domains. This paper introduces DomainHarvester, a system aimed at generating allow lists that include trustworthy yet infrequently visited domains. By adopting an innovative bottom-up methodology that leverages the web's hyperlink structure, DomainHarvester identifies legitimate yet underrepresented domains. The system uses seed URLs to gather domain names, employing machine learning with a Transformer-based approach to assess their trustworthiness. DomainHarvester has developed two distinct allow lists: one with a global focus and another emphasizing local relevance. Compared to six existing top lists, DomainHarvester's allow lists show minimal overlaps, 4% globally and 0.1% locally, while significantly reducing the risk of including malicious domains, thereby enhancing security. The contributions of this research are substantial, illuminating the overlooked aspect of trustworthy yet underrepresented domains and introducing DomainHarvester, a system that goes beyond traditional popularity-based metrics. Our methodology enhances the inclusivity and precision of allow lists, offering significant advantages to users and businesses worldwide, especially in non-English speaking regions.

Updated: 2024-10-18 02:56:54

Categories: cs.CR

Download: http://arxiv.org/abs/2410.02097v2

ACCEPT: Adaptive Codebook for Composite and Efficient Prompt Tuning

Prompt Tuning has been a popular Parameter-Efficient Fine-Tuning method owing to its remarkable performance with few updated parameters on various large-scale pretrained Language Models (PLMs). Traditionally, each prompt has been considered indivisible and updated independently, causing the parameter count to grow proportionally as prompt length increases. To address this issue, we propose Adaptive Codebook for Composite and Efficient Prompt Tuning (ACCEPT). In our method, we refer to the concept of product quantization (PQ), allowing all soft prompts to share a set of learnable codebook vectors in each subspace, with each prompt differentiated by a set of adaptive weights. We achieve superior performance on 17 diverse natural language tasks, including natural language understanding (NLU) and question answering (QA), by tuning only 0.3% of the parameters of the PLMs. Our approach also excels in few-shot and large-model settings, highlighting its significant potential.
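
A hedged PyTorch sketch of the product-quantization view of soft prompts: the embedding dimension is split into subspaces, and each subspace of each prompt token is a learned convex combination of shared codebook vectors. All sizes are illustrative.

```python
import torch
import torch.nn as nn

class PQPrompt(nn.Module):
    """Codebook-composed soft prompts (sketch of the PQ idea)."""
    def __init__(self, n_tokens=20, d_model=768, n_sub=4, n_codes=16):
        super().__init__()
        assert d_model % n_sub == 0
        d_sub = d_model // n_sub
        # Shared codebooks: one set of n_codes vectors per subspace.
        self.codebooks = nn.Parameter(torch.randn(n_sub, n_codes, d_sub) * 0.02)
        # Adaptive per-token, per-subspace mixing weights.
        self.logits = nn.Parameter(torch.zeros(n_tokens, n_sub, n_codes))

    def forward(self):
        w = torch.softmax(self.logits, dim=-1)             # (T, S, C)
        parts = torch.einsum("tsc,scd->tsd", w, self.codebooks)
        return parts.reshape(parts.shape[0], -1)           # (T, d_model)

module = PQPrompt()
prompt_embeds = module()   # (20, 768): prepend to the frozen PLM's inputs
```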

Updated: 2024-10-18 02:56:32

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.12847v2

On Subjective Uncertainty Quantification and Calibration in Natural Language Generation

Applications of large language models often involve the generation of free-form responses, in which case uncertainty quantification becomes challenging. This is due to the need to identify task-specific uncertainties (e.g., about the semantics) which appears difficult to define in general cases. This work addresses these challenges from a perspective of Bayesian decision theory, starting from the assumption that our utility is characterized by a similarity measure that compares a generated response with a hypothetical true response. We discuss how this assumption enables principled quantification of the model's subjective uncertainty and its calibration. We further derive a measure for epistemic uncertainty, based on a missing data perspective and its characterization as an excess risk. The proposed methods can be applied to black-box language models. We illustrate the methods on question answering and machine translation tasks. Our experiments provide a principled evaluation of task-specific calibration, and demonstrate that epistemic uncertainty offers a promising deferral strategy for efficient data acquisition in in-context learning.
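
A toy sketch of the similarity-based view: sample several free-form responses and treat the average pairwise similarity as the model's subjective confidence in its own predictive distribution. Token-overlap F1 stands in for the task-specific utility (e.g., ROUGE or semantic similarity); the Monte-Carlo estimator is our simplification.

```python
from itertools import combinations

def token_f1(a, b):
    """Simple similarity between two responses (stand-in for the utility)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    p, r = len(ta & tb) / len(ta), len(ta & tb) / len(tb)
    return 2 * p * r / (p + r) if p + r else 0.0

def subjective_confidence(samples):
    """Average pairwise similarity among sampled responses: higher means
    the model is subjectively more certain about its answer."""
    pairs = list(combinations(range(len(samples)), 2))
    sims = [token_f1(samples[i], samples[j]) for i, j in pairs]
    return sum(sims) / len(sims) if sims else 1.0

samples = ["paris is the capital", "the capital is paris", "it is lyon"]
print(subjective_confidence(samples))  # low value -> defer / acquire data
```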

Updated: 2024-10-18 02:55:27

Categories: cs.CL,cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.05213v2

An Evolved Universal Transformer Memory

Prior methods propose to offset the escalating costs of modern foundation models by dropping specific parts of their contexts with hand-designed rules, while attempting to preserve their original performance. We overcome this trade-off with Neural Attention Memory Models (NAMMs), introducing a learned network for memory management that improves both the performance and efficiency of transformers. We evolve NAMMs atop pre-trained transformers to provide different latent contexts focusing on the most relevant information for individual layers and attention heads. NAMMs are universally applicable to any model using self-attention as they condition exclusively on the values in the produced attention matrices. Learning NAMMs on a small set of problems, we achieve substantial performance improvements across multiple long-context benchmarks while cutting the model's input contexts up to a fraction of the original sizes. We show the generality of our conditioning enables zero-shot transfer of NAMMs trained only on language to entirely new transformer architectures even across input modalities, with their benefits carrying over to vision and reinforcement learning.
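
A rough sketch of attention-conditioned memory management: score each cached token from the values in recent attention matrices and evict the lowest-scored entries. NAMMs learn this scoring network (via evolution); the fixed attention-sum score and keep ratio below are placeholder assumptions.

```python
import torch

def prune_kv_cache(keys, values, attn, keep_ratio=0.5):
    """Keep the cached tokens that received the most attention (sketch).

    keys, values: (heads, cached_tokens, d); attn: (heads, queries,
    cached_tokens) softmax weights from the latest attention window.
    """
    scores = attn.sum(dim=(0, 1))                 # (cached_tokens,)
    k = max(1, int(keep_ratio * scores.numel()))
    keep = torch.topk(scores, k).indices.sort().values  # preserve order
    return keys[:, keep], values[:, keep]

heads, q, t, d = 4, 8, 32, 16
keys, values = torch.randn(heads, t, d), torch.randn(heads, t, d)
attn = torch.softmax(torch.randn(heads, q, t), dim=-1)
keys, values = prune_kv_cache(keys, values, attn)  # halved context length
```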

Updated: 2024-10-18 02:53:14

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.13166v2

Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data

The ability to generate sentiment-controlled feedback in response to multimodal inputs comprising text and images addresses a critical gap in human-computer interaction. This capability allows systems to provide empathetic, accurate, and engaging responses, with useful applications in education, healthcare, marketing, and customer service. To this end, we have constructed a large-scale Controllable Multimodal Feedback Synthesis (CMFeed) dataset and propose a controllable feedback synthesis system. The system features an encoder, decoder, and controllability block for textual and visual inputs. It extracts features using a transformer and Faster R-CNN networks, combining them to generate feedback. The CMFeed dataset includes images, texts, reactions to the posts, human comments with relevance scores, and reactions to these comments. These reactions train the model to produce feedback with specified sentiments, achieving a sentiment classification accuracy of 77.23%, which is 18.82% higher than the accuracy without controllability. The system also incorporates a similarity module for assessing feedback relevance through rank-based metrics and an interpretability technique to analyze the contributions of textual and visual features during feedback generation. Access to the CMFeed dataset and the system's code is available at https://github.com/MIntelligence-Group/CMFeed.

Updated: 2024-10-18 02:50:53

Categories: cs.MM,cs.AI

Download: http://arxiv.org/abs/2402.07640v3

CPT: Competence-progressive Training Strategy for Few-shot Node Classification

Graph Neural Networks (GNNs) have made significant advancements in node classification, but their success relies on sufficient labeled nodes per class in the training data. Real-world graph data often exhibits a long-tail distribution with sparse labels, emphasizing the importance of GNNs' ability in few-shot node classification, which entails categorizing nodes with limited data. Traditional episodic meta-learning approaches have shown promise in this domain, but they face an inherent limitation: it might lead the model to converge to suboptimal solutions because of random and uniform task assignment, ignoring task difficulty levels. This could lead the meta-learner to face complex tasks too soon, hindering proper learning. Ideally, the meta-learner should start with simple concepts and advance to more complex ones, like human learning. So, we introduce CPT, a novel two-stage curriculum learning method that aligns task difficulty with the meta-learner's progressive competence, enhancing overall performance. Specifically, in CPT's initial stage, the focus is on simpler tasks, fostering foundational skills for engaging with complex tasks later. Importantly, the second stage dynamically adjusts task difficulty based on the meta-learner's growing competence, aiming for optimal knowledge acquisition. Extensive experiments on popular node classification datasets demonstrate significant improvements of our strategy over existing methods.
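
A minimal sketch of competence-progressive task sampling in Python: task difficulty is matched to the meta-learner's current competence instead of being drawn uniformly. The Gaussian weighting, linear competence schedule, and difficulty estimates are illustrative assumptions, not the paper's exact schedule.

```python
import math
import random

def sample_task(tasks, difficulties, competence, width=0.1):
    """Prefer tasks whose difficulty is near the current competence
    (difficulty and competence both live in [0, 1] here)."""
    weights = [math.exp(-((d - competence) ** 2) / (2 * width ** 2))
               for d in difficulties]
    return random.choices(tasks, weights=weights, k=1)[0]

tasks = list(range(100))
difficulties = [i / 99 for i in tasks]   # e.g., estimated from a pre-pass
for step in range(1, 1001):
    # Stage 1 keeps competence low (simple tasks build foundations);
    # stage 2 grows it with measured performance -- linear is a placeholder.
    competence = min(1.0, 0.1 + 0.9 * step / 1000)
    task = sample_task(tasks, difficulties, competence)
    # ... run one episodic meta-training step on `task` ...
```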

Updated: 2024-10-18 02:45:18

Categories: cs.LG,I.2.6

Download: http://arxiv.org/abs/2402.00450v3

Estimating the Causal Effects of T Cell Receptors

A central question in human immunology is how a patient's repertoire of T cells impacts disease. Here, we introduce a method to infer the causal effects of T cell receptor (TCR) sequences on patient outcomes using observational TCR repertoire sequencing data and clinical outcomes data. Our approach corrects for unobserved confounders, such as a patient's environment and life history, by using the patient's immature, pre-selection TCR repertoire. The pre-selection repertoire can be estimated from nonproductive TCR data, which is widely available. It is generated by a randomized mutational process, V(D)J recombination, which provides a natural experiment. We show formally how to use the pre-selection repertoire to draw causal inferences, and develop a scalable neural-network estimator for our identification formula. Our method produces an estimate of the effect of interventions that add a specific TCR sequence to patient repertoires. As a demonstration, we use it to analyze the effects of TCRs on COVID-19 severity, uncovering potentially therapeutic TCRs that are (1) observed in patients, (2) bind SARS-CoV-2 antigens in vitro and (3) have strong positive effects on clinical outcomes.

Updated: 2024-10-18 02:45:14

Categories: stat.ML,cs.LG,q-bio.GN

Download: http://arxiv.org/abs/2410.14127v1

TimeSeriesExam: A time series understanding exam

Large Language Models (LLMs) have recently demonstrated a remarkable ability to model time series data. These capabilities can be partly explained if LLMs understand basic time series concepts. However, our knowledge of what these models understand about time series data remains relatively limited. To address this gap, we introduce TimeSeriesExam, a configurable and scalable multiple-choice question exam designed to assess LLMs across five core time series understanding categories: pattern recognition, noise understanding, similarity analysis, anomaly detection, and causality analysis. TimeSeriesExam comprises over 700 questions, procedurally generated using 104 carefully curated templates and iteratively refined to balance difficulty and their ability to discriminate good from bad models. We test 7 state-of-the-art LLMs on the TimeSeriesExam and provide the first comprehensive evaluation of their time series understanding abilities. Our results suggest that closed-source models such as GPT-4 and Gemini understand simple time series concepts significantly better than their open-source counterparts, while all models struggle with complex concepts such as causality analysis. We believe that the ability to programmatically generate questions is fundamental to assessing and improving LLMs' ability to understand and reason about time series data.

Updated: 2024-10-18 02:37:14

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.14752v1

JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework

Despite advancements in enhancing LLM safety against jailbreak attacks, evaluating LLM defenses remains a challenge, with current methods often lacking explainability and generalization to complex scenarios, leading to incomplete assessments (e.g., direct judgment without reasoning, low F1 score of GPT-4 in complex cases, bias in multilingual scenarios). To address this, we present JAILJUDGE, a comprehensive benchmark featuring diverse risk scenarios, including synthetic, adversarial, in-the-wild, and multilingual prompts, along with high-quality human-annotated datasets. The JAILJUDGE dataset includes 35k+ instruction-tuning examples with reasoning explainability, JAILJUDGETEST, a 4.5k+ labeled set for risk scenarios, and a 6k+ multilingual set across ten languages. To enhance evaluation with explicit reasoning, we propose the JailJudge MultiAgent framework, which enables explainable, fine-grained scoring (1 to 10). This framework supports the construction of instruction-tuning ground truth and facilitates the development of JAILJUDGE Guard, an end-to-end judge model that provides reasoning and eliminates API costs. Additionally, we introduce JailBoost, an attacker-agnostic attack enhancer, and GuardShield, a moderation defense, both leveraging JAILJUDGE Guard. Our experiments demonstrate the state-of-the-art performance of JailJudge methods (JailJudge MultiAgent, JAILJUDGE Guard) across diverse models (e.g., GPT-4, Llama-Guard) and zero-shot scenarios. JailBoost and GuardShield significantly improve jailbreak attack and defense tasks under zero-shot settings, with JailBoost enhancing performance by 29.24% and GuardShield reducing defense ASR from 40.46% to 0.15%.

Updated: 2024-10-18 02:35:22

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2410.12855v2

Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation

Recent advancements in Automatic Piano Transcription (APT) have significantly improved system performance, but the impact of noisy environments on that performance remains largely unexplored. This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models and evaluates the performance of the Onsets and Frames model when trained on noise-augmented data. We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
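
The augmentation itself is only a few lines; here is a NumPy sketch of mixing white noise into a signal at a target SNR. The SNR sweep is the kind of setting the study describes, and the sine wave is a stand-in for real piano audio.

```python
import numpy as np

def add_white_noise(signal, snr_db):
    """Mix zero-mean white Gaussian noise into `signal` at `snr_db` dB."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))   # SNR = 10 log10(Ps/Pn)
    noise = np.random.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

x = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))   # dummy audio
augmented = [add_white_noise(x, snr) for snr in (20, 10, 0)]
```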

Updated: 2024-10-18 02:31:36

Categories: cs.SD,cs.AI,cs.IR,cs.LG,eess.AS

Download: http://arxiv.org/abs/2410.14122v1

FedMSE: Federated learning for IoT network intrusion detection

This paper proposes a novel federated learning approach for improving IoT network intrusion detection. The rise of IoT has expanded the cyber attack surface, making traditional centralized machine learning methods insufficient due to concerns about data availability, computational resources, transfer costs, and especially privacy preservation. A semi-supervised federated learning model was developed to overcome these issues, combining the Shrink Autoencoder and Centroid one-class classifier (SAE-CEN). This approach enhances the performance of intrusion detection by effectively representing normal network data and accurately identifying anomalies in the decentralized strategy. Additionally, a mean square error-based aggregation algorithm (MSEAvg) was introduced to improve global model performance by prioritizing more accurate local models. Experiments in various settings based on the N-BaIoT dataset and a Dirichlet distribution demonstrate significant improvements in real-world heterogeneous IoT networks: detection accuracy rises from 93.98±2.90 to 97.30±0.49, learning costs are reduced when only 50% of gateways participate in training, and the approach remains robust in large-scale networks.
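
A tiny NumPy sketch of the aggregation idea: local models with lower validation MSE receive larger aggregation weights than in plain FedAvg. The inverse-MSE weighting is our assumption of one natural instantiation; the paper defines the exact rule.

```python
import numpy as np

def mse_avg(local_weights, local_mses):
    """Aggregate flattened local model parameters, weighting each gateway
    inversely to its validation MSE (more accurate -> more influence)."""
    inv = np.array([1.0 / (m + 1e-12) for m in local_mses])
    alphas = inv / inv.sum()
    return sum(a * w for a, w in zip(alphas, local_weights))

# Three gateways: flattened SAE parameters and their validation MSEs.
locals_ = [np.random.randn(10) for _ in range(3)]
mses = [0.8, 0.2, 0.4]
global_w = mse_avg(locals_, mses)   # dominated by the 0.2-MSE gateway
```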

Updated: 2024-10-18 02:23:57

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.14121v1

Discrete Messages Improve Communication Efficiency among Isolated Intelligent Agents

Individuals, despite having varied life experiences and learning processes, can communicate effectively through languages. This study aims to explore the efficiency of language as a communication medium. We put forth two specific hypotheses: first, discrete messages are more effective than continuous ones when agents have diverse personal experiences; second, communications using multiple discrete tokens are more advantageous than those using a single token. To validate these hypotheses, we designed multi-agent machine learning experiments to assess communication efficiency using various information transmission methods between speakers and listeners. Our empirical findings indicate that, in scenarios where agents are exposed to different data, communicating through sentences composed of discrete tokens offers the best inter-agent communication efficiency. The limitations of our findings include the lack of systematic advantages over more sophisticated encoder-decoder models such as variational autoencoders and the lack of evaluation on non-image datasets, which we leave for future studies.

Updated: 2024-10-18 02:22:19

Categories: cs.LG,cs.IT,math.IT

Download: http://arxiv.org/abs/2312.15985v3

Residual-INR: Communication Efficient On-Device Learning Using Implicit Neural Representation

Edge computing is a distributed computing paradigm that collects and processes data at or near the source of data generation. On-device learning at the edge relies on device-to-device wireless communication to facilitate real-time data sharing and collaborative decision-making among multiple devices. This significantly improves the adaptability of the edge computing system to changing environments. However, as the scale of the edge computing system grows, communication among devices is becoming the bottleneck because the limited bandwidth of wireless communication leads to large data transfer latency. To reduce the amount of device-to-device data transmission and accelerate on-device learning, in this paper we propose Residual-INR, a fog computing-based, communication-efficient on-device learning framework that utilizes implicit neural representation (INR) to compress images/videos into neural network weights. Residual-INR enhances data transfer efficiency by collecting JPEG images from edge devices, compressing them into INR format at the fog node, and redistributing them for on-device learning. By using a smaller INR for full-image encoding and a separate object INR for high-quality object-region reconstruction through residual encoding, our technique can reduce encoding redundancy while maintaining object quality. Residual-INR is a promising solution for edge on-device learning because it reduces data transmission by up to 5.16x across a network of 10 edge devices. It also facilitates CPU-free accelerated on-device learning, achieving up to 2.9x speedup without sacrificing accuracy. Our code is available at: https://github.com/sharclab/Residual-INR.
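
To make the compression format concrete, here is a minimal PyTorch sketch of fitting a coordinate-MLP INR to a single image, so that network weights rather than pixels are what gets transmitted. A ReLU MLP is used for brevity (sine-activation SIREN-style networks are common in practice), and a second, smaller INR would be fit to the object-region residual as the abstract describes.

```python
import torch
import torch.nn as nn

def fit_inr(image, steps=500, hidden=64):
    """Fit an implicit neural representation to one grayscale image:
    the MLP maps (y, x) coordinates in [-1, 1]^2 to pixel intensity."""
    H, W = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2)
    target = image.reshape(-1, 1)
    net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                        nn.Linear(hidden, hidden), nn.ReLU(),
                        nn.Linear(hidden, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.mse_loss(net(coords), target)
        opt.zero_grad(); loss.backward(); opt.step()
    return net                     # ship these weights instead of pixels

inr = fit_inr(torch.rand(32, 32))  # toy 32x32 image
```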

Updated: 2024-10-18 02:15:51

标题: 残差-INR:使用隐式神经表示实现通信高效的设备端学习

摘要: 边缘计算是一种分布式计算范式,它在数据产生的源头或附近收集和处理数据。边缘设备上的学习依赖于设备之间的无线通信,以促进多个设备之间的实时数据共享和协作决策。这显著提高了边缘计算系统对不断变化环境的适应性。然而,随着边缘计算系统规模的扩大,由于有限的无线通信带宽,设备之间的通信成为瓶颈,导致数据传输延迟较大。为了减少设备间数据传输量并加速设备上的学习,在本文中,我们提出了Residual-INR,一种基于雾计算的通信高效的设备上学习框架,利用隐式神经表示(INR)将图像/视频压缩成神经网络权重。Residual-INR通过从边缘设备收集JPEG图像,在雾节点将其压缩成INR格式,并重新分发它们进行设备上学习,增强了数据传输效率。通过使用较小的INR进行完整图像编码和单独的对象INR进行高质量对象区域重建,通过残差编码,我们的技术可以减少编码冗余,同时保持对象质量。Residual-INR是边缘设备上学习的一个有前途的解决方案,因为它可以在10个边缘设备网络中将数据传输减少高达5.16倍。它还促进了无需CPU加速的设备上学习,实现了高达2.9倍的加速而不损失准确性。我们的代码可在以下链接找到:https://github.com/sharclab/Residual-INR。

更新时间: 2024-10-18 02:15:51

领域: cs.LG,cs.AI,cs.CV,cs.DC,cs.IT,math.IT

下载: http://arxiv.org/abs/2408.05617v2

Skill Generalization with Verbs

It is imperative that robots can understand natural language commands issued by humans. Such commands typically contain verbs that signify what action should be performed on a given object and that are applicable to many objects. We propose a method for generalizing manipulation skills to novel objects using verbs. Our method learns a probabilistic classifier that determines whether a given object trajectory can be described by a specific verb. We show that this classifier accurately generalizes to novel object categories with an average accuracy of 76.69% across 13 object categories and 14 verbs. We then perform policy search over the object kinematics to find an object trajectory that maximizes classifier prediction for a given verb. Our method allows a robot to generate a trajectory for a novel object based on a verb, which can then be used as input to a motion planner. We show that our model can generate trajectories that are usable for executing five verb commands applied to novel instances of two different object categories on a real robot.
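
A sketch of the policy-search step: sample candidate trajectories, score them with the verb classifier, and iteratively refine the sampling distribution. The heuristic verb_score stands in for the learned probabilistic classifier, and the cross-entropy-style loop is a generic choice rather than the paper's exact optimizer.

import numpy as np

rng = np.random.default_rng(0)

def verb_score(traj):
    """Stand-in for the learned verb classifier: 'lift' simply rewards
    upward end-effector motion along z (a toy heuristic)."""
    return float(traj[-1, 2] - traj[0, 2])

def search_trajectory(score_fn, horizon=10, iters=30, pop=64, n_elite=8):
    """Cross-entropy-style search for a trajectory maximizing the score."""
    mu, sigma = np.zeros((horizon, 3)), np.ones((horizon, 3))
    for _ in range(iters):
        deltas = rng.normal(mu, sigma, size=(pop, horizon, 3))
        trajs = np.cumsum(deltas, axis=1)     # waypoints from step deltas
        elite = np.argsort([score_fn(t) for t in trajs])[-n_elite:]
        mu = deltas[elite].mean(axis=0)       # refit the sampling distribution
        sigma = deltas[elite].std(axis=0) + 1e-3
    return np.cumsum(mu, axis=0)

best = search_trajectory(verb_score)
print(best[-1])                               # end point pushed upward in z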

Updated: 2024-10-18 02:12:18

标题: 动词的技能泛化

摘要: 机器人能够理解人类发出的自然语言命令至关重要。这些命令通常包含表示应对给定对象执行何种动作的动词,并适用于许多对象。我们提出了一种方法,通过动词将操纵技能泛化到新对象。我们的方法学习了一个概率分类器,用于确定给定对象轨迹是否可以用特定动词描述。我们展示了该分类器准确地泛化到新的对象类别,跨越13个对象类别和14个动词的平均准确度为76.69%。然后,我们通过对对象运动学进行策略搜索,找到一条最大化给定动词的分类器预测的对象轨迹。我们的方法允许机器人基于动词为新对象生成轨迹,然后可以将其用作运动规划器的输入。我们展示了我们的模型可以生成可用于在真实机器人上执行应用于两个不同对象类别的新实例的五个动词命令的轨迹。

更新时间: 2024-10-18 02:12:18

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.14118v1

Contextual Linear Optimization with Bandit Feedback

Contextual linear optimization (CLO) uses predictive contextual features to reduce uncertainty in random cost coefficients and thereby improve average-cost performance. An example is the stochastic shortest path problem with random edge costs (e.g., traffic) and contextual features (e.g., lagged traffic, weather). Existing work on CLO assumes the data has fully observed cost coefficient vectors, but in many applications, we can only see the realized cost of a historical decision, that is, just one projection of the random cost coefficient vector, which we refer to as bandit feedback. We study a class of offline learning algorithms for CLO with bandit feedback, which we term induced empirical risk minimization (IERM), where we fit a predictive model to directly optimize the downstream performance of the policy it induces. We show a fast-rate regret bound for IERM that allows for misspecified model classes and flexible choices of the optimization estimate, and we develop computationally tractable surrogate losses. A byproduct of our theory, of independent interest, is a fast-rate regret bound for IERM with full feedback and a misspecified policy class. We compare the performance of different modeling choices numerically using a stochastic shortest path example and provide practical insights from the empirical results.
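
A toy illustration of induced empirical risk with bandit feedback over a finite decision set: only the realized cost of the logged decision is observed, and the induced policy's risk is estimated by inverse propensity scoring under uniform logging. The data-generating model, the coarse grid search, and the IPS estimator are illustrative; the paper optimizes tractable surrogate losses instead.

import numpy as np

rng = np.random.default_rng(0)
n, n_dec = 2000, 3                            # samples and feasible decisions

# Historical data: context x, uniformly logged decision a, and the realized
# cost y of that decision only (one projection of the cost vector).
X = rng.normal(size=(n, 2))
true_W = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]])
C = X @ true_W.T                              # full cost vectors (never observed)
a_idx = rng.integers(0, n_dec, size=n)
y = C[np.arange(n), a_idx]                    # bandit feedback

def induced_risk(W):
    """IERM objective: average cost of the policy induced by cost model W,
    estimated from bandit data via inverse propensity scoring
    (uniform logging, so each propensity is 1/n_dec)."""
    pi = np.argmin(X @ W.T, axis=1)           # induced decision per context
    match = pi == a_idx
    return float(np.sum(y[match] * n_dec) / n)

# Coarse search over candidate cost models (stand-in for real optimization).
cands = [true_W, rng.normal(size=(3, 2)), np.zeros((3, 2))]
print([round(induced_risk(W), 3) for W in cands])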

Updated: 2024-10-18 02:02:28

标题: 具有强盗反馈的上下文线性优化

摘要: 上下文线性优化(CLO)利用预测性上下文特征来减少随机成本系数的不确定性,从而提高平均成本性能。一个例子是具有随机边成本(例如,交通)和上下文特征(例如,滞后交通、天气)的随机最短路径问题。现有的CLO工作假设数据具有完全观察的成本系数向量,但在许多应用中,我们只能看到历史决策的实现成本,也就是随机成本系数向量的一个投影,我们称之为强盗反馈。我们研究了一类针对具有强盗反馈的CLO的离线学习算法,我们将其称为诱导经验风险最小化(IERM),其中我们拟合一个预测模型,以直接优化其诱导的策略的下游性能。我们展示了IERM的快速遗憾界限,允许错误指定的模型类和优化估计的灵活选择,并开发了可计算的替代损失。我们理论的一个具有独立意义的副产品是在完整反馈和错误指定策略类下IERM的快速遗憾界限。我们使用随机最短路径示例在数值上比较不同建模选择的性能,并提供实证结果的实用见解。

更新时间: 2024-10-18 02:02:28

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/2405.16564v2

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores

Large language models (LLMs) have been widely applied but face challenges in efficient inference. While quantization methods reduce computational demands, ultra-low bit quantization with arbitrary precision is hindered by limited GPU Tensor Core support and inefficient memory management, leading to suboptimal acceleration. To address these challenges, we propose a comprehensive acceleration scheme for arbitrary precision LLMs. At its core, we introduce a novel bipolar-INT data format that facilitates parallel computing and supports symmetric quantization, effectively reducing data redundancy. Building on this, we implement an arbitrary precision matrix multiplication scheme that decomposes and recovers matrices at the bit level, enabling flexible precision while maximizing GPU Tensor Core utilization. Furthermore, we develop an efficient matrix preprocessing method that optimizes data layout for subsequent computations. Finally, we design a data recovery-oriented memory management system that strategically utilizes fast shared memory, significantly enhancing kernel execution speed and minimizing memory access latency. Experimental results demonstrate our approach's effectiveness, with up to a $2.4\times$ speedup in matrix multiplication compared to NVIDIA's CUTLASS. When integrated into LLMs, we achieve up to $6.7\times$ inference acceleration. These improvements significantly enhance LLM inference efficiency, enabling broader and more responsive applications of LLMs.
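
The bit-level decompose-and-recover idea can be sketched in a few lines: write each low-bit operand as a weighted sum of binary bit-planes, multiply the planes pairwise, and recombine with the appropriate shifts. The paper's bipolar-INT format and Tensor Core kernels are not reproduced in this numpy toy.

import numpy as np

def bitplane_matmul(A, B, bits_a=3, bits_b=3):
    """Multiply low-bit unsigned integer matrices via binary bit-planes:
    A = sum_i 2^i A_i and B = sum_j 2^j B_j, so A @ B = sum 2^(i+j) A_i B_j.
    Each binary product is the kind of op that maps onto 1-bit Tensor Cores."""
    C = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    for i in range(bits_a):
        A_i = (A >> i) & 1
        for j in range(bits_b):
            B_j = (B >> j) & 1
            C += (A_i @ B_j) << (i + j)
    return C

rng = np.random.default_rng(0)
A = rng.integers(0, 8, size=(4, 5))           # 3-bit operands
B = rng.integers(0, 8, size=(5, 3))
assert np.array_equal(bitplane_matmul(A, B), A @ B)
print(bitplane_matmul(A, B))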

Updated: 2024-10-18 02:01:18

标题: 在GPU张量核上为大型语言模型实现高效的任意精度加速

摘要: 大型语言模型(LLMs)已被广泛应用,但在高效推理方面面临挑战。尽管量化方法可以降低计算需求,但使用任意精度的超低比特量化受到有限的GPU Tensor Core支持和低效的内存管理的限制,导致加速效果不佳。为了解决这些挑战,我们提出了一种全面的任意精度LLMs加速方案。在其核心,我们引入了一种新型的双极整数数据格式,有助于并行计算并支持对称量化,有效减少数据冗余。在此基础上,我们实现了一种任意精度矩阵乘法方案,可以在比特级别分解和恢复矩阵,实现灵活精度同时最大化GPU Tensor Core的利用率。此外,我们开发了一种有效的矩阵预处理方法,优化数据布局以供后续计算使用。最后,我们设计了一种以数据恢复为导向的内存管理系统,战略性地利用快速共享内存,显著提高内核执行速度并最小化内存访问延迟。实验结果证明了我们的方法的有效性,与NVIDIA的CUTLASS相比,在矩阵乘法中可以实现高达2.4倍的加速。将其整合到LLMs中,我们实现了高达6.7倍的推理加速。这些改进显著提高了LLMs推理效率,使LLMs的应用范围更广泛且更具响应性。

更新时间: 2024-10-18 02:01:18

领域: cs.LG,cs.AI,cs.AR

下载: http://arxiv.org/abs/2409.17870v2

A Communication and Computation Efficient Fully First-order Method for Decentralized Bilevel Optimization

Bilevel optimization, crucial for hyperparameter tuning, meta-learning and reinforcement learning, remains less explored in decentralized learning paradigms such as decentralized federated learning (DFL). Typically, decentralized bilevel methods rely on both gradients and Hessian matrices to approximate hypergradients of upper-level models. However, acquiring and sharing the second-order oracle is compute and communication intensive. To overcome these challenges, this paper introduces a fully first-order decentralized method for decentralized bilevel optimization, $\text{C}^2$DFB, which is both compute- and communication-efficient. In $\text{C}^2$DFB, each learning node optimizes a min-min-max problem to approximate the hypergradient using gradient information exclusively. To reduce the traffic load in the inner loop of solving the lower-level problem, $\text{C}^2$DFB incorporates a lightweight communication protocol for efficiently transmitting compressed residuals of local parameters. Rigorous theoretical analysis ensures its convergence, indicating a first-order oracle complexity of $\tilde{\mathcal{O}}(\epsilon^{-4})$. Experiments on hyperparameter tuning and hyper-representation tasks validate the superiority of $\text{C}^2$DFB across various topologies and heterogeneous data distributions.
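
A sketch of the compressed-residual step: each node transmits only the top-k entries of the change in its local parameters since the last synchronization. Top-k sparsification is one plausible compressor; the paper's exact protocol may differ.

import numpy as np

def compress_residual(x, x_ref, k):
    """Keep only the k largest-magnitude entries of the residual x - x_ref,
    so the network carries 2k numbers (indices and values) per message."""
    r = x - x_ref
    idx = np.argsort(np.abs(r))[-k:]
    return idx, r[idx]

def apply_residual(x_ref, idx, vals):
    x = x_ref.copy()
    x[idx] += vals
    return x

rng = np.random.default_rng(0)
x_ref = rng.normal(size=100)                  # receiver's last synced copy
x = x_ref + rng.normal(scale=0.1, size=100)   # sender's updated parameters
idx, vals = compress_residual(x, x_ref, k=10)
print("recovery error:", float(np.linalg.norm(x - apply_residual(x_ref, idx, vals))))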

Updated: 2024-10-18 02:00:45

标题: 一种通信和计算效率高的完全一阶分布式双层优化方法

摘要: 双层优化在超参数调整、元学习和强化学习中至关重要,但在去中心化学习范式(如去中心化联邦学习,DFL)中仍未得到充分探讨。通常,去中心化双层方法依赖于梯度和Hessian矩阵来近似上层模型的超梯度。然而,获取和共享二阶Oracle在计算和通信上开销很大。为了克服这些挑战,本文介绍了一种完全一阶的去中心化双层优化方法C^2DFB,既节约计算又通信高效。在C^2DFB中,每个学习节点通过仅使用梯度信息优化一个最小-最小-最大问题来近似超梯度。为了减少求解低层问题内循环的通信负载,C^2DFB结合了一种轻量级通信协议,用于高效传输本地参数的压缩残差。严格的理论分析确保了算法的收敛性,其一阶Oracle调用次数为O(ε^-4)。在超参数调整和超表示任务上的实验验证了C^2DFB在各种拓扑和异构数据分布下的优越性。

更新时间: 2024-10-18 02:00:45

领域: cs.LG,cs.AI,cs.DC,math.OC

下载: http://arxiv.org/abs/2410.14115v1

Path-based Explanation for Knowledge Graph Completion

Graph Neural Networks (GNNs) have achieved great success in Knowledge Graph Completion (KGC) by modelling how entities and relations interact in recent years. However, the explanation of the predicted facts has not caught the necessary attention. Proper explanations for the results of GNN-based KGC models increase model transparency and help researchers develop more reliable models. Existing practices for explaining KGC tasks rely on instance/subgraph-based approaches, while in some scenarios, paths can provide more user-friendly and interpretable explanations. Nonetheless, the methods for generating path-based explanations for KGs have not been well-explored. To address this gap, we propose Power-Link, the first path-based KGC explainer that explores GNN-based models. We design a novel simplified graph-powering technique, which enables the generation of path-based explanations with a fully parallelisable and memory-efficient training scheme. We further introduce three new metrics for quantitative evaluation of the explanations, together with a qualitative human evaluation. Extensive experiments demonstrate that Power-Link outperforms the SOTA baselines in interpretability, efficiency, and scalability.

Updated: 2024-10-18 01:58:39

标题: 基于路径的知识图补全解释

摘要: 图神经网络(GNNs)近年来在知识图完成(KGC)领域取得了巨大的成功,通过建模实体和关系之间的相互作用。然而,对预测事实的解释尚未引起必要的关注。基于GNN的KGC模型结果的适当解释可以增加模型的透明度,并帮助研究人员开发更可靠的模型。现有的解释KGC任务的做法依赖于基于实例/子图的方法,而在某些场景中,路径可以提供更加用户友好和可解释的解释。然而,生成基于路径的知识图解释的方法尚未得到很好的探索。为了填补这一空白,我们提出Power-Link,这是第一个探索基于GNN模型的基于路径的KGC解释器。我们设计了一种新颖的简化图功率技术,可以通过完全可并行化和内存有效的训练方案生成基于路径的解释。我们进一步引入了三个用于定量评估解释的新指标,以及一个定性的人类评估。广泛的实验证明,Power-Link在解释性、效率和可扩展性方面优于SOTA基线。

更新时间: 2024-10-18 01:58:39

领域: cs.LG,cs.AI,cs.SI

下载: http://arxiv.org/abs/2401.02290v2

Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement

The rapid advancement of large language models (LLMs) has significantly enhanced the capabilities of AI-driven agents across various tasks. However, existing agentic systems, whether based on fixed pipeline algorithms or pre-defined meta-learning frameworks, cannot search the whole agent design space due to the restriction of human-designed components, and thus might miss the globally optimal agent design. In this paper, we introduce Gödel Agent, a self-evolving framework inspired by the Gödel machine, enabling agents to recursively improve themselves without relying on predefined routines or fixed optimization algorithms. Gödel Agent leverages LLMs to dynamically modify its own logic and behavior, guided solely by high-level objectives through prompting. Experimental results on mathematical reasoning and complex agent tasks demonstrate that the implementation of Gödel Agent can achieve continuous self-improvement, surpassing manually crafted agents in performance, efficiency, and generalizability.

Updated: 2024-10-18 01:57:51

标题: 哥德尔代理:一种递归自我改进的自指代理框架

摘要: 大型语言模型(LLMs)的快速发展显著提升了AI驱动代理在各种任务中的能力。然而,现有的代理系统,无论是基于固定的管道算法还是预定义的元学习框架,由于受限于人为设计的组件,无法搜索整个代理设计空间,因此可能会错过全局最优的代理设计。本文介绍了哥德尔代理(Gödel Agent),这是一个受哥德尔机启发的自我进化框架,使代理能够在不依赖预定义例程或固定优化算法的情况下递归地改进自己。哥德尔代理利用LLMs动态修改自己的逻辑和行为,仅通过提示由高层次目标引导。在数学推理和复杂代理任务上的实验结果表明,哥德尔代理的实现可以持续自我改进,在性能、效率和泛化能力方面超越手工制作的代理。

更新时间: 2024-10-18 01:57:51

领域: cs.AI

下载: http://arxiv.org/abs/2410.04444v2

Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images

Foundation models pre-trained on large-scale data have been widely witnessed to achieve success in various natural imaging downstream tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt foundation models to new domains by updating only a small portion of parameters in order to reduce computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain few-shot scenarios, e.g., medical image analysis, has not been fully explored. In this work, we study the performance of PEFT when adapting foundation models to medical image classification tasks. Furthermore, to alleviate the limitations of mainstream prompt tuning methods, both in how prompts are introduced and in their approximation capabilities on Transformer architectures, we propose the Embedded Prompt Tuning (EPT) method, which embeds prompt tokens into the expanded channels. We also find that there are anomalies in the feature space distribution of foundation models during the pre-training process, and that prompt tuning can help mitigate this negative impact. To explain this phenomenon, we introduce a novel perspective for understanding prompt tuning: prompt tuning is a distribution calibrator. We support this view by analyzing the patch-wise scaling and feature separation operations contained in EPT. Our experiments show that EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks, and completes the fine-tuning process within highly competitive time, indicating that EPT is an effective PEFT method. The source code is available at github.com/zuwenqiang/EPT.
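
One plausible reading of embedding prompt tokens into expanded channels, as opposed to prepending them along the sequence axis: broadcast learnable prompt features onto every token's channel dimension and project the widened features back down. Purely illustrative; EPT's exact operator may differ.

import numpy as np

def embedded_prompt(tokens, prompt, W_out):
    """Attach prompt features along the channel axis of every token, then
    project the expanded channels back to the original width."""
    B, N, C = tokens.shape
    tiled = np.broadcast_to(prompt, (B, N, prompt.shape[-1]))
    expanded = np.concatenate([tokens, tiled], axis=-1)   # (B, N, C + P)
    return expanded @ W_out                               # back to (B, N, C)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 8))           # (batch, tokens, channels)
prompt = rng.normal(size=(4,))                # learnable prompt features
W_out = rng.normal(size=(12, 8))              # projection from C + P to C
print(embedded_prompt(tokens, prompt, W_out).shape)       # (2, 5, 8)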

Updated: 2024-10-18 01:50:27

标题: 嵌入式提示调整:迈向医学图像预训练模型的增强校准

摘要: 基于大规模数据预训练的基础模型已被广泛证明在各种自然图像下游任务中取得成功。参数高效微调(PEFT)方法旨在通过仅更新少量参数来适应新领域的基础模型,以减少计算开销。然而,这些PEFT方法的有效性,特别是在跨领域少样本场景下,如医学图像分析,尚未得到充分探讨。在这项工作中,我们促进了研究PEFT在将基础模型调整为医学图像分类任务时的表现。此外,为了缓解主流提示调整方法在Transformer架构上的引入方式和近似能力的局限性,我们提出了嵌入提示调整(EPT)方法,通过将提示令牌嵌入扩展通道。我们还发现,在基础模型的特征空间分布中存在异常,在预训练过程中,提示调整可以帮助减轻这种负面影响。为了解释这一现象,我们还引入了一种新颖的观点来理解提示调整:提示调整是一个分布校准器。通过分析包含在EPT中的基于补丁缩放和特征分离操作,我们支持这一观点。我们的实验表明,EPT在少样本医学图像分类任务中显著优于几种最先进的微调方法,并在高度竞争的时间内完成微调过程,表明EPT是一种有效的PEFT方法。源代码可在github.com/zuwenqiang/EPT上找到。

更新时间: 2024-10-18 01:50:27

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.01003v3

Polyhedral Complex Derivation from Piecewise Trilinear Networks

Recent advancements in visualizing deep neural networks provide insights into their structures and mesh extraction from Continuous Piecewise Affine (CPWA) functions. Meanwhile, developments in neural surface representation learning incorporate non-linear positional encoding, addressing issues like spectral bias; however, this poses challenges in applying mesh extraction techniques based on CPWA functions. Focusing on trilinear interpolating methods as positional encoding, we present theoretical insights and an analytical mesh extraction, showing the transformation of hypersurfaces to flat planes within the trilinear region under the eikonal constraint. Moreover, we introduce a method for approximating intersecting points among three hypersurfaces contributing to broader applications. We empirically validate correctness and parsimony through chamfer distance and efficiency, and angular distance, while examining the correlation between the eikonal loss and the planarity of the hypersurfaces.
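
For reference, the trilinear interpolation that serves as the positional encoding under analysis, in a direct 8-corner implementation on a scalar grid:

import numpy as np

def trilinear(grid, p):
    """Trilinear interpolation of a grid at a continuous point p in [0,1]^3."""
    g = np.asarray(p) * (np.array(grid.shape) - 1)
    i0 = np.floor(g).astype(int)
    i1 = np.minimum(i0 + 1, np.array(grid.shape) - 1)
    t = g - i0                                # fractional coordinates
    out = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (t[0] if dx else 1 - t[0]) \
                    * (t[1] if dy else 1 - t[1]) \
                    * (t[2] if dz else 1 - t[2])
                corner = (i1[0] if dx else i0[0],
                          i1[1] if dy else i0[1],
                          i1[2] if dz else i0[2])
                out = out + w * grid[corner]
    return out

grid = np.arange(8.0).reshape(2, 2, 2)        # 2x2x2 scalar grid
print(trilinear(grid, (0.5, 0.5, 0.5)))       # 3.5, the cell average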

Updated: 2024-10-18 01:44:05

标题: 从分段三线性网络中导出多面体复合体

摘要: 最近在可视化深度神经网络方面取得的进展揭示了它们的结构以及从连续分段仿射(CPWA)函数中提取网格的方法。同时,神经表面表示学习方面的发展融合了非线性位置编码,解决了诸如谱偏差等问题;然而,这也给基于CPWA函数的网格提取技术的应用带来了挑战。聚焦于作为位置编码的三线性插值方法,我们提出了理论见解和解析式网格提取方法,展示了在程函(eikonal)约束下超曲面在三线性区域内向平面的转换。此外,我们介绍了一种近似计算三个超曲面之间交点的方法,以扩展应用范围。我们通过倒角距离(chamfer distance)、效率以及角距离在实证上验证了正确性和简洁性,同时检验了程函损失与超曲面的平面性之间的相关性。

更新时间: 2024-10-18 01:44:05

领域: cs.LG,cs.AI,cs.CV,cs.GR

下载: http://arxiv.org/abs/2402.10403v3

MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection

LiDAR-based 3D object detection is crucial for various applications but often experiences performance degradation in real-world deployments due to domain shifts. While most studies focus on cross-dataset shifts, such as changes in environments and object geometries, practical corruptions from sensor variations and weather conditions remain underexplored. In this work, we propose a novel online test-time adaptation framework for 3D detectors that effectively tackles these shifts, including a challenging cross-corruption scenario where cross-dataset shifts and corruptions co-occur. By leveraging long-term knowledge from previous test batches, our approach mitigates catastrophic forgetting and adapts effectively to diverse shifts. Specifically, we propose a Model Synergy (MOS) strategy that dynamically selects historical checkpoints with diverse knowledge and assembles them to best accommodate the current test batch. This assembly is directed by our proposed Synergy Weights (SW), which perform a weighted averaging of the selected checkpoints, minimizing redundancy in the composite model. The SWs are computed by evaluating the similarity of predicted bounding boxes on the test data and the independence of features between checkpoint pairs in the model bank. To maintain an efficient and informative model bank, we discard checkpoints with the lowest average SW scores, replacing them with newly updated models. Our method was rigorously tested against existing test-time adaptation strategies across three datasets and eight types of corruptions, demonstrating superior adaptability to dynamic scenes and conditions. Notably, it achieved a 67.3% improvement in a challenging cross-corruption scenario, offering a more comprehensive benchmark for adaptation. The source code will be made publicly available.
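
The assembly step reduces to a weighted average of checkpoints once the Synergy Weights are known. A simplified sketch follows; computing the SWs from box similarity and feature independence is omitted, and the weights below are assumed precomputed.

import numpy as np

def assemble(checkpoints, synergy_weights):
    """Composite model: a Synergy-Weight-weighted average of checkpoints."""
    w = np.asarray(synergy_weights, dtype=float)
    w = w / w.sum()
    return {k: sum(wi * ckpt[k] for wi, ckpt in zip(w, checkpoints))
            for k in checkpoints[0]}

bank = [{"conv.weight": np.full((2, 2), v)} for v in (1.0, 2.0, 4.0)]
sw = [0.2, 0.3, 0.5]                          # assumed precomputed synergy weights
print(assemble(bank, sw)["conv.weight"])      # every entry equals 2.8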

Updated: 2024-10-18 01:40:19

标题: MOS:基于LiDAR的3D目标检测测试时适应性的模型协同效应

摘要: 基于LiDAR的三维物体检测对于各种应用至关重要,但在现实世界的部署中经常会因为领域转移而导致性能下降。虽然大多数研究集中在跨数据集的转移上,例如环境和物体几何形状的变化,但传感器变化和天气条件等实际污染的影响仍未得到充分探索。在本研究中,我们提出了一个新颖的在线测试时适应框架,用于3D检测器,有效地处理这些转移,包括一个具有挑战性的跨污染场景,其中跨数据集的转移和污染同时发生。通过利用先前测试批次的长期知识,我们的方法减轻了灾难性遗忘,并有效地适应了多样化的转移。具体来说,我们提出了一个模型协同(MOS)策略,动态选择具有不同知识的历史检查点,并将它们组合以最好地适应当前的测试批次。这种组合由我们提出的协同权重(SW)指导,对所选检查点进行加权平均,最小化组合模型中的冗余。SWs通过评估测试数据上预测的边界框的相似性以及模型库中检查点对之间特征的独立性来计算。为了保持一个高效且信息丰富的模型库,我们丢弃具有最低平均SW分数的检查点,并用新更新的模型替换它们。我们的方法在三个数据集和八种类型的污染上经过严格测试,展示了对动态场景和条件的卓越适应性。值得注意的是,在具有挑战性的跨污染场景中,它实现了67.3%的改进,为适应性提供了更全面的基准。源代码将公开提供。

更新时间: 2024-10-18 01:40:19

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2406.14878v2

Improving Graph Neural Networks by Learning Continuous Edge Directions

Graph Neural Networks (GNNs) traditionally employ a message-passing mechanism that resembles diffusion over undirected graphs, which often leads to homogenization of node features and reduced discriminative power in tasks such as node classification. Our key insight for addressing this limitation is to assign fuzzy edge directions -- that can vary continuously from node $i$ pointing to node $j$ to vice versa -- to the edges of a graph so that features can preferentially flow in one direction between nodes to enable long-range information transmission across the graph. We also introduce a novel complex-valued Laplacian for directed graphs with fuzzy edges where the real and imaginary parts represent information flow in opposite directions. Using this Laplacian, we propose a general framework, called Continuous Edge Direction (CoED) GNN, for learning on graphs with fuzzy edges and prove its expressivity limits using a generalization of the Weisfeiler-Leman (WL) graph isomorphism test for directed graphs with fuzzy edges. Our architecture aggregates neighbor features scaled by the learned edge directions and processes the aggregated messages from in-neighbors and out-neighbors separately alongside the self-features of the nodes. Since continuous edge directions are differentiable, they can be learned jointly with the GNN weights via gradient-based optimization. CoED GNN is particularly well-suited for graph ensemble data where the graph structure remains fixed but multiple realizations of node features are available, such as in gene regulatory networks, web connectivity graphs, and power grids. We demonstrate through extensive experiments on both synthetic and real datasets that learning continuous edge directions significantly improves performance both for undirected and directed graphs compared with existing methods.
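
A hedged sketch of a complex-valued Laplacian for continuous edge directions, in the style of a magnetic Laplacian: a direction parameter theta[i, j] in [0, 1] sets the phase of each entry, so the real and imaginary parts carry flow in opposite directions. The construction below follows the abstract only loosely; the paper's exact definition may differ.

import numpy as np

def fuzzy_directed_laplacian(W, theta):
    """Hermitian Laplacian whose phases encode continuous edge directions."""
    phase = np.exp(1j * np.pi * (theta - theta.T) / 2.0)
    H = W * phase                             # W: symmetric non-negative weights
    D = np.diag(W.sum(axis=1))
    return D - H

W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
theta = np.array([[0.0, 0.9, 0.0],            # edge 0-1 points mostly 0 -> 1
                  [0.1, 0.0, 0.5],            # edge 1-2 is nearly undirected
                  [0.0, 0.5, 0.0]])
L = fuzzy_directed_laplacian(W, theta)
print(np.allclose(L, L.conj().T))             # Hermitian: real eigenvalues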

Updated: 2024-10-18 01:34:35

标题: 通过学习连续边方向来改进图神经网络

摘要: 图神经网络(GNNs)传统上采用类似于在无向图上进行扩散的消息传递机制,这经常导致节点特征的同质化以及在节点分类等任务中降低了区分能力。我们解决这一限制的关键见解是为图的边赋予模糊的边方向 - 可以连续变化从节点$i$指向节点$j$到相反方向 - 以便特征可以在节点之间的一个方向上优先流动,从而实现图中的长程信息传输。我们还介绍了一种针对具有模糊边的有向图的新型复数拉普拉斯,其中实部和虚部分别表示相反方向的信息流。利用这个拉普拉斯,我们提出了一个通用框架,称为连续边方向(CoED)GNN,用于在具有模糊边的图上进行学习,并通过对具有模糊边的有向图的Weisfeiler-Leman(WL)图同构测试的推广来证明其表达能力限制。我们的架构聚合了通过学习的边方向进行缩放的邻居特征,并分别处理来自内邻居和外邻居的聚合消息以及节点的自身特征。由于连续边方向是可微分的,它们可以通过基于梯度的优化与GNN权重一起学习。CoED GNN特别适用于图集数据,其中图结构保持不变,但节点特征的多个实现可用,例如基因调控网络、网络连接图和电网。通过在合成和真实数据集上进行广泛实验,我们证明与现有方法相比,学习连续边方向显著提高了无向图和有向图的性能。

更新时间: 2024-10-18 01:34:35

领域: cs.LG

下载: http://arxiv.org/abs/2410.14109v1

ENOT: Expectile Regularization for Fast and Accurate Training of Neural Optimal Transport

We present a new approach to the Neural Optimal Transport (NOT) training procedure, capable of accurately and efficiently estimating the optimal transportation plan via specific regularization on dual Kantorovich potentials. The main bottleneck of existing NOT solvers is the procedure of finding a near-exact approximation of the conjugate operator (i.e., the c-transform), which is done either by optimizing over non-convex max-min objectives or by computationally intensive fine-tuning of the initial approximated prediction. We resolve both issues by proposing a new, theoretically justified loss in the form of expectile regularisation, which enforces binding conditions on the learning process of the dual potentials. Such a regularization provides an upper-bound estimate over the distribution of possible conjugate potentials and makes the learning stable, completely eliminating the need for additional extensive fine-tuning. The proposed method, called Expectile-Regularised Neural Optimal Transport (ENOT), outperforms previous state-of-the-art approaches on the established Wasserstein-2 benchmark tasks by a large margin (up to a 3-fold improvement in quality and up to a 10-fold improvement in runtime). Moreover, we showcase the performance of ENOT for varying cost functions on different tasks, such as image generation, demonstrating the robustness of the proposed algorithm. The OTT-JAX library includes our implementation of the ENOT algorithm: https://ott-jax.readthedocs.io/en/latest/tutorials/ENOT.html
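
The expectile regulariser itself is essentially a one-liner, the asymmetric squared loss L_tau(u) = |tau - 1{u < 0}| u^2; how it is attached to the dual Kantorovich potentials is paper-specific and not shown here.

import numpy as np

def expectile_loss(u, tau=0.9):
    """Asymmetric squared loss: residuals of one sign are penalized more."""
    weight = np.where(u < 0, 1.0 - tau, tau)
    return weight * u ** 2

u = np.linspace(-2, 2, 5)
print(expectile_loss(u))   # negative residuals weighted 0.1, positive 0.9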

Updated: 2024-10-18 01:26:27

标题: ENOT:用于神经最优输运快速准确训练的期望分位数(expectile)正则化

摘要: 我们提出了一种新的神经最优输运(NOT)训练方法,能够通过对双Kantorovich势特定正则化来准确高效地估计最优输运计划。现有NOT求解器的主要瓶颈与寻找共轭算子(即c-transform)的近似精确解相关,这要么通过优化非凸最大-最小目标,要么通过计算密集的初步近似预测微调来完成。我们通过提出一种新的理论上合理的期望分位数(expectile)正则化形式的损失来解决这两个问题,该损失对双势的学习过程施加约束条件。这种正则化提供了对可能共轭势分布的上界估计,并使学习稳定,完全消除了对额外大量微调的需求。所提出的方法,称为期望分位数正则化神经最优输运(ENOT),在已建立的Wasserstein-2基准任务上大幅超越以往的最先进方法(质量提升高达3倍,运行时间加速高达10倍)。此外,我们展示了ENOT在不同任务(如图像生成)中针对不同成本函数的表现,证明了所提算法的鲁棒性。OTT-JAX库包含我们对ENOT算法的实现:https://ott-jax.readthedocs.io/en/latest/tutorials/ENOT.html

更新时间: 2024-10-18 01:26:27

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.03777v4

Transfer Learning on Transformers for Building Energy Consumption Forecasting -- A Comparative Study

This study investigates the application of Transfer Learning (TL) on Transformer architectures to enhance building energy consumption forecasting. Transformers are a relatively new deep learning architecture, which has served as the foundation for groundbreaking technologies such as ChatGPT. While TL has been studied in the past, these studies considered either one TL strategy or used older deep learning models such as Recurrent Neural Networks or Convolutional Neural Networks. Here, we carry out an extensive empirical study on six different TL strategies and analyse their performance under varying feature spaces. In addition to the vanilla Transformer architecture, we also experiment with Informer and PatchTST, specifically designed for time series forecasting. We use 16 datasets from the Building Data Genome Project 2 to create building energy consumption forecasting models. Experiment results reveal that while TL is generally beneficial, especially when the target domain has no data, careful selection of the exact TL strategy should be made to gain the maximum benefit. This decision largely depends on the feature space properties such as the recorded weather features. We also note that PatchTST outperforms the other two Transformer variants (vanilla Transformer and Informer). We believe our findings would assist researchers in making informed decision in using TL and transformer architectures for building energy consumption forecasting.

Updated: 2024-10-18 01:26:04

标题: Transformers在建筑能耗预测中的迁移学习--一项比较研究

摘要: 这项研究调查了在Transformer架构上应用迁移学习(TL)来增强建筑能源消耗预测的方法。Transformer是一种相对较新的深度学习架构,已经成为了像ChatGPT这样开创性技术的基础。虽然TL过去已经被研究过,但这些研究要么考虑了一个TL策略,要么使用了较旧的深度学习模型,如循环神经网络或卷积神经网络。在这里,我们对六种不同的TL策略进行了广泛的实证研究,并分析了它们在不同特征空间下的性能。除了基本的Transformer架构外,我们还尝试了Informer和PatchTST,这两种模型专门设计用于时间序列预测。我们使用了来自建筑数据基因组项目2的16个数据集来创建建筑能源消耗预测模型。实验结果表明,尽管TL通常是有益的,特别是当目标领域没有数据时,但应该谨慎选择确切的TL策略以获得最大的益处。这个决定在很大程度上取决于特征空间的属性,比如记录的天气特征。我们还注意到,PatchTST在表现上优于另外两种Transformer变种(基本Transformer和Informer)。我们相信我们的发现将帮助研究人员在使用TL和Transformer架构进行建筑能源消耗预测时做出明智的决定。

更新时间: 2024-10-18 01:26:04

领域: cs.LG

下载: http://arxiv.org/abs/2410.14107v1

DMGNN: Detecting and Mitigating Backdoor Attacks in Graph Neural Networks

Recent studies have revealed that GNNs are highly susceptible to multiple adversarial attacks. Among these, graph backdoor attacks pose one of the most prominent threats, where attackers cause models to misclassify by learning the backdoored features with injected triggers and modified target labels during the training phase. Based on the features of the triggers, these attacks can be categorized into out-of-distribution (OOD) and in-distribution (ID) graph backdoor attacks: triggers with notable differences from the clean-sample feature distributions constitute OOD backdoor attacks, whereas the triggers in ID backdoor attacks are nearly identical to the clean-sample feature distributions. Existing methods can successfully defend against OOD backdoor attacks by comparing the feature distributions of triggers and clean samples but fail to mitigate stealthy ID backdoor attacks. Due to the lack of proper supervision signals, the main-task accuracy is negatively affected when defending against ID backdoor attacks. To bridge this gap, we propose DMGNN against OOD and ID graph backdoor attacks, which can powerfully eliminate stealthiness to guarantee defense effectiveness and improve model performance. Specifically, DMGNN can easily identify the hidden ID and OOD triggers by predicting label transitions based on counterfactual explanation. To further filter the diversity of generated explainable graphs and erase the influence of the trigger features, we present a reverse sampling pruning method to screen and discard the triggers directly at the data level. Extensive experimental evaluations on open graph datasets demonstrate that DMGNN far outperforms the state-of-the-art (SOTA) defense methods, reducing the attack success rate to 5% with almost negligible degradation in model performance (within 3.5%).

Updated: 2024-10-18 01:08:03

标题: DMGNN: 检测和缓解图神经网络中的后门攻击

摘要: 最近的研究表明,图神经网络(GNNs)极易受到多种对抗攻击的影响。其中,图后门攻击是其中最突出的威胁之一,攻击者通过在训练阶段注入触发器和修改目标标签来学习带有后门功能的特征,导致模型误分类。根据触发器的特征,这些攻击可以被分类为分布外(OOD)和分布内(ID)的图后门攻击,具有与干净样本特征分布明显差异的触发器构成OOD后门攻击,而在ID后门攻击中,触发器几乎与干净样本特征分布相同。现有的方法可以通过比较触发器和干净样本的特征分布成功抵御OOD后门攻击,但无法有效缓解隐蔽的ID后门攻击。由于缺乏适当的监督信号,在抵御ID后门攻击时,主要任务的准确性受到负面影响。为了弥补这一差距,我们提出了针对OOD和ID图后门攻击的DMGNN,可以有效消除隐蔽性以保证防御效果,并提高模型性能。具体来说,DMGNN可以通过基于对立解释预测标签转换轻松识别隐藏的ID和OOD触发器。为了进一步过滤生成的可解释图的多样性并消除触发器特征的影响,我们提出了一种逆向采样修剪方法,在数据级别直接筛选和丢弃触发器。在开放图数据集上进行的大量实验评估表明,DMGNN远远优于最先进的防御方法,将攻击成功率降低到5%,并且模型性能几乎没有明显下降(在3.5%内)。

更新时间: 2024-10-18 01:08:03

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2410.14105v1

Learning-Augmented Decentralized Online Convex Optimization in Networks

This paper studies decentralized online convex optimization in a networked multi-agent system and proposes a novel algorithm, Learning-Augmented Decentralized Online optimization (LADO), for individual agents to select actions only based on local online information. LADO leverages a baseline policy to safeguard online actions for worst-case robustness guarantees, while staying close to the machine learning (ML) policy for average performance improvement. In stark contrast with the existing learning-augmented online algorithms that focus on centralized settings, LADO achieves strong robustness guarantees in a decentralized setting. We also prove the average cost bound for LADO, revealing the tradeoff between average performance and worst-case robustness and demonstrating the advantage of training the ML policy by explicitly considering the robustness requirement.
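
One simple way to combine an ML policy with a worst-case-safe baseline, sketched from the abstract: follow the ML suggestion whenever it stays within a robustness slack of the baseline action, otherwise project it toward the baseline. LADO's actual safeguard is defined through cost-based robustness budgets rather than this geometric rule.

import numpy as np

def safeguarded_action(a_ml, a_base, slack):
    """Closest point to the ML action within distance `slack` of the baseline."""
    d = np.linalg.norm(a_ml - a_base)
    if d <= slack:
        return a_ml                           # ML action already safe enough
    return a_base + (a_ml - a_base) * (slack / d)

a_ml = np.array([2.0, 0.0])                   # untrusted learned suggestion
a_base = np.array([0.0, 0.0])                 # robust baseline action
print(safeguarded_action(a_ml, a_base, slack=1.0))   # [1. 0.]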

Updated: 2024-10-18 01:06:40

标题: 学习增强的分布式在线凸优化在网络中的应用

摘要: 本文研究了在网络化多智能体系统中的分散在线凸优化,并提出了一种新颖的算法,即学习增强的分散在线优化(LADO),用于个体智能体只基于本地在线信息选择行动。LADO利用基线策略来保护在线行动,以确保最坏情况下的稳健性保证,同时保持接近机器学习(ML)策略以提高平均性能。与现有的专注于集中式环境的学习增强在线算法形成鲜明对比,LADO在分散设置中实现了强大的稳健性保证。我们还证明了LADO的平均成本界限,揭示了平均性能和最坏情况稳健性之间的权衡,并展示了通过明确考虑稳健性需求来训练ML策略的优势。

更新时间: 2024-10-18 01:06:40

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2306.10158v3

Extreme Precipitation Nowcasting using Multi-Task Latent Diffusion Models

Deep learning models have made remarkable strides in precipitation prediction, yet they continue to struggle with capturing the spatial details of radar image features, particularly over high precipitation intensity areas. This shortcoming manifests as low forecast accuracy in the spatial positioning of radar echo images across varying precipitation intensity regions. To address this challenge, we introduce the multi-task latent diffusion model (MTLDM), a novel approach for precipitation prediction. The basic concept of the MTLDM is that the radar image representing precipitation is the result of multiple factors. Therefore, we adopt a divide-and-conquer approach: we decompose the radar image using decomposition techniques and then predict the decomposed sub-images separately. We conceptualize the precipitation image as a composition of various components corresponding to different precipitation intensities. The MTLDM decomposes the precipitation image into these distinct components and employs a dedicated task to predict each one. This method enables spatiotemporally consistent prediction of real-world precipitation areas 5-80 min in advance, outperforming existing state-of-the-art techniques across multiple evaluation metrics.
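
The divide-and-conquer decomposition can be illustrated directly: split a radar frame into intensity bands, each of which would be predicted by its own task head and then summed back into a full forecast. The band edges below are illustrative.

import numpy as np

def decompose_by_intensity(radar, thresholds=(0.2, 0.5, 0.8)):
    """Partition a frame into per-intensity-band components that sum back
    to the original frame."""
    edges = [0.0, *thresholds, np.inf]
    return [np.where((radar >= lo) & (radar < hi), radar, 0.0)
            for lo, hi in zip(edges[:-1], edges[1:])]

rng = np.random.default_rng(0)
frame = rng.random((4, 4))                    # toy normalized radar frame
bands = decompose_by_intensity(frame)
print(len(bands), np.allclose(sum(bands), frame))   # 4 bands, exact recompose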

Updated: 2024-10-18 00:50:56

标题: 使用多任务潜在扩散模型进行极端降水的即时预报

摘要: 深度学习模型在降水预测方面取得了显著进展,然而它们在捕捉雷达图像特征的空间细节方面仍然存在困难,尤其是在高降水强度区域。这种缺点在雷达回波图像在不同降水强度区域的空间定位方面表现为低预测准确度。为了解决这一挑战,我们引入了多任务潜在扩散模型(MTLDM),这是一种用于降水预测的新方法。MTLDM的基本概念是基于雷达图像代表降水是多个因素的结果的理解。因此,我们采用分解技术对雷达图像进行分解,然后分别预测分解后的子图像。我们将降水图像构想为由对应不同降水强度的各种组件组成。MTLDM将降水图像分解为这些不同的组件,并采用专门的任务来预测每个组件。这种方法能够在多个评估指标上胜过现有的最先进技术,实现对未来5-80分钟内真实世界降水区域的时空一致预测。

更新时间: 2024-10-18 00:50:56

领域: cs.CV,cs.AI,86A10, 68T07,I.2.6; J.7

下载: http://arxiv.org/abs/2410.14103v1

Beyond Dataset Watermarking: Model-Level Copyright Protection for Code Summarization Models

Code Summarization Model (CSM) has been widely used in code production, such as online and web programming for PHP and Javascript. CSMs are essential tools in code production, enhancing software development efficiency and driving innovation in automated code analysis. However, CSMs face risks of exploitation by unauthorized users, particularly in an online environment where CSMs can be easily shared and disseminated. To address these risks, digital watermarks offer a promising solution by embedding imperceptible signatures within the models to assert copyright ownership and track unauthorized usage. Traditional watermarking for CSM copyright protection faces two main challenges: 1) dataset watermarking methods require separate design of triggers and watermark features based on the characteristics of different programming languages, which not only increases the computation complexity but also leads to a lack of generalization, 2) existing watermarks based on code style transformation are easily identifiable by automated detection, demonstrating poor concealment. To tackle these issues, we propose ModMark , a novel model-level digital watermark embedding method. Specifically, by fine-tuning the tokenizer, ModMark achieves cross-language generalization while reducing the complexity of watermark design. Moreover, we employ code noise injection techniques to effectively prevent trigger detection. Experimental results show that our method can achieve 100% watermark verification rate across various programming languages' CSMs, and the concealment and effectiveness of ModMark can also be guaranteed.

Updated: 2024-10-18 00:48:00

标题: 超越数据集水印:面向代码摘要模型的模型级版权保护

摘要: 代码摘要模型(CSM)已广泛应用于代码生产,例如用于PHP和Javascript的在线和Web编程。 CSM是代码生产中必不可少的工具,可以提高软件开发效率并推动自动化代码分析的创新。然而,CSM面临着被未经授权的用户利用的风险,特别是在在线环境中,CSM可以很容易地被共享和传播。为了解决这些风险,数字水印提供了一种有前途的解决方案,通过在模型中嵌入不可察觉的签名来断言版权所有权并跟踪未经授权的使用。传统的用于CSM版权保护的水印技术面临两个主要挑战:1)数据集水印方法需要根据不同编程语言的特征分别设计触发器和水印特征,这不仅增加了计算复杂度,还导致了缺乏泛化性,2)基于代码风格转换的现有水印很容易被自动检测发现,显示出较差的隐蔽性。为了解决这些问题,我们提出了ModMark,一种新颖的模型级数字水印嵌入方法。具体来说,通过微调标记器,ModMark实现了跨语言泛化,同时降低了水印设计的复杂性。此外,我们采用代码噪声注入技术来有效防止触发器检测。实验结果显示,我们的方法可以在各种编程语言的CSM中实现100%的水印验证率,同时ModMark的隐蔽性和有效性也可以得到保证。

更新时间: 2024-10-18 00:48:00

领域: cs.CR,cs.CY

下载: http://arxiv.org/abs/2410.14102v1

Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech

Visual Text-to-Speech (VTTS) aims to take the spatial environmental image as the prompt to synthesize reverberant speech for the spoken content. Previous research focused on the RGB modality for global environmental modeling, overlooking the potential of multi-source spatial knowledge like depth, speaker position, and environmental semantics. To address these issues, we propose a novel multi-source spatial knowledge understanding scheme for immersive VTTS, termed MS$^2$KU-VTTS. Specifically, we first prioritize the RGB image as the dominant source and consider the depth image, speaker position knowledge from object detection, and semantic captions from an image-understanding LLM as supplementary sources. Afterwards, we propose a serial interaction mechanism to deeply engage with both dominant and supplementary sources. The resulting multi-source knowledge is dynamically integrated based on the contribution of each source. This enriched interaction and integration of multi-source spatial knowledge guides the speech generation model, enhancing the immersive spatial speech experience. Experimental results demonstrate that MS$^2$KU-VTTS surpasses existing baselines in generating immersive speech. Demos and code are available at: https://github.com/MS2KU-VTTS/MS2KU-VTTS.

Updated: 2024-10-18 00:46:18

标题: 多源空间知识理解用于沉浸式视觉文本转语音

摘要: 视觉文本到语音(VTTS)旨在将空间环境图像作为提示,合成口语内容的混响语音。先前的研究集中在使用RGB模态进行全局环境建模,而忽视了深度、说话者位置和环境语义等多源空间知识的潜力。为了解决这些问题,我们提出了一种新颖的多源空间知识理解方案,用于沉浸式VTTS,称为MS$^2$KU-VTTS。具体而言,我们首先将RGB图像作为主要来源,并将深度图像、从目标检测获得的说话者位置知识,以及从图像理解LLM获得的语义标题作为辅助来源。随后,我们提出了一种串行交互机制,深度参与主要和辅助来源。根据它们的贡献动态集成产生的多源知识。这种丰富的多源空间知识的交互和集成指导了语音生成模型,增强了沉浸式空间语音体验。实验结果表明,MS$^2$KU-VTTS在生成沉浸式语音方面超越了现有的基线。演示和代码可在以下网址找到:https://github.com/MS2KU-VTTS/MS2KU-VTTS。

更新时间: 2024-10-18 00:46:18

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2410.14101v1

Open Domain Question Answering with Conflicting Contexts

Open domain question answering systems frequently rely on information retrieved from large collections of text (such as the Web) to answer questions. However, such collections of text often contain conflicting information, and indiscriminately depending on this information may result in untruthful and inaccurate answers. To understand the gravity of this problem, we collect a human-annotated dataset, Question Answering with Conflicting Contexts (QACC), and find that as much as 25% of unambiguous, open domain questions can lead to conflicting contexts when retrieved using Google Search. We evaluate and benchmark three powerful Large Language Models (LLMs) with our dataset QACC and demonstrate their limitations in effectively addressing questions with conflicting information. To explore how humans reason through conflicting contexts, we request our annotators to provide explanations for their selections of correct answers. We demonstrate that by finetuning LLMs to explain their answers, we can introduce richer information into their training that guide them through the process of reasoning with conflicting contexts.

Updated: 2024-10-18 00:32:50

标题: 具有冲突背景的开放域问答

摘要: 开放领域问答系统经常依赖于从大量文本(如网络)中检索信息来回答问题。然而,这些文本集合经常包含冲突的信息,过度依赖这些信息可能导致不真实和不准确的答案。为了了解这个问题的严重性,我们收集了一个人工注释的数据集,名为具有冲突背景的问答(QACC),发现多达25%的明确的开放领域问题在使用谷歌搜索时可能导致冲突的背景。我们使用我们的数据集QACC评估和基准三个强大的大型语言模型(LLMs),并展示它们在有效应对具有冲突信息的问题时的局限性。为了探索人类如何通过冲突的背景进行推理,我们要求我们的标注者为他们选择正确答案提供解释。我们展示,通过微调LLMs来解释他们的答案,我们可以在他们的训练中引入更丰富的信息,指导他们通过处理具有冲突背景的推理过程。

更新时间: 2024-10-18 00:32:50

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.12311v2

ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction

Predicting human mobility across multiple cities presents significant challenges due to the complex and diverse spatial-temporal dynamics inherent in different urban environments. In this study, we propose a robust approach to predict human mobility patterns called ST-MoE-BERT. Compared to existing methods, our approach frames the prediction task as a spatial-temporal classification problem. Our methodology integrates the Mixture-of-Experts architecture with BERT model to capture complex mobility dynamics and perform the downstream human mobility prediction task. Additionally, transfer learning is integrated to solve the challenge of data scarcity in cross-city prediction. We demonstrate the effectiveness of the proposed model on GEO-BLEU and DTW, comparing it to several state-of-the-art methods. Notably, ST-MoE-BERT achieves an average improvement of 8.29%.
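
A minimal dense mixture-of-experts layer, the MoE ingredient the framework combines with a BERT backbone: a softmax gate mixes the outputs of several expert projections. Shapes and the dense (non-sparse) gating are illustrative.

import numpy as np

def moe_layer(x, gate_W, expert_Ws):
    """Gate-weighted mixture of expert projections."""
    logits = x @ gate_W                       # (batch, n_experts)
    g = np.exp(logits - logits.max(axis=1, keepdims=True))
    g = g / g.sum(axis=1, keepdims=True)      # gating probabilities
    outs = np.stack([x @ W for W in expert_Ws], axis=1)
    return (g[..., None] * outs).sum(axis=1)  # (batch, d_out)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
gate_W = rng.normal(size=(8, 4))              # 4 experts
expert_Ws = [rng.normal(size=(8, 16)) for _ in range(4)]
print(moe_layer(x, gate_W, expert_Ws).shape)  # (2, 16)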

Updated: 2024-10-18 00:32:18

标题: ST-MoE-BERT:一种用于长期跨城市移动预测的空间-时间专家混合框架

摘要: 预测人类跨越多个城市的移动性面临着重大挑战,因为不同城市环境中固有的复杂和多样化的时空动态。在本研究中,我们提出了一种预测人类移动模式的稳健方法,称为ST-MoE-BERT。与现有方法相比,我们的方法将预测任务框架化为一个时空分类问题。我们的方法将专家混合模型架构与BERT模型相结合,以捕捉复杂的移动动态并执行下游人类移动性预测任务。此外,我们还整合了迁移学习来解决跨城市预测中数据稀缺性的挑战。我们通过在GEO-BLEU和DTW上将提出的模型与几种最先进方法进行比较,证明了该模型的有效性。值得注意的是,ST-MoE-BERT实现了平均改进8.29%。

更新时间: 2024-10-18 00:32:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14099v1

Efficient Sparse PCA via Block-Diagonalization

Sparse Principal Component Analysis (Sparse PCA) is a pivotal tool in data analysis and dimensionality reduction. However, Sparse PCA is a challenging problem in both theory and practice: it is known to be NP-hard and current exact methods generally require exponential runtime. In this paper, we propose a novel framework to efficiently approximate Sparse PCA by (i) approximating the general input covariance matrix with a re-sorted block-diagonal matrix, (ii) solving the Sparse PCA sub-problem in each block, and (iii) reconstructing the solution to the original problem. Our framework is simple and powerful: it can leverage any off-the-shelf Sparse PCA algorithm and achieve significant computational speedups, with a minor additive error that is linear in the approximation error of the block-diagonal matrix. Suppose $g(k, d)$ is the runtime of an algorithm (approximately) solving Sparse PCA in dimension $d$ and with sparsity value $k$. Our framework, when integrated with this algorithm, reduces the runtime to $\mathcal{O}\left(\frac{d}{d^\star} \cdot g(k, d^\star) + d^2\right)$, where $d^\star \leq d$ is the largest block size of the block-diagonal matrix. For instance, integrating our framework with the Branch-and-Bound algorithm reduces the complexity from $g(k, d) = \mathcal{O}(k^3\cdot d^k)$ to $\mathcal{O}(k^3\cdot d \cdot (d^\star)^{k-1})$, demonstrating exponential speedups if $d^\star$ is small. We perform large-scale evaluations on many real-world datasets: for exact Sparse PCA algorithm, our method achieves an average speedup factor of 93.77, while maintaining an average approximation error of 2.15%; for approximate Sparse PCA algorithm, our method achieves an average speedup factor of 6.77 and an average approximation error of merely 0.37%.
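
A toy instance of the decompose-solve-reconstruct framework: recover the blocks of a (near-)block-diagonal covariance from its support graph, run a simple sparse PCA solver per block, and keep the best block solution. The truncated power iteration stands in for any off-the-shelf solver, and exactly-zero off-block entries replace the paper's re-sorting step.

import numpy as np

def blocks_from_support(S, tol=1e-6):
    """Connected components of the support graph of |S|; after a symmetric
    permutation these are the blocks of a block-diagonal matrix."""
    n = S.shape[0]
    adj = np.abs(S) > tol
    seen, comps = np.zeros(n, dtype=bool), []
    for s in range(n):
        if seen[s]:
            continue
        stack, comp, seen[s] = [s], [], True
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in np.nonzero(adj[u])[0]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        comps.append(sorted(comp))
    return comps

def sparse_pca_block(S, k, iters=100):
    """Truncated power iteration: a simple approximate sparse PCA solver."""
    x = np.linalg.eigh(S)[1][:, -1]
    for _ in range(iters):
        y = S @ x
        x = np.zeros_like(y)
        top = np.argsort(np.abs(y))[-k:]
        x[top] = y[top]
        x /= np.linalg.norm(x)
    return x

def sparse_pca_blockwise(S, k):
    """Solve per block and return the best block-supported solution."""
    best_val, best_x = -np.inf, None
    for comp in blocks_from_support(S):
        sub = S[np.ix_(comp, comp)]
        x_sub = sparse_pca_block(sub, min(k, len(comp)))
        val = float(x_sub @ sub @ x_sub)
        if val > best_val:
            x = np.zeros(S.shape[0])
            x[comp] = x_sub
            best_val, best_x = val, x
    return best_x, best_val

# Two independent variable groups give an exactly block-diagonal toy case.
rng = np.random.default_rng(0)
S = np.zeros((7, 7))
S[:3, :3] = np.cov(rng.normal(size=(20, 3)), rowvar=False)
S[3:, 3:] = np.cov(rng.normal(size=(20, 4)), rowvar=False)
x, val = sparse_pca_blockwise(S, k=2)
print(np.nonzero(x)[0], round(val, 3))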

Updated: 2024-10-18 00:16:10

标题: 通过块对角化实现高效稀疏主成分分析

摘要: 稀疏主成分分析(Sparse PCA)是数据分析和降维中的关键工具。然而,稀疏PCA在理论和实践中都是一个具有挑战性的问题:它被认为是NP难问题,目前的精确方法通常需要指数级的运行时间。在本文中,我们提出了一个新颖的框架,通过(i)用重新排序的块对角矩阵来近似一般输入协方差矩阵,(ii)在每个块中求解稀疏PCA子问题,以及(iii)将解重建回原始问题,来高效地近似稀疏PCA。我们的框架简单而强大:它可以利用任何现成的稀疏PCA算法,并实现显著的计算加速,仅引入一个与块对角矩阵近似误差成线性关系的小的加性误差。假设$g(k, d)$是一个算法在维度$d$和稀疏值$k$下(近似)求解稀疏PCA的运行时间。我们的框架与该算法集成后,将运行时间降低到$\mathcal{O}\left(\frac{d}{d^\star} \cdot g(k, d^\star) + d^2\right)$,其中$d^\star \leq d$是块对角矩阵的最大块大小。例如,将我们的框架与分支定界算法集成后,复杂度从$g(k, d) = \mathcal{O}(k^3\cdot d^k)$降低到$\mathcal{O}(k^3\cdot d \cdot (d^\star)^{k-1})$,当$d^\star$很小时可带来指数级加速。我们在许多真实世界数据集上进行了大规模评估:对于精确的稀疏PCA算法,我们的方法实现了平均93.77倍的加速,同时保持2.15%的平均近似误差;对于近似的稀疏PCA算法,我们的方法实现了平均6.77倍的加速,平均近似误差仅为0.37%。

更新时间: 2024-10-18 00:16:10

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2410.14092v1

Towards Effective Planning Strategies for Dynamic Opinion Networks

In this study, we investigate the under-explored intervention planning aimed at disseminating accurate information within dynamic opinion networks by leveraging learning strategies. Intervention planning involves identifying key nodes (search) and exerting control (e.g., disseminating accurate/official information through the nodes) to mitigate the influence of misinformation. However, as network size increases, the problem becomes computationally intractable. To address this, we first introduce a novel ranking algorithm (search) to identify key nodes for disseminating accurate information, which facilitates the training of neural network (NN) classifiers for scalable and generalized solutions. Second, we address the complexity of label generation (through search) by developing a Reinforcement Learning (RL)-based dynamic planning framework. We investigate NN-based RL planners tailored for dynamic opinion networks governed by two propagation models for the framework. Each model incorporates both binary and continuous opinion and trust representations. Our experimental results demonstrate that our ranking algorithm-based classifiers provide plans that enhance infection rate control, especially with increased action budgets. Moreover, reward strategies focusing on key metrics, such as the number of susceptible nodes and infection rates, outperform those prioritizing faster blocking strategies. Additionally, our findings reveal that Graph Convolutional Networks (GCNs)-based planners facilitate scalable centralized plans that achieve lower infection rates (higher control) across various network scenarios (e.g., Watts-Strogatz topology, varying action budgets, varying initial infected nodes, and varying degree of infected nodes).

Updated: 2024-10-18 00:13:56

标题: 朝向动态观点网络的有效规划策略

摘要: 在这项研究中,我们研究了通过利用学习策略在动态观点网络中传播准确信息的干预规划,这一领域鲜为人知。干预规划涉及识别关键节点(搜索)并施加控制(例如通过节点传播准确/官方信息)以减轻错误信息的影响。然而,随着网络规模的增加,这个问题变得难以计算。为了解决这个问题,我们首先引入了一种新颖的排名算法(搜索)来识别传播准确信息的关键节点,从而便于训练神经网络(NN)分类器以获得可扩展和普遍的解决方案。其次,我们通过开发基于强化学习(RL)的动态规划框架来解决标签生成的复杂性(通过搜索)。我们研究了针对由两种传播模型控制的动态观点网络量身定制的基于NN的RL规划者。每个模型都包含二进制和连续观点和信任表示。我们的实验结果表明,基于排名算法的分类器提供的计划可以增强感染率控制,特别是在行动预算增加时。此外,以关键指标(如易感节点数量和感染率)为重点的奖励策略优于优先考虑更快的阻断策略的策略。此外,我们的发现表明,基于图卷积网络(GCN)的规划者有助于实现可扩展的集中式计划,在各种网络场景(如Watts-Strogatz拓扑结构、不同的行动预算、不同的初始感染节点以及不同程度的感染节点)中实现更低的感染率(更高的控制)。

更新时间: 2024-10-18 00:13:56

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.14091v1

A Statistical Machine Learning Approach for Adapting Reduced-Order Models using Projected Gaussian Process

The Proper Orthogonal Decomposition (POD) computes the optimal basis modes that span a low-dimensional subspace where Reduced-Order Models (ROMs) reside. Because a governing equation is often parameterized by a set of parameters, challenges immediately arise when one would like to investigate how systems behave differently over the parameter space (in design, control, uncertainty quantification, and real-time operations). In this case, the POD basis needs to be updated so as to adapt the ROM so that it accurately captures the variation of a system's behavior over its parameter space. This paper proposes a Projected Gaussian Process (pGP) and formulates the problem of adapting the POD basis as a supervised statistical learning problem, for which the goal is to learn a mapping from the parameter space to the Grassmann manifold that contains the optimal vector subspaces. A mapping is first found between the Euclidean space and the horizontal space of an orthogonal matrix that spans a reference subspace in the Grassmann manifold. Then, a second mapping from the horizontal space to the Grassmann manifold is established through the exponential/logarithmic maps between the manifold and its tangent space. Finally, given a new parameter, the conditional distribution of a vector can be found in the Euclidean space using Gaussian Process (GP) regression, and such a distribution is projected to the Grassmann manifold, yielding the optimal subspace for the new parameter. The proposed statistical learning approach allows us to optimally estimate model parameters given data (i.e., the prediction/interpolation becomes problem-specific), and to quantify the uncertainty associated with the prediction. Numerical examples are presented to demonstrate the advantages of the proposed pGP for adapting the POD basis against parameter changes.
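
The two manifold maps at the core of the construction have standard SVD-based forms, and the idea can be miniaturized: regress tangent vectors against the parameter, then map the predicted tangent back through the exponential map. A kernel-weighted mean stands in for the full GP posterior below.

import numpy as np

def grassmann_log(X, Y):
    """Tangent vector at subspace [X] pointing to [Y]; standard SVD formula
    for n x p matrices with orthonormal columns."""
    XtY = X.T @ Y
    M = (Y - X @ XtY) @ np.linalg.inv(XtY)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.arctan(s)) @ Vt

def grassmann_exp(X, D):
    """Map a tangent vector D at [X] back onto the Grassmann manifold."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    Y = X @ Vt.T @ np.diag(np.cos(s)) @ Vt + U @ np.diag(np.sin(s)) @ Vt
    return np.linalg.qr(Y)[0]                 # re-orthonormalize

rng = np.random.default_rng(0)
n, p = 10, 2
X_ref = np.linalg.qr(rng.normal(size=(n, p)))[0]    # reference subspace
params = np.array([0.0, 0.5, 1.0])                  # training parameters
bases = [np.linalg.qr(rng.normal(size=(n, p)))[0] for _ in params]
tangents = [grassmann_log(X_ref, B) for B in bases]

def predict_subspace(t_new, lengthscale=0.3):
    """Kernel-weighted mean of tangent vectors (GP posterior stand-in),
    pushed back to the manifold through the exponential map."""
    w = np.exp(-((params - t_new) ** 2) / (2 * lengthscale ** 2))
    w /= w.sum()
    return grassmann_exp(X_ref, sum(wi * Di for wi, Di in zip(w, tangents)))

B_new = predict_subspace(0.25)
print(B_new.shape, np.allclose(B_new.T @ B_new, np.eye(p)))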

Updated: 2024-10-18 00:02:43

标题: 一种统计机器学习方法用于使用投影高斯过程调整降阶模型

摘要: Proper Orthogonal Decomposition (POD)计算出了最佳基模式,这些模式构成了一个低维子空间,其中存放着降阶模型(ROMs)。由于一个主控方程通常是由一组参数化的参数所确定的,当我们想要研究系统在参数空间中的不同行为(设计、控制、不确定性量化和实时操作)时,挑战立即出现。在这种情况下,POD基需要更新,以便调整ROM,从而准确捕捉系统行为在其参数空间中的变化。本文提出了一种投影高斯过程(pGP),并将调整POD基的问题表述为一个监督统计学习问题,其目标是学习一个从参数空间到包含最佳向量子空间的Grassmann流形的映射。首先找到了欧几里得空间与跨越Grassmann流形中参考子空间的正交矩阵的水平空间之间的映射。然后,通过流形和其切空间之间的指数/对数映射,在水平空间和Grassmann流形之间建立了第二个映射。最后,给定一个新的参数,可以使用高斯过程(GP)回归在欧几里得空间中找到一个向量的条件分布,并将这种分布投影到Grassmann流形,从而得到新参数的最佳子空间。所提出的统计学习方法允许我们在给定数据时最优地估计模型参数(即预测/插值成为问题特定),并量化与预测相关的不确定性。通过数值示例展示了所提出的pGP在根据参数变化调整POD基方面的优势。

更新时间: 2024-10-18 00:02:43

领域: stat.ML,cs.LG,math.DS,stat.AP

下载: http://arxiv.org/abs/2410.14090v1

By Xinhai (Sean) Zou.