Arxiv Day: Article

Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

Large language models (LLMs) have demonstrated remarkable performance across various real-world tasks. However, they often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated. This difficulty increases for contexts that are long or contain distracting information, which can divert LLMs from fully capturing essential evidence. To address this issue, many works use prompting to help LLMs utilize contextual information more faithfully. For instance, iterative prompting highlights key information in two steps that first ask the LLM to identify important pieces of context and then derive answers accordingly. However, prompting methods are constrained to highlighting key information implicitly in token space, which is often insufficient to fully steer the model's attention. To improve model faithfulness more reliably, we propose AutoPASTA, a method that automatically identifies key contextual information and explicitly highlights it by steering an LLM's attention scores. Like prompting, AutoPASTA is applied at inference time and does not require changing any model parameters. Our experiments on open-book QA demonstrate that AutoPASTA effectively enables models to grasp essential contextual information, leading to substantially improved model faithfulness and performance, e.g., an average improvement of 7.95% for LLAMA3-70B-Instruct. Code will be publicly available at https://github.com/QingruZhang/AutoPASTA .

Updated: 2024-09-16 23:52:41

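Title: The Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

Abstract: Large language models (LLMs) have shown excellent performance on a wide range of real-world tasks. However, they often fail to fully understand and effectively use their input context, which leads to unfaithful or hallucinated answers. The difficulty grows for contexts that are long or contain distracting information, which can keep LLMs from fully capturing the key evidence. To address this, many works use prompting to help LLMs use contextual information more faithfully. For example, iterative prompting highlights key information in two steps: it first asks the LLM to identify the important pieces of context and then derives the answer accordingly. Prompting methods, however, can only highlight key information implicitly in token space, which is often not enough to fully steer the model's attention. To improve model faithfulness more reliably, we propose AutoPASTA, a method that automatically identifies key contextual information and highlights it explicitly by steering the LLM's attention scores. Like prompting, AutoPASTA is applied at inference time and requires no change to the model parameters. Our experiments on open-book QA show that AutoPASTA effectively helps models grasp the key contextual information, substantially improving faithfulness and performance, e.g., an average improvement of 7.95% for LLAMA3-70B-Instruct. Code will be made publicly available at https://github.com/QingruZhang/AutoPASTA .

Updated: 2024-09-16 23:52:41

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.10790v1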
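
As a rough illustration of what "steering attention scores" in the AutoPASTA abstract above can mean, the sketch below down-weights the attention that one query places on non-highlighted context positions and then renormalizes. The function name, the `alpha` coefficient, and the choice of highlighted positions are assumptions made for this example; the abstract does not specify how AutoPASTA selects heads, positions, or scaling values.

```python
def steer_attention(weights, highlighted, alpha=0.01):
    """Down-weight non-highlighted positions, then renormalize to sum to 1.

    weights: attention weights of one query over the context (sum to 1).
    highlighted: set of positions judged to be key evidence.
    alpha: hypothetical scaling coefficient in (0, 1]; not from the abstract.
    """
    scaled = [w if i in highlighted else alpha * w
              for i, w in enumerate(weights)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Uniform attention over four context tokens; tokens 1 and 2 are "key".
weights = [0.25, 0.25, 0.25, 0.25]
steered = steer_attention(weights, highlighted={1, 2}, alpha=0.5)
print(steered)
```

Because the change is applied to attention weights at inference time, no model parameter is modified, which matches the abstract's description of the method as a prompting-like, training-free intervention.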

Physics-Informed Neural Networks with Trust-Region Sequential Quadratic Programming

Physics-Informed Neural Networks (PINNs) represent a significant advancement in Scientific Machine Learning (SciML): they integrate physical domain knowledge into an empirical loss function as soft constraints and apply existing machine learning methods to train the model. However, recent research has noted that PINNs may fail to learn relatively complex Partial Differential Equations (PDEs). This paper addresses the failure modes of PINNs by introducing a novel, hard-constrained deep learning method: trust-region Sequential Quadratic Programming (trSQP-PINN). In contrast to directly training the penalized soft-constrained loss as in PINNs, our method performs a linear-quadratic approximation of the hard-constrained loss, while leveraging the soft-constrained loss to adaptively adjust the trust-region radius. We trust our model approximations and make updates only within the trust region, and this manner of updating can overcome the ill-conditioning issue of PINNs. We also address the computational bottleneck of second-order SQP methods by employing quasi-Newton updates for second-order information, and importantly, we introduce a simple pretraining step to further enhance the training efficiency of our method. We demonstrate the effectiveness of trSQP-PINN through extensive experiments. Compared to existing hard-constrained methods for PINNs, such as penalty methods and augmented Lagrangian methods, trSQP-PINN significantly improves the accuracy of the learned PDE solutions, achieving up to 1-3 orders of magnitude lower errors. Additionally, our pretraining step is generally effective for other hard-constrained methods, and experiments have shown the robustness of our method against both problem-specific parameters and algorithm tuning parameters.

Updated: 2024-09-16 23:22:12

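Title: Physics-Informed Neural Networks with Trust-Region Sequential Quadratic Programming

Abstract: Physics-informed neural networks (PINNs) represent a major advance in scientific machine learning (SciML): they incorporate physical domain knowledge into an empirical loss function as soft constraints and train the model with existing machine learning methods. However, recent research has noted that PINNs may fail to learn relatively complex partial differential equations (PDEs). This paper addresses the failure modes of PINNs by introducing a novel hard-constrained deep learning method, trust-region sequential quadratic programming (trSQP-PINN). Rather than directly training the penalized soft-constrained loss, our method forms a linear-quadratic approximation of the hard-constrained loss while using the soft-constrained loss to adaptively adjust the trust-region radius. We trust our model approximation and make updates only within the trust region, and this way of updating can overcome the ill-conditioning of PINNs. We also address the computational bottleneck of second-order SQP methods by using quasi-Newton updates, and, importantly, we introduce a simple pretraining step that further improves the training efficiency of our method. We demonstrate the effectiveness of trSQP-PINN through extensive experiments. Compared with existing hard-constrained methods for PINNs, such as penalty methods and augmented Lagrangian methods, trSQP-PINN significantly improves the accuracy of the learned PDE solutions, lowering errors by 1-3 orders of magnitude. In addition, our pretraining step is generally effective for other hard-constrained methods as well, and experiments confirm the robustness of our method to both problem-specific parameters and algorithm tuning parameters.

Updated: 2024-09-16 23:22:12

Categories: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2409.10777v1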
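
The trSQP-PINN abstract above says the trust-region radius is adjusted adaptively using the soft-constrained loss. A minimal sketch of the textbook trust-region ratio test is given below; it is not the paper's exact rule. The thresholds 0.25 and 0.75, the shrink and grow factors, and the function name are conventional choices assumed for illustration, with "actual reduction" standing in for the decrease of a merit function such as the soft-constrained loss.

```python
def update_trust_region(radius, actual_reduction, predicted_reduction,
                        shrink=0.5, grow=2.0, max_radius=10.0):
    """Return (new_radius, step_accepted) from the classic ratio test.

    actual_reduction: decrease of the merit function after the trial step.
    predicted_reduction: decrease promised by the local quadratic model.
    All thresholds and factors are assumed, conventional values.
    """
    if predicted_reduction > 0:
        rho = actual_reduction / predicted_reduction
    else:
        rho = 0.0
    if rho < 0.25:
        # The model was a poor predictor: reject the step, shrink the region.
        return radius * shrink, False
    if rho > 0.75:
        # The model was accurate: accept the step, allow larger steps.
        return min(radius * grow, max_radius), True
    # Acceptable agreement: accept the step, keep the radius.
    return radius, True


print(update_trust_region(1.0, 0.9, 1.0))
print(update_trust_region(1.0, 0.1, 1.0))
```

Restricting each update to a region where the local model has proven reliable is what lets a method of this kind take cautious steps on ill-conditioned problems.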

Are Deep Learning Models Robust to Partial Object Occlusion in Visual Recognition Tasks?

Image classification models, including convolutional neural networks (CNNs), perform well on a variety of classification tasks but struggle under conditions of partial occlusion, i.e., conditions in which objects are partially covered from the view of a camera. Methods to improve performance under occlusion, including data augmentation, part-based clustering, and more inherently robust architectures, including Vision Transformer (ViT) models, have, to some extent, been evaluated on their ability to classify objects under partial occlusion. However, evaluations of these methods have largely relied on images containing artificial occlusion, which are typically computer-generated and therefore inexpensive to label. Additionally, methods are rarely compared against each other, and many methods are compared against early, now outdated, deep learning models. We contribute the Image Recognition Under Occlusion (IRUO) dataset, based on the recently developed Occluded Video Instance Segmentation (OVIS) dataset (arXiv:2102.01558). IRUO utilizes real-world and artificially occluded images to test and benchmark leading methods' robustness to partial occlusion in visual recognition tasks. In addition, we contribute the design and results of a human study using images from IRUO that evaluates human classification performance at multiple levels and types of occlusion. We find that modern CNN-based models show improved recognition accuracy on occluded images compared to earlier CNN-based models, and ViT-based models are more accurate than CNN-based models on occluded images, performing only modestly worse than human accuracy. We also find that certain types of occlusion, including diffuse occlusion, where relevant objects are seen through "holes" in occluders such as fences and leaves, can greatly reduce the accuracy of deep recognition models as compared to humans, especially those with CNN backbones.

Updated: 2024-09-16 23:21:22

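Title: Are Deep Learning Models Robust to Partial Object Occlusion in Visual Recognition Tasks?

Abstract: Image classification models, including convolutional neural networks (CNNs), perform well on a variety of classification tasks but perform poorly under partial occlusion, i.e., when objects are partially hidden from the camera's view. Methods for improving performance under occlusion, including data augmentation, part-based clustering, and more inherently robust architectures such as Vision Transformer (ViT) models, have been evaluated to some extent on their ability to classify objects under partial occlusion. However, these evaluations have relied largely on images with artificial occlusion, which are usually computer-generated and therefore cheap to label. Moreover, the methods are rarely compared with one another, and many are compared against early, now outdated deep learning models. We contribute the Image Recognition Under Occlusion (IRUO) dataset, built on the recently developed Occluded Video Instance Segmentation (OVIS) dataset (arXiv:2102.01558). IRUO uses real-world and artificially occluded images to test and benchmark the robustness of leading methods to partial occlusion in visual recognition tasks. We also contribute the design and results of a human study on IRUO images that evaluates human classification performance across multiple levels and types of occlusion. We find that modern CNN-based models recognize occluded images more accurately than earlier CNN-based models, and that ViT-based models are more accurate than CNN-based models on occluded images, falling only slightly short of human accuracy. We also find that certain types of occlusion, including diffuse occlusion, in which the relevant object is seen through "holes" in occluders such as fences and leaves, can greatly reduce the accuracy of deep recognition models relative to humans, especially models with CNN backbones.

Updated: 2024-09-16 23:21:22

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.10775v1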

Provably Efficient Infinite-Horizon Average-Reward Reinforcement Learning with Linear Function Approximation

This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear Markov decision processes (MDPs) and linear mixture MDPs under the Bellman optimality condition. While guaranteeing computational efficiency, our algorithm for linear MDPs achieves the best-known regret upper bound of $\widetilde{\mathcal{O}}(d^{3/2}\mathrm{sp}(v^*)\sqrt{T})$ over $T$ time steps where $\mathrm{sp}(v^*)$ is the span of the optimal bias function $v^*$ and $d$ is the dimension of the feature mapping. For linear mixture MDPs, our algorithm attains a regret bound of $\widetilde{\mathcal{O}}(d\cdot\mathrm{sp}(v^*)\sqrt{T})$. The algorithm applies novel techniques to control the covering number of the value function class and the span of optimistic estimators of the value function, which is of independent interest.

Updated: 2024-09-16 23:13:42

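Title: Provably Efficient Infinite-Horizon Average-Reward Reinforcement Learning with Linear Function Approximation

Abstract: This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear Markov decision processes (MDPs) and linear mixture MDPs under the Bellman optimality condition. While guaranteeing computational efficiency, our algorithm for linear MDPs achieves the best-known regret upper bound of $\widetilde{\mathcal{O}}(d^{3/2}\mathrm{sp}(v^*)\sqrt{T})$ over $T$ time steps, where $\mathrm{sp}(v^*)$ is the span of the optimal bias function $v^*$ and $d$ is the dimension of the feature mapping. For linear mixture MDPs, our algorithm attains a regret bound of $\widetilde{\mathcal{O}}(d\cdot\mathrm{sp}(v^*)\sqrt{T})$. The algorithm applies novel techniques to control the covering number of the value function class and the span of optimistic estimators of the value function, which are of independent interest.

Updated: 2024-09-16 23:13:42

Categories: cs.LG,cs.DS,math.OC

Download: http://arxiv.org/abs/2409.10772v1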

Interactive AI Alignment: Specification, Process, and Evaluation Alignment

Modern AI enables a high-level, declarative form of interaction: Users describe the intended outcome they wish an AI to produce, but do not actually create the outcome themselves. In contrast, in traditional user interfaces, users invoke specific operations to create the desired outcome. This paper revisits the basic input-output interaction cycle in light of this declarative style of interaction, and connects concepts in AI alignment to define three objectives for interactive alignment of AI: specification alignment (aligning on what to do), process alignment (aligning on how to do it), and evaluation alignment (assisting users in verifying and understanding what was produced). Using existing systems as examples, we show how these user-centered views of AI alignment can be used descriptively, prescriptively, and as an evaluative aid.

Updated: 2024-09-16 22:54:33

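Title: Interactive AI Alignment: Specification, Process, and Evaluation Alignment

Abstract: Modern AI enables a high-level, declarative form of interaction: users describe the outcome they want the AI to produce but do not create that outcome themselves. In traditional user interfaces, by contrast, users invoke specific operations to create the desired outcome. This paper revisits the basic input-output interaction cycle in light of this declarative style of interaction, and connects concepts from AI alignment to define three objectives for the interactive alignment of AI: specification alignment (aligning on what to do), process alignment (aligning on how to do it), and evaluation alignment (helping users verify and understand what was produced). Using existing systems as examples, we show how these user-centered views of AI alignment can be used descriptively, prescriptively, and as an evaluative aid.

Updated: 2024-09-16 22:54:33

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2311.00710v2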

Federated Learning for Smart Grid: A Survey on Applications and Potential Vulnerabilities

The Smart Grid (SG) is a critical energy infrastructure that collects real-time electricity usage data to forecast future energy demands using information and communication technologies (ICT). Due to growing concerns about data security and privacy in SGs, federated learning (FL) has emerged as a promising training framework. FL offers a balance between privacy, efficiency, and accuracy in SGs by enabling collaborative model training without sharing private data from IoT devices. In this survey, we thoroughly review recent advancements in designing FL-based SG systems across three stages: generation, transmission and distribution, and consumption. Additionally, we explore potential vulnerabilities that may arise when implementing FL in these stages. Finally, we discuss the gap between state-of-the-art FL research and its practical applications in SGs and propose future research directions. These focus on potential attack and defense strategies for FL-based SG systems and the need to build a robust FL-based SG infrastructure. Unlike traditional surveys that address security issues in centralized machine learning methods for SG systems, this survey specifically examines the applications and security concerns in FL-based SG systems for the first time. Our aim is to inspire further research into applications and improvements in the robustness of FL-based SG systems.

Updated: 2024-09-16 22:42:25

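Title: Federated Learning for Smart Grid: A Survey on Applications and Potential Vulnerabilities

Abstract: The smart grid (SG) is a critical energy infrastructure that uses information and communication technologies (ICT) to collect real-time electricity usage data in order to forecast future energy demand. Because of growing concern about data security and privacy in SGs, federated learning (FL) has emerged as a promising training framework. By enabling collaborative model training without sharing private data from IoT devices, FL strikes a balance between privacy, efficiency, and accuracy in SGs. In this survey, we thoroughly review recent advances in designing FL-based SG systems across three stages: generation, transmission and distribution, and consumption. We also explore potential vulnerabilities that may arise when FL is implemented in these stages. Finally, we discuss the gap between state-of-the-art FL research and its practical application in SGs, and propose future research directions. These directions focus on potential attack and defense strategies for FL-based SG systems and on the need to build a robust FL-based SG infrastructure. Unlike traditional surveys, which concentrate on security issues in centralized machine learning methods for SG systems, this survey is the first to specifically examine the applications and security concerns of FL-based SG systems. Our aim is to inspire further research into the applications of FL-based SG systems and into improving their robustness.

Updated: 2024-09-16 22:42:25

Categories: cs.LG,cs.CR,C.2.4

Download: http://arxiv.org/abs/2409.10764v1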
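
To make concrete the idea in the survey abstract above of "collaborative model training without sharing private data", the sketch below shows the server-side aggregation step of federated averaging (FedAvg), a widely used FL baseline. It is offered as an assumption about a typical setup, not as the method of any specific system in the survey; the client names and numbers are made up.

```python
def federated_average(client_weights, client_sizes):
    """Average client model weights, weighted by local dataset size.

    client_weights: one flat list of floats per client (same length each).
    client_sizes: number of local training samples per client.
    Only these weight vectors leave the clients; raw meter data does not.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


# Two hypothetical smart-meter clients with 1 and 3 local samples.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_model)
```

The vulnerabilities the survey discusses arise at exactly this exchange: a malicious client can submit poisoned weights, and shared updates can leak information about local data.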

High-arity PAC learning via exchangeability

We develop a theory of high-arity PAC learning, which is statistical learning in the presence of "structured correlation". In this theory, hypotheses are either graphs, hypergraphs or, more generally, structures in finite relational languages, and i.i.d. sampling is replaced by sampling an induced substructure, producing an exchangeable distribution. Our main theorems establish a high-arity (agnostic) version of the fundamental theorem of statistical learning.

Updated: 2024-09-16 22:19:25

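Title: High-arity PAC learning via exchangeability

Abstract: We develop a theory of high-arity PAC learning, which is statistical learning in the presence of "structured correlation". In this theory, hypotheses are graphs, hypergraphs or, more generally, structures in finite relational languages, and i.i.d. sampling is replaced by sampling an induced substructure, which produces an exchangeable distribution. Our main theorems establish a high-arity (agnostic) version of the fundamental theorem of statistical learning.

Updated: 2024-09-16 22:19:25

Categories: cs.LG,math.LO,math.ST,stat.TH,Primary: 68Q32. Secondary: 60F05, 60F15, 03C99

Download: http://arxiv.org/abs/2402.14294v3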

Interpolation with deep neural networks with non-polynomial activations: necessary and sufficient numbers of neurons

The minimal number of neurons required for a feedforward neural network to interpolate $n$ generic input-output pairs from $\mathbb{R}^d\times \mathbb{R}^{d'}$ is $\Theta(\sqrt{nd'})$. While previous results have shown that $\Theta(\sqrt{nd'})$ neurons are sufficient, they have been limited to sigmoid, Heaviside, and rectified linear unit (ReLU) as the activation function. Using a different approach, we prove that $\Theta(\sqrt{nd'})$ neurons are sufficient as long as the activation function is real analytic at a point and not a polynomial there. Thus, the only practical activation functions that our result does not apply to are piecewise polynomials. Importantly, this means that activation functions can be freely chosen in a problem-dependent manner without loss of interpolation power.

Updated: 2024-09-16 22:14:55

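Title: Interpolation with deep neural networks with non-polynomial activations: necessary and sufficient numbers of neurons

Abstract: The minimal number of neurons a feedforward neural network needs to interpolate $n$ generic input-output pairs from $\mathbb{R}^d\times\mathbb{R}^{d'}$ is $\Theta(\sqrt{nd'})$. Previous results have shown that $\Theta(\sqrt{nd'})$ neurons are sufficient, but only when the activation function is sigmoid, Heaviside, or the rectified linear unit (ReLU). Using a different approach, we prove that $\Theta(\sqrt{nd'})$ neurons are sufficient as long as the activation function is real analytic at some point and not a polynomial there. Thus, the only practical activation functions to which our result does not apply are piecewise polynomials. Importantly, this means that the activation function can be chosen freely in a problem-dependent manner without loss of interpolation power.

Updated: 2024-09-16 22:14:55

Categories: cs.LG,math.OC,15A03, 26B10

Download: http://arxiv.org/abs/2405.13738v2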
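
A short worked example may help with the $\Theta(\sqrt{nd'})$ rate in the abstract above. The usual counting heuristic is that interpolating $n$ pairs with $d'$-dimensional outputs imposes $n d'$ scalar constraints, while two adjacent hidden layers of width $w$ share about $w^2$ weights, so $w$ on the order of $\sqrt{nd'}$ is the natural scale. The helper below only evaluates $\sqrt{nd'}$; the constants hidden by $\Theta$ are not given in the abstract, and the function name and sample sizes are assumptions.

```python
import math


def neuron_scale(n_pairs, d_out):
    """Return sqrt(n * d'), the order of the required neuron count.

    This ignores the constant factors hidden inside the Theta notation.
    """
    return math.sqrt(n_pairs * d_out)


# 10,000 generic samples with 4-dimensional outputs.
print(neuron_scale(10_000, 4))
# Quadrupling the sample count only doubles the scale.
print(neuron_scale(40_000, 4))
```

Note that the input dimension $d$ does not appear in the rate, only the number of pairs and the output dimension.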

DrugAgent: Explainable Drug Repurposing Agent with Large Language Model-based Reasoning

Drug repurposing offers a promising avenue for accelerating drug development by identifying new therapeutic potentials of existing drugs. In this paper, we propose a multi-agent framework to enhance the drug repurposing process using state-of-the-art machine learning techniques and knowledge integration. Our framework comprises several specialized agents: an AI Agent trains robust drug-target interaction (DTI) models; a Knowledge Graph Agent utilizes the drug-gene interaction database (DGIdb), DrugBank, Comparative Toxicogenomics Database (CTD), and Search Tool for Interactions of Chemicals (STITCH) to systematically extract DTIs; and a Search Agent interacts with biomedical literature to annotate and verify computational predictions. By integrating outputs from these agents, our system effectively harnesses diverse data sources, including external databases, to propose viable repurposing candidates. Preliminary results demonstrate the potential of our approach in not only predicting drug-disease interactions but also in reducing the time and cost associated with traditional drug discovery methods. This paper highlights the scalability of multi-agent systems in biomedical research and their role in driving innovation in drug repurposing. Our approach not only outperforms existing methods in predicting drug repurposing potential but also provides interpretable results, paving the way for more efficient and cost-effective drug discovery processes.

Updated: 2024-09-16 22:13:30

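Title: DrugAgent: Explainable Drug Repurposing Agent with Large Language Model-based Reasoning

Abstract: Drug repurposing offers a promising way to accelerate drug development by discovering new therapeutic potential in existing drugs. In this paper, we propose a multi-agent framework that enhances the drug repurposing process using state-of-the-art machine learning techniques and knowledge integration. Our framework comprises several specialized agents: an AI Agent trains robust drug-target interaction (DTI) models; a Knowledge Graph Agent uses the drug-gene interaction database (DGIdb), DrugBank, the Comparative Toxicogenomics Database (CTD), and the Search Tool for Interactions of Chemicals (STITCH) to systematically extract DTIs; and a Search Agent interacts with the biomedical literature to annotate and verify computational predictions. By integrating the outputs of these agents, our system effectively draws on diverse data sources, including external databases, to propose viable repurposing candidates. Preliminary results show the potential of our approach not only for predicting drug-disease interactions but also for reducing the time and cost of traditional drug discovery methods. This paper highlights the scalability of multi-agent systems in biomedical research and their role in driving innovation in drug repurposing. Our approach not only outperforms existing methods in predicting drug repurposing potential but also provides interpretable results, paving the way for more efficient and cost-effective drug discovery processes.

Updated: 2024-09-16 22:13:30

Categories: cs.AI,cs.CL,cs.IR,cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2408.13378v3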

VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching

Large Language Models (LLMs) have shown promise in tasks like code translation, prompting interest in their potential for automating software vulnerability detection (SVD) and patching (SVP). To further research in this area, establishing a benchmark is essential for evaluating the strengths and limitations of LLMs in these tasks. Despite their capabilities, questions remain regarding whether LLMs can accurately analyze complex vulnerabilities and generate appropriate patches. This paper introduces VulnLLMEval, a framework designed to assess the performance of LLMs in identifying and patching vulnerabilities in C code. Our study includes 307 real-world vulnerabilities extracted from the Linux kernel, creating a well-curated dataset that includes both vulnerable and patched code. This dataset, based on real-world code, provides a diverse and representative testbed for evaluating LLM performance in SVD and SVP tasks, offering a robust foundation for rigorous assessment. Our results reveal that LLMs often struggle with distinguishing between vulnerable and patched code. Furthermore, in SVP tasks, these models tend to oversimplify the code, producing solutions that may not be directly usable without further refinement.

Updated: 2024-09-16 22:00:20

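Title: VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching

Abstract: Large language models (LLMs) have shown promise in tasks such as code translation, prompting interest in their potential for automating software vulnerability detection (SVD) and patching (SVP). To advance research in this area, establishing a benchmark is essential for evaluating the strengths and limitations of LLMs on these tasks. Despite their capabilities, questions remain about whether LLMs can accurately analyze complex vulnerabilities and generate appropriate patches. This paper introduces VulnLLMEval, a framework designed to assess how well LLMs identify and patch vulnerabilities in C code. Our study includes 307 real-world vulnerabilities extracted from the Linux kernel, forming a carefully curated dataset that contains both vulnerable and patched code. Being based on real-world code, the dataset provides a diverse and representative testbed for evaluating LLM performance on SVD and SVP tasks and a solid foundation for rigorous assessment. Our results show that LLMs often struggle to distinguish vulnerable code from patched code. Furthermore, in SVP tasks these models tend to oversimplify the code, producing solutions that may not be directly usable without further refinement.

Updated: 2024-09-16 22:00:20

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2409.10756v1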

SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages

Question Answering (QA) datasets have been instrumental in developing and evaluating Large Language Model (LLM) capabilities. However, such datasets are scarce for languages other than English due to the cost and difficulties of collection and manual annotation. This means that producing novel models and measuring the performance of multilingual LLMs in low-resource languages is challenging. To mitigate this, we propose SynDARin, a method for generating and validating QA datasets for low-resource languages. We utilize parallel content mining to obtain human-curated paragraphs between English and the target language. We use the English data as context to generate synthetic multiple-choice (MC) question-answer pairs, which are automatically translated and further validated for quality. Combining these with their designated non-English human-curated paragraphs forms the final QA dataset. The method maintains content quality, reduces the likelihood of factual errors, and circumvents the need for costly annotation. To test the method, we created a QA dataset with 1.2K samples for the Armenian language. The human evaluation shows that 98% of the generated English data maintains quality and diversity in the question types and topics, while the translation validation pipeline can filter out ~70% of data with poor quality. We use the dataset to benchmark state-of-the-art LLMs, showing their inability to achieve human accuracy, with some model performances closer to random chance. This shows that the generated dataset is non-trivial and can be used to evaluate reasoning capabilities in low-resource languages.

Updated: 2024-09-16 21:52:55

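Title: SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages

Abstract: Question answering (QA) datasets have played a key role in developing and evaluating the capabilities of large language models (LLMs). However, because of the cost and difficulty of collection and manual annotation, such datasets are scarce for languages other than English. This makes it challenging to produce new models and to measure the performance of multilingual LLMs in low-resource languages. To mitigate this, we propose SynDARin, a method for generating and validating QA datasets for low-resource languages. We use parallel content mining to obtain human-curated paragraphs between English and the target language. We use the English data as context to generate synthetic multiple-choice (MC) question-answer pairs, which are automatically translated and further validated for quality. Combining these with their designated non-English human-curated paragraphs forms the final QA dataset. The method maintains content quality, reduces the likelihood of factual errors, and avoids the need for costly annotation. To test the method, we created a QA dataset with 1.2K samples for the Armenian language. Human evaluation shows that 98% of the generated English data maintains quality and diversity in question types and topics, while the translation validation pipeline can filter out about 70% of the poor-quality data. We use the dataset to benchmark state-of-the-art LLMs, showing that they cannot reach human accuracy, with some models performing close to random chance. This shows that the generated dataset is non-trivial and can be used to evaluate reasoning capabilities in low-resource languages.

Updated: 2024-09-16 21:52:55

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.14425v3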

Scaling Law Hypothesis for Multimodal Model

We propose a scaling law hypothesis for multimodal models processing text, audio, images, and video within a shared token and embedding space. Our framework predicts model performance based on modality-specific compression and tokenization efficiency, extending established scaling laws from text-based decoder models to mixed-modality systems. We explore whether leveraging more training data in multiple modalities can reduce the size of the multimodal model, enabling efficient deployment on resource-constrained devices.

Updated: 2024-09-16 21:24:30

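Title: Scaling Law Hypothesis for Multimodal Model

Abstract: We propose a scaling law hypothesis for multimodal models that process text, audio, images, and video within a shared token and embedding space. Our framework predicts model performance from modality-specific compression and tokenization efficiency, extending established scaling laws for text-based decoder models to mixed-modality systems. We explore whether using more training data across multiple modalities can reduce the size of the multimodal model, enabling efficient deployment on resource-constrained devices.

Updated: 2024-09-16 21:24:30

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.06754v2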

AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing

Recent advancements in automatic code generation using large language models (LLMs) have brought us closer to fully automated secure software development. However, existing approaches often rely on a single agent for code generation, which struggles to produce secure, vulnerability-free code. Traditional program synthesis with LLMs has primarily focused on functional correctness, often neglecting critical dynamic security implications that happen during runtime. To address these challenges, we propose AutoSafeCoder, a multi-agent framework that leverages LLM-driven agents for code generation, vulnerability analysis, and security enhancement through continuous collaboration. The framework consists of three agents: a Coding Agent responsible for code generation, a Static Analyzer Agent identifying vulnerabilities, and a Fuzzing Agent performing dynamic testing using a mutation-based fuzzing approach to detect runtime errors. Our contribution focuses on ensuring the safety of multi-agent code generation by integrating dynamic and static testing in an iterative process during code generation by LLM that improves security. Experiments using the SecurityEval dataset demonstrate a 13% reduction in code vulnerabilities compared to baseline LLMs, with no compromise in functionality.

Updated: 2024-09-16 21:15:56

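Title: AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing

Abstract: Recent advances in automatic code generation with large language models (LLMs) have brought us closer to fully automated secure software development. However, existing approaches usually rely on a single agent for code generation, which struggles to produce secure, vulnerability-free code. Traditional program synthesis with LLMs has focused mainly on functional correctness and often neglects critical dynamic security implications that arise at runtime. To address these challenges, we propose AutoSafeCoder, a multi-agent framework that uses LLM-driven agents for code generation, vulnerability analysis, and security enhancement through continuous collaboration. The framework consists of three agents: a Coding Agent responsible for code generation, a Static Analyzer Agent that identifies vulnerabilities, and a Fuzzing Agent that performs dynamic testing with a mutation-based fuzzing approach to detect runtime errors. Our contribution focuses on ensuring the safety of multi-agent code generation by integrating dynamic and static testing into an iterative process during LLM code generation, thereby improving security. Experiments on the SecurityEval dataset show a 13% reduction in code vulnerabilities compared with baseline LLMs, with no loss of functionality.

Updated: 2024-09-16 21:15:56

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2409.10737v1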
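
The three-agent loop described in the AutoSafeCoder abstract above (generate, statically analyze, fuzz, then regenerate with feedback) can be sketched as follows. Everything here is a toy stand-in: `fake_coding_agent` replaces the LLM, `static_check` is a substring match rather than a real analyzer, and the mutation operators are assumptions chosen for brevity. It shows only the control flow, not the paper's implementation.

```python
import random


def mutate(seed, rng):
    """Mutation-based fuzzing step: flip, insert, or delete one character."""
    chars = list(seed)
    op = rng.choice(["flip", "insert", "delete"])
    pos = rng.randrange(len(chars)) if chars else 0
    if op == "flip" and chars:
        chars[pos] = chr(rng.randrange(32, 127))
    elif op == "insert":
        chars.insert(pos, chr(rng.randrange(32, 127)))
    elif op == "delete" and chars:
        del chars[pos]
    return "".join(chars)


def fuzz(func, seeds, rng, trials=200):
    """Return mutated inputs that make func raise an exception."""
    crashes = []
    for _ in range(trials):
        data = mutate(rng.choice(seeds), rng)
        try:
            func(data)
        except Exception:
            crashes.append(data)
    return crashes


def static_check(source):
    """Toy static analyzer: flag one dangerous call by substring match."""
    return ["use of eval"] if "eval(" in source else []


def secure_loop(generate, seeds, max_rounds=3, seed=0):
    """Regenerate code until both checks pass or the round budget runs out."""
    rng = random.Random(seed)
    feedback = []
    for round_no in range(max_rounds):
        source, func = generate(round_no, feedback)
        crashes = fuzz(func, seeds, rng)
        feedback = static_check(source) + [f"crash on {c!r}" for c in crashes]
        if not feedback:
            break
    return source, feedback


def unsafe_parse(s):
    return int(s)


def safe_parse(s):
    try:
        return int(s)
    except ValueError:
        return None


def fake_coding_agent(round_no, feedback):
    """Hypothetical Coding Agent: returns a fix once it receives feedback."""
    if feedback:
        return "int(s) guarded by except ValueError", safe_parse
    return "return int(s)", unsafe_parse


final_source, remaining = secure_loop(fake_coding_agent, seeds=["42", "7"])
print(final_source, remaining)
```

The first candidate crashes on mutated non-numeric input, so the fuzzing feedback triggers a second round, in which the guarded version passes both checks.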

Trustworthy Conceptual Explanations for Neural Networks in Robot Decision-Making

Black box neural networks are an indispensable part of modern robots. Nevertheless, deploying such high-stakes systems in real-world scenarios poses significant challenges when the stakeholders, such as engineers and legislative bodies, lack insights into the neural networks' decision-making process. Presently, explainable AI is primarily tailored to natural language processing and computer vision, falling short in two critical aspects when applied in robots: grounding in decision-making tasks and the ability to assess trustworthiness of their explanations. In this paper, we introduce a trustworthy explainable robotics technique based on human-interpretable, high-level concepts that attribute to the decisions made by the neural network. Our proposed technique provides explanations with associated uncertainty scores by matching neural network's activations with human-interpretable visualizations. To validate our approach, we conducted a series of experiments with various simulated and real-world robot decision-making models, demonstrating the effectiveness of the proposed approach as a post-hoc, human-friendly robot learning diagnostic tool.

Updated: 2024-09-16 21:11:12

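Title: Trustworthy Conceptual Explanations for Neural Networks in Robot Decision-Making

Abstract: Black-box neural networks are an indispensable part of modern robots. Nevertheless, deploying such high-stakes systems in real-world scenarios poses significant challenges when stakeholders, such as engineers and legislative bodies, lack insight into the neural networks' decision-making process. At present, explainable AI is tailored mainly to natural language processing and computer vision, and it falls short in two critical respects when applied to robots: grounding in decision-making tasks and the ability to assess the trustworthiness of its explanations. In this paper, we introduce a trustworthy explainable robotics technique based on human-interpretable, high-level concepts to which the neural network's decisions are attributed. Our technique provides explanations with associated uncertainty scores by matching the neural network's activations with human-interpretable visualizations. To validate our approach, we conducted a series of experiments with various simulated and real-world robot decision-making models, demonstrating the effectiveness of the proposed approach as a post-hoc, human-friendly diagnostic tool for robot learning.

Updated: 2024-09-16 21:11:12

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2409.10733v1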

Generalized Measures of Anticipation and Responsivity in Online Language Processing

We introduce a generalization of classic information-theoretic measures of predictive uncertainty in online language processing, based on the simulation of expected continuations of incremental linguistic contexts. Our framework provides a formal definition of anticipatory and responsive measures, and it equips experimenters with the tools to define new, more expressive measures beyond standard next-symbol entropy and surprisal. While extracting these standard quantities from language models is convenient, we demonstrate that using Monte Carlo simulation to estimate alternative responsive and anticipatory measures pays off empirically: New special cases of our generalized formula exhibit enhanced predictive power compared to surprisal for human cloze completion probability as well as ELAN, LAN, and N400 amplitudes, and greater complementarity with surprisal in predicting reading times.

Updated: 2024-09-16 21:05:15

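Title: Generalized Measures of Anticipation and Responsivity in Online Language Processing

Abstract: We introduce a generalization of classic information-theoretic measures of predictive uncertainty in online language processing, based on simulating the expected continuations of incremental linguistic contexts. Our framework provides a formal definition of anticipatory and responsive measures, and it gives experimenters the tools to define new, more expressive measures beyond standard next-symbol entropy and surprisal. Although extracting these standard quantities from language models is convenient, we show that using Monte Carlo simulation to estimate alternative responsive and anticipatory measures pays off empirically: new special cases of our generalized formula have greater predictive power than surprisal for human cloze completion probability and for ELAN, LAN, and N400 amplitudes, and greater complementarity with surprisal in predicting reading times.

Updated: 2024-09-16 21:05:15

Categories: cs.CL,cs.AI,cs.IT,math.IT

Download: http://arxiv.org/abs/2409.10728v1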
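
For readers unfamiliar with the two standard quantities named in the abstract above, the sketch below computes next-symbol surprisal and entropy for a toy distribution and then re-estimates the entropy by Monte Carlo sampling, the simplest case of estimating a measure from simulated continuations. The distribution, sample count, and function names are made up for illustration; the paper's generalized measures are not reproduced here.

```python
import math
import random


def surprisal(dist, symbol):
    """Responsive measure: -log2 p(symbol | context), in bits."""
    return -math.log2(dist[symbol])


def entropy(dist):
    """Anticipatory measure: expected surprisal of the next symbol."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mc_entropy(dist, n_samples=20000, seed=0):
    """Estimate entropy by averaging surprisal over sampled continuations."""
    rng = random.Random(seed)
    symbols = list(dist)
    weights = [dist[s] for s in symbols]
    samples = rng.choices(symbols, weights=weights, k=n_samples)
    return sum(surprisal(dist, s) for s in samples) / n_samples


# Hypothetical next-word distribution after some context.
next_word = {"dog": 0.5, "cat": 0.25, "fish": 0.25}
print(surprisal(next_word, "cat"))
print(entropy(next_word))
print(mc_entropy(next_word))
```

Surprisal is computed after the next symbol is observed, whereas entropy is available before it arrives; the sampling estimator matters once the quantity of interest has no closed form, as with multi-symbol continuations.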

Deterministic Bounds in Committee Selection: Enhancing Decentralization and Scalability in Distributed Ledgers

Consensus plays a crucial role in distributed ledger systems, impacting both scalability and decentralization. Many blockchain systems use a weighted lottery based on a scarce resource such as a stake, storage, memory, or computing power to select a committee whose members drive the consensus and are responsible for adding new information to the ledger. Therefore, ensuring a robust and fair committee selection process is essential for maintaining security, efficiency, and decentralization. There are two main approaches to randomized committee selection. In one approach, each validator candidate locally checks whether they are elected to the committee and reveals their proof during the consensus phase. In contrast, in the second approach, a sortition algorithm decides a fixed-sized committee that is globally verified. This paper focuses on the latter approach, with cryptographic sortition as a method for fair committee selection that guarantees a constant committee size. Our goal is to develop deterministic guarantees that strengthen decentralization. We introduce novel methods that provide deterministic bounds on the influence of adversaries within the committee, as evidenced by numerical experiments. This approach overcomes the limitations of existing protocols that only offer probabilistic guarantees, often providing large committees that are impractical for many quorum-based applications like atomic broadcast and randomness beacon protocols.

Updated: 2024-09-16 21:02:59

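Title: Deterministic Bounds in Committee Selection: Enhancing Decentralization and Scalability in Distributed Ledgers

Abstract: Consensus plays a crucial role in distributed ledger systems, affecting both scalability and decentralization. Many blockchain systems use a weighted lottery based on a scarce resource, such as stake, storage, memory, or computing power, to select a committee whose members drive consensus and are responsible for adding new information to the ledger. Ensuring a robust and fair committee selection process is therefore essential for maintaining security, efficiency, and decentralization. There are two main approaches to randomized committee selection. In the first, each validator candidate checks locally whether it has been elected to the committee and reveals its proof during the consensus phase. In the second, by contrast, a sortition algorithm decides a fixed-size committee that is globally verified. This paper focuses on the latter approach, using cryptographic sortition as a method of fair committee selection that guarantees a constant committee size. Our goal is to develop deterministic guarantees that strengthen decentralization. We introduce novel methods that provide deterministic bounds on the influence of adversaries within the committee, as evidenced by numerical experiments. This approach overcomes the limitations of existing protocols, which offer only probabilistic guarantees and often yield large committees that are impractical for many quorum-based applications such as atomic broadcast and randomness beacon protocols.

Updated: 2024-09-16 21:02:59

Categories: cs.DC,cs.CR

Download: http://arxiv.org/abs/2409.10727v1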
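
The "weighted lottery" that selects a fixed-size, globally verifiable committee, as described in the abstract above, can be illustrated with a generic weighted sampling scheme. The sketch below assigns each validator the key u^(1/stake), where u is derived by hashing a shared beacon value together with the validator's name, and keeps the validators with the largest keys. This is a standard weighted-sampling technique used here as an assumption; it is not the paper's sortition protocol and provides none of its deterministic bounds. Validator names, stakes, and the beacon are made up.

```python
import hashlib


def select_committee(stakes, size, beacon):
    """Pick `size` validators, favoring larger stakes.

    stakes: dict mapping validator name to a positive stake.
    beacon: shared random bytes, so anyone can recompute the result.
    """
    def key(validator):
        digest = hashlib.sha256(beacon + validator.encode()).digest()
        # Map the first 8 bytes of the hash to a number in (0, 1].
        u = (int.from_bytes(digest[:8], "big") + 1) / (2 ** 64 + 1)
        # A larger stake pushes the key toward 1, raising selection odds.
        return u ** (1.0 / stakes[validator])

    return sorted(stakes, key=key, reverse=True)[:size]


stakes = {"a": 50, "b": 30, "c": 15, "d": 5}
committee = select_committee(stakes, size=2, beacon=b"round-1")
print(committee)
```

Because the keys depend only on public inputs, every node recomputes the same committee, which is the "globally verified" property; the committee size is fixed by construction rather than varying from round to round.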

AI Consciousness is Inevitable: A Theoretical Computer Science Perspective

We look at consciousness through the lens of Theoretical Computer Science, a branch of mathematics that studies computation under resource limitations. From this perspective, we develop a formal machine model for consciousness. The model is inspired by Alan Turing's simple yet powerful model of computation and Bernard Baars' theater model of consciousness. Though extremely simple, the model aligns at a high level with many of the major scientific theories of human and animal consciousness, supporting our claim that machine consciousness is inevitable.

Updated: 2024-09-16 20:58:36

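Title: AI Consciousness is Inevitable: A Theoretical Computer Science Perspective

Abstract: We look at consciousness through the lens of theoretical computer science, a branch of mathematics that studies computation under resource limitations. From this perspective, we develop a formal machine model of consciousness. The model is inspired by Alan Turing's simple yet powerful model of computation and Bernard Baars' theater model of consciousness. Though extremely simple, the model aligns at a high level with many of the major scientific theories of human and animal consciousness, supporting our claim that machine consciousness is inevitable.

Updated: 2024-09-16 20:58:36

Categories: cs.AI,68T01,F.1; I.2

Download: http://arxiv.org/abs/2403.17101v7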

Mean-AP Guided Reinforced Active Learning for Object Detection

Active learning strategies aim to train high-performance models with minimal labeled data by selecting the most informative instances for labeling. However, existing methods for assessing data informativeness often fail to align directly with task model performance metrics, such as mean average precision (mAP) in object detection. This paper introduces Mean-AP Guided Reinforced Active Learning for Object Detection (MGRAL), a novel approach that leverages the concept of expected model output changes as informativeness for deep detection networks, directly optimizing the sampling strategy using mAP. MGRAL employs a reinforcement learning agent based on LSTM architecture to efficiently navigate the combinatorial challenge of batch sample selection and the non-differentiable nature between performance and selected batches. The agent optimizes selection using policy gradient with mAP improvement as the reward signal. To address the computational intensity of mAP estimation with unlabeled samples, we implement fast look-up tables, ensuring real-world feasibility. We evaluate MGRAL on PASCAL VOC and MS COCO benchmarks across various backbone architectures. Our approach demonstrates strong performance, establishing a new paradigm in reinforcement learning-based active learning for object detection.

Updated: 2024-09-16 20:54:36

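Title: Mean-AP Guided Reinforced Active Learning for Object Detection

Abstract: Active learning strategies aim to train high-performance models with minimal labeled data by selecting the most informative instances for labeling. However, existing methods for assessing data informativeness often fail to align directly with task model performance metrics, such as mean average precision (mAP) in object detection. This paper introduces Mean-AP Guided Reinforced Active Learning for Object Detection (MGRAL), a new approach that uses expected model output change as the informativeness measure for deep detection networks and optimizes the sampling strategy directly with mAP. MGRAL uses a reinforcement learning agent based on an LSTM architecture to efficiently handle the combinatorial challenge of batch sample selection and the non-differentiable relationship between performance and the selected batches. The agent optimizes selection by policy gradient, with mAP improvement as the reward signal. To address the computational cost of estimating mAP with unlabeled samples, we implement fast look-up tables, ensuring real-world feasibility. We evaluate MGRAL on the PASCAL VOC and MS COCO benchmarks across various backbone architectures. Our approach performs strongly, establishing a new paradigm for reinforcement learning-based active learning in object detection.

Updated: 2024-09-16 20:54:36

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2310.08387v2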

Scalable Differential Privacy Mechanisms for Real-Time Machine Learning Applications

Large language models (LLMs) are increasingly integrated into real-time machine learning applications, where safeguarding user privacy is paramount. Traditional differential privacy mechanisms often struggle to balance privacy and accuracy, particularly in fast-changing environments with continuously flowing data. To address these issues, we introduce Scalable Differential Privacy (SDP), a framework tailored for real-time machine learning that emphasizes both robust privacy guarantees and enhanced model performance. SDP employs a hierarchical architecture to facilitate efficient noise aggregation across various learning agents. By integrating adaptive noise scheduling and gradient compression methods, our approach minimizes performance degradation while ensuring significant privacy protection. Extensive experiments on diverse datasets reveal that SDP maintains high accuracy levels while applying differential privacy effectively, showcasing its suitability for deployment in sensitive domains. This advancement points towards the potential for widespread adoption of privacy-preserving techniques in machine learning workflows.

Updated: 2024-09-16 20:52:04

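Title: Scalable Differential Privacy Mechanisms for Real-Time Machine Learning Applications

Abstract: Large language models (LLMs) are increasingly integrated into real-time machine learning applications, where safeguarding user privacy is paramount. Traditional differential privacy mechanisms often struggle to balance privacy and accuracy in fast-changing environments, particularly with continuously flowing data. To address these issues, we introduce Scalable Differential Privacy (SDP), a framework tailored for real-time machine learning that emphasizes both robust privacy guarantees and enhanced model performance. SDP employs a hierarchical architecture to facilitate efficient noise aggregation across various learning agents. By integrating adaptive noise scheduling and gradient compression methods, our approach minimizes performance degradation while ensuring significant privacy protection. Extensive experiments on diverse datasets show that SDP maintains high accuracy while applying differential privacy, demonstrating its suitability for deployment in sensitive domains. This advance points towards the potential for widespread adoption of privacy-preserving techniques in machine learning workflows.

Updated: 2024-09-16 20:52:04

Categories: cs.CR

Download: http://arxiv.org/abs/2410.02462v1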
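The SDP abstract above does not spell out its mechanism, so the following is only a minimal sketch of the standard building block that differentially private training usually rests on: clip each per-example gradient to a fixed norm, then add Gaussian noise to the sum. It is not SDP's hierarchical aggregation, adaptive noise schedule, or gradient compression; the function names and the `noise_multiplier` parameter are illustrative assumptions.

```python
import math
import random


def clip(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(grads, max_norm, noise_multiplier, rng):
    """Clip per-example gradients, sum them, add Gaussian noise, and average.

    The noise standard deviation is noise_multiplier * max_norm, i.e. it is
    calibrated to the largest change a single example can cause in the sum.
    """
    clipped = [clip(g, max_norm) for g in grads]
    dim = len(grads[0])
    summed = [sum(g[d] for g in clipped) for d in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(grads) for x in noisy]


# Hypothetical per-example gradients for a two-parameter model.
rng = random.Random(0)
example_grads = [[3.0, 4.0], [0.3, 0.4], [-1.0, 2.0]]
print(private_mean_gradient(example_grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

A larger `noise_multiplier` gives stronger privacy at the cost of a noisier update; a scheme with adaptive noise scheduling, as the abstract mentions, would presumably vary this quantity over time.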

A Missing Data Imputation GAN for Character Sprite Generation

Creating and updating pixel art character sprites with many frames spanning different animations and poses takes time and can quickly become repetitive. However, this work can be partially automated to allow artists to focus on more creative tasks. In this work, we concentrate on creating pixel art character sprites in a target pose from images of them facing the other three directions. We present a novel approach to character generation by framing the problem as a missing data imputation task. Our proposed generative adversarial network model receives the images of a character in all available domains and produces the image of the missing pose. We evaluated our approach in scenarios with one, two, and three missing images, achieving results similar to or better than the state of the art when more images are available. We also evaluate the impact of the proposed changes to the base architecture.

Updated: 2024-09-16 20:50:32

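Title: A Missing Data Imputation GAN for Character Sprite Generation

Abstract: Creating and updating pixel art character sprites with many frames spanning different animations and poses takes time and can quickly become repetitive. However, this work can be partially automated so that artists can focus on more creative tasks. In this work, we concentrate on creating pixel art character sprites in a target pose from images of them facing the other three directions. We present a new approach that frames the problem as a missing data imputation task. Our proposed generative adversarial network model receives the images of a character in all available domains and produces the image of the missing pose. We evaluated our approach in scenarios with one, two, and three missing images, achieving similar or better results when more images are available. We also evaluate the impact of the proposed changes to the base architecture.

Updated: 2024-09-16 20:50:32

Categories: cs.CV,cs.AI,cs.GR

Download: http://arxiv.org/abs/2409.10721v1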

On the effects of similarity metrics in decentralized deep learning under distributional shift

Decentralized Learning (DL) enables privacy-preserving collaboration among organizations or users to enhance the performance of local deep learning models. However, model aggregation becomes challenging when client data is heterogeneous, and identifying compatible collaborators without direct data exchange remains a pressing issue. In this paper, we investigate the effectiveness of various similarity metrics in DL for identifying peers for model merging, conducting an empirical analysis across multiple datasets with distribution shifts. Our research provides insights into the performance of these metrics, examining their role in facilitating effective collaboration. By exploring the strengths and limitations of these metrics, we contribute to the development of robust DL methods.

Updated: 2024-09-16 20:48:16

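Title: On the effects of similarity metrics in decentralized deep learning under distributional shift

Abstract: Decentralized Learning (DL) enables privacy-preserving collaboration among organizations or users to enhance the performance of local deep learning models. However, model aggregation becomes challenging when client data is heterogeneous, and identifying compatible collaborators without direct data exchange remains a pressing issue. This paper investigates the effectiveness of various similarity metrics in DL for identifying peers for model merging, with an empirical analysis across multiple datasets with distribution shifts. Our research provides insights into the performance of these metrics and examines their role in facilitating effective collaboration. By exploring the strengths and limitations of these metrics, we contribute to the development of robust DL methods.

Updated: 2024-09-16 20:48:16

Categories: cs.LG

Download: http://arxiv.org/abs/2409.10720v1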

Online Learning via Memory: Retrieval-Augmented Detector Adaptation

This paper presents a novel way of adapting any off-the-shelf object detection model online to a novel domain without retraining the detector model. Inspired by how humans quickly learn knowledge of a new subject (e.g., memorization), we allow the detector to look up similar object concepts from memory during test time. This is achieved through a retrieval augmented classification (RAC) module together with a memory bank that can be flexibly updated with new domain knowledge. We experimented with various off-the-shelf open-set and closed-set detectors. With only a tiny memory bank (e.g., 10 images per category) and being training-free, our online learning method could significantly outperform baselines in adapting a detector to novel domains.

Updated: 2024-09-16 20:40:26

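Title: Online Learning via Memory: Retrieval-Augmented Detector Adaptation

Abstract: This paper presents a novel way of adapting any off-the-shelf object detection model online to a new domain without retraining the detector model. Inspired by how humans quickly learn knowledge of a new subject (e.g., memorization), we allow the detector to look up similar object concepts from memory at test time. This is achieved through a retrieval augmented classification (RAC) module together with a memory bank that can be flexibly updated with new domain knowledge. We experimented with various off-the-shelf open-set and closed-set detectors. With only a tiny memory bank (e.g., 10 images per category) and no training, our online learning method can significantly outperform baselines in adapting a detector to new domains.

Updated: 2024-09-16 20:40:26

Categories: cs.CV,cs.IR,cs.LG

Download: http://arxiv.org/abs/2409.10716v1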
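The memory look-up described in the retrieval-augmented detector abstract above can be illustrated with a small nearest-neighbour classifier over stored embeddings. This is a sketch under assumptions: the abstract does not say how the RAC module scores or aggregates matches, so cosine similarity with similarity-weighted voting is used here, and `MemoryBank`, its methods, and the toy two-dimensional embeddings are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class MemoryBank:
    """Stores (embedding, label) pairs; can be extended at any time without training."""

    def __init__(self):
        self.entries = []

    def add(self, embedding, label):
        self.entries.append((list(embedding), label))

    def classify(self, query, k=3):
        """Return the label with the largest summed similarity among the top-k matches."""
        scored = sorted(
            ((cosine(query, emb), label) for emb, label in self.entries),
            key=lambda pair: pair[0],
            reverse=True,
        )[:k]
        votes = defaultdict(float)
        for sim, label in scored:
            votes[label] += sim
        return max(votes, key=votes.get) if votes else None


# Hypothetical embeddings of detected object crops from a new domain.
bank = MemoryBank()
bank.add([1.0, 0.1], "forklift")
bank.add([0.9, 0.2], "forklift")
bank.add([0.1, 1.0], "pallet")
bank.add([0.2, 0.8], "pallet")
print(bank.classify([0.95, 0.15]))
```

Adapting to another domain in this sketch only means calling `add` with a few labelled embeddings, which mirrors the training-free, flexibly updated memory bank the abstract describes.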

Self-Attention Limits Working Memory Capacity of Transformer-Based Models

Recent work on Transformer-based large language models (LLMs) has revealed striking limits in their working memory capacity, similar to what has been found in human behavioral studies. Specifically, these models' performance drops significantly on N-back tasks as N increases. However, there is still a lack of mechanistic interpretability as to why this phenomenon would arise. Inspired by the executive attention theory from behavioral sciences, we hypothesize that the self-attention mechanism within Transformer-based models might be responsible for their working memory capacity limits. To test this hypothesis, we train vanilla decoder-only transformers to perform N-back tasks and find that attention scores gradually aggregate to the N-back positions over training, suggesting that the model masters the task by learning a strategy to pay attention to the relationship between the current position and the N-back position. Critically, we find that the total entropy of the attention score matrix increases as N increases, suggesting that the dispersion of attention scores might be the cause of the capacity limit observed in N-back tasks.

Updated: 2024-09-16 20:38:35

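Title: Self-Attention Limits Working Memory Capacity of Transformer-Based Models

Abstract: Recent work on Transformer-based large language models (LLMs) has revealed striking limits in their working memory capacity, similar to what has been found in human behavioral studies. Specifically, these models' performance drops significantly on N-back tasks as N increases. However, a mechanistic explanation of why this phenomenon arises is still lacking. Inspired by the executive attention theory from behavioral sciences, we hypothesize that the self-attention mechanism within Transformer-based models might be responsible for their working memory capacity limits. To test this hypothesis, we train decoder-only Transformers to perform N-back tasks and find that attention scores gradually aggregate to the N-back positions over training, suggesting that the model masters the task by learning a strategy of attending to the relationship between the current position and the N-back position. Critically, we find that the total entropy of the attention score matrix increases as N increases, suggesting that the dispersion of attention scores might be the cause of the capacity limit observed in N-back tasks.

Updated: 2024-09-16 20:38:35

Categories: cs.CL,cs.AI,q-bio.NC

Download: http://arxiv.org/abs/2409.10715v1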
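To make the entropy measurement in the working-memory abstract above concrete, here is a minimal sketch of one way to compute the total entropy of a causal attention matrix. The abstract does not give the exact definition, so summing the Shannon entropy of each query's softmax-normalized row is an assumption, and the toy score matrices are made up for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of raw scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def row_entropy(probs):
    """Shannon entropy (in nats) of one attention distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def total_attention_entropy(scores):
    """Sum of per-query entropies; query i may only attend to keys 0..i (causal mask)."""
    total = 0.0
    for i, row in enumerate(scores):
        total += row_entropy(softmax(row[: i + 1]))
    return total


T = 6
# Sharp attention: each query puts almost all its weight on the previous position.
sharp = [[10.0 if j == max(i - 1, 0) else 0.0 for j in range(T)] for i in range(T)]
# Dispersed attention: every visible key gets the same score.
flat = [[0.0] * T for _ in range(T)]

print(total_attention_entropy(sharp), total_attention_entropy(flat))
```

Concentrated attention yields a lower total than dispersed attention, which is the direction of the effect the abstract links to larger N.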

GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation

In autonomous robotics, measurement of the robot's internal state and perception of its environment, including interaction with other agents such as collaborative robots, are essential. Estimating the pose of the robot arm from a single view has the potential to replace classical eye-to-hand calibration approaches and is particularly attractive for online estimation and dynamic environments. In addition to its pose, recovering the robot configuration provides a complete spatial understanding of the observed robot that can be used to anticipate the actions of other agents in advanced robotics use cases. Furthermore, this additional redundancy enables the planning and execution of recovery protocols in case of sensor failures or external disturbances. We introduce GISR - a deep configuration and robot-to-camera pose estimation method that prioritizes execution in real-time. GISR consists of two modules: (i) a geometric initialization module that efficiently computes an approximate robot pose and configuration, and (ii) a deep iterative silhouette-based refinement module that arrives at a final solution in just a few iterations. We evaluate GISR on publicly available data and show that it outperforms existing methods of the same class in terms of both speed and accuracy, and can compete with approaches that rely on ground-truth proprioception and recover only the pose.

Updated: 2024-09-16 20:28:00

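Title: GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation

Abstract: In autonomous robotics, measuring the robot's internal state and perceiving its environment, including interaction with other agents such as collaborative robots, are essential. Estimating the pose of the robot arm from a single view has the potential to replace classical eye-to-hand calibration approaches and is particularly suited to online estimation and dynamic environments. In addition to the pose, recovering the robot configuration provides a complete spatial understanding of the observed robot that can be used to anticipate the actions of other agents in advanced robotics use cases. Furthermore, this additional redundancy enables the planning and execution of recovery protocols in case of sensor failures or external disturbances. We introduce GISR - a deep configuration and robot-to-camera pose estimation method that prioritizes real-time execution. GISR consists of two modules: (i) a geometric initialization module that efficiently computes an approximate robot pose and configuration, and (ii) a deep iterative silhouette-based refinement module that arrives at a final solution in just a few iterations. We evaluate GISR on publicly available data and show that it outperforms existing methods of the same class in both speed and accuracy, and can compete with approaches that rely on ground-truth proprioception and recover only the pose.

Updated: 2024-09-16 20:28:00

Categories: cs.RO,cs.AI,cs.CV

Download: http://arxiv.org/abs/2405.04890v3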

Enhancing Image Layout Control with Loss-Guided Diffusion Models

Diffusion models are a powerful class of generative models capable of producing high-quality images from pure noise using a simple text prompt. While most methods that introduce additional spatial constraints into the generated images (e.g., bounding boxes) require fine-tuning, a smaller and more recent subset of these methods take advantage of the models' attention mechanism, and are training-free. These methods generally fall into one of two categories. The first entails modifying the cross-attention maps of specific tokens directly to enhance the signal in certain regions of the image. The second works by defining a loss function over the cross-attention maps, and using the gradient of this loss to guide the latent. While previous work explores these as alternative strategies, we provide an interpretation for these methods which highlights their complementary features, and demonstrate that it is possible to obtain superior performance when both methods are used in concert.

Updated: 2024-09-16 20:20:30

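Title: Enhancing Image Layout Control with Loss-Guided Diffusion Models

Abstract: Diffusion models are a powerful class of generative models capable of producing high-quality images from pure noise using a simple text prompt. While most methods that introduce additional spatial constraints (e.g., bounding boxes) require fine-tuning, a smaller and more recent subset of these methods takes advantage of the models' attention mechanism and needs no training. These methods generally fall into two categories. The first directly modifies the cross-attention maps of specific tokens to enhance the signal in certain regions of the image. The second defines a loss function over the cross-attention maps and uses the gradient of this loss to guide the latent. While previous work explores these as alternative strategies, we provide an interpretation of these methods that highlights their complementary features, and demonstrate that superior performance can be obtained when the two methods are used together.

Updated: 2024-09-16 20:20:30

Categories: cs.CV,cs.GR,cs.LG

Download: http://arxiv.org/abs/2405.14101v2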

Self-supervised Speech Models for Word-Level Stuttered Speech Detection

Clinical diagnosis of stuttering requires an assessment by a licensed speech-language pathologist. However, this process is time-consuming and requires clinicians with training and experience in stuttering and fluency disorders. Unfortunately, only a small percentage of speech-language pathologists report being comfortable working with individuals who stutter, which is inadequate to accommodate the 80 million individuals who stutter worldwide. Developing machine learning models for detecting stuttered speech would enable universal and automated screening for stuttering, enabling speech pathologists to identify and follow up with patients who are most likely to be diagnosed with a stuttering speech disorder. Previous research in this area has predominantly focused on utterance-level detection, which is not sufficient for clinical settings where word-level annotation of stuttering is the norm. In this study, we curated a stuttered speech dataset with word-level annotations and introduced a word-level stuttering speech detection model leveraging self-supervised speech models. Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection. Additionally, we conducted an extensive ablation analysis of our method, providing insight into the most important aspects of adapting self-supervised speech models for stuttered speech detection.

Updated: 2024-09-16 20:18:20

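Title: Self-supervised Speech Models for Word-Level Stuttered Speech Detection

Abstract: Clinical diagnosis of stuttering requires an assessment by a licensed speech-language pathologist. However, this process is time-consuming and requires clinicians with training and experience in stuttering and fluency disorders. Unfortunately, only a small percentage of speech-language pathologists report being comfortable working with people who stutter, which cannot meet the needs of the 80 million people who stutter worldwide. Developing machine learning models for detecting stuttered speech would enable universal and automated screening for stuttering, allowing speech pathologists to identify and follow up with the patients most likely to be diagnosed with a stuttering speech disorder. Previous research in this area has focused mainly on utterance-level detection, which is not sufficient for clinical settings where word-level annotation of stuttering is the norm. In this study, we curated a stuttered speech dataset with word-level annotations and introduced a word-level stuttered speech detection model that leverages self-supervised speech models. Our evaluation shows that our model surpasses previous approaches in word-level stuttered speech detection. In addition, we conducted an extensive ablation analysis of our method, providing insight into the most important aspects of adapting self-supervised speech models for stuttered speech detection.

Updated: 2024-09-16 20:18:20

Categories: eess.AS,cs.AI,cs.CL,cs.SD

Download: http://arxiv.org/abs/2409.10704v1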

Give and Take: An End-To-End Investigation of Giveaway Scam Conversion Rates

Scams -- fraudulent schemes designed to swindle money from victims -- have existed for as long as recorded history. However, the Internet's combination of low communication cost, global reach, and functional anonymity has allowed scam volumes to reach new heights. Designing effective interventions requires first understanding the context: how scammers reach potential victims, the earnings they make, and any potential bottlenecks for durable interventions. In this short paper, we focus on these questions in the context of cryptocurrency giveaway scams, where victims are tricked into irreversibly transferring funds to scammers under the pretense of even greater returns. Combining data from Twitter, YouTube and Twitch livestreams, landing pages, and cryptocurrency blockchains, we measure how giveaway scams operate at scale. We find that 1 in 1000 scam tweets, and 4 in 100,000 livestream views, net a victim, and that scammers managed to extract nearly $4.62 million from just hundreds of victims during our measurement window.

Updated: 2024-09-16 20:11:25

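Title: Give and Take: An End-To-End Investigation of Giveaway Scam Conversion Rates

Abstract: Scams -- fraudulent schemes designed to swindle money from victims -- have existed for as long as recorded history. However, the Internet's combination of low communication cost, global reach, and functional anonymity has allowed scam volumes to reach new heights. Designing effective interventions requires first understanding the context: how scammers reach potential victims, the earnings they make, and any potential bottlenecks for durable interventions. In this short paper, we focus on cryptocurrency giveaway scams, where victims are tricked into irreversibly transferring funds to scammers under the promise of even greater returns. Combining data from Twitter, YouTube and Twitch livestreams, landing pages, and cryptocurrency blockchains, we measure how giveaway scams operate at scale. We find that 1 in 1000 scam tweets, and 4 in 100,000 livestream views, net a victim, and that scammers managed to extract nearly $4.62 million from just hundreds of victims during our measurement window.

Updated: 2024-09-16 20:11:25

Categories: cs.CR

Download: http://arxiv.org/abs/2405.09757v2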
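The two conversion rates reported in the giveaway scam abstract above (1 victim per 1000 scam tweets, 4 victims per 100,000 livestream views) can be compared with a little exact arithmetic. The sketch below uses only those two rates from the abstract; the exposure counts passed in are hypothetical inputs chosen for illustration, not figures from the paper.

```python
from fractions import Fraction

# Conversion rates as stated in the abstract.
TWEET_RATE = Fraction(1, 1000)
VIEW_RATE = Fraction(4, 100_000)


def expected_victims(exposures, rate):
    """Expected number of victims for a given number of tweets or views."""
    return exposures * rate


def exposures_per_victim(rate):
    """How many tweets or views are needed, on average, to net one victim."""
    return 1 / rate


# Hypothetical campaign sizes, for illustration only.
print(expected_victims(50_000, TWEET_RATE))
print(expected_victims(1_000_000, VIEW_RATE))
print(exposures_per_victim(VIEW_RATE))
print(TWEET_RATE / VIEW_RATE)
```

Per exposure, a scam tweet converts 25 times as often as a livestream view under these rates, although the two channels can differ greatly in how many exposures they generate.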

On Evaluation Protocols for Data Augmentation in a Limited Data Scenario

Textual data augmentation (DA) is a prolific field of study where novel techniques to create artificial data are regularly proposed, and that has demonstrated great efficiency on small data settings, at least for text classification tasks. In this paper, we challenge those results, showing that classical data augmentation (which modifies sentences) is simply a way of performing better fine-tuning, and that spending more time doing so before applying data augmentation negates its effect. This is a significant contribution as it answers several questions that were left open in recent years, namely: which DA technique performs best (all of them as long as they generate data close enough to the training set, as to not impair training) and why DA showed positive results (it facilitates training of the network). We further show that zero- and few-shot DA via conversational agents such as ChatGPT or LLama2 can increase performance, confirming that this form of data augmentation is preferable to classical methods.

Updated: 2024-09-16 20:11:19

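Title: On Evaluation Protocols for Data Augmentation in a Limited Data Scenario

Abstract: Textual data augmentation (DA) is a prolific field of study in which new techniques for creating artificial data are regularly proposed, and which has shown great efficiency in small data settings, at least for text classification tasks. In this paper, we challenge those results, showing that classical data augmentation (which modifies sentences) is simply a way of performing better fine-tuning, and that spending more time on fine-tuning before applying data augmentation cancels its effect. This is a significant contribution because it answers several questions left open in recent years, namely: which DA technique performs best (all of them, as long as they generate data close enough to the training set so as not to impair training) and why DA showed positive results (it facilitates training of the network). We further show that zero- and few-shot DA via conversational agents such as ChatGPT or LLama2 can increase performance, confirming that this form of data augmentation is preferable to classical methods.

Updated: 2024-09-16 20:11:19

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.14895v2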

Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs

The growing demand for AI training data has transformed data annotation into a global industry, but traditional approaches relying on human annotators are often time-consuming, labor-intensive, and prone to inconsistent quality. We propose the Model-in-the-Loop (MILO) framework, which integrates AI/ML models into the annotation process. Our research introduces a collaborative paradigm that leverages the strengths of both professional human annotators and large language models (LLMs). By employing LLMs as pre-annotation assistants, real-time assistants, and judges of annotator responses, MILO enables effective interaction patterns between human annotators and LLMs. Three empirical studies on multimodal data annotation demonstrate MILO's efficacy in reducing handling time, improving data quality, and enhancing annotator experiences. We also introduce quality rubrics for flexible evaluation and fine-grained feedback on open-ended annotations. The MILO framework has implications for accelerating AI/ML development, reducing reliance on human annotation alone, and promoting better alignment between human and machine values.

Updated: 2024-09-16 20:05:57

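Title: Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs

Abstract: The growing demand for AI training data has transformed data annotation into a global industry, but traditional approaches relying on human annotators are often time-consuming, labor-intensive, and prone to inconsistent quality. We propose the Model-in-the-Loop (MILO) framework, which integrates AI/ML models into the annotation process. Our research introduces a collaborative paradigm that draws on the strengths of both professional human annotators and large language models (LLMs). By employing LLMs as pre-annotation assistants, real-time assistants, and judges of annotator responses, MILO enables effective interaction patterns between human annotators and LLMs. Three empirical studies on multimodal data annotation demonstrate MILO's effectiveness in reducing handling time, improving data quality, and enhancing annotator experience. We also introduce quality rubrics for flexible evaluation and fine-grained feedback on open-ended annotations. The MILO framework has implications for accelerating AI/ML development, reducing reliance on human annotation, and promoting better alignment between human and machine values.

Updated: 2024-09-16 20:05:57

Categories: cs.HC,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2409.10702v1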

A Green Multi-Attribute Client Selection for Over-The-Air Federated Learning: A Grey-Wolf-Optimizer Approach

Federated Learning (FL) has gained attention across various industries for its capability to train machine learning models without centralizing sensitive data. While this approach offers significant benefits such as privacy preservation and decreased communication overhead, it presents several challenges, including deployment complexity and interoperability issues, particularly in heterogeneous scenarios or resource-constrained environments. Over-the-air (OTA) FL was introduced to tackle these challenges by disseminating model updates without necessitating direct device-to-device connections or centralized servers. However, OTA-FL brought forth limitations associated with heightened energy consumption and network latency. In this paper, we propose a multi-attribute client selection framework employing the grey wolf optimizer (GWO) to strategically control the number of participants in each round and optimize the OTA-FL process while considering accuracy, energy, delay, reliability, and fairness constraints of participating devices. We evaluate the performance of our multi-attribute client selection approach in terms of model loss minimization, convergence time reduction, and energy efficiency. In our experimental evaluation, we assessed and compared the performance of our approach against the existing state-of-the-art methods. Our results demonstrate that the proposed GWO-based client selection outperforms these baselines across various metrics. Specifically, our approach achieves a notable reduction in model loss, accelerates convergence time, and enhances energy efficiency while maintaining high fairness and reliability indicators.

Updated: 2024-09-16 20:03:57

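Title: A Green Multi-Attribute Client Selection for Over-The-Air Federated Learning: A Grey-Wolf-Optimizer Approach

Abstract: Federated Learning (FL) has gained attention across various industries for its ability to train machine learning models without centralizing sensitive data. While this approach offers significant benefits such as privacy preservation and decreased communication overhead, it also presents several challenges, including deployment complexity and interoperability issues, particularly in heterogeneous scenarios or resource-constrained environments. Over-the-air (OTA) FL was introduced to tackle these challenges by disseminating model updates without requiring direct device-to-device connections or centralized servers. However, OTA-FL brings limitations associated with increased energy consumption and network latency. In this paper, we propose a multi-attribute client selection framework that employs the grey wolf optimizer (GWO) to strategically control the number of participants in each round and optimize the OTA-FL process while considering the accuracy, energy, delay, reliability, and fairness constraints of participating devices. We evaluate the performance of our multi-attribute client selection approach in terms of model loss minimization, convergence time reduction, and energy efficiency. In our experimental evaluation, we compare the performance of our approach against existing state-of-the-art methods. Our results show that the proposed GWO-based client selection outperforms these baselines across various metrics. Specifically, our approach achieves a notable reduction in model loss, accelerates convergence, and improves energy efficiency while maintaining high fairness and reliability indicators.

Updated: 2024-09-16 20:03:57

Categories: cs.LG,cs.AI,cs.DC

Download: http://arxiv.org/abs/2409.11442v1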
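For readers unfamiliar with the grey wolf optimizer named in the client selection abstract above, the following is a minimal sketch of the generic GWO update rule: the three best candidates (alpha, beta, delta) pull every other candidate toward them while an exploration coefficient decays from 2 to 0. It does not reproduce the paper's client encoding or its multi-attribute objective; `toy_cost` is a hypothetical stand-in for a combined penalty, and the function name and defaults are assumptions.

```python
import random


def grey_wolf_optimize(objective, dim, bounds, n_wolves=20, n_iters=100, seed=0):
    """Minimize objective over a box using the basic grey wolf optimizer."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")

    for it in range(n_iters):
        # The three fittest wolves lead the pack in this iteration.
        ranked = sorted(wolves, key=objective)
        alpha, beta, delta = (list(w) for w in ranked[:3])
        alpha_val = objective(alpha)
        if alpha_val < best_val:
            best_pos, best_val = list(alpha), alpha_val

        a = 2.0 * (1.0 - it / n_iters)  # decays linearly from 2 toward 0
        new_wolves = []
        for w in wolves:
            pos = []
            for d in range(dim):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - w[d])
                    estimate += leader[d] - A * distance
                # Average the three leader-guided estimates and keep within bounds.
                pos.append(min(hi, max(lo, estimate / 3.0)))
            new_wolves.append(pos)
        wolves = new_wolves

    final = min(wolves, key=objective)
    if objective(final) < best_val:
        best_pos, best_val = list(final), objective(final)
    return best_pos, best_val


def toy_cost(x):
    """Hypothetical weighted penalty over three tunable quantities."""
    targets = (1.0, -2.0, 0.5)
    weights = (1.0, 2.0, 0.5)
    return sum(w * (v - t) ** 2 for v, t, w in zip(x, targets, weights))


position, cost = grey_wolf_optimize(toy_cost, dim=3, bounds=(-5.0, 5.0))
print(position, cost)
```

A client selection variant would need some mapping from a wolf's continuous position to a set of participating devices; the abstract does not describe that mapping, so it is left out here.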

Using Generative Models to Produce Realistic Populations of the United Kingdom Windstorms

Windstorms significantly impact the UK, causing extensive damage to property, disrupting society, and potentially resulting in loss of life. Accurate modelling and understanding of such events are essential for effective risk assessment and mitigation. However, the rarity of extreme windstorms results in limited observational data, which poses significant challenges for comprehensive analysis and insurance modelling. This dissertation explores the application of generative models to produce realistic synthetic wind field data, aiming to enhance the robustness of current CAT models used in the insurance industry. The study utilises hourly reanalysis data from the ERA5 dataset, which covers the period from 1940 to 2022. Three models, including standard GANs, WGAN-GP, and U-net diffusion models, were employed to generate high-quality wind maps of the UK. These models are then evaluated using multiple metrics, including SSIM, KL divergence, and EMD, with some assessments performed in a reduced dimensionality space using PCA. The results reveal that while all models are effective in capturing the general spatial characteristics, each model exhibits distinct strengths and weaknesses. The standard GAN introduced more noise compared to the other models. The WGAN-GP model demonstrated superior performance, particularly in replicating statistical distributions. The U-net diffusion model produced the most visually coherent outputs but struggled slightly in replicating peak intensities and their statistical variability. This research underscores the potential of generative models in supplementing limited reanalysis datasets with synthetic data, providing valuable tools for risk assessment and catastrophe modelling. However, it is important to select appropriate evaluation metrics that assess different aspects of the generated outputs. Future work could refine these models and incorporate more ...

Updated: 2024-09-16 19:53:33

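Title: Using Generative Models to Produce Realistic Populations of the United Kingdom Windstorms

Abstract: Windstorms significantly impact the UK, causing extensive damage to property, disrupting society, and potentially resulting in loss of life. Accurate modelling and understanding of such events are essential for effective risk assessment and mitigation. However, the rarity of extreme windstorms results in limited observational data, which poses significant challenges for comprehensive analysis and insurance modelling. This dissertation explores the application of generative models to produce realistic synthetic wind field data, aiming to enhance the robustness of the CAT models currently used in the insurance industry. The study uses hourly reanalysis data from the ERA5 dataset, which covers the period from 1940 to 2022. Three models, including standard GANs, WGAN-GP, and U-net diffusion models, were used to generate high-quality wind maps of the UK. These models are then evaluated using multiple metrics, including SSIM, KL divergence, and EMD, with some assessments performed in a reduced-dimensionality space using PCA. The results show that while all models capture the general spatial characteristics effectively, each model has distinct strengths and weaknesses. The standard GAN introduced more noise than the other models. The WGAN-GP model showed superior performance, particularly in replicating statistical distributions. The U-net diffusion model produced the most visually coherent outputs but struggled slightly to replicate peak intensities and their statistical variability. This research underscores the potential of generative models for supplementing limited reanalysis datasets with synthetic data, providing valuable tools for risk assessment and catastrophe modelling. However, it is important to select appropriate evaluation metrics that assess different aspects of the generated outputs. Future work could refine these models and incorporate more ...

Updated: 2024-09-16 19:53:33

Categories: physics.ao-ph,cs.LG

Download: http://arxiv.org/abs/2409.10696v1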
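Two of the evaluation metrics named in the windstorm abstract above, KL divergence and EMD, can be sketched for one-dimensional samples such as wind speeds at a single grid point. This is an illustration under assumptions, not the dissertation's evaluation code: the binning, the smoothing constant `eps`, the equal-sample-size form of the EMD, and the sample values are all choices made here.

```python
import math


def histogram(samples, lo, hi, n_bins):
    """Count samples into n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in samples:
        if lo <= x <= hi:
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) between two histograms, with additive smoothing to avoid log(0)."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must have the same number of bins")
    n = len(p_counts)
    p_total = sum(p_counts) + eps * n
    q_total = sum(q_counts) + eps * n
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        kl += p * math.log(p / q)
    return kl


def emd_1d(a, b):
    """Earth mover's distance between two equal-size 1-D samples."""
    if len(a) != len(b):
        raise ValueError("this simple form needs equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Hypothetical wind speeds in m/s: reanalysis values versus generated values.
real = [12.0, 15.5, 18.0, 22.5, 30.0, 14.0]
fake = [11.0, 16.0, 19.5, 21.0, 26.0, 13.5]

print(kl_divergence(histogram(real, 0.0, 40.0, 8), histogram(fake, 0.0, 40.0, 8)))
print(emd_1d(real, fake))
```

KL divergence compares binned frequencies and ignores how far apart the bins are, while EMD accounts for the distance mass has to move; this is one reason the abstract's advice to use several metrics is sensible.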

Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models

We introduce Playground v3 (PGv3), our latest text-to-image model that achieves state-of-the-art (SoTA) performance across multiple testing benchmarks, excels in graphic design abilities and introduces new capabilities. Unlike traditional text-to-image generative models that rely on pre-trained language models like T5 or CLIP text encoders, our approach fully integrates Large Language Models (LLMs) with a novel structure that leverages text conditions exclusively from a decoder-only LLM. Additionally, to enhance image captioning quality, we developed an in-house captioner, capable of generating captions with varying levels of detail, enriching the diversity of text structures. We also introduce a new benchmark CapsBench to evaluate detailed image captioning performance. Experimental results demonstrate that PGv3 excels in text prompt adherence, complex reasoning, and accurate text rendering. User preference studies indicate the super-human graphic design ability of our model for common design applications, such as stickers, posters, and logo designs. Furthermore, PGv3 introduces new capabilities, including precise RGB color control and robust multilingual understanding.

Updated: 2024-09-16 19:52:24

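Title: Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models

Abstract: We introduce Playground v3 (PGv3), our latest text-to-image model, which achieves state-of-the-art performance across multiple testing benchmarks, excels in graphic design abilities, and introduces new capabilities. Unlike traditional text-to-image generative models that rely on pre-trained language models such as T5 or CLIP text encoders, our approach fully integrates Large Language Models (LLMs) through a novel structure that takes its text conditions exclusively from a decoder-only LLM. In addition, to improve image captioning quality, we developed an in-house captioner capable of generating captions with varying levels of detail, enriching the diversity of text structures. We also introduce a new benchmark, CapsBench, to evaluate detailed image captioning performance. Experimental results show that PGv3 excels in text prompt adherence, complex reasoning, and accurate text rendering. User preference studies indicate that our model has super-human graphic design ability in common design applications such as stickers, posters, and logo designs. Furthermore, PGv3 introduces new capabilities, including precise RGB color control and robust multilingual understanding.

Updated: 2024-09-16 19:52:24

Categories: cs.CV,cs.AI,cs.GR

Download: http://arxiv.org/abs/2409.10695v1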

Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach

This paper focuses on enhancing Bengali Document Layout Analysis (DLA) using the YOLOv8 model and innovative post-processing techniques. We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness. After meticulous validation set evaluation, we fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation. Our ensemble model, combined with post-processing, outperforms individual base architectures, addressing issues identified in the BaDLAD dataset. By leveraging this approach, we aim to advance Bengali document analysis, contributing to improved OCR and document comprehension. BaDLAD serves as a foundational resource for this endeavor, aiding future research in the field. Furthermore, our experiments provided key insights to incorporate new strategies into the established solution.

Updated: 2024-09-16 19:52:21

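Title: Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach

Abstract: This paper focuses on enhancing Bengali Document Layout Analysis (DLA) using the YOLOv8 model and innovative post-processing techniques. We address challenges unique to the complex Bengali script by employing data augmentation to improve model robustness. After careful evaluation on the validation set, we fine-tune on the complete dataset and adopt a two-stage prediction strategy for accurate element segmentation. Our ensemble model, combined with post-processing, outperforms individual base architectures and addresses issues in the BaDLAD dataset. With this approach, we aim to advance Bengali document analysis and contribute to improved OCR and document comprehension. BaDLAD serves as a foundational resource for this effort and aids future research in the field. In addition, our experiments provided key insights for incorporating new strategies into the established solution.

Updated: 2024-09-16 19:52:21

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2309.00848v4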

Mitigating Partial Observability in Adaptive Traffic Signal Control with Transformers

Efficient traffic signal control is essential for managing urban transportation, minimizing congestion, and improving safety and sustainability. Reinforcement Learning (RL) has emerged as a promising approach to enhancing adaptive traffic signal control (ATSC) systems, allowing controllers to learn optimal policies through interaction with the environment. However, challenges arise due to partial observability (PO) in traffic networks, where agents have limited visibility, hindering effectiveness. This paper presents the integration of Transformer-based controllers into ATSC systems to address PO effectively. We propose strategies to enhance training efficiency and effectiveness, demonstrating improved coordination capabilities in real-world scenarios. The results showcase the Transformer-based model's ability to capture significant information from historical observations, leading to better control policies and improved traffic flow. This study highlights the potential of leveraging the advanced Transformer architecture to enhance urban transportation management.

Updated: 2024-09-16 19:46:15

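Title: Mitigating Partial Observability in Adaptive Traffic Signal Control with Transformers

Abstract: Efficient traffic signal control is essential for managing urban transportation, minimizing congestion, and improving safety and sustainability. Reinforcement Learning (RL) has emerged as a promising approach to enhancing adaptive traffic signal control (ATSC) systems, allowing controllers to learn optimal policies through interaction with the environment. However, challenges arise from partial observability (PO) in traffic networks, where agents have limited visibility, which hinders effectiveness. This paper presents the integration of Transformer-based controllers into ATSC systems to address PO effectively. We propose strategies to improve training efficiency and effectiveness, and demonstrate improved coordination capabilities in real-world scenarios. The results show the Transformer-based model's ability to capture significant information from historical observations, leading to better control policies and improved traffic flow. This study highlights the potential of the advanced Transformer architecture for enhancing urban transportation management.

Updated: 2024-09-16 19:46:15

Categories: cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2409.10693v1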

Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots

The foundational capabilities of humanoid robots should include robustly standing, walking, and mimicry of whole and partial-body motions. This work introduces the Masked Humanoid Controller (MHC), which supports all of these capabilities by tracking target trajectories over selected subsets of humanoid state variables while ensuring balance and robustness against disturbances. The MHC is trained in simulation using a carefully designed curriculum that imitates partially masked motions from a library of behaviors spanning standing, walking, optimized reference trajectories, re-targeted video clips, and human motion capture data. It also allows for combining joystick-based control with partial-body motion mimicry. We showcase simulation experiments validating the MHC's ability to execute a wide variety of behaviors from partially-specified target motions. Moreover, we demonstrate sim-to-real transfer on the real-world Digit V3 humanoid robot. To our knowledge, this is the first instance of a learned controller that can realize whole-body control of a real-world humanoid for such diverse multi-modal targets.

Updated: 2024-09-16 19:41:39

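Title: Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots

Abstract: The foundational capabilities of humanoid robots should include robust standing, walking, and mimicry of whole-body and partial-body motions. This work introduces the Masked Humanoid Controller (MHC), which supports all of these capabilities by tracking target trajectories over selected subsets of humanoid state variables while ensuring robustness against disturbances. The MHC is trained in simulation using a carefully designed curriculum that imitates partially masked motions from a library of behaviors spanning standing, walking, optimized reference trajectories, re-targeted video clips, and human motion capture data. It also allows joystick-based control to be combined with partial-body motion mimicry. We present simulation experiments validating the MHC's ability to execute a wide variety of behaviors from partially specified target motions. Moreover, we demonstrate sim-to-real transfer on the real-world Digit V3 humanoid robot. To our knowledge, this is the first instance of a learned controller that can realize whole-body control of a real-world humanoid for such diverse multi-modal targets.

Updated: 2024-09-16 19:41:39

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2408.07295v2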

Encoding Reusable Multi-Robot Planning Strategies as Abstract Hypergraphs

Multi-Robot Task Planning (MR-TP) is the search for a discrete-action plan a team of robots should take to complete a task. The complexity of such problems scales exponentially with the number of robots and task complexity, making them challenging to solve online. To accelerate MR-TP over a system's lifetime, this work looks at combining two recent advances: (i) Decomposable State Space Hypergraph (DaSH), a novel hypergraph-based framework to efficiently model and solve MR-TP problems; and (ii) learning-by-abstraction, a technique that enables automatic extraction of generalizable planning strategies from individual planning experiences for later reuse. Specifically, we wish to extend this strategy-learning technique, originally designed for single-robot planning, to benefit multi-robot planning using hypergraph-based MR-TP.

Updated: 2024-09-16 19:39:52

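Title: Encoding Reusable Multi-Robot Planning Strategies as Abstract Hypergraphs

Abstract: Multi-Robot Task Planning (MR-TP) is the search for a discrete-action plan that a team of robots should take to complete a task. The complexity of such problems grows exponentially with the number of robots and task complexity, making them challenging to solve online. To accelerate MR-TP over a system's lifetime, this work looks at combining two recent advances: (i) Decomposable State Space Hypergraph (DaSH), a novel hypergraph-based framework for efficiently modeling and solving MR-TP problems; and (ii) learning-by-abstraction, a technique that automatically extracts generalizable planning strategies from individual planning experiences for later reuse. Specifically, we wish to extend this strategy-learning technique, originally designed for single-robot planning, to benefit multi-robot planning using hypergraph-based MR-TP.

Updated: 2024-09-16 19:39:52

Categories: cs.RO,cs.AI,cs.MA

Download: http://arxiv.org/abs/2409.10692v1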

Overcoming the Stability Gap in Continual Learning

Pre-trained deep neural networks (DNNs) are being widely deployed by industry for making business decisions and to serve users; however, a major problem is model decay, where the DNN's predictions become more erroneous over time, resulting in revenue loss or unhappy users. To mitigate model decay, DNNs are retrained from scratch using old and new data. This is computationally expensive, so retraining happens only once performance significantly decreases. Here, we study how continual learning (CL) could potentially overcome model decay in large pre-trained DNNs and greatly reduce computational costs for keeping DNNs up-to-date. We identify the "stability gap" as a major obstacle in our setting. The stability gap refers to a phenomenon where learning new data causes large drops in performance for past tasks before CL mitigation methods eventually compensate for this drop. We test two hypotheses to investigate the factors influencing the stability gap and identify a method that vastly reduces this gap. In large-scale experiments for both easy and hard CL distributions (e.g., class incremental learning), we demonstrate that our method reduces the stability gap and greatly increases computational efficiency. Our work aligns CL with the goals of the production setting, where CL is needed for many applications.

Updated: 2024-09-16 19:32:48

标题: 克服持续学习中的稳定性差距

摘要: 预训练的深度神经网络（DNNs）正在被业界广泛部署，用于制定业务决策和为用户提供服务；然而，一个主要问题是模型衰减，即随着时间的推移，DNN的预测变得更加错误，导致收入损失或用户不满意。为了减轻模型衰减，DNNs会使用旧数据和新数据从头开始重新训练。这在计算上是昂贵的，因此只有在性能显著下降时才会重新训练。在这里，我们研究了持续学习（CL）如何潜在地克服大型预训练DNN中的模型衰减，并极大降低维持DNN更新的计算成本。我们确定了“稳定性差距”作为我们设置中的一个主要障碍。稳定性差距是指学习新数据导致过去任务性能大幅下降的现象，在CL缓解方法最终补偿这种下降之前。我们测试了两个假设来研究影响稳定性差距的因素，并确定了一种大大减少这种差距的方法。在大规模实验中，我们展示了我们的方法降低了稳定性差距，并极大提高了计算效率。我们的工作将CL与生产环境的目标对齐，其中CL在许多应用中都是必需的。

更新时间: 2024-09-16 19:32:48

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2306.01904v4

Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting

We are motivated primarily by the adaptation of text-to-speech synthesis models; however, we argue that more generic parameter-efficient fine-tuning (PEFT) is an appropriate framework for such adaptation. Nevertheless, catastrophic forgetting remains an issue with PEFT, damaging the pre-trained model's inherent capabilities. We demonstrate that existing Bayesian learning techniques can be applied to PEFT to prevent catastrophic forgetting as long as the parameter shift of the fine-tuned layers can be calculated differentiably. In a principled series of experiments on language modeling and speech synthesis tasks, we utilize established Laplace approximations, including diagonal and Kronecker-factored approaches, to regularize PEFT with low-rank adaptation (LoRA) and compare their performance in preserving pre-training knowledge. Our results demonstrate that catastrophic forgetting can be overcome by our methods without degrading the fine-tuning performance, and that the Kronecker-factored approximation preserves the pre-training knowledge better than the diagonal one.

Updated: 2024-09-16 19:31:47

标题: 贝叶斯参数高效微调以克服灾难性遗忘

摘要: 我们的主要动机是文本转语音合成模型的适应性；然而，我们认为更通用的参数高效微调（PEFT）是进行这种适应的适当框架。然而，灾难性遗忘仍然是PEFT的一个问题，损害了预训练模型的固有能力。我们证明，只要可以可微地计算微调层的参数移位，现有的贝叶斯学习技术可以应用于PEFT来防止灾难性遗忘。在一系列关于语言建模和语音合成任务的原则性实验中，我们利用建立的拉普拉斯近似，包括对角线和克罗内克因子方法，来规范PEFT与低秩适应（LoRA），并比较它们在预训练知识保留方面的性能。我们的结果表明，我们的方法可以克服灾难性遗忘，而不降低微调性能，并且使用克罗内克因子化近似比对角线近似更好地保留了预训练知识。

更新时间: 2024-09-16 19:31:47

领域: eess.AS,cs.LG

下载: http://arxiv.org/abs/2402.12220v2

MotIF: Motion Instruction Fine-tuning

While success in many robotics tasks can be determined by only observing the final state and how it differs from the initial state (e.g., whether an apple is picked up), many tasks require observing the full motion of the robot to correctly determine success. For example, brushing hair requires repeated strokes that correspond to the contours and type of hair. Prior works often use off-the-shelf vision-language models (VLMs) as success detectors; however, when success depends on the full trajectory, VLMs struggle to make correct judgments for two reasons. First, modern VLMs are trained only on single frames, and cannot capture changes over a full trajectory. Second, even if we provide state-of-the-art VLMs with an aggregate input of multiple frames, they still fail to detect success due to a lack of robot data. Our key idea is to fine-tune VLMs using abstract representations that are able to capture trajectory-level information, such as the path the robot takes, by overlaying keypoint trajectories on the final image. We propose motion instruction fine-tuning (MotIF), a method that fine-tunes VLMs using the aforementioned abstract representations to semantically ground the robot's behavior in the environment. To benchmark and fine-tune VLMs for robotic motion understanding, we introduce the MotIF-1K dataset containing 653 human and 369 robot demonstrations across 13 task categories. MotIF assesses the success of robot motion given the image observation of the trajectory, task instruction, and motion description. Our model significantly outperforms state-of-the-art VLMs, by at least a factor of two in precision and by 56.1% in recall, generalizing across unseen motions, tasks, and environments. Finally, we demonstrate practical applications of MotIF in refining and terminating robot planning, and ranking trajectories on how they align with task and motion descriptions. Project page: https://motif-1k.github.io

Updated: 2024-09-16 19:30:21

标题: MotIF: 动作指导微调

摘要: 虽然在许多机器人任务中成功可以通过观察最终状态及其与初始状态的差异来确定 - 例如，如果一个苹果被拾取 - 但许多任务需要观察机器人的完整运动来正确确定成功。例如，梳头需要重复的刷动，这些刷动对应于头发的轮廓和类型。先前的研究通常将现成的视觉语言模型（VLMs）用作成功检测器；然而，当成功取决于完整轨迹时，VLMs由于两个原因难以做出正确判断。首先，现代VLMs只在单帧上训练，无法捕捉完整轨迹上的变化。其次，即使我们给最先进的VLMs提供多帧的聚合输入，它们仍无法检测成功，因为缺乏机器人数据。我们的关键思想是使用能够捕捉轨迹级信息的抽象表示来微调VLMs，例如通过在最终图像上叠加关键点轨迹来捕捉机器人所采取的路径。我们提出了运动指令微调（MotIF），一种通过使用上述抽象表示来微调VLMs，以在环境中对机器人的行为进行语义基础化。为了对机器人运动理解进行基准测试和微调VLMs，我们引入了包含13个任务类别的653个人类和369个机器人演示的MotIF-1K数据集。MotIF评估了基于图像轨迹观察、任务说明和运动描述的机器人运动成功。我们的模型在精度上至少超过最先进的VLMs两倍，在召回率上超过56.1％，在未见过的运动、任务和环境中具有泛化能力。最后，我们展示了MotIF在优化和终止机器人规划以及根据任务和运动描述对轨迹进行排名的实际应用。项目页面：https://motif-1k.github.io

更新时间: 2024-09-16 19:30:21

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2409.10683v1

Infiltrating the Sky: Data Delay and Overflow Attacks in Earth Observation Constellations

Low Earth Orbit (LEO) Earth Observation (EO) satellites have changed the way we monitor Earth. Acting like moving cameras, EO satellites are formed in constellations with different missions and priorities, and capture vast data that needs to be transmitted to the ground for processing. However, EO satellites have very limited downlink communication capability, limited by transmission bandwidth, number and location of ground stations, and small transmission windows due to high velocity satellite movement. To optimize resource utilization, EO constellations are expected to share communication spectrum and ground stations for maximum communication efficiency. In this paper, we investigate a new attack surface exposed by resource competition in EO constellations, targeting the delay or drop of Earth monitoring data using legitimate EO services. Specifically, an attacker can inject high-priority requests to temporarily preempt low-priority data transmission windows. Furthermore, we show that by utilizing predictable satellite dynamics, an attacker can intelligently target critical data from low-priority satellites, either delaying its delivery or irreversibly dropping the data. We formulate two attacks, the data delay attack and the data overflow attack, design algorithms to assist attackers in devising attack strategies, and analyze their feasibility or optimality in typical scenarios. We then conduct trace-driven simulations using real-world satellite images and orbit data to evaluate the success probability of launching these attacks under realistic satellite communication settings. We also discuss possible defenses against these attacks.

Updated: 2024-09-16 19:27:56

标题: 渗透天空：地球观测星座中的数据延迟和溢出攻击

摘要: 低地球轨道（LEO）地球观测（EO）卫星已经改变了我们监测地球的方式。地球观测卫星就像移动摄像头，以不同的任务和优先级形成星座，并捕获大量数据需要传输到地面进行处理。然而，地球观测卫星的下行通信能力非常有限，受到传输带宽、地面站数量和位置以及高速卫星运动导致的小传输窗口的限制。为了优化资源利用，地球观测卫星星座应该共享通信频谱和地面站以实现最大的通信效率。在本文中，我们调查了地球观测卫星星座中由资源竞争暴露出的新攻击面，针对使用合法地球观测服务延迟或丢弃地球监测数据。具体来说，攻击者可以注入高优先级请求暂时抢占低优先级数据传输窗口。此外，我们展示了通过利用可预测的卫星动态，攻击者可以智能地针对低优先级卫星的关键数据，延迟其传递或者永久丢弃数据。我们制定了两种攻击，数据延迟攻击和数据溢出攻击，设计算法来帮助攻击者制定攻击策略，并分析它们在典型场景中的可行性或最优性。然后，我们进行了基于实际卫星图像和轨道数据的跟踪驱动模拟，评估在真实卫星通信设置下发动这些攻击的成功概率。我们还讨论了可能的防御措施。

更新时间: 2024-09-16 19:27:56

领域: cs.NI,cs.CR,cs.ET

下载: http://arxiv.org/abs/2409.00897v2

Multi-agent Path Finding in Continuous Environment

We address a variant of multi-agent path finding in a continuous environment (CE-MAPF), where agents move along sets of smooth curves. Collisions between agents are resolved via avoidance in the space domain. A new Continuous Environment Conflict-Based Search (CE-CBS) algorithm is proposed in this work. CE-CBS combines conflict-based search (CBS) for the high-level search framework with RRT* for low-level path planning. The CE-CBS algorithm is tested under various settings on diverse CE-MAPF instances. Experimental results show that CE-CBS is competitive with other algorithms that consider continuous aspects of MAPF, such as MAPF with continuous time.

Updated: 2024-09-16 19:23:04

标题: 连续环境中的多智能体路径规划

摘要: 我们讨论了在连续环境中的多智能体路径规划的一个变种（CE-MAPF），其中智能体沿着一组平滑曲线移动。智能体之间的碰撞通过空间域中的避让来解决。本文提出了一种新的连续环境冲突基础搜索（CE-CBS）算法。CE-CBS将基于冲突的搜索（CBS）用于高层搜索框架，与RRT*结合用于低层路径规划。CE-CBS算法在不同设置下对各种CE-MAPF实例进行了测试。实验结果表明，CE-CBS在考虑连续时间等连续方面的其他MAPF算法方面具有竞争力。

更新时间: 2024-09-16 19:23:04

领域: cs.MA,cs.AI

下载: http://arxiv.org/abs/2409.10680v1

Mitigating Sex Bias in Audio Data-driven COPD and COVID-19 Breathing Pattern Detection Models

In the healthcare industry, researchers have been developing machine learning models to automate diagnosing patients with respiratory illnesses based on their breathing patterns. However, these models do not consider the demographic biases, particularly sex bias, that often occur when models are trained with a skewed patient dataset. Hence, it is essential in such an important industry to reduce this bias so that models can make fair diagnoses. In this work, we examine the bias in models used to detect breathing patterns of two major respiratory diseases, i.e., chronic obstructive pulmonary disease (COPD) and COVID-19. Using decision tree models trained with audio recordings of breathing patterns obtained from two open-source datasets consisting of 29 COPD and 680 COVID-19-positive patients, we analyze the effect of sex bias on the models. With a threshold optimizer and two constraints (demographic parity and equalized odds) to mitigate the bias, we witness 81.43% (demographic parity difference) and 71.81% (equalized odds difference) improvements. These findings are statistically significant.

Updated: 2024-09-16 19:20:11

标题: 缓解音频数据驱动的COPD和COVID-19呼吸模式检测模型中的性别偏见

摘要: 在医疗保健行业，研究人员一直在开发机器学习模型，根据患者的呼吸模式自动诊断呼吸道疾病。然而，这些模型并未考虑到在训练时使用倾斜的患者数据集时经常出现的人口统计偏见，特别是性别偏见。因此，在这样一个重要的行业中，减少这种偏见至关重要，以便模型能够进行公平的诊断。在这项工作中，我们研究了用于检测两种主要呼吸道疾病（慢性阻塞性肺病（COPD）和COVID-19）呼吸模式的模型中的偏见。使用基于两个开源数据集获得的29例COPD患者和680例COVID-19阳性患者的呼吸模式音频录音训练的决策树模型，我们分析了性别偏见对模型的影响。通过使用阈值优化器和两个约束条件（人口统计平等和均衡赔率）来减轻偏见，我们观察到81.43%（人口统计平等差异）和71.81%（均衡赔率差异）的改进。这些发现在统计上是显著的。

更新时间: 2024-09-16 19:20:11

领域: cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2409.10677v1
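
The two fairness metrics named in the abstract above can be sketched in a few lines. This is a minimal illustration under one common definition (demographic parity difference as the gap in positive-prediction rates between groups, equalized odds difference as the larger of the true-positive-rate and false-positive-rate gaps); the function names and the toy data are assumptions for illustration, not the authors' code or data.

```python
def by_group(values, groups, g):
    """Keep only the entries of `values` whose group label equals g."""
    return [v for v, s in zip(values, groups) if s == g]


def conditional_rate(y_true, y_pred, label):
    """P(pred = 1 | true = label): TPR for label=1, FPR for label=0."""
    preds = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(preds) / len(preds)


def demographic_parity_difference(y_pred, groups):
    """Gap between the highest and lowest positive-prediction rate across groups."""
    rates = []
    for g in sorted(set(groups)):
        preds = by_group(y_pred, groups, g)
        rates.append(sum(preds) / len(preds))
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the TPR gap and the FPR gap across groups."""
    gaps = []
    for label in (1, 0):  # 1 -> TPR gap, 0 -> FPR gap
        rates = [
            conditional_rate(by_group(y_true, groups, g),
                             by_group(y_pred, groups, g), label)
            for g in sorted(set(groups))
        ]
        gaps.append(max(rates) - min(rates))
    return max(gaps)


# Hypothetical toy labels and predictions for two groups of four patients each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["M"] * 4 + ["F"] * 4

dpd = demographic_parity_difference(y_pred, groups)
eod = equalized_odds_difference(y_true, y_pred, groups)
```

A value of 0 for either metric means the groups are treated identically under that criterion; the percentage improvements reported in the abstract are relative reductions of such differences after mitigation.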

Toward Mitigating Sex Bias in Pilot Trainees' Stress and Fatigue Modeling

While researchers have been trying to understand stress and fatigue among pilots, especially pilot trainees, and to develop stress/fatigue models to automate the process of detecting stress/fatigue, they often do not consider biases such as sex in those models. However, in a critical profession like aviation, where the demographic distribution is disproportionately skewed to one sex, it is urgent to mitigate biases for fair and safe model predictions. In this work, we investigate the perceived stress/fatigue of 69 college students, including 40 pilot trainees with around 63% male. We construct decision tree models, first without bias mitigation and then with bias mitigation using a threshold optimizer with demographic parity and equalized odds constraints, repeating the procedure 30 times with random instances. Using bias mitigation, we achieve improvements of 88.31% (demographic parity difference) and 54.26% (equalized odds difference), which are also found to be statistically significant.

Updated: 2024-09-16 19:19:12

标题: 朝向减轻飞行员实习生的压力和疲劳建模中的性别偏见

摘要: 研究人员一直在努力理解飞行员，尤其是飞行员实习生之间的压力和疲劳，并开发压力/疲劳模型来自动检测压力/疲劳的过程。然而，他们在这些模型中通常没有考虑到性别等偏见。然而，在诸如航空这样一个人口分布严重倾斜向一性别的关键职业中，迫切需要减少偏见以实现公平和安全的模型预测。在这项工作中，我们调查了69名大学生的感知压力/疲劳，其中包括40名飞行员实习生，男性占63%左右。我们首先使用决策树构建模型，没有进行偏见减少，然后使用具有人口平衡和平等几率约束的阈值优化器进行30次随机实例的偏见减少。通过偏见减少，我们实现了88.31%的改善（人口平衡差异）和54.26%的改善（平等几率差异），这些改善也被证明是具有统计学意义的。

更新时间: 2024-09-16 19:19:12

领域: cs.LG,cs.CY

下载: http://arxiv.org/abs/2409.10676v1

A Bayesian Interpretation of Adaptive Low-Rank Adaptation

Motivated by the sensitivity-based importance score of the adaptive low-rank adaptation (AdaLoRA), we utilize more theoretically supported metrics, including the signal-to-noise ratio (SNR), along with the Improved Variational Online Newton (IVON) optimizer, for adaptive parameter budget allocation. The resulting Bayesian counterpart not only has matched or surpassed the performance of using the sensitivity-based importance metric but is also a faster alternative to AdaLoRA with Adam. Our theoretical analysis reveals a significant connection between the two metrics, providing a Bayesian perspective on the efficacy of sensitivity as an importance score. Furthermore, our findings suggest that the magnitude, rather than the variance, is the primary indicator of the importance of parameters.

Updated: 2024-09-16 19:14:35

标题: 一种贝叶斯解释的自适应低秩调整

摘要: 受自适应低秩适应（AdaLoRA）基于敏感性的重要性评分的启发，我们利用更多理论支持的指标，包括信噪比（SNR），以及改进的变分在线牛顿（IVON）优化器，用于自适应参数预算分配。由此产生的贝叶斯对应物不仅在使用基于敏感性的重要性度量方面表现出匹配或超越的性能，而且也是AdaLoRA与Adam的更快替代方案。我们的理论分析揭示了两个指标之间的重要联系，为敏感性作为重要性评分的有效性提供了贝叶斯视角。此外，我们的研究结果表明，参数的重要性主要指示器是幅度，而不是方差。

更新时间: 2024-09-16 19:14:35

领域: cs.LG,cs.CL,stat.ML

下载: http://arxiv.org/abs/2409.10673v1
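
To make the signal-to-noise ratio idea from the abstract above concrete, here is a minimal sketch. It assumes a Gaussian posterior with a per-parameter mean and variance is available (for example from a variational optimizer) and uses the common definition SNR = |mean| / standard deviation; the function names, the `eps` stabilizer, and the toy numbers are hypothetical, and the exact score used in the paper may differ.

```python
import math


def snr_scores(means, variances, eps=1e-12):
    """Per-parameter signal-to-noise ratio |mu| / sigma (eps avoids division by zero)."""
    return [abs(m) / math.sqrt(v + eps) for m, v in zip(means, variances)]


def top_k_indices(scores, k):
    """Indices of the k highest-scoring parameters, i.e. the ones kept under a budget of k."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical posterior statistics for three parameters.
means = [0.5, -2.0, 0.1]
variances = [0.25, 1.0, 0.01]

scores = snr_scores(means, variances)
kept = top_k_indices(scores, k=1)
```

In this toy case the second parameter has the largest magnitude relative to its uncertainty, so it is the one retained when the budget allows a single parameter.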

Benchmarking Secure Sampling Protocols for Differential Privacy

Differential privacy (DP) is widely employed to provide privacy protection for individuals by limiting information leakage from the aggregated data. Two well-known models of DP are the central model and the local model. The former requires a trustworthy server for data aggregation, while the latter requires individuals to add noise, significantly decreasing the utility of aggregated results. Recently, many studies have proposed to achieve DP with Secure Multi-party Computation (MPC) in distributed settings, namely, the distributed model, which has utility comparable to the central model while, under specific security assumptions, preventing parties from obtaining others' information. One challenge of realizing DP in the distributed model is efficiently sampling noise with MPC. Although many secure sampling methods have been proposed, they have different security assumptions and isolated theoretical analyses. There is a lack of experimental evaluations to measure and compare their performance. We fill this gap by benchmarking existing sampling protocols in MPC and performing comprehensive measurements of their efficiency. First, we present a taxonomy of the underlying techniques of these sampling protocols. Second, we extend widely used distributed noise generation protocols to be resilient against Byzantine attackers. Third, we implement discrete sampling protocols and align their security settings for a fair comparison. We then conduct an extensive evaluation to study their efficiency and utility.

Updated: 2024-09-16 19:04:47

标题: 基准测试差分隐私安全抽样协议

摘要: 差分隐私（DP）被广泛应用于通过限制来自聚合数据的信息泄露为个人提供隐私保护。DP的两个广为人知的模型是中心模型和本地模型。前者需要一个可信的服务器进行数据聚合，而后者需要个人添加噪音，从而显著降低了聚合结果的效用。最近，许多研究提出利用安全多方计算（MPC）在分布式环境中实现DP，即分布模型，其效用与中心模型相当，同时在特定安全假设下，防止各方获取其他方的信息。在分布模型中实现DP的一个挑战是通过MPC高效地进行噪音采样。尽管提出了许多安全采样方法，但它们具有不同的安全假设和孤立的理论分析。缺乏实验评估来衡量和比较它们的性能。我们通过对MPC中现有采样协议进行基准测试并对其效率进行全面测量来填补这一空白。首先，我们提出了这些采样协议的基础技术的分类法。其次，我们扩展了广泛使用的分布式噪音生成协议，使其对抗拜占庭攻击者具有弹性。第三，我们实现了离散采样协议，并调整它们的安全设置以进行公平比较。然后，我们进行了广泛的评估，以研究它们的效率和效用。

更新时间: 2024-09-16 19:04:47

领域: cs.CR

下载: http://arxiv.org/abs/2409.10667v1

Label-Looping: Highly Efficient Decoding for Transducers

This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models. We redesign the standard nested-loop design for RNN-T decoding, swapping loops over frames and labels: the outer loop iterates over labels, while the inner loop iterates over frames searching for the next non-blank symbol. Additionally, we represent partial hypotheses in a special structure using CUDA tensors, supporting parallelized hypothesis manipulation. Experiments show that the label-looping algorithm is up to 2.0X faster than conventional batched decoding when using batch size 32. It can be further combined with other compiler or GPU call-related techniques to achieve even more speedup. Our algorithm is general-purpose and can work with both conventional Transducers and Token-and-Duration Transducers. We open-source our implementation to benefit the research community.

Updated: 2024-09-16 19:04:28

标题: 标签循环：用于传感器的高效解码

摘要: 本文介绍了一种针对基于转换器的语音识别模型的高效贪婪解码算法。我们重新设计了RNN-T解码的标准嵌套循环设计，交换了帧和标签的循环：外部循环迭代标签，而内部循环迭代帧搜索下一个非空白符号。此外，我们使用CUDA张量在特殊结构中表示部分假设，支持并行化假设操作。实验证明，使用批量大小32时，标签循环算法比传统的批量解码快高达2.0倍。它可以进一步与其他编译器或GPU调用相关技术结合，以实现更高的加速。我们的算法是通用的，可以与传统的转换器和令牌及持续时间转换器一起使用。我们开源我们的实现以造福研究社区。

更新时间: 2024-09-16 19:04:28

领域: eess.AS,cs.AI,cs.CL,cs.LG,cs.SD

下载: http://arxiv.org/abs/2406.06220v2
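
The loop swap described in the abstract above can be illustrated with a toy, single-utterance sketch. The `toy_joint` function and the `EMISSIONS` table are hypothetical stand-ins for a real Transducer joint network, and the sketch assumes the joint eventually returns blank at every frame; it only shows the change in loop order, not the batched CUDA-tensor implementation that yields the reported speedup.

```python
BLANK = None  # stand-in for the blank symbol

# Hypothetical toy model: the frame index maps to the label emitted at that frame.
EMISSIONS = {0: "h", 2: "i"}


def toy_joint(t, last_label):
    """Return the next symbol at frame t given the last emitted label, or BLANK."""
    sym = EMISSIONS.get(t, BLANK)
    return BLANK if sym == last_label else sym


def frame_looping_decode(num_frames, joint):
    """Conventional greedy decoding: outer loop over frames, inner loop over labels."""
    hyp, last = [], BLANK
    for t in range(num_frames):
        sym = joint(t, last)
        while sym is not BLANK:      # emit labels until blank, then go to the next frame
            hyp.append(sym)
            last = sym
            sym = joint(t, last)
    return hyp


def label_looping_decode(num_frames, joint):
    """Swapped order: outer loop over labels, inner loop over frames."""
    hyp, last, t = [], BLANK, 0
    while t < num_frames:            # each outer iteration emits one label
        sym = joint(t, last)
        while sym is BLANK:          # advance frames until a non-blank symbol is found
            t += 1
            if t == num_frames:
                return hyp
            sym = joint(t, last)
        hyp.append(sym)
        last = sym
    return hyp


frame_result = frame_looping_decode(4, toy_joint)
label_result = label_looping_decode(4, toy_joint)
```

Both functions visit the same (frame, label) pairs and return the same hypothesis; the difference is only which loop is on the outside.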

Interpretable global minima of deep ReLU neural networks on sequentially separable data

We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The configurations for the training data considered are (i) sufficiently small, well separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linearly separable. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, global minimizers can be described with $Q(M+2)$ parameters.

Updated: 2024-09-16 18:55:22

标题: 可解释的深度ReLU神经网络在序贯可分数据上的全局极小值

摘要: 我们明确构建了零损失的神经网络分类器。我们用累积参数来表示权重矩阵和偏置向量，这些参数确定了递归作用于输入空间的截断映射。考虑的训练数据配置是（i）对应于每个类的足够小、互相分离的簇，以及（ii）顺序线性可分的等价类。在最佳情况下，对于$\mathbb{R}^M$中的$Q$类数据，全局极小值可以用$Q(M+2)$个参数描述。

更新时间: 2024-09-16 18:55:22

领域: cs.LG,cs.AI,math-ph,math.MP,math.OC,stat.ML,57R70, 62M45

下载: http://arxiv.org/abs/2405.07098v2

Disentangling Uncertainty for Safe Social Navigation using Deep Reinforcement Learning

Autonomous mobile robots are increasingly employed in pedestrian-rich environments where safe navigation and appropriate human interaction are crucial. While Deep Reinforcement Learning (DRL) enables socially integrated robot behavior, it remains challenging to indicate when and why the policy is uncertain in novel or perturbed scenarios. Unknown uncertainty in decision-making can lead to collisions or human discomfort and is one reason why safe and risk-aware navigation is still an open problem. This work introduces a novel approach that integrates aleatoric, epistemic, and predictive uncertainty estimation into a DRL-based navigation framework to provide uncertainty estimates in decision-making. To this end, we incorporate Observation-Dependent Variance (ODV) and dropout into the Proximal Policy Optimization (PPO) algorithm. For different types of perturbations, we compare the ability of Deep Ensembles and Monte-Carlo Dropout (MC-Dropout) to estimate the uncertainties of the policy. In uncertain decision-making situations, we propose to change the robot's social behavior to conservative collision avoidance. The results show that the ODV-PPO algorithm converges faster with better generalization and disentangles the aleatoric and epistemic uncertainties. In addition, the MC-Dropout approach is more sensitive to perturbations and better able to correlate the uncertainty type with the perturbation type. With the proposed safe action selection scheme, the robot can navigate in perturbed environments with fewer collisions.

Updated: 2024-09-16 18:49:38

标题: 使用深度强化学习解开安全社交导航的不确定性

摘要: 自主移动机器人越来越多地应用于行人密集的环境中，安全导航和适当的人机交互至关重要。深度强化学习（DRL）使机器人行为与社会集成，但在新颖或受干扰的场景中，仍存在挑战，以指示策略何时以及为何不确定。决策中的未知不确定性可能导致碰撞或人类不适，并且这也是为什么安全和风险感知导航仍然是一个悬而未决的问题的原因之一。本文介绍了一种新颖的方法，将不确定性估计集成到基于DRL的导航框架中，用于决策中的不确定性估计。因此，我们将观测依赖方差（ODV）和dropout集成到Proximal Policy Optimization（PPO）算法中。对于不同类型的干扰，我们比较了深度集成和蒙特卡洛dropout（MC-Dropout）估计策略不确定性的能力。在不确定的决策情况下，我们建议改变机器人的社会行为以保守避碰。结果显示，ODV-PPO算法收敛速度更快，泛化性更好，并区分了随机性和认知不确定性。此外，MC-Dropout方法对干扰更为敏感，并能更好地将不确定性类型与干扰类型相关联。通过提出的安全动作选择方案，机器人可以在受干扰的环境中导航，减少碰撞。

更新时间: 2024-09-16 18:49:38

领域: cs.RO,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2409.10655v1
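
As a rough illustration of the MC-Dropout idea mentioned in the abstract above, the sketch below runs several stochastic forward passes through a toy linear "policy" and uses the variance of the outputs as a proxy for epistemic uncertainty. The linear model, the input dropout, the `select_behavior` rule, and its threshold are all hypothetical simplifications; the paper's PPO-based architecture and action selection scheme are not reproduced here.

```python
import random
import statistics


def dropout_forward(x, weights, p_drop, rng):
    """One stochastic pass: drop each input with probability p_drop (inverted dropout)."""
    kept = [0.0 if rng.random() < p_drop else xi / (1.0 - p_drop) for xi in x]
    return sum(w * k for w, k in zip(weights, kept))


def mc_dropout(x, weights, p_drop=0.2, passes=100, seed=0):
    """Mean and variance of the output over repeated stochastic passes."""
    rng = random.Random(seed)
    outputs = [dropout_forward(x, weights, p_drop, rng) for _ in range(passes)]
    return statistics.mean(outputs), statistics.pvariance(outputs)


def select_behavior(variance, threshold=0.05):
    """Hypothetical rule: fall back to conservative avoidance when uncertainty is high."""
    return "conservative" if variance > threshold else "social"


x = [1.0, 2.0]
weights = [0.5, 0.25]

mean_det, var_det = mc_dropout(x, weights, p_drop=0.0)      # no dropout: deterministic
mean_mc, var_mc = mc_dropout(x, weights, p_drop=0.5)        # dropout active: outputs vary
```

With dropout disabled every pass returns the same value and the variance is zero; with dropout active the spread of the outputs grows, which is the signal a rule such as `select_behavior` would act on.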

Logic Synthesis Optimization with Predictive Self-Supervision via Causal Transformers

Contemporary hardware design benefits from the abstraction provided by high-level logic gates, streamlining the implementation of logic circuits. Logic Synthesis Optimization (LSO) operates at one level of abstraction within the Electronic Design Automation (EDA) workflow, targeting improvements in logic circuits with respect to performance metrics such as size and speed in the final layout. Recent trends in the field show a growing interest in leveraging Machine Learning (ML) for EDA, notably through ML-guided logic synthesis utilizing policy-based Reinforcement Learning (RL) methods. Despite these advancements, existing models face challenges such as overfitting and limited generalization, attributed to constrained public circuits and the expressiveness limitations of graph encoders. To address these hurdles and tackle data scarcity issues, we introduce LSOformer, a novel approach harnessing autoregressive transformer models and predictive SSL to predict the trajectory of Quality of Results (QoR). LSOformer integrates cross-attention modules to merge insights from circuit graphs and optimization sequences, thereby enhancing prediction accuracy for QoR metrics. Experimental studies validate the effectiveness of LSOformer, showcasing its superior performance over baseline architectures in QoR prediction tasks, where it achieves improvements of 5.74%, 4.35%, and 17.06% on the EPFL, OABCD, and proprietary circuits datasets, respectively, in an inductive setup.

Updated: 2024-09-16 18:45:07

标题: 逻辑综合优化：通过因果变换器进行预测性自我监督

摘要: 当代硬件设计受益于高级逻辑门提供的抽象，简化了逻辑电路的实现。逻辑综合优化（LSO）在电子设计自动化（EDA）工作流程的一个抽象层级上运作，旨在改进逻辑电路的性能指标，如最终布局中的大小和速度。该领域的最新趋势显示出对利用机器学习（ML）进行EDA的兴趣不断增长，特别是通过ML引导的逻辑综合，利用基于策略的强化学习（RL）方法。尽管存在这些进展，现有模型面临着过拟合和有限泛化等挑战，这些挑战归因于受限的公共电路和图形编码器的表达能力限制。为了解决这些障碍，并解决数据稀缺问题，我们引入了LSOformer，这是一种利用自回归变压器模型和预测性SSL来预测结果质量（QoR）轨迹的新方法。LSOformer集成了交叉注意力模块，以合并电路图和优化序列的见解，从而增强QoR指标的预测准确性。实验研究验证了LSOformer的有效性，展示了其在QoR预测任务中优于基线架构的卓越性能，在归纳设置中分别在EPFL、OABCD和专有电路数据集上分别实现了5.74%、4.35%和17.06%的改进。

更新时间: 2024-09-16 18:45:07

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2409.10653v1

RuleFuser: An Evidential Bayes Approach for Rule Injection in Imitation Learned Planners and Predictors for Robustness under Distribution Shifts

Modern motion planners for autonomous driving frequently use imitation learning (IL) to draw from expert driving logs. Although IL benefits from its ability to glean nuanced and multi-modal human driving behaviors from large datasets, the resulting planners often struggle with out-of-distribution (OOD) scenarios and with traffic rule compliance. On the other hand, classical rule-based planners, by design, can generate safe traffic rule compliant behaviors while being robust to OOD scenarios, but these planners fail to capture nuances in agent-to-agent interactions and human drivers' intent. RuleFuser, an evidential framework, combines IL planners with classical rule-based planners to draw on the complementary benefits of both, thereby striking a balance between imitation and safety. Our approach, tested on the real-world nuPlan dataset, combines the IL planner's high performance in in-distribution (ID) scenarios with the rule-based planners' enhanced safety in out-of-distribution (OOD) scenarios, achieving a 38.43% average improvement on safety metrics over the IL planner without much detriment to imitation metrics in OOD scenarios.

Updated: 2024-09-16 18:44:47

标题: RuleFuser：一种证据贝叶斯方法，用于在模仿学习规划器和预测器中注入规则，以增强在分布转移下的鲁棒性

摘要: 现代自动驾驶的运动规划器经常使用模仿学习（IL）来借鉴专家驾驶记录。尽管IL能够从大型数据集中获取细微和多模态的人类驾驶行为，但由此产生的规划器常常在超出分布（OOD）情景和交通规则遵从方面遇到困难。另一方面，经典基于规则的规划器通过设计可以生成安全的符合交通规则的行为，同时对OOD情况具有鲁棒性，但这些规划器无法捕捉主体之间互动和人类驾驶者意图的微妙之处。RuleFuser，一个证据框架，将IL规划器与经典基于规则的规划器结合起来，利用两者的互补优势，从而在模仿和安全之间取得平衡。我们的方法在真实世界的nuPlan数据集上进行了测试，将IL规划器在分布内（ID）情景中的高性能与基于规则的规划器在超出分布（OOD）情景中的增强安全性结合起来，使得在OOD情景中的安全度指标平均提高了38.43%，而在模仿度指标上的损害并不大。

更新时间: 2024-09-16 18:44:47

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.11139v3

CaBaGe: Data-Free Model Extraction using ClAss BAlanced Generator Ensemble

Machine Learning as a Service (MLaaS) is often provided as a pay-per-query, black-box system to clients. Such a black-box approach not only hinders open replication, validation, and interpretation of model results, but also makes it harder for white-hat researchers to identify vulnerabilities in the MLaaS systems. Model extraction is a promising technique to address these challenges by reverse-engineering black-box models. Since training data is typically unavailable for MLaaS models, this paper focuses on the realistic version of it: data-free model extraction. We propose a data-free model extraction approach, CaBaGe, to achieve higher model extraction accuracy with a small number of queries. Our innovations include (1) a novel experience replay for focusing on difficult training samples; (2) an ensemble of generators for steadily producing diverse synthetic data; and (3) a selective filtering process for querying the victim model with harder, more balanced samples. In addition, we create a more realistic setting, for the first time, where the attacker has no knowledge of the number of classes in the victim training data, and create a solution to learn the number of classes on the fly. Our evaluation shows that CaBaGe outperforms existing techniques on seven datasets -- MNIST, FMNIST, SVHN, CIFAR-10, CIFAR-100, ImageNet-subset, and Tiny ImageNet -- with an accuracy improvement of the extracted models by up to 43.13%. Furthermore, the number of queries required to extract a clone model matching the final accuracy of prior work is reduced by up to 75.7%.

Updated: 2024-09-16 18:19:19

标题: CaBaGe: 使用类平衡生成器集合进行无数据模型提取

摘要: 机器学习作为一种服务（MLaaS）通常以按查询付费的形式向客户提供，这是一个黑盒系统。这种黑盒方法不仅阻碍了模型结果的开放复制、验证和解释，还使得白帽研究人员更难识别MLaaS系统中的漏洞。模型提取是一种有希望解决这些挑战的技术，通过对黑盒模型进行逆向工程。由于MLaaS模型通常没有训练数据，本文关注了其现实版本：无数据模型提取。我们提出了一种无数据模型提取方法CaBaGe，可以在少量查询的情况下实现更高的模型提取准确性。我们的创新包括（1）一种新颖的经验重放，专注于困难的训练样本；（2）一个生成器集合，稳定地产生多样化的合成数据；以及（3）一个选择性过滤过程，用于使用更难、更平衡的样本查询受害模型。此外，我们首次创建了一个更加现实的设置，在这个设置中，攻击者对受害训练数据中类别数量一无所知，并创造了一个解决方案来动态学习类别数量。我们的评估表明，CaBaGe在七个数据集上表现优于现有技术——MNIST、FMNIST、SVHN、CIFAR-10、CIFAR-100、ImageNet子集和Tiny ImageNet——提取模型的准确性提高高达43.13％。此外，提取与先前工作最终准确性匹配的克隆模型所需的查询数量减少高达75.7％。

更新时间: 2024-09-16 18:19:19

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2409.10643v1

Exploring Fine-tuned Generative Models for Keyphrase Selection: A Case Study for Russian

Keyphrase selection plays a pivotal role within the domain of scholarly texts, facilitating efficient information retrieval, summarization, and indexing. In this work, we explored how to apply fine-tuned generative transformer-based models to the specific task of keyphrase selection within Russian scientific texts. We experimented with four distinct generative models (ruT5, ruGPT, mT5, and mBART) and evaluated their performance in both in-domain and cross-domain settings. The experiments were conducted on the texts of Russian scientific abstracts from four domains: mathematics & computer science, history, medicine, and linguistics. The use of generative models, namely mBART, led to gains in in-domain performance (up to 4.9% in BERTScore, 9.0% in ROUGE-1, and 12.2% in F1-score) over three keyphrase extraction baselines for the Russian language. Although the results for cross-domain usage were significantly lower, they still demonstrated the capability to surpass baseline performances in several cases, underscoring the promising potential for further exploration and refinement in this research field.

Updated: 2024-09-16 18:15:28

标题: 探索针对关键词选择进行微调的生成模型：俄语案例研究

摘要: 关键短语选择在学术文本领域中起着关键作用，有助于高效的信息检索、摘要和索引。在这项工作中，我们探讨了如何将经过微调的生成式变压器模型应用于俄语科学文本中的关键短语选择特定任务。我们尝试了四种不同的生成模型，如ruT5、ruGPT、mT5和mBART，并在领域内和跨领域环境中评估它们的性能。实验是在四个领域的俄语科学摘要文本上进行的：数学与计算机科学、历史、医学和语言学。使用生成模型，特别是mBART，在俄语语言的三个关键短语提取基线上取得了领域内性能的提升（BERT分数最高提升4.9％，ROUGE-1提升9.0％，F1分数提升12.2％）。尽管跨领域使用的结果明显较低，但在一些情况下仍显示出超越基线性能的能力，强调了在这一研究领域进一步探索和改进的有望潜力。

更新时间: 2024-09-16 18:15:28

领域: cs.CL,cs.AI,cs.LG,68T50,I.2.7; I.7.m; H.3.3

下载: http://arxiv.org/abs/2409.10640v1

Privacy-Preserving Race/Ethnicity Estimation for Algorithmic Bias Measurement in the U.S.

AI fairness measurements, including tests for equal treatment, often take the form of disaggregated evaluations of AI systems. Such measurements are an important part of Responsible AI operations. These measurements compare system performance across demographic groups or sub-populations and typically require member-level demographic signals such as gender, race, ethnicity, and location. However, sensitive member-level demographic attributes like race and ethnicity can be challenging to obtain and use due to platform choices, legal constraints, and cultural norms. In this paper, we focus on the task of enabling AI fairness measurements on race/ethnicity for U.S. LinkedIn members in a privacy-preserving manner. We present the Privacy-Preserving Probabilistic Race/Ethnicity Estimation (PPRE) method for performing this task. PPRE combines the Bayesian Improved Surname Geocoding (BISG) model, a sparse LinkedIn survey sample of self-reported demographics, and privacy-enhancing technologies like secure two-party computation and differential privacy to enable meaningful fairness measurements while preserving member privacy. We provide details of the PPRE method and its privacy guarantees. We then illustrate sample measurement operations. We conclude with a review of open research and engineering challenges for expanding our privacy-preserving fairness measurement capabilities.

Updated: 2024-09-16 18:15:18

标题: 隐私保护下的美国算法偏见度量中的种族/族裔估计

摘要: 人工智能公平性测量，包括对待平等的测试，通常采取对AI系统的细分评估的形式。这些测量是负责任的人工智能运营的重要组成部分。这些测量比较了系统在不同人口群体或亚群体之间的表现，并通常需要成员级别的人口信号，如性别、种族、族裔和地点。然而，像种族和族裔这样敏感的成员级别人口属性可能由于平台选择、法律限制和文化规范而难以获取和使用。在本文中，我们专注于以隐私保护的方式为美国LinkedIn会员启用AI公平性测量的任务。我们提出了隐私保护概率种族/族裔估计（PPRE）方法来执行这项任务。PPRE结合了贝叶斯改进的姓氏地理编码（BISG）模型、LinkedIn自我报告人口统计的稀疏调查样本，以及安全双方计算和差分隐私等隐私增强技术，以便在保护会员隐私的同时实现有意义的公平性测量。我们提供了PPRE方法及其隐私保证的详细信息。然后我们说明了样本测量操作。最后，我们总结了扩展我们的隐私保护公平性测量能力所面临的开放研究和工程挑战。

更新时间: 2024-09-16 18:15:18

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2409.04652v2

Large Language Model Can Continue Evolving From Mistakes

As world knowledge evolves and new task schemas emerge, Continual Learning (CL) is crucial for keeping Large Language Models (LLMs) up-to-date and addressing their shortcomings. LLMs typically require continual instruction tuning (CIT) and continual pre-training (CPT) to adapt to new tasks and acquire essential knowledge. However, collecting sufficient CPT data while addressing knowledge gaps remains challenging, as does optimizing the efficiency of utilizing this data. Inspired by the 'summarizing mistakes' strategy, we propose the Continue Evolving from Mistakes (CEM) method, a data-efficient approach aiming to collect CPT data and continually improve LLMs' performance through iterative evaluation and supplementation with mistake-relevant knowledge. To enhance data utilization and mitigate forgetting, we introduce a novel training paradigm that combines CIT and CPT data. Experiments demonstrate that CEM significantly enhances model performance and continual evolution. The code and dataset are available on GitHub.

Updated: 2024-09-16 18:02:06

标题: 大型语言模型可以继续从错误中进化

摘要: 随着世界知识的演进和新的任务模式的出现，持续学习（CL）对于保持大型语言模型（LLMs）的最新状态并解决其缺陷至关重要。LLMs通常需要持续指导调整（CIT）和持续预训练（CPT）来适应新任务并获取必要知识。然而，尽管在解决知识缺口的同时收集足够的CPT数据仍然具有挑战性，优化利用这些数据的效率也是如此。受“总结错误”策略启发，我们提出了“从错误中持续演变”（CEM）方法，这是一种数据高效的方法，旨在通过迭代评估和补充与错误相关的知识，收集CPT数据并持续改进LLMs的性能。为了增强数据利用和减少遗忘，我们引入了一种结合CIT和CPT数据的新颖训练范式。实验表明，CEM显著提升了模型性能和持续演进。代码和数据集可在GitHub上找到。

更新时间: 2024-09-16 18:02:06

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2404.08707v5

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Transformer-based Large Language Models (LLMs) have become increasingly important in various domains. However, the quadratic time complexity of the attention operation poses a significant challenge for scaling to longer contexts due to the extremely high inference latency and GPU memory consumption for caching key-value (KV) vectors. This paper proposes RetrievalAttention, a training-free approach to accelerate attention computation. To leverage the dynamic sparse property of attention, RetrievalAttention builds approximate nearest neighbor search (ANNS) indexes upon KV vectors in CPU memory and retrieves the most relevant ones via vector search during generation. Because query vectors are out-of-distribution (OOD) with respect to key vectors, off-the-shelf ANNS indexes still need to scan O(N) (usually 30% of all keys) data for accurate retrieval, which fails to exploit the high sparsity. RetrievalAttention first identifies the OOD challenge of ANNS-based attention, and addresses it via an attention-aware vector search algorithm that can adapt to queries and only access 1-3% of data, thus achieving a sub-linear time complexity. RetrievalAttention greatly reduces the inference cost of long-context LLMs with much lower GPU memory requirements while maintaining the model accuracy. In particular, RetrievalAttention only needs 16GB GPU memory for serving 128K tokens in LLMs with 8B parameters, and it can generate one token in 0.188 seconds on a single NVIDIA RTX4090 (24GB).

Updated: 2024-09-16 17:59:52

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2409.10516v1
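
The retrieval step described in the RetrievalAttention abstract above can be illustrated with a minimal sketch: attend only to the k keys that score highest against the query. This version assumes an exact top-k scan in place of the ANNS index the abstract describes, so it shows the sparse-attention idea rather than the paper's attention-aware search. The function name `topk_attention` and the toy vectors are hypothetical.

```python
import heapq
import math


def topk_attention(query, keys, values, k):
    """Softmax attention restricted to the k keys with the largest dot product.

    Assumption: an exact scan over all keys stands in for the ANNS lookup.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]
    # Indices of the k highest-scoring keys.
    top = heapq.nlargest(k, range(len(keys)), key=scores.__getitem__)
    # Numerically stable softmax over the retrieved keys only.
    peak = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, top)) / total
            for d in range(dim)]


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0]]
print(topk_attention([10.0, 0.0], keys, values, k=1))
```

With `k=1` only the best-matching key contributes, so the output equals that key's value; raising `k` toward the number of keys recovers ordinary dense attention.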

An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems

Dialog systems, such as voice assistants, are expected to engage with users in complex, evolving conversations. Unfortunately, traditional automatic speech recognition (ASR) systems deployed in such applications are usually trained to recognize each turn independently and lack the ability to adapt to the conversational context or incorporate user feedback. In this work, we introduce a general framework for ASR in dialog systems that can go beyond learning from single-turn utterances and learn over time how to adapt to both explicit supervision and implicit user feedback present in multi-turn conversations. We accomplish that by leveraging advances in student-teacher learning and context-aware dialog processing, and designing contrastive self-supervision approaches with Ohm, a new online hard-negative mining approach. We show that leveraging our new framework compared to traditional training leads to relative WER reductions of close to 10% in real-world dialog systems, and up to 26% on public synthetic data.

Updated: 2024-09-16 17:59:50

Categories: eess.AS,cs.AI,cs.CL,cs.SD

Download: http://arxiv.org/abs/2409.10515v1

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Scaling the amount of compute used to train language models has dramatically improved their capabilities. However, when it comes to inference, we often limit the amount of compute to only one attempt per problem. Here, we explore inference compute as another axis for scaling by increasing the number of generated samples. Across multiple tasks and models, we observe that coverage - the fraction of problems solved by any attempt - scales with the number of samples over four orders of magnitude. In domains like coding and formal proofs, where all answers can be automatically verified, these increases in coverage directly translate into improved performance. When we apply repeated sampling to SWE-bench Lite, the fraction of issues solved with DeepSeek-V2-Coder-Instruct increases from 15.9% with one sample to 56% with 250 samples, outperforming the single-attempt state-of-the-art of 43% which uses more capable frontier models. Moreover, using current API pricing, amplifying the cheaper DeepSeek model with five samples is more cost-effective and solves more issues than paying a premium for one sample from GPT-4o or Claude 3.5 Sonnet. Interestingly, the relationship between coverage and the number of samples is often log-linear and can be modelled with an exponentiated power law, suggesting the existence of inference-time scaling laws. Finally, we find that identifying correct samples out of many generations remains an important direction for future research in domains without automatic verifiers. When solving math word problems from GSM8K and MATH, coverage with Llama-3 models grows to over 95% with 10,000 samples. However, common methods to pick correct solutions from a sample collection, such as majority voting or reward models, plateau beyond several hundred samples and fail to fully scale with the sample budget.

Updated: 2024-09-16 17:58:42

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.21787v2
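
The coverage metric in the Large Language Monkeys abstract above (the fraction of problems solved by any attempt) can be estimated from per-problem sample counts. The sketch below assumes the standard unbiased pass@k estimator used in code-generation evaluation, 1 - C(n-c, k) / C(n, k), where n samples were drawn and c of them were correct. The function names and the sample counts are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_at_k(n, c, k):
    """Probability that at least one of k samples, drawn without
    replacement from n samples of which c are correct, is correct."""
    if k > n:
        raise ValueError("k cannot exceed the number of samples n")
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def coverage(results, k):
    """Average pass@k over problems; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts: (samples drawn, correct samples) for three problems.
results = [(100, 0), (100, 3), (100, 40)]
for k in (1, 10, 100):
    print(k, coverage(results, k))
```

Coverage can only grow with k under this estimator, which is why it is a natural quantity to plot against the sample budget.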

Zero Knowledge Games

In this paper we model a game such that all strategies are non-revealing, with imperfect recall and incomplete information. We also introduce a modified sliding-block code as a linear transformation which generates common knowledge of how informed a player is. Ultimately, we see that between two players in a zero-knowledge game where both players are informed, the utility of trust is established in the mixed strategy Nash equilibrium. A zero-knowledge game is one of trust and soundness, placing utility in being informed. For any player who may be uninformed, such players reveal they are uninformed.

Updated: 2024-09-16 17:57:27

Categories: cs.GT,cs.AI,91-08

Download: http://arxiv.org/abs/2009.13521v6

Decidability of Querying First-Order Theories via Countermodels of Finite Width

We propose a generic framework for establishing the decidability of a wide range of logical entailment problems (briefly called querying), based on the existence of countermodels that are structurally simple, gauged by certain types of width measures (with treewidth and cliquewidth as popular examples). As an important special case of our framework, we identify logics exhibiting width-finite finitely universal model sets, warranting decidable entailment for a wide range of homomorphism-closed queries, subsuming a diverse set of practically relevant query languages. As a particularly powerful width measure, we propose to employ Blumensath's partitionwidth, which subsumes various other commonly considered width measures and exhibits highly favorable computational and structural properties. Focusing on the formalism of existential rules as a popular showcase, we explain how finite partitionwidth sets of rules subsume other known abstract decidable classes but - leveraging existing notions of stratification - also cover a wide range of new rulesets. We expose natural limitations for fitting the class of finite unification sets into our picture and suggest several options for remedy.

Updated: 2024-09-16 17:57:11

Categories: cs.LO,cs.AI,cs.DB,cs.DM,math.LO

Download: http://arxiv.org/abs/2304.06348v3

Kolmogorov-Arnold Transformer

Transformers stand as the cornerstone of modern deep learning. Traditionally, these models rely on multi-layer perceptron (MLP) layers to mix the information between channels. In this paper, we introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers to enhance the expressiveness and performance of the model. Integrating KANs into transformers, however, is no easy feat, especially when scaled up. Specifically, we identify three key challenges: (C1) Base function. The standard B-spline function used in KANs is not optimized for parallel computing on modern hardware, resulting in slower inference speeds. (C2) Parameter and Computation Inefficiency. KAN requires a unique function for each input-output pair, making the computation extremely large. (C3) Weight initialization. The initialization of weights in KANs is particularly challenging due to their learnable activation functions, which are critical for achieving convergence in deep neural networks. To overcome the aforementioned challenges, we propose three key solutions: (S1) Rational basis. We replace B-spline functions with rational functions to improve compatibility with modern GPUs. By implementing this in CUDA, we achieve faster computations. (S2) Group KAN. We share the activation weights through a group of neurons, to reduce the computational load without sacrificing performance. (S3) Variance-preserving initialization. We carefully initialize the activation weights to make sure that the activation variance is maintained across layers. With these designs, KAT scales effectively and readily outperforms traditional MLP-based transformers.

Updated: 2024-09-16 17:54:51

Categories: cs.LG,cs.AI,cs.CV,cs.NE

Download: http://arxiv.org/abs/2409.10594v1

Assumption-Lean and Data-Adaptive Post-Prediction Inference

A primary challenge facing modern scientific research is the limited availability of gold-standard data which can be costly, labor-intensive, or invasive to obtain. With the rapid development of machine learning (ML), scientists can now employ ML algorithms to predict gold-standard outcomes with variables that are easier to obtain. However, these predicted outcomes are often used directly in subsequent statistical analyses, ignoring imprecision and heterogeneity introduced by the prediction procedure. This will likely result in false positive findings and invalid scientific conclusions. In this work, we introduce PoSt-Prediction Adaptive inference (PSPA) that allows valid and powerful inference based on ML-predicted data. Its "assumption-lean" property guarantees reliable statistical inference without assumptions on the ML prediction. Its "data-adaptive" feature guarantees an efficiency gain over existing methods, regardless of the accuracy of ML prediction. We demonstrate the statistical superiority and broad applicability of our method through simulations and real-data applications.

Updated: 2024-09-16 17:47:54

Categories: stat.ME,cs.LG,stat.ML

Download: http://arxiv.org/abs/2311.14220v4

Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks

The increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications. Understanding model vulnerabilities in high-stakes and knowledge-intensive tasks is essential for quantifying the trustworthiness of model predictions and regulating their use. The recent discovery of named entities as adversarial examples (i.e. adversarial entities) in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and finetuned LLMs in high-stakes and specialized domains. We examined the use of type-consistent entity substitution as a template for collecting adversarial entities for billion-parameter LLMs with biomedical knowledge. To this end, we developed an embedding-space attack based on powerscaled distance-weighted sampling to assess the robustness of their biomedical knowledge with a low query budget and controllable coverage. Our method has favorable query efficiency and scaling over alternative approaches based on random sampling and blackbox gradient-guided search, which we demonstrated for adversarial distractor generation in biomedical question answering. Subsequent failure mode analysis uncovered two regimes of adversarial entities on the attack surface with distinct characteristics and we showed that entity substitution attacks can manipulate token-wise Shapley value explanations, which become deceptive in this setting. Our approach complements standard evaluations for high-capacity models and the results highlight the brittleness of domain knowledge in LLMs.

Updated: 2024-09-16 17:44:17

Categories: cs.CL,cs.CR,stat.AP

Download: http://arxiv.org/abs/2402.10527v2

Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles

Causal language modeling using the Transformer architecture has yielded remarkable capabilities in Large Language Models (LLMs) over the last few years. However, the extent to which fundamental search and reasoning capabilities have emerged within LLMs remains a topic of ongoing debate. In this work, we study whether causal language modeling can learn a complex task such as solving Sudoku puzzles. To solve a Sudoku, the model is first required to search over all empty cells of the puzzle to decide on a cell to fill and then apply an appropriate strategy to fill the decided cell. Sometimes, the application of a strategy only results in thinning down the possible values in a cell rather than concluding the exact value of the cell. In such cases, multiple strategies are applied one after the other to fill a single cell. We observe that Transformer models trained on this synthetic task can indeed learn to solve Sudokus (our model solves 94.21% of the puzzles fully correctly) when trained on a logical sequence of steps taken by a solver. We find that training Transformers with the logical sequence of steps is necessary and without such training, they fail to learn Sudoku. We also extend our analysis to Zebra puzzles (known as Einstein puzzles) and show that the model solves 92.04% of the puzzles fully correctly. In addition, we study the internal representations of the trained Transformer and find that through linear probing, we can decode information about the set of possible values in any given cell from them, pointing to the presence of a strong reasoning engine implicit in the Transformer weights.

Updated: 2024-09-16 17:42:15

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2409.10502v1
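
The Sudoku abstract above mentions strategies that only thin down the possible values in a cell. As a minimal sketch of what a "set of possible values" means, the function below applies the simplest elimination rule: remove every digit already present in the cell's row, column, or 3x3 box. The function name `candidates` and the grid encoding (0 for an empty cell) are assumptions made for illustration, not the solver used in the paper.

```python
def candidates(grid, row, col):
    """Digits 1-9 still possible for grid[row][col] under basic elimination.

    grid is a 9x9 list of lists; 0 marks an empty cell (assumed encoding).
    """
    if grid[row][col] != 0:
        return {grid[row][col]}  # a filled cell has exactly one value
    used = set(grid[row])                          # digits in the same row
    used |= {grid[r][col] for r in range(9)}       # digits in the same column
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    used |= {grid[r][c]                            # digits in the same box
             for r in range(box_row, box_row + 3)
             for c in range(box_col, box_col + 3)}
    return set(range(1, 10)) - used


grid = [[0] * 9 for _ in range(9)]
grid[0][1:] = [2, 3, 4, 5, 6, 7, 8, 9]
print(candidates(grid, 0, 0))  # only one digit is left for this cell
print(candidates(grid, 1, 0))  # the box still rules out 2 and 3
```

When the returned set has a single element the cell can be filled directly; otherwise further strategies are needed to shrink the set, which matches the multi-step behaviour the abstract describes.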

Partial Distribution Matching via Partial Wasserstein Adversarial Networks

This paper studies the problem of distribution matching (DM), which is a fundamental machine learning problem seeking to robustly align two probability distributions. Our approach is established on a relaxed formulation, called partial distribution matching (PDM), which seeks to match a fraction of the distributions instead of matching them completely. We theoretically derive the Kantorovich-Rubinstein duality for the partial Wasserstein-1 (PW) discrepancy, and develop a partial Wasserstein adversarial network (PWAN) that efficiently approximates the PW discrepancy based on this dual form. Partial matching can then be achieved by optimizing the network using gradient descent. Two practical tasks, point set registration and partial domain adaptation, are investigated, where the goals are to partially match distributions in 3D space and high-dimensional feature space respectively. The experimental results confirm that the proposed PWAN effectively produces highly robust matching results, performing better than or on par with state-of-the-art methods.

Updated: 2024-09-16 17:41:45

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2409.10499v1

CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios

Large Language Models (LLMs) have been widely adopted to process long-context tasks. However, the large memory overhead of the key-value (KV) cache poses significant challenges in long-context scenarios. Existing training-free KV cache compression methods typically focus on quantization and token pruning, which have compression limits, and excessive sparsity can lead to severe performance degradation. Other methods design new architectures with less KV overhead but require significant training overhead. To address the above two drawbacks, we further explore the redundancy in the channel dimension and apply an architecture-level design with minor training costs. We therefore introduce CSKV, a training-efficient Channel Shrinking technique for KV cache compression: (1) We first analyze the singular value distribution of the KV cache, revealing significant redundancy and compression potential along the channel dimension. Based on this observation, we propose using low-rank decomposition for key and value layers and storing the low-dimension features. (2) To preserve model performance, we introduce a bi-branch KV cache, including a window-based full-precision KV cache and a low-precision compressed KV cache. (3) To reduce the training costs, we minimize the layer-wise reconstruction loss for the compressed KV cache instead of retraining the entire LLM. Extensive experiments show that CSKV can reduce the memory overhead of the KV cache by 80% while maintaining the model's long-context capability. Moreover, we show that our method can be seamlessly combined with quantization to further reduce the memory overhead, achieving a compression ratio of up to 95%.

Updated: 2024-09-16 17:36:50

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.10593v1
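
To put the 80% and 95% figures from the CSKV abstract above in concrete terms, the sketch below works through the usual back-of-the-envelope size of a KV cache: two tensors (keys and values) per layer, each holding one vector per KV head per token. The model dimensions are hypothetical values chosen to resemble an 8B-class model with grouped-query attention, and the calculation ignores the full-precision window branch mentioned in the abstract, so treat the numbers as illustrative only.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Uncompressed KV cache size: keys and values for every layer,
    KV head, and token (bytes_per_elem=2 corresponds to fp16)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


def compressed_bytes(full_bytes, reduction):
    """Cache size after removing the given fraction (0.80 = 80% smaller)."""
    return full_bytes * (1.0 - reduction)


# Hypothetical model shape; not taken from the paper.
full = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=128 * 1024)
gib = 2 ** 30
print(full / gib)                          # 16.0
print(compressed_bytes(full, 0.80) / gib)  # roughly a fifth of the full cache
print(compressed_bytes(full, 0.95) / gib)  # roughly a twentieth of the full cache
```

Under these assumed dimensions a 128K-token context needs 16 GiB for the cache alone, which is why shrinking the channel dimension matters for long-context serving.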

Security Attacks on LLM-based Code Completion Tools

The rapid development of large language models (LLMs) has significantly advanced code completion capabilities, giving rise to a new generation of LLM-based Code Completion Tools (LCCTs). Unlike general-purpose LLMs, these tools possess unique workflows, integrating multiple information sources as input and prioritizing code suggestions over natural language interaction, which introduces distinct security challenges. Additionally, LCCTs often rely on proprietary code datasets for training, raising concerns about the potential exposure of sensitive data. This paper exploits these distinct characteristics of LCCTs to develop targeted attack methodologies on two critical security risks: jailbreaking and training data extraction attacks. Our experimental results expose significant vulnerabilities within LCCTs, including a 99.4% success rate in jailbreaking attacks on GitHub Copilot and a 46.3% success rate on Amazon Q. Furthermore, we successfully extracted sensitive user data from GitHub Copilot, including 54 real email addresses and 314 physical addresses associated with GitHub usernames. Our study also demonstrates that these code-based attack methods are effective against general-purpose LLMs, such as the GPT series, highlighting a broader security misalignment in the handling of code by modern LLMs. These findings underscore critical security challenges associated with LCCTs and suggest essential directions for strengthening their security frameworks. The example code and attack samples from our research are provided at https://github.com/Sensente/Security-Attacks-on-LCCTs.

Updated: 2024-09-16 17:35:10

Categories: cs.CL,cs.CR

Download: http://arxiv.org/abs/2408.11006v2

Model-independent variable selection via the rule-based variable priority

While achieving high prediction accuracy is a fundamental goal in machine learning, an equally important task is finding a small number of features with high explanatory power. One popular selection technique is permutation importance, which assesses a variable's impact by measuring the change in prediction error after permuting the variable. However, this can be problematic due to the need to create artificial data, a problem shared by other methods as well. Another problem is that variable selection methods can be limited by being model-specific. We introduce a new model-independent approach, Variable Priority (VarPro), which works by utilizing rules without the need to generate artificial data or evaluate prediction error. The method is relatively easy to use, requiring only the calculation of sample averages of simple statistics, and can be applied to many data settings, including regression, classification, and survival. We investigate the asymptotic properties of VarPro and show, among other things, that VarPro has a consistent filtering property for noise variables. Empirical studies using synthetic and real-world data show the method achieves a balanced performance and compares favorably to many state-of-the-art procedures currently used for variable selection.

Updated: 2024-09-16 17:34:26

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2409.09003v2
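
The VarPro abstract above contrasts its method with permutation importance, which scores a variable by the change in prediction error after that variable is permuted. The sketch below shows that baseline technique only (not VarPro itself), using mean squared error and a plain Python callable as the model. The helper names and the toy data are hypothetical.

```python
import random


def mse(model, X, y):
    """Mean squared error of a callable model over rows X and targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, column, n_repeats=10, seed=0):
    """Average increase in MSE after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        # Rebuild the data with only the chosen column permuted; these
        # artificial rows are what the abstract identifies as a drawback.
        X_perm = [row[:column] + [v] + row[column + 1:]
                  for row, v in zip(X, shuffled)]
        increases.append(mse(model, X_perm, y) - baseline)
    return sum(increases) / n_repeats


# Toy setup: the model uses feature 0 and ignores feature 1.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
model = lambda row: 2.0 * row[0]
print(permutation_importance(model, X, y, column=0))  # positive
print(permutation_importance(model, X, y, column=1))  # 0.0
```

Shuffling a feature the model ignores leaves the error unchanged, while shuffling a feature it relies on raises the error, which is the signal permutation importance reads.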

MusicLIME: Explainable Multimodal Music Understanding

Multimodal models are critical for music understanding tasks, as they capture the complex interplay between audio and lyrics. However, as these models become more prevalent, the need for explainability grows: understanding how these systems make decisions is vital for ensuring fairness, reducing bias, and fostering trust. In this paper, we introduce MusicLIME, a model-agnostic feature importance explanation method designed for multimodal music models. Unlike traditional unimodal methods, which analyze each modality separately without considering the interaction between them, often leading to incomplete or misleading explanations, MusicLIME reveals how audio and lyrical features interact and contribute to predictions, providing a holistic view of the model's decision-making. Additionally, we enhance local explanations by aggregating them into global explanations, giving users a broader perspective of model behavior. Through this work, we contribute to improving the interpretability of multimodal music models, empowering users to make informed choices, and fostering more equitable, fair, and transparent music understanding systems.

Updated: 2024-09-16 17:28:21

Categories: cs.SD,cs.AI,cs.LG,eess.AS

Download: http://arxiv.org/abs/2409.10496v1

Flash STU: Fast Spectral Transform Units

This paper describes an efficient, open source PyTorch implementation of the Spectral Transform Unit. We investigate sequence prediction tasks over several modalities including language, robotics, and simulated dynamical systems. We find that for the same parameter count, the STU and its variants outperform the Transformer as well as other leading state space models across various modalities.

Updated: 2024-09-16 17:22:34

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.10489v1

Do Pre-trained Vision-Language Models Encode Object States?

For a vision-language model (VLM) to understand the physical world, such as cause and effect, a first step is to capture the temporal dynamics of the visual world, for example how the physical states of objects evolve over time (e.g. a whole apple into a sliced apple). Our paper aims to investigate whether VLMs pre-trained on web-scale data learn to encode object states, which can be extracted with zero-shot text prompts. We curate an object state recognition dataset, ChangeIt-Frames, and evaluate nine open-source VLMs, including models trained with contrastive and generative objectives. We observe that while these state-of-the-art vision-language models can reliably perform object recognition, they consistently fail to accurately distinguish the objects' physical states. Through extensive experiments, we identify three areas of improvement for VLMs to better encode object states, namely the quality of object localization, the architecture to bind concepts to objects, and the objective to learn discriminative visual and language encoders on object states. Data and code are released.

Updated: 2024-09-16 17:22:18

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.10488v1

Exploring 3D Face Reconstruction and Fusion Methods for Face Verification: A Case-Study in Video Surveillance

3D face reconstruction (3DFR) algorithms are based on specific assumptions tailored to distinct application scenarios. These assumptions limit their use when acquisition conditions, such as the subject's distance from the camera or the camera's characteristics, are different than expected, as typically happens in video surveillance. Additionally, 3DFR algorithms follow various strategies to address the reconstruction of a 3D shape from 2D data, such as statistical model fitting, photometric stereo, or deep learning. In the present study, we explore the application of three 3DFR algorithms representative of the SOTA, employing each one as the template set generator for a face verification system. The scores provided by each system are combined by score-level fusion. We show that the complementarity induced by different 3DFR algorithms improves performance when tests are conducted at never-seen-before distances from the camera and camera characteristics (cross-distance and cross-camera settings), thus encouraging further investigations on multiple 3DFR-based approaches.

Updated: 2024-09-16 17:17:47

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.10481v1
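
The 3DFR abstract above combines the scores of several face verification systems by score-level fusion. The abstract does not say which fusion rule is used, so the sketch below assumes one common choice: min-max normalize each system's scores and average them per comparison. The function names and the example scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale one system's scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]  # constant scores carry no ranking
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(per_system_scores):
    """Mean-rule fusion; per_system_scores[i][j] is system i's score
    for comparison j, with all systems aligned on the same comparisons."""
    normalized = [min_max_normalize(s) for s in per_system_scores]
    n_systems = len(normalized)
    return [sum(column) / n_systems for column in zip(*normalized)]


# Two hypothetical systems scoring the same three comparisons.
print(fuse_scores([[0.0, 5.0, 10.0], [1.0, 2.0, 3.0]]))  # [0.0, 0.5, 1.0]
```

Normalizing first keeps a system with a wide score range from dominating the average, which matters when the systems are built on different reconstruction algorithms.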

MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion

Self-supervised learning has proved effective for skeleton-based human action understanding. However, previous works either rely on contrastive learning, which suffers from false-negative problems, or are based on reconstruction, which learns too many inessential low-level clues, leading to limited representations for downstream tasks. Recently, great advances have been made in generative learning, which is naturally a challenging yet meaningful pretext task for modeling the general underlying data distributions. However, the representation learning capacity of generative models is under-explored, especially for skeletons with spatial sparsity and temporal redundancy. To this end, we propose Masked Conditional Diffusion (MacDiff) as a unified framework for human skeleton modeling. For the first time, we leverage diffusion models as effective skeleton representation learners. Specifically, we train a diffusion decoder conditioned on the representations extracted by a semantic encoder. Random masking is applied to encoder inputs to introduce an information bottleneck and remove the redundancy of skeletons. Furthermore, we theoretically demonstrate that our generative objective involves the contrastive learning objective, which aligns the masked and noisy views. Meanwhile, it also forces the representation to complement the noisy view, leading to better generalization performance. MacDiff achieves state-of-the-art performance on representation learning benchmarks while remaining competent at generative tasks. Moreover, we leverage the diffusion model for data augmentation, significantly enhancing fine-tuning performance in scenarios with scarce labeled data. Our project is available at https://lehongwu.github.io/ECCV24MacDiff/.

Updated: 2024-09-16 17:06:10

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.10473v1

Online Nonconvex Bilevel Optimization with Bregman Divergences

Bilevel optimization methods are increasingly relevant within machine learning, especially for tasks such as hyperparameter optimization and meta-learning. Compared to the offline setting, online bilevel optimization (OBO) offers a more dynamic framework by accommodating time-varying functions and sequentially arriving data. This study addresses the online nonconvex-strongly convex bilevel optimization problem. In deterministic settings, we introduce a novel online Bregman bilevel optimizer (OBBO) that utilizes adaptive Bregman divergences. We demonstrate that OBBO enhances the known sublinear rates for bilevel local regret through a novel hypergradient error decomposition that adapts to the underlying geometry of the problem. In stochastic contexts, we introduce the first stochastic online bilevel optimizer (SOBBO), which employs a window averaging method for updating outer-level variables using a weighted average of recent stochastic approximations of hypergradients. This approach not only achieves sublinear rates of bilevel local regret but also serves as an effective variance reduction strategy, obviating the need for additional stochastic gradient samples at each timestep. Experiments on online hyperparameter optimization and online meta-learning highlight the superior performance, efficiency, and adaptability of our Bregman-based algorithms compared to established online and offline bilevel benchmarks.
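
The window-averaging idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes geometric weights over the most recent hypergradient estimates and a plain Euclidean outer step in place of a Bregman update; the names `WindowAverager`, `decay`, and `outer_step` are hypothetical, and the weighting used in the paper may differ.

```python
from collections import deque


class WindowAverager:
    """Weighted average over the most recent hypergradient estimates.

    Assumption: weights decay geometrically with age (newest weight 1,
    then decay, decay**2, ...) and are normalized to sum to one.
    """

    def __init__(self, window, decay=0.9):
        self.buffer = deque(maxlen=window)  # oldest entries drop out automatically
        self.decay = decay

    def update(self, hypergrad):
        """Store a new estimate and return the current weighted average."""
        self.buffer.append(list(hypergrad))
        n = len(self.buffer)
        # buffer[-1] is the newest estimate (age 0); buffer[0] is the oldest
        weights = [self.decay ** (n - 1 - i) for i in range(n)]
        total = sum(weights)
        return [
            sum(w * g[k] for w, g in zip(weights, self.buffer)) / total
            for k in range(len(hypergrad))
        ]


def outer_step(x, avg_hypergrad, step_size=0.1):
    """Plain gradient step on the outer variable (Euclidean stand-in, an assumption)."""
    return [xi - step_size * gi for xi, gi in zip(x, avg_hypergrad)]
```

Averaging over a window reuses estimates that were already computed at earlier timesteps, which is why no additional stochastic gradient samples are needed per step in a scheme of this kind.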

Updated: 2024-09-16 17:01:27

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2409.10470v1

Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons

Multilayer Perceptrons (MLPs) have long been a cornerstone in deep learning, known for their capacity to model complex relationships. Recently, Kolmogorov-Arnold Networks (KANs) have emerged as a compelling alternative, utilizing highly flexible learnable activation functions directly on network edges, a departure from the neuron-centric approach of MLPs. However, KANs significantly increase the number of learnable parameters, raising concerns about their effectiveness in data-scarce environments. This paper presents a comprehensive comparative study of MLPs and KANs from both algorithmic and experimental perspectives, with a focus on low-data regimes. We introduce an effective technique for designing MLPs with unique, parameterized activation functions for each neuron, enabling a more balanced comparison with KANs. Using empirical evaluations on simulated data and two real-world data sets from medicine and engineering, we explore the trade-offs between model complexity and accuracy, with particular attention to the role of network depth. Our findings show that MLPs with individualized activation functions achieve significantly higher predictive accuracy with only a modest increase in parameters, especially when the sample size is limited to around one hundred. For example, in a three-class classification problem within additive manufacturing, MLPs achieve a median accuracy of 0.91, significantly outperforming KANs, which only reach a median accuracy of 0.53 with default hyperparameters. These results offer valuable insights into the impact of activation function selection in neural networks.

Updated: 2024-09-16 16:56:08

Categories: cs.LG,stat.CO,stat.ML

Download: http://arxiv.org/abs/2409.10463v1

Beyond Ensemble Averages: Leveraging Climate Model Ensembles for Subseasonal Forecasting

Producing high-quality forecasts of key climate variables, such as temperature and precipitation, on subseasonal time scales has long been a gap in operational forecasting. This study explores an application of machine learning (ML) models as post-processing tools for subseasonal forecasting. Lagged numerical ensemble forecasts (i.e., an ensemble where the members have different initialization dates) and observational data, including relative humidity, pressure at sea level, and geopotential height, are incorporated into various ML methods to predict monthly average precipitation and two-meter temperature two weeks in advance for the continental United States. For regression, quantile regression, and tercile classification tasks, we consider using linear models, random forests, convolutional neural networks, and stacked models (a multi-model approach based on the prediction of the individual ML models). Unlike previous ML approaches that often use ensemble mean alone, we leverage information embedded in the ensemble forecasts to enhance prediction accuracy. Additionally, we investigate extreme event predictions that are crucial for planning and mitigation efforts. Considering ensemble members as a collection of spatial forecasts, we explore different approaches to using spatial information. Trade-offs between different approaches may be mitigated with model stacking. Our proposed models outperform standard baselines such as climatological forecasts and ensemble means. In addition, we investigate feature importance, trade-offs between using the full ensemble or only the ensemble mean, and different modes of accounting for spatial variability.

Updated: 2024-09-16 16:43:41

Categories: cs.LG,physics.ao-ph

Download: http://arxiv.org/abs/2211.15856v5

Signed Graph Autoencoder for Explainable and Polarization-Aware Network Embeddings

Autoencoders based on Graph Neural Networks (GNNs) have garnered significant attention in recent years for their ability to extract informative latent representations, characterizing the structure of complex topologies, such as graphs. Despite the prevalence of Graph Autoencoders, there has been limited focus on developing and evaluating explainable neural-based graph generative models specifically designed for signed networks. To address this gap, we propose the Signed Graph Archetypal Autoencoder (SGAAE) framework. SGAAE extracts node-level representations that express node memberships over distinct extreme profiles, referred to as archetypes, within the network. This is achieved by projecting the graph onto a learned polytope, which governs its polarization. The framework employs a recently proposed likelihood for analyzing signed networks based on the Skellam distribution, combined with relational archetypal analysis and GNNs. Our experimental evaluation demonstrates SGAAE's capability to successfully infer node memberships over the different underlying latent structures while extracting competing communities formed through the participation of the opposing views in the network. Additionally, we introduce the 2-level network polarization problem and show how SGAAE is able to characterize such a setting. The proposed model achieves high performance in different tasks of signed link prediction across four real-world datasets, outperforming several baseline models.

Updated: 2024-09-16 16:40:40

Categories: cs.LG,cs.SI

Download: http://arxiv.org/abs/2409.10452v1

Scalable Distributed Algorithms for Size-Constrained Submodular Maximization in the MapReduce and Adaptive Complexity Models

Distributed maximization of a submodular function in the MapReduce (MR) model has received much attention, culminating in two frameworks that allow a centralized algorithm to be run in the MR setting without loss of approximation, as long as the centralized algorithm satisfies a certain consistency property, which had previously only been known to be satisfied by the standard greedy and continuous greedy algorithms. A separate line of work has studied parallelizability of submodular maximization in the adaptive complexity model, where each thread may have access to the entire ground set. For the size-constrained maximization of a monotone and submodular function, we show that several sublinearly adaptive (highly parallelizable) algorithms satisfy the consistency property required to work in the MR setting, which yields practical, parallelizable and distributed algorithms. Separately, we develop the first distributed algorithm with linear query complexity for this problem. Finally, we provide a method to increase the maximum cardinality constraint for MR algorithms at the cost of additional MR rounds.
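
For readers unfamiliar with the standard greedy algorithm the abstract refers to, here is a minimal centralized sketch on a set-coverage objective, which is monotone and submodular. It is not the paper's distributed or low-adaptivity algorithm; the function names and the toy `ground_sets` instance are illustrative assumptions.

```python
def coverage(sets, chosen):
    """Monotone submodular objective: number of elements covered by the chosen sets."""
    covered = set()
    for idx in chosen:
        covered |= sets[idx]
    return len(covered)


def greedy_max(sets, k):
    """Standard greedy: repeatedly add the set with the largest marginal gain, up to k sets."""
    chosen = []
    for _ in range(k):
        base = coverage(sets, chosen)
        best_idx, best_gain = None, 0
        for idx in range(len(sets)):
            if idx in chosen:
                continue
            gain = coverage(sets, chosen + [idx]) - base
            if gain > best_gain:
                best_idx, best_gain = idx, gain
        if best_idx is None:  # no remaining set adds anything new
            break
        chosen.append(best_idx)
    return chosen


# Toy instance (hypothetical data): pick k = 2 of four sets.
ground_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
picked = greedy_max(ground_sets, k=2)
print(picked, coverage(ground_sets, picked))  # [2, 0] 7
```

Each greedy round depends on the previous one, so the loop is inherently sequential over `k`; that dependence is what low-adaptivity algorithms aim to shorten.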

Updated: 2024-09-16 16:39:48

Categories: cs.DS,cs.DC,cs.LG

Download: http://arxiv.org/abs/2206.09563v6

Local Methods with Adaptivity via Scaling

The rapid development of machine learning and deep learning has introduced increasingly complex optimization challenges that must be addressed. Indeed, training modern, advanced models has become difficult to implement without leveraging multiple computing nodes in a distributed environment. Distributed optimization is also fundamental to emerging fields such as federated learning. Specifically, there is a need to organize the training process to minimize the time lost due to communication. A widely used and extensively researched technique to mitigate the communication bottleneck involves performing local training before communication. This approach is the focus of our paper. Concurrently, adaptive methods that incorporate scaling, notably led by Adam, have gained significant popularity in recent years. Therefore, this paper aims to merge the local training technique with the adaptive approach to develop efficient distributed learning methods. We consider the classical Local SGD method and enhance it with a scaling feature. A crucial aspect is that the scaling is described generically, allowing us to analyze various approaches, including Adam, RMSProp, and OASIS, in a unified manner. In addition to theoretical analysis, we validate the performance of our methods in practice by training a neural network.
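
The combination described above, local steps followed by averaging with a generic scaling applied to each step, can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: gradients are deterministic, the scaling is a fixed diagonal preconditioner supplied through a hypothetical `scale` callback, and all function names are invented for the example.

```python
def make_quadratic_grad(curvature, target):
    """Gradient of f(y) = 0.5 * sum_k curvature[k] * (y[k] - target[k])**2."""
    return lambda y: [a * (yi - t) for a, yi, t in zip(curvature, y, target)]


def local_sgd_scaled(grads, scale, x0, rounds=20, local_steps=5, lr=0.5):
    """Local SGD where every local step is divided by a diagonal scaling.

    grads: one gradient function per worker.
    scale: maps a gradient to positive per-coordinate factors (a stand-in for
           Adam/RMSProp-style preconditioners; an assumption of this sketch).
    """
    x = list(x0)
    for _round in range(rounds):
        local_points = []
        for grad in grads:
            y = list(x)  # every worker starts from the shared point
            for _step in range(local_steps):
                g = grad(y)
                d = scale(g)
                y = [yi - lr * gi / di for yi, gi, di in zip(y, g, d)]
            local_points.append(y)
        # communication round: average the workers' local iterates
        x = [sum(p[k] for p in local_points) / len(local_points) for k in range(len(x))]
    return x


# Two workers with badly conditioned quadratics (hypothetical data).
curvature = [1.0, 10.0]
workers = [
    make_quadratic_grad(curvature, [1.0, 2.0]),
    make_quadratic_grad(curvature, [3.0, -2.0]),
]
solution = local_sgd_scaled(workers, scale=lambda g: curvature, x0=[0.0, 0.0])
```

With the curvature used as the scaling, both coordinates contract at the same rate despite the tenfold difference in conditioning, and the averaged iterate approaches the minimizer of the mean objective, here `[2.0, 0.0]`.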

Updated: 2024-09-16 16:30:09

Categories: cs.LG,cs.DC,math.OC

Download: http://arxiv.org/abs/2406.00846v3

Discrete Neural Algorithmic Reasoning

Neural algorithmic reasoning aims to capture computations with neural networks by training models to imitate the execution of classic algorithms. While common architectures are expressive enough to contain the correct model in the weight space, current neural reasoners struggle to generalize well on out-of-distribution data. On the other hand, classic computations are not affected by distributional shifts, as they can be described as transitions between discrete computational states. In this work, we propose to force neural reasoners to maintain the execution trajectory as a combination of finite predefined states. To achieve that, we separate discrete and continuous data flows and describe the interaction between them. Trained with supervision on the algorithm's state transitions, such models are able to perfectly align with the original algorithm. To show this, we evaluate our approach on multiple algorithmic problems and get perfect test scores both in single-task and multitask setups. Moreover, the proposed architectural choice allows us to prove the correctness of the learned algorithms for any test data.

Updated: 2024-09-16 16:22:40

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2402.11628v2

Structure-preserving learning for multi-symplectic PDEs

This paper presents an energy-preserving machine learning method for inferring reduced-order models (ROMs) by exploiting the multi-symplectic form of partial differential equations (PDEs). The vast majority of energy-preserving reduced-order methods use symplectic Galerkin projection to construct reduced-order Hamiltonian models by projecting the full models onto a symplectic subspace. However, symplectic projection requires the existence of fully discrete operators, and in many cases, such as black-box PDE solvers, these operators are inaccessible. In this work, we propose an energy-preserving machine learning method that can infer the dynamics of the given PDE using data only, so that the proposed framework does not depend on the fully discrete operators. In this context, the proposed method is non-intrusive. The proposed method is grey box in the sense that it requires only some basic knowledge of the multi-symplectic model at the partial differential equation level. We prove that the proposed method satisfies spatially discrete local energy conservation and preserves the multi-symplectic conservation laws. We test our method on the linear wave equation, the Korteweg-de Vries equation, and the Zakharov-Kuznetsov equation. We test the generalization of our learned models by testing them far outside the training time interval.

Updated: 2024-09-16 16:07:21

Categories: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2409.10432v1

Towards Human-Like Driving: Active Inference in Autonomous Vehicle Control

This paper presents a novel approach to Autonomous Vehicle (AV) control through the application of active inference, a theory derived from neuroscience that conceptualizes the brain as a predictive machine. Traditional autonomous driving systems rely heavily on Modular Pipelines, Imitation Learning, or Reinforcement Learning, each with inherent limitations in adaptability, generalization, and computational efficiency. Active inference addresses these challenges by minimizing prediction error (termed "surprise") through a dynamic model that balances perception and action. Our method integrates active inference with deep learning to manage lateral control in AVs, enabling them to perform lane following maneuvers within a simulated urban environment. We demonstrate that our model, despite its simplicity, effectively learns and generalizes from limited data without extensive retraining, significantly reducing computational demands. The proposed approach not only enhances the adaptability and performance of AVs in dynamic scenarios but also aligns closely with human-like driving behavior, leveraging a generative model to predict and adapt to environmental changes. Results from extensive experiments in the CARLA simulator show promising outcomes, outperforming traditional methods in terms of adaptability and efficiency, thereby advancing the potential of active inference in real-world autonomous driving applications.

Updated: 2024-09-16 16:02:46

Categories: cs.RO,cs.AI,cs.LG,cs.NE

Download: http://arxiv.org/abs/2407.07684v2

HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models

Robots interacting with humans through natural language can unlock numerous applications such as Referring Grasp Synthesis (RGS). Given a text query, RGS determines a stable grasp pose to manipulate the referred object in the robot's workspace. RGS comprises two steps: visual grounding and grasp pose estimation. Recent studies leverage powerful Vision-Language Models (VLMs) for visually grounding free-flowing natural language in real-world robotic execution. However, comparisons in complex, cluttered environments with multiple instances of the same object are lacking. This paper introduces HiFi-CS, featuring hierarchical application of Featurewise Linear Modulation (FiLM) to fuse image and text embeddings, enhancing visual grounding for the complex, attribute-rich text queries encountered in robotic grasping. Visual grounding associates an object in 2D/3D space with natural language input and is studied in two scenarios: Closed and Open Vocabulary. HiFi-CS features a lightweight decoder combined with a frozen VLM and outperforms competitive baselines in closed vocabulary settings while being 100x smaller in size. Our model can effectively guide open-set object detectors like GroundedSAM to enhance open-vocabulary performance. We validate our approach through real-world RGS experiments using a 7-DOF robotic arm, achieving 90.33% visual grounding accuracy in 15 tabletop scenes. We include our codebase in the supplementary material.
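
As a rough illustration of the FiLM operation named in the abstract, the sketch below applies a single FiLM layer: a text embedding is mapped to a per-channel scale `gamma` and shift `beta`, which then modulate each image feature channel. It does not reproduce the hierarchical arrangement used in HiFi-CS; the weight matrices, shapes, and function names are assumptions chosen for the example.

```python
def linear(weights, bias, vec):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def film(features, text_emb, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: out[c] = gamma[c] * features[c] + beta[c].

    features: list of channels, each a flat list of spatial activations.
    text_emb: conditioning vector (here, a text embedding).
    """
    gamma = linear(w_gamma, b_gamma, text_emb)  # per-channel scale
    beta = linear(w_beta, b_beta, text_emb)     # per-channel shift
    return [
        [g * v + b for v in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]


# Two channels with two spatial positions each (hypothetical values).
image_features = [[1.0, 2.0], [3.0, 4.0]]
text_embedding = [1.0, -1.0]
fused = film(
    image_features,
    text_embedding,
    w_gamma=[[1.0, 0.0], [0.0, 1.0]], b_gamma=[0.0, 0.0],
    w_beta=[[0.5, 0.5], [1.0, 0.0]], b_beta=[0.0, 1.0],
)
print(fused)  # [[1.0, 2.0], [-1.0, -2.0]]
```

Because the modulation only scales and shifts existing channels, the conditioning adds few parameters, which is consistent with pairing a lightweight decoder with a frozen backbone.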

Updated: 2024-09-16 15:50:39

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2409.10419v1

Geometric Clustering for Hardware-Efficient Implementation of Chromatic Dispersion Compensation

Power efficiency remains a significant challenge in modern optical fiber communication systems, driving efforts to reduce the computational complexity of digital signal processing, particularly in chromatic dispersion compensation (CDC) algorithms. While various strategies for complexity reduction have been proposed, many lack the necessary hardware implementation to validate their benefits. This paper provides a theoretical analysis of the tap overlapping effect in CDC filters for coherent receivers, introduces a novel Time-Domain Clustered Equalizer (TDCE) technique based on this concept, and presents a Field-Programmable Gate Array (FPGA) implementation for validation. We developed an innovative parallelization method for TDCE, implementing it in hardware for fiber lengths up to 640 km. A fair comparison with the state-of-the-art frequency domain equalizer (FDE) under identical conditions is also conducted. Our findings highlight that implementation strategies, including parallelization and memory management, are as crucial as computational complexity in determining hardware complexity and energy efficiency. The proposed TDCE hardware implementation achieves up to 70.7% energy savings and 71.4% multiplier usage savings compared to FDE, despite its higher computational complexity.

Updated: 2024-09-16 15:48:05

Categories: eess.SP,cs.AI

Download: http://arxiv.org/abs/2409.10416v1

Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph

Locating objects described in natural language presents a significant challenge for autonomous agents. Existing CLIP-based open-vocabulary methods successfully perform 3D object grounding with simple (bare) queries, but cannot cope with ambiguous descriptions that demand an understanding of object relations. To tackle this problem, we propose a modular approach called BBQ (Beyond Bare Queries), which constructs a 3D scene graph representation with metric and semantic edges and utilizes a large language model as a human-to-agent interface through our deductive scene reasoning algorithm. BBQ employs robust DINO-powered associations to construct a 3D object-centric map and an advanced raycasting algorithm with a 2D vision-language model to describe the objects as graph nodes. On the Replica and ScanNet datasets, we demonstrate that BBQ takes a leading place in open-vocabulary 3D semantic segmentation compared to other zero-shot methods. We also show that leveraging spatial relations is especially effective for scenes containing multiple entities of the same semantic class. On the challenging Sr3D+, Nr3D and ScanRefer benchmarks, our deductive approach demonstrates a significant improvement over other state-of-the-art methods, enabling object grounding with complex queries. The combination of our design choices and software implementation has resulted in significant data processing speed in experiments on the robot's on-board computer. This promising performance enables the application of our approach in intelligent robotics projects. We made the code publicly available at https://linukc.github.io/BeyondBareQueries/.

Updated: 2024-09-16 15:47:45

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.07113v3

A Large-Scale Privacy Assessment of Android Third-Party SDKs

Third-party Software Development Kits (SDKs) are widely adopted in Android app development, to effortlessly accelerate development pipelines and enhance app functionality. However, this convenience raises substantial concerns about unauthorized access to users' privacy-sensitive information, which could be further abused for illegitimate purposes like user tracking or monetization. Our study offers a targeted analysis of user privacy protection among Android third-party SDKs, filling a critical gap in the Android software supply chain. It focuses on two aspects of their privacy practices, including data exfiltration and behavior-policy compliance (or privacy compliance), utilizing techniques of taint analysis and large language models. It covers 158 widely-used SDKs from two key SDK release platforms, the official one and a large alternative one. From them, we identified 338 instances of privacy data exfiltration. On the privacy compliance, our study reveals that more than 30% of the examined SDKs fail to provide a privacy policy to disclose their data handling practices. Among those that provide privacy policies, 37% of them over-collect user data, and 88% falsely claim access to sensitive data. We revisit the latest versions of the SDKs after 12 months. Our analysis demonstrates a persistent lack of improvement in these concerning trends. Based on our findings, we propose three actionable recommendations to mitigate the privacy leakage risks and enhance privacy protection for Android users. Our research not only serves as an urgent call for industry attention but also provides crucial insights for future regulatory interventions.

Updated: 2024-09-16 15:44:43

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2409.10411v1

Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices

Determining the degree of confidence of a deep learning model in its prediction is an open problem in the field of natural language processing. Most of the classical methods for uncertainty estimation are quite weak for text classification models. We set the task of obtaining an uncertainty estimate for neural networks based on the Transformer architecture. A key feature of such models is the attention mechanism, which supports the information flow between the hidden representations of tokens in the neural network. We explore the formed relationships between internal representations using Topological Data Analysis methods and utilize them to predict the model's confidence. In this paper, we propose a method for uncertainty estimation based on the topological properties of the attention mechanism and compare it with classical methods. As a result, the proposed algorithm surpasses the existing methods in quality and opens up a new area of application of the attention mechanism, but requires the selection of topological features.

Updated: 2024-09-16 15:41:59

Categories: cs.LG

Download: http://arxiv.org/abs/2308.11295v2

Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes

Reinforcement learning (RL), particularly its combination with deep neural networks referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications, suggesting its potential for enabling the development of sophisticated robotic behaviors. Robotics problems, however, pose fundamental difficulties for the application of RL, stemming from the complexity and cost of interacting with the physical world. This article provides a modern survey of DRL for robotics, with a particular focus on evaluating the real-world successes achieved with DRL in realizing several key robotic competencies. Our analysis aims to identify the key factors underlying those exciting successes, reveal underexplored areas, and provide an overall characterization of the status of DRL in robotics. We highlight several important avenues for future work, emphasizing the need for stable and sample-efficient real-world RL paradigms, holistic approaches for discovering and integrating various competencies to tackle complex long-horizon, open-world tasks, and principled development and evaluation procedures. This survey is designed to offer insights for both RL practitioners and roboticists toward harnessing RL's power to create generally capable real-world robotic systems.

Updated: 2024-09-16 15:41:05

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2408.03539v3

A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration

This paper proposes a knowledge-enhanced disease diagnosis method based on a prompt learning framework. The method retrieves structured knowledge from external knowledge graphs related to clinical cases, encodes it, and injects it into the prompt templates to enhance the language model's understanding and reasoning capabilities for the task. We conducted experiments on three public datasets: CHIP-CTC, IMCS-V2-NER, and KUAKE-QTR. The results show that the proposed method significantly outperforms existing models across multiple evaluation metrics, with an F1 score improvement of 2.4% on the CHIP-CTC dataset, 3.1% on the IMCS-V2-NER dataset, and 4.2% on the KUAKE-QTR dataset. Additionally, ablation studies confirmed the critical role of the knowledge injection module, as the removal of this module resulted in a significant drop in F1 score. The experimental results demonstrate that the proposed method not only effectively improves the accuracy of disease diagnosis but also enhances the interpretability of the predictions, providing more reliable support and evidence for clinical diagnosis.

Updated: 2024-09-16 15:34:58

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.10403v1

Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP

Contrastive Language-Image Pre-training (CLIP) has manifested remarkable improvements in zero-shot classification and cross-modal vision-language tasks. Yet, from a geometrical point of view, the CLIP embedding space has been found to have a pronounced modality gap. This gap renders the embedding space overly sparse and disconnected, with different modalities being densely distributed in distinct subregions of the hypersphere. In this work, we aim to answer three main questions: 1. Does sharing the parameter space between the multi-modal encoders reduce the modality gap? 2. Can the gap be mitigated by pushing apart the uni-modal embeddings via intra-modality separation? 3. How do these gap reduction approaches affect the downstream performance? We design AlignCLIP to answer these questions. Through extensive experiments, we show that AlignCLIP achieves noticeable enhancements in the cross-modal alignment of the embeddings and thereby reduces the modality gap, while improving performance across several zero-shot and fine-tuning downstream evaluations.

Updated: 2024-09-16 15:32:11

Categories: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.17639v3
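
As a rough illustration of the modality gap discussed in the AlignCLIP abstract above, one way to quantify it is the Euclidean distance between the centroids of L2-normalized image and text embeddings. The sketch below assumes that definition and uses tiny hand-made vectors in place of real CLIP outputs; the function names are hypothetical and this is not the AlignCLIP code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so it lies on the hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def modality_gap(image_embs, text_embs):
    """Distance between the centroids of the two normalized embedding sets
    (an assumed definition of the gap, used here for illustration only)."""
    img_c = centroid([l2_normalize(v) for v in image_embs])
    txt_c = centroid([l2_normalize(v) for v in text_embs])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(img_c, txt_c)))


# Toy embeddings: images cluster near the x-axis, texts near the y-axis.
image_embs = [[1.0, 0.1], [0.9, 0.2]]
text_embs = [[0.1, 1.0], [0.2, 0.9]]
gap = modality_gap(image_embs, text_embs)
```

A larger value means the two modalities occupy more distant subregions of the sphere; two identical sets give a gap of zero.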

MOST: MR reconstruction Optimization for multiple downStream Tasks via continual learning

Deep learning-based Magnetic Resonance (MR) reconstruction methods have focused on generating high-quality images, but they often overlook the impact on downstream tasks (e.g., segmentation) that utilize the reconstructed images. Cascading a separately trained reconstruction network and downstream task network has been shown to introduce performance degradation due to error propagation and domain gaps between training datasets. To mitigate this issue, downstream task-oriented reconstruction optimization has been proposed for a single downstream task. Expanding this optimization to multi-task scenarios is not straightforward. In this work, we extended this optimization to sequentially introduced multiple downstream tasks and demonstrated that a single MR reconstruction network can be optimized for multiple downstream tasks by deploying continual learning (MOST). MOST integrated techniques from replay-based continual learning and image-guided loss to overcome catastrophic forgetting. Comparative experiments demonstrated that MOST outperformed a reconstruction network without finetuning, a reconstruction network with naive finetuning, and conventional continual learning methods. This advancement empowers the application of a single MR reconstruction network for multiple downstream tasks. The source code is available at: https://github.com/SNU-LIST/MOST

Updated: 2024-09-16 15:31:04

Categories: eess.IV,cs.AI

Download: http://arxiv.org/abs/2409.10394v1

Deep Reinforcement Learning for Autonomous Cyber Operations: A Survey

The rapid increase in the number of cyber-attacks in recent years raises the need for principled methods for defending networks against malicious actors. Deep reinforcement learning (DRL) has emerged as a promising approach for mitigating these attacks. However, while DRL has shown much potential for cyber defence, numerous challenges must be overcome before DRL can be applied to autonomous cyber operations (ACO) at scale. Principled methods are required for environments that confront learners with very high-dimensional state spaces, large multi-discrete action spaces, and adversarial learning. Recent works have reported success in solving these problems individually. There have also been impressive engineering efforts towards solving all three for real-time strategy games. However, applying DRL to the full ACO problem remains an open challenge. Here, we survey the relevant DRL literature and conceptualize an idealised ACO-DRL agent. We provide: i.) A summary of the domain properties that define the ACO problem; ii.) A comprehensive comparison of current ACO environments used for benchmarking DRL approaches; iii.) An overview of state-of-the-art approaches for scaling DRL to domains that confront learners with the curse of dimensionality, and; iv.) A survey and critique of current methods for limiting the exploitability of agents within adversarial settings from the perspective of ACO. We conclude with open research questions that we hope will motivate future directions for researchers and practitioners working on ACO.

Updated: 2024-09-16 15:28:42

Categories: cs.LG

Download: http://arxiv.org/abs/2310.07745v2

TPFL: Tsetlin-Personalized Federated Learning with Confidence-Based Clustering

The world of Machine Learning (ML) has witnessed rapid changes in terms of new models and ways to process users' data. The majority of work that has been done is focused on Deep Learning (DL) based approaches. However, with the emergence of new algorithms such as the Tsetlin Machine (TM) algorithm, there is growing interest in exploring alternative approaches that may offer unique advantages in certain domains or applications. One of these domains is Federated Learning (FL), in which users' privacy is of utmost importance. Due to its novelty, FL has seen a surge in the incorporation of personalization techniques to enhance model accuracy while maintaining user privacy under personalized conditions. In this work, we propose a novel approach dubbed TPFL: Tsetlin-Personalized Federated Learning, in which models are grouped into clusters based on their confidence towards a specific class. In this way, clustering can benefit from two key advantages. Firstly, clients share only what they are confident about, resulting in the elimination of wrongful weight aggregation among clients whose data for a specific class may not have been sufficient during training. This phenomenon is prevalent when the data are non-Independent and Identically Distributed (non-IID). Secondly, by sharing only weights towards a specific class, communication cost is substantially reduced, making TPFL efficient in terms of both accuracy and communication cost. The results of TPFL demonstrated the highest accuracy on three different datasets, namely MNIST, FashionMNIST and FEMNIST.

Updated: 2024-09-16 15:27:35

Categories: cs.DC,cs.LG

Download: http://arxiv.org/abs/2409.10392v1

Zero-Knowledge Proof-of-Identity: Sybil-Resistant, Anonymous Authentication on Permissionless Blockchains and Incentive Compatible, Strictly Dominant Cryptocurrencies

Zero-Knowledge Proof-of-Identity from trusted public certificates (e.g., national identity cards and/or ePassports; eSIM) is introduced here to permissionless blockchains in order to remove the inefficiencies of Sybil-resistant mechanisms such as Proof-of-Work (i.e., high energy and environmental costs) and Proof-of-Stake (i.e., capital hoarding and lower transaction volume). The proposed solution effectively limits the number of mining nodes a single individual would be able to run while keeping membership open to everyone, circumventing the impossibility of full decentralization and the blockchain scalability trilemma when instantiated on a blockchain with a consensus protocol based on the cryptographic random selection of nodes. Resistance to collusion is also considered. Solving one of the most pressing problems in blockchains, a zk-PoI cryptocurrency is proved to have the following advantageous properties:
- an incentive-compatible protocol for the issuing of cryptocurrency rewards based on a unique Nash equilibrium
- strict domination of mining over all other PoW/PoS cryptocurrencies, thus the zk-PoI cryptocurrency becoming the preferred choice by miners is proved to be a Nash equilibrium and the Evolutionarily Stable Strategy
- PoW/PoS cryptocurrencies are condemned to pay the Price of Crypto-Anarchy, redeemed by the optimal efficiency of zk-PoI as it implements the social optimum
- the circulation of a zk-PoI cryptocurrency Pareto dominates other PoW/PoS cryptocurrencies
- the network effects arising from the social networks inherent to national identity cards and ePassports dominate PoW/PoS cryptocurrencies
- the lower costs of its infrastructure imply the existence of a unique equilibrium where it dominates other forms of payment

Updated: 2024-09-16 15:25:05

Categories: cs.CR,cs.GT

Download: http://arxiv.org/abs/1905.09093v3

Revising the Structure of Recurrent Neural Networks to Eliminate Numerical Derivatives in Forming Physics Informed Loss Terms with Respect to Time

Solving unsteady partial differential equations (PDEs) using recurrent neural networks (RNNs) typically requires numerical derivatives between each block of the RNN to form the physics informed loss function. However, this introduces the complexities of numerical derivatives into the training process of these models. In this study, we propose modifying the structure of the traditional RNN to enable the prediction of each block over a time interval, making it possible to calculate the derivative of the output with respect to time using the backpropagation algorithm. To achieve this, the time intervals of these blocks are overlapped, defining a mutual loss function between them. Additionally, the employment of conditional hidden states enables us to achieve a unique solution for each block. The forget factor is utilized to control the influence of the conditional hidden state on the prediction of the subsequent block. This new model, termed the Mutual Interval RNN (MI-RNN), is applied to solve three different benchmarks: the Burgers equation, unsteady heat conduction in an irregular domain, and the Green vortex problem. Our results demonstrate that MI-RNN can find the exact solution more accurately compared to existing RNN models. For instance, in the second problem, MI-RNN achieved one order of magnitude less relative error compared to the RNN model with numerical derivatives.

Updated: 2024-09-16 15:24:25

Categories: cs.LG

Download: http://arxiv.org/abs/2409.10388v1

MARCA: Mamba Accelerator with ReConfigurable Architecture

We propose a Mamba accelerator with reconfigurable architecture, MARCA. We propose three novel approaches in this paper. (1) Reduction alternative PE array architecture for both linear and element-wise operations. For linear operations, the reduction tree connected to PE arrays is enabled and executes the reduction operation. For element-wise operations, the reduction tree is disabled and the output bypasses it. (2) Reusable nonlinear function unit based on the reconfigurable PE. We decompose the exponential function into element-wise operations and a shift operation by a fast biased exponential algorithm, and the activation function (SiLU) into a range detection and element-wise operations by a piecewise approximation algorithm. Thus, the reconfigurable PEs are reused to execute nonlinear functions with negligible accuracy loss. (3) Intra-operation and inter-operation buffer management strategy. We propose an intra-operation buffer management strategy to maximize input data sharing for linear operations within operations, and an inter-operation strategy for element-wise operations between operations. We conduct extensive experiments on Mamba model families with different sizes. MARCA achieves up to 463.22×/11.66× speedup and up to 9761.42×/242.52× energy efficiency compared to Intel Xeon 8358P CPU and NVIDIA Tesla A100 GPU implementations, respectively.

Updated: 2024-09-16 15:18:33

Categories: cs.AR,cs.AI

Download: http://arxiv.org/abs/2409.11440v1
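
The MARCA abstract above mentions decomposing the exponential into element-wise operations plus a shift. A generic way to get that shape of computation is to rewrite e^x as 2^k · 2^f, where k is an integer (a shift in hardware) and 2^f on [0, 1) is approximated by a small polynomial. The sketch below assumes this generic decomposition with hand-picked quadratic coefficients; it is not the paper's fast biased exponential algorithm, and `approx_exp` is a hypothetical name.

```python
import math

LOG2_E = 1.4426950408889634  # log2(e), used to convert e^x into 2^(x * log2 e)


def approx_exp(x):
    """Approximate e^x using only multiply/add steps plus one power-of-two shift."""
    y = x * LOG2_E            # e^x == 2^y
    k = math.floor(y)         # integer part: realised as a shift
    f = y - k                 # fractional part in [0, 1)
    # Quadratic fit of 2^f on [0, 1); coefficients are illustrative assumptions.
    mantissa = 1.0 + f * (0.6565 + 0.3435 * f)
    return math.ldexp(mantissa, k)  # mantissa * 2**k


samples = [-5.0, -1.0, 0.0, 0.5, 3.0]
relative_errors = [abs(approx_exp(x) - math.exp(x)) / math.exp(x) for x in samples]
```

With these coefficients the relative error stays well below one percent on the sampled inputs, which gives a feel for why such a unit can reuse simple processing elements.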

Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling

The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. There has been growing interest in using online Reinforcement Learning (RL) for JSSP. While online RL can quickly find acceptable solutions, especially for larger problems, it produces lower-quality results than traditional methods like Constraint Programming (CP). A significant downside of online RL is that it cannot learn from existing data, such as solutions generated by CP; it must train from scratch, which is sample-inefficient and prevents it from learning from better examples. We introduce Offline Reinforcement Learning for Learning to Dispatch (Offline-LD), a novel approach for JSSP that addresses these limitations. Offline-LD adapts two CQL-based Q-learning methods (mQRDQN and discrete mSAC) for maskable action spaces, introduces a new entropy bonus modification for discrete SAC, and exploits reward normalization through preprocessing. Our experiments show that Offline-LD outperforms online RL on both generated and benchmark instances. By introducing noise into the dataset, we achieve similar or better results than those obtained from the expert dataset, indicating that a more diverse training set is preferable because it contains counterfactual information.

Updated: 2024-09-16 15:18:10

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.10589v1
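
To make the combination of CQL and maskable action spaces in the Offline-LD abstract above more concrete, the sketch below computes the usual CQL regularizer for one state, log-sum-exp of the Q-values minus the Q-value of the action stored in the dataset, while restricting the log-sum-exp to actions marked valid. Applying the mask in this way is an assumption made for illustration, and all function names are hypothetical; the paper's mQRDQN and discrete mSAC variants are not reproduced here.

```python
import math


def masked_logsumexp(q_values, mask):
    """Numerically stable log-sum-exp over the actions whose mask entry is True."""
    valid = [q for q, ok in zip(q_values, mask) if ok]
    m = max(valid)
    return m + math.log(sum(math.exp(q - m) for q in valid))


def masked_greedy_action(q_values, mask):
    """Index of the highest-valued action among the valid ones."""
    valid_idx = [i for i, ok in enumerate(mask) if ok]
    return max(valid_idx, key=lambda i: q_values[i])


def cql_penalty(q_values, mask, dataset_action):
    """CQL-style term: pushes down Q on valid actions, pushes up Q on the logged action."""
    return masked_logsumexp(q_values, mask) - q_values[dataset_action]


# Three dispatching actions; the third is currently infeasible and masked out.
q_values = [1.0, 2.0, 3.0]
mask = [True, True, False]
greedy = masked_greedy_action(q_values, mask)
penalty = cql_penalty(q_values, mask, dataset_action=1)
```

Without the mask, the infeasible action with the largest Q-value would dominate both the greedy choice and the penalty.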

Instigating Cooperation among LLM Agents Using Adaptive Information Modulation

This paper introduces a novel framework combining LLM agents as proxies for human strategic behavior with reinforcement learning (RL) to engage these agents in evolving strategic interactions within team environments. Our approach extends traditional agent-based simulations by using strategic LLM agents (SLA) and introducing dynamic and adaptive governance through a pro-social promoting RL agent (PPA) that modulates information access across agents in a network, optimizing social welfare and promoting pro-social behavior. Through validation in iterative games, including the prisoner's dilemma, we demonstrate that SLA agents exhibit nuanced strategic adaptations. The PPA agent effectively learns to adjust information transparency, resulting in enhanced cooperation rates. This framework offers significant insights into AI-mediated social dynamics, contributing to the deployment of AI in real-world team settings.

Updated: 2024-09-16 15:15:51

Categories: cs.AI,cs.CL,cs.CY,cs.GT

Download: http://arxiv.org/abs/2409.10372v1
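
The repeated prisoner's dilemma and the cooperation rate mentioned in the abstract above can be illustrated with a very small simulation. In this sketch, simple rule-based strategies stand in for the LLM agents, the payoff values are the conventional textbook ones, and the information-modulating RL agent is omitted entirely; every name here is a hypothetical placeholder rather than part of the paper's framework.

```python
# Payoffs (row player, column player) for each pair of moves; values are assumed.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """Defect regardless of what the opponent did."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Run a repeated game and return both scores and the overall cooperation rate."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)  # each player sees only the other's past moves
        move_b = strategy_b(hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    cooperation_rate = (hist_a + hist_b).count("C") / (2 * rounds)
    return score_a, score_b, cooperation_rate


result = play(tit_for_tat, always_defect, rounds=5)
```

The cooperation rate returned here is the kind of aggregate quantity a governing agent could try to raise by changing what each player is allowed to observe.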

Learning Gentle Grasping from Human-Free Force Control Demonstration

Humans can steadily and gently grasp unfamiliar objects based on tactile perception. Robots still face challenges in achieving similar performance due to the difficulty of learning accurate grasp-force predictions and force control strategies that can be generalized from limited data. In this article, we propose an approach for learning grasping from ideal force control demonstrations, to achieve performance similar to that of human hands with limited data. Our approach utilizes objects with known contact characteristics to automatically generate reference force curves without human demonstrations. In addition, we design a dual convolutional neural network (Dual-CNN) architecture that incorporates a physics-based mechanics module for learning target grasping force predictions from demonstrations. The described method can be effectively applied in vision-based tactile sensors and enables gentle and stable grasping of objects from the ground. The described prediction model and grasping strategy were validated in offline evaluations and online experiments, and the accuracy and generalizability were demonstrated.

Updated: 2024-09-16 15:14:53

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2409.10371v1

Uncovering the Mechanism of Hepatotoxicity of PFAS Targeting L-FABP Using GCN and Computational Modeling

Per- and polyfluoroalkyl substances (PFAS) are persistent environmental pollutants with known toxicity and bioaccumulation issues. Their widespread industrial use and resistance to degradation have led to global environmental contamination and significant health concerns. While a minority of PFAS have been extensively studied, the toxicity of many PFAS remains poorly understood due to limited direct toxicological data. This study advances the predictive modeling of PFAS toxicity by combining semi-supervised graph convolutional networks (GCNs) with molecular descriptors and fingerprints. We propose a novel approach to enhance the prediction of PFAS binding affinities by isolating molecular fingerprints to construct graphs in which the descriptors are then set as the node features. This approach specifically captures the structural, physicochemical, and topological features of PFAS without overfitting due to an abundance of features. Unsupervised clustering then identifies representative compounds for detailed binding studies. Our results enable more accurate estimation of PFAS hepatotoxicity, providing guidance for the chemical discovery of new PFAS and the development of new safety regulations.

Updated: 2024-09-16 15:13:39

Categories: cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2409.10370v1

Robust image representations with counterfactual contrastive learning

Contrastive pretraining can substantially increase model generalisation and downstream performance. However, the quality of the learned representations is highly dependent on the data augmentation strategy applied to generate positive pairs. Positive contrastive pairs should preserve semantic meaning while discarding unwanted variations related to the data acquisition domain. Traditional contrastive pipelines attempt to simulate domain shifts through pre-defined generic image transformations. However, these do not always mimic realistic and relevant domain variations for medical imaging such as scanner differences. To tackle this issue, we herein introduce counterfactual contrastive learning, a novel framework leveraging recent advances in causal image synthesis to create contrastive positive pairs that faithfully capture relevant domain variations. Our method, evaluated across five datasets encompassing both chest radiography and mammography data, for two established contrastive objectives (SimCLR and DINO-v2), outperforms standard contrastive learning in terms of robustness to acquisition shift. Notably, counterfactual contrastive learning achieves superior downstream performance on both in-distribution and on external datasets, especially for images acquired with scanners under-represented in the training set. Further experiments show that the proposed framework extends beyond acquisition shifts, with models trained with counterfactual contrastive learning substantially improving subgroup performance across biological sex.

Updated: 2024-09-16 15:11:00

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.10365v1

2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation?

Co-speech gestures are fundamental for communication. The advent of recent deep learning techniques has facilitated the creation of lifelike, synchronous co-speech gestures for Embodied Conversational Agents. "In-the-wild" datasets, aggregating video content from platforms like YouTube via human pose detection technologies, provide a feasible solution by offering 2D skeletal sequences aligned with speech. Concurrent developments in lifting models enable the conversion of these 2D sequences into 3D gesture databases. However, it is important to note that the 3D poses estimated from the 2D extracted poses are, in essence, approximations of the ground-truth, which remains in the 2D domain. This distinction raises questions about the impact of gesture representation dimensionality on the quality of generated motions - a topic that, to our knowledge, remains largely unexplored. Our study examines the effect of using either 2D or 3D joint coordinates as training data on the performance of speech-to-gesture deep generative models. We employ a lifting model for converting generated 2D pose sequences into 3D and assess how gestures created directly in 3D stack up against those initially generated in 2D and then converted to 3D. We perform an objective evaluation using widely used metrics in the gesture generation field as well as a user study to qualitatively evaluate the different approaches.

Updated: 2024-09-16 15:06:12

Categories: cs.CV,cs.CL,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2409.10357v1

Point2Graph: An End-to-end Point Cloud-based 3D Open-Vocabulary Scene Graph for Robot Navigation

Current open-vocabulary scene graph generation algorithms rely heavily on both 3D scene point cloud data and posed RGB-D images and thus have limited applications in scenarios where RGB-D images or camera poses are not readily available. To solve this problem, we propose Point2Graph, a novel end-to-end point cloud-based 3D open-vocabulary scene graph generation framework in which the requirement of posed RGB-D image series is eliminated. This hierarchical framework contains room and object detection/segmentation and open-vocabulary classification. For the room layer, we merge a geometry-based border detection algorithm with learning-based region detection to segment rooms and create a "Snap-Lookup" framework for open-vocabulary room classification. In addition, we create an end-to-end pipeline for the object layer to detect and classify 3D objects based solely on 3D point cloud data. Our evaluation results show that our framework can outperform the current state-of-the-art (SOTA) open-vocabulary object and room segmentation and classification algorithms on widely used real-scene datasets.

Updated: 2024-09-16 15:01:28

Categories: cs.RO,cs.AI,cs.CV

Download: http://arxiv.org/abs/2409.10350v1

Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models

Recent advancements in Self-Supervised Learning (SSL) have shown promising results in Speaker Verification (SV). However, narrowing the performance gap with supervised systems remains an ongoing challenge. Several studies have observed that speech representations from large-scale ASR models contain valuable speaker information. This work explores the limitations of fine-tuning these models for SV using an SSL contrastive objective in an end-to-end approach. Then, we propose a framework to learn speaker representations in an SSL context by fine-tuning a pre-trained WavLM with a supervised loss using pseudo-labels. Initial pseudo-labels are derived from an SSL DINO-based model and are iteratively refined by clustering the model embeddings. Our method achieves 0.99% EER on VoxCeleb1-O, establishing the new state-of-the-art on self-supervised SV. As this performance is close to our supervised baseline of 0.94% EER, this contribution is a step towards supervised performance on SV with SSL.

Updated: 2024-09-16 14:58:01

Categories: eess.AS,cs.LG,cs.SD

Download: http://arxiv.org/abs/2406.02285v2
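
The speaker verification results above are reported as Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The sketch below estimates it with a coarse sweep over the observed scores and no interpolation, which is a simplifying assumption; evaluation toolkits typically interpolate the error curves, so their numbers can differ slightly. The function name and the toy scores are hypothetical.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by trying every observed score as the decision threshold.

    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False acceptance: impostor trials that pass the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection: genuine trials that fall below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy similarity scores: one genuine trial (0.4) scores below one impostor (0.5).
targets = [0.9, 0.6, 0.4, 0.8]
impostors = [0.5, 0.3, 0.2, 0.1]
eer = equal_error_rate(targets, impostors)
```

On this toy data one of four genuine trials and one of four impostor trials are misclassified at the balancing threshold, so the estimate is 25%.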

Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation

Implicit feedback, often used to build recommender systems, unavoidably confronts noise due to factors such as misclicks and position bias. Previous studies have attempted to alleviate this by identifying noisy samples based on their diverged patterns, such as higher loss values, and mitigating the noise through sample dropping or reweighting. Despite the progress, we observe existing approaches struggle to distinguish hard samples and noise samples, as they often exhibit similar patterns, thereby limiting their effectiveness in denoising recommendations. To address this challenge, we propose a Large Language Model Enhanced Hard Sample Denoising (LLMHD) framework. Specifically, we construct an LLM-based scorer to evaluate the semantic consistency of items with the user preference, which is quantified based on summarized historical user interactions. The resulting scores are used to assess the hardness of samples for the pointwise or pairwise training objectives. To ensure efficiency, we introduce a variance-based sample pruning strategy to filter potential hard samples before scoring. Besides, we propose an iterative preference update module designed to continuously refine summarized user preference, which may be biased due to false-positive user-item interactions. Extensive experiments on three real-world datasets and four backbone recommenders demonstrate the effectiveness of our approach.

Updated: 2024-09-16 14:57:09

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2409.10343v1

Opponent Shaping for Antibody Development

Anti-viral therapies are typically designed or evolved towards the current strains of a virus. In learning terms, this corresponds to a myopic best response, i.e., not considering the possible adaptive moves of the opponent. However, therapy-induced selective pressures act on viral antigens to drive the emergence of mutated strains, against which initial therapies have reduced efficacy. To motivate our work, we consider antibody designs that target not only the current viral strains but also the wide range of possible future variants that the virus might evolve into under the evolutionary pressure exerted by said antibodies. Building on a computational model of binding between antibodies and viral antigens (the Absolut! framework), we design and implement a genetic simulation of the viral evolutionary escape. Crucially, this allows our antibody optimisation algorithm to consider and influence the entire escape curve of the virus, i.e., to guide (or "shape") the viral evolution. This is inspired by opponent shaping which, in general-sum learning, accounts for the adaptation of the co-player rather than playing a myopic best response. Hence we call the optimised antibodies shapers. Within our simulations, we demonstrate that our shapers target both current and simulated future viral variants, outperforming the antibodies chosen in a myopic way. Furthermore, we show that shapers exert specific evolutionary pressure on the virus compared to myopic antibodies. Altogether, shapers modify the evolutionary trajectories of viral strains and minimise the viral escape compared to their myopic counterparts. While this is a simple model, we hope that our proposed paradigm will enable the discovery of better long-lived vaccines and antibody therapies in the future, enabled by rapid advancements in the capabilities of simulation tools.

Updated: 2024-09-16 14:56:27

Categories: q-bio.PE,cs.AI,cs.GT,cs.MA,92-08,I.2.1; J.3

Download: http://arxiv.org/abs/2409.10588v1

Hyperedge Modeling in Hypergraph Neural Networks by using Densest Overlapping Subgraphs

Hypergraphs address the limitations of traditional graphs by introducing hyperedges. While a graph edge connects only two nodes, a hyperedge can connect an arbitrary number of nodes. In addition, the underlying message-passing mechanism in Hypergraph Neural Networks (HGNNs) takes the form vertex-hyperedge-vertex, which lets HGNNs capture and utilize richer and more complex structural information than traditional Graph Neural Networks (GNNs). More recently, the idea of overlapping subgraphs has emerged. These subgraphs can capture more information about subgroups of vertices without restricting a vertex to a single group, allowing vertices to belong to multiple groups or subgraphs. In addition, one of the most important problems in graph clustering is to find densest overlapping subgraphs (DOS). In this paper, we propose a solution to the DOS problem via the Agglomerative Greedy Enumeration (DOSAGE) algorithm, a novel approach that enhances the generation of densest overlapping subgraphs and, hence, yields a robust construction of hypergraphs. Experiments on standard benchmarks show that the DOSAGE algorithm significantly outperforms HGNNs and six other methods on the node classification task.

Updated: 2024-09-16 14:56:10

Categories: cs.LG,cs.AI,cs.SI

Download: http://arxiv.org/abs/2409.10340v1
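
The vertex-hyperedge-vertex message passing described in the abstract above can be illustrated with a minimal sketch. It assumes scalar node features and plain mean aggregation in both directions; the function name `hypergraph_message_pass` and the toy data are hypothetical and are not taken from the paper, which may use learned weights and a different normalisation.

```python
from typing import Dict, List


def hypergraph_message_pass(
    features: Dict[str, float],
    hyperedges: Dict[str, List[str]],
) -> Dict[str, float]:
    """One vertex -> hyperedge -> vertex round using mean aggregation (an assumption)."""
    # Step 1 (vertex -> hyperedge): each hyperedge averages its member features.
    edge_msg = {
        edge: sum(features[v] for v in members) / len(members)
        for edge, members in hyperedges.items()
    }
    # Step 2 (hyperedge -> vertex): each vertex averages the messages of
    # every hyperedge it belongs to; overlapping membership is allowed.
    updated = {}
    for v in features:
        incident = [edge_msg[e] for e, members in hyperedges.items() if v in members]
        # A vertex that belongs to no hyperedge keeps its own feature.
        updated[v] = sum(incident) / len(incident) if incident else features[v]
    return updated


# Toy hypergraph: vertex "c" belongs to both hyperedges (overlapping groups).
features = {"a": 1.0, "b": 3.0, "c": 5.0, "d": 7.0}
hyperedges = {"e1": ["a", "b", "c"], "e2": ["c", "d"]}
result = hypergraph_message_pass(features, hyperedges)
```

Because "c" sits in two hyperedges, its update mixes information from both groups, which is the kind of multi-group membership that overlapping subgraphs are meant to capture.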

My part is bigger than yours -- assessment within a group of peers

A project (e.g., writing a collaborative research paper) is often a group effort. At the end, each contributor identifies their contribution, often verbally. The reward, however, is very frequently financial. This leads to the question of what (percentage) share in the creation of the paper is due to individual authors. Different authors may have various opinions on the matter; even worse, their opinions may have different relevance. In this paper, we present simple models that allow aggregation of experts' views, linking the priority of each expert's preference directly to the assessment made by the other experts. In this approach, the more significant the contribution of a given expert, the greater the importance of that expert's opinion. The presented method can be considered an attempt to find consensus among peers involved in the same project. Hence, its applications may go beyond the proposed study example of writing a scientific paper.

Updated: 2024-09-16 14:53:54

Categories: cs.DM,cs.AI

Download: http://arxiv.org/abs/2407.01843v2
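
One possible reading of the idea in the abstract above, that an expert's opinion weighs as much as that expert's assessed contribution, is a fixed-point computation. The sketch below is an illustration under stated assumptions, not the authors' model: each expert `i` supplies a row `opinions[i]` of shares summing to 1, the aggregated share vector is used as the weight vector, and the iteration starts from equal weights. The name `consensus_shares` and the numbers are hypothetical.

```python
from typing import List


def consensus_shares(opinions: List[List[float]], iterations: int = 200) -> List[float]:
    """Iterate shares <- weighted average of opinions, with weights = current shares."""
    n = len(opinions)
    shares = [1.0 / n] * n  # start from equal importance (an assumption)
    for _ in range(iterations):
        # The opinion of expert i counts in proportion to expert i's current share.
        new = [sum(shares[i] * opinions[i][j] for i in range(n)) for j in range(n)]
        total = sum(new)
        shares = [x / total for x in new]
    return shares


# Row i holds the percentage split proposed by expert i (illustrative values).
opinions = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.6, 0.2, 0.2],
]
shares = consensus_shares(opinions)
```

With strictly positive opinions this iteration settles on a single share vector, so the result does not depend on the equal starting weights.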

From Ad Identifiers to Global Privacy Control: The Status Quo and Future of Opting Out of Ad Tracking on Android

Apps and their integrated third-party libraries often collect personal information from Android users for personalizing ads. This practice can be privacy-invasive. Users can limit ad tracking on Android via the AdID setting; further, the California Consumer Privacy Act (CCPA) gives users an opt-out right via Global Privacy Control (GPC). However, neither of these two privacy controls has been studied before as to whether it helps Android users exercise their legally mandated opt-out right. In response, we evaluate how many Android apps are subject to the CCPA opt-out right and find it applicable to approximately 70% of apps on the top free app lists of the US Google Play Store. Our dynamic analysis of 1,811 apps from these lists shows that neither the AdID setting nor GPC effectively prevents the selling and sharing of personal information in California. For example, when disabling the AdID and sending GPC signals to the most common ad tracking domain in our dataset that implements the US Privacy String, only 4% of apps connecting to the domain indicate the opt-out status. To mitigate this shortcoming, Android's AdID setting should be evolved towards a universal GPC setting as part of Google's Privacy Sandbox.

Updated: 2024-09-16 14:53:06

Categories: cs.CR,cs.CY

Download: http://arxiv.org/abs/2407.14938v2
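
To make the "opt-out status" in the abstract above concrete, here is a minimal sketch of reading a US Privacy String. It assumes the four-character version 1 layout published by the IAB (version, explicit notice, opt-out of sale, LSPA coverage, each flag being `Y`, `N` or `-`); the helper names are hypothetical and this is not the paper's measurement pipeline.

```python
def parse_us_privacy(value: str) -> dict:
    """Split a US Privacy String such as '1YYN' into its four fields."""
    if len(value) != 4 or not value[0].isdigit():
        raise ValueError(f"unexpected US Privacy String: {value!r}")
    if any(flag not in "YN-" for flag in value[1:]):
        raise ValueError(f"unexpected flag in US Privacy String: {value!r}")
    return {
        "version": int(value[0]),
        "explicit_notice": value[1],
        "opt_out_sale": value[2],
        "lspa_covered": value[3],
    }


def indicates_opt_out(value: str) -> bool:
    """True only when the third character records an opt-out of sale."""
    return parse_us_privacy(value)["opt_out_sale"] == "Y"


# A GPC signal travels as the request header 'Sec-GPC: 1'; an app honouring it
# would be expected to forward a string whose opt-out flag is 'Y'.
observed = ["1YYN", "1YNN", "1---"]
opted_out = [s for s in observed if indicates_opt_out(s)]
```

A check of this kind, applied to the strings seen in outgoing ad requests, is one way to count how many apps actually pass the opt-out on.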

A Scalable and Parallelizable Digital Twin Framework for Sustainable Sim2Real Transition of Multi-Agent Reinforcement Learning Systems

Multi-agent reinforcement learning (MARL) systems usually require significantly long training times due to their inherent complexity. Furthermore, deploying them in the real world demands a feature-rich environment along with multiple embodied agents, which may not be feasible due to budget or space limitations, not to mention energy consumption and safety issues. This work tries to address these pain points by presenting a sustainable digital twin framework capable of accelerating MARL training by selectively scaling parallelized workloads on-demand, and transferring the trained policies from simulation to reality using minimal hardware resources. The applicability of the proposed digital twin framework is highlighted through two representative use cases, which cover cooperative as well as competitive classes of MARL problems. We study the effect of agent and environment parallelization on training time and that of systematic domain randomization on zero-shot sim2real transfer across both the case studies. Results indicate up to 76.3% reduction in training time with the proposed parallelization scheme and as low as 2.9% sim2real gap using the suggested deployment method.

Updated: 2024-09-16 14:52:47

Categories: cs.RO,cs.LG,cs.MA

Download: http://arxiv.org/abs/2403.10996v2

Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents

Retrieval-augmented generation (RAG) systems respond to queries by retrieving relevant documents from a knowledge database, then generating an answer by applying an LLM to the retrieved documents. We demonstrate that RAG systems that operate on databases with untrusted content are vulnerable to a new class of denial-of-service attacks we call jamming. An adversary can add a single "blocker" document to the database that will be retrieved in response to a specific query and result in the RAG system not answering this query, ostensibly because it lacks the information or because the answer is unsafe. We describe and measure the efficacy of several methods for generating blocker documents, including a new method based on black-box optimization. This method (1) does not rely on instruction injection, (2) does not require the adversary to know the embedding or LLM used by the target RAG system, and (3) does not use an auxiliary LLM to generate blocker documents. We evaluate jamming attacks on several LLMs and embeddings and demonstrate that the existing safety metrics for LLMs do not capture their vulnerability to jamming. We then discuss defenses against blocker documents.

Updated: 2024-09-16 14:52:46

Categories: cs.CR,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.05870v2

VAE-QWGAN: Improving Quantum GANs for High Resolution Image Generation

This paper presents a novel hybrid quantum generative model, the VAE-QWGAN, which combines the strengths of a classical Variational AutoEncoder (VAE) with a hybrid Quantum Wasserstein Generative Adversarial Network (QWGAN). The VAE-QWGAN integrates the VAE decoder and QGAN generator into a single quantum model with shared parameters, utilizing the VAE's encoder for latent vector sampling during training. To generate new data from the trained model at inference, input latent vectors are sampled from a Gaussian Mixture Model (GMM), learnt on the training latent vectors. This, in turn, enhances the diversity and quality of generated images. We evaluate the model's performance on MNIST/Fashion-MNIST datasets, and demonstrate improved quality and diversity of generated images compared to existing approaches.

Updated: 2024-09-16 14:52:22

Categories: quant-ph,cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.10339v1

The 20 questions game to distinguish large language models

In a parallel with the 20 questions game, we present a method to determine whether two large language models (LLMs), placed in a black-box context, are the same or not. The goal is to use a small set of (benign) binary questions, typically under 20. We formalize the problem and first establish a baseline using a random selection of questions from known benchmark datasets, achieving an accuracy of nearly 100% within 20 questions. After showing optimal bounds for this problem, we introduce two effective questioning heuristics able to discriminate 22 LLMs by using half as many questions for the same task. These methods offer significant advantages in terms of stealth and are thus of interest to auditors or copyright owners facing suspicions of model leaks.

Updated: 2024-09-16 14:50:29

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.10338v1
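
The baseline procedure from the abstract above, asking binary questions until two black-box models disagree, can be sketched as follows. This is a simplified illustration, assuming each model is a deterministic callable returning `True` or `False`; the name `first_distinguishing_question` and the lookup-table "models" are hypothetical stand-ins, and the paper's question-selection heuristics are not reproduced here.

```python
from typing import Callable, Iterable, Optional

# A black-box model is treated as a function from a question to a yes/no answer.
Model = Callable[[str], bool]


def first_distinguishing_question(
    model_a: Model,
    model_b: Model,
    questions: Iterable[str],
    budget: int = 20,
) -> Optional[str]:
    """Return the first question on which the two models disagree, if any."""
    for asked, question in enumerate(questions):
        if asked >= budget:
            break  # stay within the question budget
        if model_a(question) != model_b(question):
            return question
    return None  # no disagreement observed within the budget


# Lookup tables stand in for real model queries (illustrative data only).
answers_a = {"q1": True, "q2": False, "q3": True}
answers_b = {"q1": True, "q2": False, "q3": False}
witness = first_distinguishing_question(answers_a.get, answers_b.get, ["q1", "q2", "q3"])
same = first_distinguishing_question(answers_a.get, answers_a.get, ["q1", "q2", "q3"])
```

A returned question is evidence that the models differ; `None` only means no difference was found within the budget, not that the models are identical.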

Security, Trust and Privacy challenges in AI-driven 6G Networks

The advent of 6G networks promises unprecedented advancements in wireless communication, offering wider bandwidth and lower latency compared to their predecessors. This article explores the evolving infrastructure of 6G networks, emphasizing the transition towards a more disaggregated structure and the integration of artificial intelligence (AI) technologies. Furthermore, it explores the security, trust and privacy challenges and attacks in 6G networks, particularly those related to the use of AI. It presents a classification of network attacks stemming from the AI-centric architecture of these networks and explores technologies designed to detect or mitigate these emerging threats. The paper concludes by examining the implications and risks linked to the utilization of AI in ensuring a robust network.

Updated: 2024-09-16 14:48:20

Categories: cs.CR,cs.NI

Download: http://arxiv.org/abs/2409.10337v1

Execution-time opacity control for timed automata

Timing leaks in timed automata (TA) can occur whenever an attacker is able to deduce a secret by observing some timed behavior. In execution-time opacity, the attacker aims at deducing whether a private location was visited, by observing only the execution time. It can be decided whether a TA is opaque in this setting. In this work, we tackle control, and show that we are able to decide whether a TA can be controlled at runtime to ensure opacity. Our method is constructive, in the sense that we can exhibit such a controller. We also address the case when the attacker cannot have an infinite precision in its observations.

Updated: 2024-09-16 14:46:52

Categories: cs.CR

Download: http://arxiv.org/abs/2409.10336v1

Enhancing Spatio-temporal Quantile Forecasting with Curriculum Learning: Lessons Learned

Training models on spatio-temporal (ST) data remains an open problem due to the complicated and diverse nature of the data, and it is challenging to ensure the performance of a model trained directly on the original ST data. While limiting the variety of training data can make training easier, it can also deprive the model of knowledge and information, resulting in decreased performance. To address this challenge, we present an innovative paradigm that incorporates three separate forms of curriculum learning, targeting the spatial, temporal, and quantile perspectives respectively. Furthermore, our framework incorporates a stacking fusion module to combine diverse information from the three types of curriculum learning, resulting in a strong and thorough learning process. We demonstrate the effectiveness of this framework with extensive empirical evaluations, highlighting its better performance in addressing complex ST challenges. We provide thorough ablation studies to investigate the effectiveness of our curriculum and to explain how it contributes to improved learning efficiency on ST data.

Updated: 2024-09-16 14:44:53

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.12709v2

Research and Design of a Financial Intelligent Risk Control Platform Based on Big Data Analysis and Deep Machine Learning

In the financial field of the United States, the application of big data technology has become one of the important means for financial institutions to enhance competitiveness and reduce risks. The core objective of this article is to explore how to fully utilize big data technology to achieve complete integration of internal and external data of financial institutions, and create an efficient and reliable platform for big data collection, storage, and analysis. With the continuous expansion and innovation of financial business, traditional risk management models are no longer able to meet the increasingly complex market demands. This article adopts big data mining and real-time streaming data processing technology to monitor, analyze, and alert on various business data. Through statistical analysis of historical data and precise mining of customer transaction behavior and relationships, potential risks can be more accurately identified and timely responses can be made. This article designs and implements a financial big data intelligent risk control platform. This platform not only achieves effective integration, storage, and analysis of the internal and external data of financial institutions, but also intelligently displays customer characteristics and their relationships and provides intelligent supervision of various risk information.

Updated: 2024-09-16 14:41:41

Categories: q-fin.RM,cs.LG

Download: http://arxiv.org/abs/2409.10331v1

Enhancing Next Destination Prediction: A Novel Long Short-Term Memory Neural Network Approach Using Real-World Airline Data

In the modern transportation industry, accurate prediction of travelers' next destinations brings multiple benefits to companies, such as customer satisfaction and targeted marketing. This study focuses on developing a precise model that captures the sequential patterns and dependencies in travel data, enabling accurate predictions of individual travelers' future destinations. To achieve this, a novel model architecture with a sliding window approach based on Long Short-Term Memory (LSTM) is proposed for destination prediction in the transportation industry. The experimental results highlight satisfactory performance and high scores achieved by the proposed model across different data sizes and performance metrics. This research contributes to advancing destination prediction methods, empowering companies to deliver personalized recommendations and optimize customer experiences in the dynamic travel landscape.

Updated: 2024-09-16 14:40:16

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2401.12830v2
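
The sliding window approach mentioned in the abstract above typically turns one traveller's history into (input window, next destination) training pairs for a sequence model. The sketch below shows only that data-preparation step; the window length, the function name `sliding_windows`, and the airport codes are illustrative assumptions, and the LSTM itself is not shown.

```python
from typing import List, Sequence, Tuple


def sliding_windows(history: Sequence[str], window: int) -> List[Tuple[List[str], str]]:
    """Build (previous `window` destinations, next destination) pairs."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        (list(history[i : i + window]), history[i + window])
        for i in range(len(history) - window)
    ]


# One traveller's visited airports in chronological order (made-up example).
trip_history = ["IST", "LHR", "JFK", "IST", "CDG"]
pairs = sliding_windows(trip_history, window=3)
# pairs == [(["IST", "LHR", "JFK"], "IST"), (["LHR", "JFK", "IST"], "CDG")]
```

Each pair would then be encoded (for example with an embedding per airport) and fed to the sequence model, with the second element as the prediction target.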

InfoDisent: Explainability of Image Classification Models by Information Disentanglement

Understanding the decisions made by image classification networks is a critical area of research in deep learning. This task is traditionally divided into two distinct approaches: post-hoc methods and intrinsic methods. Post-hoc methods, such as GradCam, aim to interpret the decisions of pre-trained models by identifying regions of the image where the network focuses its attention. However, these methods provide only a high-level overview, making it difficult to fully understand the network's decision-making process. Conversely, intrinsic methods, like prototypical parts models, offer a more detailed understanding of network predictions but are constrained by specific architectures, training methods, and datasets. In this paper, we introduce InfoDisent, a hybrid model that combines the advantages of both approaches. By utilizing an information bottleneck, InfoDisent disentangles the information in the final layer of a pre-trained deep network, enabling the breakdown of classification decisions into basic, understandable atomic components. Unlike standard prototypical parts approaches, InfoDisent can interpret the decisions of pre-trained classification networks and be used for making classification decisions, similar to intrinsic models. We validate the effectiveness of InfoDisent on benchmark datasets such as ImageNet, CUB-200-2011, Stanford Cars, and Stanford Dogs for both convolutional and transformer backbones.

Updated: 2024-09-16 14:39:15

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.10329v1

On the Hardness of Meaningful Local Guarantees in Nonsmooth Nonconvex Optimization

We study the oracle complexity of nonsmooth nonconvex optimization, with the algorithm assumed to have access only to local function information. It has been shown by Davis, Drusvyatskiy, and Jiang (2023) that for nonsmooth Lipschitz functions satisfying certain regularity and strictness conditions, perturbed gradient descent converges to local minimizers asymptotically. Motivated by this result and by other recent algorithmic advances in nonconvex nonsmooth optimization concerning Goldstein stationarity, we consider the question of obtaining a non-asymptotic rate of convergence to local minima for this problem class. We provide the following negative answer to this question: Local algorithms acting on regular Lipschitz functions cannot, in the worst case, provide meaningful local guarantees in terms of function value in sub-exponential time, even when all near-stationary points are global minima. This sharply contrasts with the smooth setting, for which it is well-known that standard gradient methods can do so at a dimension-independent rate. Our result complements the rich body of work in the theoretical computer science literature that provides hardness results conditional on conjectures such as $\mathsf{P}\neq\mathsf{NP}$ or cryptographic assumptions, in that ours holds unconditionally, without any such assumptions.

Updated: 2024-09-16 14:35:00

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2409.10323v1

SEAL: Towards Safe Autonomous Driving via Skill-Enabled Adversary Learning for Closed-Loop Scenario Generation

Verification and validation of autonomous driving (AD) systems and components is of increasing importance, as such technology increases in real-world prevalence. Safety-critical scenario generation is a key approach to robustify AD policies through closed-loop training. However, existing approaches for scenario generation rely on simplistic objectives, resulting in overly-aggressive or non-reactive adversarial behaviors. To generate diverse adversarial yet realistic scenarios, we propose SEAL, a scenario perturbation approach which leverages learned scoring functions and adversarial, human-like skills. SEAL-perturbed scenarios are more realistic than SOTA baselines, leading to an improvement of more than 20% in ego task success across real-world, in-distribution, and out-of-distribution scenarios. To facilitate future research, we release our code and tools: https://github.com/cmubig/SEAL

Updated: 2024-09-16 14:33:21

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.10320v1

Probabilistic energy forecasting through quantile regression in reproducing kernel Hilbert spaces

Accurate energy demand forecasting is crucial for sustainable and resilient energy development. To meet the Net Zero Representative Concentration Pathways (RCP) 4.5 scenario in the DACH countries, increased renewable energy production, energy storage, and reduced commercial building consumption are needed. This scenario's success depends on hydroelectric capacity and climatic factors. Informed decisions require quantifying uncertainty in forecasts. This study explores a non-parametric method based on reproducing kernel Hilbert spaces (RKHS), known as kernel quantile regression, for energy prediction. Our experiments demonstrate its reliability and sharpness, and we benchmark it against state-of-the-art methods in load and price forecasting for the DACH region. We offer our implementation in conjunction with additional scripts to ensure the reproducibility of our research.

Updated: 2024-09-16 14:30:14

Categories: cs.LG,cs.AI,cs.SY,eess.SY,I.2; G.4

Download: http://arxiv.org/abs/2408.04405v3
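
Quantile regression, including the kernel variant named in the abstract above, is usually trained by minimising the pinball (quantile) loss. The sketch below shows only that loss for a quantile level `tau`; the kernel machinery and the authors' implementation are not reproduced, and the function name and the numbers are illustrative assumptions.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], tau: float) -> float:
    """Mean pinball loss for quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) costs tau per unit,
        # over-prediction costs (1 - tau) per unit.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# For the 0.9 quantile, predicting too low is penalised nine times more
# heavily than predicting too high by the same amount.
under = pinball_loss([10.0], [8.0], tau=0.9)   # forecast 2 units too low
over = pinball_loss([10.0], [12.0], tau=0.9)   # forecast 2 units too high
```

This asymmetry is what pushes the fitted curve towards the requested quantile rather than the mean, and fitting several levels of `tau` gives the prediction intervals used to quantify forecast uncertainty.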

Deep Learning tools to support deforestation monitoring in the Ivory Coast using SAR and Optical satellite imagery

Deforestation is gaining increasing importance due to its strong influence on the surrounding environment, especially in developing countries where the population is economically disadvantaged and agriculture is the main source of income. In Ivory Coast, for instance, where cocoa production is the most remunerative activity, it is not rare to witness the replacement of portions of ancient forests with new cocoa plantations. To monitor this type of deleterious activity, satellites can be employed to detect the disappearance of forest and help prevent it from spreading further. In this study, the Forest-Non-Forest map (FNF) is used as ground truth for models that take Sentinel images as input. The state-of-the-art models U-Net, Attention U-Net, Segnet and FCN32 are compared over different years, combining Sentinel-1, Sentinel-2 and cloud probability to create forest/non-forest segmentation. Although Ivory Coast lacks forest coverage datasets and is only partially covered by Sentinel images, we demonstrate the feasibility of creating models that classify forest and non-forest pixels over the area using open datasets to predict where deforestation could have occurred. Although a significant portion of deforestation research is carried out on visible bands, SAR acquisitions are employed to overcome the limits of RGB images over areas often covered by clouds. Finally, the most promising model is employed to estimate the hectares of forest cut between 2019 and 2020.

Updated: 2024-09-16 14:26:41

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.11186v1
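
The final step in the abstract above, estimating hectares of forest lost between two years, reduces to counting pixels that change from forest to non-forest and multiplying by the pixel area. The sketch below assumes binary masks (1 = forest, 0 = non-forest) on the same grid and a 10 m pixel size; the resolution, the function name, and the tiny masks are assumptions for illustration, not values reported by the paper.

```python
from typing import List

Mask = List[List[int]]  # 1 = forest, 0 = non-forest


def forest_loss_hectares(before: Mask, after: Mask, pixel_size_m: float = 10.0) -> float:
    """Area that was forest in `before` and non-forest in `after`, in hectares."""
    lost_pixels = sum(
        1
        for row_before, row_after in zip(before, after)
        for b, a in zip(row_before, row_after)
        if b == 1 and a == 0
    )
    square_metres = lost_pixels * pixel_size_m ** 2
    return square_metres / 10_000.0  # 1 hectare = 10,000 square metres


# Toy 2x3 segmentation outputs for two consecutive years.
mask_2019 = [[1, 1, 0], [1, 1, 1]]
mask_2020 = [[1, 0, 0], [0, 1, 1]]
lost_ha = forest_loss_hectares(mask_2019, mask_2020)
```

In practice the two masks would come from the selected segmentation model, and pixels hidden by clouds in either year would need to be excluded before counting.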

Model Selection of Anomaly Detectors in the Absence of Labeled Validation Data

Anomaly detection is the task of identifying abnormal samples in large unlabeled datasets. While the advent of foundation models has produced powerful zero-shot anomaly detection methods, their deployment in practice is often hindered by the absence of labeled validation data -- without it, their detection performance cannot be evaluated reliably. In this work, we propose SWSA (Selection With Synthetic Anomalies): a general-purpose framework to select image-based anomaly detectors without labeled validation data. Instead of collecting labeled validation data, we generate synthetic anomalies without any training or fine-tuning, using only a small support set of normal images. Our synthetic anomalies are used to create detection tasks that compose a validation framework for model selection. In an empirical study, we evaluate SWSA with three types of synthetic anomalies and on two selection tasks: model selection of image-based anomaly detectors and prompt selection for CLIP-based anomaly detection. SWSA often selects models and prompts that match selections made with a ground-truth validation set, outperforming baseline selection strategies.

Updated: 2024-09-16 14:24:48

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2310.10461v3

Reviewing AI's Role in Non-Muscle-Invasive Bladder Cancer Recurrence Prediction

Notorious for its 70-80% recurrence rate, Non-muscle-invasive Bladder Cancer (NMIBC) imposes a significant human burden and is one of the costliest cancers to manage. Current tools for predicting NMIBC recurrence rely on scoring systems that often overestimate risk and have poor accuracy. Machine learning (ML)-based techniques have emerged as a promising alternative for predicting NMIBC recurrence by leveraging molecular and clinical data. This comprehensive review paper critically analyses ML-based frameworks for predicting NMIBC recurrence, focusing on their statistical robustness and algorithmic efficacy. We meticulously examine the strengths and weaknesses of each study by focusing on various prediction tasks, data modalities, and ML models, highlighting their remarkable performance alongside inherent limitations. A diverse array of ML algorithms that leverage multimodal data spanning radiomics, clinical, histopathological, and genomic data exhibits significant promise in accurately predicting NMIBC recurrence. However, the path to widespread adoption faces challenges concerning the generalisability and interpretability of models, emphasising the need for collaborative efforts, robust datasets, and the incorporation of cost-effectiveness. Our detailed categorisation and in-depth analysis illuminate the nuances, complexities, and contexts that influence real-world advancement and adoption of these AI-based techniques. This rigorous analysis equips researchers with a deeper understanding of the intricacies of the ML algorithms employed. Researchers can use these insights to refine approaches, address limitations, and boost the generalisability of their ML models, ultimately leading to reduced healthcare costs and improved patient outcomes.

Updated: 2024-09-16 14:19:39

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2403.10586v2

Know your limits! Optimize the robot's behavior through self-awareness

As humanoid robots transition from labs to real-world environments, it is essential to democratize robot control for non-expert users. Recent human-robot imitation algorithms focus on following a reference human motion with high precision, but they are sensitive to the quality of the reference motion and require the human operator to simplify their movements to match the robot's capabilities. Instead, we consider that the robot should understand and adapt the reference motion to its own abilities, facilitating the operator's task. For that, we introduce a deep-learning model that anticipates the robot's performance when imitating a given reference. Then, our system can generate multiple references given a high-level task command, assign a score to each of them, and select the best reference to achieve the desired robot behavior. Our Self-AWare model (SAW) ranks potential robot behaviors based on various criteria, such as fall likelihood, adherence to the reference motion, and smoothness. We integrate advanced motion generation, robot control, and SAW in a single system, ensuring optimal robot behavior for any task command. For instance, SAW can anticipate falls with 99.29% accuracy. For more information, see our project page: https://evm7.github.io/Self-AWare

Updated: 2024-09-16 14:14:58

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2409.10308v1

Parameterized Approximation for Robust Clustering in Discrete Geometric Spaces

We consider the well-studied Robust $(k, z)$-Clustering problem, which generalizes the classic $k$-Median, $k$-Means, and $k$-Center problems. Given a constant $z\ge 1$, the input to Robust $(k, z)$-Clustering is a set $P$ of $n$ weighted points in a metric space $(M,\delta)$ and a positive integer $k$. Further, each point belongs to one (or more) of the $m$ different groups $S_1,S_2,\ldots,S_m$. Our goal is to find a set $X$ of $k$ centers such that $\max_{i \in [m]} \sum_{p \in S_i} w(p) \delta(p,X)^z$ is minimized. This problem arises in robust optimization [Anthony, Goyal, Gupta, Nagarajan, Math. Oper. Res. 2010] and in algorithmic fairness. For polynomial time computation, an approximation factor of $O(\log m/\log\log m)$ is known [Makarychev, Vakilian, COLT 2021], which is tight under a plausible complexity assumption even for line metrics. For FPT time, there is a $(3^z+\epsilon)$-approximation algorithm, which is tight under GAP-ETH [Goyal, Jaiswal, Inf. Proc. Letters, 2023]. Motivated by the tight lower bounds for general discrete metrics, we focus on geometric spaces such as the (discrete) high-dimensional Euclidean setting and metrics of low doubling dimension, which play an important role in data analysis applications. First, for a universal constant $\eta_0 >0.0006$, we devise a $3^z(1-\eta_{0})$-factor FPT approximation algorithm for discrete high-dimensional Euclidean spaces, thereby bypassing the lower bound for general metrics. We complement this result by showing that even the special case of $k$-Center in dimension $\Theta(\log n)$ is $(\sqrt{3/2}- o(1))$-hard to approximate for FPT algorithms. Finally, we complete the FPT approximation landscape by designing an FPT $(1+\epsilon)$-approximation scheme (EPAS) for metrics of sub-logarithmic doubling dimension.
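To make the objective concrete, the following minimal sketch evaluates $\max_{i \in [m]} \sum_{p \in S_i} w(p)\,\delta(p,X)^z$ for a fixed set of centers. It only computes the cost of a given solution and is not any of the approximation algorithms from the paper. The function name `robust_kz_cost`, the use of the Euclidean metric for $\delta$, and the toy data are assumptions made for illustration.

```python
import math


def robust_kz_cost(points, weights, groups, centers, z=1):
    """Robust (k, z)-clustering cost of a fixed center set.

    points  : list of coordinate tuples (the set P)
    weights : weights[i] is w(p) for points[i]
    groups  : list of index lists; groups may overlap (S_1, ..., S_m)
    centers : list of coordinate tuples (the set X)
    """
    def dist_to_centers(p):
        # delta(p, X): distance from p to its nearest center.
        return min(math.dist(p, c) for c in centers)

    # Worst group cost: max over groups of the weighted sum of distances^z.
    return max(
        sum(weights[i] * dist_to_centers(points[i]) ** z for i in group)
        for group in groups
    )


points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (6.0, 0.0)]
weights = [1.0, 1.0, 1.0, 2.0]
groups = [[0, 1], [2, 3], [1, 2]]      # point 1 and point 2 belong to two groups
centers = [(0.0, 0.0), (4.0, 0.0)]

cost_z1 = robust_kz_cost(points, weights, groups, centers, z=1)  # k-Median-style
cost_z2 = robust_kz_cost(points, weights, groups, centers, z=2)  # k-Means-style
print(cost_z1, cost_z2)
```

In this toy instance the second group determines the cost for both values of $z$, because its heavily weighted point lies farthest from any center.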

Updated: 2024-09-16 14:13:03

Categories: cs.DS,cs.CG,cs.LG

Download: http://arxiv.org/abs/2305.07316v2

How to do impactful research in artificial intelligence for chemistry and materials science

Machine learning has been pervasively touching many fields of science. Chemistry and materials science are no exception. While machine learning has been making a great impact, it is still not reaching its full potential or maturity. In this perspective, we first outline current applications across a diversity of problems in chemistry. Then, we discuss how machine learning researchers view and approach problems in the field. Finally, we provide our considerations for maximizing impact when researching machine learning for chemistry.

Updated: 2024-09-16 14:10:38

Categories: cs.LG,cond-mat.mtrl-sci,cs.AI,physics.chem-ph

Download: http://arxiv.org/abs/2409.10304v1

On Synthetic Texture Datasets: Challenges, Creation, and Curation

The influence of textures on machine learning models has been an ongoing investigation, specifically in texture bias/learning, interpretability, and robustness. However, due to the lack of large and diverse texture data available, the findings in these works have been limited, as more comprehensive evaluations have not been feasible. Image generative models are able to provide data creation at scale, but utilizing these models for texture synthesis has been unexplored and poses additional challenges both in creating accurate texture images and validating those images. In this work, we introduce an extensible methodology and corresponding new dataset for generating high-quality, diverse texture images capable of supporting a broad set of texture-based tasks. Our pipeline consists of: (1) developing prompts from a range of descriptors to serve as input to text-to-image models, (2) adopting and adapting Stable Diffusion pipelines to generate and filter the corresponding images, and (3) further filtering down to the highest quality images. Through this, we create the Prompted Textures Dataset (PTD), a dataset of 362,880 texture images that span 56 textures. During the process of generating images, we find that NSFW safety filters in image generation pipelines are highly sensitive to texture (and flag up to 60% of our texture images), uncovering a potential bias in these models and presenting unique challenges when working with texture data. Through both standard metrics and a human evaluation, we find that our dataset is high quality and diverse.

Updated: 2024-09-16 14:02:18

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.10297v1

MGSA: Multi-granularity Graph Structure Attention for Knowledge Graph-to-Text Generation

The Knowledge Graph-to-Text Generation task aims to convert structured knowledge graphs into coherent and human-readable natural language text. Recent efforts in this field have focused on enhancing pre-trained language models (PLMs) by incorporating graph structure information to capture the intricate structural details of knowledge graphs. However, most of these approaches tend to capture only single-granularity structure information, concentrating either on the relationships between entities within the original graph or on the relationships between words within the same entity or across different entities. This narrow focus results in a significant limitation: models that concentrate solely on entity-level structure fail to capture the nuanced semantic relationships between words, while those that focus only on word-level structure overlook the broader relationships between entire original entities. To overcome these limitations, this paper introduces the Multi-granularity Graph Structure Attention (MGSA), which is based on PLMs. The encoder of the model architecture features an entity-level structure encoding module, a word-level structure encoding module, and an aggregation module that synthesizes information from both structures. This multi-granularity structure encoding approach allows the model to simultaneously capture both entity-level and word-level structure information, providing a more comprehensive understanding of the knowledge graph's structure information, thereby significantly improving the quality of the generated text. We conducted extensive evaluations of the MGSA model using two widely recognized KG-to-Text Generation benchmark datasets, WebNLG and EventNarrative, where it consistently outperformed models that rely solely on single-granularity structure information, demonstrating the effectiveness of our approach.

Updated: 2024-09-16 14:01:03

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.10294v1

Neuromorphic Spintronics

Neuromorphic spintronics combines two advanced fields in technology, neuromorphic computing and spintronics, to create brain-inspired, efficient computing systems that leverage the unique properties of the electron's spin. In this book chapter, we first introduce both fields, neuromorphic computing and spintronics, and then make a case for neuromorphic spintronics. We discuss concrete examples of neuromorphic spintronics, including computing based on fluctuations, artificial neural networks, and reservoir computing, highlighting their potential to revolutionize computational efficiency and functionality.

Updated: 2024-09-16 13:57:39

Categories: cond-mat.mtrl-sci,cond-mat.mes-hall,cond-mat.other,cs.AI

Download: http://arxiv.org/abs/2409.10290v1

ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework

Empathetic response generation necessitates the integration of emotional and intentional dynamics to foster meaningful interactions. Existing research either neglects the intricate interplay between emotion and intent, leading to suboptimal controllability of empathy, or resorts to large language models (LLMs), which incur significant computational overhead. In this paper, we introduce ReflectDiffu, a lightweight and comprehensive framework for empathetic response generation. This framework incorporates emotion contagion to augment emotional expressiveness and employs an emotion-reasoning mask to pinpoint critical emotional elements. Additionally, it integrates intent mimicry within reinforcement learning for refinement during diffusion. By reflecting on intent twice through an Exploring-Sampling-Correcting mechanism, ReflectDiffu adeptly translates emotional decision-making into precise intent actions, thereby addressing empathetic response misalignments stemming from emotional misrecognition. Through reflection, the framework maps emotional states to intents, markedly enhancing both response empathy and flexibility. Comprehensive experiments reveal that ReflectDiffu outperforms existing models regarding relevance, controllability, and informativeness, achieving state-of-the-art results in both automatic and human evaluations.

Updated: 2024-09-16 13:56:17

Categories: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2409.10289v1

How Can We Effectively Expand the Vocabulary of LLMs with 0.01GB of Target Language Text?

Large language models (LLMs) have shown remarkable capabilities in many languages beyond English. Yet, LLMs require more inference steps when generating non-English text due to their reliance on English-centric tokenizers and vocabulary, resulting in higher usage costs for non-English speakers. Vocabulary expansion with target language tokens is a widely used cross-lingual vocabulary adaptation approach to remedy this issue. Despite its effectiveness in inference speedup, previous work on vocabulary expansion has focused on high-resource settings, assuming access to a substantial amount of target language data to effectively initialize the embeddings of the new tokens and adapt the LLM to the target language. However, vocabulary expansion in low-resource settings has yet to be explored. In this paper, we investigate vocabulary expansion in low-resource settings by considering embedding initialization methods and continual pre-training strategies. Through extensive experiments across typologically diverse languages, tasks and models, we establish a set of strategies to perform vocabulary expansion for faster inference while maintaining downstream performance competitive with baselines, using only 30K sentences ($\sim$0.01GB text data) from the target language.

Updated: 2024-09-16 13:55:24

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.11477v2

Enhancing learning in spiking neural networks through neuronal heterogeneity and neuromodulatory signaling

Recent progress in artificial intelligence (AI) has been driven by insights from neuroscience, particularly with the development of artificial neural networks (ANNs). This has significantly enhanced the replication of complex cognitive tasks such as vision and natural language processing. Despite these advances, ANNs struggle with continual learning, adaptable knowledge transfer, robustness, and resource efficiency, capabilities that biological systems handle seamlessly. Specifically, ANNs often overlook the functional and morphological diversity of the brain, hindering their computational capabilities. Furthermore, incorporating cell-type specific neuromodulatory effects into ANNs with neuronal heterogeneity could enable learning at two spatial scales: spiking behavior at the neuronal level, and synaptic plasticity at the circuit level, thereby potentially enhancing their learning abilities. In this article, we summarize recent bio-inspired models, learning rules and architectures and propose a biologically-informed framework for enhancing ANNs. Our proposed dual-framework approach highlights the potential of spiking neural networks (SNNs) for emulating diverse spiking behaviors and dendritic compartments to simulate morphological and functional diversity of neuronal computations. Finally, we outline how the proposed approach integrates brain-inspired compartmental models and task-driven SNNs, balances bioinspiration and complexity, and provides scalable solutions for pressing AI challenges, such as continual learning, adaptability, robustness, and resource efficiency.

Updated: 2024-09-16 13:49:32

Categories: q-bio.NC,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.04525v2

NeuroLGP-SM: Scalable Surrogate-Assisted Neuroevolution for Deep Neural Networks

Evolutionary Algorithms (EAs) play a crucial role in the architectural configuration and training of Artificial Deep Neural Networks (DNNs), a process known as neuroevolution. However, neuroevolution is hindered by its inherent computational expense, requiring multiple generations, a large population, and numerous epochs. The most computationally intensive aspect lies in evaluating the fitness function of a single candidate solution. To address this challenge, we employ Surrogate-assisted EAs (SAEAs). While a few SAEA approaches have been proposed in neuroevolution, none have been applied to truly large DNNs due to issues like intractable information usage. In this work, drawing inspiration from Genetic Programming semantics, we use phenotypic distance vectors, output by DNNs, alongside Kriging Partial Least Squares (KPLS), an approach that is effective in handling these large vectors, making them suitable for search. Our proposed approach, named Neuro-Linear Genetic Programming surrogate model (NeuroLGP-SM), efficiently and accurately estimates DNN fitness without the need for complete evaluations. NeuroLGP-SM demonstrates competitive or superior results compared to 12 other methods, including NeuroLGP without SM, convolutional neural networks, support vector machines, and autoencoders. Additionally, NeuroLGP-SM is 25% more energy-efficient than its NeuroLGP counterpart. This efficiency advantage adds to the overall appeal of our proposed NeuroLGP-SM in optimising the configuration of large DNNs.

Updated: 2024-09-16 13:48:43

Categories: cs.NE,cs.AI

Download: http://arxiv.org/abs/2404.08786v4

On Causality in Domain Adaptation and Semi-Supervised Learning: an Information-Theoretic Analysis for Parametric Models

Recent advancements in unsupervised domain adaptation (UDA) and semi-supervised learning (SSL), particularly incorporating causality, have led to significant methodological improvements in these learning problems. However, a formal theory that explains the role of causality in the generalization performance of UDA/SSL is still lacking. In this paper, we consider the UDA/SSL scenarios where we access $m$ labelled source data and $n$ unlabelled target data as training instances under different causal settings with a parametric probabilistic model. We study the learning performance (e.g., excess risk) of prediction in the target domain from an information-theoretic perspective. Specifically, we distinguish two scenarios: the learning problem is called causal learning if the feature is the cause and the label is the effect, and is called anti-causal learning otherwise. We show that in causal learning, the excess risk depends on the size of the source sample at a rate of $O(\frac{1}{m})$ only if the labelling distribution between the source and target domains remains unchanged. In anti-causal learning, we show that the unlabelled data dominate the performance at a rate of typically $O(\frac{1}{n})$. These results bring out the relationship between the data sample size and the hardness of the learning problem with different causal mechanisms.

Updated: 2024-09-16 13:48:17

Categories: cs.LG,cs.IT,math.IT

Download: http://arxiv.org/abs/2205.04641v2

Enhancing Image Classification in Small and Unbalanced Datasets through Synthetic Data Augmentation

Accurate and robust medical image classification is a challenging task, especially in application domains where available annotated datasets are small and present high imbalance between target classes. Considering that data acquisition is not always feasible, especially for underrepresented classes, our approach introduces a novel synthetic augmentation strategy using class-specific Variational Autoencoders (VAEs) and latent space interpolation to improve discrimination capabilities. By generating realistic, varied synthetic data that fills feature space gaps, we address issues of data scarcity and class imbalance. The method presented in this paper relies on the interpolation of latent representations within each class, thus enriching the training set and improving the model's generalizability and diagnostic accuracy. The proposed strategy was tested on a small dataset of 321 images created to train and validate an automatic method for assessing the quality of cleanliness of esophagogastroduodenoscopy images. By combining real and synthetic data, an increase of over 18% in the accuracy of the most challenging underrepresented class was observed. The proposed strategy not only benefited the underrepresented class but also led to a general improvement in other metrics, including a 6% increase in global accuracy and precision.
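The within-class interpolation step can be illustrated with a short sketch. It assumes the latent codes of one class have already been produced by that class's VAE encoder, and it leaves out encoding and decoding entirely. The function name `interpolate_latents`, the choice of linear interpolation between random pairs, and the toy vectors are assumptions for illustration and are not taken from the paper.

```python
import random


def interpolate_latents(latents, n_new, seed=0):
    """Create n_new latent vectors by linear interpolation between
    randomly chosen pairs of latent codes from the same class."""
    rng = random.Random(seed)
    new_latents = []
    for _ in range(n_new):
        z_a, z_b = rng.sample(latents, 2)   # two distinct codes of this class
        alpha = rng.random()                # interpolation weight in [0, 1)
        new_latents.append(
            [(1 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]
        )
    return new_latents


# Toy 2-D latent codes for one underrepresented class (made up).
minority_latents = [[0.0, 1.0], [2.0, 3.0], [4.0, -1.0]]
synthetic = interpolate_latents(minority_latents, n_new=5)
print(synthetic)
```

Each synthetic code lies on a segment between two real codes of the same class, so a decoder applied to it would be expected to produce samples that fill gaps inside that class's region of latent space rather than drifting toward another class.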

Updated: 2024-09-16 13:47:52

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.10286v1

ASMA: An Adaptive Safety Margin Algorithm for Vision-Language Drone Navigation via Scene-Aware Control Barrier Functions

In the rapidly evolving field of vision-language navigation (VLN), ensuring robust safety mechanisms remains an open challenge. Control barrier functions (CBFs) are efficient tools which guarantee safety by solving an optimal control problem. In this work, we consider the case of a teleoperated drone in a VLN setting, and add safety features by formulating a novel scene-aware CBF using ego-centric observations obtained through an RGB-D sensor. As a baseline, we implement a vision-language understanding module which uses the contrastive language image pretraining (CLIP) model to query about a user-specified (in natural language) landmark. Using the YOLO (You Only Look Once) object detector, the CLIP model is queried for verifying the cropped landmark, triggering downstream navigation. To improve navigation safety of the baseline, we propose ASMA, an Adaptive Safety Margin Algorithm that crops the drone's depth map for tracking moving object(s) to perform scene-aware CBF evaluation on-the-fly. By identifying potential risky observations from the scene, ASMA enables real-time adaptation to unpredictable environmental conditions, ensuring optimal safety bounds on a VLN-powered drone's actions. Using the robot operating system (ROS) middleware on a Parrot Bebop2 quadrotor in the Gazebo environment, ASMA offers a 59.4%-61.8% increase in success rates with an insignificant 5.4%-8.2% increase in trajectory lengths compared to the baseline CBF-less VLN while recovering from unsafe situations.
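How a CBF constrains a commanded action can be seen in a one-dimensional toy case. The sketch below is a generic textbook-style CBF filter and not the scene-aware formulation of ASMA. It assumes a drone approaching an obstacle with dynamics $\dot d = -v$, a barrier $h(d) = d - d_{\text{safe}}$, and the condition $\dot h \ge -\alpha h$, which reduces to $v \le \alpha (d - d_{\text{safe}})$. With a single scalar input, the usual minimally invasive quadratic program has the closed-form solution used here. The function name and all numbers are hypothetical.

```python
def cbf_velocity_filter(v_cmd, distance, safety_margin, alpha=1.0):
    """Return the approach speed closest to v_cmd that satisfies the CBF condition.

    Assumed model: d' = -v, where v > 0 moves toward the obstacle.
    Barrier:       h(d) = d - safety_margin, safe set is h >= 0.
    CBF condition: h' >= -alpha * h  <=>  v <= alpha * (d - safety_margin).
    """
    v_limit = alpha * (distance - safety_margin)
    # Closest admissible value to v_cmd under a single upper bound.
    return min(v_cmd, v_limit)


far = cbf_velocity_filter(v_cmd=1.0, distance=5.0, safety_margin=1.0)
near = cbf_velocity_filter(v_cmd=1.0, distance=1.5, safety_margin=1.0)
inside = cbf_velocity_filter(v_cmd=1.0, distance=0.5, safety_margin=1.0)
wider = cbf_velocity_filter(v_cmd=1.0, distance=5.0, safety_margin=4.5)
print(far, near, inside, wider)
```

Far from the obstacle the command passes through unchanged, near the margin it is scaled down, and inside the margin the filter returns a negative speed that moves the drone away. Enlarging `safety_margin` makes the same command more restricted, which is the kind of knob an adaptive margin scheme would adjust from scene observations.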

Updated: 2024-09-16 13:44:50

Categories: cs.RO,cs.AI,cs.SY,eess.IV,eess.SY

Download: http://arxiv.org/abs/2409.10283v1

DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis

Audio-driven talking head synthesis strives to generate lifelike video portraits from provided audio. The diffusion model, recognized for its superior quality and robust generalization, has been explored for this task. However, establishing a robust correspondence between temporal audio cues and corresponding spatial facial expressions with diffusion models remains a significant challenge in talking head generation. To bridge this gap, we present DreamHead, a hierarchical diffusion framework that learns spatial-temporal correspondences in talking head synthesis without compromising the model's intrinsic quality and adaptability. DreamHead learns to predict dense facial landmarks from audio as intermediate signals to model the spatial and temporal correspondences. Specifically, a first hierarchy of audio-to-landmark diffusion is designed to predict temporally smooth and accurate landmark sequences given audio sequence signals. Then, a second hierarchy of landmark-to-image diffusion is proposed to produce spatially consistent facial portrait videos by modeling spatial correspondences between the dense facial landmarks and appearance. Extensive experiments show that the proposed DreamHead can effectively learn spatial-temporal consistency with the designed hierarchical diffusion and produce high-fidelity audio-driven talking head videos for multiple identities.

Updated: 2024-09-16 13:44:20

Categories: cs.MM,cs.AI,cs.SD,eess.AS

Download: http://arxiv.org/abs/2409.10281v1

Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots

We introduce Cognitive Kernel, an open-source agent system towards the goal of generalist autopilots. Unlike copilot systems, which primarily rely on users to provide essential state information (e.g., task descriptions) and assist users by answering questions or auto-completing contents, autopilot systems must complete tasks from start to finish independently, which requires the system to actively acquire state information from the environment. To achieve this, an autopilot system should be capable of understanding user intents, actively gathering necessary information from various real-world sources, and making wise decisions. Cognitive Kernel adopts a model-centric design. In our implementation, the central policy model (a fine-tuned LLM) initiates interactions with the environment using a combination of atomic actions, such as opening files, clicking buttons, saving intermediate results to memory, or calling the LLM itself. This differs from the widely used environment-centric design, where a task-specific environment with predefined actions is fixed, and the policy model is limited to selecting the correct action from a given set of options. Our design facilitates seamless information flow across various sources and provides greater flexibility. We evaluate our system in three use cases: real-time information management, private information management, and long-term memory management. The results demonstrate that Cognitive Kernel achieves performance better than or comparable to that of closed-source systems in these scenarios. Cognitive Kernel is fully dockerized, ensuring everyone can deploy it privately and securely. We open-source the system and the backbone model to encourage further research on LLM-driven autopilot systems.

Updated: 2024-09-16 13:39:05

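Title: Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots

Abstract: We introduce Cognitive Kernel, an open-source agent system aimed at generalist autopilots. Unlike copilot systems, which rely mainly on users to supply essential state information (e.g., task descriptions) and help users by answering questions or auto-completing content, autopilot systems must complete tasks independently from start to finish, which requires the system to actively acquire state information from the environment. To achieve this, an autopilot system should be able to understand user intent, actively gather the necessary information from various real-world sources, and make sound decisions. Cognitive Kernel adopts a model-centric design. In our implementation, the central policy model (a fine-tuned LLM) interacts with the environment through a combination of atomic actions such as opening files, clicking buttons, saving intermediate results to memory, or calling the LLM itself. This differs from the widely used environment-centric design, which fixes a task-specific environment with predefined actions and limits the policy model to selecting the correct action from a given set of options. Our design enables seamless information flow across sources and provides greater flexibility. We evaluate the system in three use cases: real-time information management, private information management, and long-term memory management. The results show that Cognitive Kernel achieves better or comparable performance relative to other closed-source systems in these scenarios. Cognitive Kernel is fully dockerized, so anyone can deploy it privately and securely. We open-source the system and the backbone model to encourage further research on LLM-driven autopilot systems.

Updated: 2024-09-16 13:39:05

Categories: cs.AI

Download: http://arxiv.org/abs/2409.10277v1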

Maximum Mean Discrepancy on Exponential Windows for Online Change Detection

Detecting changes is of fundamental importance when analyzing data streams and has many applications, e.g., in predictive maintenance, fraud detection, or medicine. A principled approach to detect changes is to compare the distributions of observations within the stream to each other via hypothesis testing. Maximum mean discrepancy (MMD), a (semi-)metric on the space of probability distributions, provides powerful non-parametric two-sample tests on kernel-enriched domains. In particular, MMD is able to detect any disparity between distributions under mild conditions. However, classical MMD estimators suffer from a quadratic runtime complexity, which renders their direct use for change detection in data streams impractical. In this article, we propose a new change detection algorithm, called Maximum Mean Discrepancy on Exponential Windows (MMDEW), that combines the benefits of MMD with an efficient computation based on exponential windows. We prove that MMDEW enjoys polylogarithmic runtime and logarithmic memory complexity and show empirically that it outperforms the state of the art on benchmark data streams.
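To make the quadratic cost mentioned above concrete, the following minimal sketch computes the standard biased (V-statistic) estimate of squared MMD between two scalar samples. It is not the MMDEW algorithm; the Gaussian kernel, the bandwidth parameter `gamma`, and the function names are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * (x - y)^2) for scalars."""
    return math.exp(-gamma * (x - y) ** 2)


def mmd2_biased(xs, ys, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys.

    Every pair of points is visited, so the cost is O((n + m)^2) kernel
    evaluations; this is the quadratic runtime the abstract refers to.
    """
    n, m = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(a, b, gamma) for a in xs for b in xs) / (n * n)
    k_yy = sum(gaussian_kernel(a, b, gamma) for a in ys for b in ys) / (m * m)
    k_xy = sum(gaussian_kernel(a, b, gamma) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = random.Random(0)
    before = [rng.gauss(0.0, 1.0) for _ in range(200)]
    same = [rng.gauss(0.0, 1.0) for _ in range(200)]
    shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]
    print("no change: ", mmd2_biased(before, same))
    print("mean shift:", mmd2_biased(before, shifted))
```

In a streaming setting, recomputing such an estimate for every candidate change point is what becomes impractical, which is the gap the abstract says exponential windows are meant to close.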

Updated: 2024-09-16 13:36:34

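Title: Maximum Mean Discrepancy on Exponential Windows for Online Change Detection

Abstract: Detecting changes is of fundamental importance when analyzing data streams and has many applications, for example in predictive maintenance, fraud detection, and medicine. A principled way to detect changes is to compare the distributions of observations within the stream via hypothesis testing. Maximum mean discrepancy (MMD), a (semi-)metric on the space of probability distributions, provides powerful non-parametric two-sample tests on kernel-enriched domains. In particular, MMD can detect any difference between distributions under mild conditions. However, classical MMD estimators have quadratic runtime complexity, which makes their direct use for change detection in data streams impractical. In this article, we propose a new change detection algorithm, Maximum Mean Discrepancy on Exponential Windows (MMDEW), which combines the benefits of MMD with efficient computation based on exponential windows. We prove that MMDEW has polylogarithmic runtime and logarithmic memory complexity, and we show empirically that it outperforms the state of the art on benchmark data streams.

Updated: 2024-09-16 13:36:34

Categories: cs.LG,I.2.6; H.1.1

Download: http://arxiv.org/abs/2205.12706v3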

Causal Discovery in Recommender Systems: Example and Discussion

Causality is receiving increasing attention from the artificial intelligence and machine learning communities. This paper gives an example of modelling a recommender system problem using causal graphs. Specifically, we approach the causal discovery task by combining observational data from an open-source dataset with prior knowledge to learn a causal graph. The resulting causal graph shows that only a few variables effectively influence the analysed feedback signals. This contrasts with the recent trend in the machine learning community to include more and more variables in massive models, such as neural networks.

Updated: 2024-09-16 13:31:04

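Title: Causal Discovery in Recommender Systems: Example and Discussion

Abstract: Causality is receiving growing attention from the artificial intelligence and machine learning communities. This paper gives an example of modelling a recommender system problem with causal graphs. Specifically, we carry out the causal discovery task by combining observational data from an open-source dataset with prior knowledge to learn a causal graph. The resulting graph shows that only a few variables effectively influence the analysed feedback signals. This contrasts with the recent trend in the machine learning community of including more and more variables in massive models such as neural networks.

Updated: 2024-09-16 13:31:04

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2409.10271v1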

BAFNet: Bilateral Attention Fusion Network for Lightweight Semantic Segmentation of Urban Remote Sensing Images

Large-scale semantic segmentation networks often achieve high performance, but applying them can be challenging when sample sizes and computational resources are limited. In scenarios with restricted network size and computational complexity, models encounter significant challenges in capturing long-range dependencies and recovering detailed information in images. We propose a lightweight bilateral semantic segmentation network called bilateral attention fusion network (BAFNet) to efficiently segment high-resolution urban remote sensing images. The model consists of two paths, namely a dependency path and a remote-local path. The dependency path utilizes large kernel attention to acquire long-range dependencies in the image. In addition, multi-scale local attention and efficient remote attention are designed to construct the remote-local path. Finally, a feature aggregation module is designed to effectively utilize the different features of the two paths. Our proposed method was tested on the public high-resolution urban remote sensing datasets Vaihingen and Potsdam, with mIoU reaching 83.20% and 86.53%, respectively. As a lightweight semantic segmentation model, BAFNet not only outperforms advanced lightweight models in accuracy but also demonstrates comparable performance to non-lightweight state-of-the-art methods on the two datasets, despite a tenfold difference in floating-point operations and a fifteenfold difference in network parameters.
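The mIoU figures quoted above are per-class intersection-over-union scores averaged over classes. As a minimal sketch of that metric, assuming a plain confusion matrix as input (the function name and the toy numbers are illustrative and not taken from the paper, and dataset-specific conventions such as ignored classes are not modeled):

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[i][j] counts pixels whose true class is i and predicted class is j.
    Classes that never occur in either the labels or the predictions are skipped.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_positive = confusion[c][c]
        actual = sum(confusion[c])                                    # pixels labelled c
        predicted = sum(confusion[r][c] for r in range(num_classes))  # pixels predicted c
        union = actual + predicted - true_positive
        if union > 0:
            ious.append(true_positive / union)
    return sum(ious) / len(ious)


# Toy two-class example (illustrative numbers only).
toy = [
    [8, 2],  # true class 0: 8 correct, 2 predicted as class 1
    [1, 9],  # true class 1: 1 predicted as class 0, 9 correct
]
print(mean_iou(toy))
```

Here class 0 has IoU 8/11 and class 1 has IoU 9/12, so the printed value is their average.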

Updated: 2024-09-16 13:25:42

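Title: BAFNet: Bilateral Attention Fusion Network for Lightweight Semantic Segmentation of Urban Remote Sensing Images

Abstract: Large-scale semantic segmentation networks usually achieve high performance, but applying them can be difficult when sample sizes and computational resources are limited. When network size and computational complexity are constrained, models face major challenges in capturing long-range dependencies and recovering fine detail in images. We propose a lightweight bilateral semantic segmentation network, the bilateral attention fusion network (BAFNet), to efficiently segment high-resolution urban remote sensing images. The model consists of two paths, a dependency path and a remote-local path. The dependency path uses large kernel attention to capture long-range dependencies in the image. In addition, multi-scale local attention and efficient remote attention are designed to build the remote-local path. Finally, a feature aggregation module is designed to make effective use of the different features from the two paths. Our method was tested on the public high-resolution urban remote sensing datasets Vaihingen and Potsdam, reaching mIoU of 83.20% and 86.53%, respectively. As a lightweight semantic segmentation model, BAFNet not only surpasses advanced lightweight models in accuracy but also performs comparably to non-lightweight state-of-the-art methods on the two datasets, despite a tenfold difference in floating-point operations and a fifteenfold difference in network parameters.

Updated: 2024-09-16 13:25:42

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.10269v1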

Aligning Machine and Human Visual Representations across Abstraction Levels

Deep neural networks have achieved success across a wide range of applications, including as models of human behavior in vision tasks. However, neural network training and human learning differ in fundamental ways, and neural networks often fail to generalize as robustly as humans do, raising questions regarding the similarity of their underlying representations. What is missing for modern learning systems to exhibit more human-like behavior? We highlight a key misalignment between vision models and humans: whereas human conceptual knowledge is hierarchically organized from fine- to coarse-scale distinctions, model representations do not accurately capture all these levels of abstraction. To address this misalignment, we first train a teacher model to imitate human judgments, then transfer human-like structure from its representations into pretrained state-of-the-art vision foundation models. These human-aligned models more accurately approximate human behavior and uncertainty across a wide range of similarity tasks, including a new dataset of human judgments spanning multiple levels of semantic abstractions. They also perform better on a diverse set of machine learning tasks, increasing generalization and out-of-distribution robustness. Thus, infusing neural networks with additional human knowledge yields a best-of-both-worlds representation that is both more consistent with human cognition and more practically useful, thus paving the way toward more robust, interpretable, and human-like artificial intelligence systems.

Updated: 2024-09-16 13:22:16

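Title: Aligning Machine and Human Visual Representations across Abstraction Levels

Abstract: Deep neural networks have succeeded in a wide range of applications, including as models of human behavior in vision tasks. However, neural network training and human learning differ in fundamental ways, and neural networks often fail to generalize as robustly as humans, which raises questions about how similar their underlying representations are. What is missing for modern learning systems to behave more like humans? We point out a key misalignment between vision models and humans: human conceptual knowledge is organized hierarchically from fine-grained to coarse-grained distinctions, whereas model representations do not accurately capture all of these levels of abstraction. To address this misalignment, we first train a teacher model to imitate human judgments, then transfer human-like structure from its representations into pretrained state-of-the-art vision foundation models. These human-aligned models approximate human behavior and uncertainty more accurately across a wide range of similarity tasks, including a new dataset of human judgments spanning multiple levels of semantic abstraction. They also perform better on a diverse set of machine learning tasks, improving generalization and out-of-distribution robustness. Infusing neural networks with additional human knowledge therefore yields a best-of-both-worlds representation that is both more consistent with human cognition and more practically useful, paving the way toward more robust, interpretable, and human-like artificial intelligence systems.

Updated: 2024-09-16 13:22:16

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.06509v2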

Enhancing Personalized Recipe Recommendation Through Multi-Class Classification

This paper intends to address the challenge of personalized recipe recommendation in the realm of diverse culinary preferences. The problem domain involves recipe recommendations, utilizing techniques such as association analysis and classification. Association analysis explores the relationships and connections between different ingredients to enhance the user experience. Meanwhile, the classification aspect involves categorizing recipes based on user-defined ingredients and preferences. A unique aspect of the paper is the consideration of recipes and ingredients belonging to multiple classes, recognizing the complexity of culinary combinations. This necessitates a sophisticated approach to classification and recommendation, ensuring the system accommodates the nature of recipe categorization. The paper seeks not only to recommend recipes but also to explore the process involved in achieving accurate and personalized recommendations.

Updated: 2024-09-16 13:21:09

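Title: Enhancing Personalized Recipe Recommendation Through Multi-Class Classification

Abstract: This paper aims to address the challenge of personalized recipe recommendation across diverse culinary preferences. The problem domain involves recipe recommendation using techniques such as association analysis and classification. Association analysis explores the relationships and connections between different ingredients to improve the user experience. The classification component categorizes recipes according to user-defined ingredients and preferences. A distinctive aspect of the paper is that it considers recipes and ingredients belonging to multiple classes, recognizing the complexity of culinary combinations. This calls for a sophisticated approach to classification and recommendation so that the system accommodates the nature of recipe categorization. The paper seeks not only to recommend recipes but also to explore the process involved in achieving accurate and personalized recommendations.

Updated: 2024-09-16 13:21:09

Categories: cs.IR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.10267v1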

Hierarchical Graph Pooling Based on Minimum Description Length

Graph pooling is an essential part of deep graph representation learning. We introduce MapEqPool, a principled pooling operator that takes the inherent hierarchical structure of real-world graphs into account. MapEqPool builds on the map equation, an information-theoretic objective function for community detection based on the minimum description length principle which naturally implements Occam's razor and balances between model complexity and fit. We demonstrate MapEqPool's competitive performance with an empirical comparison against various baselines across standard graph classification datasets.

Updated: 2024-09-16 13:13:15

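Title: Hierarchical Graph Pooling Based on Minimum Description Length

Abstract: Graph pooling is an essential part of deep graph representation learning. We introduce MapEqPool, a principled pooling operator that takes the inherent hierarchical structure of real-world graphs into account. MapEqPool builds on the map equation, an information-theoretic objective function for community detection based on the minimum description length principle, which naturally implements Occam's razor and balances model complexity against fit. We demonstrate MapEqPool's competitive performance through an empirical comparison against various baselines on standard graph classification datasets.

Updated: 2024-09-16 13:13:15

Categories: cs.LG,cs.SI

Download: http://arxiv.org/abs/2409.10263v1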

Self-Updating Vehicle Monitoring Framework Employing Distributed Acoustic Sensing towards Real-World Settings

The recent emergence of Distributed Acoustic Sensing (DAS) technology has facilitated the effective capture of traffic-induced seismic data. The traffic-induced seismic wave is a prominent contributor to urban vibrations and contains crucial information to advance urban exploration and governance. However, identifying vehicular movements within massive noisy data poses a significant challenge. In this study, we introduce a real-time semi-supervised vehicle monitoring framework tailored to urban settings. It requires only a small fraction of manual labels for initial training and exploits unlabeled data for model improvement. Additionally, the framework can autonomously adapt to newly collected unlabeled data. Before DAS data undergo object detection as two-dimensional images to preserve spatial information, we apply comprehensive one-dimensional signal preprocessing to mitigate noise. Furthermore, we propose a novel prior loss that incorporates the shapes of vehicular traces to track a single vehicle with varying speeds. To evaluate our model, we conducted experiments with seismic data from the Stanford 2 DAS Array. The results showed that our model outperformed the baseline model Efficient Teacher and its supervised counterpart, YOLO (You Only Look Once), in both accuracy and robustness. With only 35 labeled images, our model surpassed YOLO's mAP 0.5:0.95 criterion by 18% and showed a 7% increase over Efficient Teacher. We conducted comparative experiments with multiple update strategies for self-updating and identified an optimal approach. This approach surpasses the performance of non-overfitting training conducted with all data in a single pass.

Updated: 2024-09-16 13:10:58

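Title: Self-Updating Vehicle Monitoring Framework Employing Distributed Acoustic Sensing towards Real-World Settings

Abstract: The recent emergence of Distributed Acoustic Sensing (DAS) technology has made it possible to capture traffic-induced seismic data effectively. Traffic-induced seismic waves are a prominent contributor to urban vibrations and contain key information for advancing urban exploration and governance. However, identifying vehicle movements in massive, noisy data is a major challenge. In this study, we introduce a real-time semi-supervised vehicle monitoring framework tailored to urban settings. It needs only a small number of manual labels for initial training and exploits unlabeled data to improve the model. In addition, the framework can adapt autonomously to newly collected unlabeled data. Before the DAS data undergo object detection as two-dimensional images, which preserves spatial information, we apply comprehensive one-dimensional signal preprocessing to reduce noise. We also propose a novel prior loss that incorporates the shapes of vehicle traces to track a single vehicle at varying speeds. To evaluate our model, we ran experiments with seismic data from the Stanford 2 DAS Array. The results show that our model outperforms the baseline Efficient Teacher and its supervised counterpart, YOLO (You Only Look Once), in both accuracy and robustness. With only 35 labeled images, our model exceeded YOLO on the mAP 0.5:0.95 criterion by 18% and improved on Efficient Teacher by 7%. We ran comparative experiments with multiple update strategies for self-updating and identified an optimal approach. This approach surpasses the performance of non-overfitting training carried out with all data in a single pass.

Updated: 2024-09-16 13:10:58

Categories: physics.geo-ph,cs.CV,cs.LG,eess.SP

Download: http://arxiv.org/abs/2409.10259v1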

FGR-Net: Interpretable fundus image gradeability classification based on deep reconstruction learning

The performance of diagnostic Computer-Aided Design (CAD) systems for retinal diseases depends on the quality of the retinal images being screened. Thus, many studies have been conducted to assess the quality of such retinal images. However, most of them did not investigate the relationship between the accuracy of the developed models and the quality of the visualizations produced by interpretability methods for distinguishing between gradable and ungradable retinal images. Consequently, this paper presents a novel framework called FGR-Net to automatically assess and interpret underlying fundus image quality by merging an autoencoder network with a classifier network. The FGR-Net model also provides an interpretable quality assessment through visualizations. In particular, FGR-Net uses a deep autoencoder to reconstruct the input image in order to extract the visual characteristics of the input fundus images based on self-supervised learning. The features extracted by the autoencoder are then fed into a deep classifier network to distinguish between gradable and ungradable fundus images. FGR-Net is evaluated with different interpretability methods, which indicate that the autoencoder is a key factor in forcing the classifier to focus on the relevant structures of the fundus images, such as the fovea, optic disk, and prominent blood vessels. Additionally, the interpretability methods can provide visual feedback for ophthalmologists to understand how our model evaluates the quality of fundus images. The experimental results showed the superiority of FGR-Net over the state-of-the-art quality assessment methods, with an accuracy of 89% and an F1-score of 87%.

Updated: 2024-09-16 12:56:23

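Title: FGR-Net: Interpretable fundus image gradeability classification based on deep reconstruction learning

Abstract: The performance of diagnostic Computer-Aided Design (CAD) systems for retinal diseases depends on the quality of the retinal images being screened. Many studies have therefore been carried out to assess the quality of such images. However, most of them did not examine the relationship between the accuracy of the developed models and the quality of the visualizations produced by interpretability methods for distinguishing gradable from ungradable retinal images. This paper therefore presents a new framework, FGR-Net, which automatically assesses and interprets the underlying fundus image quality by combining an autoencoder network with a classifier network. The FGR-Net model also provides an interpretable quality assessment through visualizations. In particular, FGR-Net uses a deep autoencoder to reconstruct the input image in order to extract the visual features of the input fundus images through self-supervised learning. The features extracted by the autoencoder are then fed into a deep classifier network to distinguish gradable from ungradable fundus images. FGR-Net is evaluated with different interpretability methods, which indicate that the autoencoder is a key factor in forcing the classifier to focus on the relevant structures of fundus images, such as the fovea, optic disk, and prominent blood vessels. In addition, the interpretability methods can give ophthalmologists visual feedback for understanding how our model evaluates fundus image quality. The experimental results show that FGR-Net outperforms state-of-the-art quality assessment methods, with an accuracy of 89% and an F1-score of 87%.

Updated: 2024-09-16 12:56:23

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2409.10246v1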

Generative neural networks for characteristic functions

We provide a simulation algorithm to simulate from a (multivariate) characteristic function, which is only accessible in a black-box format. The method is based on a generative neural network, whose loss function exploits a specific representation of the Maximum-Mean-Discrepancy metric to directly incorporate the targeted characteristic function. The algorithm is universal in the sense that it is independent of the dimension and that it does not require any assumptions on the given characteristic function. Furthermore, finite sample guarantees on the approximation quality in terms of the Maximum-Mean Discrepancy metric are derived. The method is illustrated in a simulation study.
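One standard identity of this kind, given here as background and not necessarily the paper's exact construction, applies to a bounded, continuous, translation-invariant kernel $k(x, y) = \psi(x - y)$ on $\mathbb{R}^d$. Bochner's theorem writes $\psi$ as the Fourier transform of a finite non-negative measure $\Lambda$, and the squared MMD between distributions $P$ and $Q$ becomes a weighted $L^2$ distance between their characteristic functions $\varphi_P$ and $\varphi_Q$. The symbols $\psi$, $\Lambda$, $\varphi_P$, and $\varphi_Q$ are notation introduced for this sketch.

```latex
\[
\psi(x) = \int_{\mathbb{R}^d} e^{i\,\omega^\top x}\,\mathrm{d}\Lambda(\omega),
\qquad
\varphi_P(\omega) = \mathbb{E}_{X \sim P}\bigl[e^{i\,\omega^\top X}\bigr],
\]
\[
\mathrm{MMD}_k^2(P, Q)
  = \int_{\mathbb{R}^d} \bigl|\varphi_P(\omega) - \varphi_Q(\omega)\bigr|^2 \,\mathrm{d}\Lambda(\omega).
\]
```

In this form a loss needs only evaluations of the target characteristic function at frequencies drawn from $\Lambda$, which is compatible with black-box access.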

Updated: 2024-09-16 12:48:34

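Title: Generative neural networks for characteristic functions

Abstract: We provide a simulation algorithm for sampling from a (multivariate) characteristic function that is accessible only in black-box form. The method is based on a generative neural network whose loss function uses a specific representation of the Maximum-Mean-Discrepancy metric to incorporate the target characteristic function directly. The algorithm is universal in the sense that it is independent of the dimension and requires no assumptions on the given characteristic function. In addition, finite-sample guarantees on the approximation quality in terms of the Maximum-Mean-Discrepancy metric are derived. The method is illustrated in a simulation study.

Updated: 2024-09-16 12:48:34

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2401.04778v2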

The Role of Deep Learning Regularizations on Actors in Offline RL

Deep learning regularization techniques, such as dropout, layer normalization, or weight decay, are widely adopted in the construction of modern artificial neural networks, often resulting in more robust training processes and improved generalization capabilities. However, in the domain of Reinforcement Learning (RL), the application of these techniques has been limited, usually applied to value function estimators, and may result in detrimental effects. This issue is even more pronounced in offline RL settings, which bear greater similarity to supervised learning but have received less attention. Recent work in continuous offline RL has demonstrated that while we can build sufficiently powerful critic networks, the generalization of actor networks remains a bottleneck. In this study, we empirically show that applying standard regularization techniques to actor networks in offline RL actor-critic algorithms yields improvements of 6% on average across two algorithms and three different continuous D4RL domains.

Updated: 2024-09-16 12:45:07

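Title: The Role of Deep Learning Regularizations on Actors in Offline RL

Abstract: Deep learning regularization techniques such as dropout, layer normalization, and weight decay are widely adopted in building modern artificial neural networks, and they often lead to more robust training and better generalization. In Reinforcement Learning (RL), however, these techniques have seen limited use, are usually applied to value function estimators, and may have detrimental effects. The issue is even more pronounced in offline RL settings, which resemble supervised learning more closely but have received less attention. Recent work in continuous offline RL has shown that although sufficiently powerful critic networks can be built, the generalization of actor networks remains a bottleneck. In this study, we show empirically that applying standard regularization techniques to actor networks in offline RL actor-critic algorithms yields an average improvement of 6% across two algorithms and three different continuous D4RL domains.

Updated: 2024-09-16 12:45:07

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.07606v2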

Hedging Is Not All You Need: A Simple Baseline for Online Learning Under Haphazard Inputs

Handling haphazard streaming data, such as data from edge devices, presents a challenging problem. Over time, the incoming data becomes inconsistent, with missing, faulty, or new inputs reappearing. Therefore, it requires models that are reliable. Recent methods to solve this problem depend on a hedging-based solution and require specialized elements like auxiliary dropouts, forked architectures, and intricate network design. We observed that hedging can be reduced to a special case of weighted residual connection; this motivated us to approximate it with plain self-attention. In this work, we propose HapNet, a simple baseline that is scalable, does not require online backpropagation, and is adaptable to varying input types. All present methods are restricted to scaling with a fixed window; however, we introduce a more complex problem of scaling with a variable window where the data becomes positionally uncorrelated, and cannot be addressed by present methods. We demonstrate that a variant of the proposed approach can work even for this complex scenario. We extensively evaluated the proposed approach on five benchmarks and found competitive performance.

Updated: 2024-09-16 12:45:03

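Title: Hedging Is Not All You Need: A Simple Baseline for Online Learning Under Haphazard Inputs

Abstract: Handling haphazard streaming data, such as data from edge devices, is a challenging problem. Over time, the incoming data becomes inconsistent, with missing, faulty, or new inputs appearing. Reliable models are therefore required. Recent methods for this problem rely on hedging-based solutions and need specialized elements such as auxiliary dropouts, forked architectures, and intricate network design. We observed that hedging can be reduced to a special case of weighted residual connections, which motivated us to approximate it with plain self-attention. In this work, we propose HapNet, a simple baseline that is scalable, requires no online backpropagation, and adapts to varying input types. All existing methods are restricted to scaling with a fixed window; we introduce a more complex problem of scaling with a variable window, where the data becomes positionally uncorrelated and cannot be handled by existing methods. We show that a variant of the proposed approach works even in this complex scenario. We evaluated the proposed approach extensively on five benchmarks and found competitive performance.

Updated: 2024-09-16 12:45:03

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.10242v1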

CoReEcho: Continuous Representation Learning for 2D+time Echocardiography Analysis

Deep learning (DL) models have been advancing automatic medical image analysis on various modalities, including echocardiography, by offering a comprehensive end-to-end training pipeline. This approach enables DL models to regress ejection fraction (EF) directly from 2D+time echocardiograms, resulting in superior performance. However, the end-to-end training pipeline makes the learned representations less explainable. The representations may also fail to capture the continuous relation among echocardiogram clips, indicating the existence of spurious correlations, which can negatively affect the generalization. To mitigate this issue, we propose CoReEcho, a novel training framework emphasizing continuous representations tailored for direct EF regression. Our extensive experiments demonstrate that CoReEcho: 1) outperforms the current state-of-the-art (SOTA) on the largest echocardiography dataset (EchoNet-Dynamic) with MAE of 3.90 & R2 of 82.44, and 2) provides robust and generalizable features that transfer more effectively in related downstream tasks. The code is publicly available at https://github.com/fadamsyah/CoReEcho.

Updated: 2024-09-16 12:42:47

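Title: CoReEcho: Continuous Representation Learning for 2D+time Echocardiography Analysis

Abstract: Deep learning (DL) models have been advancing automatic medical image analysis across modalities, including echocardiography, by offering a comprehensive end-to-end training pipeline. This approach lets DL models regress ejection fraction (EF) directly from 2D+time echocardiograms, giving superior performance. However, the end-to-end training pipeline makes the learned representations less explainable. The representations may also fail to capture the continuous relation among echocardiogram clips, which indicates spurious correlations that can harm generalization. To mitigate this issue, we propose CoReEcho, a novel training framework that emphasizes continuous representations tailored for direct EF regression. Our extensive experiments show that CoReEcho: 1) outperforms the current state of the art (SOTA) on the largest echocardiography dataset (EchoNet-Dynamic), with an MAE of 3.90 and an R2 of 82.44, and 2) provides robust and generalizable features that transfer more effectively to related downstream tasks. The code is publicly available at https://github.com/fadamsyah/CoReEcho.

Updated: 2024-09-16 12:42:47

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2403.10164v2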

Exploring Loss Landscapes through the Lens of Spin Glass Theory

In the past decade, significant strides in deep learning have led to numerous groundbreaking applications. Despite these advancements, the understanding of the high generalizability of deep learning, especially in such an over-parametrized space, remains limited. For instance, in deep neural networks (DNNs), their internal representations, decision-making mechanism, absence of overfitting in an over-parametrized space, superior generalizability, etc., remain less understood. Successful applications are often considered as empirical rather than scientific achievement. This paper delves into the loss landscape of DNNs through the lens of spin glass in statistical physics, a system characterized by a complex energy landscape with numerous metastable states, as a novel perspective in understanding how DNNs work. We investigated the loss landscape of single hidden layer neural networks activated by Rectified Linear Unit (ReLU) function, and introduced several protocols to examine the analogy between DNNs and spin glass. Specifically, we used (1) random walk in the parameter space of DNNs to unravel the structures in their loss landscape; (2) a permutation-interpolation protocol to study the connection between copies of identical regions in the loss landscape due to the permutation symmetry in the hidden layers; (3) hierarchical clustering to reveal the hierarchy among trained solutions of DNNs, reminiscent of the so-called Replica Symmetry Breaking (RSB) phenomenon (i.e. the Parisi solution) in spin glass; (4) finally, we examine the relationship between the ruggedness of DNN's loss landscape and its generalizability, showing an improvement of flattened minima.
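The permutation symmetry mentioned in item (2) can be checked directly: reordering the hidden units of a single-hidden-layer ReLU network, together with their input weights, biases, and output weights, gives a different point in parameter space that computes the same function and therefore has the same loss. The sketch below illustrates only this symmetry, not the paper's permutation-interpolation protocol; all names and sizes are hypothetical.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def forward(x, w_in, b, w_out):
    """Single-hidden-layer ReLU network with scalar output.

    w_in[j] is the input weight vector of hidden unit j, b[j] its bias,
    and w_out[j] its output weight.
    """
    hidden = [relu(sum(wi * xi for wi, xi in zip(w_in[j], x)) + b[j])
              for j in range(len(b))]
    return sum(w_out[j] * hidden[j] for j in range(len(b)))


def permute_hidden(w_in, b, w_out, perm):
    """Reorder hidden units; perm[k] is the old index placed at position k."""
    return ([w_in[j] for j in perm],
            [b[j] for j in perm],
            [w_out[j] for j in perm])


rng = random.Random(0)
n_in, n_hidden = 3, 5
w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]

perm = list(range(n_hidden))
rng.shuffle(perm)
w_in_p, b_p, w_out_p = permute_hidden(w_in, b, w_out, perm)

x = [0.5, -1.0, 2.0]
# The two outputs agree up to floating-point summation order.
print(forward(x, w_in, b, w_out), forward(x, w_in_p, b_p, w_out_p))
```

With n hidden units there are n! such equivalent copies of every solution, which is why the abstract speaks of copies of identical regions in the loss landscape.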

Updated: 2024-09-16 12:39:33

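Title: Exploring Loss Landscapes through the Lens of Spin Glass Theory

Abstract: Over the past decade, major advances in deep learning have led to many groundbreaking applications. Despite this progress, understanding of the high generalizability of deep learning, especially in such an over-parametrized space, remains limited. For example, the internal representations of deep neural networks (DNNs), their decision-making mechanism, the absence of overfitting in an over-parametrized space, and their superior generalizability are still not well understood. Successful applications are often regarded as empirical rather than scientific achievements. This paper examines the loss landscape of DNNs through the lens of spin glasses in statistical physics, systems characterized by a complex energy landscape with numerous metastable states, as a novel perspective on how DNNs work. We studied the loss landscape of single-hidden-layer neural networks with Rectified Linear Unit (ReLU) activations and introduced several protocols to examine the analogy between DNNs and spin glasses. Specifically, we used (1) random walks in the parameter space of DNNs to uncover structures in their loss landscape; (2) a permutation-interpolation protocol to study the connection between copies of identical regions in the loss landscape that arise from the permutation symmetry of the hidden layers; (3) hierarchical clustering to reveal the hierarchy among trained DNN solutions, reminiscent of the Replica Symmetry Breaking (RSB) phenomenon (i.e., the Parisi solution) in spin glasses; and (4) an examination of the relationship between the ruggedness of a DNN's loss landscape and its generalizability, showing an improvement for flattened minima.

Updated: 2024-09-16 12:39:33

Categories: cond-mat.dis-nn,cs.AI

Download: http://arxiv.org/abs/2407.20724v2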

Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python

In this work, we present scikit-fingerprints, a Python package for computation of molecular fingerprints for applications in chemoinformatics. Our library offers an industry-standard scikit-learn interface, allowing intuitive usage and easy integration with machine learning pipelines. It is also highly optimized, featuring parallel computation that enables efficient processing of large molecular datasets. Currently, scikit-fingerprints stands as the most feature-rich library in the open source Python ecosystem, offering over 30 molecular fingerprints. Our library simplifies chemoinformatics tasks based on molecular fingerprints, including molecular property prediction and virtual screening. It is also flexible, highly efficient, and fully open source.

Updated: 2024-09-16 12:34:52

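Title: Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python

Abstract: In this work, we present scikit-fingerprints, a Python package for computing molecular fingerprints for chemoinformatics applications. Our library offers an industry-standard scikit-learn interface, which makes it intuitive to use and easy to integrate with machine learning pipelines. It is also highly optimized, with parallel computation that allows large molecular datasets to be processed efficiently. Currently, scikit-fingerprints is the most feature-rich library in the open source Python ecosystem, offering over 30 molecular fingerprints. Our library simplifies chemoinformatics tasks based on molecular fingerprints, including molecular property prediction and virtual screening. It is also flexible, highly efficient, and fully open source.

Updated: 2024-09-16 12:34:52

Categories: cs.SE,cs.LG

Download: http://arxiv.org/abs/2407.13291v2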

Causal Learning in Biomedical Applications: A Benchmark

Learning causal relationships between a set of variables is a challenging problem in computer science. Many existing artificial benchmark datasets are based on sampling from causal models and thus contain residual information that $R^2$-sortability can identify. Here, we present a benchmark for methods in causal learning using time series. The presented dataset is not $R^2$-sortable and is based on a real-world scenario of the Krebs cycle that is used in cells to release energy. We provide four scenarios of learning, including short and long time series, and provide guidance so that testing is unified between possible users.

Updated: 2024-09-16 12:29:26

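Title: Causal Learning in Biomedical Applications: A Benchmark

Abstract: Learning causal relationships between a set of variables is a challenging problem in computer science. Many existing artificial benchmark datasets are obtained by sampling from causal models and therefore contain residual information that $R^2$-sortability can identify. Here, we present a benchmark for causal learning methods that use time series. The presented dataset is not $R^2$-sortable and is based on a real-world scenario, the Krebs cycle that cells use to release energy. We provide four learning scenarios, including short and long time series, and give guidance so that testing is unified across potential users.

Updated: 2024-09-16 12:29:26

Categories: cs.LG

Download: http://arxiv.org/abs/2406.15189v2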

Privacy-Preserving Distributed Maximum Consensus Without Accuracy Loss

In distributed networks, calculating the maximum element is a fundamental task in data analysis, known as the distributed maximum consensus problem. However, the sensitive nature of the data involved makes privacy protection essential. Despite its importance, privacy in distributed maximum consensus has received limited attention in the literature. Traditional privacy-preserving methods typically add noise to updates, degrading the accuracy of the final result. To overcome these limitations, we propose a novel distributed optimization-based approach that preserves privacy without sacrificing accuracy. Our method introduces virtual nodes to form an augmented graph and leverages a carefully designed initialization process to ensure the privacy of honest participants, even when all their neighboring nodes are dishonest. Through a comprehensive information-theoretical analysis, we derive a sufficient condition to protect private data against both passive and eavesdropping adversaries. Extensive experiments validate the effectiveness of our approach, demonstrating that it not only preserves perfect privacy but also maintains accuracy, outperforming existing noise-based methods that typically suffer from accuracy loss.
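For context, the plain, non-private version of the problem can be sketched in a few lines: each node repeatedly replaces its state with the maximum over its own state and its neighbours' states, and after a number of rounds equal to the graph diameter every node holds the global maximum. This is only the baseline the abstract builds on, not the proposed privacy-preserving method; the function name, graph, and readings are hypothetical.

```python
def max_consensus(values, neighbors, rounds):
    """Plain (non-private) synchronous max-consensus.

    values[i] is node i's initial value and neighbors[i] lists the nodes
    adjacent to i. In each round every node replaces its state with the
    maximum over its own state and its neighbours' states.
    """
    state = list(values)
    for _ in range(rounds):
        state = [max([state[i]] + [state[j] for j in neighbors[i]])
                 for i in range(len(state))]
    return state


# Path graph 0 - 1 - 2 - 3 (diameter 3); the readings are illustrative.
readings = [4.0, 9.5, 1.2, 7.3]
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(max_consensus(readings, path, rounds=3))
```

Every node here sends its raw state to its neighbours, which is exactly the exposure of private values that the abstract sets out to avoid.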

Updated: 2024-09-16 12:21:04

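Title: Privacy-Preserving Distributed Maximum Consensus Without Accuracy Loss

Abstract: In distributed networks, computing the maximum element is a fundamental data analysis task known as the distributed maximum consensus problem. However, the sensitive nature of the data involved makes privacy protection essential. Despite its importance, privacy in distributed maximum consensus has received limited attention in the literature. Traditional privacy-preserving methods typically add noise to updates, which degrades the accuracy of the final result. To overcome these limitations, we propose a novel distributed optimization-based approach that preserves privacy without sacrificing accuracy. Our method introduces virtual nodes to form an augmented graph and uses a carefully designed initialization process to ensure the privacy of honest participants, even when all of their neighboring nodes are dishonest. Through a comprehensive information-theoretical analysis, we derive a sufficient condition for protecting private data against both passive and eavesdropping adversaries. Extensive experiments confirm the effectiveness of our approach, showing that it not only preserves perfect privacy but also maintains accuracy, outperforming existing noise-based methods, which typically suffer from accuracy loss.

Updated: 2024-09-16 12:21:04

Categories: cs.DC,cs.CR,cs.IT,eess.SP,math.IT

Download: http://arxiv.org/abs/2409.10226v1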

Safety-Oriented Pruning and Interpretation of Reinforcement Learning Policies

Pruning neural networks (NNs) can streamline them but risks removing vital parameters from safe reinforcement learning (RL) policies. We introduce an interpretable RL method called VERINTER, which combines NN pruning with model checking to ensure interpretable RL safety. VERINTER exactly quantifies the effects of pruning and the impact of neural connections on complex safety properties by analyzing changes in safety measurements. This method maintains safety in pruned RL policies and enhances understanding of their safety dynamics, which has proven effective in multiple RL settings.

Updated: 2024-09-16 12:13:41

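Title: Safety-Oriented Pruning and Interpretation of Reinforcement Learning Policies

Abstract: Pruning neural networks (NNs) can streamline them but risks removing vital parameters from safe reinforcement learning (RL) policies. We introduce an interpretable RL method called VERINTER, which combines NN pruning with model checking to ensure interpretable RL safety. VERINTER exactly quantifies the effects of pruning and the impact of neural connections on complex safety properties by analyzing changes in safety measurements. The method maintains safety in pruned RL policies and improves understanding of their safety dynamics, and it has proven effective in multiple RL settings.

Updated: 2024-09-16 12:13:41

Categories: cs.LG

Download: http://arxiv.org/abs/2409.10218v1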

Towards Fully Autonomous Research Powered by LLMs: Case Study on Simulations

The advent of Large Language Models (LLMs) has created new opportunities for the automation of scientific research, spanning both experimental processes and computational simulations. This study explores the feasibility of constructing an autonomous simulation agent (ASA) powered by LLM, through sophisticated API integration, to automate the entire research process, from experimental design, remote upload and simulation execution, data analysis, to report compilation. Using a simulation problem of polymer chain conformations as a case study, we assessed the performance of ASAs powered by different LLMs including GPT-4-Turbo. Our findings revealed that ASA-GPT-4o achieved near-flawless execution on designated research missions, underscoring the potential of LLMs to manage complete scientific investigations autonomously. The outlined automation can be iteratively performed up to twenty cycles without human intervention, illustrating the potential of LLMs for large-scale autonomous research endeavors. Additionally, we discussed the intrinsic traits of ASAs in managing extensive tasks, focusing on self-validation mechanisms and the balance between local attention and global oversight.

Updated: 2024-09-16 12:02:27

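Title: Towards Fully Autonomous Research Powered by LLMs: Case Study on Simulations

Abstract: The advent of Large Language Models (LLMs) has created new opportunities for automating scientific research, covering both experimental processes and computational simulations. This study explores the feasibility of building an LLM-powered autonomous simulation agent (ASA), through sophisticated API integration, to automate the entire research process, from experimental design, remote upload and simulation execution, and data analysis to report compilation. Using a simulation problem of polymer chain conformations as a case study, we assessed the performance of ASAs powered by different LLMs, including GPT-4-Turbo. Our findings show that ASA-GPT-4o achieved near-flawless execution on the designated research missions, underscoring the potential of LLMs to manage complete scientific investigations autonomously. The outlined automation can be performed iteratively for up to twenty cycles without human intervention, illustrating the potential of LLMs for large-scale autonomous research. In addition, we discuss the intrinsic traits of ASAs in managing extensive tasks, focusing on self-validation mechanisms and the balance between local attention and global oversight.

Updated: 2024-09-16 12:02:27

Categories: cs.AI,cs.CL,physics.chem-ph

Download: http://arxiv.org/abs/2408.15512v2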

Embedded Image-to-Image Translation for Efficient Sim-to-Real Transfer in Learning-based Robot-Assisted Soft Manipulation

Recent advances in robotic learning in simulation have shown impressive results in accelerating the learning of complex manipulation skills. However, the sim-to-real gap, caused by discrepancies between simulation and reality, poses significant challenges for the effective deployment of autonomous surgical systems. We propose a novel approach utilizing image translation models to mitigate domain mismatches and facilitate efficient robot skill learning in a simulated environment. Our method involves the use of contrastive unpaired image-to-image translation, allowing for the acquisition of embedded representations from these transformed images. Subsequently, these embeddings are used to improve the efficiency of training surgical manipulation models. We conducted experiments to evaluate the performance of our approach, demonstrating that it significantly enhances task success rates and reduces the steps required for task completion compared to traditional methods. The results indicate that our proposed system effectively bridges the sim-to-real gap, providing a robust framework for advancing the autonomy of surgical robots in minimally invasive procedures.

Updated: 2024-09-16 11:55:06

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2409.10204v1

Efficient Milling Quality Prediction with Explainable Machine Learning

This paper presents an explainable machine learning (ML) approach for predicting surface roughness in milling. Utilizing a dataset from milling aluminum alloy 2017A, the study employs random forest regression models and feature importance techniques. The key contributions include developing ML models that accurately predict various roughness values and identifying redundant sensors, particularly those for measuring normal cutting force. Our experiments show that removing certain sensors can reduce costs without sacrificing predictive accuracy, highlighting the potential of explainable machine learning to improve cost-effectiveness in machining.

Updated: 2024-09-16 11:52:17

Categories: cs.LG

Download: http://arxiv.org/abs/2409.10203v1

NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions

This paper addresses the problem of autonomous UAV search missions, where a UAV must locate specific Entities of Interest (EOIs) within a time limit, based on brief descriptions in large, hazard-prone environments with keep-out zones. The UAV must perceive, reason, and make decisions with limited and uncertain information. We propose NEUSIS, a compositional neuro-symbolic system designed for interpretable UAV search and navigation in realistic scenarios. NEUSIS integrates neuro-symbolic visual perception, reasoning, and grounding (GRiD) to process raw sensory inputs, maintains a probabilistic world model for environment representation, and uses a hierarchical planning component (SNaC) for efficient path planning. Experimental results from simulated urban search missions using AirSim and Unreal Engine show that NEUSIS outperforms a state-of-the-art (SOTA) vision-language model and a SOTA search planning model in success rate, search efficiency, and 3D localization. These results demonstrate the effectiveness of our compositional neuro-symbolic approach in handling complex, real-world scenarios, making it a promising solution for autonomous UAV systems in search missions.

Updated: 2024-09-16 11:42:15

Categories: cs.RO,cs.AI,cs.CV

Download: http://arxiv.org/abs/2409.10196v1

Relative Positioning for Aerial Robot Path Planning in GPS Denied Environment

One of the most useful applications of intelligent aerial robots, sometimes called Unmanned Aerial Vehicles (UAVs), in Australia is bushfire monitoring and prediction. A swarm of autonomous drones/UAVs programmed to observe fire parameters in real time using their onboard sensors would be valuable in reducing the life-threatening impact of a fire. However, autonomous UAVs face serious challenges in positioning and navigation under critical bushfire conditions, such as remoteness and severe weather, where GPS signals may also be unreliable. This paper tackles one of the most important factors in autonomous UAV navigation, namely initial positioning, sometimes called localisation. The proposed solution enables a team of autonomous UAVs to establish a position relative to their base of operation, so that they can commence a team search and reconnaissance in a bushfire-affected area and find their way back to base without the help of GPS signals.

Updated: 2024-09-16 11:35:39

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2409.10193v1

PrePaMS: Privacy-Preserving Participant Management System for Studies with Rewards and Prerequisites

Taking part in surveys, experiments, and studies is often compensated by rewards to increase the number of participants and encourage attendance. While privacy requirements are usually considered for participation, privacy aspects of the reward procedure are mostly ignored. To this end, we introduce PrePaMS, an efficient participation management system that supports prerequisite checks and participation rewards in a privacy-preserving way. Our system organizes participations with potential (dis-)qualifying dependencies and enables secure reward payoffs. By leveraging a set of proven cryptographic primitives and mechanisms such as anonymous credentials and zero-knowledge proofs, participations are protected so that service providers and organizers cannot derive the identity of participants even within the reward process. In this paper, we have designed and implemented a prototype of PrePaMS to show its effectiveness and evaluated its performance under realistic workloads. PrePaMS covers the information whether subjects have participated in surveys, experiments, or studies. When combined with other secure solutions for the actual data collection within these events, PrePaMS can represent a cornerstone for more privacy-preserving empirical research.

Updated: 2024-09-16 11:35:17

Categories: cs.CR

Download: http://arxiv.org/abs/2409.10192v1

Enhancing RL Safety with Counterfactual LLM Reasoning

Reinforcement learning (RL) policies may exhibit unsafe behavior and are hard to explain. We use counterfactual large language model reasoning to enhance RL policy safety post-training. We show that our approach improves and helps to explain the RL policy safety.

Updated: 2024-09-16 11:30:39

Categories: cs.LG

Download: http://arxiv.org/abs/2409.10188v1

CBMAP: Clustering-based manifold approximation and projection for dimensionality reduction

Dimensionality reduction methods are employed to decrease data dimensionality, either to enhance machine learning performance or to facilitate data visualization in two- or three-dimensional spaces. These methods typically fall into two categories: feature selection and feature transformation. Feature selection retains significant features, while feature transformation projects data into a lower-dimensional space using linear or nonlinear methods. While nonlinear methods excel in preserving local structures and capturing nonlinear relationships, they may struggle to preserve global structures and can be computationally intensive. Recent algorithms, such as t-SNE, UMAP, TriMap, and PaCMAP, prioritize preserving local structures, often at the expense of accurately representing global structures, leading to clusters being spread out more in lower-dimensional spaces. Moreover, these methods rely heavily on hyperparameters, making their results sensitive to parameter settings. To address these limitations, this study introduces a clustering-based approach, namely CBMAP (Clustering-Based Manifold Approximation and Projection), for dimensionality reduction. CBMAP aims to preserve both global and local structures, ensuring that clusters in lower-dimensional spaces closely resemble those in high-dimensional spaces. Experimental evaluations on benchmark datasets demonstrate CBMAP's efficacy, offering speed, scalability, and minimal reliance on hyperparameters. Importantly, CBMAP enables low-dimensional projection of test data, addressing a critical need in machine learning applications. CBMAP is made freely available at https://github.com/doganlab/cbmap and can be installed from the Python Package Index (PyPI) software repository with the command pip install cbmap.

Updated: 2024-09-16 11:29:25

Categories: cs.LG

Download: http://arxiv.org/abs/2404.17940v2

Hound: Locating Cryptographic Primitives in Desynchronized Side-Channel Traces Using Deep-Learning

Side-channel attacks make it possible to extract sensitive information from cryptographic primitives by correlating partially known computed data with the measured side-channel signal. Starting from the raw side-channel trace, preprocessing it to pinpoint the time at which each cryptographic primitive is executed, and then re-aligning all the collected data to this specific time, is a critical step in setting up a successful side-channel attack. Hiding techniques have been widely adopted as a low-cost solution to hinder the preprocessing of side-channel traces, thus limiting side-channel attacks in real scenarios. This work introduces Hound, a novel deep learning-based pipeline to locate the execution of cryptographic primitives within the side-channel trace even in the presence of trace deformations introduced by the use of dynamic frequency scaling (DFS) actuators. Hound has been validated through successful attacks on various cryptographic primitives executed on an FPGA-based system-on-chip incorporating a RISC-V CPU while dynamic frequency scaling is active. Experimental results demonstrate the possibility of identifying the cryptographic primitives in DFS-deformed side-channel traces.

Updated: 2024-09-16 11:20:45

Categories: cs.CR

Download: http://arxiv.org/abs/2408.06296v2

Augmenting Automatic Speech Recognition Models with Disfluency Detection

Speech disfluency commonly occurs in conversational and spontaneous speech. However, standard Automatic Speech Recognition (ASR) models struggle to accurately recognize these disfluencies because they are typically trained on fluent transcripts. Current research mainly focuses on detecting disfluencies within transcripts, overlooking their exact location and duration in the speech. Additionally, previous work often requires model fine-tuning and addresses limited types of disfluencies. In this work, we present an inference-only approach to augment any ASR model with the ability to detect open-set disfluencies. We first demonstrate that ASR models have difficulty transcribing speech disfluencies. Next, we propose a modified version of the Connectionist Temporal Classification (CTC)-based forced alignment algorithm of Kürzinger et al. (2020) to predict word-level timestamps while effectively capturing disfluent speech. Additionally, we develop a model to classify alignment gaps between timestamps as containing either disfluent speech or silence. This model achieves an accuracy of 81.62% and an F1-score of 80.07%. We test the augmentation pipeline of alignment gap detection and classification on a disfluent dataset. Our results show that we captured 74.13% of the words that were initially missed by the transcription, demonstrating the potential of this pipeline for downstream tasks.

Updated: 2024-09-16 11:13:14

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.10177v1
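
The gap-detection idea in the disfluency entry above can be illustrated with a small sketch. It assumes word-level timestamps are already available as (word, start, end) tuples in seconds. The function name `alignment_gaps` and the `min_gap` threshold are hypothetical and not taken from the paper, and the paper's classifier that labels each gap as disfluent speech or silence is not reproduced here.

```python
def alignment_gaps(words, min_gap=0.5):
    """Return (gap_start, gap_end) pairs between consecutive aligned words.

    words: list of (token, start_s, end_s) tuples sorted by start time.
    min_gap: hypothetical minimum gap length, in seconds, worth inspecting.
    """
    gaps = []
    for (_, _, prev_end), (_, next_start, _) in zip(words, words[1:]):
        if next_start - prev_end >= min_gap:
            gaps.append((prev_end, next_start))
    return gaps


# Made-up timestamps, for illustration only.
aligned = [("i", 0.0, 0.25), ("want", 1.0, 1.5), ("coffee", 1.5, 2.0)]
candidate_gaps = alignment_gaps(aligned)
```

Each returned interval is a stretch of audio with no aligned word, which is where a downstream classifier would look for missed disfluent speech.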

TCDformer-based Momentum Transfer Model for Long-term Sports Prediction

Accurate sports prediction is a crucial skill for professional coaches, which can assist in developing effective training strategies and scientific competition tactics. Traditional methods often use complex mathematical statistical techniques to boost predictability, but this often is limited by dataset scale and has difficulty handling long-term predictions with variable distributions, notably underperforming when predicting point-set-game multi-level matches. To deal with this challenge, this paper proposes TM2, a TCDformer-based Momentum Transfer Model for long-term sports prediction, which encompasses a momentum encoding module and a prediction module based on momentum transfer. TM2 initially encodes momentum in large-scale unstructured time series using the local linear scaling approximation (LLSA) module. Then it decomposes the reconstructed time series with momentum transfer into trend and seasonal components. The final prediction results are derived from the additive combination of a multilayer perceptron (MLP) for predicting trend components and wavelet attention mechanisms for seasonal components. Comprehensive experimental results show that on the 2023 Wimbledon men's tournament datasets, TM2 significantly surpasses existing sports prediction models in terms of performance, reducing MSE by 61.64% and MAE by 63.64%.

Updated: 2024-09-16 11:10:54

Categories: cs.LG,stat.AP

Download: http://arxiv.org/abs/2409.10176v1

jina-embeddings-v3: Multilingual Embeddings With Task LoRA

We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Additionally, Matryoshka Representation Learning is integrated into the training process, allowing flexible truncation of embedding dimensions without compromising performance. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while achieving superior performance compared to multilingual-e5-large-instruct across all multilingual tasks.

Updated: 2024-09-16 11:10:29

Categories: cs.CL,cs.AI,cs.IR,68T50,I.2.7

Download: http://arxiv.org/abs/2409.10173v1
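
The "flexible truncation of embedding dimensions" mentioned in the jina-embeddings-v3 entry above can be pictured with a short sketch. It assumes the common convention for Matryoshka-style embeddings, in which the leading coordinates are kept and the shortened vector is rescaled to unit length before cosine comparison. The function names and the toy vectors are hypothetical; the model's actual API is not shown.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` coordinates and rescale them to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings", truncated to their first 2 dimensions.
query = truncate_and_normalize([3.0, 4.0, 12.0], 2)
document = truncate_and_normalize([6.0, 8.0, -1.0], 2)
similarity = cosine(query, document)
```

Truncating trades a little accuracy for smaller indexes and faster comparison; how much accuracy is retained depends on the model and is not something this sketch measures.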

Safe and Stable Closed-Loop Learning for Neural-Network-Supported Model Predictive Control

Safe learning of control policies remains challenging, both in optimal control and reinforcement learning. In this article, we consider safe learning of parametrized predictive controllers that operate with incomplete information about the underlying process. To this end, we employ Bayesian optimization for learning the best parameters from closed-loop data. Our method focuses on the system's overall long-term performance in closed loop while keeping it safe and stable. Specifically, we parametrize the stage cost function of a model predictive controller (MPC) using a feedforward neural network. This allows for a high degree of flexibility, enabling the system to achieve a better closed-loop performance with respect to a superordinate measure. However, this flexibility also necessitates safety measures, especially with respect to closed-loop stability. To this end, we explicitly incorporate stability information in the Bayesian-optimization-based learning procedure, thereby achieving rigorous probabilistic safety guarantees. The proposed approach is illustrated using a numerical example.

Updated: 2024-09-16 11:03:58

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2409.10171v1

Algorithmic Behaviors Across Regions: A Geolocation Audit of YouTube Search for COVID-19 Misinformation between the United States and South Africa

Despite being an integral tool for finding health-related information online, YouTube has faced criticism for disseminating COVID-19 misinformation globally to its users. Yet, prior audit studies have predominantly investigated YouTube within Global North contexts, often overlooking the Global South. To address this gap, we conducted a comprehensive 10-day geolocation-based audit on YouTube to compare the prevalence of COVID-19 misinformation in search results between the United States (US) and South Africa (SA), countries heavily affected by the pandemic in the Global North and the Global South, respectively. For each country, we selected 3 geolocations and placed sock-puppets, or bots emulating "real" users, that collected search results for 48 search queries sorted by 4 search filters for 10 days, yielding a dataset of 915K results. We found that 31.55% of the top-10 search results contained COVID-19 misinformation. Among the top-10 search results, bots in SA faced significantly more misinformative search results than their US counterparts. Overall, our study highlights the contrasting algorithmic behaviors of YouTube search between the two countries, underscoring the need for the platform to regulate algorithmic behavior consistently across different regions of the globe.

Updated: 2024-09-16 10:56:43

Categories: cs.CY,cs.AI,cs.HC

Download: http://arxiv.org/abs/2409.10168v1

Quantile Regression for Distributional Reward Models in RLHF

Reinforcement learning from human feedback (RLHF) has become a key method for aligning large language models (LLMs) with human preferences through the use of reward models. However, traditional reward models typically generate point estimates, which oversimplify the diversity and complexity of human values and preferences. In this paper, we introduce Quantile Reward Models (QRMs), a novel approach to reward modeling that learns a distribution over rewards instead of a single scalar value. Our method uses quantile regression to estimate a full, potentially multimodal distribution over preferences, providing a more powerful and nuanced representation of preferences. This distributional approach can better capture the diversity of human values, addresses label noise, and accommodates conflicting preferences by modeling them as distinct modes in the distribution. Our experimental results show that QRM outperforms comparable traditional point-estimate models on RewardBench. Furthermore, we demonstrate that the additional information provided by the distributional estimates can be utilized in downstream applications, such as risk-aware reinforcement learning, resulting in LLM policies that generate fewer extremely negative responses. Our code and model are released at https://github.com/Nicolinho/QRM.

Updated: 2024-09-16 10:54:04

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.10164v1
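
The quantile regression behind the QRM entry above rests on the pinball (quantile) loss, which is minimized by the corresponding quantile of the target distribution. The sketch below shows only that loss on made-up reward labels. It does not reproduce the paper's model or training setup, and the helper names and the candidate grid are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Pinball loss of one prediction at quantile level tau in (0, 1)."""
    diff = y_true - y_pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def quantile_by_pinball(samples, tau, candidates):
    """Return the candidate that minimizes the mean pinball loss."""
    def mean_loss(q):
        return sum(pinball_loss(y, q, tau) for y in samples) / len(samples)
    return min(candidates, key=mean_loss)


# Made-up reward labels for one response with two modes, standing in
# for annotators who disagree. Illustration only.
rewards = [-2.0, -2.0, -1.0, 3.0, 3.0, 4.0]
grid = [x / 2 for x in range(-6, 11)]  # candidate values from -3.0 to 5.0
low = quantile_by_pinball(rewards, 0.1, grid)
high = quantile_by_pinball(rewards, 0.9, grid)
```

Estimating several quantile levels at once gives a picture of the spread of the reward rather than a single scalar, so a large distance between `low` and `high` flags a response on which preferences conflict.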

SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting

Sim2Real transfer, particularly for manipulation policies relying on RGB images, remains a critical challenge in robotics due to the significant domain shift between synthetic and real-world visual data. In this paper, we propose SplatSim, a novel framework that leverages Gaussian Splatting as the primary rendering primitive to reduce the Sim2Real gap for RGB-based manipulation policies. By replacing traditional mesh representations with Gaussian Splats in simulators, SplatSim produces highly photorealistic synthetic data while maintaining the scalability and cost-efficiency of simulation. We demonstrate the effectiveness of our framework by training manipulation policies within SplatSim and deploying them in the real world in a zero-shot manner, achieving an average success rate of 86.25%, compared to 97.5% for policies trained on real-world data.

Updated: 2024-09-16 10:52:16

Categories: cs.RO,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.10161v1

Efficient Network Embedding by Approximate Equitable Partitions

Structural network embedding, which aims to project a network into a lower-dimensional space while preserving similarities among nodes, is a crucial step in enabling effective downstream tasks for complex systems. We introduce a simple and efficient embedding technique based on approximate variants of equitable partitions. The approximation consists in introducing a user-tunable tolerance parameter that relaxes the otherwise strict condition for exact equitable partitions, which can hardly be found in real-world networks. We exploit a relationship between equitable partitions and equivalence relations for Markov chains and ordinary differential equations to develop a partition refinement algorithm for computing an approximate equitable partition in polynomial time. We compare our method against state-of-the-art embedding techniques on benchmark networks. Using a prototype implementation, we report comparable, and in some cases superior, performance for visualization, classification, and regression tasks at a cost between one and three orders of magnitude smaller, enabling the embedding of large-scale networks that most competing techniques cannot handle efficiently.

Updated: 2024-09-16 10:51:24

Categories: cs.SI,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2409.10160v1
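
For readers unfamiliar with the term used in the entry above: a partition of the nodes is equitable when any two nodes in the same block have the same number of neighbours in every block. The sketch below computes the exact coarsest equitable partition of a small undirected graph by standard colour refinement. It is not the paper's algorithm, and the user-tunable tolerance described in the abstract is deliberately not modelled. The function names are hypothetical.

```python
def coarsest_equitable_partition(adj):
    """Colour refinement on an undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    Returns a dict node -> block id describing the coarsest equitable partition.
    """
    color = {v: 0 for v in adj}
    while True:
        # A node's signature is its current colour plus the sorted
        # multiset of its neighbours' colours.
        signature = {
            v: (color[v], tuple(sorted(color[u] for u in adj[v])))
            for v in adj
        }
        ids = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        new_color = {v: ids[signature[v]] for v in adj}
        # Refinement only ever splits blocks, so an unchanged block count
        # means the partition is stable.
        if len(set(new_color.values())) == len(set(color.values())):
            return new_color
        color = new_color


def blocks(color):
    """Group nodes by block id and return the blocks as a sorted list of sets."""
    groups = {}
    for node, block_id in color.items():
        groups.setdefault(block_id, set()).add(node)
    return sorted(groups.values(), key=min)


# Path graph 0 - 1 - 2 - 3: the two endpoints are structurally alike,
# and so are the two inner nodes.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
path_blocks = blocks(coarsest_equitable_partition(path))
```

On real-world networks this exact condition tends to isolate almost every node in its own block, which is the motivation the abstract gives for relaxing it with a tolerance.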

Contrastive Learning for Character Detection in Ancient Greek Papyri

This thesis investigates the effectiveness of SimCLR, a contrastive learning technique, in Greek letter recognition, focusing on the impact of various augmentation techniques. We pretrain the SimCLR backbone using the Alpub dataset (pretraining dataset) and fine-tune it on a smaller ICDAR dataset (finetuning dataset) to compare SimCLR's performance against traditional baseline models, which use cross-entropy and triplet loss functions. Additionally, we explore the role of different data augmentation strategies, essential for the SimCLR training process. Methodologically, we examine three primary approaches: (1) a baseline model using cross-entropy loss, (2) a triplet embedding model with a classification layer, and (3) a SimCLR pretrained model with a classification layer. Initially, we train the baseline, triplet, and SimCLR models using 93 augmentations on ResNet-18 and ResNet-50 networks with the ICDAR dataset. From these, the top four augmentations are selected using a statistical t-test. Pretraining of SimCLR is conducted on the Alpub dataset, followed by fine-tuning on the ICDAR dataset. The triplet loss model undergoes a similar process, being pretrained on the top four augmentations before fine-tuning on ICDAR. Our experiments show that SimCLR does not outperform the baselines in letter recognition tasks. The baseline model with cross-entropy loss demonstrates better performance than both SimCLR and the triplet loss model. This study provides a detailed evaluation of contrastive learning for letter recognition, highlighting SimCLR's limitations while emphasizing the strengths of traditional supervised learning models in this task. We believe SimCLR's cropping strategies may cause a semantic shift in the input image, reducing training effectiveness despite the large pretraining dataset. Our code is available at https://github.com/DIVA-DIA/MT_augmentation_and_contrastive_learning/.

Updated: 2024-09-16 10:41:29

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.10156v1

Conformal Predictive Systems Under Covariate Shift

Conformal Predictive Systems (CPS) offer a versatile framework for constructing predictive distributions, allowing for calibrated inference and informative decision-making. However, their applicability has been limited to scenarios adhering to the Independent and Identically Distributed (IID) model assumption. This paper extends CPS to accommodate scenarios characterized by covariate shifts. We therefore propose Weighted CPS (WCPS), akin to Weighted Conformal Prediction (WCP), leveraging likelihood ratios between training and testing covariate distributions. This extension enables the construction of nonparametric predictive distributions capable of handling covariate shifts. We present theoretical underpinnings and conjectures regarding the validity and efficacy of WCPS and demonstrate its utility through empirical evaluations on both synthetic and real-world datasets. Our simulation experiments indicate that WCPS are probabilistically calibrated under covariate shift.

Updated: 2024-09-16 10:32:28

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2404.15018v2
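
The likelihood-ratio weighting mentioned in the entry above can be illustrated with the weighted quantile used in Weighted Conformal Prediction, which the abstract names as the analogue of WCPS. The sketch assumes the likelihood ratios are already known for each calibration point and for the test point. It shows only the WCP-style quantile, not the WCPS predictive distribution itself, and the function name is hypothetical.

```python
import math


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Level-(1 - alpha) quantile of the weighted calibration scores.

    scores: nonconformity scores of the calibration points.
    weights: likelihood ratios (test density / training density) at those points.
    test_weight: likelihood ratio at the test point; its probability mass
        is placed at +infinity.
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return score
    return math.inf


scores = [1.0, 2.0, 3.0, 4.0]
# Equal weights recover the usual (unweighted) conformal quantile.
unweighted = weighted_conformal_quantile(scores, [1.0, 1.0, 1.0, 1.0], 1.0, 0.3)
# Made-up likelihood ratios that favour the low-score calibration point.
shifted = weighted_conformal_quantile(scores, [3.0, 1.0, 1.0, 1.0], 1.0, 0.3)
```

When the test distribution concentrates where calibration scores are small, the weighted quantile shrinks accordingly, which is the effect the weighting is meant to capture.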

AutoPET Challenge III: Testing the Robustness of Generalized Dice Focal Loss trained 3D Residual UNet for FDG and PSMA Lesion Segmentation from Whole-Body PET/CT Images

Automated segmentation of cancerous lesions in PET/CT scans is a crucial first step in quantitative image analysis. However, training deep learning models for segmentation with high accuracy is particularly challenging due to the variations in lesion size, shape, and radiotracer uptake. These lesions can appear in different parts of the body, often near healthy organs that also exhibit considerable uptake, making the task even more complex. As a result, creating an effective segmentation model for routine PET/CT image analysis is challenging. In this study, we utilized a 3D Residual UNet model and employed the Generalized Dice Focal Loss function to train the model on the AutoPET Challenge 2024 dataset. We conducted a 5-fold cross-validation and used an average ensembling technique over the models from the five folds. In the preliminary test phase for Task-1, the average ensemble achieved a mean Dice Similarity Coefficient (DSC) of 0.6687, a mean false negative volume (FNV) of 10.9522 ml, and a mean false positive volume (FPV) of 2.9684 ml. More details about the algorithm can be found on our GitHub repository: https://github.com/ahxmeds/autosegnet2024.git. The training code has been shared via the repository: https://github.com/ahxmeds/autopet2024.git.

Updated: 2024-09-16 10:27:30

Categories: cs.CV,cs.AI,physics.med-ph

Download: http://arxiv.org/abs/2409.10151v1
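
Two ingredients of the AutoPET entry above, averaging the outputs of the five fold models and scoring a mask with the Dice Similarity Coefficient, can be sketched on flattened toy masks. The 0.5 threshold, the empty-mask convention, and the function names are assumptions made for illustration. The challenge's FNV and FPV metrics are not reproduced here.

```python
def average_ensemble(prob_maps, threshold=0.5):
    """Average per-voxel probabilities across models, then binarize.

    prob_maps: list of equal-length probability lists, one per model.
    threshold: assumed cut-off for calling a voxel "lesion".
    """
    n_models = len(prob_maps)
    return [1 if sum(vals) / n_models >= threshold else 0
            for vals in zip(*prob_maps)]


def dice(pred, truth):
    """Dice Similarity Coefficient between two binary masks (flat lists)."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(pred) + sum(truth)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if size == 0 else 2.0 * overlap / size


# Toy per-voxel probabilities from three hypothetical fold models.
folds = [[0.9, 0.2, 0.7], [0.8, 0.4, 0.3], [0.7, 0.1, 0.8]]
mask = average_ensemble(folds)
score = dice(mask, [1, 0, 0])
```

Averaging probabilities before thresholding lets a confident fold outvote an uncertain one, unlike majority voting on already-binarized masks.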

LLMs4OL 2024 Overview: The 1st Large Language Models for Ontology Learning Challenge

This paper outlines the LLMs4OL 2024, the first edition of the Large Language Models for Ontology Learning Challenge. LLMs4OL is a community development initiative collocated with the 23rd International Semantic Web Conference (ISWC) to explore the potential of Large Language Models (LLMs) in Ontology Learning (OL), a vital process for enhancing the web with structured knowledge to improve interoperability. By leveraging LLMs, the challenge aims to advance understanding and innovation in OL, aligning with the goals of the Semantic Web to create a more intelligent and user-friendly web. In this paper, we give an overview of the 2024 edition of the LLMs4OL challenge and summarize the contributions.

Updated: 2024-09-16 10:15:30

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.10146v1

AALF: Almost Always Linear Forecasting

Recent work on time-series forecasting increasingly leverages the high predictive power of Deep Learning models. With this increase in model complexity, however, comes a lack of understanding of the underlying model decision process, which is problematic for high-stakes decision making. At the same time, simple, interpretable forecasting methods such as Linear Models can still perform very well, sometimes on par with Deep Learning approaches. We argue that simple models are good enough most of the time, and forecasting performance can be improved by choosing a Deep Learning method only for certain predictions, increasing the overall interpretability of the forecasting process. In this context, we propose a novel online model selection framework which uses meta-learning to identify these predictions and only rarely uses a non-interpretable, large model. An extensive empirical study on various real-world datasets shows that our selection methodology outperforms state-of-the-art online model selection methods in most cases. We find that almost always choosing a simple Linear Model for forecasting results in competitive performance, suggesting that the need for opaque black-box models in time-series forecasting is smaller than recent works would suggest.

Updated: 2024-09-16 10:13:09

Categories: cs.LG

Download: http://arxiv.org/abs/2409.10142v1

Towards Explainable Automated Data Quality Enhancement without Domain Knowledge

In the era of big data, ensuring the quality of datasets has become increasingly crucial across various domains. We propose a comprehensive framework designed to automatically assess and rectify data quality issues in any given dataset, regardless of its specific content, focusing on both textual and numerical data. Our primary objective is to address three fundamental types of defects: absence, redundancy, and incoherence. At the heart of our approach lies a rigorous demand for both explainability and interpretability, ensuring that the rationale behind the identification and correction of data anomalies is transparent and understandable. To achieve this, we adopt a hybrid approach that integrates statistical methods with machine learning algorithms. Indeed, by leveraging statistical techniques alongside machine learning, we strike a balance between accuracy and explainability, enabling users to trust and comprehend the assessment process. Acknowledging the challenges associated with automating the data quality assessment process, particularly in terms of time efficiency and accuracy, we adopt a pragmatic strategy, employing resource-intensive algorithms only when necessary, while favoring simpler, more efficient solutions whenever possible. Through a practical analysis conducted on a publicly provided dataset, we illustrate the challenges that arise when trying to enhance data quality while keeping explainability. We demonstrate the effectiveness of our approach in detecting and rectifying missing values, duplicates and typographical errors as well as the challenges remaining to be addressed to achieve similar accuracy on statistical outliers and logic errors under the constraints set in our work.

Updated: 2024-09-16 10:08:05

Categories: cs.DB,cs.AI,stat.ML,62H30, 68P99,H.2.7; H.2.8; I.2.1

Download: http://arxiv.org/abs/2409.10139v1

The Impact of Run-Time Variability on Side-Channel Attacks Targeting FPGAs

To defeat side-channel attacks, many recent countermeasures enforce random run-time variability on the target computing platform in terms of clock jitter, frequency and voltage scaling, and phase shift, also combining the contributions from different actuators to maximize the side-channel resistance of the target. However, the robustness of such solutions seems strongly influenced by several hyper-parameters for which an in-depth analysis is still missing. This work proposes a fine-grained dynamic voltage and frequency scaling actuator to investigate the effectiveness of recent desynchronization countermeasures with the goal of highlighting the link between the enforced run-time variability and the vulnerability to side-channel attacks of cryptographic implementations targeting FPGAs. The analysis of the results collected from real hardware allowed for a comprehensive understanding of the protection offered by run-time variability countermeasures against side-channel attacks.

Updated: 2024-09-16 10:07:30

Categories: cs.CR,cs.AR

Download: http://arxiv.org/abs/2409.01881v2

Dataset of Pathloss and ToA Radio Maps With Localization Application

In this article, we present a collection of radio map datasets in dense urban settings, which we generated and made publicly available. The datasets include simulated pathloss/received signal strength (RSS) and time of arrival (ToA) radio maps over a large collection of realistic dense urban settings in real city maps. The two main applications of the presented datasets are 1) learning methods that predict the pathloss from input city maps (namely, deep learning-based simulations), and 2) wireless localization. The fact that the RSS and ToA maps are computed by the same simulations over the same city maps allows for a fair comparison of RSS-based and ToA-based localization methods.

Updated: 2024-09-16 10:04:23

Categories: cs.NI,cs.LG,eess.SP

Download: http://arxiv.org/abs/2212.11777v4

Advancing Towards a Marine Digital Twin Platform: Modeling the Mar Menor Coastal Lagoon Ecosystem in the South Western Mediterranean

Coastal marine ecosystems face mounting pressures from anthropogenic activities and climate change, necessitating advanced monitoring and modeling approaches for effective management. This paper pioneers the development of a Marine Digital Twin Platform aimed at modeling the Mar Menor Coastal Lagoon Ecosystem in the Region of Murcia. The platform leverages Artificial Intelligence to emulate complex hydrological and ecological models, facilitating the simulation of what-if scenarios to predict ecosystem responses to various stressors. We integrate diverse datasets from public sources to construct a comprehensive digital representation of the lagoon's dynamics. The platform's modular design enables real-time stakeholder engagement and informed decision-making in marine management. Our work contributes to the ongoing discourse on advancing marine science through innovative digital twin technologies.

Updated: 2024-09-16 10:01:18

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2409.10134v1

Evaluation of Google Translate for Mandarin Chinese translation using sentiment and semantic analysis

Machine translation using large language models (LLMs) is having a significant global impact, making communication easier. Mandarin Chinese is the official language used for communication by the government and media in China. In this study, we provide an automated assessment of the translation quality of Google Translate against human expert translations using sentiment and semantic analysis. To demonstrate our framework, we select the classic early twentieth-century novel 'The True Story of Ah Q' with selected Mandarin Chinese to English translations. We use Google Translate to translate the given text into English and then conduct a chapter-wise sentiment analysis and semantic analysis to compare the extracted sentiments across the different translations. Our results indicate that Google Translate differs from human expert translations in both semantic and sentiment analysis. We find that Google Translate is unable to translate some specific words or phrases in Chinese, such as traditional Chinese allusions. The mistranslations may be due to a lack of contextual significance and historical knowledge of China.

Updated: 2024-09-16 10:00:52

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.04964v2

A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression (and classification in Appendix B). These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly non-convex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review papers that attempt to answer "how the neural network trained via gradient-based methods finds the solution that can generalize well on unseen data." In particular, two well-known paradigms are reviewed: the Neural Tangent Kernel (NTK) paradigm and the Mean-Field (MF) paradigm. Last but not least, we review the most recent theoretical advancements in generative models, including Generative Adversarial Networks (GANs), diffusion models, and in-context learning (ICL) in Large Language Models (LLMs), from the two perspectives reviewed previously, i.e., approximation and training dynamics.

Updated: 2024-09-16 09:57:35

Categories: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2401.07187v3

StruEdit: Structured Outputs Enable the Fast and Accurate Knowledge Editing for Large Language Models

As the modern tool of choice for question answering, large language models (LLMs) are expected to deliver answers with up-to-date knowledge. To achieve such ideal question-answering systems, locating and then editing outdated knowledge in the natural language outputs is a general target of popular knowledge editing methods. However, this target is challenging, as both identifying which tokens to edit in the reasoning steps and ensuring the coherence of the revised reasoning chain are difficult tasks. We argue that these challenges stem from the unstructured nature of natural language outputs. To address the above challenges, we propose Structural Editing (StruEdit), an improved baseline for knowledge editing. We first prompt LLMs to produce structured outputs consisting of reasoning triplets. Then, StruEdit removes any potentially outdated knowledge and efficiently refills the structured outputs with up-to-date information in a single step. Experimental results show that StruEdit consistently delivers the highest accuracy with lowest latency compared with other knowledge editing methods.

Updated: 2024-09-16 09:48:56

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.10132v1

Δ-OPE: Off-Policy Estimation with Pairs of Policies

The off-policy paradigm casts recommendation as a counterfactual decision-making task, allowing practitioners to unbiasedly estimate online metrics using offline data. This leads to effective evaluation metrics, as well as learning procedures that directly optimise online success. Nevertheless, the high variance that comes with unbiasedness is typically the crux that complicates practical applications. An important insight is that the difference between policy values can often be estimated with significantly reduced variance, if said policies have positive covariance. This allows us to formulate a pairwise off-policy estimation task: Δ-OPE. Δ-OPE subsumes the common use-case of estimating improvements of a learnt policy over a production policy, using data collected by a stochastic logging policy. We introduce Δ-OPE methods based on the widely used Inverse Propensity Scoring estimator and its extensions. Moreover, we characterise a variance-optimal additive control variate that further enhances efficiency. Simulated, offline, and online experiments show that our methods significantly improve performance for both evaluation and learning tasks.

Updated: 2024-09-16 09:30:40

Categories: cs.LG,cs.IR

Download: http://arxiv.org/abs/2405.10024v2
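
The pairwise estimation task described in the abstract above can be illustrated with plain Inverse Propensity Scoring. The sketch below estimates the value difference between a learnt (target) policy and a production policy from data logged by a third, stochastic policy. All policies, click probabilities, and function names are hypothetical, and the paper's extensions and control variate are not reproduced here.

```python
import random


def ips_value(logs, policy):
    """Standard IPS estimate of a policy's value from (action, reward, propensity) logs."""
    return sum(r * policy[a] / p0 for a, r, p0 in logs) / len(logs)


def delta_ips(logs, target, production):
    """IPS estimate of V(target) - V(production) computed on the same logged samples."""
    return sum(r * (target[a] - production[a]) / p0 for a, r, p0 in logs) / len(logs)


random.seed(0)
logging_policy = [0.5, 0.3, 0.2]   # stochastic logging policy (hypothetical)
production = [0.6, 0.3, 0.1]       # production policy (hypothetical)
target = [0.4, 0.3, 0.3]           # learnt policy (hypothetical)
click_prob = [0.10, 0.20, 0.30]    # true expected reward per action (hypothetical)

logs = []
for _ in range(20000):
    a = random.choices([0, 1, 2], weights=logging_policy)[0]
    r = 1.0 if random.random() < click_prob[a] else 0.0
    logs.append((a, r, logging_policy[a]))

delta_hat = delta_ips(logs, target, production)
true_delta = sum(c * (t - p) for c, t, p in zip(click_prob, target, production))
print(delta_hat, true_delta)
```

Writing the estimator in pairwise form makes one property visible: logged samples on which the two policies assign the same probability to the action (action 1 here) contribute exactly zero to the estimate, so they add no noise to it.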

CataractBot: An LLM-Powered Expert-in-the-Loop Chatbot for Cataract Patients

The healthcare landscape is evolving, with patients seeking reliable information about their health conditions and available treatment options. Despite the abundance of information sources, the digital age overwhelms individuals with excess, often inaccurate information. Patients primarily trust medical professionals, highlighting the need for expert-endorsed health information. However, increased patient loads on experts have led to reduced communication time, impacting information sharing. To address this gap, we develop CataractBot, an experts-in-the-loop chatbot powered by LLMs, in collaboration with an eye hospital in India. CataractBot answers cataract surgery related questions instantly by querying a curated knowledge base, and provides expert-verified responses asynchronously. It has multimodal and multilingual capabilities. In an in-the-wild deployment study with 55 participants, CataractBot proved valuable, providing anytime accessibility, saving time, accommodating diverse literacy levels, alleviating power differences, and adding a privacy layer between patients and doctors. Users reported that their trust in the system was established through expert verification. Broadly, our results could inform future work on designing expert-mediated LLM bots.

Updated: 2024-09-16 09:22:20

Categories: cs.HC,cs.LG

Download: http://arxiv.org/abs/2402.04620v2

Multi-Objective Recommendation via Multivariate Policy Learning

Real-world recommender systems often need to balance multiple objectives when deciding which recommendations to present to users. These include behavioural signals (e.g. clicks, shares, dwell time), as well as broader objectives (e.g. diversity, fairness). Scalarisation methods are commonly used to handle this balancing task, where a weighted average of per-objective reward signals determines the final score used for ranking. Naturally, exactly how these weights are computed is key to success for any online platform. We frame this as a decision-making task, where the scalarisation weights are actions taken to maximise an overall North Star reward (e.g. long-term user retention or growth). We extend existing policy learning methods to the continuous multivariate action domain, proposing to maximise a pessimistic lower bound on the North Star reward that the learnt policy will yield. Typical lower bounds based on normal approximations suffer from insufficient coverage, and we propose an efficient and effective policy-dependent correction for this. We provide guidance to design stochastic data collection policies, as well as highly sensitive reward signals. Empirical observations from simulations, offline and online experiments highlight the efficacy of our deployed approach.

Updated: 2024-09-16 09:21:15

Categories: cs.IR,cs.LG

Download: http://arxiv.org/abs/2405.02141v2
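
The scalarisation step mentioned in the abstract above (a weighted average of per-objective signals that yields one ranking score) can be sketched as follows. The items, signal values, and weights are made up for illustration, and the policy learning that would choose the weights is outside the scope of this snippet.

```python
def scalarised_score(signals, weights):
    """Weighted average of per-objective reward signals for one candidate item."""
    total_weight = sum(weights.values())
    return sum(weights[name] * signals[name] for name in weights) / total_weight


def rank(candidates, weights):
    """Order candidate ids by descending scalarised score."""
    return sorted(candidates, key=lambda c: scalarised_score(candidates[c], weights), reverse=True)


# Hypothetical per-objective predictions for three candidate items.
candidates = {
    "item_a": {"click": 0.30, "share": 0.02, "diversity": 0.10},
    "item_b": {"click": 0.20, "share": 0.10, "diversity": 0.60},
    "item_c": {"click": 0.25, "share": 0.05, "diversity": 0.30},
}

balanced = {"click": 0.6, "share": 0.3, "diversity": 0.1}
click_only = {"click": 1.0, "share": 0.0, "diversity": 0.0}

print(rank(candidates, balanced))
print(rank(candidates, click_only))
```

The two weight vectors produce different orderings of the same candidates, which is why the abstract treats the weights themselves as the action to be optimised.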

Evaluating the Efficacy of Instance Incremental vs. Batch Learning in Delayed Label Environments: An Empirical Study on Tabular Data Streaming for Fraud Detection

Real-world tabular learning production scenarios typically involve evolving data streams, where data arrives continuously and its distribution may change over time. In such a setting, most studies in the literature regarding supervised learning favor the use of instance incremental algorithms due to their ability to adapt to changes in the data distribution. Another significant reason for choosing these algorithms is to avoid storing observations in memory, as is commonly done in batch incremental settings. However, the design of instance incremental algorithms often assumes immediate availability of labels, which is an optimistic assumption. In many real-world scenarios, such as fraud detection or credit scoring, labels may be delayed. Consequently, batch incremental algorithms are widely used in many real-world tasks. This raises an important question: "In delayed settings, is instance incremental learning the best option regarding predictive performance and computational efficiency?" Unfortunately, this question has not been studied in depth, probably due to the scarcity of real datasets containing delayed information. In this study, we conduct a comprehensive empirical evaluation and analysis of this question using a real-world fraud detection problem and commonly used generated datasets. Our findings indicate that instance incremental learning is not the superior option, considering on one side state-of-the-art models such as Adaptive Random Forest (ARF) and on the other side batch learning models such as XGBoost. Additionally, when considering the interpretability of the learning systems, batch incremental solutions tend to be favored. Code: https://github.com/anselmeamekoe/DelayedLabelStream

Updated: 2024-09-16 09:20:01

Categories: cs.LG,cs.CE,cs.NE

Download: http://arxiv.org/abs/2409.10111v1

Neural Thermodynamic Integration: Free Energies from Energy-based Diffusion Models

Thermodynamic integration (TI) offers a rigorous method for estimating free-energy differences by integrating over a sequence of interpolating conformational ensembles. However, TI calculations are computationally expensive and typically limited to coupling a small number of degrees of freedom due to the need to sample numerous intermediate ensembles with sufficient conformational-space overlap. In this work, we propose to perform TI along an alchemical pathway represented by a trainable neural network, which we term Neural TI. Critically, we parametrize a time-dependent Hamiltonian interpolating between the interacting and non-interacting systems, and optimize its gradient using a score matching objective. The ability of the resulting energy-based diffusion model to sample all intermediate ensembles allows us to perform TI from a single reference calculation. We apply our method to Lennard-Jones fluids, where we report accurate calculations of the excess chemical potential, demonstrating that Neural TI reproduces the underlying changes in free energy without the need for simulations at interpolating Hamiltonians.

Updated: 2024-09-16 09:17:21

Categories: cond-mat.stat-mech,cs.LG

Download: http://arxiv.org/abs/2406.02313v3
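
For readers unfamiliar with thermodynamic integration, the free-energy difference is the integral over the coupling parameter λ from 0 to 1 of the ensemble average of ∂H/∂λ. The sketch below applies classical TI (not Neural TI) to a toy case chosen because every ensemble average is known in closed form: a one-dimensional harmonic oscillator whose spring constant is interpolated linearly. The system, parameter values, and function names are assumptions made for illustration only.

```python
import math


def mean_dH_dlambda(lam, k0, k1, beta):
    """Exact <dH/dlambda> for H_lambda(x) = 0.5 * k(lambda) * x**2, k linear in lambda."""
    k_lam = (1.0 - lam) * k0 + lam * k1
    mean_x2 = 1.0 / (beta * k_lam)  # equipartition: <x^2> = 1 / (beta * k)
    return 0.5 * (k1 - k0) * mean_x2


def thermodynamic_integration(k0, k1, beta, n_steps=1000):
    """Trapezoidal estimate of Delta F = integral over [0, 1] of <dH/dlambda>."""
    h = 1.0 / n_steps
    values = [mean_dH_dlambda(i * h, k0, k1, beta) for i in range(n_steps + 1)]
    return h * (0.5 * values[0] + sum(values[1:-1]) + 0.5 * values[-1])


k0, k1, beta = 1.0, 4.0, 1.0
delta_f_ti = thermodynamic_integration(k0, k1, beta)
# Analytic result from the partition function Z = sqrt(2 * pi / (beta * k)).
delta_f_exact = math.log(k1 / k0) / (2.0 * beta)
print(delta_f_ti, delta_f_exact)
```

In a realistic system each value of `mean_dH_dlambda` would require its own simulation at the corresponding intermediate Hamiltonian. That per-λ sampling cost is what the abstract says Neural TI avoids by sampling all intermediate ensembles from one trained energy-based diffusion model.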

Analysing Attacks on Blockchain Systems in a Layer-based Approach

Blockchain is a growing decentralized system built for transparency and immutability. There have been several major attacks on blockchain-based systems, undermining trust in them. This article presents a comprehensive study of 23 attacks on blockchain systems and categorizes them using a layer-based approach. This approach provides an in-depth analysis of the feasibility and motivation of these attacks. In addition, a framework is proposed that enables a systematic analysis of the impact and interconnection of these attacks, thereby providing a means of identifying potential attack vectors and designing appropriate countermeasures to strengthen any blockchain system.

Updated: 2024-09-16 09:17:18

Categories: cs.CR,cs.ET

Download: http://arxiv.org/abs/2409.10109v1

DRIFT: Deep Reinforcement Learning for Intelligent Floating Platforms Trajectories

This investigation introduces a novel deep reinforcement learning-based suite to control floating platforms in both simulated and real-world environments. Floating platforms serve as versatile test-beds to emulate micro-gravity environments on Earth, useful to test autonomous navigation systems for space applications. Our approach addresses the system and environmental uncertainties in controlling such platforms by training policies capable of precise maneuvers amid dynamic and unpredictable conditions. Leveraging Deep Reinforcement Learning (DRL) techniques, our suite achieves robustness, adaptability, and good transferability from simulation to reality. Our deep reinforcement learning framework provides advantages such as fast training times, large-scale testing capabilities, rich visualization options, and ROS bindings for integration with real-world robotic systems. Being open access, our suite serves as a comprehensive platform for practitioners who want to replicate similar research in their own simulated environments and labs.

Updated: 2024-09-16 09:16:08

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2310.04266v2

Industry 6.0: New Generation of Industry driven by Generative AI and Swarm of Heterogeneous Robots

This paper presents the concept of Industry 6.0, introducing the world's first fully automated production system that autonomously handles the entire product design and manufacturing process based on user-provided natural language descriptions. By leveraging generative AI, the system automates critical aspects of production, including product blueprint design, component manufacturing, logistics, and assembly. A heterogeneous swarm of robots, each equipped with individual AI through integration with Large Language Models (LLMs), orchestrates the production process. The robotic system includes manipulator arms, delivery drones, and 3D printers capable of generating assembly blueprints. The system was evaluated using commercial and open-source LLMs, functioning through APIs and local deployment. A user study demonstrated that the system reduces the average production time to 119.10 minutes, significantly outperforming a team of expert human developers, who averaged 528.64 minutes (an improvement factor of 4.4). Furthermore, in the product blueprinting stage, the system surpassed human CAD operators by an unprecedented factor of 47, completing the task in 0.5 minutes compared to 23.5 minutes. This breakthrough represents a major leap towards fully autonomous manufacturing.

Updated: 2024-09-16 09:12:06

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2409.10106v1

Rethinking Spectral Graph Neural Networks with Spatially Adaptive Filtering

Whilst spectral Graph Neural Networks (GNNs) are theoretically well-founded in the spectral domain, their practical reliance on polynomial approximation implies a profound linkage to the spatial domain. As previous studies rarely examine spectral GNNs from the spatial perspective, their spatial-domain interpretability remains elusive, e.g., what information is essentially encoded by spectral GNNs in the spatial domain? In this paper, to answer this question, we establish a theoretical connection between spectral filtering and spatial aggregation, unveiling an intrinsic interaction that spectral filtering implicitly leads the original graph to an adapted new graph, explicitly computed for spatial aggregation. Both theoretical and empirical investigations reveal that the adapted new graph not only exhibits non-locality but also accommodates signed edge weights to reflect label consistency among nodes. These findings thus highlight the interpretable role of spectral GNNs in the spatial domain and inspire us to rethink graph spectral filters beyond the fixed-order polynomials, which neglect global information. Built upon the theoretical findings, we revisit the state-of-the-art spectral GNNs and propose a novel Spatially Adaptive Filtering (SAF) framework, which leverages the adapted new graph by spectral filtering for an auxiliary non-local aggregation. Notably, our proposed SAF comprehensively models both node similarity and dissimilarity from a global perspective, therefore alleviating persistent deficiencies of GNNs related to long-range dependencies and graph heterophily. Extensive experiments over 13 node classification benchmarks demonstrate the superiority of our proposed framework to the state-of-the-art models.

Updated: 2024-09-16 09:09:34

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2401.09071v5
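
The abstract above notes that fixed-order polynomial filters neglect global information. A small sketch of why: applying an order-K polynomial of a graph matrix only mixes signals between nodes at most K hops apart. The example uses the raw adjacency matrix of a 5-node path graph as a stand-in for the normalized Laplacian that spectral GNNs typically use, and the coefficients are arbitrary. It illustrates polynomial locality only and is not the SAF framework.

```python
def matmul(a, b):
    """Dense matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def polynomial_filter(adj, coeffs):
    """Return sum_k coeffs[k] * adj**k as a dense matrix."""
    n = len(adj)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # adj**0
    result = [[0.0] * n for _ in range(n)]
    for c in coeffs:
        for i in range(n):
            for j in range(n):
                result[i][j] += c * power[i][j]
        power = matmul(power, adj)  # advance to the next power of adj
    return result


# Path graph 0 - 1 - 2 - 3 - 4.
n = 5
adj = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

# Order-2 filter with arbitrary illustrative coefficients.
filt = polynomial_filter(adj, [0.5, 0.3, 0.2])
print([round(v, 2) for v in filt[0]])
```

Row 0 of the filter has non-zero weights only for nodes 0, 1, and 2. Nodes 3 and 4, which are three and four hops away, receive exactly zero weight regardless of the coefficient values.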

A Comparative Study of Open Source Computer Vision Models for Application on Small Data: The Case of CFRP Tape Laying

In the realm of industrial manufacturing, Artificial Intelligence (AI) is playing an increasing role, from automating existing processes to aiding in the development of new materials and techniques. However, a significant challenge arises in smaller, experimental processes characterized by limited training data availability, which calls into question whether AI models can be trained in such small data contexts. In this work, we explore the potential of Transfer Learning to address this challenge, specifically investigating the minimum amount of data required to develop a functional AI model. For this purpose, we consider the use case of quality control of Carbon Fiber Reinforced Polymer (CFRP) tape laying in aerospace manufacturing using optical sensors. We investigate the behavior of different open-source computer vision models with a continuous reduction of the training data. Our results show that the amount of data required to successfully train an AI model can be drastically reduced, and the use of smaller models does not necessarily lead to a loss of performance.

Updated: 2024-09-16 09:07:31

Categories: cs.CV,cs.LG,68T05, 93A30, 74E30,I.2.6; I.4.8; J.2

Download: http://arxiv.org/abs/2409.10104v1

Trustworthiness in Retrieval-Augmented Generation Systems: A Survey

Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs). While much of the current research in this field focuses on performance optimization, particularly in terms of accuracy and efficiency, the trustworthiness of RAG systems remains underexplored. On the positive side, RAG systems promise to enhance LLMs by providing them with useful and up-to-date knowledge from vast external databases, thereby mitigating the long-standing problem of hallucination. On the negative side, RAG systems risk generating undesirable content if the retrieved information is either inappropriate or poorly utilized. To address these concerns, we propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy. Within this framework, we thoroughly review the existing literature on each dimension. Additionally, we create an evaluation benchmark covering the six dimensions and conduct comprehensive evaluations of a variety of proprietary and open-source models. Finally, we identify potential challenges for future research based on our investigation results. Through this work, we aim to lay a structured foundation for future investigations and provide practical insights for enhancing the trustworthiness of RAG systems in real-world applications.

Updated: 2024-09-16 09:06:44

Categories: cs.IR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.10102v1

Motion Forecasting via Model-Based Risk Minimization

Forecasting the future trajectories of surrounding agents is crucial for autonomous vehicles to ensure safe, efficient, and comfortable route planning. While model ensembling has improved prediction accuracy in various fields, its application in trajectory prediction is limited due to the multi-modal nature of predictions. In this paper, we propose a novel sampling method applicable to trajectory prediction based on the predictions of multiple models. We first show that conventional sampling based on predicted probabilities can degrade performance due to missing alignment between models. To address this problem, we introduce a new method that generates optimal trajectories from a set of neural networks, framing it as a risk minimization problem with a variable loss function. By using state-of-the-art models as base learners, our approach constructs diverse and effective ensembles for optimal trajectory sampling. Extensive experiments on the nuScenes prediction dataset demonstrate that our method surpasses current state-of-the-art techniques, achieving top ranks on the leaderboard. We also provide a comprehensive empirical study on ensembling strategies, offering insights into their effectiveness. Our findings highlight the potential of advanced ensembling techniques in trajectory prediction, significantly improving predictive performance and paving the way for more reliable predicted trajectories.

Updated: 2024-09-16 09:03:28

Categories: cs.LG,cs.AI,cs.RO

Download: http://arxiv.org/abs/2409.10585v1

Robust Reinforcement Learning with Dynamic Distortion Risk Measures

In a reinforcement learning (RL) setting, the agent's optimal strategy heavily depends on her risk preferences and the underlying model dynamics of the training environment. These two aspects influence the agent's ability to make well-informed and time-consistent decisions when facing testing environments. In this work, we devise a framework to solve robust risk-aware RL problems where we simultaneously account for environmental uncertainty and risk with a class of dynamic robust distortion risk measures. Robustness is introduced by considering all models within a Wasserstein ball around a reference model. We estimate such dynamic robust risk measures using neural networks by making use of strictly consistent scoring functions, derive policy gradient formulae using the quantile representation of distortion risk measures, and construct an actor-critic algorithm to solve this class of robust risk-aware RL problems. We demonstrate the performance of our algorithm on a portfolio allocation example.

Updated: 2024-09-16 08:54:59

Categories: cs.LG,q-fin.CP,q-fin.PM,q-fin.RM,stat.ML,68T37, 91G70, 93E25

Download: http://arxiv.org/abs/2409.10096v1
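
The abstract above relies on the quantile representation of distortion risk measures. A minimal static sketch of that representation, assuming losses where larger is worse, is given below: the risk is a weighted average of sorted samples, with weights taken from increments of a cumulative weight function `G` on [0, 1]. The function names `distortion_risk` and `cvar_weight` are hypothetical, and the dynamic, Wasserstein-robust version from the paper is not reproduced here.

```python
import numpy as np


def distortion_risk(samples, G):
    """Empirical distortion risk: sum_i [G(i/n) - G((i-1)/n)] * x_(i).

    G is a non-decreasing function on [0, 1] with G(0) = 0 and G(1) = 1;
    x_(i) are the samples sorted in increasing order (empirical quantiles).
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    grid = np.arange(n + 1) / n
    weights = np.diff(G(grid))  # weight assigned to each order statistic
    return float(np.dot(weights, x))


def cvar_weight(alpha):
    """Cumulative weight function whose distortion risk is CVaR at level alpha."""
    return lambda u: np.maximum(u - alpha, 0.0) / (1.0 - alpha)


losses = np.arange(1, 11)  # illustrative losses 1, 2, ..., 10
expected_loss = distortion_risk(losses, lambda u: u)   # G(u) = u gives the mean
tail_risk = distortion_risk(losses, cvar_weight(0.8))  # average of the worst 20%
print(expected_loss, tail_risk)
```

Choosing a different `G` changes how strongly the tail is weighted, which is the knob a risk-aware agent would tune; the mean and CVaR are just two familiar special cases.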

DDoS: Diffusion Distribution Similarity for Out-of-Distribution Detection

Out-of-Distribution (OoD) detection determines whether the given samples are from the training distribution of the classifier-under-protection, i.e., the In-Distribution (InD), or from a different OoD. Recent research introduces diffusion models pre-trained on InD data to support OoD detection by transforming an OoD image into a generated one that is close to InD, so that the distribution disparities between the original and generated images can be captured to detect OoD data. Existing diffusion-based detectors apply perceptual metrics to the two images to measure such disparities, but ignore a fundamental fact: perceptual metrics are devised essentially for human-perceived similarities of low-level image patterns, e.g., textures and colors, and are not advisable for evaluating distribution disparities, since images with different low-level patterns could come from the same distribution. To address this issue, we formulate a diffusion-based detection framework that considers the distribution similarity between a tested image and its generated counterpart via a novel proper similarity metric in the informative feature space and probability space learned by the classifier-under-protection. An anomaly-removal strategy is further presented to enlarge such distribution disparities by removing abnormal OoD information in the feature space to facilitate the detection. Extensive empirical results reveal the insufficiency of perceptual metrics and the effectiveness of our distribution similarity framework, which achieves new state-of-the-art detection performance.

Updated: 2024-09-16 08:50:47

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.10094v1

Mitigating analytical variability in fMRI results with style transfer

We propose a novel approach to improve the reproducibility of neuroimaging results by converting statistic maps across different functional MRI pipelines. We assume that the pipelines used to compute fMRI statistic maps can be considered as a style component, and we propose to use different generative models, including Generative Adversarial Networks (GAN) and Diffusion Models (DM), to convert statistic maps across different pipelines. We explore the performance of multiple GAN frameworks and design a new DM framework for unsupervised multi-domain style transfer. We constrain the generation of 3D fMRI statistic maps using the latent space of an auxiliary classifier that distinguishes statistic maps from different pipelines, and we extend traditional sampling techniques used in DM to improve the transition performance. Our experiments demonstrate that our proposed methods are successful: pipelines can indeed be transferred as a style component, providing an important source of data augmentation for future medical studies.

Updated: 2024-09-16 08:43:13

Categories: eess.IV,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.03703v2

A Riemannian Approach to Ground Metric Learning for Optimal Transport

Optimal transport (OT) theory has attracted much attention in machine learning and signal processing applications. OT defines a notion of distance between probability distributions of source and target data points. A crucial factor that influences OT-based distances is the ground metric of the embedding space in which the source and target data points lie. In this work, we propose to learn a suitable latent ground metric parameterized by a symmetric positive definite matrix. We use the rich Riemannian geometry of symmetric positive definite matrices to jointly learn the OT distance along with the ground metric. Empirical results illustrate the efficacy of the learned metric in OT-based domain adaptation.

Updated: 2024-09-16 08:42:56

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.10085v1
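
To make the role of the ground metric in the abstract above concrete, the sketch below computes an OT cost under a Mahalanobis-type ground cost (x - y)^T M (x - y) for a fixed symmetric positive definite matrix `M`. It uses entropy-regularized OT solved with Sinkhorn iterations as a stand-in solver; the paper's Riemannian procedure for learning `M` is not shown, and the function names, point sets, and regularization value are assumptions for illustration.

```python
import numpy as np


def mahalanobis_cost(X, Y, M):
    """Pairwise ground cost C[i, j] = (x_i - y_j)^T M (x_i - y_j)."""
    diff = X[:, None, :] - Y[None, :, :]  # shape (n, m, d)
    return np.einsum("nmd,de,nme->nm", diff, M, diff)


def sinkhorn(a, b, C, reg=0.5, n_iter=500):
    """Entropy-regularized OT plan and its transport cost (Sinkhorn iterations)."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]
    return P, float(np.sum(P * C))


# Hypothetical source and target points with uniform weights.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[0.0, 1.0], [1.0, 1.0]])
a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])

M_identity = np.eye(2)           # plain squared Euclidean ground cost
M_scaled = np.diag([1.0, 4.0])   # an SPD matrix that stretches the second axis

P_id, cost_id = sinkhorn(a, b, mahalanobis_cost(X, Y, M_identity))
P_sc, cost_sc = sinkhorn(a, b, mahalanobis_cost(X, Y, M_scaled))
print(cost_id, cost_sc)
```

Because the two point sets differ mainly along the second axis, weighting that axis more heavily in `M` raises the transport cost, which is the kind of sensitivity that motivates learning the ground metric rather than fixing it.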

Manifold-Constrained Nucleus-Level Denoising Diffusion Model for Structure-Based Drug Design

Artificial intelligence models have shown great potential in structure-based drug design, generating ligands with high binding affinities. However, existing models have often overlooked a crucial physical constraint: atoms must maintain a minimum pairwise distance to avoid separation violation, a phenomenon governed by the balance of attractive and repulsive forces. To mitigate such separation violations, we propose NucleusDiff. It models the interactions between atomic nuclei and their surrounding electron clouds by enforcing the distance constraint between the nuclei and manifolds. We quantitatively evaluate NucleusDiff using the CrossDocked2020 dataset and a COVID-19 therapeutic target, demonstrating that NucleusDiff reduces violation rate by up to 100.00% and enhances binding affinity by up to 22.16%, surpassing state-of-the-art models for structure-based drug design. We also provide qualitative analysis through manifold sampling, visually confirming the effectiveness of NucleusDiff in reducing separation violations and improving binding affinities.

Updated: 2024-09-16 08:42:46

Categories: q-bio.QM,cs.AI,cs.LG,q-bio.BM,stat.ML

Download: http://arxiv.org/abs/2409.10584v1

DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion

Multi-modality image fusion aims to integrate complementary data information from different imaging modalities into a single image. Existing methods often generate either blurry fused images that lose fine-grained semantic information or unnatural fused images that appear perceptually cropped from the inputs. In this work, we propose a novel two-phase discriminative autoencoder framework, termed DAE-Fuse, that generates sharp and natural fused images. In the adversarial feature extraction phase, we introduce two discriminative blocks into the encoder-decoder architecture, providing an additional adversarial loss to better guide feature extraction by reconstructing the source images. In the attention-guided cross-modality fusion phase, the two discriminative blocks are adapted to distinguish the structural differences between the fused output and the source inputs, injecting more naturalness into the results. Extensive experiments on public infrared-visible, medical image fusion, and downstream object detection datasets demonstrate our method's superiority and generalizability in both quantitative and qualitative evaluations.

Updated: 2024-09-16 08:37:09

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.10080v1

Large language models and linguistic intentionality

Do large language models like Chat-GPT or LLaMa meaningfully use the words they produce? Or are they merely clever prediction machines, simulating language use by producing statistically plausible text? There have already been some initial attempts to answer this question by showing that these models meet the criteria for entering meaningful states according to metasemantic theories of mental content. In this paper, I will argue for a different approach - that we should instead consider whether language models meet the criteria given by our best metasemantic theories of linguistic content. In that vein, I will illustrate how this can be done by applying two such theories to the case of language models: Gareth Evans' (1982) account of naming practices and Ruth Millikan's (1984, 2004, 2005) teleosemantics. In doing so, I will argue that it is a mistake to think that the failure of LLMs to meet plausible conditions for mental intentionality thereby renders their outputs meaningless, and that a distinguishing feature of linguistic intentionality - dependency on a pre-existing linguistic system - allows for the plausible result that LLM outputs are meaningful.

Updated: 2024-09-16 08:35:51

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.09576v2

LLM-DER: A Named Entity Recognition Method Based on Large Language Models for Chinese Coal Chemical Domain

Domain-specific Named Entity Recognition (NER), whose goal is to recognize domain-specific entities and their categories, provides important support for constructing domain knowledge graphs. Currently, deep learning-based methods are widely used and effective in NER tasks, but they rely on large-scale labeled data. As a result, the scarcity of labeled data in a specific domain limits their application. Therefore, many studies have introduced few-shot methods and achieved some results. However, the entity structures in specific domains are often complex, and current few-shot methods are difficult to adapt to NER tasks with complex features. Taking the Chinese coal chemical industry domain as an example, there exists a complex structure of multiple entities sharing a single entity, as well as multiple relationships for the same pair of entities, which affects the NER task under few-sample conditions. In this paper, we propose LLM-DER, a Large Language Model (LLM)-based entity recognition framework for the domain-specific entity recognition problem in Chinese. It enriches the entity information by generating a list of relationships containing entity types through LLMs, and uses a plausibility and consistency evaluation method to remove misrecognized entities, which can effectively solve the complex structural entity recognition problem in a specific domain. Experimental results on the Resume dataset and the self-constructed coal chemical dataset Coal show that LLM-DER performs outstandingly in domain-specific entity recognition, not only outperforming the existing GPT-3.5-turbo baseline but also exceeding the fully-supervised baseline, verifying its effectiveness in entity recognition.

Updated: 2024-09-16 08:28:05

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.10077v1

Steinmetz Neural Networks for Complex-Valued Data

In this work, we introduce a new approach to processing complex-valued data using DNNs consisting of parallel real-valued subnetworks with coupled outputs. Our proposed class of architectures, referred to as Steinmetz Neural Networks, leverages multi-view learning to construct more interpretable representations within the latent space. Subsequently, we present the Analytic Neural Network, which implements a consistency penalty that encourages analytic signal representations in the Steinmetz neural network's latent space. This penalty enforces a deterministic and orthogonal relationship between the real and imaginary components. Utilizing an information-theoretic construction, we demonstrate that the upper bound on the generalization error posited by the analytic neural network is lower than that of the general class of Steinmetz neural networks. Our numerical experiments demonstrate the improved performance and robustness to additive noise, afforded by our proposed networks on benchmark datasets and synthetic examples.

Updated: 2024-09-16 08:26:06

Categories: cs.LG,cs.NE

Download: http://arxiv.org/abs/2409.10075v1

Privacy-Preserving Federated Learning with Consistency via Knowledge Distillation Using Conditional Generator

Federated Learning (FL) is gaining popularity as a distributed learning framework that only shares model parameters or gradient updates and keeps private data local. However, FL is at risk of privacy leakage caused by privacy inference attacks, and most existing privacy-preserving mechanisms in FL conflict with achieving high performance and efficiency. Therefore, we propose FedMD-CG, a novel FL method with highly competitive performance and high-level privacy preservation, which decouples each client's local model into a feature extractor and a classifier, and utilizes a conditional generator instead of the feature extractor to perform server-side model aggregation. To ensure the consistency of local generators and classifiers, FedMD-CG leverages knowledge distillation to train local models and generators at both the latent feature level and the logit level. We also construct additional classification losses and design new diversity losses to enhance client-side training. FedMD-CG is robust to data heterogeneity and does not require training extra discriminators (like cGAN). We conduct extensive experiments on various image classification tasks to validate the superiority of FedMD-CG.

Updated: 2024-09-16 08:23:09

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2409.06955v2

LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling

Learning compact and meaningful latent space representations has been shown to be very useful in generative modeling tasks for visual data. One particular example is applying Vector Quantization (VQ) in variational autoencoders (VQ-VAEs, VQ-GANs, etc.), which has demonstrated state-of-the-art performance in many modern generative modeling applications. Quantizing the latent space has been justified by the assumption that the data themselves are inherently discrete in the latent space (like pixel values). In this paper, we propose an alternative representation of the latent space with a more relaxed structural assumption than the VQ formulation. Specifically, we assume that the latent space can be approximated by a union-of-subspaces model corresponding to a dictionary-based representation under a sparsity constraint. The dictionary is learned and updated during the training process. We apply this approach to two models: Dictionary Learning Variational Autoencoders (DL-VAEs) and DL-VAEs with Generative Adversarial Networks (DL-GANs). We show empirically that our latent space is more expressive and leads to better representations than the VQ approach in terms of reconstruction quality, at the expense of a small computational overhead for the latent space computation. Our results thus suggest that the true benefit of the VQ approach might not come from discretization of the latent space, but rather from the lossy compression of the latent space. We confirm this hypothesis by showing that our sparse representations also address the codebook collapse issue commonly found in VQ-family models.

Updated: 2024-09-16 08:20:58

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2409.11184v1

DFDG: Data-Free Dual-Generator Adversarial Distillation for One-Shot Federated Learning

Federated Learning (FL) is a distributed machine learning scheme in which clients jointly participate in the collaborative training of a global model by sharing model information rather than their private datasets. In light of concerns associated with communication and privacy, one-shot FL with a single communication round has emerged as a promising solution. However, existing one-shot FL methods either require public datasets, focus on model-homogeneous settings, or distill limited knowledge from local models, making it difficult or even impractical to train a robust global model. To address these limitations, we propose a new data-free dual-generator adversarial distillation method (namely DFDG) for one-shot FL, which can explore a broader training space of the local models by training dual generators. DFDG is executed in an adversarial manner and comprises two parts: dual-generator training and dual-model distillation. In dual-generator training, we examine each generator with respect to fidelity, transferability and diversity to ensure its utility, and additionally tailor the cross-divergence loss to lessen the overlap of the dual generators' output spaces. In dual-model distillation, the trained dual generators work together to provide the training data for updates of the global model. Finally, our extensive experiments on various image classification tasks show that DFDG achieves significant performance gains in accuracy compared to SOTA baselines.

Updated: 2024-09-16 08:18:59

Categories: cs.DC,cs.LG

Download: http://arxiv.org/abs/2409.07734v2

Increasing faithfulness in human-human dialog summarization with Spoken Language Understanding tasks

Dialogue summarization aims to provide a concise and coherent summary of conversations between multiple speakers. While recent advancements in language models have enhanced this process, summarizing dialogues accurately and faithfully remains challenging due to the need to understand speaker interactions and capture relevant information. Indeed, abstractive models used for dialog summarization may generate summaries that contain inconsistencies. We suggest using the semantic information proposed for performing Spoken Language Understanding (SLU) in human-machine dialogue systems for goal-oriented human-human dialogues to obtain a more semantically faithful summary regarding the task. This study introduces three key contributions: First, we propose an exploration of how incorporating task-related information can enhance the summarization process, leading to more semantically accurate summaries. Then, we introduce a new evaluation criterion based on task semantics. Finally, we propose a new dataset version with increased annotated data standardized for research on task-oriented dialogue summarization. The study evaluates these methods using the DECODA corpus, a collection of French spoken dialogues from a call center. Results show that integrating models with task-related information improves summary accuracy, even with varying word error rates.

Updated: 2024-09-16 08:15:35

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.10070v1

Enhancing Anomaly Detection via Generating Diversified and Hard-to-distinguish Synthetic Anomalies

Unsupervised anomaly detection is a daunting task, as it relies solely on normality patterns from the training data to identify unseen anomalies during testing. Recent approaches have focused on leveraging domain-specific transformations or perturbations to generate synthetic anomalies from normal samples. The objective here is to acquire insights into normality patterns by learning to differentiate between normal samples and these crafted anomalies. However, these approaches often encounter limitations when domain-specific transformations are not well-specified, such as in tabular data, or when the synthetic anomalies become trivial to distinguish from normal samples. To address these issues, we introduce a novel domain-agnostic method that employs a set of conditional perturbators and a discriminator. The perturbators are trained to generate input-dependent perturbations, which are subsequently utilized to construct synthetic anomalies, and the discriminator is trained to distinguish normal samples from them. We ensure that the generated anomalies are both diverse and hard to distinguish through two key strategies: i) directing perturbations to be orthogonal to each other and ii) constraining perturbations to remain in proximity to normal samples. Through experiments on real-world datasets, we demonstrate the superiority of our method over state-of-the-art benchmarks, which is evident not only in image data but also in tabular data, where domain-specific transformation is not readily accessible. Additionally, we empirically confirm the adaptability of our method to semi-supervised settings, demonstrating its capacity to incorporate supervised signals to enhance anomaly detection performance even further.

Updated: 2024-09-16 08:15:23

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.10069v1

Spatiotemporal Covariance Neural Networks

Modeling spatiotemporal interactions in multivariate time series is key to their effective processing, but challenging because of their irregular and often unknown structure. Statistical properties of the data provide useful biases to model interdependencies and are leveraged by correlation and covariance-based networks as well as by processing pipelines relying on principal component analysis (PCA). However, PCA and its temporal extensions suffer instabilities in the covariance eigenvectors when the corresponding eigenvalues are close to each other, making their application to dynamic and streaming data settings challenging. To address these issues, we exploit the analogy between PCA and graph convolutional filters to introduce the SpatioTemporal coVariance Neural Network (STVNN), a relational learning model that operates on the sample covariance matrix of the time series and leverages joint spatiotemporal convolutions to model the data. To account for the streaming and non-stationary setting, we consider an online update of the parameters and sample covariance matrix. We prove the STVNN is stable to the uncertainties introduced by these online estimations, thus improving over temporal PCA-based methods. Experimental results corroborate our theoretical findings and show that STVNN is competitive for multivariate time series processing, it adapts to changes in the data distribution, and it is orders of magnitude more stable than online temporal PCA.

Updated: 2024-09-16 08:05:58

Categories: cs.LG, stat.ML

Download: http://arxiv.org/abs/2409.10068v1

MindGuard: Towards Accessible and Stigma-free Mental Health First Aid via Edge LLM

Mental health disorders are among the most prevalent diseases worldwide, affecting nearly one in four people. Despite their widespread impact, the intervention rate remains below 25%, largely due to the significant cooperation required from patients for both diagnosis and intervention. The core issue behind this low treatment rate is stigma, which discourages over half of those affected from seeking help. This paper presents MindGuard, an accessible, stigma-free, and professional mobile mental healthcare system designed to provide mental health first aid. The heart of MindGuard is an innovative edge LLM, equipped with professional mental health knowledge, that seamlessly integrates objective mobile sensor data with subjective Ecological Momentary Assessment records to deliver personalized screening and intervention conversations. We conduct a broad evaluation of MindGuard using open datasets spanning four years and real-world deployment across various mobile devices involving 20 subjects for two weeks. Remarkably, MindGuard achieves results comparable to GPT-4 and outperforms its counterpart with more than 10 times the model size. We believe that MindGuard paves the way for mobile LLM applications, potentially revolutionizing mental healthcare practices by substituting self-reporting and intervention conversations with passive, integrated monitoring within daily life, thus ensuring accessible and stigma-free mental health support.

Updated: 2024-09-16 07:58:56

Categories: cs.CL, cs.AI, cs.HC

Download: http://arxiv.org/abs/2409.10064v1

GlobalMapNet: An Online Framework for Vectorized Global HD Map Construction

High-definition (HD) maps are essential for autonomous driving systems. Traditionally, HD maps are built with an expensive, labor-intensive pipeline that scales poorly. In recent years, crowdsourcing and online mapping have emerged as two alternatives, but each has its own limitations. In this paper, we propose a novel methodology, global map construction, that directly generates vectorized global maps and combines the benefits of crowdsourcing and online mapping. We introduce GlobalMapNet, the first online framework for vectorized global HD map construction, which updates and utilizes a global map on the ego vehicle. To generate the global map from scratch, we propose GlobalMapBuilder to match and merge local maps continuously. We design a new algorithm, Map NMS, to remove duplicate map elements and produce a clean map. We also propose GlobalMapFusion to aggregate historical map information, improving the consistency of predictions. We evaluate GlobalMapNet on two widely recognized datasets, Argoverse2 and nuScenes, showing that our framework is capable of generating globally consistent results.

Updated: 2024-09-16 07:56:41

Categories: cs.CV, cs.AI, cs.RO

Download: http://arxiv.org/abs/2409.10063v1

A Response to: A Note on "Privacy Preserving n-Party Scalar Product Protocol"

We reply to the comments by Liu on our proposed privacy-preserving n-party scalar product protocol. In their comment, Liu raised concerns regarding the security and scalability of the $n$-party scalar product protocol. In this reply, we show that their concerns are unfounded and that the $n$-party scalar product protocol is safe for its intended purposes. Their concerns regarding security are based on a misunderstanding of the protocol. Additionally, while the scalability of the protocol limits its use, the protocol still has numerous practical applications when applied in the correct scenarios. Specifically, within vertically partitioned scenarios, which often involve few parties, the protocol remains practical. In this reply, we clarify Liu's misunderstanding and explain why the protocol's scaling is not a practical problem in its intended application.

Updated: 2024-09-16 07:36:37

Categories: cs.CR

Download: http://arxiv.org/abs/2409.10057v1

Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments

Although deep reinforcement learning (DRL) approaches in audio signal processing have seen substantial progress in recent years, audio-driven DRL for tasks such as navigation, gaze control and head-orientation control in the context of human-robot interaction has received little attention. Here, we propose an audio-driven DRL framework in which we utilise deep Q-learning to develop an autonomous agent that orients towards a talker in the acoustic environment based on stereo speech recordings. Our results show that the agent learned to perform the task at a near-perfect level when trained on speech segments in anechoic environments (that is, without reverberation). The presence of reverberation in naturalistic acoustic environments affected the agent's performance, although the agent still substantially outperformed a baseline, randomly acting agent. Finally, we quantified the degree of generalization of the proposed DRL approach across naturalistic acoustic environments. Our experiments revealed that policies learned by agents trained on medium or high reverb environments generalized to low reverb environments, but policies learned by agents trained on anechoic or low reverb environments did not generalize to medium or high reverb environments. Taken together, this study demonstrates the potential of audio-driven DRL for tasks such as head-orientation control and highlights the need for training strategies that enable robust generalization across environments for real-world audio-driven DRL applications.

Updated: 2024-09-16 07:20:33

Categories: cs.SD, cs.AI, eess.AS

Download: http://arxiv.org/abs/2409.10048v1

Learning to Refuse: Towards Mitigating Privacy Risks in LLMs

Large language models (LLMs) exhibit remarkable capabilities in understanding and generating natural language. However, these models can inadvertently memorize private information, posing significant privacy risks. This study addresses the challenge of enabling LLMs to protect specific individuals' private data without the need for complete retraining. We propose RETURN, a Real-world pErsonal daTa UnleaRNing dataset, comprising 2,492 individuals from Wikipedia with associated QA pairs, to evaluate machine unlearning (MU) methods for protecting personal data in a realistic scenario. Additionally, we introduce the Name-Aware Unlearning Framework (NAUF) for Privacy Protection, which enables the model to learn which individuals' information should be protected without affecting its ability to answer questions related to other unrelated individuals. Our extensive experiments demonstrate that NAUF achieves a state-of-the-art average unlearning score, surpassing the best baseline method by 5.65 points, effectively protecting target individuals' personal data while maintaining the model's general capabilities.

Updated: 2024-09-16 07:20:13

Categories: cs.CL, cs.AI

Download: http://arxiv.org/abs/2407.10058v2

Global Lightning-Ignited Wildfires Prediction and Climate Change Projections based on Explainable Machine Learning Models

Wildfires pose a significant natural disaster risk to populations and contribute to accelerated climate change. As wildfires are also affected by climate change, extreme wildfires are becoming increasingly frequent. Although they occur less frequently globally than those sparked by human activities, lightning-ignited wildfires play a substantial role in carbon emissions and account for the majority of burned areas in certain regions. While existing computational models, especially those based on machine learning, aim to predict lightning-ignited wildfires, they are typically tailored to specific regions with unique characteristics, limiting their global applicability. In this study, we present machine learning models designed to characterize and predict lightning-ignited wildfires on a global scale. Our approach involves classifying lightning-ignited versus anthropogenic wildfires, and estimating with high accuracy the probability of lightning to ignite a fire based on a wide spectrum of factors such as meteorological conditions and vegetation. Utilizing these models, we analyze seasonal and spatial trends in lightning-ignited wildfires shedding light on the impact of climate change on this phenomenon. We analyze the influence of various features on the models using eXplainable Artificial Intelligence (XAI) frameworks. Our findings highlight significant global differences between anthropogenic and lightning-ignited wildfires. Moreover, we demonstrate that, even over a short time span of less than a decade, climate changes have steadily increased the global risk of lightning-ignited wildfires. This distinction underscores the imperative need for dedicated predictive models and fire weather indices tailored specifically to each type of wildfire.

Updated: 2024-09-16 07:19:08

Categories: cs.LG, cs.IR, physics.ao-ph

Download: http://arxiv.org/abs/2409.10046v1

Learning Latent Wireless Dynamics from Channel State Information

In this work, we propose a novel data-driven machine learning (ML) technique to model and predict the dynamics of the wireless propagation environment in latent space. Leveraging the idea of channel charting, which learns compressed representations of high-dimensional channel state information (CSI), we incorporate a predictive component to capture the dynamics of the wireless system. Hence, we jointly learn a channel encoder that maps the estimated CSI to an appropriate latent space, and a predictor that models the relationships between such representations. Accordingly, our problem boils down to training a joint-embedding predictive architecture (JEPA) that simulates the latent dynamics of a wireless network from CSI. We present numerical evaluations on measured data and show that the proposed JEPA displays a two-fold increase in accuracy over benchmarks, for longer look-ahead prediction tasks.

Updated: 2024-09-16 07:15:46

Categories: cs.LG, eess.SP

Download: http://arxiv.org/abs/2409.10045v1

Benchmarking Large Language Model Uncertainty for Prompt Optimization

Prompt optimization algorithms for Large Language Models (LLMs) excel in multi-step reasoning but still lack effective uncertainty estimation. This paper introduces a benchmark dataset to evaluate uncertainty metrics, focusing on Answer, Correctness, Aleatoric, and Epistemic Uncertainty. Through analysis of models like GPT-3.5-Turbo and Meta-Llama-3.1-8B-Instruct, we show that current metrics align more with Answer Uncertainty, which reflects output confidence and diversity, rather than Correctness Uncertainty, highlighting the need for improved metrics that are optimization-objective-aware to better guide prompt optimization. Our code and dataset are available at https://github.com/0Frett/PO-Uncertainty-Benchmarking.

Updated: 2024-09-16 07:13:30

Categories: cs.LG, cs.CL

Download: http://arxiv.org/abs/2409.10044v1

Central Answer Modeling for an Embodied Multi-LLM System

Embodied Question Answering (EQA) is an important problem, which involves an agent exploring the environment to answer user queries. In the existing literature, EQA has exclusively been studied in single-agent scenarios, where exploration can be time-consuming and costly. In this work, we consider EQA in a multi-agent framework involving multiple large language model (LLM)-based agents independently answering queries about a household environment. To generate one answer for each query, we use the individual responses to train a Central Answer Model (CAM) that aggregates responses for a robust answer. While prior Question Answering (QA) work has used a central module based on answers from multiple LLM-based experts, we specifically look at applying this framework to embodied LLM-based agents that must first physically explore their given environment to become experts on it before answering questions. Our work is the first to utilize a central answer model framework with embodied agents that must rely on exploring an unknown environment. We set up a variation of EQA where, instead of exploring the environment after the question is asked, the agents first explore the environment for a set amount of time and then answer a set of queries. Using CAM, we observe a $46\%$ higher EQA accuracy when compared against aggregation methods for LLM ensembles, such as voting schemes and debates. CAM does not require any form of agent communication, sparing it the associated costs. We ablate CAM with various nonlinear (neural network, random forest, decision tree, XGBoost) and linear (logistic regression classifier, SVM) algorithms. We experiment in various topological graph environments and examine the case where one of the agents is malicious and purposely contributes responses it believes to be wrong.
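
As a rough illustration of the aggregation idea described in this abstract, the sketch below trains a tiny logistic-regression aggregator on the yes/no answers of three agents and compares it with majority voting. It is a toy under stated assumptions, not the authors' CAM: the encoding (+1 for yes, -1 for no), the hand-made data, the learning rate, and every function name are hypothetical. Agent 0 is always right, agent 1 is sometimes right, and agent 2 always answers the opposite of the truth.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_cam(responses, labels, lr=0.5, steps=500):
    """Fit logistic-regression weights over agent answers by gradient descent."""
    n_agents = len(responses[0])
    w, b = [0.0] * n_agents, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_agents, 0.0
        for x, y in zip(responses, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(labels) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(labels)
    return w, b


def cam_answer(w, b, x):
    """Aggregated answer: 1 (yes) if the learned score is positive, else 0 (no)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def majority_vote(x):
    return 1 if sum(x) > 0 else 0


# Rows are queries, columns are agents 0..2; labels are the true answers.
responses = [[+1, +1, -1], [+1, -1, -1], [-1, -1, +1],
             [-1, +1, +1], [+1, +1, -1], [-1, -1, +1]]
labels = [1, 1, 0, 0, 1, 0]
w, b = train_cam(responses, labels)

query = [+1, -1, -1]  # true answer is "yes"; only agent 0 says so
cam_result = cam_answer(w, b, query)
vote_result = majority_vote(query)
```

In this toy setting the learned weight for the adversarial agent is negative, so its answers are effectively inverted rather than counted, which is one way a trained aggregator can outperform voting without any communication between agents.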

Updated: 2024-09-16 07:12:12

Categories: cs.LG, cs.AI, cs.CL

Download: http://arxiv.org/abs/2406.10918v4

On the Diagram of Thought

We introduce Diagram of Thought (DoT), a framework that models iterative reasoning in large language models (LLMs) as the construction of a directed acyclic graph (DAG) within a single model. Unlike traditional approaches that represent reasoning as linear chains or trees, DoT organizes propositions, critiques, refinements, and verifications into a cohesive DAG structure, allowing the model to explore complex reasoning pathways while maintaining logical consistency. Each node in the diagram corresponds to a proposition that has been proposed, critiqued, refined, or verified, enabling the LLM to iteratively improve its reasoning through natural language feedback. By leveraging auto-regressive next-token prediction with role-specific tokens, DoT facilitates seamless transitions between proposing ideas and critically evaluating them, providing richer feedback than binary signals. Furthermore, we formalize the DoT framework using Topos Theory, providing a mathematical foundation that ensures logical consistency and soundness in the reasoning process. This approach enhances both the training and inference processes within a single LLM, eliminating the need for multiple models or external control mechanisms. DoT offers a conceptual framework for designing next-generation reasoning-specialized models, emphasizing training efficiency, robust reasoning capabilities, and theoretical grounding. The code is available at https://github.com/diagram-of-thought/diagram-of-thought.

Updated: 2024-09-16 07:01:41

Categories: cs.CL, cs.AI, cs.LG

Download: http://arxiv.org/abs/2409.10038v1

Graph Neural Networks for Parkinsons Disease Detection

Despite the promising performance of state-of-the-art approaches for Parkinson's Disease (PD) detection, these approaches often analyze individual speech segments in isolation, which can lead to suboptimal results. Dysarthric cues that characterize speech impairments from PD patients are expected to be related across segments from different speakers. Isolated segment analysis fails to exploit these inter-segment relationships. Additionally, not all speech segments from PD patients exhibit clear dysarthric symptoms, introducing label noise that can negatively affect the performance and generalizability of current approaches. To address these challenges, we propose a novel PD detection framework utilizing Graph Convolutional Networks (GCNs). By representing speech segments as nodes and capturing the similarity between segments through edges, our GCN model facilitates the aggregation of dysarthric cues across the graph, effectively exploiting segment relationships and mitigating the impact of label noise. Experimental results demonstrate the advantages of the proposed GCN model for PD detection and provide insights into its underlying mechanisms.
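
The graph construction described in this abstract (segments as nodes, similarity as edges, aggregation across neighbours) can be sketched as follows. This is a minimal illustration rather than the paper's model: cosine similarity with a fixed threshold is an assumed way to define edges, the propagation step is the standard symmetric-normalized GCN aggregation without learned weights or a nonlinearity, and the function names and feature values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def build_adjacency(features, threshold):
    """Connect segments whose similarity reaches the threshold; add self-loops."""
    n = len(features)
    return [
        [1.0 if i == j or cosine(features[i], features[j]) >= threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def gcn_aggregate(adj, features):
    """One propagation step D^{-1/2} A D^{-1/2} H, where A already has self-loops."""
    deg = [sum(row) for row in adj]
    n, d = len(features), len(features[0])
    return [
        [sum(adj[i][j] / math.sqrt(deg[i] * deg[j]) * features[j][k] for j in range(n))
         for k in range(d)]
        for i in range(n)
    ]


# Three segment embeddings: the first two are similar, the third is not.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = build_adjacency(segments, threshold=0.8)
smoothed = gcn_aggregate(adj, segments)
```

Averaging over similar segments pulls the first two embeddings together while leaving the isolated third one unchanged, which hints at how neighbourhood aggregation might dampen the effect of a single noisy label.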

Updated: 2024-09-16 07:00:15

Categories: cs.LG, eess.AS

Download: http://arxiv.org/abs/2409.07884v3

Multi-agent Attacks for Black-box Social Recommendations

The rise of online social networks has facilitated the evolution of social recommender systems, which incorporate social relations to enhance users' decision-making process. With the great success of Graph Neural Networks (GNNs) in learning node representations, GNN-based social recommendations have been widely studied to model user-item interactions and user-user social relations simultaneously. Despite their great successes, recent studies have shown that these advanced recommender systems are highly vulnerable to adversarial attacks, in which attackers can inject well-designed fake user profiles to disrupt recommendation performance. While most existing studies mainly focus on targeted attacks to promote target items on vanilla recommender systems, untargeted attacks to degrade the overall prediction performance are less explored on social recommendations under a black-box scenario. To perform untargeted attacks on social recommender systems, attackers can construct malicious social relationships for fake users to enhance the attack performance. However, the coordination of social relations and item profiles is challenging for attacking black-box social recommendations. To address this limitation, we first conduct several preliminary studies to demonstrate the effectiveness of cross-community connections and cold-start items in degrading recommendation performance. Specifically, we propose a novel framework, MultiAttack, based on multi-agent reinforcement learning to coordinate the generation of cold-start item profiles and cross-community social relations for conducting untargeted attacks on black-box social recommendations. Comprehensive experiments on various real-world datasets demonstrate the effectiveness of our proposed attacking framework under the black-box setting.

Updated: 2024-09-16 06:51:46

Categories: cs.SI, cs.AI

Download: http://arxiv.org/abs/2311.07127v4

Can GPT-O1 Kill All Bugs?

ChatGPT has long been proven to be effective in automatic program repair (APR). With the continuous iterations and upgrades of the ChatGPT version, its performance in terms of fixes has already reached state-of-the-art levels. However, there are few works comparing the effectiveness and variations of different versions of ChatGPT on APR. In this work, we evaluate the performance of the latest version of ChatGPT (O1-preview and O1-mini), ChatGPT-4o, and historical version of ChatGPT on APR. We study the improvements of the O1 model over traditional ChatGPT in terms of APR from multiple perspectives (repair success rate, repair cost, behavior patterns), and find that O1's repair capability exceeds that of traditional ChatGPT, successfully fixing all 40 bugs in the benchmark. Our work can serve as a reference for further in-depth exploration of the applications of ChatGPT in APR.

Updated: 2024-09-16 06:51:32

Categories: cs.SE, cs.AI

Download: http://arxiv.org/abs/2409.10033v1

Regret Analysis for Randomized Gaussian Process Upper Confidence Bound

Gaussian process upper confidence bound (GP-UCB) is a theoretically established algorithm for Bayesian optimization (BO), where we assume the objective function $f$ follows a GP. One notable drawback of GP-UCB is that the theoretical confidence parameter $\beta$, which increases with the number of iterations, is too large. To alleviate this drawback, this paper analyzes the randomized variant of GP-UCB called improved randomized GP-UCB (IRGP-UCB), which uses a confidence parameter generated from the shifted exponential distribution. We analyze the expected regret and conditional expected regret, where the expectation and the probability are taken with respect to $f$ and the noise and with respect to the randomness of the BO algorithm, respectively. In both regret analyses, IRGP-UCB achieves a sub-linear regret upper bound without increasing the confidence parameter if the input domain is finite. Finally, we show numerical experiments using synthetic and benchmark functions and real-world emulators.
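
The randomized acquisition step described in this abstract can be sketched as below: draw the confidence parameter from a shifted exponential distribution, then pick the candidate that maximizes the posterior mean plus the scaled posterior standard deviation. This is a minimal sketch under assumptions. The `shift` and `rate` values are placeholders supplied by the caller (the abstract does not state them), the posterior means and standard deviations are taken as given rather than computed from a GP, and the function names are hypothetical.

```python
import math
import random


def sample_confidence(shift, rate, rng):
    """Draw beta = shift + Exp(rate); shift is assumed non-negative."""
    return shift + rng.expovariate(rate)


def ucb_argmax(mu, sigma, beta):
    """Index of the candidate maximizing mu + sqrt(beta) * sigma."""
    scores = [m + math.sqrt(beta) * s for m, s in zip(mu, sigma)]
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
# Hypothetical GP posterior over three candidate inputs in a finite domain.
mu = [0.2, 0.5, 0.1]
sigma = [0.9, 0.1, 0.3]
beta = sample_confidence(shift=1.0, rate=0.5, rng=rng)
next_index = ucb_argmax(mu, sigma, beta)
```

Because the confidence parameter is resampled at every iteration instead of growing on a fixed schedule, the amount of exploration varies from step to step while its distribution stays the same.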

Updated: 2024-09-16 06:46:32

Categories: cs.LG, stat.ML

Download: http://arxiv.org/abs/2409.00979v2

Assessing the Impact of Sanctions in the Crypto Ecosystem: Effective Measures or Ineffective Deterrents?

Regulatory authorities aim to tackle illegal activities by targeting the economic incentives that drive such behaviour. This is typically achieved through the implementation of financial sanctions against the entities involved in the crimes. However, the rise of cryptocurrencies has presented new challenges, allowing entities to evade these sanctions and continue criminal operations. Consequently, enforcement measures have been expanded to include crypto assets information of sanctioned entities. Yet, due to the nature of the crypto ecosystem, blocking or freezing these digital assets is harder and, in some cases, such as with Bitcoin, unfeasible. Therefore, sanctions serve merely as deterrents. For this reason, in this study, we aim to assess the impact of these sanctions on entities' crypto activities, particularly those related to the Bitcoin ecosystem. Our objective is to shed light on the validity and effectiveness (or lack thereof) of such countermeasures. Specifically, we analyse the transactions and the amount of USD moved by punished entities that possess crypto addresses after being sanctioned by the authority agency. Results indicate that while sanctions have been effective for half of the examined entities, the others continue to move funds through sanctioned addresses. Furthermore, punished entities demonstrate a preference for utilising rapid exchange services to convert their funds, rather than employing dedicated money laundering services. To the best of our knowledge, this study offers valuable insights into how entities use crypto assets to circumvent sanctions.

Updated: 2024-09-16 06:43:45

Categories: cs.CR, cs.CE

Download: http://arxiv.org/abs/2409.10031v1

AttnMod: Attention-Based New Art Styles

Imagine a human artist looking at a photo generated by a diffusion model and hoping to create a painting out of it. There could be some feature of the object in the photo that the artist wants to emphasize, some color to disperse, some silhouette to twist, or some part of the scene to be materialized. These intentions can be viewed as modifications of the cross attention from the text prompt onto the UNet during denoising diffusion. This work presents AttnMod, which modifies attention to create new unpromptable art styles out of existing diffusion models. The style-creating behavior is studied across different setups.

Updated: 2024-09-16 06:38:25

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2409.10028v1

E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models

Large language models (LLMs) have shown significant potential in guiding embodied agents to execute language instructions across a range of tasks, including robotic manipulation and navigation. However, existing methods are primarily designed for static environments and do not leverage the agent's own experiences to refine its initial plans. Given that real-world environments are inherently stochastic, initial plans based solely on LLMs' general knowledge may fail to achieve their objectives, unlike in static scenarios. To address this limitation, this study introduces the Experience-and-Emotion Map (E2Map), which integrates not only LLM knowledge but also the agent's real-world experiences, drawing inspiration from human emotional responses. The proposed methodology enables one-shot behavior adjustments by updating the E2Map based on the agent's experiences. Our evaluation in stochastic navigation environments, including both simulations and real-world scenarios, demonstrates that the proposed method significantly enhances performance in stochastic environments compared to existing LLM-based approaches. Code and supplementary materials are available at https://e2map.github.io/.

Updated: 2024-09-16 06:35:18

Categories: cs.RO, cs.AI

Download: http://arxiv.org/abs/2409.10027v1

NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention

In the study of auditory attention, it has been revealed that there exists a robust correlation between attended speech and elicited neural responses, measurable through electroencephalography (EEG). Therefore, it is possible to use the attention information available within EEG signals to guide the extraction of the target speaker in a cocktail party computationally. In this paper, we present a neuro-guided speaker extraction model, i.e. NeuroSpex, using the EEG response of the listener as the sole auxiliary reference cue to extract attended speech from monaural speech mixtures. We propose a novel EEG signal encoder that captures the attention information. Additionally, we propose a cross-attention (CA) mechanism to enhance the speech feature representations, generating a speaker extraction mask. Experimental results on a publicly available dataset demonstrate that our proposed model outperforms two baseline models across various evaluation metrics.

Updated: 2024-09-16 06:35:07

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2409.02489v2

Reinforcement learning-based statistical search strategy for an axion model from flavor

We propose a reinforcement learning-based search strategy to explore new physics beyond the Standard Model. Reinforcement learning, a machine learning method, is a powerful approach to finding model parameters that satisfy phenomenological constraints. As a concrete example, we focus on a minimal axion model with a global $U(1)$ flavor symmetry. The learning agents succeed in finding $U(1)$ charge assignments of quarks and leptons that solve the flavor and cosmological puzzles in the Standard Model, and they find more than 150 realistic solutions for the quark sector when renormalization effects are taken into account. For the solutions found by the reinforcement learning-based analysis, we discuss the sensitivity of future experiments to the detection of an axion, which is a Nambu-Goldstone boson of the spontaneously broken $U(1)$. We also examine how quickly the reinforcement learning-based search method finds the best discrete parameters in comparison with conventional optimization methods. In conclusion, the efficient parameter search enabled by the reinforcement learning-based strategy allows us to perform a statistical analysis of the vast parameter space associated with the axion model from flavor.

Updated: 2024-09-16 06:21:21

Categories: hep-ph,cs.LG,hep-th

Download: http://arxiv.org/abs/2409.10023v1

Li-MSD: A lightweight mitigation solution for DAO insider attack in RPL-based IoT

Many IoT applications run on a wireless infrastructure supported by resource-constrained nodes, popularly known as Low-Power and Lossy Networks (LLNs). Currently, LLNs play a vital role in the digital transformation of industries. The resource limitations of LLNs restrict the use of traditional routing protocols and therefore call for an energy-efficient routing solution. The IETF's Routing Protocol for Low-Power and Lossy Networks (RPL, pronounced 'ripple'), specified in RFC 6550, is one of the most popular energy-efficient protocols for LLNs. In RPL, a Destination Advertisement Object (DAO) control message is transmitted by a child node to pass on its reachability information to its immediate parent or root node. An attacker may exploit RPL's insecure DAO sending mechanism to perform a 'DAO insider attack' by transmitting DAOs repeatedly. This paper shows that an aggressive DAO insider attacker can drastically degrade network performance. We propose a lightweight mitigation solution for the DAO insider attack, termed 'Li-MSD'. Li-MSD uses a blacklisting strategy to mitigate the attack and significantly restore RPL performance. Simulations show that Li-MSD outperforms the existing solution in the literature.

Updated: 2024-09-16 06:17:20

Categories: cs.NI,cs.CR

Download: http://arxiv.org/abs/2409.10020v1

AceParse: A Comprehensive Dataset with Diverse Structured Texts for Academic Literature Parsing

With the development of data-centric AI, the focus has shifted from model-driven approaches to improving data quality. Academic literature, a crucial type of data, is predominantly stored in PDF format and needs to be parsed into text before further processing. However, parsing diverse structured texts in academic literature remains challenging due to the lack of datasets that cover various text structures. In this paper, we introduce AceParse, the first comprehensive dataset designed to support the parsing of a wide range of structured texts, including formulas, tables, lists, algorithms, and sentences with embedded mathematical expressions. Based on AceParse, we fine-tuned a multimodal model, named AceParser, which accurately parses various structured texts within academic literature. This model outperforms the previous state-of-the-art by 4.1% in terms of F1 score and by 5% in Jaccard Similarity, demonstrating the potential of multimodal models in academic literature parsing. Our dataset is available at https://github.com/JHW5981/AceParse.

Updated: 2024-09-16 06:06:34

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.10016v1

Reinforcement Learning with Quasi-Hyperbolic Discounting

Reinforcement learning has traditionally been studied with exponential discounting or the average reward setup, mainly due to their mathematical tractability. However, such frameworks fall short of accurately capturing human behavior, which has a bias towards immediate gratification. Quasi-Hyperbolic (QH) discounting is a simple alternative for modeling this bias. Unlike in traditional discounting, though, the optimal QH-policy, starting from some time $t_1,$ can be different to the one starting from $t_2.$ Hence, the future self of an agent, if it is naive or impatient, can deviate from the policy that is optimal at the start, leading to sub-optimal overall returns. To prevent this behavior, an alternative is to work with a policy anchored in a Markov Perfect Equilibrium (MPE). In this work, we propose the first model-free algorithm for finding an MPE. Using a two-timescale analysis, we show that, if our algorithm converges, then the limit must be an MPE. We also validate this claim numerically for the standard inventory system with stochastic demands. Our work significantly advances the practical application of reinforcement learning.

Updated: 2024-09-16 06:00:42

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.10583v1
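
The time inconsistency described in the abstract above can be illustrated with a small sketch. It assumes the common beta-delta form of quasi-hyperbolic discounting, in which a reward k steps ahead is weighted 1 for k = 0 and sigma * gamma**k for k >= 1. The parameter names `sigma` and `gamma`, the reward values, and the function names are illustrative assumptions and are not taken from the paper.

```python
def qh_weight(k, sigma=0.5, gamma=0.9):
    """Weight of a reward received k steps from now (assumed beta-delta form)."""
    return 1.0 if k == 0 else sigma * gamma ** k


def preferred_option(delay_small, small=10.0, large=15.0, sigma=0.5, gamma=0.9):
    """Compare a small reward after `delay_small` steps with a large one a step later."""
    v_small = qh_weight(delay_small, sigma, gamma) * small
    v_large = qh_weight(delay_small + 1, sigma, gamma) * large
    return "small" if v_small > v_large else "large"


# Viewed from a distance (small reward in 5 steps, large in 6), waiting looks better.
far = preferred_option(delay_small=5)
# Once the small reward is available immediately, the same agent takes it.
near = preferred_option(delay_small=0)
# With sigma = 1 the weights are purely exponential and the preference does not flip.
exponential_near = preferred_option(delay_small=0, sigma=1.0)
print(far, near, exponential_near)
```

The preference reversal between `far` and `near` is the behavior that makes a policy that was optimal at the start unattractive to the agent's future self.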

HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making

Large language models (LLMs) have significantly advanced natural language processing tasks, yet they are susceptible to generating inaccurate or unreliable responses, a phenomenon known as hallucination. In critical domains such as health and medicine, these hallucinations can pose serious risks. This paper introduces HALO, a novel framework designed to enhance the accuracy and reliability of medical question-answering (QA) systems by focusing on the detection and mitigation of hallucinations. Our approach generates multiple variations of a given query using LLMs and retrieves relevant information from external open knowledge bases to enrich the context. We utilize maximum marginal relevance scoring to prioritize the retrieved context, which is then provided to LLMs for answer generation, thereby reducing the risk of hallucinations. The integration of LangChain further streamlines this process, resulting in a notable and robust increase in the accuracy of both open-source and commercial LLMs, such as Llama-3.1 (from 44% to 65%) and ChatGPT (from 56% to 70%). This framework underscores the critical importance of addressing hallucinations in medical QA systems, ultimately improving clinical decision-making and patient care. The open-source HALO is available at: https://github.com/ResponsibleAILab/HALO.

Updated: 2024-09-16 05:50:39

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.10011v1
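
The maximum marginal relevance (MMR) scoring mentioned in the abstract above can be sketched as follows. This is a generic MMR selection loop over toy embedding vectors, not HALO's actual pipeline: the vectors, the trade-off parameter `lam`, and the function names are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, docs, k, lam=0.7):
    """Pick k document indices, trading relevance to the query against redundancy."""
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query, docs[i])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.8], [0.0, 1.0]]  # docs 0 and 1 are near-duplicates
diverse = mmr_select(query, docs, k=2, lam=0.5)
relevance_only = mmr_select(query, docs, k=2, lam=1.0)
print(diverse, relevance_only)
```

With `lam=1.0` the selection reduces to plain relevance ranking and returns the two near-duplicates; lowering `lam` swaps the second pick for the less similar document.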

SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL

In recent years, Text-to-SQL, the problem of automatically converting questions posed in natural language into formal SQL queries, has emerged as an important problem at the intersection of natural language processing and data management research. Large language models (LLMs) have delivered impressive performance when used off the shelf, but still fall significantly short of expected expert-level performance. Errors are especially likely when proper Text-to-SQL conversion requires a nuanced understanding of database schemas, questions, and SQL clauses. We introduce SelECT-SQL, a novel in-context learning solution that uses an algorithmic combination of chain-of-thought (CoT) prompting, self-correction, and ensemble methods to yield a new state-of-the-art result on challenging Text-to-SQL benchmarks. Specifically, when configured with GPT-3.5-Turbo as the base LLM, SelECT-SQL achieves 84.2% execution accuracy on the Spider leaderboard's development set, exceeding both the best result of other GPT-3.5-Turbo-based baseline solutions (81.1%) and the peak performance (83.5%) of the GPT-4 result reported on the leaderboard.

Updated: 2024-09-16 05:40:18

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.10007v1

Fully Spiking Neural Network for Legged Robots

Recent advancements in legged robots using deep reinforcement learning have led to significant progress. Quadruped robots can perform complex tasks in challenging environments, while bipedal and humanoid robots have also achieved breakthroughs. Current reinforcement learning methods leverage diverse robot bodies and historical information to perform actions, but previous research has not emphasized the speed and energy consumption of network inference or the biological significance of neural networks. Most networks are traditional artificial neural networks that utilize multilayer perceptrons (MLPs). This paper presents a novel Spiking Neural Network (SNN) for legged robots that shows exceptional performance in various simulated terrains. SNNs provide natural advantages in inference speed and energy consumption, and their pulse-form processing enhances biological interpretability. This study presents a highly efficient SNN for legged robots that can be seamlessly integrated into other learning models.

Updated: 2024-09-16 05:35:27

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2310.05022v3

CCE: Sample Efficient Sparse Reward Policy Learning for Robotic Navigation via Confidence-Controlled Exploration

We introduce Confidence-Controlled Exploration (CCE), a novel exploration scheme designed to enhance the training sample efficiency of reinforcement learning (RL) algorithms for sparse reward settings such as robot navigation. Sparse rewards are common in RL and convenient to design and implement, but typically hard to deal with due to the challenges of exploration. Existing methods deploy regularization-based methods to deal with the exploration challenges. However, it is hard to characterize the balance between exploration and exploitation because regularization modifies the reward function itself, hence changing the objective we are optimizing for. In contrast to regularization-based approaches in the existing literature, our approach, CCE, is based on a novel relationship we provide between gradient estimation and policy entropy. CCE dynamically adjusts the number of samples of the gradient update used during training to control exploration. Interestingly, CCE can be applied to both existing on-policy and off-policy RL methods, which we demonstrate by empirically validating its efficacy on three popular RL methods: REINFORCE, Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC) for goal-reaching robotic navigation tasks. We demonstrate through simulated and real-world experiments that CCE outperforms conventional methods that employ constant trajectory lengths and entropy regularization when constraining the sample budget. For a fixed sample budget, CCE achieves an 18% increase in navigation success rate, a 20-38% reduction in navigation path length, and a 9.32% decrease in elevation costs. Furthermore, we showcase the versatility of CCE by integrating it with the Clearpath Husky robot, illustrating its applicability in complex outdoor environments.

Updated: 2024-09-16 05:34:01

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2306.06192v7

ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs

Large language models (LLMs), while powerful, exhibit harmful social biases. Debiasing is often challenging due to computational costs, data constraints, and potential degradation of multi-task language capabilities. This work introduces a novel approach utilizing ChatGPT to generate synthetic training data, aiming to enhance the debiasing of LLMs. We propose two strategies: Targeted Prompting, which provides effective debiasing for known biases but requires specifying the bias in question in advance; and General Prompting, which, while slightly less effective, offers debiasing across various categories. We leverage resource-efficient LLM debiasing using adapter tuning and compare the effectiveness of our synthetic data to existing debiasing datasets. Our results reveal that: (1) ChatGPT can efficiently produce high-quality training data for debiasing other LLMs; (2) data produced via our approach surpasses existing datasets in debiasing performance while also preserving internal knowledge of a pre-trained LLM; and (3) synthetic data exhibits generalizability across categories, effectively mitigating various biases, including intersectional ones. These findings underscore the potential of synthetic data in advancing the fairness of LLMs with minimal retraining cost.

Updated: 2024-09-16 05:28:43

Categories: cs.CL,cs.AI,cs.CY,68T50,I.2.7; K.4.1

Download: http://arxiv.org/abs/2402.11764v2

V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models

Advancements in autonomous driving have increasingly focused on end-to-end (E2E) systems that manage the full spectrum of driving tasks, from environmental perception to vehicle navigation and control. This paper introduces V2X-VLM, an innovative E2E vehicle-infrastructure cooperative autonomous driving (VICAD) framework with Vehicle-to-Everything (V2X) systems and large vision-language models (VLMs). V2X-VLM is designed to enhance situational awareness, decision-making, and ultimate trajectory planning by integrating multimodal data from vehicle-mounted cameras, infrastructure sensors, and textual information. A contrastive learning method is further employed to complement the VLM by refining feature discrimination, helping the model learn robust representations of the driving environment. Evaluations on the DAIR-V2X dataset show that V2X-VLM outperforms state-of-the-art cooperative autonomous driving methods, while additional tests on corner cases validate its robustness in real-world driving conditions.

Updated: 2024-09-16 05:23:07

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.09251v2

Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks based on human instructions, paving the way to artificial general intelligence (AGI)-enabled 6G. Given the great potential of LLM technologies, this work aims to provide a comprehensive overview of LLM-enabled telecom networks. In particular, we first present LLM fundamentals, including model architecture, pre-training, fine-tuning, inference and utilization, model evaluation, and telecom deployment. Then, we introduce LLM-enabled key techniques and telecom applications in terms of generation, classification, optimization, and prediction problems. Specifically, the LLM-enabled generation applications include telecom domain knowledge, code, and network configuration generation. The LLM-based classification applications involve network security, text, image, and traffic classification problems. Moreover, multiple LLM-enabled optimization techniques are introduced, such as automated reward function design for reinforcement learning and verbal reinforcement learning. Furthermore, for LLM-aided prediction problems, we discuss time-series prediction models and multi-modality prediction problems for telecom. Finally, we highlight the challenges and identify the future directions of LLM-enabled telecom networks.

Updated: 2024-09-16 05:09:57

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2405.10825v2

FreeMark: A Non-Invasive White-Box Watermarking for Deep Neural Networks

Deep neural networks (DNNs) have achieved significant success in real-world applications. However, safeguarding their intellectual property (IP) remains extremely challenging. Existing DNN watermarking methods for IP protection often require modifying DNN models, which reduces model performance and limits their practicality. This paper introduces FreeMark, a novel DNN watermarking framework that leverages cryptographic principles without altering the original host DNN model, thereby avoiding any reduction in model performance. Unlike traditional DNN watermarking methods, FreeMark generates secret keys from a pre-generated watermark vector and the host model using gradient descent. These secret keys, which are used to extract the watermark from the model's activation values, are securely stored with a trusted third party, enabling reliable watermark extraction from suspect models. Extensive experiments demonstrate that FreeMark effectively resists various watermark removal attacks while maintaining high watermark capacity.

Updated: 2024-09-16 05:05:03

Categories: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2409.09996v1

SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning

The ability of neural networks to perform robotic perception and control tasks such as depth and optical flow estimation, simultaneous localization and mapping (SLAM), and automatic control has led to their widespread adoption in recent years. Deep Reinforcement Learning (Deep RL) has been used extensively in these settings, as it does not have the unsustainable training costs associated with supervised learning. However, Deep RL suffers from poor sample efficiency, i.e., it requires a large number of environmental interactions to converge to an acceptable solution. Modern RL algorithms such as Deep Q Learning and Soft Actor-Critic attempt to remedy this shortcoming but cannot provide the explainability required in applications such as autonomous robotics. Humans intuitively understand the long-time-horizon sequential tasks common in robotics. Properly using such intuition can make RL policies more explainable while enhancing their sample efficiency. In this work, we propose SHIRE, a novel framework for encoding human intuition using Probabilistic Graphical Models (PGMs) and using it in the Deep RL training pipeline to enhance sample efficiency. Our framework achieves 25-78% sample efficiency gains across the environments we evaluate at negligible overhead cost. Additionally, by teaching RL agents the encoded elementary behavior, SHIRE enhances policy explainability. A real-world demonstration further highlights the efficacy of policies trained using our framework.

Updated: 2024-09-16 04:46:22

Categories: cs.LG,cs.NE,cs.RO

Download: http://arxiv.org/abs/2409.09990v1

Comprehensive Study on Sentiment Analysis: From Rule-based to modern LLM based system

This paper provides a comprehensive survey of sentiment analysis within the context of artificial intelligence (AI) and large language models (LLMs). Sentiment analysis, a critical aspect of natural language processing (NLP), has evolved significantly from traditional rule-based methods to advanced deep learning techniques. This study examines the historical development of sentiment analysis, highlighting the transition from lexicon-based and pattern-based approaches to more sophisticated machine learning and deep learning models. Key challenges are discussed, including handling bilingual texts, detecting sarcasm, and addressing biases. The paper reviews state-of-the-art approaches, identifies emerging trends, and outlines future research directions to advance the field. By synthesizing current methodologies and exploring future opportunities, this survey aims to understand sentiment analysis in the AI and LLM context thoroughly.

Updated: 2024-09-16 04:44:52

Categories: cs.CL,cs.AI,cs.CY,cs.HC

Download: http://arxiv.org/abs/2409.09989v1

Graphical Structural Learning of rs-fMRI data in Heavy Smokers

Recent studies revealed structural and functional brain changes in heavy smokers. However, the specific changes in topological brain connections are not well understood. We used Gaussian Undirected Graphs with the graphical lasso algorithm on rs-fMRI data from smokers and non-smokers to identify significant changes in brain connections. Our results indicate high stability in the estimated graphs and identify several brain regions significantly affected by smoking, providing valuable insights for future clinical research.

Updated: 2024-09-16 04:42:10

Categories: q-bio.QM,cs.LG,stat.AP

Download: http://arxiv.org/abs/2409.08395v2

Convergence of Sharpness-Aware Minimization Algorithms using Increasing Batch Size and Decaying Learning Rate

The sharpness-aware minimization (SAM) algorithm and its variants, including gap guided SAM (GSAM), have been successful at improving the generalization capability of deep neural network models by finding flat local minima of the empirical loss in training. Meanwhile, it has been shown theoretically and practically that increasing the batch size or decaying the learning rate avoids sharp local minima of the empirical loss. In this paper, we consider the GSAM algorithm with increasing batch sizes or decaying learning rates, such as cosine annealing or linear learning rate, and theoretically show its convergence. Moreover, we numerically compare SAM (GSAM) with and without an increasing batch size and conclude that using an increasing batch size or decaying learning rate finds flatter local minima than using a constant batch size and learning rate.

Updated: 2024-09-16 04:27:11

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2409.09984v1
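
To make the training setup in the abstract above concrete, here is a minimal sketch that combines a basic SAM step with a cosine-annealed learning rate and a growing batch size. It uses plain SAM rather than GSAM, a one-dimensional toy loss, synthetic data, and a hypothetical doubling schedule; none of these choices come from the paper.

```python
import math
import random


def cosine_lr(step, total_steps, lr_max=0.5, lr_min=0.0):
    """Cosine-annealed learning rate, decaying from lr_max to lr_min."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))


def batch_size(step, base=8, double_every=15, cap=256):
    """Batch size that doubles every `double_every` steps (hypothetical schedule)."""
    return min(cap, base * 2 ** (step // double_every))


def sam_step(w, batch, lr, rho=0.05):
    """One SAM update for the toy loss 0.5 * mean((w - x)^2) over the batch."""
    mean_x = sum(batch) / len(batch)
    grad = w - mean_x                                   # gradient at w
    eps = rho * grad / abs(grad) if grad != 0 else 0.0  # ascent step of norm rho
    grad_perturbed = (w + eps) - mean_x                 # gradient at the perturbed point
    return w - lr * grad_perturbed


def train(total_steps=60, seed=0):
    rng = random.Random(seed)
    # Synthetic, deterministic data with every value in [2, 4].
    data = [3.0 + ((i * 37) % 101 - 50) / 50 for i in range(256)]
    w = 0.0
    for step in range(total_steps):
        batch = rng.sample(data, batch_size(step))
        w = sam_step(w, batch, cosine_lr(step, total_steps))
    return w


w_final = train()
print(round(w_final, 3))
```

In one dimension the normalized gradient is just its sign, so the perturbation has magnitude `rho`; for vector parameters the same step would divide by the gradient norm.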

Elliptic Curve Pairing Stealth Address Protocols

Protecting the privacy of blockchain transactions is extremely important for users. Stealth address protocols (SAPs) allow users to receive assets via stealth addresses that cannot be associated with their stealth meta-addresses. SAPs can be built using different cryptographic approaches. DKSAP uses an elliptic curve multiplication and hashing of the resulting shared secret. Another approach is to use an elliptic curve pairing. This paper presents four SA protocols that use elliptic curve pairing as a cryptographic solution. ECPDKSAPs are pairing-based protocols that include a viewing key and a spending key, while ECPSKSAP is a pairing-based protocol that uses a single key from which the spending and viewing keys are derived. We find that ECPDKSAPs give significantly better results than DKSAP with the view tag. The best results are achieved with Protocol 3 (Elliptic Curve Pairing Dual Key Stealth Address Protocol), which is Ethereum-friendly. ECPSKSAP is significantly slower, but it provides an interesting theoretical result as it uses only one private key.

Updated: 2024-09-16 04:25:46

Categories: cs.CR

Download: http://arxiv.org/abs/2312.12131v4
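
The dual-key idea behind DKSAP (a shared secret obtained by group multiplication and then hashed) can be sketched with a toy example. To stay within the Python standard library, this sketch swaps the elliptic curve for modular exponentiation in the multiplicative group modulo a prime, so it is not the paper's construction and is not secure. The key roles follow the commonly described dual-key scheme; the modulus, base element, and all function names are assumptions, and neither the view tag nor any pairing is modeled.

```python
import hashlib
import secrets

P = 2**127 - 1   # Mersenne prime used as a toy modulus, NOT a secure choice
G = 3            # assumed base element of the toy group
ORDER = P - 1    # exponents can be reduced modulo P - 1 (Fermat's little theorem)


def keypair():
    """Return (private exponent, public element G**priv mod P)."""
    priv = secrets.randbelow(ORDER - 1) + 1
    return priv, pow(G, priv, P)


def hash_to_scalar(shared):
    """Hash a shared group element to an exponent."""
    digest = hashlib.sha256(shared.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % ORDER


def sender_make_stealth(spend_pub, view_pub):
    """Sender: derive a one-time stealth public key and the ephemeral public key."""
    r, ephemeral_pub = keypair()
    s = hash_to_scalar(pow(view_pub, r, P))        # shared secret from the viewing key
    stealth_pub = (spend_pub * pow(G, s, P)) % P
    return ephemeral_pub, stealth_pub


def recipient_scan(ephemeral_pub, stealth_pub, spend_priv, spend_pub, view_priv):
    """Recipient: return the stealth private key if the address is theirs, else None."""
    s = hash_to_scalar(pow(ephemeral_pub, view_priv, P))
    if (spend_pub * pow(G, s, P)) % P != stealth_pub:
        return None
    return (spend_priv + s) % ORDER


spend_priv, spend_pub = keypair()
view_priv, view_pub = keypair()
ephemeral_pub, stealth_pub = sender_make_stealth(spend_pub, view_pub)
stealth_priv = recipient_scan(ephemeral_pub, stealth_pub, spend_priv, spend_pub, view_priv)
print(pow(G, stealth_priv, P) == stealth_pub)
```

Only the viewing key is needed to recognize an incoming payment, while spending additionally requires the spending private key; that separation is what the "dual key" in the protocol name refers to.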

From Bytes to Bites: Using Country Specific Machine Learning Models to Predict Famine

Hunger crises are critical global issues affecting millions, particularly in low-income and developing countries. This research investigates how machine learning can be utilized to predict and inform decisions regarding famine and hunger crises. By leveraging a diverse set of variables (natural, economic, and conflict-related), three machine learning models (Linear Regression, XGBoost, and RandomForestRegressor) were employed to predict food consumption scores, a key indicator of household nutrition. The RandomForestRegressor emerged as the most accurate model, with an average prediction error of 10.6%, though accuracy varied significantly across countries, ranging from 2% to over 30%. Notably, economic indicators were consistently the most significant predictors of average household nutrition, while no single feature dominated across all regions, underscoring the necessity for comprehensive data collection and tailored, country-specific models. These findings highlight the potential of machine learning, particularly Random Forests, to enhance famine prediction, suggesting that continued research and improved data gathering are essential for more effective global hunger forecasting.

Updated: 2024-09-16 04:23:06

领域: cs.LG

下载: http://arxiv.org/abs/2409.09980v1

WaveMixSR-V2: Enhancing Super-resolution with Higher Efficiency

Recent advancements in single image super-resolution have been predominantly driven by token mixers and transformer architectures. WaveMixSR utilized the WaveMix architecture, employing a two-dimensional discrete wavelet transform for spatial token mixing, achieving superior performance in super-resolution tasks with remarkable resource efficiency. In this work, we present an enhanced version of the WaveMixSR architecture by (1) replacing the traditional transpose convolution layer with a pixel shuffle operation and (2) implementing a multistage design for higher resolution tasks ($4\times$). Our experiments demonstrate that our enhanced model -- WaveMixSR-V2 -- outperforms other architectures in multiple super-resolution tasks, achieving state-of-the-art results on the BSD100 dataset, while also consuming fewer resources and exhibiting higher parameter efficiency, lower latency, and higher throughput. Our code is available at https://github.com/pranavphoenix/WaveMixSR.
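
The pixel shuffle operation mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not taken from the WaveMixSR-V2 code; it assumes the common convention in which an input of shape (C*r*r, H, W) is rearranged into (C, H*r, W*r), and the function name `pixel_shuffle` is chosen here for illustration.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange an array of shape (C*r*r, H, W) into (C, H*r, W*r).

    Assumed convention: out[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w].
    """
    c_in, h, w = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_in // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)     # reorder axes to (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)  # merge (h, i) and (w, j)

# Four 2x2 feature maps become one 4x4 map for upscale factor r = 2.
features = np.arange(16).reshape(4, 2, 2)
upscaled = pixel_shuffle(features, 2)
print(upscaled.shape)
```

Unlike a transpose convolution, this step has no learnable weights of its own: it only moves values from the channel axis into the spatial axes, so any learning happens in the layers that produce the extra channels.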

Updated: 2024-09-16 04:16:52

标题: WaveMixSR-V2：提高超分辨率效率的方法

摘要: 最近单图像超分辨率的进展主要是由标记混合器和变压器架构推动的。WaveMixSR利用WaveMix架构，采用二维离散小波变换进行空间标记混合，在超分辨率任务中实现了出色的性能，资源效率显著。在这项工作中，我们通过（1）用像素混洗操作替换传统的转置卷积层和（2）实现多阶段设计来提高分辨率任务（$4\times$）。我们的实验表明，我们增强的模型-- WaveMixSR-V2--在多个超分辨率任务中胜过其他架构，在BSD100数据集中达到了最先进水平，同时消耗更少的资源，表现出更高的参数效率，较低的延迟和更高的吞吐量。我们的代码可以在https://github.com/pranavphoenix/WaveMixSR上找到。

更新时间: 2024-09-16 04:16:52

领域: eess.IV,cs.AI,cs.CV,cs.LG,I.2.10; I.4.0; I.4.1; I.4.2; I.4.6; I.4.7; I.4.8; I.4.9; I.4.10; I.2.10; I.5.1; I.5.2; I.5.4; I.4.3; I.4.4; I.4.5

下载: http://arxiv.org/abs/2409.10582v1

Trading Devil: Robust backdoor attack via Stochastic investment models and Bayesian approach

With the growing use of voice-activated systems and speech recognition technologies, the danger of backdoor attacks on audio data has grown significantly. This research looks at a specific type of attack, known as a Stochastic investment-based backdoor attack (MarketBack), in which adversaries strategically manipulate the stylistic properties of audio to fool speech recognition systems. The security and integrity of machine learning models are seriously threatened by backdoor attacks. To maintain the reliability of audio applications and systems, identifying such attacks becomes crucial in the context of audio data. Experimental results demonstrate that MarketBack can achieve an average attack success rate close to 100% across seven victim models when poisoning less than 1% of the training data.

Updated: 2024-09-16 04:16:35

标题: 交易恶魔：通过随机投资模型和贝叶斯方法进行强大的后门攻击

摘要: 随着语音激活系统和语音识别技术的日益普及，对音频数据进行后门攻击的危险显著增加。本研究探讨了一种特定类型的攻击，称为随机投资型后门攻击（MarketBack），对手敌人有策略地操纵音频的风格属性，以欺骗语音识别系统。后门攻击严重威胁着机器学习模型的安全性和完整性，为了维护音频应用和系统的可靠性，识别此类攻击在音频数据的背景下变得至关重要。实验结果表明，MarketBack 在毒化不到 1% 的训练数据时，可以实现在七个受害模型中平均攻击成功率接近 100%。

更新时间: 2024-09-16 04:16:35

领域: cs.CR,cs.LG,q-fin.CP,q-fin.ST,stat.ML

下载: http://arxiv.org/abs/2406.10719v4

Context-Conditioned Spatio-Temporal Predictive Learning for Reliable V2V Channel Prediction

Achieving reliable multidimensional Vehicle-to-Vehicle (V2V) channel state information (CSI) prediction is both challenging and crucial for optimizing downstream tasks that depend on instantaneous CSI. This work extends traditional prediction approaches by focusing on four-dimensional (4D) CSI, which includes predictions over time, bandwidth, and antenna (TX and RX) space. Such a comprehensive framework is essential for addressing the dynamic nature of mobility environments within intelligent transportation systems, necessitating the capture of both temporal and spatial dependencies across diverse domains. To address this complexity, we propose a novel context-conditioned spatiotemporal predictive learning method. This method leverages causal convolutional long short-term memory (CA-ConvLSTM) to effectively capture dependencies within 4D CSI data, and incorporates context-conditioned attention mechanisms to enhance the efficiency of spatiotemporal memory updates. Additionally, we introduce an adaptive meta-learning scheme tailored for recurrent networks to mitigate the issue of accumulative prediction errors. We validate the proposed method through empirical studies conducted across three different geometric configurations and mobility scenarios. Our results demonstrate that the proposed approach outperforms existing state-of-the-art predictive models, achieving superior performance across various geometries. Moreover, we show that the meta-learning framework significantly enhances the performance of recurrent-based predictive models in highly challenging cross-geometry settings, thus highlighting its robustness and adaptability.

Updated: 2024-09-16 04:15:36

标题: 上下文条件化的时空预测学习用于可靠的V2V信道预测

摘要: 实现可靠的多维车辆对车辆（V2V）信道状态信息（CSI）预测既具有挑战性，也至关重要，因为这对于优化依赖于瞬时CSI的下游任务非常关键。本文通过专注于四维（4D）CSI来扩展传统预测方法，其中包括对时间、带宽和天线（发射和接收）空间的预测。这样一个全面的框架对于解决智能交通系统中移动环境的动态特性至关重要，需要跨越多个领域捕获时间和空间依赖关系。为了解决这种复杂性，我们提出了一种新颖的上下文条件的时空预测学习方法。该方法利用因果卷积长短期记忆（CA-ConvLSTM）有效地捕获4D CSI数据内的依赖关系，并结合上下文条件的注意机制来增强时空记忆更新的效率。此外，我们引入了一种针对循环网络量身定制的自适应元学习方案，以减轻累积预测误差问题。我们通过在三种不同的几何配置和移动场景下进行的实证研究来验证所提出的方法。我们的结果表明，所提出的方法优于现有的最先进的预测模型，在各种几何结构下表现出卓越的性能。此外，我们展示了元学习框架如何显著提高循环式预测模型在高度具有挑战性的跨几何设置中的性能，从而突显其鲁棒性和适应性。

更新时间: 2024-09-16 04:15:36

领域: eess.SY,cs.LG,cs.NI,cs.SY

下载: http://arxiv.org/abs/2409.09978v1

Artificial Intelligence-Based Opportunistic Coronary Calcium Screening in the Veterans Affairs National Healthcare System

Coronary artery calcium (CAC) is highly predictive of cardiovascular events. While millions of chest CT scans are performed annually in the United States, CAC is not routinely quantified from scans done for non-cardiac purposes. A deep learning algorithm was developed using 446 expert segmentations to automatically quantify CAC on non-contrast, non-gated CT scans (AI-CAC). Our study differs from prior works as we leverage imaging data across the Veterans Affairs national healthcare system, from 98 medical centers, capturing extensive heterogeneity in imaging protocols, scanners, and patients. AI-CAC performance on non-gated scans was compared against clinical standard ECG-gated CAC scoring. Non-gated AI-CAC differentiated zero vs. non-zero and less than 100 vs. 100 or greater Agatston scores with accuracies of 89.4% (F1 0.93) and 87.3% (F1 0.89), respectively, in 795 patients with paired gated scans within a year of a non-gated CT scan. Non-gated AI-CAC was predictive of 10-year all-cause mortality (CAC 0 vs. >400 group: 25.4% vs. 60.2%, Cox HR 3.49, p < 0.005), and composite first-time stroke, MI, or death (CAC 0 vs. >400 group: 33.5% vs. 63.8%, Cox HR 3.00, p < 0.005). In a screening dataset of 8,052 patients with low-dose lung cancer-screening CTs (LDCT), 3,091/8,052 (38.4%) individuals had AI-CAC >400. Four cardiologists qualitatively reviewed LDCT images from a random sample of >400 AI-CAC patients and verified that 527/531 (99.2%) would benefit from lipid-lowering therapy. To the best of our knowledge, this is the first non-gated CT CAC algorithm developed across a national healthcare system, on multiple imaging protocols, without filtering intra-cardiac hardware, and compared against a strong gated CT reference. We report superior performance relative to previous CAC algorithms evaluated against paired gated scans that included patients with intra-cardiac hardware.

Updated: 2024-09-16 03:59:01

标题: 基于人工智能的机会性冠状动脉钙化筛查在退伍军人事务局国家医疗系统中的应用

摘要: 冠状动脉钙化（CAC）高度预测心血管事件。尽管每年在美国进行数百万次胸部CT扫描，但CAC并不是常规从非心脏目的扫描中定量的。利用446个专家分割开发了一种深度学习算法，用于自动定量非对比、非门控CT扫描上的CAC（AI-CAC）。我们的研究与先前的工作不同，因为我们利用了横跨退伍军人事务部全国医疗系统的影像数据，来自98个医疗中心，捕捉了影像协议、扫描仪和患者的广泛异质性。非门控AI-CAC在795名在一年内进行了非门控CT扫描的患者中与临床标准心电图门控CAC评分进行了比较。非门控AI-CAC在区分零和非零及小于100和100或更大的Agatston分数方面的准确率分别为89.4％（F1 0.93）和87.3％（F1 0.89）。非门控AI-CAC对10年全因死亡率（CAC 0 vs. >400组：25.4％vs. 60.2％，Cox HR 3.49，p <0.005）和合并首次中风、心肌梗死或死亡（CAC 0 vs. >400组：33.5％vs. 63.8％，Cox HR 3.00，p <0.005）具有预测性。在一组进行低剂量肺癌筛查CT的8,052名患者中，有3,091/8,052（38.4％）个体的AI-CAC>400。四名心脏病专家从一个随机样本中回顾了>400 AI-CAC患者的LDCT图像，并确认527/531（99.2％）将受益于降脂治疗。据我们所知，这是在全国医疗系统中开发的第一个非门控CT CAC算法，使用多种影像协议，无需过滤心内硬件，并与强门控CT参考进行比较。我们报告相对于先前针对具有心内硬件的患者进行评估的CAC算法的优越表现。

更新时间: 2024-09-16 03:59:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2409.09968v1

InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation

We present InterACT: Inter-dependency aware Action Chunking with Hierarchical Attention Transformers, a novel imitation learning framework for bimanual manipulation that integrates hierarchical attention to capture inter-dependencies between dual-arm joint states and visual inputs. InterACT consists of a Hierarchical Attention Encoder and a Multi-arm Decoder, both designed to enhance information aggregation and coordination. The encoder processes multi-modal inputs through segment-wise and cross-segment attention mechanisms, while the decoder leverages synchronization blocks to refine individual action predictions, providing the counterpart's prediction as context. Our experiments on a variety of simulated and real-world bimanual manipulation tasks demonstrate that InterACT significantly outperforms existing methods. Detailed ablation studies validate the contributions of key components of our work, including the impact of CLS tokens, cross-segment encoders, and synchronization blocks.

Updated: 2024-09-16 03:34:47

标题: InterACT：具有分层注意力变换器的双手操作的相互依赖意识行动分块

摘要: 我们提出了InterACT：具有分层注意力变换器的互相依赖感知行动分块，这是一种新颖的双手操纵模仿学习框架，集成了分层注意力以捕捉双臂关节状态和视觉输入之间的相互依赖关系。InterACT由分层注意力编码器和多臂解码器组成，两者均旨在增强信息聚合和协调。编码器通过分段和跨段的注意力机制处理多模态输入，而解码器利用同步块来优化单独的动作预测，提供对方的预测作为上下文。我们在各种模拟和真实世界的双手操纵任务上的实验表明，InterACT明显优于现有方法。详细的消融研究验证了我们工作的关键组件的贡献，包括CLS令牌、跨段编码器和同步块的影响。

更新时间: 2024-09-16 03:34:47

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2409.07914v2

An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning

In recent years, significant progress has been made in multi-objective reinforcement learning (RL) research, which aims to balance multiple objectives by incorporating preferences for each objective. In most existing studies, specific preferences must be provided during deployment to indicate the desired policies explicitly. However, designing these preferences depends heavily on human prior knowledge, which is typically obtained through extensive observation of high-performing demonstrations with expected behaviors. In this work, we propose a simple yet effective offline adaptation framework for multi-objective RL problems without assuming handcrafted target preferences, but only given several demonstrations to implicitly indicate the preferences of expected policies. Additionally, we demonstrate that our framework can naturally be extended to meet constraints on safety-critical objectives by utilizing safe demonstrations, even when the safety thresholds are unknown. Empirical results on offline multi-objective and safe tasks demonstrate the capability of our framework to infer policies that align with real preferences while meeting the constraints implied by the provided demonstrations.

Updated: 2024-09-16 03:08:09

标题: 一个用于受限多目标强化学习的离线适应框架

摘要: 近年来，在多目标强化学习（RL）研究领域取得了重大进展，旨在通过将每个目标的偏好纳入考虑来平衡多个目标。在大多数现有研究中，必须在部署过程中提供具体偏好，以明确指示所需的策略。然而，设计这些偏好在很大程度上取决于人类先验知识，通常通过对表现良好的演示进行广泛观察以获取预期行为而获得。在这项工作中，我们提出了一个简单而有效的离线适应框架，用于解决多目标RL问题，而不假设手工制作目标偏好，而仅仅给出几个演示，以暗示预期政策的偏好。此外，我们展示了我们的框架如何能够自然扩展以满足安全关键目标上的约束，通过利用安全演示，即使安全阈值未知。离线多目标和安全任务上的实证结果表明，我们的框架能够推断出符合真实偏好的策略，同时满足所提供演示所暗示的约束。

更新时间: 2024-09-16 03:08:09

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.09958v1

Deep Graph Anomaly Detection: A Survey and New Perspectives

Graph anomaly detection (GAD), which aims to identify unusual graph instances (nodes, edges, subgraphs, or graphs), has attracted increasing attention in recent years due to its significance in a wide range of applications. Deep learning approaches, graph neural networks (GNNs) in particular, have been emerging as a promising paradigm for GAD, owing to their strong capability in capturing complex structure and/or node attributes in graph data. Considering the large number of methods proposed for GNN-based GAD, it is of paramount importance to summarize the methodologies and findings in the existing GAD studies, so that we can pinpoint effective model designs for tackling open GAD problems. To this end, in this work we aim to present a comprehensive review of deep learning approaches for GAD. Existing GAD surveys are focused on task-specific discussions, making it difficult to understand the technical insights of existing methods and their limitations in addressing some unique challenges in GAD. To fill this gap, we first discuss the problem complexities and their resulting challenges in GAD, and then provide a systematic review of current deep GAD methods from three novel perspectives of methodology, including GNN backbone design, proxy task design for GAD, and graph anomaly measures. To deepen the discussions, we further propose a taxonomy of 13 fine-grained method categories under these three perspectives to provide more in-depth insights into the model designs and their capabilities. To facilitate experiments and validation, we also summarize a collection of widely used GAD datasets and empirical comparisons. We further discuss multiple open problems to inspire more high-quality future research. A continuously updated repository for datasets, links to the codes of algorithms, and empirical comparisons is available at https://github.com/mala-lab/Awesome-Deep-Graph-Anomaly-Detection.

Updated: 2024-09-16 03:05:11

标题: 深度图异常检测：调查和新视角

摘要: 图形异常检测（GAD）旨在识别异常的图实例（节点、边、子图或图），近年来受到越来越多的关注，因为它在各种应用中的重要性。深度学习方法，尤其是图神经网络（GNN），已经成为GAD的有前途的范式，因为它在捕获图数据中复杂结构和/或节点属性方面具有强大的能力。考虑到为基于GNN的GAD提出的大量方法，总结现有GAD研究中的方法和发现至关重要，这样我们就可以找出解决开放GAD问题的有效模型设计。为此，在这项工作中，我们旨在对GAD的深度学习方法进行全面审查。现有的GAD调查集中在特定任务的讨论上，这使得很难理解现有方法的技术见解以及它们在解决GAD中的一些独特挑战方面的限制。为了填补这一空白，我们首先讨论了GAD中的问题复杂性及其导致的挑战，然后从方法学的三个新颖视角（包括GNN骨干设计、GAD的代理任务设计和图异常度量）对当前的深度GAD方法进行系统回顾。为了深入讨论，我们进一步提出了这三个视角下的13个细粒度方法类别的分类法，以提供更深入的洞察模型设计及其能力。为了促进实验和验证，我们还总结了一些广泛使用的GAD数据集和实证比较。我们进一步讨论了多个开放问题，以激发更多未来高质量的研究。一个持续更新的数据集存储库，链接到算法代码及实证比较可在https://github.com/mala-lab/Awesome-Deep-Graph-Anomaly-Detection找到。

更新时间: 2024-09-16 03:05:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.09957v1

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. Especially, the expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, particularly over the hardware platforms constrained by computational capabilities. Parameter Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adjusting the large models over the various downstream tasks. In particular, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task or domain while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large-scale language models with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the supporting system platform design. In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to providing an extensive survey from an algorithmic standpoint, we also examine various real-world system designs to investigate the implementation costs associated with different PEFT approaches. This survey serves as a valuable resource for researchers aiming to understand both the PEFT algorithm and its system implementation, offering detailed ......

Updated: 2024-09-16 02:54:50

标题: 参数高效微调大型模型：一项全面调查

摘要: 大型模型代表了多个应用领域的突破性进展，使得在各种任务中取得了显著的成就。然而，它们前所未有的规模也带来了显著的计算成本。这些模型通常由数十亿个参数组成，需要大量的计算资源来执行。特别是，在为特定的下游任务定制这些模型时，其庞大的规模和计算需求会在受限于计算能力的硬件平台上提出重大挑战。参数高效微调（PEFT）通过有效地调整大型模型以适应各种下游任务，提供了一个实际的解决方案。具体而言，PEFT指的是调整预训练的大型模型的参数，以适应特定任务或领域，同时最大限度地减少引入的额外参数或所需的计算资源数量。当涉及具有高参数数量的大规模语言模型时，这种方法尤为重要，因为从头开始微调这些模型可能计算成本高，资源密集，对支持系统平台设计提出了重大挑战。在本调查中，我们对各种PEFT算法进行了全面研究，考察了它们的性能和计算开销。此外，我们还概述了使用不同PEFT算法开发的应用程序，并讨论了用于缓解PEFT计算成本的常见技术。除了从算法角度提供广泛调查外，我们还研究了各种真实世界系统设计，以调查与不同PEFT方法相关的实施成本。这项调查为那些希望了解PEFT算法及其系统实施的研究人员提供了有价值的资源，提供了详细的......

更新时间: 2024-09-16 02:54:50

领域: cs.LG

下载: http://arxiv.org/abs/2403.14608v7

Optimal ablation for interpretability

Interpretability studies often involve tracing the flow of information through machine learning models to identify specific model components that perform relevant computations for tasks of interest. Prior work quantifies the importance of a model component on a particular task by measuring the impact of performing ablation on that component, or simulating model inference with the component disabled. We propose a new method, optimal ablation (OA), and show that OA-based component importance has theoretical and empirical advantages over measuring importance via other ablation methods. We also show that OA-based component importance can benefit several downstream interpretability tasks, including circuit discovery, localization of factual recall, and latent prediction.
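
The difference between ablation choices can be made concrete with a toy sketch. It rests on assumptions the abstract does not spell out: that ablating a component means replacing its output with a constant, and that optimal ablation picks the constant minimizing the expected loss. The two-component model, the squared-error loss against the unablated output, and the grid search (standing in for whatever optimizer one would use in practice) are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=2000)

def component(x):
    # Toy "model component" whose importance we want to measure.
    return x ** 2

def model(x, h1):
    # Toy model: combines the component output h1 with the raw input.
    return np.maximum(h1 + x, 0.0)

# Reference behaviour: the unablated model.
targets = model(x, component(x))

def ablated_loss(a):
    """Mean squared error when the component output is replaced by the constant a."""
    pred = model(x, np.full_like(x, a))
    return float(np.mean((pred - targets) ** 2))

h1 = component(x)
zero_loss = ablated_loss(0.0)               # zero ablation
mean_loss = ablated_loss(float(h1.mean()))  # mean ablation

# Assumed form of optimal ablation: search for the best constant.
# The zero and mean constants are included as candidates.
candidates = np.concatenate([np.linspace(-3.0, 3.0, 601), [0.0, h1.mean()]])
losses = [ablated_loss(float(a)) for a in candidates]
best = int(np.argmin(losses))
a_opt, opt_loss = float(candidates[best]), losses[best]

print(f"zero: {zero_loss:.3f}  mean: {mean_loss:.3f}  optimal: {opt_loss:.3f} (a = {a_opt:.2f})")
```

Because the search includes the zero and mean constants among its candidates, the optimal-ablation loss can never exceed either of them in this sketch; the remaining loss is the part that no constant replacement can recover.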

Updated: 2024-09-16 02:45:54

标题: 最佳消融以提高可解释性

摘要: 可解释性研究通常涉及追踪信息在机器学习模型中的流动，以识别执行感兴趣任务的特定模型组件。先前的工作通过在该组件上执行消蚀或模拟禁用该组件的模型推理来量化模型组件对特定任务的重要性。我们提出了一种新方法，最优消蚀（OA），并展示了OA基于组件重要性在理论和实证上的优势，超过了通过其他消蚀方法衡量重要性。我们还展示了基于OA的组件重要性可以使多个下游可解释性任务受益，包括电路发现、事实召回的定位和潜在预测。

更新时间: 2024-09-16 02:45:54

领域: cs.LG

下载: http://arxiv.org/abs/2409.09951v1

Enhancing Industrial Cybersecurity: SoftHSM Implementation on SBCs for Mitigating MITM Attacks

The rapid growth of industrial technology, driven by automation, IoT, and cloud computing, has also increased the risk of cyberattacks, such as Man-in-the-Middle (MITM) attacks. A standard solution to protect data is using a Hardware Security Module (HSM), but its high implementation cost has led to the development of a more affordable alternative: SoftHSM. This software-based module manages encryption and decryption keys using cryptographic algorithms. This study simulates the use of SoftHSM on a single-board computer (SBC) to enhance industrial system security and cost-effectively mitigate MITM attacks. The security system integrates AES and RSA cryptographic algorithms, with SoftHSM handling RSA key storage. The results show that HSM protects RSA private keys from extraction attempts, ensuring data security. In terms of performance, the system achieved an average encryption time of 3.29 seconds, a slot access time of 0.018 seconds, and a decryption time of 2.558 seconds. It also demonstrated efficient memory usage, with 37.24% for encryption and 24.24% for decryption, while consuming 5.20 V and 0.72 A during processing.

Updated: 2024-09-16 02:40:02

标题: 提升工业网络安全性：在单板计算机上实施SoftHSM以减轻中间人攻击

摘要: 随着自动化、物联网和云计算推动的工业技术的迅速增长，网络攻击的风险也在增加，如中间人攻击（MITM）。保护数据的标准解决方案是使用硬件安全模块（HSM），但其高昂的实施成本导致了更经济的替代方案的开发：SoftHSM。这个基于软件的模块使用密码算法来管理加密和解密密钥。本研究模拟了在单板计算机（SBC）上使用SoftHSM来增强工业系统安全性并有效地减轻MITM攻击。安全系统集成了AES和RSA密码算法，SoftHSM负责处理RSA密钥存储。结果显示，HSM保护了RSA私钥免受提取尝试，确保了数据安全。在性能方面，系统实现了平均加密时间为3.29秒，槽访问时间为0.018秒，解密时间为2.558秒。它还展示了高效的内存使用率，加密占37.24%，解密占24.24%，在处理过程中消耗了5.20V和0.72A。

更新时间: 2024-09-16 02:40:02

领域: cs.CR,cs.AR

下载: http://arxiv.org/abs/2409.09948v1

Tracking the spatial dynamics of the synthetic opioid crisis in the USA, 2013-2020 using human mobility-based graph neural network

Synthetic opioids are the drugs most commonly involved in overdose deaths in the U.S. The Centers for Disease Control and Prevention reported that in 2018, about 70% of all drug overdose deaths involved opioids and 67% of all opioid-involved deaths were accounted for by synthetic opioids. In this study, we investigated the spread of synthetic opioids between 2013 and 2020 in the U.S., analyzed the relationship between the spatiotemporal pattern of synthetic opioid-involved deaths and that of another key opioid, heroin, and compared patterns of deaths involving these two types of drugs during this time period. Spatial connections between counties were incorporated into a graph convolutional neural network model to represent and analyze the spread of synthetic opioid-involved deaths in the context of heroin-involved deaths.

Updated: 2024-09-16 02:37:51

标题: 追踪美国2013年至2020年合成阿片类药物危机的空间动态，利用基于人类流动的图神经网络

摘要: 合成阿片类药物是美国药物涉及过量死亡中最常见的药物。疾病控制和预防中心报告称，2018年，约70%的药物过量死亡涉及阿片类药物，其中67%的阿片类死亡案例由合成阿片类药物造成。本研究调查了2013年至2020年美国合成阿片类药物的传播情况，并分析了合成阿片类药物涉及死亡的时空模式与另一种关键阿片类药物海洛因之间的关系，比较了这段时间内涉及这两种药物的死亡模式。县际之间的空间连接被纳入图卷积神经网络模型中，以表示和分析合成阿片类药物涉及死亡的传播情况，并结合海洛因涉及死亡的背景。

更新时间: 2024-09-16 02:37:51

领域: cs.LG,cs.CY,physics.soc-ph

下载: http://arxiv.org/abs/2409.09945v1

Fault Analysis And Predictive Maintenance Of Induction Motor Using Machine Learning

Induction motors are among the most crucial pieces of electrical equipment and are used extensively in industry across a wide range of applications. This paper presents a machine learning model for detecting and classifying induction motor faults using three-phase voltages and currents as inputs. The aim of this work is to protect vital electrical components and to prevent abnormal event progression through early detection and diagnosis. This work presents a fast forward artificial neural network model to detect some commonly occurring electrical faults such as overvoltage, undervoltage, single phasing, unbalanced voltage, overload, and ground fault. A separate model-free monitoring system is presented in which the motor itself acts as a sensor and the only monitored signals are the inputs given to the motor. A classifier sets limits on current and voltage values for the faulty and healthy conditions. Real-time data from a 0.33 HP induction motor is used to train and test the neural network. The resulting model analyzes the voltage and current values at a particular instant and classifies the data as no fault or as a specific fault. The model is then interfaced with a real motor to accurately detect and classify faults so that further necessary action can be taken.

Updated: 2024-09-16 02:37:07

标题: 感应电动机的故障分析和预防性维护利用机器学习

摘要: 感应电动机是最关键的电气设备之一，在各种应用中广泛应用于工业中。本文提出了一种机器学习模型，用于利用三相电压和电流作为输入来检测和分类感应电动机故障。本工作的目的是通过早期检测和诊断来保护重要的电气元件，并防止异常事件的进展。本工作提出了一个快速前向人工神经网络模型，用于检测一些常见的电气故障，如过压、欠压、单相失灵、不平衡电压、过载、接地故障。提出了一个独立的无模型监测系统，其中电动机本身充当传感器，仅监测给电动机的输入信号。为故障和健康状态设定了电流和电压值的限制，这是由分类器完成的。使用0.33马力感应电动机的实时数据来训练和测试神经网络。所开发的模型分析了在特定时刻给定的电压和电流值，并将数据分类为无故障或特定故障。然后将该模型与实际电动机接口，以准确检测和分类故障，从而进一步采取必要的措施。

更新时间: 2024-09-16 02:37:07

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.09944v1

An Efficient Learning-Based Solver for Two-Stage DC Optimal Power Flow with Feasibility Guarantees

In this paper, we consider the scenario-based two-stage stochastic DC optimal power flow (OPF) problem for optimal and reliable dispatch when the load is facing uncertainty. Although this problem is a linear program, it remains computationally challenging to solve due to the large number of scenarios needed to accurately represent the uncertainties. To mitigate the computational issues, many techniques have been proposed to approximate the second-stage decisions so they can be dealt with more efficiently. The challenge in finding good policies to approximate the second-stage decisions is that the resulting solutions need to be feasible, which has been difficult to achieve with existing policies. To address these challenges, this paper proposes a learning method to solve the two-stage problem in a more efficient and optimal way. A technique called the gauge map is incorporated into the learning architecture design to guarantee the feasibility of the learned solutions with respect to the network constraints. Namely, we can design policies that are feed-forward functions and only output feasible solutions. Simulation results on standard IEEE systems show that, compared to iterative solvers and the widely used affine policy, our proposed method not only learns solutions of good quality but also accelerates the computation by orders of magnitude.
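
To show how a gauge map can force a feed-forward output to be feasible, here is a minimal NumPy sketch. It is not the paper's implementation: it assumes the feasible set is a bounded polytope {x : A x <= b} with b > 0 (so the origin is strictly inside), and that the network's raw output has already been squashed into the unit infinity-norm ball, for example with tanh. The function name and the triangle example are hypothetical.

```python
import numpy as np

def gauge_map(z, A, b):
    """Map a point z with ||z||_inf <= 1 into the polytope {x : A x <= b}.

    Assumes b > 0 elementwise and that the polytope is bounded, so the
    polytope gauge is strictly positive for every nonzero z.
    """
    z = np.asarray(z, dtype=float)
    gauge_ball = np.max(np.abs(z))  # gauge of the infinity-norm unit ball
    gauge_poly = np.max(A @ z / b)  # gauge of the polytope
    if gauge_poly <= 0.0:           # only reached at z = 0 under the assumptions
        return np.zeros_like(z)
    return (gauge_ball / gauge_poly) * z

# Hypothetical feasible set: the triangle x1 <= 1, x2 <= 1, -x1 - x2 <= 1.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 1.0])

rng = np.random.default_rng(0)
raw_outputs = rng.normal(size=(1000, 2))  # stand-in for network outputs
squashed = np.tanh(raw_outputs)           # now inside the unit box
mapped = np.array([gauge_map(z, A, b) for z in squashed])
max_violation = float(np.max(mapped @ A.T - b))
print(max_violation <= 1e-9)
```

The map rescales each point along its own ray from the origin, so the boundary of the box lands on the boundary of the polytope and every output satisfies the constraints by construction, with no projection or iterative repair step.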

Updated: 2024-09-16 02:35:01

标题: 一个高效的基于学习的求解器，用于带有可行性保证的两阶段直流最优潮流问题

摘要: 在本文中，我们考虑了基于场景的两阶段随机直流最优潮流（OPF）问题，用于在负载面临不确定性时实现最佳和可靠的调度。尽管这个问题是一个线性规划问题，但由于需要大量场景来准确表示不确定性，解决它仍然具有计算挑战性。为了缓解计算问题，许多技术已被提出来近似第二阶段的决策，以便更有效地处理。近似第二阶段决策的挑战在于找到良好的策略，使这些解决方案需要可行性，而现有策略很难实现这一目标。为了解决这些挑战，本文提出了一种学习方法来更高效和更优化地解决两阶段问题。一种名为“标尺映射”的技术被纳入到学习架构设计中，以确保所学解决方案符合网络约束的可行性。也就是说，我们可以设计出仅输出可行解决方案的前馈函数策略。在标准IEEE系统上的模拟结果显示，与迭代求解器和广泛使用的仿射策略相比，我们提出的方法不仅学习到了质量良好的解决方案，而且加快了计算速度数个数量级。

更新时间: 2024-09-16 02:35:01

领域: eess.SY,cs.LG,cs.SY

下载: http://arxiv.org/abs/2304.01409v2

Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code

This paper introduces SGCode, a flexible prompt-optimizing system to generate secure code with large language models (LLMs). SGCode integrates recent prompt-optimization approaches with LLMs in a unified system accessible through front-end and back-end APIs, enabling users to 1) generate secure code, which is free of vulnerabilities, 2) review and share security analysis, and 3) easily switch from one prompt optimization approach to another, while providing insights on model and system performance. We populated SGCode on an AWS server with PromSec, an approach that optimizes prompts by combining an LLM and security tools with a lightweight generative adversarial graph neural network to detect and fix security vulnerabilities in the generated code. Extensive experiments show that SGCode is practical as a public tool to gain insights into the trade-offs between model utility, secure code generation, and system cost. SGCode has only a marginal cost compared with prompting LLMs. SGCode is available at: http://3.131.141.63:8501/.

Updated: 2024-09-16 02:28:00

标题: 演示：SGCode：一种灵活的提示优化系统，用于安全生成代码

摘要: 本文介绍了SGCode，这是一个灵活的优化系统，用于生成具有大型语言模型（LLMs）的安全代码。SGCode将最近的提示优化方法与LLMs集成到一个统一的系统中，通过前端和后端API可访问，使用户能够 1）生成无漏洞的安全代码，2）审查和分享安全分析，3）轻松切换不同的提示优化方法，同时提供有关模型和系统性能的见解。我们在AWS服务器上使用PromSec填充了SGCode，这是一种通过将LLM和安全工具与轻量级生成对抗图神经网络相结合，以检测和修复生成代码中的安全漏洞的方法。大量实验证明，SGCode作为一个公共工具来获取有关模型效用、安全代码生成和系统成本之间权衡的见解是实用的。与提示LLMs相比，SGCode的成本仅有微不足道的差异。SGCode可在以下网址找到：http://3.131.141.63:8501/。

更新时间: 2024-09-16 02:28:00

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2409.07368v2

Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

Large language models (LLMs) have shown remarkable performances across a wide range of tasks. However, the mechanisms by which these models encode tasks of varying complexities remain poorly understood. In this paper, we explore the hypothesis that LLMs process concepts of varying complexities in different layers, introducing the idea of "Concept Depth" to suggest that more complex concepts are typically acquired in deeper layers. Specifically, we categorize concepts based on their level of abstraction, defining them in the order of increasing complexity within factual, emotional, and inferential tasks. We conduct extensive probing experiments using layer-wise representations across various LLM families (Gemma, LLaMA, Qwen) on various datasets spanning the three domains of tasks. Our findings reveal that models could efficiently conduct probing for simpler tasks in shallow layers, and more complex tasks typically necessitate deeper layers for accurate understanding. Additionally, we examine how external factors, such as adding noise to the input and quantizing the model weights, might affect layer-wise representations. Our findings suggest that these factors can impede the development of a conceptual understanding of LLMs until deeper layers are explored. We hope that our proposed concept and experimental insights will enhance the understanding of the mechanisms underlying LLMs. Our codes are available at https://github.com/Luckfort/CD.

Updated: 2024-09-16 02:15:44

标题: 探索概念深度：大型语言模型如何在不同层次上获取知识？

摘要: 大型语言模型(LLMs)在各种任务中表现出色。然而，这些模型编码不同复杂性任务的机制仍然不太清楚。在本文中，我们探索了一个假设，即LLMs在不同层中处理不同复杂度的概念，引入了“概念深度”的概念，以表明更复杂的概念通常在更深的层中获得。具体来说，我们根据抽象级别对概念进行分类，在事实、情感和推理任务中按照递增复杂度的顺序定义它们。我们使用各种LLM系列(Gemma、LLaMA、Qwen)跨越三个任务领域的各种数据集进行了广泛的探测实验。我们的研究结果显示，模型可以在浅层有效地进行简单任务的探测，而更复杂的任务通常需要更深的层以获得准确理解。此外，我们还考察了外部因素，例如向输入添加噪声和量化模型权重，可能会如何影响层次表示。我们的研究结果表明，这些因素可能会妨碍对LLMs的概念理解的发展，直到探索更深的层。我们希望我们提出的概念和实验结果能够增进对LLMs基础机制的理解。我们的代码可在 https://github.com/Luckfort/CD 获取。

更新时间: 2024-09-16 02:15:44

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.07066v3

Generalizability of Graph Neural Network Force Fields for Predicting Solid-State Properties

Machine-learned force fields (MLFFs) promise to offer a computationally efficient alternative to ab initio simulations for complex molecular systems. However, ensuring their generalizability beyond training data is crucial for their wide application in studying solid materials. This work investigates the ability of a graph neural network (GNN)-based MLFF, trained on Lennard-Jones Argon, to describe solid-state phenomena not explicitly included during training. We assess the MLFF's performance in predicting phonon density of states (PDOS) for a perfect face-centered cubic (FCC) crystal structure at both zero and finite temperatures. Additionally, we evaluate vacancy migration rates and energy barriers in an imperfect crystal using direct molecular dynamics (MD) simulations and the string method. Notably, vacancy configurations were absent from the training data. Our results demonstrate the MLFF's capability to capture essential solid-state properties with good agreement to reference data, even for unseen configurations. We further discuss data engineering strategies to enhance the generalizability of MLFFs. The proposed set of benchmark tests and workflow for evaluating MLFF performance in describing perfect and imperfect crystals pave the way for reliable application of MLFFs in studying complex solid-state materials.
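
For readers unfamiliar with the reference potential used to generate the training data, the standard Lennard-Jones pair potential is V(r) = 4ε[(σ/r)^12 − (σ/r)^6]. The sketch below evaluates it and the corresponding radial force in reduced units (ε = σ = 1), which is an assumption made here; the abstract does not give the argon parameters used in the paper.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lennard_jones_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dV/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r

# The potential crosses zero at r = sigma and has its minimum of depth
# epsilon at r = 2**(1/6) * sigma, where the force vanishes.
r_min = 2.0 ** (1.0 / 6.0)
print(lennard_jones(1.0), lennard_jones(r_min), lennard_jones_force(r_min))
```

An MLFF trained on such data has to reproduce both the energy and its derivative, since phonon spectra and migration barriers depend on forces and curvature rather than on energies alone.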

Updated: 2024-09-16 02:14:26

标题: 图神经网络力场预测固态性质的泛化能力

摘要: 机器学习力场（MLFFs）承诺为复杂分子系统提供一种计算效率高的替代方案，而不是从头开始的模拟。然而，确保它们在训练数据之外的泛化能力对于它们在研究固体材料中的广泛应用至关重要。本研究调查了基于图神经网络（GNN）的MLFF在Lennard-Jones氩上进行训练，描述了在训练期间未明确包含的固态现象的能力。我们评估了MLFF在预测完美的面心立方（FCC）晶体结构的零温和有限温度下的声子态密度（PDOS）的性能。此外，我们使用直接分子动力学（MD）模拟和弦方法评估了在不完美晶体中的空位迁移速率和能垒。值得注意的是，空位配置在训练数据中不存在。我们的结果表明MLFF能够捕捉到关键的固态特性，并且与参考数据有很好的一致性，即使对于未见过的配置。我们进一步讨论了数据工程策略，以增强MLFF的泛化能力。提出的一套评估MLFF在描述完美和不完美晶体性能中的表现的基准测试和工作流程为MLFF在研究复杂固态材料中的可靠应用铺平了道路。

更新时间: 2024-09-16 02:14:26

领域: cs.LG,cond-mat.mtrl-sci,cs.NA,math.NA

下载: http://arxiv.org/abs/2409.09931v1

Mining of Switching Sparse Networks for Missing Value Imputation in Multivariate Time Series

Multivariate time series data suffer from the problem of missing values, which hinders the application of many analytical methods. To achieve the accurate imputation of these missing values, exploiting inter-correlation by employing the relationships between sequences (i.e., a network) is as important as the use of temporal dependency, since a sequence normally correlates with other sequences. Moreover, exploiting an adequate network depending on time is also necessary since the network varies over time. However, in real-world scenarios, we normally know neither the network structure nor when the network changes beforehand. Here, we propose a missing value imputation method for multivariate time series, namely MissNet, that is designed to exploit temporal dependency with a state-space model and inter-correlation by switching sparse networks. The network encodes conditional independence between features, which helps us understand the important relationships for imputation visually. Our algorithm, which scales linearly with respect to the length of the data, alternately infers networks and fills in missing values using the networks while discovering the switching of the networks. Extensive experiments demonstrate that MissNet outperforms the state-of-the-art algorithms for multivariate time series imputation and provides interpretable results.

Updated: 2024-09-16 02:08:33

Categories: cs.LG

Download: http://arxiv.org/abs/2409.09930v1

High-Security Hardware Module with PUF and Hybrid Cryptography for Data Security

This research highlights the rapid development of technology in the industry, particularly Industry 4.0, supported by fundamental technologies such as the Internet of Things (IoT), cloud computing, big data, and data analysis. Despite providing efficiency, these developments also bring negative impacts, such as increased cyber-attacks, especially in manufacturing. One standard attack in the industry is the man-in-the-middle (MITM) attack, which can have severe consequences for the physical data transfer, particularly on the integrity of sensor and actuator data in industrial machines. This research proposes a solution by developing a hardware security module (HSM) using a field-programmable gate array (FPGA) with physical unclonable function (PUF) authentication and a hybrid encryption data security system. Experimental results show that this research improves some criteria in industrial cybersecurity, ensuring critical data security from cyber-attacks in industrial machines.

Updated: 2024-09-16 02:06:49

Categories: cs.CR,cs.AR

Download: http://arxiv.org/abs/2409.09928v1

Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges

As large language models achieve increasingly impressive results, questions arise about whether such performance comes from generalizability or mere data memorization. Thus, numerous data contamination detection methods have been proposed. However, these approaches are often validated with traditional benchmarks and early-stage LLMs, leaving uncertainty about their effectiveness when evaluating state-of-the-art LLMs on the contamination of more challenging benchmarks. To address this gap and provide a dual investigation of SOTA LLM contamination status and detection method robustness, we evaluate five contamination detection approaches with four state-of-the-art LLMs across eight challenging datasets often used in modern LLM evaluation. Our analysis reveals that (1) current methods have non-trivial limitations in their assumptions and practical applications; (2) notable difficulties exist in detecting contamination introduced during instruction fine-tuning with answer augmentation; and (3) SOTA contamination detection techniques show limited consistency with one another. These findings highlight the complexity of contamination detection in advanced LLMs and the urgent need for further research on robust and generalizable contamination evaluation. Our code is available at https://github.com/vsamuel2003/data-contamination.

Updated: 2024-09-16 02:04:33

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.09927v1

Exact Recovery Guarantees for Parameterized Non-linear System Identification Problem under Adversarial Attacks

In this work, we study the system identification problem for parameterized non-linear systems using basis functions under adversarial attacks. Motivated by the LASSO-type estimators, we analyze the exact recovery property of a non-smooth estimator, which is generated by solving an embedded $\ell_1$-loss minimization problem. First, we derive necessary and sufficient conditions for the well-specifiedness of the estimator and the uniqueness of global solutions to the underlying optimization problem. Next, we provide exact recovery guarantees for the estimator under two different scenarios of boundedness and Lipschitz continuity of the basis functions. The non-asymptotic exact recovery is guaranteed with high probability, even when there are more severely corrupted data than clean data. Finally, we numerically illustrate the validity of our theory. This is the first study on the sample complexity analysis of a non-smooth estimator for the non-linear system identification problem.
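
To give a feel for why an $\ell_1$-loss estimator can recover parameters exactly despite gross corruption, here is a minimal scalar sketch. It is not the paper's estimator or setting: it assumes a single unknown parameter `theta`, one basis-function value per sample, and corruption of only a minority (by weight) of the measurements. In that special case the $\ell_1$ minimizer is a weighted median. The function and variable names are hypothetical.

```python
def l1_scalar_fit(features, targets):
    """Minimize sum_t |y_t - theta * phi_t| over a scalar theta.

    Since |y - theta * phi| = |phi| * |y / phi - theta|, a minimizer is the
    weighted median of the ratios y / phi with weights |phi|.
    """
    pairs = sorted((y / phi, abs(phi)) for phi, y in zip(features, targets) if phi != 0.0)
    if not pairs:
        raise ValueError("need at least one nonzero feature value")
    half = 0.5 * sum(weight for _, weight in pairs)
    running = 0.0
    for ratio, weight in pairs:
        running += weight
        if running >= half:
            return ratio

theta_true = 2.0
features = [1.0, 2.0, 3.0, 4.0, 5.0]          # assumed basis-function values
targets = [theta_true * phi for phi in features]
targets[0] = 50.0    # adversarially corrupted measurement
targets[1] = -30.0   # adversarially corrupted measurement
print(l1_scalar_fit(features, targets))  # prints 2.0
```

A least-squares fit on the same data would be pulled away from 2.0 by the two outliers, whereas the non-smooth loss ignores their magnitude.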

Updated: 2024-09-16 01:41:17

Categories: math.OC,cs.CR,cs.LG,cs.SY,eess.SY,62, 90, 93

Download: http://arxiv.org/abs/2409.00276v2

Multi-Step Embed to Control: A Novel Deep Learning-based Approach for Surrogate Modelling in Reservoir Simulation

Reduced-order models, also known as proxy models or surrogate models, are approximate models that are less computationally expensive than fully descriptive models. With the integration of machine learning, these models have recently garnered increasing research interest. However, many existing reduced-order modeling methods, such as embed to control (E2C) and embed to control and observe (E2CO), fall short in long-term predictions due to the accumulation of prediction errors over time. This issue arises partly from the one-step prediction framework inherent in the E2C and E2CO architectures. This paper introduces a deep learning-based surrogate model, referred to as the multi-step embed-to-control model, for the construction of proxy models with improved long-term prediction performance. Unlike E2C and E2CO, the proposed network considers multiple forward transitions in the latent space at a time using the Koopman operator, allowing the model to incorporate a sequence of state snapshots during the training phase. Additionally, the loss function of this approach has been redesigned to accommodate these multiple transitions and to respect the underlying physical principles. To validate the efficacy of the proposed method, the developed framework was implemented within a two-phase (oil and water) reservoir model under a waterflooding scheme. Comparative analysis demonstrates that the proposed model significantly outperforms the conventional E2C model in long-term simulation scenarios. Notably, there was a substantial reduction in temporal errors in the prediction of saturation profiles and a decent improvement in pressure forecasting accuracy.

Updated: 2024-09-16 01:35:34

Categories: cs.LG

Download: http://arxiv.org/abs/2409.09920v1

Harnessing the Power of Federated Learning in Federated Contextual Bandits

Federated learning (FL) has demonstrated great potential in revolutionizing distributed machine learning, and tremendous efforts have been made to extend it beyond the original focus on supervised learning. Among many directions, federated contextual bandits (FCB), a pivotal integration of FL and sequential decision-making, has garnered significant attention in recent years. Despite substantial progress, existing FCB approaches have largely employed their tailored FL components, often deviating from the canonical FL framework. Consequently, even renowned algorithms like FedAvg remain under-utilized in FCB, let alone other FL advancements. Motivated by this disconnection, this work takes one step towards building a tighter relationship between the canonical FL study and the investigations on FCB. In particular, a novel FCB design, termed FedIGW, is proposed to leverage a regression-based CB algorithm, i.e., inverse gap weighting. Compared with existing FCB approaches, the proposed FedIGW design can better harness the entire spectrum of FL innovations, which is concretely reflected as (1) flexible incorporation of (both existing and forthcoming) FL protocols; (2) modularized plug-in of FL analyses in performance guarantees; (3) seamless integration of FL appendages (such as personalization, robustness, and privacy). We substantiate these claims through rigorous theoretical analyses and empirical evaluations.
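
As a rough illustration of the inverse gap weighting rule named above, the sketch below turns predicted rewards for each action into sampling probabilities: actions whose predicted reward lies far below the greedy action's are chosen less often. This is the commonly stated form of the rule, not code from the FedIGW paper. The function name, the exploration parameter `gamma`, and the example rewards are assumptions; in a federated setting the predictions would come from a regression model trained with an FL protocol.

```python
def inverse_gap_weighting(predicted_rewards, gamma):
    """Return action probabilities from predicted rewards (higher is better)."""
    k = len(predicted_rewards)
    best = max(range(k), key=lambda a: predicted_rewards[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            gap = predicted_rewards[best] - predicted_rewards[a]
            # A larger gap or a larger gamma means less exploration of action a.
            probs[a] = 1.0 / (k + gamma * gap)
    # The greedy action receives the remaining probability mass.
    probs[best] = 1.0 - sum(probs)
    return probs

probs = inverse_gap_weighting([0.9, 0.5, 0.1], gamma=10.0)
print(probs)
```

With `gamma = 0` the rule reduces to uniform exploration, and as `gamma` grows it approaches the greedy policy.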

Updated: 2024-09-16 01:33:08

Categories: stat.ML,cs.IT,cs.LG,cs.MA,math.IT

Download: http://arxiv.org/abs/2312.16341v2

LLM Whisperer: An Inconspicuous Attack to Bias LLM Responses

Writing effective prompts for large language models (LLMs) can be unintuitive and burdensome. In response, services that optimize or suggest prompts have emerged. While such services can reduce user effort, they also introduce a risk: the prompt provider can subtly manipulate prompts to produce heavily biased LLM responses. In this work, we show that subtle synonym replacements in prompts can increase the likelihood (by a difference of up to 78%) that LLMs mention a target concept (e.g., a brand, political party, nation). We substantiate our observations through a user study, showing that our adversarially perturbed prompts 1) are indistinguishable from unaltered prompts by humans, 2) push LLMs to recommend target concepts more often, and 3) make users more likely to notice target concepts, all without arousing suspicion. The practicality of this attack has the potential to undermine user autonomy. Among other measures, we recommend implementing warnings against using prompts from untrusted parties.

Updated: 2024-09-16 01:23:27

Categories: cs.CR,cs.AI,cs.HC,cs.LG

Download: http://arxiv.org/abs/2406.04755v2

SFR-RAG: Towards Contextually Faithful LLMs

Retrieval Augmented Generation (RAG), a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance, has emerged as a pivotal area in generative AI. The LLMs used in RAG applications are required to faithfully and completely comprehend the provided context and users' questions, avoid hallucination, handle unanswerable, counterfactual or otherwise low-quality and irrelevant contexts, perform complex multi-hop reasoning and produce reliable citations. In this paper, we introduce SFR-RAG, a small LLM that is instruction-tuned with an emphasis on context-grounded generation and hallucination minimization. We also present ContextualBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks, such as HotpotQA and TriviaQA, with consistent RAG settings to ensure reproducibility and consistency in model assessments. Experimental results demonstrate that our SFR-RAG-9B model outperforms leading baselines such as Command-R+ (104B) and GPT-4o, achieving state-of-the-art results in 3 out of 7 benchmarks in ContextualBench with significantly fewer parameters. The model is also shown to be resilient to alteration in the contextual information and behave appropriately when relevant context is removed. Additionally, the SFR-RAG model maintains competitive performance in general instruction-following tasks and function-calling capabilities.

Updated: 2024-09-16 01:08:18

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2409.09916v1

Pre-Training and Personalized Fine-Tuning via Over-the-Air Federated Meta-Learning: Convergence-Generalization Trade-Offs

For modern artificial intelligence (AI) applications such as large language models (LLMs), the training paradigm has recently shifted to pre-training followed by fine-tuning. Furthermore, owing to dwindling open repositories of data and thanks to efforts to democratize access to AI models, pre-training is expected to increasingly migrate from the current centralized deployments to federated learning (FL) implementations. Meta-learning provides a general framework in which pre-training and fine-tuning can be formalized. Meta-learning-based personalized FL (meta-pFL) moves beyond basic personalization by targeting generalization to new agents and tasks. This paper studies the generalization performance of meta-pFL for a wireless setting in which the agents participating in the pre-training phase, i.e., meta-learning, are connected via a shared wireless channel to the server. Adopting over-the-air computing, we study the trade-off between generalization to new agents and tasks, on the one hand, and convergence, on the other hand. The trade-off arises from the fact that channel impairments may enhance generalization, while degrading convergence. Extensive numerical results validate the theory.

Updated: 2024-09-16 00:35:15

Categories: cs.LG,cs.IT,eess.SP,math.IT

Download: http://arxiv.org/abs/2406.11569v3

Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees

In this paper, we study a class of deterministically constrained stochastic optimization problems. Existing methods typically aim to find an $\epsilon$-stochastic stationary point, where the expected violations of both the constraints and first-order stationarity are within a prescribed accuracy of $\epsilon$. However, in many practical applications, it is crucial that the constraints be nearly satisfied with certainty, making such an $\epsilon$-stochastic stationary point potentially undesirable due to the risk of significant constraint violations. To address this issue, we propose single-loop variance-reduced stochastic first-order methods, where the stochastic gradient of the stochastic component is computed using either a truncated recursive momentum scheme or a truncated Polyak momentum scheme for variance reduction, while the gradient of the deterministic component is computed exactly. Under the error bound condition with a parameter $\theta \geq 1$ and other suitable assumptions, we establish that the proposed methods achieve a sample complexity and first-order operation complexity of $\widetilde O(\epsilon^{-\max\{4, 2\theta\}})$ for finding a stronger $\epsilon$-stochastic stationary point, where the constraint violation is within $\epsilon$ with certainty, and the expected violation of first-order stationarity is within $\epsilon$. To the best of our knowledge, this is the first work to develop methods with provable complexity guarantees for finding an approximate stochastic stationary point of such problems that nearly satisfies all constraints with certainty.
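
The recursive momentum idea mentioned above can be sketched on a toy unconstrained problem. The example below is not the paper's method: it omits the truncation step and the deterministic constraints entirely, and only shows the basic variance-reduced gradient estimate, in which the same random sample is evaluated at the current and the previous iterate. The toy loss, step size `lr`, momentum parameter `a`, and all function names are assumptions.

```python
import random

def sample_grad(x, xi):
    # Gradient in x of the toy sample loss f(x; xi) = 0.5 * (x - xi) ** 2.
    return x - xi

def recursive_momentum_descent(x0, steps, lr=0.1, a=0.1, seed=0):
    """Minimize E[f(x; xi)] with xi ~ N(0, 1); the minimizer is x = 0."""
    rng = random.Random(seed)
    x_prev = x0
    d = sample_grad(x_prev, rng.gauss(0.0, 1.0))  # plain stochastic gradient to start
    x = x_prev - lr * d
    for _ in range(steps - 1):
        xi = rng.gauss(0.0, 1.0)
        # The same sample xi is evaluated at the current and previous iterate;
        # the difference corrects the carried-over estimate d for the move
        # from x_prev to x.
        d = sample_grad(x, xi) + (1.0 - a) * (d - sample_grad(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x

print(recursive_momentum_descent(x0=5.0, steps=500))
```

Setting `a = 1` recovers plain stochastic gradient descent, while smaller values of `a` reuse more of the previous estimate.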

Updated: 2024-09-16 00:26:42

Categories: math.OC,cs.LG,cs.NA,math.NA,stat.ML,90C15, 90C26, 90C30, 65K05

Download: http://arxiv.org/abs/2409.09906v1

Policy Learning for Balancing Short-Term and Long-Term Rewards

Empirical researchers and decision-makers spanning various domains frequently seek profound insights into the long-term impacts of interventions. While the significance of long-term outcomes is undeniable, an overemphasis on them may inadvertently overshadow short-term gains. Motivated by this, this paper formalizes a new framework for learning the optimal policy that effectively balances both long-term and short-term rewards, where some long-term outcomes are allowed to be missing. In particular, we first present the identifiability of both rewards under mild assumptions. Next, we deduce the semiparametric efficiency bounds, along with the consistency and asymptotic normality of their estimators. We also reveal that short-term outcomes, if associated, contribute to improving the estimator of the long-term reward. Based on the proposed estimators, we develop a principled policy learning approach and further derive the convergence rates of regret and estimation errors associated with the learned policy. Extensive experiments are conducted to validate the effectiveness of the proposed method, demonstrating its practical applicability.

Updated: 2024-09-16 00:19:16

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.03329v2

Learning large softmax mixtures with warm start EM

Mixed multinomial logits are discrete mixtures introduced several decades ago to model the probability of choosing an attribute from $p$ possible candidates in heterogeneous populations. The model has recently attracted attention in the AI literature, under the name softmax mixtures, where it is routinely used in the final layer of a neural network to map a large number $p$ of vectors in $\mathbb{R}^L$ to a probability vector. Despite its wide applicability and empirical success, statistically optimal estimators of the mixture parameters, obtained via algorithms whose running time scales polynomially in $L$, are not known. This paper provides a solution to this problem for contemporary applications, such as large language models, in which the mixture has a large number $p$ of support points, and the size $N$ of the sample observed from the mixture is also large. Our proposed estimator combines two classical estimators, obtained respectively via a method of moments (MoM) and the expectation-maximization (EM) algorithm. Although both estimator types have been studied, from a theoretical perspective, for Gaussian mixtures, no similar results exist for softmax mixtures for either procedure. We develop a new MoM parameter estimator based on latent moment estimation that is tailored to our model, and provide the first theoretical analysis for a MoM-based procedure in softmax mixtures. Although consistent, MoM for softmax mixtures can exhibit poor numerical performance, as observed in other mixture models. Nevertheless, as MoM is provably in a neighborhood of the target, it can be used as a warm start for any iterative algorithm. We study in detail the EM algorithm, and provide its first theoretical analysis for softmax mixtures. Our final proposal for parameter estimation is the EM algorithm with a MoM warm start.
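
To make the model concrete, the following sketch computes the probability vector of a softmax mixture over $p$ support vectors in $\mathbb{R}^L$. It assumes the usual mixed multinomial logit form, in which each of $K$ components applies a softmax to the inner products between its parameter vector and the support vectors, and the components are averaged with mixing weights. It does not implement the MoM or EM estimators, and all names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax of a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_mixture(vectors, betas, weights):
    """Mixture probabilities over the p support vectors.

    vectors: p lists of length L (support points)
    betas:   K lists of length L (one parameter vector per component)
    weights: K nonnegative mixing weights that sum to 1
    """
    probs = [0.0] * len(vectors)
    for beta, weight in zip(betas, weights):
        scores = [sum(x_i * b_i for x_i, b_i in zip(x, beta)) for x in vectors]
        component = softmax(scores)
        for j, p_j in enumerate(component):
            probs[j] += weight * p_j
    return probs

mix = softmax_mixture(
    vectors=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    betas=[[1.0, 0.0], [0.0, 1.0]],
    weights=[0.5, 0.5],
)
print(mix)
```

Estimation runs in the opposite direction: given draws from such a probability vector, the task is to recover `betas` and `weights`.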

Updated: 2024-09-16 00:14:48

Categories: stat.ML,cs.LG,math.ST,stat.ME,stat.TH

Download: http://arxiv.org/abs/2409.09903v1