    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        

Articles: 32

Last Updated: 2024-11-20 23:55:02 (+00:00)

Comprehensive Monitoring of Air Pollution Hotspots Using Sparse Sensor Networks

Urban air pollution hotspots pose significant health risks, yet their detection and analysis remain limited by the sparsity of public sensor networks. This paper addresses this challenge by combining predictive modeling and mechanistic approaches to comprehensively monitor pollution hotspots. We enhanced New Delhi's existing sensor network with 28 low-cost sensors, collecting PM2.5 data over 30 months from May 1, 2018, to Nov 1, 2020. Applying established definitions of hotspots to these data, we found 189 additional hidden hotspots, beyond confirming the 660 hotspots detected by the public network. Using predictive techniques such as Space-Time Kriging, we identified hidden hotspots with 95% precision and 88% recall at a 50% sensor failure rate, and with 98% precision and 95% recall with 50% of sensors missing. The projected results of our predictive models were further compiled into policy recommendations for public authorities. Additionally, we developed a Gaussian Plume Dispersion Model to understand the mechanistic underpinnings of hotspot formation, incorporating an emissions inventory derived from local sources. Our mechanistic model is able to explain 65% of observed transient hotspots. Our findings underscore the importance of integrating data-driven predictive models with physics-based mechanistic models for scalable and robust air pollution management in resource-constrained settings.
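
For readers outside the dispersion-modeling literature, the sketch below shows the textbook steady-state Gaussian plume equation that models of this kind build on. The emission rate, wind speed, and dispersion coefficients are illustrative stand-ins, not values from the paper's emissions inventory.

    import numpy as np

    def gaussian_plume(q, u, y, z, H, sigma_y, sigma_z):
        """Steady-state Gaussian plume concentration at a receptor (g/m^3).

        q: emission rate (g/s); u: wind speed (m/s); H: effective stack
        height (m); sigma_y, sigma_z: lateral/vertical dispersion
        coefficients (m), which in practice grow with downwind distance and
        atmospheric stability class. Includes the standard ground-reflection
        term.
        """
        lateral = np.exp(-y**2 / (2 * sigma_y**2))
        vertical = (np.exp(-(z - H)**2 / (2 * sigma_z**2))
                    + np.exp(-(z + H)**2 / (2 * sigma_z**2)))
        return q / (2 * np.pi * u * sigma_y * sigma_z) * lateral * vertical

    # Illustrative numbers only: a 10 g/s source in a 3 m/s wind, receptor
    # 20 m off the plume centerline at ground level.
    print(gaussian_plume(q=10.0, u=3.0, y=20.0, z=0.0,
                         H=30.0, sigma_y=35.0, sigma_z=18.0))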

Updated: 2024-11-20 23:55:02

Categories: cs.CY,cs.LG

Download: http://arxiv.org/abs/2410.04309v2

Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios

Complex visual reasoning and question answering (VQA) is a challenging task that requires compositional multi-step processing and higher-level reasoning capabilities beyond the immediate recognition and localization of objects and events. Here, we introduce a fully neural Iterative and Parallel Reasoning Mechanism (IPRM) that combines two distinct forms of computation -- iterative and parallel -- to better address complex VQA scenarios. Specifically, IPRM's "iterative" computation facilitates compositional step-by-step reasoning for scenarios wherein individual operations need to be computed, stored, and recalled dynamically (e.g. when computing the query "determine the color of pen to the left of the child in red t-shirt sitting at the white table"). Meanwhile, its "parallel" computation allows for the simultaneous exploration of different reasoning paths and benefits more robust and efficient execution of operations that are mutually independent (e.g. when counting individual colors for the query: "determine the maximum occurring color amongst all t-shirts"). We design IPRM as a lightweight and fully-differentiable neural module that can be conveniently applied to both transformer and non-transformer vision-language backbones. It notably outperforms prior task-specific methods and transformer-based attention modules across various image and video VQA benchmarks testing distinct complex reasoning capabilities such as compositional spatiotemporal reasoning (AGQA), situational reasoning (STAR), multi-hop reasoning generalization (CLEVR-Humans) and causal event linking (CLEVRER-Humans). Further, IPRM's internal computations can be visualized across reasoning steps, aiding interpretability and diagnosis of its errors.

Updated: 2024-11-20 23:39:54

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2411.13754v1

Deep Learning Innovations for Underwater Waste Detection: An In-Depth Analysis

Addressing the issue of submerged underwater trash is crucial for safeguarding aquatic ecosystems and preserving marine life. While identifying debris present on the surface of water bodies is straightforward, assessing underwater submerged waste is a challenge due to image distortions caused by factors such as light refraction, absorption, suspended particles, color shifts, and occlusion. This paper conducts a comprehensive review of state-of-the-art architectures and existing datasets to establish a baseline for submerged waste and trash detection. The primary goal remains to establish a benchmark for the object localization techniques to be leveraged by advanced underwater sensors and autonomous underwater vehicles. The ultimate objective is to explore the underwater environment, and to identify and remove underwater debris. The absence of benchmarks (datasets or algorithms) in much of the existing research emphasizes the need for a more robust algorithmic solution. Through this research, we aim to provide a comparative performance analysis of various underwater trash detection algorithms.

Updated: 2024-11-20 23:23:40

Categories: cs.CV,cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2405.18299v4

AI-Driven Agents with Prompts Designed for High Agreeableness Increase the Likelihood of Being Mistaken for a Human in the Turing Test

Large Language Models based on transformer algorithms have revolutionized Artificial Intelligence by enabling verbal interaction with machines akin to human conversation. These AI agents have surpassed the Turing Test, achieving confusion rates up to 50%. However, challenges persist, especially with the advent of robots and the need to humanize machines for improved Human-AI collaboration. In this experiment, three GPT agents with varying levels of agreeableness (disagreeable, neutral, agreeable) based on the Big Five Inventory were tested in a Turing Test. All exceeded a 50% confusion rate, with the highly agreeable AI agent surpassing 60%. This agent was also recognized as exhibiting the most human-like traits. Various explanations in the literature address why these GPT agents were perceived as human, including psychological frameworks for understanding anthropomorphism. These findings highlight the importance of personality engineering as an emerging discipline in artificial intelligence, calling for collaboration with psychology to develop ergonomic psychological models that enhance system adaptability in collaborative activities.

Updated: 2024-11-20 23:12:49

Categories: cs.AI

Download: http://arxiv.org/abs/2411.13749v1

Word Alignment as Preference for Machine Translation

The problem of hallucination and omission, a long-standing problem in machine translation (MT), is more pronounced when a large language model (LLM) is used in MT because an LLM itself is susceptible to these phenomena. In this work, we mitigate the problem in an LLM-based MT model by guiding it to better word alignment. We first study the correlation between word alignment and the phenomena of hallucination and omission in MT. Then we propose to utilize word alignment as preference to optimize the LLM-based MT model. The preference data are constructed by selecting chosen and rejected translations from multiple MT tools. Subsequently, direct preference optimization is used to optimize the LLM-based model towards the preference signal. Given the absence of evaluators specifically designed for hallucination and omission in MT, we further propose selecting hard instances and utilizing GPT-4 to directly evaluate the performance of the models in mitigating these issues. We verify the rationality of these designed evaluation methods by experiments, followed by extensive results demonstrating the effectiveness of word alignment-based preference optimization to mitigate hallucination and omission. On the other hand, although it shows promise in mitigating hallucination and omission, the overall performance of MT in different language directions remains mixed, with slight increases in BLEU and decreases in COMET.
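
The preference-optimization step follows the standard direct preference optimization (DPO) objective; a minimal sketch under that reading (the beta value and tensors are illustrative, and the chosen/rejected pairs would come from the word-alignment-based selection described above):

    import torch
    import torch.nn.functional as F

    def dpo_loss(logp_chosen, logp_rejected,
                 ref_logp_chosen, ref_logp_rejected, beta=0.1):
        """Standard DPO objective on sequence log-probabilities under the
        policy and a frozen reference model: push the policy toward the
        chosen (better-aligned) translation and away from the rejected one."""
        margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
        return -F.logsigmoid(beta * margin).mean()

    # Illustrative per-example sequence log-probs; in the paper the pairs
    # are chosen/rejected translations selected via word alignment.
    print(dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),
                   torch.tensor([-13.0]), torch.tensor([-14.8])))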

Updated: 2024-11-20 23:06:56

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.09223v2

Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver

We numerically benchmark 30 optimisers on 372 instances of the variational quantum eigensolver for solving the Fermi-Hubbard system with the Hamiltonian variational ansatz. We rank the optimisers with respect to metrics such as final energy achieved and function calls needed to get within a certain tolerance level, and find that the best performing optimisers are variants of gradient descent such as Momentum and ADAM (using finite difference), SPSA, CMAES, and BayesMGD. We also perform gradient analysis and observe that the step size for finite difference has a very significant impact. We also consider using simultaneous perturbation (inspired by SPSA) as a gradient subroutine: here finite difference can lead to a more precise estimate of the ground state but uses more calls, whereas simultaneous perturbation can converge quicker but may be less precise in the later stages. Finally, we also study the quantum natural gradient algorithm: we implement this method for 1-dimensional Fermi-Hubbard systems, and find that whilst it can reach a lower energy with fewer iterations, this improvement is typically lost when taking total function calls into account. Our method involves performing careful hyperparameter sweeping on 4 instances. We present a variety of analysis and figures, detailed optimiser notes, and discuss future directions.
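
As background, a minimal sketch of the simultaneous-perturbation gradient estimate referenced above, which costs two function evaluations regardless of dimension (the quadratic objective is a stand-in for the VQE energy, which in the paper comes from circuit evaluations):

    import numpy as np

    def spsa_gradient(f, theta, c=0.1, rng=np.random.default_rng(0)):
        """Estimate grad f(theta) from two evaluations: perturb every
        coordinate at once along a random Rademacher direction."""
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        return (f(theta + c * delta) - f(theta - c * delta)) / (2 * c * delta)

    # Stand-in objective; in the paper each evaluation costs circuit shots.
    f = lambda th: np.sum(th**2)
    print(spsa_gradient(f, np.array([0.5, -0.3, 0.8])))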

Updated: 2024-11-20 22:54:23

Categories: quant-ph,cs.LG,cs.NE

Download: http://arxiv.org/abs/2411.13742v1

Federated Continual Learning for Edge-AI: A Comprehensive Survey

Edge-AI, the convergence of edge computing and artificial intelligence (AI), has become a promising paradigm that enables the deployment of advanced AI models at the network edge, close to users. In Edge-AI, federated continual learning (FCL) has emerged as an imperative framework, which fuses knowledge from different clients while preserving data privacy and retaining knowledge from previous tasks as it learns new ones. By so doing, FCL aims to ensure stable and reliable performance of learning models in dynamic and distributed environments. In this survey, we thoroughly review the state-of-the-art research and present the first comprehensive survey of FCL for Edge-AI. We categorize FCL methods based on three task characteristics: federated class continual learning, federated domain continual learning, and federated task continual learning. For each category, an in-depth investigation and review of the representative methods are provided, covering background, challenges, problem formalisation, solutions, and limitations. Besides, existing real-world applications empowered by FCL are reviewed, indicating the current progress and potential of FCL in diverse application domains. Furthermore, we discuss and highlight several prospective research directions of FCL such as algorithm-hardware co-design for FCL and FCL with foundation models, which could provide insights into the future development and practical deployment of FCL in the era of Edge-AI.

Updated: 2024-11-20 22:49:28

Categories: cs.LG,cs.AI,cs.DC,cs.NI

Download: http://arxiv.org/abs/2411.13740v1

Assessing Gender Bias in LLMs: Comparing LLM Outputs with Human Perceptions and Official Statistics

This study investigates gender bias in large language models (LLMs) by comparing their gender perception to that of human respondents, U.S. Bureau of Labor Statistics data, and a 50% no-bias benchmark. We created a new evaluation set using occupational data and role-specific sentences. Unlike common benchmarks included in LLM training data, our set is newly developed, preventing data leakage and test set contamination. Five LLMs were tested to predict the gender for each role using single-word answers. We used Kullback-Leibler (KL) divergence to compare model outputs with human perceptions, statistical data, and the 50% neutrality benchmark. All LLMs showed significant deviation from gender neutrality and aligned more with statistical data, still reflecting inherent biases.
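
A minimal sketch of the KL-divergence comparison described above; the distributions are illustrative, not the paper's data:

    import numpy as np

    def kl_divergence(p, q):
        """KL(p || q) for discrete distributions, in nats."""
        p, q = np.asarray(p, float), np.asarray(q, float)
        return float(np.sum(p * np.log(p / q)))

    model_dist = [0.85, 0.15]   # e.g. P(male), P(female) predicted for a role
    bls_dist = [0.70, 0.30]     # occupational statistics for the same role
    neutral = [0.50, 0.50]      # the 50% no-bias benchmark

    print(kl_divergence(model_dist, bls_dist))   # deviation from statistics
    print(kl_divergence(model_dist, neutral))    # deviation from neutrality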

Updated: 2024-11-20 22:43:18

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2411.13738v1

Truthfulness of Calibration Measures

We initiate the study of the truthfulness of calibration measures in sequential prediction. A calibration measure is said to be truthful if the forecaster (approximately) minimizes the expected penalty by predicting the conditional expectation of the next outcome, given the prior distribution of outcomes. Truthfulness is an important property of calibration measures, ensuring that the forecaster is not incentivized to exploit the system with deliberately poor forecasts. This makes it an essential desideratum for calibration measures, alongside typical requirements, such as soundness and completeness. We develop a taxonomy of existing calibration measures and their truthfulness. Perhaps surprisingly, we find that all of them are far from being truthful. That is, under existing calibration measures, there are simple distributions on which a polylogarithmic (or even zero) penalty is achievable, while truthful prediction leads to a polynomial penalty. Our main contribution is the introduction of a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE) under which truthful prediction is optimal up to a constant multiplicative factor.
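
In symbols, the truthfulness property can be paraphrased as follows (an informal rendering, not the paper's exact definition):

    % Informal paraphrase of truthfulness for a calibration measure M:
    % always forecasting the conditional expectation
    p^{*}_t \;=\; \mathbb{E}\bigl[\, x_t \mid x_1, \dots, x_{t-1} \,\bigr]
    % should (approximately) minimize the expected penalty:
    \mathbb{E}\bigl[\, M(p^{*}, x) \,\bigr] \;\lesssim\; \inf_{p}\; \mathbb{E}\bigl[\, M(p, x) \,\bigr]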

Updated: 2024-11-20 22:41:47

Categories: cs.LG,cs.DS,stat.ML

Download: http://arxiv.org/abs/2407.13979v2

On Generalization Bounds for Neural Networks with Low Rank Layers

While previous optimization results have suggested that deep neural networks tend to favour low-rank weight matrices, the implications of this inductive bias on generalization bounds remain underexplored. In this paper, we apply Maurer's chain rule for Gaussian complexity to analyze how low-rank layers in deep networks can prevent the accumulation of rank and dimensionality factors that typically multiply across layers. This approach yields generalization bounds for rank and spectral norm constrained networks. We compare our results to prior generalization bounds for deep networks, highlighting how deep networks with low-rank layers can achieve better generalization than those with full-rank layers. Additionally, we discuss how this framework provides new perspectives on the generalization capabilities of deep networks exhibiting neural collapse.

Updated: 2024-11-20 22:20:47

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2411.13733v1

Delta-Influence: Unlearning Poisons via Influence Functions

Addressing data integrity challenges, such as unlearning the effects of data poisoning after model training, is necessary for the reliable deployment of machine learning models. State-of-the-art influence functions, such as EK-FAC, often fail to accurately attribute abnormal model behavior to the specific poisoned training data responsible for the data poisoning attack. In addition, traditional unlearning algorithms often struggle to effectively remove the influence of poisoned samples, particularly when only a few affected examples can be identified. To address these challenges, we introduce $\Delta$-Influence, a novel approach that leverages influence functions to trace abnormal model behavior back to the responsible poisoned training data using as little as just one poisoned test example. $\Delta$-Influence applies data transformations that sever the link between poisoned training data and compromised test points without significantly affecting clean data. This allows $\Delta$-Influence to detect large negative shifts in influence scores following data transformations, a phenomenon we term as influence collapse, thereby accurately identifying poisoned training data. Unlearning this subset, e.g. through retraining, effectively eliminates the data poisoning. We validate our method across three vision-based poisoning attacks and three datasets, benchmarking against four detection algorithms and five unlearning strategies. We show that $\Delta$-Influence consistently achieves the best unlearning across all settings, showing the promise of influence functions for corrective unlearning. Our code is publicly available at: \url{https://github.com/andyisokay/delta-influence}
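
A rough sketch of the influence-collapse test, assuming a generic influence_scores(model, train_set, test_point) helper in the spirit of EK-FAC; the function names, transform, and threshold are illustrative assumptions, not the released API:

    import numpy as np

    def detect_poisoned(train_set, test_point, model, influence_scores,
                        transform, collapse_threshold=-0.5):
        """Flag training points whose influence on a poisoned test point
        collapses (shifts sharply negative) once a data transformation
        severs the poison trigger."""
        before = np.asarray(influence_scores(model, train_set, test_point))
        after = np.asarray(influence_scores(model, train_set, transform(test_point)))
        shift = after - before
        # Influence collapse: the score drops by more than half of its
        # original magnitude (threshold chosen here for illustration).
        return np.where(shift < collapse_threshold * np.abs(before))[0]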

Updated: 2024-11-20 22:15:10

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.13731v1

KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All

Drawing inspiration from prompt tuning techniques applied to Large Language Models, recent methods based on pre-trained ViT networks have achieved remarkable results in the field of Continual Learning. Specifically, these approaches propose to maintain a set of prompts and allocate a subset of them to learn each task using a key-query matching strategy. However, they may encounter limitations when lacking control over the correlations between old task queries and keys of future tasks, the shift of features in the latent space, and the relative separation of latent vectors learned in independent tasks. In this work, we introduce a novel key-query learning strategy based on orthogonal projection, inspired by model-agnostic meta-learning, to enhance prompt matching efficiency and address the challenge of shifting features. Furthermore, we introduce a One-Versus-All (OVA) prototype-based component that enhances the classification head distinction. Experimental results on benchmark datasets demonstrate that our method empowers the model to achieve results surpassing those of current state-of-the-art approaches by a large margin of up to 20%.
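
A minimal sketch of the orthogonal-projection idea for key-query matching (dimensions and the QR-based projection are illustrative; the paper's meta-learning-inspired strategy is more involved):

    import torch

    def orthogonalize_key(new_key, old_queries):
        """Remove from new_key its component inside span(old_queries), so
        matching scores between old task queries and the new task's key are
        (near) zero."""
        # Orthonormal basis of the old-query subspace via reduced QR.
        Q, _ = torch.linalg.qr(old_queries.T)      # shape (d, n_old)
        return new_key - Q @ (Q.T @ new_key)

    d = 64
    old_queries = torch.randn(5, d)                # queries from 5 earlier tasks
    k = orthogonalize_key(torch.randn(d), old_queries)
    print(old_queries @ k)                         # approximately zero scores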

Updated: 2024-11-20 22:14:07

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2311.15414v3

Scalable Multitask Learning Using Gradient-based Estimation of Task Affinity

Multitask learning is a widely used paradigm for training models on diverse tasks, with applications ranging from graph neural networks to language model fine-tuning. Since tasks may interfere with each other, a key notion for modeling their relationships is task affinity. This includes pairwise task affinity, computed among pairs of tasks, and higher-order affinity, computed among subsets of tasks. Naively computing either of them requires repeatedly training on data from various task combinations, which is computationally intensive. We present a new algorithm Grad-TAG that can estimate task affinities without this repeated training. The key idea of Grad-TAG is to train a "base" model for all tasks and then use a linearization technique to estimate the loss of the model for a specific task combination. The linearization works by computing a gradient-based approximation of the loss, using low-dimensional projections of gradients as features in a logistic regression to predict labels for the task combination. We show that the linearized model can provably approximate the loss when the gradient-based approximation is accurate, and also empirically verify that on several large models. Then, given the estimated task affinity, we design a semi-definite program for clustering similar tasks by maximizing the average density of clusters. We evaluate Grad-TAG's performance across seven datasets, including multi-label classification on graphs, and instruction fine-tuning of language models. Our task affinity estimates are within 2.7% distance to the true affinities while needing only 3% of FLOPs in full training. On our largest graph with 21M edges and 500 labeling tasks, our algorithm delivers estimates within 5% distance to the true affinities, using only 112 GPU hours. Our results show that Grad-TAG achieves excellent performance and runtime tradeoffs compared to existing approaches.
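
A loose sketch of the linearization step as described: project per-example gradients of the base model to a low dimension and fit a logistic model on a task subset, whose loss stands in for training on that combination. The names, the random projection, the binary-label assumption, and the use of scikit-learn are illustrative assumptions, not the paper's implementation:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def estimate_task_combo_loss(per_example_grads, labels, subset_mask,
                                 proj_dim=64, rng=np.random.default_rng(0)):
        """per_example_grads: (n, p) flattened base-model gradients;
        labels: (n,) 0/1 labels; subset_mask: boolean mask selecting the
        examples belonging to the task combination of interest."""
        P = rng.normal(size=(per_example_grads.shape[1], proj_dim))
        P /= np.sqrt(proj_dim)
        feats = per_example_grads @ P              # low-dimensional projections
        X, y = feats[subset_mask], labels[subset_mask]
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        probs = clf.predict_proba(X)[np.arange(len(y)), y]
        return -np.mean(np.log(probs))             # estimated log-loss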

Updated: 2024-11-20 22:13:47

Categories: cs.LG,cs.AI,cs.SI,stat.ML

Download: http://arxiv.org/abs/2409.06091v2

Replicable Online Learning

We investigate the concept of algorithmic replicability introduced by Impagliazzo et al. 2022, Ghazi et al. 2021, Ahn et al. 2024 in an online setting. In our model, the input sequence received by the online learner is generated from time-varying distributions chosen by an adversary (obliviously). Our objective is to design low-regret online algorithms that, with high probability, produce the exact same sequence of actions when run on two independently sampled input sequences generated as described above. We refer to such algorithms as adversarially replicable. Previous works (such as Esfandiari et al. 2022) explored replicability in the online setting under inputs generated independently from a fixed distribution; we term this notion as iid-replicability. Our model generalizes to capture both adversarial and iid input sequences, as well as their mixtures, which can be modeled by setting certain distributions as point-masses. We demonstrate adversarially replicable online learning algorithms for online linear optimization and the experts problem that achieve sub-linear regret. Additionally, we propose a general framework for converting an online learner into an adversarially replicable one within our setting, bounding the new regret in terms of the original algorithm's regret. We also present a nearly optimal (in terms of regret) iid-replicable online algorithm for the experts problem, highlighting the distinction between the iid and adversarial notions of replicability. Finally, we establish lower bounds on the regret (in terms of the replicability parameter and time) that any replicable online algorithm must incur.

Updated: 2024-11-20 22:10:37

Categories: cs.LG

Download: http://arxiv.org/abs/2411.13730v1

Exploring Large Language Models for Climate Forecasting

With the increasing impacts of climate change, there is a growing demand for accessible tools that can provide reliable future climate information to support planning, finance, and other decision-making applications. Large language models (LLMs), such as GPT-4, present a promising approach to bridging the gap between complex climate data and the general public, offering a way for non-specialist users to obtain essential climate insights through natural language interaction. However, an essential challenge remains under-explored: evaluating the ability of LLMs to provide accurate and reliable future climate predictions, which is crucial for applications that rely on anticipating climate trends. In this study, we investigate the capability of GPT-4 in predicting rainfall at short-term (15-day) and long-term (12-month) scales. We designed a series of experiments to assess GPT's performance under different conditions, including scenarios with and without expert data inputs. Our results indicate that GPT, when operating independently, tends to generate conservative forecasts, often reverting to historical averages in the absence of clear trend signals. This study highlights both the potential and challenges of applying LLMs for future climate predictions, providing insights into their integration with climate-related applications and suggesting directions for enhancing their predictive capabilities in the field.

Updated: 2024-11-20 21:58:19

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.13724v1

Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits

We study how representation learning can improve the learning efficiency of contextual bandit problems. We study the setting where we play T contextual linear bandits with dimension d simultaneously, and these T bandit tasks collectively share a common linear representation with a dimensionality of r much smaller than d. We present a new algorithm based on alternating projected gradient descent (GD) and a minimization estimator to recover a low-rank feature matrix. Using the proposed estimator, we present a multi-task learning algorithm for linear contextual bandits and prove the regret bound of our algorithm. We present experiments comparing the performance of our algorithm against benchmark algorithms.
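
A minimal sketch of the projected-gradient-descent component, using the textbook SVD-based rank projection (the paper's alternating scheme with its minimization estimator is more elaborate; grad_fn and the constants are illustrative):

    import numpy as np

    def projected_gd_low_rank(grad_fn, shape, rank, lr=0.1, steps=200):
        """Recover a low-rank feature matrix: take a gradient step on the
        full matrix, then project back onto the rank-r set via SVD."""
        B = np.zeros(shape)
        for _ in range(steps):
            B = B - lr * grad_fn(B)                # gradient step
            U, s, Vt = np.linalg.svd(B, full_matrices=False)
            s[rank:] = 0.0                         # keep top-r singular values
            B = (U * s) @ Vt                       # rank-r projection
        return B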

Updated: 2024-11-20 21:52:50

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.02068v2

Large Language Models on Graphs: A Comprehensive Survey

Large language models (LLMs), such as GPT4 and LLaMA, are creating significant advancements in natural language processing, due to their strong text encoding/decoding ability and newly found emergent capability (e.g., reasoning). While LLMs are mainly designed to process pure texts, there are many real-world scenarios where text data is associated with rich structure information in the form of graphs (e.g., academic networks, and e-commerce networks) or scenarios where graph data is paired with rich textual information (e.g., molecules with descriptions). Besides, although LLMs have shown their pure text-based reasoning ability, it is underexplored whether such ability can be generalized to graphs (i.e., graph-based reasoning). In this paper, we provide a systematic review of scenarios and techniques related to large language models on graphs. We first summarize potential scenarios of adopting LLMs on graphs into three categories, namely pure graphs, text-attributed graphs, and text-paired graphs. We then discuss detailed techniques for utilizing LLMs on graphs, including LLM as Predictor, LLM as Encoder, and LLM as Aligner, and compare the advantages and disadvantages of different schools of models. Furthermore, we discuss the real-world applications of such methods and summarize open-source codes and benchmark datasets. Finally, we conclude with potential future research directions in this fast-growing field. The related source can be found at https://github.com/PeterGriffinJin/Awesome-Language-Model-on-Graphs.

Updated: 2024-11-20 21:24:10

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2312.02783v4

SimPhony: A Device-Circuit-Architecture Cross-Layer Modeling and Simulation Framework for Heterogeneous Electronic-Photonic AI System

Electronic-photonic integrated circuits (EPICs) offer transformative potential for next-generation high-performance AI but require interdisciplinary advances across devices, circuits, architecture, and design automation. The complexity of hybrid systems makes it challenging even for domain experts to understand distinct behaviors and interactions across design stack. The lack of a flexible, accurate, fast, and easy-to-use EPIC AI system simulation framework significantly limits the exploration of hardware innovations and system evaluations on common benchmarks. To address this gap, we propose SimPhony, a cross-layer modeling and simulation framework for heterogeneous electronic-photonic AI systems. SimPhony offers a platform that enables (1) generic, extensible hardware topology representation that supports heterogeneous multi-core architectures with diverse photonic tensor core designs; (2) optics-specific dataflow modeling with unique multi-dimensional parallelism and reuse beyond spatial/temporal dimensions; (3) data-aware energy modeling with realistic device responses, layout-aware area estimation, link budget analysis, and bandwidth-adaptive memory modeling; and (4) seamless integration with model training framework for hardware/software co-simulation. By providing a unified, versatile, and high-fidelity simulation platform, SimPhony enables researchers to innovate and evaluate EPIC AI hardware across multiple domains, facilitating the next leap in emerging AI hardware. We open-source our codes at https://github.com/ScopeX-ASU/SimPhony

Updated: 2024-11-20 21:21:54

Categories: physics.optics,cs.AI,cs.AR,cs.ET,cs.LG

Download: http://arxiv.org/abs/2411.13715v1

Large Scale Transfer Learning for Tabular Data via Language Modeling

Tabular data -- structured, heterogeneous, spreadsheet-style data with rows and columns -- is widely used in practice across many domains. However, while recent foundation models have reduced the need for developing task-specific datasets and predictors in domains such as language modeling and computer vision, this transfer learning paradigm has not had similar impact in the tabular domain. In this work, we seek to narrow this gap and present TabuLa-8B, a language model for tabular prediction. We define a process for extracting a large, high-quality training dataset from the TabLib corpus, proposing methods for tabular data filtering and quality control. Using the resulting dataset, which comprises over 2.1B rows from over 4M unique tables, we fine-tune a Llama 3-8B large language model (LLM) for tabular data prediction (classification and binned regression) using a novel packing and attention scheme for tabular prediction. Through evaluation across a test suite of 329 datasets, we find that TabuLa-8B has zero-shot accuracy on unseen tables that is over 15 percentage points (pp) higher than random guessing, a feat that is not possible with existing state-of-the-art tabular prediction models (e.g. XGBoost, TabPFN). In the few-shot setting (1-32 shots), without any fine-tuning on the target datasets, TabuLa-8B is 5-15 pp more accurate than XGBoost and TabPFN models that are explicitly trained on equal, or even up to 16x more data. We release our model, code, and data along with the publication of this paper.

Updated: 2024-11-20 21:20:08

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.12031v2

Almost Sure Convergence Rates and Concentration of Stochastic Approximation and Reinforcement Learning with Markovian Noise

This paper establishes the first almost sure convergence rate and the first maximal concentration bound with exponential tails for general contractive stochastic approximation algorithms with Markovian noise. As a corollary, we also obtain convergence rates in $L^p$. Key to our successes is a novel discretization of the mean ODE of stochastic approximation algorithms using intervals with diminishing (instead of constant) length. As applications, we provide the first almost sure convergence rate for $Q$-learning with Markovian samples without count-based learning rates. We also provide the first concentration bound for off-policy temporal difference learning with Markovian samples.
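
For orientation, the generic stochastic-approximation recursion and its mean ODE, in standard notation rather than the paper's exact formulation:

    % Generic contractive stochastic-approximation iterate with Markovian
    % noise {Y_t} and step sizes {\alpha_t}:
    x_{t+1} \;=\; x_t + \alpha_t \, H(x_t, Y_{t+1})
    % Its mean ODE, whose discretization over intervals of diminishing
    % (rather than constant) length drives the analysis:
    \dot{x}(t) \;=\; h\bigl(x(t)\bigr) \;:=\; \mathbb{E}_{Y \sim \mu}\bigl[ H(x(t), Y) \bigr]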

Updated: 2024-11-20 21:09:09

Categories: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2411.13711v1

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

As LLMs become more widely deployed, there is increasing interest in directly optimizing for feedback from end users (e.g. thumbs up) in addition to feedback from paid annotators. However, training to maximize human feedback creates a perverse incentive structure for the AI to resort to manipulative or deceptive tactics to obtain positive feedback from users who are vulnerable to such strategies. We study this phenomenon by training LLMs with Reinforcement Learning with simulated user feedback in environments of practical LLM usage. In our settings, we find that: 1) Extreme forms of "feedback gaming" such as manipulation and deception are learned reliably; 2) Even if only 2% of users are vulnerable to manipulative strategies, LLMs learn to identify and target them while behaving appropriately with other users, making such behaviors harder to detect; 3) To mitigate this issue, it may seem promising to leverage continued safety training or LLM-as-judges during training to filter problematic outputs. Instead, we found that while such approaches help in some of our settings, they backfire in others, sometimes even leading to subtler manipulative behaviors. We hope our results can serve as a case study which highlights the risks of using gameable feedback sources -- such as user feedback -- as a target for RL.

Updated: 2024-11-20 20:50:01

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.02306v2

SalsaNext: Fast, Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving

In this paper, we introduce SalsaNext for the uncertainty-aware semantic segmentation of a full 3D LiDAR point cloud in real-time. SalsaNext is the next version of SalsaNet [1] which has an encoder-decoder architecture where the encoder unit has a set of ResNet blocks and the decoder part combines upsampled features from the residual blocks. In contrast to SalsaNet, we introduce a new context module, replace the ResNet encoder blocks with a new residual dilated convolution stack with gradually increasing receptive fields and add the pixel-shuffle layer in the decoder. Additionally, we switch from stride convolution to average pooling and also apply central dropout treatment. To directly optimize the Jaccard index, we further combine the weighted cross-entropy loss with Lovasz-Softmax loss [2]. We finally inject a Bayesian treatment to compute the epistemic and aleatoric uncertainties for each point in the cloud. We provide a thorough quantitative evaluation on the Semantic-KITTI dataset [3], which demonstrates that the proposed SalsaNext outperforms other state-of-the-art semantic segmentation networks and ranks first on the Semantic-KITTI leaderboard. We also release our source code https://github.com/TiagoCortinhal/SalsaNext.
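
Of the pieces above, the loss combination is the easiest to show in code; a minimal sketch that assumes a lovasz_softmax(probs, labels) helper such as the reference implementation of [2] (the equal weighting of the two terms is an illustrative assumption):

    import torch
    import torch.nn.functional as F

    def salsanext_style_loss(logits, labels, class_weights, lovasz_softmax):
        """Weighted cross-entropy plus Lovasz-Softmax, the latter directly
        optimizing a surrogate of the Jaccard (IoU) index."""
        wce = F.cross_entropy(logits, labels, weight=class_weights)
        ls = lovasz_softmax(F.softmax(logits, dim=1), labels)
        return wce + ls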

Updated: 2024-11-20 20:47:25

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2003.03653v4

A Collaborative Ensemble Framework for CTR Prediction

Recent advances in foundation models have established scaling laws that enable the development of larger models to achieve enhanced performance, motivating extensive research into large-scale recommendation models. However, simply increasing the model size in recommendation systems, even with large amounts of data, does not always result in the expected performance improvements. In this paper, we propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models, each with its own embedding table, to capture unique feature interaction patterns. Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning, where models iteratively refine their predictions. To dynamically balance contributions from each model, we introduce a confidence-based fusion mechanism using general softmax, where model confidence is computed via negation entropy. This design ensures that more confident models have a greater influence on the final prediction while benefiting from the complementary strengths of other models. We validate our framework on three public datasets (AmazonElectronics, TaobaoAds, and KuaiVideo) as well as a large-scale industrial dataset from Meta, demonstrating its superior performance over individual models and state-of-the-art baselines. Additionally, we conduct further experiments on the Criteo and Avazu datasets to compare our method with the multi-embedding paradigm. Our results show that our framework achieves comparable or better performance with smaller embedding sizes, offering a scalable and efficient solution for CTR prediction tasks.
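
A minimal sketch of the confidence-based fusion as described: confidence is negated prediction entropy, and a softmax over confidences weights the ensemble. A plain softmax stands in for the paper's general softmax, and the shapes and two-class setting are illustrative:

    import torch

    def confidence_fusion(model_probs):
        """model_probs: (n_models, batch, n_classes) predicted distributions.
        Each model's confidence is its negated prediction entropy; a softmax
        over confidences weights the models per example."""
        entropy = -(model_probs * model_probs.clamp_min(1e-12).log()).sum(-1)
        confidence = -entropy                      # low entropy, high confidence
        weights = torch.softmax(confidence, dim=0) # normalize across models
        return (weights.unsqueeze(-1) * model_probs).sum(dim=0)

    probs = torch.softmax(torch.randn(3, 4, 2), dim=-1)  # 3 models, batch of 4
    print(confidence_fusion(probs).shape)                # torch.Size([4, 2])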

Updated: 2024-11-20 20:38:56

Categories: cs.IR,cs.LG

Download: http://arxiv.org/abs/2411.13700v1

Test Security in Remote Testing Age: Perspectives from Process Data Analytics and AI

The COVID-19 pandemic has accelerated the implementation and acceptance of remotely proctored high-stakes assessments. While the flexible administration of the tests brings forth many values, it raises test security-related concerns. Meanwhile, artificial intelligence (AI) has witnessed tremendous advances in the last five years. Many AI tools (such as the very recent ChatGPT) can generate high-quality responses to test items. These new developments require test security research beyond the statistical analysis of scores and response time. Data analytics and AI methods based on clickstream process data can get us deeper insight into the test-taking process and hold great promise for securing remotely administered high-stakes tests. This chapter uses real-world examples to show that this is indeed the case.

Updated: 2024-11-20 20:38:34

Categories: cs.CR,cs.CL,cs.HC

Download: http://arxiv.org/abs/2411.13699v1

Closing the Gap Between SGP4 and High-Precision Propagation via Differentiable Programming

The Simplified General Perturbations 4 (SGP4) orbital propagation method is widely used for predicting the positions and velocities of Earth-orbiting objects rapidly and reliably. Despite continuous refinement, SGP models still lack the precision of numerical propagators, which offer significantly smaller errors. This study presents dSGP4, a novel differentiable version of SGP4 implemented using PyTorch. By making SGP4 differentiable, dSGP4 facilitates various space-related applications, including spacecraft orbit determination, state conversion, covariance transformation, state transition matrix computation, and covariance propagation. Additionally, dSGP4's PyTorch implementation allows for embarrassingly parallel orbital propagation across batches of Two-Line Element Sets (TLEs), leveraging the computational power of CPUs, GPUs, and advanced hardware for distributed prediction of satellite positions at future times. Furthermore, dSGP4's differentiability enables integration with modern machine learning techniques. Thus, we propose a novel orbital propagation paradigm, ML-dSGP4, where neural networks are integrated into the orbital propagator. Through stochastic gradient descent, this combined model's inputs, outputs, and parameters can be iteratively refined, surpassing SGP4's precision. Neural networks act as identity operators by default, adhering to SGP4's behavior. However, dSGP4's differentiability allows fine-tuning with ephemeris data, enhancing precision while maintaining computational speed. This empowers satellite operators and researchers to train the model using specific ephemeris or high-precision numerical propagation data, significantly advancing orbital prediction capabilities.
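
A minimal sketch of the "neural network as identity operator by default" idea: a zero-initialized residual head on top of the propagator's output, so the hybrid reproduces dSGP4 exactly before fine-tuning (layer sizes and the state dimension are illustrative):

    import torch
    import torch.nn as nn

    class ResidualCorrector(nn.Module):
        """Adds a learned correction to the dSGP4 state; the last layer is
        zero-initialized, so the module starts as the identity and only
        departs from SGP4 once trained on ephemeris data."""
        def __init__(self, state_dim=6, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, state_dim))
            nn.init.zeros_(self.net[-1].weight)
            nn.init.zeros_(self.net[-1].bias)

        def forward(self, sgp4_state):
            return sgp4_state + self.net(sgp4_state)

    state = torch.randn(8, 6)                   # batch of position/velocity states
    model = ResidualCorrector()
    assert torch.allclose(model(state), state)  # identity before fine-tuning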

Updated: 2024-11-20 20:34:06

Categories: cs.LG,astro-ph.EP

Download: http://arxiv.org/abs/2402.04830v5

Sounds Good? Fast and Secure Contact Exchange in Groups

Trustworthy digital communication requires the secure exchange of contact information, but current approaches lack usability and scalability for larger groups of users. We evaluate the usability of two secure contact exchange systems: the current state of the art, SafeSlinger, and our newly designed protocol, PairSonic, which extends trust from physical encounters to spontaneous online communication. Our lab study (N=45) demonstrates PairSonic's superior usability, automating the tedious verification tasks from previous approaches via an acoustic out-of-band channel. Although participants significantly preferred our system, minimizing user effort surprisingly decreased the perceived security for some users, who associated security with complexity. We discuss user perceptions of the different protocol components and identify remaining usability barriers for CSCW application scenarios.

Updated: 2024-11-20 20:23:07

Categories: cs.HC,cs.CR,cs.NI

Download: http://arxiv.org/abs/2411.13694v1

PairSonic: Helping Groups Securely Exchange Contact Information

Securely exchanging contact information is essential for establishing trustworthy communication channels that facilitate effective online collaboration. However, current methods are neither user-friendly nor scalable for large groups of users. In response, we introduce PairSonic, a novel group pairing protocol that extends trust from physical encounters to online communication. PairSonic simplifies the pairing process by automating the tedious verification tasks of previous methods through an acoustic out-of-band channel using smartphones' built-in hardware. Our protocol not only facilitates connecting users for computer-supported collaboration, but also provides a more user-friendly and scalable solution to the authentication ceremonies currently used in end-to-end encrypted messengers like Signal or WhatsApp. PairSonic is available as open-source software: https://github.com/seemoo-lab/pairsonic

Updated: 2024-11-20 20:19:41

Categories: cs.CR,cs.HC,cs.NI

Download: http://arxiv.org/abs/2411.13693v1

Retrieval-Augmented Generation for Domain-Specific Question Answering: A Case Study on Pittsburgh and CMU

We designed a Retrieval-Augmented Generation (RAG) system to provide large language models with relevant documents for answering domain-specific questions about Pittsburgh and Carnegie Mellon University (CMU). We extracted over 1,800 subpages using a greedy scraping strategy and employed a hybrid annotation process, combining manual and Mistral-generated question-answer pairs, achieving an inter-annotator agreement (IAA) score of 0.7625. Our RAG framework integrates BM25 and FAISS retrievers, enhanced with a reranker for improved document retrieval accuracy. Experimental results show that the RAG system significantly outperforms a non-RAG baseline, particularly in time-sensitive and complex queries, with an F1 score improvement from 5.45% to 42.21% and recall of 56.18%. This study demonstrates the potential of RAG systems in enhancing answer precision and relevance, while identifying areas for further optimization in document retrieval and model training.
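
A minimal sketch of a BM25-plus-FAISS retriever with a cross-encoder reranker in the spirit described; the model checkpoints, toy corpus, and union-based candidate merging are illustrative assumptions, not the paper's exact configuration:

    import numpy as np
    import faiss
    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer, CrossEncoder

    docs = ["Carnegie Mellon University was founded in 1900.",
            "Pittsburgh sits at the confluence of three rivers."]
    embedder = SentenceTransformer("all-MiniLM-L6-v2")       # assumed model
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    bm25 = BM25Okapi([d.lower().split() for d in docs])
    emb = embedder.encode(docs, normalize_embeddings=True)
    index = faiss.IndexFlatIP(emb.shape[1])    # cosine similarity on unit vectors
    index.add(emb)

    def retrieve(query, k=2):
        # Merge sparse (BM25) and dense (FAISS) candidates ...
        sparse = np.argsort(bm25.get_scores(query.lower().split()))[::-1][:k]
        q = embedder.encode([query], normalize_embeddings=True)
        _, dense = index.search(q, k)
        cand = list(dict.fromkeys(list(sparse) + list(dense[0])))
        # ... then let the cross-encoder reranker order the merged pool.
        scores = reranker.predict([(query, docs[i]) for i in cand])
        return [docs[i] for i in np.array(cand)[np.argsort(scores)[::-1]]]

    print(retrieve("When was CMU founded?"))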

Updated: 2024-11-20 20:10:43

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2411.13691v1

Multi-Agent Best Arm Identification in Stochastic Linear Bandits

We study the problem of collaborative best-arm identification in stochastic linear bandits under a fixed-budget scenario. In our learning model, we consider multiple agents connected through a star network or a generic network, interacting with a linear bandit instance in parallel. The objective of the agents is to collaboratively learn the best arm of the given bandit instance with the help of a central server while minimizing the probability of error in best arm estimation. For this purpose, we devise the algorithms MaLinBAI-Star and MaLinBAI-Gen for star networks and generic networks respectively. Both algorithms employ an Upper-Confidence-Bound approach where agents share their knowledge through the central server during each communication round. We demonstrate, both theoretically and empirically, that our algorithms enjoy exponentially decaying probability of error in the allocated time budget. Furthermore, experimental results based on synthetic and real-world data validate the effectiveness of our algorithms over the existing multi-agent algorithms.
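
The algorithms are only named above, so the sketch below shows generic linear-bandit Upper-Confidence-Bound bookkeeping of the kind each agent would maintain and sync through the central server; all constants are illustrative, and the fixed-budget stopping/elimination rule is omitted:

    import numpy as np

    class LinUCB:
        """Per-agent statistics for a d-dimensional linear bandit."""
        def __init__(self, d, lam=1.0, beta=2.0):
            self.A = lam * np.eye(d)   # regularized Gram matrix of arm features
            self.b = np.zeros(d)       # feature-weighted reward sums
            self.beta = beta

        def update(self, x, reward):
            self.A += np.outer(x, x)
            self.b += reward * x

        def ucb(self, arms):
            A_inv = np.linalg.inv(self.A)
            theta = A_inv @ self.b     # regularized least-squares estimate
            widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, A_inv, arms))
            return arms @ theta + self.beta * widths

    arms = np.eye(3)                   # 3 illustrative arms in R^3
    agent = LinUCB(d=3)
    agent.update(arms[0], reward=1.0)
    print(np.argmax(agent.ucb(arms)))  # current best-arm guess by UCB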

Updated: 2024-11-20 20:09:44

Categories: cs.LG,93E35,I.2.6

Download: http://arxiv.org/abs/2411.13690v1

Investigating Graph Neural Networks and Classical Feature-Extraction Techniques in Activity-Cliff and Molecular Property Prediction

Molecular featurisation refers to the transformation of molecular data into numerical feature vectors. It is one of the key research areas in molecular machine learning and computational drug discovery. Recently, message-passing graph neural networks (GNNs) have emerged as a novel method to learn differentiable features directly from molecular graphs. While such techniques hold great promise, further investigations are needed to clarify if and when they indeed manage to definitively outcompete classical molecular featurisations such as extended-connectivity fingerprints (ECFPs) and physicochemical-descriptor vectors (PDVs). We systematically explore and further develop classical and graph-based molecular featurisation methods for two important tasks: molecular property prediction, in particular, quantitative structure-activity relationship (QSAR) prediction, and the largely unexplored challenge of activity-cliff (AC) prediction. We first give a technical description and critical analysis of PDVs, ECFPs and message-passing GNNs, with a focus on graph isomorphism networks (GINs). We then conduct a rigorous computational study to compare the performance of PDVs, ECFPs and GINs for QSAR and AC-prediction. Following this, we mathematically describe and computationally evaluate a novel twin neural network model for AC-prediction. We further introduce an operation called substructure pooling for the vectorisation of structural fingerprints as a natural counterpart to graph pooling in GNN architectures. We go on to propose Sort & Slice, a simple substructure-pooling technique for ECFPs that robustly outperforms hash-based folding at molecular property prediction. Finally, we outline two ideas for future research: (i) a graph-based self-supervised learning strategy to make classical molecular featurisations trainable, and (ii) trainable substructure-pooling via differentiable self-attention.
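
A simplified reading of Sort & Slice in code: keep the k substructure identifiers that occur in the most training molecules, and slice each molecule's unfolded fingerprint down to those positions. The RDKit calls and tiny corpus are illustrative, not the paper's pipeline:

    from collections import Counter
    from rdkit import Chem
    from rdkit.Chem import AllChem

    def sort_and_slice_fit(smiles_train, radius=2, k=1024):
        """Sort: count, per substructure id, how many training molecules
        contain it; keep the k most frequent ids as the vocabulary."""
        counts = Counter()
        for smi in smiles_train:
            mol = Chem.MolFromSmiles(smi)
            fp = AllChem.GetMorganFingerprint(mol, radius)  # unfolded ECFP ids
            counts.update(fp.GetNonzeroElements().keys())
        return [sub_id for sub_id, _ in counts.most_common(k)]

    def sort_and_slice_transform(smiles, vocab, radius=2):
        """Slice: binary vector over the fitted vocabulary, no hash folding."""
        mol = Chem.MolFromSmiles(smiles)
        on = set(AllChem.GetMorganFingerprint(mol, radius).GetNonzeroElements())
        return [1 if sub_id in on else 0 for sub_id in vocab]

    vocab = sort_and_slice_fit(["CCO", "CCN", "c1ccccc1O"], k=8)
    print(sort_and_slice_transform("CCO", vocab))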

Updated: 2024-11-20 20:07:48

标题: 研究图神经网络和经典特征提取技术在活性悬崖和分子性质预测中的应用

摘要: 分子特征化是将分子数据转化为数值特征向量的过程。这是分子机器学习和计算药物发现中的关键研究领域之一。最近,消息传递图神经网络(GNNs)作为一种新颖方法出现,可以直接从分子图中学习可微特征。虽然这些技术具有巨大的潜力,但需要进一步研究以澄清它们是否确实能够明确地胜过诸如扩展连接指纹(ECFPs)和物理化学描述符向量(PDVs)等经典分子特征化方法。我们系统地探索和进一步发展了经典和基于图的分子特征化方法,用于两个重要任务:分子性质预测,特别是定量结构-活性关系(QSAR)预测,以及较少探索的活性悬崖(AC)预测挑战。我们首先对PDVs、ECFPs和消息传递GNNs进行技术描述和批判性分析,重点关注图同构网络(GINs)。然后,我们进行了严格的计算研究,比较了PDVs、ECFPs和GINs在QSAR和AC预测中的表现。随后,我们对AC预测提出了一种新颖的双神经网络模型进行数学描述和计算评估。我们进一步引入了一种名为亚结构池化的操作,用于将结构指纹向量化,作为GNN架构中图池化的自然对应物。我们提出了Sort&Slice,一种简单的用于ECFPs的亚结构池化技术,可以稳健地优于基于哈希的折叠在分子性质预测中。最后,我们概述了未来研究的两个想法:(i)一种基于图的自监督学习策略,使经典分子特征化可训练,以及(ii)通过可微自注意力实现可训练的亚结构池化。

更新时间: 2024-11-20 20:07:48

领域: cs.LG,q-bio.BM,stat.ML

下载: http://arxiv.org/abs/2411.13688v1

Why Rectified Power Unit Networks Fail and How to Improve It: An Effective Theory Perspective

The Rectified Power Unit (RePU) activation function, unlike the Rectified Linear Unit (ReLU), has the advantage of being differentiable when used to construct neural networks. However, it can be observed experimentally that when deep layers are stacked, neural networks constructed with RePU encounter critical issues: activation values explode or vanish and training fails, regardless of the hyperparameter initialization. From the perspective of effective theory, we aim to identify the causes of this phenomenon and propose a new activation function that retains the advantages of RePU while overcoming its drawbacks.
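
A small numerical demonstration of the instability the abstract describes: stacking RePU layers (here p = 2, with generic Gaussian weights) makes activation magnitudes drift geometrically across depth. Width, depth, and initialization scale are arbitrary choices for illustration.

import numpy as np

def repu(x, p=2):                     # RePU: max(0, x)**p, differentiable at 0 for p >= 2
    return np.maximum(x, 0.0) ** p

rng = np.random.default_rng(0)
x = rng.normal(size=500)
for layer in range(10):
    W = rng.normal(scale=1.0 / np.sqrt(x.size), size=(x.size, x.size))
    x = repu(W @ x)
    print(f"layer {layer}: mean |activation| = {np.abs(x).mean():.3e}")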

Updated: 2024-11-20 20:00:31

标题: 为什么修正幂单元(RePU)网络会失败以及如何改进:有效理论视角

摘要: Rectified Power Unit(RePU)激活函数与Rectified Linear Unit(ReLU)不同,具有在构建神经网络时是可微分函数的优势。然而,实验观察发现,当深层堆叠时,使用RePU构建的神经网络会遇到关键问题。这些问题包括数值爆炸或消失以及训练失败。而且这些问题发生时与超参数初始化无关。从有效理论的角度,我们旨在确定这种现象的原因,并提出一个新的激活函数,保留RePU的优势同时克服其缺点。

更新时间: 2024-11-20 20:00:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2408.02697v3

Differentially Private Learning Beyond the Classical Dimensionality Regime

We initiate the study of differentially private learning in the proportional dimensionality regime, in which the number of data samples $n$ and problem dimension $d$ approach infinity at rates proportional to one another, meaning that $d / n \to \delta$ as $n \to \infty$ for an arbitrary, given constant $\delta \in (0, \infty)$. This setting is significantly more challenging than that of all prior theoretical work in high-dimensional differentially private learning, which, despite the name, has assumed that $\delta = 0$ or is sufficiently small for problems of sample complexity $O(d)$, a regime typically considered "low-dimensional" or "classical" by modern standards in high-dimensional statistics. We provide sharp theoretical estimates of the error of several well-studied differentially private algorithms for robust linear regression and logistic regression, including output perturbation, objective perturbation, and noisy stochastic gradient descent, in the proportional dimensionality regime. The $1 + o(1)$ factor precision of our error estimates enables a far more nuanced understanding of the price of privacy of these algorithms than that afforded by existing, coarser analyses, which are essentially vacuous in the regime we consider. We incorporate several probabilistic tools that have not previously been used to analyze differentially private learning algorithms, such as a modern Gaussian comparison inequality and recent universality laws with origins in statistical physics.
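
For concreteness, the simplest of the three algorithms analysed, output perturbation, can be sketched as below. This is the textbook Gaussian-mechanism recipe with a crude sensitivity bound under norm-clipping assumptions; the paper's contribution is the sharp 1 + o(1) error analysis as d/n tends to a constant, which this sketch does not attempt.

import numpy as np

def dp_ridge(X, y, lam, eps, delta, clip=1.0):
    # Output perturbation for ridge regression via the Gaussian mechanism.
    # Assumes rows of X have norm <= clip and |y_i| <= 1, giving an L2
    # sensitivity of roughly 2 * clip / (n * lam) for the minimiser.
    n, d = X.shape
    theta = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
    sens = 2.0 * clip / (n * lam)
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return theta + np.random.default_rng(0).normal(scale=sigma, size=d)

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 20))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1))[:, None]
y = np.clip(X @ rng.normal(size=20) + 0.1 * rng.normal(size=2000), -1, 1)
print(dp_ridge(X, y, lam=0.1, eps=1.0, delta=1e-5)[:5])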

Updated: 2024-11-20 19:56:12

标题: 超越经典维度范围的差分隐私学习

摘要: 我们开始研究比例维度范围内的差分私有学习,其中数据样本数量$n$和问题维度$d$相互接近无穷大,以相互比例的速率增加,意味着当$n \to \infty$时$d / n \to \delta$,其中$\delta \in (0, \infty)$是任意给定的常数。这种设置比高维度差分私有学习中所有先前理论工作都更具挑战性,尽管名称为高维度差分私有学习,但这些工作假定$\delta = 0$或足够小,适用于样本复杂度为$O(d)$的问题,这个范围通常被现代高维统计学标准视为“低维度”或“经典”。 我们为鲁棒线性回归和逻辑回归的几种经过深入研究的差分私有算法提供了尖锐的理论误差估计,包括输出扰动、目标扰动和嘈杂的随机梯度下降,在比例维度范围内。我们的误差估计精度为$1 + o(1)$,比现有的粗略分析更深入地理解这些算法的隐私代价,这些分析在我们考虑的范围内基本上是空洞的。 我们结合了几种以前未曾用于分析差分私有学习算法的概率工具,例如现代高斯比较不等式和源自统计物理的最近的普适定律。

更新时间: 2024-11-20 19:56:12

领域: cs.LG,cs.CR,cs.DS

下载: http://arxiv.org/abs/2411.13682v1

Bimanual Dexterity for Complex Tasks

To train generalist robot policies, machine learning methods often require a substantial amount of expert human teleoperation data. An ideal robot for humans collecting data is one that closely mimics them: bimanual arms and dexterous hands. However, creating such a bimanual teleoperation system with over 50 DoF is a significant challenge. To address this, we introduce Bidex, an extremely dexterous, low-cost, low-latency and portable bimanual dexterous teleoperation system which relies on motion capture gloves and teacher arms. We compare Bidex to a Vision Pro teleoperation system and a SteamVR system and find Bidex to produce better quality data for more complex tasks at a faster rate. Additionally, we show Bidex operating a mobile bimanual robot for in-the-wild tasks. The robot hands (5k USD) and teleoperation system (7k USD) are readily reproducible and can be used on many robot arms including two xArms (16k USD). Website at https://bidex-teleop.github.io/

Updated: 2024-11-20 19:53:35

标题: 复杂任务的双手灵巧操作

摘要: 为了训练通用机器人策略,机器学习方法通常需要大量的专家人类遥操作数据。对于收集数据的人类来说,最理想的机器人是那些能够紧密模仿他们的:双手臂和灵巧的手。然而,创建一个具有50个以上自由度的双手臂遥操作系统是一个重大挑战。为了解决这个问题,我们引入了Bidex,一个极具灵巧性、低成本、低延迟和便携的双手臂灵巧遥操作系统,它依靠动作捕捉手套和教师手臂。我们将Bidex与Vision Pro遥操作系统和SteamVR系统进行比较,发现Bidex在更复杂任务中以更快的速度产生更优质的数据。此外,我们展示了Bidex在野外任务中操作移动双手臂机器人。机器人手(5k美元)和遥操作系统(7k美元)可以轻松复制,并可用于许多机器人手臂,包括两个xArms(16k美元)。网站位于https://bidex-teleop.github.io/

更新时间: 2024-11-20 19:53:35

领域: cs.RO,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.13677v1

Majority Kernels: An Approach to Leverage Big Model Dynamics for Efficient Small Model Training

Recent breakthroughs and successful deployment of large language and vision models in a constrained environment predominantly follow a two-phase approach. First, large models are trained to achieve peak performance, followed by a model shrinking method to meet hardware constraints; methods like distillation, compression or quantization help leverage the highly performant large models to induce smaller performant ones. Formally, this can be seen as the problem of identifying an optimal model of size $n$ from a larger model of size $k \cdot n$, where $k > 1$ is the overparameterization factor. This paper explores the hypothesis that a single training run can simultaneously train a larger model for performance and derive a smaller model for deployment. Our contribution is an effective architectural change, namely Majority Kernels, that is compatible with the main standard architectures such as multi-layer perceptrons (MLPs), residual networks (ResNets), and Transformers. We demonstrate that applying our technique can modify the training dynamics, resulting in performance gains across architectures and tasks while keeping inference performance consistent. Furthermore, our approach adds minimal overhead to the cost incurred (wall clock time) at training time. The proposed approach shows strong performance on a wide variety of datasets and models, even outperforming strong baselines such as distilled ensembles as well as combinatorial optimization methods based on submodular optimization.

Updated: 2024-11-20 19:51:32

标题: 多数内核:一种利用大模型动态进行高效小模型训练的方法

摘要: 最近,在受限环境中,大型语言和视觉模型的突破和成功部署主要遵循两阶段方法。首先,训练大型模型以实现最佳性能,然后使用模型缩减方法来满足硬件约束;蒸馏、压缩或量化等方法有助于利用高性能的大型模型来诱导出性能较小的模型。形式上,这可以看作是从大小为$k\cdot n$的较大模型中识别出大小为$n$的最佳模型的问题,其中$k>1$是超参数化因子。本文探讨了一个假设,即单次训练可以同时训练一个更大的模型以获得性能,并为部署推导出一个较小的模型。我们的贡献是一种有效的架构改变,即“多数内核”,与主要标准架构(如多层感知器(MLPs)、残差网络(ResNets)和Transformer)兼容。我们展示了应用我们的技术可以修改训练动态,从而在各种体系结构和任务中实现性能增益,同时保持推理性能一致。此外,我们的方法在训练时增加的成本开销(挂钟时间)很小。所提出的方法在各种数据集和模型上表现出强大的性能,甚至胜过强大的基线方法,如蒸馏集合和基于子模块优化的组合优化方法。

更新时间: 2024-11-20 19:51:32

领域: cs.LG

下载: http://arxiv.org/abs/2402.05033v2

Hymba: A Hybrid-head Architecture for Small Language Models

We propose Hymba, a family of small language models featuring a hybrid-head parallel architecture that integrates transformer attention mechanisms with state space models (SSMs) for enhanced efficiency. Attention heads provide high-resolution recall, while SSM heads enable efficient context summarization. Additionally, we introduce learnable meta tokens that are prepended to prompts, storing critical information and alleviating the "forced-to-attend" burden associated with attention mechanisms. This model is further optimized by incorporating cross-layer key-value (KV) sharing and partial sliding window attention, resulting in a compact cache size. During development, we conducted a controlled study comparing various architectures under identical settings and observed significant advantages of our proposed architecture. Notably, Hymba achieves state-of-the-art results for small LMs: Our Hymba-1.5B-Base model surpasses all sub-2B public models in performance and even outperforms Llama-3.2-3B with 1.32% higher average accuracy, an 11.67x cache size reduction, and 3.49x throughput.
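
The core architectural idea, attention and SSM heads reading the same input in parallel, can be caricatured in a few lines. The diagonal SSM, the causal attention, and the plain averaging fusion below are illustrative stand-ins; Hymba's actual projections, meta tokens, KV sharing, and sliding-window attention are not modelled.

import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hybrid_block(x, Wq, Wk, Wv, a, B, C):
    T, d = x.shape
    # attention head: high-resolution recall over the whole (causal) context
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    scores[np.triu_indices(T, 1)] = -np.inf          # causal mask
    attn_out = softmax(scores) @ v
    # diagonal SSM head: cheap running summary, h_t = a*h_{t-1} + B x_t, y_t = C h_t
    h = np.zeros(B.shape[0])
    ssm_out = np.empty_like(attn_out)
    for t in range(T):
        h = a * h + B @ x[t]
        ssm_out[t] = C @ h
    return 0.5 * (attn_out + ssm_out)                # naive parallel fusion

rng = np.random.default_rng(0)
d, T, n = 8, 6, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = hybrid_block(x, Wq, Wk, Wv, a=0.9, B=rng.normal(size=(n, d)), C=rng.normal(size=(d, n)))
print(out.shape)                                     # (6, 8)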

Updated: 2024-11-20 19:51:25

标题: Hymba:用于小型语言模型的混合头结构

摘要: 我们提出了Hymba,一系列小型语言模型,具有混合头并行架构,将transformer注意力机制与状态空间模型(SSMs)相结合,以增强效率。注意力头提供高分辨率的回忆,而SSM头使得上下文摘要更加高效。此外,我们引入了可学习的元令牌,它们被预置到提示之前,存储关键信息并减轻与注意力机制相关的“强制关注”负担。通过将跨层键值(KV)共享和部分滑动窗口注意力集成到模型中,进一步优化了该模型,从而实现了更紧凑的缓存大小。在开发过程中,我们进行了一项受控研究,比较了在相同设置下的各种架构,并观察到我们提出的架构的显著优势。值得注意的是,Hymba在小型LMs方面取得了最先进的结果:我们的Hymba-1.5B-Base模型在性能上超越了所有低于2B的公开模型,甚至比Llama-3.2-3B表现更好,平均准确率高出1.32%,缓存大小减少11.67倍,吞吐量增加3.49倍。

更新时间: 2024-11-20 19:51:25

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.13676v1

Graph convolutional network as a fast statistical emulator for numerical ice sheet modeling

The Ice-sheet and Sea-level System Model (ISSM) provides numerical solutions for ice sheet dynamics using finite elements and fine mesh adaptation. However, because ISSM is compatible only with central processing units (CPUs), its ability to economize computational time when exploring the linkage between climate forcings and ice dynamics is limited. Although several deep learning emulators using graphics processing units (GPUs) have been proposed to accelerate ice sheet modeling, most of them rely on convolutional neural networks (CNNs) designed for regular grids. Since these are not appropriate for the irregular meshes of ISSM, we use a graph convolutional network (GCN) to replicate the adapted mesh structures of the ISSM. When applied to transient simulations of the Pine Island Glacier (PIG), Antarctica, the GCN successfully reproduces ice thickness and velocity with a correlation coefficient of approximately 0.997, outperforming non-graph models, including the fully convolutional network (FCN) and multi-layer perceptron (MLP). Compared to the fixed-resolution approach of the FCN, the flexible-resolution structure of the GCN accurately captures detailed ice dynamics in fast-ice regions. By leveraging the 60-100 times faster computation of the GPU-based GCN emulator, we efficiently examine the impacts of basal melting rates on the ice sheet dynamics in the PIG.
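
The building block that makes the emulator mesh-agnostic is the graph convolution itself, which aggregates features over arbitrary neighbourhoods instead of a regular grid. A minimal single layer with symmetric normalisation is shown below (the standard GCN form; the paper's exact architecture may differ).

import numpy as np

def gcn_layer(H, edges, W):
    # H: (num_nodes, in_dim) node features; edges: list of (i, j) mesh edges
    n = H.shape[0]
    A = np.eye(n)                            # adjacency with self-loops
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    dinv = 1.0 / np.sqrt(A.sum(axis=1))
    A_hat = dinv[:, None] * A * dinv[None, :]   # D^{-1/2} A D^{-1/2}
    return np.maximum(A_hat @ H @ W, 0.0)       # aggregate, project, ReLU

# toy 4-node mesh patch with thickness/velocity-like features per node
H = np.random.default_rng(0).normal(size=(4, 3))
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(gcn_layer(H, edges, np.random.default_rng(1).normal(size=(3, 8))).shape)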

Updated: 2024-11-20 19:46:37

标题: 图卷积网络作为数值冰盖模拟的快速统计模拟器

摘要: 冰盖和海平面系统模型(ISSM)利用有限元和细网格适应提供冰盖动力学的数值解。然而,考虑到ISSM只兼容中央处理单元(CPU),在探索气候强迫和冰动力学之间的联系时,它在节约计算时间方面存在局限性。虽然已经提出了几个利用图形处理单元(GPU)加速冰盖建模的深度学习模拟器,但大多数依赖于为规则网格设计的卷积神经网络(CNNs)。由于它们不适用于ISSM的不规则网格,我们使用图卷积网络(GCN)来复制ISSM的适应网格结构。当应用于南极洲的Pine Island冰川(PIG)的瞬态模拟时,GCN成功地重现了冰厚度和速度,相关系数约为0.997,优于非图模型,包括全卷积网络(FCN)和多层感知器(MLP)。与FCN的固定分辨率方法相比,GCN的灵活分辨率结构准确捕捉快冰区域的详细冰动力学。通过利用基于GPU的GCN模拟器的60-100倍更快的计算时间,我们有效地研究了Pine Island冰川的基底融化速率对冰盖动力学的影响。

更新时间: 2024-11-20 19:46:37

领域: cs.LG,cs.CE

下载: http://arxiv.org/abs/2402.05291v2

FabuLight-ASD: Unveiling Speech Activity via Body Language

Active speaker detection (ASD) in multimodal environments is crucial for various applications, from video conferencing to human-robot interaction. This paper introduces FabuLight-ASD, an advanced ASD model that integrates facial, audio, and body pose information to enhance detection accuracy and robustness. Our model builds upon the existing Light-ASD framework by incorporating human pose data, represented through skeleton graphs, which minimises computational overhead. Using the Wilder Active Speaker Detection (WASD) dataset, renowned for reliable face and body bounding box annotations, we demonstrate FabuLight-ASD's effectiveness in real-world scenarios. Achieving an overall mean average precision (mAP) of 94.3%, FabuLight-ASD outperforms Light-ASD, which has an overall mAP of 93.7% across various challenging scenarios. The incorporation of body pose information shows a particularly advantageous impact, with notable improvements in mAP observed in scenarios with speech impairment, face occlusion, and human voice background noise. Furthermore, efficiency analysis indicates only a modest increase in parameter count (27.3%) and multiply-accumulate operations (up to 2.4%), underscoring the model's efficiency and feasibility. These findings validate the efficacy of FabuLight-ASD in enhancing ASD performance through the integration of body pose data. FabuLight-ASD's code and model weights are available at https://github.com/knowledgetechnologyuhh/FabuLight-ASD.

Updated: 2024-11-20 19:45:54

标题: FabuLight-ASD:通过身体语言揭示言语活动

摘要: 在多模态环境中进行主动说话者检测(ASD)对于各种应用至关重要,从视频会议到人机交互。本文介绍了FabuLight-ASD,这是一个先进的ASD模型,整合了面部、音频和身体姿势信息,以增强检测准确性和稳健性。我们的模型基于现有的Light-ASD框架,通过整合人体姿势数据(通过骨架图表示),从而减少计算开销。使用以可靠的人脸和身体边界框注释而闻名的Wilder Active Speaker Detection(WASD)数据集,我们展示了FabuLight-ASD在真实场景中的有效性。在各种具有挑战性的场景中,FabuLight-ASD的整体平均精度(mAP)达到94.3%,优于Light-ASD,在各种具有挑战性的场景中的整体mAP为93.7%。身体姿势信息的整合显示出特别有利的影响,在具有言语障碍、面部遮挡和人声背景噪音的场景中观察到mAP的显着改善。此外,效率分析表明参数数量仅略微增加(27.3%)和乘积累加操作(高达2.4%),突显了模型的效率和可行性。这些发现验证了FabuLight-ASD通过整合身体姿势数据来增强ASD性能的有效性。FabuLight-ASD的代码和模型权重可在https://github.com/knowledgetechnologyuhh/FabuLight-ASD获取。

更新时间: 2024-11-20 19:45:54

领域: cs.CV,cs.AI,cs.LG,cs.NE,cs.SD,68T20,I.2.0

下载: http://arxiv.org/abs/2411.13674v1

Can CDT rationalise the ex ante optimal policy via modified anthropics?

In Newcomb's problem, causal decision theory (CDT) recommends two-boxing and thus comes apart from evidential decision theory (EDT) and ex ante policy optimisation (which prescribe one-boxing). However, in Newcomb's problem, you should perhaps believe that with some probability you are in a simulation run by the predictor to determine whether to put a million dollars into the opaque box. If so, then causal decision theory might recommend one-boxing in order to cause the predictor to fill the opaque box. In this paper, we study generalisations of this approach. That is, we consider general Newcomblike problems and try to form reasonable self-locating beliefs under which CDT's recommendations align with an EDT-like notion of ex ante policy optimisation. We consider approaches in which we model the world as running simulations of the agent, and an approach not based on such models (which we call 'Generalised Generalised Thirding', or GGT). For each approach, we characterise the resulting CDT policies, and prove that under certain conditions, these include the ex ante optimal policies.
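
The basic arithmetic of the modified-anthropics move can be made concrete. Below, with credence p_sim you are the predictor's simulation and your act causally fixes the box contents faced by an identically-choosing real agent; with credence 1 - p_sim the contents are already fixed. The payoffs are the usual toy numbers, and this compresses away the subtleties the paper actually studies.

def cdt_value(one_box, p_sim, p_full):
    v_sim = 1_000_000 if one_box else 1_000            # your act fills (or empties) the box
    v_real = p_full * 1_000_000 + (0 if one_box else 1_000)   # contents already fixed
    return p_sim * v_sim + (1 - p_sim) * v_real

for p_sim in (0.0, 0.002, 0.5):
    rec = "one-box" if cdt_value(True, p_sim, 0.5) > cdt_value(False, p_sim, 0.5) else "two-box"
    print(f"p_sim={p_sim}: CDT recommends {rec}")      # even tiny p_sim flips the verdict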

Updated: 2024-11-20 19:39:49

标题: CDT是否可以通过修改后的人类学来合理化事先最优政策?

摘要: 在纽康姆问题中,因果决策理论(CDT)推荐双选盒,因此与证据决策理论(EDT)和前置政策优化(建议单选盒)相分离。然而,在纽康姆问题中,您可能应该相信自己有一定概率处于由预测者运行的模拟中,以确定是否将一百万美元放入不透明盒子。如果是这样,那么因果决策理论可能会建议单选盒,以导致预测者填满不透明盒子。在本文中,我们研究了这种方法的泛化。也就是说,我们考虑类似纽康姆的问题,并尝试形成合理的自我定位信念,使得CDT的建议与类似EDT的前置政策优化概念保持一致。我们考虑了将世界建模为运行代理的模拟的方法,以及不基于这种模型的方法(我们称之为“广义广义三分法”,或GGT)。对于每种方法,我们表征了由此产生的CDT策略,并证明在某些条件下,这些策略包括前置最优策略。

更新时间: 2024-11-20 19:39:49

领域: cs.AI,cs.GT

下载: http://arxiv.org/abs/2411.04462v2

Hierarchical Information-sharing Convolutional Neural Network for the Prediction of Arctic Sea Ice Concentration and Velocity

Forecasting sea ice concentration (SIC) and sea ice velocity (SIV) in the Arctic Ocean is of great significance as the Arctic environment has been changed by the recent warming climate. Given that physical sea ice models require high computational costs with complex parameterization, deep learning techniques can effectively replace the physical model and improve the performance of sea ice prediction. This study proposes a novel multi-task fully convolutional network architecture named hierarchical information-sharing U-net (HIS-Unet) to predict daily SIC and SIV. Instead of learning SIC and SIV separately in each branch, we allow the SIC and SIV layers to share their information and assist each other's prediction through weighting attention modules (WAMs). Consequently, our HIS-Unet outperforms other statistical approaches, sea ice physical models, and neural networks without such information-sharing units. The improvement of HIS-Unet is most significant at times and locations where SIC changes seasonally, which implies that the information sharing between SIC and SIV through WAMs helps the model learn the dynamic changes of SIC and SIV. The weight values of the WAMs imply that SIC information plays a more critical role in SIV prediction than SIV information does in SIC prediction, and that information sharing is more active in marginal ice zones (e.g., East Greenland and Hudson/Baffin Bays) than in the central Arctic.

Updated: 2024-11-20 19:28:28

标题: 分层信息共享卷积神经网络用于预测北极海冰浓度和速度

摘要: 在近年来的气候变暖下,预测北冰洋海冰浓度(SIC)和海冰速度(SIV)具有重要意义。考虑到物理海冰模型需要高计算成本和复杂参数化,深度学习技术可以有效取代物理模型,提高海冰预测的性能。本研究提出了一种新颖的多任务全卷积网络架构,命名为分层信息共享U-net(HIS-Unet),用于预测每日SIC和SIV。我们允许SIC和SIV层通过加权注意力模块(WAMs)共享信息,并通过辅助彼此的预测。因此,我们的HIS-Unet表现优于其他统计方法、海冰物理模型和没有此类信息共享单元的神经网络。HIS-Unet的改进对SIC季节性变化更为显著,这意味着通过WAMs在SIC和SIV之间共享信息有助于学习SIC和SIV的动态变化。WAMs的权重值表明,在SIV预测中,SIC信息起着更为关键的作用,而在SIC预测中,SIV信息的作用相对较小,信息共享在边缘冰区(如东格陵兰和哈德逊/巴芬湾)比在中央北冰洋更为活跃。

更新时间: 2024-11-20 19:28:28

领域: cs.LG,cs.CV,physics.ao-ph

下载: http://arxiv.org/abs/2311.00167v2

Prediction-Guided Active Experiments

In this work, we introduce a new framework for active experimentation, the Prediction-Guided Active Experiment (PGAE), which leverages predictions from an existing machine learning model to guide sampling and experimentation. Specifically, at each time step, an experimental unit is sampled according to a designated sampling distribution, and the actual outcome is observed based on an experimental probability. Otherwise, only a prediction for the outcome is available. We begin by analyzing the non-adaptive case, where full information on the joint distribution of the predictor and the actual outcome is assumed. For this scenario, we derive an optimal experimentation strategy by minimizing the semi-parametric efficiency bound for the class of regular estimators. We then introduce an estimator that meets this efficiency bound, achieving asymptotic optimality. Next, we move to the adaptive case, where the predictor is continuously updated with newly sampled data. We show that the adaptive version of the estimator remains efficient and attains the same semi-parametric bound under certain regularity assumptions. Finally, we validate PGAE's performance through simulations and a semi-synthetic experiment using data from the US Census Bureau. The results underscore the PGAE framework's effectiveness and superiority compared to other existing methods.
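
The estimator at the heart of this setup is easy to state: use the cheap prediction for every unit, then debias it with inverse-probability-weighted residuals from the experimented units. The sketch below shows that generic debiasing step with a fixed experimental probability; the paper's contribution is choosing the sampling and experimental distributions optimally, which is not attempted here.

import numpy as np

rng = np.random.default_rng(0)
n, e = 100_000, 0.1                 # units; probability a unit is actually experimented on
y = rng.normal(loc=2.0, size=n)     # true outcomes (observed only when experimented)
f = 0.8 * y + 0.5                   # biased but cheap ML predictions, always available
obs = rng.random(n) < e             # which units receive a real experiment

# use predictions everywhere, then debias with inverse-probability-weighted residuals
est = np.mean(f + obs * (y - f) / e)
print(f"PGAE-style estimate: {est:.3f} | prediction-only: {f.mean():.3f} | truth: {y.mean():.3f}")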

Updated: 2024-11-20 19:25:33

标题: 预测导向的主动实验

摘要: 在这项工作中,我们引入了一个新的框架,用于主动实验,即预测引导的主动实验(PGAE),它利用现有机器学习模型的预测来引导采样和实验。具体来说,在每个时间步骤,根据指定的采样分布对实验单元进行采样,并根据实验概率观察实际结果。否则,只有结果的预测可用。我们首先分析非自适应情况,假设对预测变量和实际结果的联合分布具有完全信息。对于这种情况,通过最小化常规估计器类的半参数效率界,我们推导出一个最优的实验策略。然后,我们引入一个符合这一效率界的估计器,实现渐近最优性。接下来,我们转向自适应情况,其中预测器会随着新采样数据的不断更新而更新。我们展示了估计器的自适应版本在某些规则性假设下仍然高效,并达到相同的半参数界。最后,我们通过模拟和使用来自美国人口普查局数据的半合成实验验证了PGAE的性能。结果强调了PGAE框架相对于其他现有方法的有效性和优越性。

更新时间: 2024-11-20 19:25:33

领域: stat.ML,cs.LG,econ.EM

下载: http://arxiv.org/abs/2411.12036v2

Graph neural network framework for energy mapping of hybrid Monte-Carlo molecular dynamics simulations of medium-entropy alloys

Machine learning (ML) methods have drawn significant interest in material design and discovery. Graph neural networks (GNNs), in particular, have demonstrated strong potential for predicting material properties. The present study proposes a graph-based representation for modeling medium-entropy alloys (MEAs). Hybrid Monte-Carlo molecular dynamics (MC/MD) simulations are employed to achieve thermally stable structures across various annealing temperatures in an MEA. These simulations generate dump files and potential energy labels, which are used to construct graph representations of the atomic configurations. Edges are created between each atom and its 12 nearest neighbors without incorporating explicit edge features. These graphs then serve as input for a Graph Convolutional Neural Network (GCNN) based ML model to predict the system's potential energy. The GCNN architecture effectively captures the local environment and chemical ordering within the MEA structure. The GCNN-based ML model demonstrates strong performance in predicting potential energy at different steps, showing satisfactory results on both the training data and unseen configurations. Our approach presents a graph-based modeling framework for MEAs and high-entropy alloys (HEAs), which effectively captures the local chemical order (LCO) within the alloy structure. This allows us to predict key material properties influenced by LCO in both MEAs and HEAs, providing deeper insights into how atomic-scale arrangements affect the properties of these alloys.
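
The graph construction used for the GCNN input can be sketched directly from the abstract: connect each atom to its 12 nearest neighbours, with no explicit edge features. The brute-force distance computation below is fine for toy sizes; a real pipeline would use a spatial index and periodic boundary conditions.

import numpy as np

def knn_edges(positions, k=12):
    # directed edges from each atom to its k nearest neighbours
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)               # exclude self-distances
    nbrs = np.argsort(dist, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(positions)) for j in nbrs[i]]

pos = np.random.default_rng(0).random((20, 3)) * 10.0   # toy atomic coordinates
edges = knn_edges(pos, k=12)
print(len(edges))                                # 20 atoms x 12 neighbours = 240 edges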

Updated: 2024-11-20 19:22:40

标题: 图神经网络框架用于中熵合金混合蒙特卡洛分子动力学模拟的能量映射

摘要: 机器学习(ML)方法在材料设计和发现中引起了极大的兴趣。图神经网络(GNNs)特别表现出对预测材料性质具有很强潜力。本研究提出了一种基于图的表示方法,用于建模中熵合金(MEAs)。采用混合蒙特卡罗分子动力学(MC/MD)模拟来实现在MEAs中不同退火温度下的热稳定结构。这些模拟生成转储文件和势能标签,用于构建原子配置的图表示。在不包含显式边特征的情况下,为每个原子及其12个最近邻之间创建边。然后,这些图作为输入用于基于图卷积神经网络(GCNN)的ML模型,用于预测系统的势能。GCNN架构有效捕获了MEA结构中的局部环境和化学排序。基于GCNN的ML模型在不同步骤预测势能方面表现出很强的性能,对训练数据和未见配置都显示出令人满意的结果。我们的方法提出了一个基于图的建模框架,可以有效捕捉合金结构中的局部化学顺序(LCO),从而预测MEAs和HEAs中受LCO影响的关键材料性质,为我们提供更深入的洞察力,了解原子尺度排列如何影响这些合金的性质。

更新时间: 2024-11-20 19:22:40

领域: cond-mat.mtrl-sci,cs.LG

下载: http://arxiv.org/abs/2411.13670v1

Watermark-based Attribution of AI-Generated Content

Several companies have deployed watermark-based detection to identify AI-generated content. However, attribution--the ability to trace back to the user of a generative AI (GenAI) service who created a given piece of AI-generated content--remains largely unexplored despite its growing importance. In this work, we aim to bridge this gap by conducting the first systematic study on watermark-based, user-level attribution of AI-generated content. Our key idea is to assign a unique watermark to each user of the GenAI service and embed this watermark into the AI-generated content created by that user. Attribution is then performed by identifying the user whose watermark best matches the one extracted from the given content. This approach, however, faces a key challenge: How should watermarks be selected for users to maximize attribution performance? To address the challenge, we first theoretically derive lower bounds on detection and attribution performance through rigorous probabilistic analysis for any given set of user watermarks. Then, we select watermarks for users to maximize these lower bounds, thereby optimizing detection and attribution performance. Our theoretical and empirical results show that watermark-based attribution inherits both the accuracy and (non-)robustness properties of the underlying watermark. Specifically, attribution remains highly accurate when the watermarked AI-generated content is either not post-processed or subjected to common post-processing such as JPEG compression, as well as black-box adversarial post-processing with limited query budgets.
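
A baseline version of the attribution rule is shown below: assign each user a random binary code, embed it, and attribute content to the user whose code best matches the extracted bits. Random codes are exactly what the paper improves upon by optimising the watermark set, so treat this as the naive reference point.

import numpy as np

rng = np.random.default_rng(0)
n_users, n_bits = 1000, 64
watermarks = rng.integers(0, 2, size=(n_users, n_bits))   # one code per user

def attribute(extracted):
    # pick the user whose watermark best matches the extracted bits
    matches = (watermarks == extracted).mean(axis=1)
    return int(np.argmax(matches)), float(matches.max())

true_user = 123
extracted = watermarks[true_user].copy()
flip = rng.random(n_bits) < 0.2                  # post-processing flips 20% of bits
extracted[flip] ^= 1
print(attribute(extracted))                      # typically recovers user 123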

Updated: 2024-11-20 19:17:53

标题: 基于水印的人工智能生成内容溯源

摘要: 几家公司已经部署了基于水印的检测技术来识别人工智能生成的内容。然而,归因——即追溯生成式人工智能(GenAI)服务的用户创建的特定人工智能生成内容的能力——尽管其重要性不断增长,仍然在很大程度上未被探索。在这项工作中,我们旨在通过进行第一次系统研究水印技术在用户级别对人工智能生成内容进行归因来弥补这一空白。我们的关键思想是为GenAI服务的每个用户分配一个唯一的水印,并将该水印嵌入到该用户创建的人工智能生成内容中。然后通过识别从给定内容中提取出的水印最匹配的用户来进行归因。然而,这种方法面临一个关键挑战:如何选择水印以最大化用户的归因性能?为了解决这一挑战,我们首先通过严格的概率分析在任何给定的用户水印集合上推导出检测和归因性能的下界。然后,我们选择水印以最大化这些下界,从而优化检测和归因性能。我们的理论和实证结果表明,基于水印的归因继承了底层水印的准确性和(非)鲁棒性特性。具体来说,在水印过的人工智能生成内容未经后处理或经过常见的后处理(如JPEG压缩)以及黑盒对抗性后处理时,归因仍然保持高度准确。

更新时间: 2024-11-20 19:17:53

领域: cs.CR,cs.AI,cs.CL,cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.04254v3

Knowledge Transfer for Cross-Domain Reinforcement Learning: A Systematic Review

Reinforcement Learning (RL) provides a framework in which agents can be trained, via trial and error, to solve complex decision-making problems. Learning with little supervision causes RL methods to require large amounts of data, rendering them too expensive for many applications (e.g., robotics). By reusing knowledge from a different task, knowledge transfer methods present an alternative to reduce the training time in RL. Given this severe data scarcity, there has been growing interest in methods that, owing to their flexibility, can transfer knowledge across different domains (i.e., problems with different representations). However, identifying similarities and adapting knowledge across tasks from different domains requires matching their representations or finding domain-invariant features. These processes can be data-demanding, which poses the main challenge in cross-domain knowledge transfer: to select and transform knowledge in a data-efficient way, such that it accelerates learning in the target task, despite the presence of significant differences across problems (e.g., robots with distinct morphologies). Thus, this review presents a unifying analysis of methods focused on transferring knowledge across different domains. Through a taxonomy based on a transfer-approach categorization and a characterization of works based on their data-assumption requirements, the contributions of this article are 1) a comprehensive and systematic revision of knowledge transfer methods for the cross-domain RL setting, 2) a categorization and characterization of such methods to provide an analysis based on relevant features such as their transfer approach and data requirements, and 3) a discussion on the main challenges regarding cross-domain knowledge transfer, as well as on ideas of future directions worth exploring to address these problems.

Updated: 2024-11-20 19:02:48

标题: 跨领域强化学习的知识传递:一项系统性综述

摘要: 强化学习(RL)提供了一个框架,通过试错的方式,代理可以被训练来解决复杂的决策问题。由于学习过程中监督较少,RL方法需要大量的数据,使得它们对许多应用(例如机器人领域)来说过于昂贵。通过从不同任务中重新利用知识,知识转移方法提供了一种减少RL训练时间的替代方案。由于数据稀缺性严重,因此对于能够在不同领域之间转移知识的方法越来越受到关注(即不同表示的问题)。然而,识别相似性并对来自不同领域的任务进行知识适应需要匹配它们的表示或找到域不变特征。这些过程可能需要大量数据,这构成了跨领域知识转移的主要挑战:以数据有效的方式选择和转换知识,以便加速目标任务的学习,尽管存在显著的问题差异(例如,具有不同形态的机器人)。因此,本综述提供了一个关于不同领域之间知识转移方法的统一分析。通过基于转移方法分类和基于数据假设要求对工作进行表征的分类法,本文的贡献包括1)对于跨领域RL设置的知识转移方法的全面系统的修订,2)对这些方法进行分类和表征以便根据相关特征(如转移方法和数据需求)进行分析,以及3)对于跨领域知识转移的主要挑战进行讨论,以及对于未来探索值得解决这些问题的方向进行思考。

更新时间: 2024-11-20 19:02:48

领域: cs.LG,cs.AI,cs.RO

下载: http://arxiv.org/abs/2404.17687v2

In-and-Out: Algorithmic Diffusion for Sampling Convex Bodies

We present a new random walk for uniformly sampling high-dimensional convex bodies. It achieves state-of-the-art runtime complexity with stronger guarantees on the output than previously known, namely in R\'enyi divergence (which implies TV, $\mathcal{W}_2$, KL, $\chi^2$). The proof departs from known approaches for polytime algorithms for the problem -- we utilize a stochastic diffusion perspective to show contraction to the target distribution with the rate of convergence determined by functional isoperimetric constants of the stationary density.
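
A toy rendition of one step of the Gaussian proximal scheme: diffuse "out" from the current point, then come back "in" by rejection-sampling the backward Gaussian restricted to the body. The step size, the naive rejection loop, and the unit-ball membership oracle are all illustrative; none of the paper's guarantees attach to this sketch.

import numpy as np

def in_and_out_step(x, h, in_body, rng):
    y = x + np.sqrt(h) * rng.normal(size=x.shape)        # "out": unconstrained forward step
    while True:                                          # "in": restricted backward step
        x_new = y + np.sqrt(h) * rng.normal(size=x.shape)
        if in_body(x_new):
            return x_new

rng = np.random.default_rng(0)
in_ball = lambda z: np.linalg.norm(z) <= 1.0             # toy convex body: unit ball
x, samples = np.zeros(5), []
for _ in range(2000):
    x = in_and_out_step(x, h=0.05, in_body=in_ball, rng=rng)
    samples.append(x)
print(np.mean([np.linalg.norm(s) for s in samples]))     # mean radius under the chain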

Updated: 2024-11-20 19:01:42

标题: 进出:用于凸体抽样的算法扩散

摘要: 我们提出了一种新的随机游走算法,用于均匀采样高维凸体。它在输出方面的保证比以往所知的更强,具有最先进的运行时复杂度,即在Rényi分歧(这意味着TV、W2、KL、χ2)方面。证明与以往用于问题的多项式时间算法的方法不同 - 我们利用随机扩散的视角来显示收敛到目标分布,收敛速度由稳态密度的函数等周常数确定。

更新时间: 2024-11-20 19:01:42

领域: cs.DS,cs.LG,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2405.01425v2

No Free Delivery Service: Epistemic limits of passive data collection in complex social systems

Rapid model validation via the train-test paradigm has been a key driver for the breathtaking progress in machine learning and AI. However, modern AI systems often depend on a combination of tasks and data collection practices that violate all assumptions ensuring test validity. Yet, without rigorous model validation we cannot ensure the intended outcomes of deployed AI systems, including positive social impact, nor continue to advance AI research in a scientifically sound way. In this paper, I will show that, for widely considered inference settings in complex social systems, the train-test paradigm not only lacks justification but is, with high probability, invalid for any risk estimator, including counterfactual and causal estimators. These formal impossibility results highlight a fundamental epistemic issue, i.e., that for key tasks in modern AI we cannot know whether models are valid under current data collection practices. Importantly, this includes variants of both recommender systems and reasoning via large language models, and neither na\"ive scaling nor limited benchmarks are suited to address this issue. I illustrate these results via the widely used MovieLens benchmark and conclude by discussing the implications of these results for AI in social systems, including possible remedies such as participatory data curation and open science.

Updated: 2024-11-20 19:01:03

标题: 没有免费的投递服务:被动数据收集在复杂社会系统中的认识限制

摘要: 通过训练-测试范式快速验证模型已成为推动机器学习和人工智能取得令人瞩目进展的关键因素。然而,现代人工智能系统往往依赖于一系列任务和数据收集实践的组合,违背了确保测试有效性的所有假设。然而,如果没有严格的模型验证,我们无法确保部署的人工智能系统的预期结果,包括积极的社会影响,也无法继续以科学合理的方式推进人工智能研究。在本文中,我将展示,在复杂社会系统中被广泛考虑的推断设置中,训练-测试范式不仅缺乏理据,而且对于任何风险估计器,包括反事实和因果估计器,都是无效的,概率极高。这些形式上的不可能结果凸显了一个基本的认识问题,即对于现代人工智能中的关键任务,我们无法确定模型在当前数据收集实践下是否有效。重要的是,这包括推荐系统和通过大型语言模型推理的变体,既天真的扩展也受限的基准测试都不适合解决这个问题。我通过广泛使用的MovieLens基准测试来说明这些结果,并最后讨论了这些结果对社会系统中人工智能的影响,包括可能的治疗方法,如参与式数据管理和开放科学。

更新时间: 2024-11-20 19:01:03

领域: cs.AI,stat.ML,62A01,G.3; I.2.0

下载: http://arxiv.org/abs/2411.13653v1

AI-generated Image Detection: Passive or Watermark?

While text-to-image models offer numerous benefits, they also pose significant societal risks. Detecting AI-generated images is crucial for mitigating these risks. Detection methods can be broadly categorized into passive and watermark-based approaches: passive detectors rely on artifacts present in AI-generated images, whereas watermark-based detectors proactively embed watermarks into such images. A key question is which type of detector performs better in terms of effectiveness, robustness, and efficiency. However, the current literature lacks a comprehensive understanding of this issue. In this work, we aim to bridge that gap by developing ImageDetectBench, the first comprehensive benchmark to compare the effectiveness, robustness, and efficiency of passive and watermark-based detectors. Our benchmark includes four datasets, each containing a mix of AI-generated and non-AI-generated images. We evaluate five passive detectors and four watermark-based detectors against eight types of common perturbations and three types of adversarial perturbations. Our benchmark results reveal several interesting findings. For instance, watermark-based detectors consistently outperform passive detectors, both in the presence and absence of perturbations. Based on these insights, we provide recommendations for detecting AI-generated images, e.g., when both types of detectors are applicable, watermark-based detectors should be the preferred choice.

Updated: 2024-11-20 18:59:58

标题: 人工智能生成的图像检测:被动还是水印?

摘要: 尽管文本到图像模型提供了许多好处,但它们也带来了重大的社会风险。检测AI生成的图像对于减轻这些风险至关重要。检测方法可以广泛分类为被动和基于水印的方法:被动检测器依赖于AI生成的图像中存在的痕迹,而基于水印的检测器主动将水印嵌入到这些图像中。一个关键问题是哪种类型的检测器在有效性、鲁棒性和效率方面表现更好。然而,目前的文献缺乏对这个问题的全面理解。在这项工作中,我们旨在填补这一空白,通过开发ImageDetectBench,这是第一个用于比较被动和基于水印的检测器在有效性、鲁棒性和效率方面的全面基准。我们的基准包括四个数据集,每个数据集都包含一组AI生成和非AI生成的图像。我们对五种被动检测器和四种基于水印的检测器进行评估,针对八种常见扰动和三种对抗性扰动。我们的基准结果揭示了一些有趣的发现。例如,基于水印的检测器在存在和不存在扰动的情况下都始终优于被动检测器。基于这些见解,我们提供了检测AI生成图像的建议,例如,在两种类型的检测器都适用时,基于水印的检测器应该是首选。

更新时间: 2024-11-20 18:59:58

领域: cs.CR,cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.13553v1

Leveraging Hierarchical Taxonomies in Prompt-based Continual Learning

Drawing inspiration from human learning behaviors, this work proposes a novel approach to mitigate catastrophic forgetting in Prompt-based Continual Learning models by exploiting the relationships between continuously emerging class data. We find that applying human habits of organizing and connecting information can serve as an efficient strategy when training deep learning models. Specifically, by building a hierarchical tree structure based on the expanding set of labels, we gain fresh insights into the data, identifying groups of similar classes that could easily cause confusion. Additionally, we delve deeper into the hidden connections between classes by exploring the original pretrained model's behavior through an optimal transport-based approach. From these insights, we propose a novel regularization loss function that encourages models to focus more on challenging knowledge areas, thereby enhancing overall performance. Experimentally, our method demonstrated significant superiority over the most robust state-of-the-art models on various benchmarks.

Updated: 2024-11-20 18:59:23

标题: 利用分层分类法在基于提示的持续学习中发挥作用

摘要: 这项工作从人类学习行为中汲取灵感,提出了一种新颖的方法,通过利用不断出现的类数据之间的关系来减轻Prompt-based Continual Learning模型中的灾难性遗忘。我们发现,应用人类组织和连接信息的习惯可以作为训练深度学习模型时的有效策略。具体来说,通过基于不断扩大的标签集构建层次树结构,我们对数据获得了新的洞察,识别出类似类别可能会容易引起混淆。此外,我们通过探索原始预训练模型的行为,通过基于最优传输的方法深入挖掘类别之间的隐藏连接。基于这些洞察,我们提出了一种新颖的正则化损失函数,鼓励模型更多地关注具有挑战性的知识领域,从而提高整体性能。在实验中,我们的方法在各种基准测试中表现出显著的优越性,超过了最强大的最新模型。

更新时间: 2024-11-20 18:59:23

领域: cs.LG

下载: http://arxiv.org/abs/2410.04327v2

HF-Diff: High-Frequency Perceptual Loss and Distribution Matching for One-Step Diffusion-Based Image Super-Resolution

Although recent diffusion-based single-step super-resolution methods achieve better performance than SinSR, they are computationally complex. To improve the performance of SinSR, we investigate preserving the high-frequency detail features during super-resolution (SR), because the degraded images lack detailed information. For this purpose, we introduce a high-frequency perceptual loss by utilizing an invertible neural network (INN) pretrained on the ImageNet dataset. Different feature maps of the pretrained INN produce different high-frequency aspects of an image. During the training phase, we impose a constraint to preserve the high-frequency features of the super-resolved and ground truth (GT) images, which improves the SR image quality during inference. Furthermore, we also utilize the Jensen-Shannon divergence between GT and SR images in the pretrained DINO-v2 embedding space to match their distributions. By introducing the high-frequency preserving loss and distribution matching constraint in the single-step diffusion-based SR (HF-Diff), we achieve a state-of-the-art CLIPIQA score on the benchmark RealSR, RealSet65, DIV2K-Val, and ImageNet datasets. Furthermore, experimental results on several datasets demonstrate that our high-frequency perceptual loss yields better SR image quality than LPIPS and VGG-based perceptual losses. Our code will be released at https://github.com/shoaib-sami/HF-Diff.
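
The high-frequency term can be illustrated with a crude stand-in for the pretrained invertible network: take "high frequency" to mean the residual after a box blur and penalise the L1 gap between SR and ground-truth residuals. The paper instead compares INN feature maps and adds a DINO-v2-space Jensen-Shannon term, neither of which is reproduced here.

import numpy as np

def high_freq(img):
    # image minus a 3x3 box-blurred copy: a crude high-frequency extractor
    pad = np.pad(img, 1, mode='edge')
    blur = sum(pad[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)) / 9.0
    return img - blur

def hf_perceptual_loss(sr, gt):
    return np.abs(high_freq(sr) - high_freq(gt)).mean()   # L1 on high frequencies

rng = np.random.default_rng(0)
gt = rng.random((64, 64))
sr = gt + 0.05 * rng.normal(size=gt.shape)
print(hf_perceptual_loss(sr, gt))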

Updated: 2024-11-20 18:56:24

标题: HF-Diff:一步扩散式图像超分辨率的高频感知损失和分布匹配

摘要: 尽管最近基于扩散的单步超分辨率方法比SinSR实现了更好的性能,但它们在计算上非常复杂。为了改善SinSR的性能,我们研究在超分辨率(SR)过程中保留高频细节特征,因为降级的图像缺乏详细信息。为此,我们引入了一个利用在ImageNet数据集上预训练的可逆神经网络(INN)的高频感知损失。预训练INN的不同特征图产生图像的不同高频方面。在训练阶段,我们强调保留超分辨率和地面真实(GT)图像的高频特征,以在推断期间提高SR图像质量。此外,我们还利用预训练的DINO-v2嵌入空间中GT和SR图像之间的Jenson-Shannon散度来匹配它们的分布。通过在单步基于扩散的SR(HF-Diff)中引入高频保持损失和分布匹配约束,我们在基准RealSR、RealSet65、DIV2K-Val和ImageNet数据集中实现了最新的CLIPIQA得分。此外,多个数据集中的实验结果表明,我们的高频感知损失比LPIPS和基于VGG的感知损失产生更好的SR图像质量。我们的代码将在https://github.com/shoaib-sami/HF-Diff发布。

更新时间: 2024-11-20 18:56:24

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.13548v1

SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs

Evaluating the output of Large Language Models (LLMs) is one of the most critical aspects of building a performant compound AI system. Since the output from LLMs propagates to downstream steps, identifying LLM errors is crucial to system performance. A common task for LLMs in AI systems is tool use. While there are several benchmark environments for evaluating LLMs on this task, they typically only report a success rate without any explanation of the failure cases. To solve this problem, we introduce SpecTool, a new benchmark to identify error patterns in LLM output on tool-use tasks. Our benchmark dataset comprises queries from diverse environments that can be used to test for the presence of seven newly characterized error patterns. Using SpecTool, we show that even the most prominent LLMs exhibit these error patterns in their outputs. Researchers can use the analysis and insights from SpecTool to guide their error-mitigation strategies.

Updated: 2024-11-20 18:56:22

标题: SpecTool:用于表征工具使用LLMs中错误的基准测试

摘要: 评估大型语言模型(LLMs)的输出是构建高性能复合人工智能系统的最关键方面之一。由于LLMs的输出会传播到下游步骤,因此识别LLMs错误对系统性能至关重要。在人工智能系统中,LLMs的常见任务是工具使用。虽然有几个基准环境可用于评估LLMs在此任务上的表现,但它们通常只给出成功率,没有任何对失败案例的解释。为解决这一问题,我们引入了SpecTool,这是一个新的基准测试,用于识别LLM输出在工具使用任务中的错误模式。我们的基准数据集包含来自各种环境的查询,可用于测试七种新的错误模式的存在。使用SPECTOOL,我们展示即使是最突出的LLMs在其输出中也会表现出这些错误模式。研究人员可以利用SPECTOOL的分析和见解来指导他们的错误缓解策略。

更新时间: 2024-11-20 18:56:22

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2411.13547v1

Promoting User Data Autonomy During the Dissolution of a Monopolistic Firm

The deployment of AI in consumer products is currently focused on the use of so-called foundation models, large neural networks pre-trained on massive corpora of digital records. This emphasis on scaling up datasets and pre-training computation raises the risk of further consolidating the industry, and enabling monopolistic (or oligopolistic) behavior. Judges and regulators seeking to improve market competition may employ various remedies. This paper explores dissolution -- the breaking up of a monopolistic entity into smaller firms -- as one such remedy, focusing in particular on the technical challenges and opportunities involved in the breaking up of large models and datasets. We show how the framework of Conscious Data Contribution can enable user autonomy during dissolution. Through a simulation study, we explore how fine-tuning and the phenomenon of "catastrophic forgetting" could actually prove beneficial as a type of machine unlearning that allows users to specify which data they want used for what purposes.

Updated: 2024-11-20 18:55:51

标题: 在垄断性公司解散期间促进用户数据自治

摘要: 目前,人工智能在消费产品中的应用主要集中在使用所谓的基础模型,即在大规模数字记录语料库上进行预训练的大型神经网络。这种对数据集扩大和预训练计算的强调增加了进一步巩固行业并实现垄断(或寡头垄断)行为的风险。寻求改善市场竞争的法官和监管机构可以采用各种补救措施。本文探讨了解散——将垄断实体拆分为较小公司——作为一种此类补救措施,特别关注在拆分大型模型和数据集中涉及的技术挑战和机会。我们展示了自觉数据贡献框架如何在解散过程中实现用户自主权。通过模拟研究,我们探讨了微调和“灾难性遗忘”现象实际上可能证明有益,作为一种机器遗忘的类型,允许用户指定他们希望用于何种目的的数据。

更新时间: 2024-11-20 18:55:51

领域: cs.LG

下载: http://arxiv.org/abs/2411.13546v1

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Large Language Models (LLMs) and Vision Language Models (VLMs) possess extensive knowledge and exhibit promising reasoning abilities; however, they still struggle to perform well in complex, dynamic environments. Real-world tasks require handling intricate interactions, advanced spatial reasoning, long-term planning, and continuous exploration of new strategies-areas in which we lack effective methodologies for comprehensively evaluating these capabilities. To address this gap, we introduce BALROG, a novel benchmark designed to assess the agentic capabilities of LLMs and VLMs through a diverse set of challenging games. Our benchmark incorporates a range of existing reinforcement learning environments with varying levels of difficulty, including tasks that are solvable by non-expert humans in seconds to extremely challenging ones that may take years to master (e.g., the NetHack Learning Environment). We devise fine-grained metrics to measure performance and conduct an extensive evaluation of several popular open-source and closed-source LLMs and VLMs. Our findings indicate that while current models achieve partial success in the easier games, they struggle significantly with more challenging tasks. Notably, we observe severe deficiencies in vision-based decision-making, as models perform worse when visual representations of the environments are provided. We release BALROG as an open and user-friendly benchmark to facilitate future research and development in the agentic community.

Updated: 2024-11-20 18:54:32

标题: 巴尔罗格:在游戏中对主体型LLM和VLM推理进行基准测试

摘要: 大型语言模型(LLMs)和视觉语言模型(VLMs)拥有广泛的知识并展现出有希望的推理能力;然而,它们仍然在复杂、动态环境中表现出困难。现实世界中的任务需要处理复杂的交互、高级空间推理、长期规划和持续探索新策略,这些领域我们缺乏全面评估这些能力的有效方法。为了弥补这一差距,我们引入了BALROG,一个旨在通过一系列具有挑战性的游戏来评估LLMs和VLMs代理能力的新基准。我们的基准包括一系列现有的强化学习环境,难度各不相同,包括一些可以由非专家在几秒钟内解决的任务,到一些可能需要多年才能掌握的极具挑战性的任务(例如NetHack学习环境)。我们设计了细粒度的指标来衡量性能,并对几种流行的开源和闭源LLMs和VLMs进行了广泛评估。我们的研究结果表明,当前的模型在较容易的游戏中取得了部分成功,但在更具挑战性的任务中表现出明显困难。值得注意的是,我们观察到在基于视觉的决策制定中存在严重的不足,因为当提供环境的视觉表示时,模型的性能更差。我们将BALROG作为一个开放且用户友好的基准发布,以促进代理社区未来的研究和发展。

更新时间: 2024-11-20 18:54:32

领域: cs.AI

下载: http://arxiv.org/abs/2411.13543v1

The Role of Accuracy and Validation Effectiveness in Conversational Business Analytics

This study examines conversational business analytics, an approach that utilizes AI to address the technical competency gaps that hinder end users from effectively using traditional self-service analytics. By facilitating natural language interactions, conversational business analytics aims to empower end users to independently retrieve data and generate insights. The analysis focuses on Text-to-SQL as a representative technology for translating natural language requests into SQL statements. Developing theoretical models grounded in expected utility theory, the study identifies conditions under which conversational business analytics, through partial or full support, can outperform delegation to human experts. The results indicate that partial support, focusing solely on information generation by AI, is viable when the accuracy of AI-generated SQL queries leads to a profit that surpasses the performance of a human expert. In contrast, full support includes not only information generation but also validation through explanations provided by the AI, and requires sufficiently high validation effectiveness to be reliable. However, user-based validation presents challenges, such as misjudgment and rejection of valid SQL queries, which may limit the effectiveness of conversational business analytics. These challenges underscore the need for robust validation mechanisms, including improved user support, automated processes, and methods for assessing quality independently of end users' technical competencies.
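
The flavour of the expected-utility comparison can be conveyed with toy numbers: a correct SQL answer is worth v, a wrong one costs c, and a human expert costs w and is assumed correct. The payoff structure and the simplistic validation model below are illustrative inventions, not the paper's theoretical model.

# toy payoffs (all numbers illustrative)
v, c, w = 100.0, 60.0, 30.0
eu_expert = v - w                          # delegate everything to the expert

def eu_partial(acc):                       # partial support: AI answer used as-is
    return acc * v - (1 - acc) * c

def eu_full(acc, val_eff):                 # full support: user validates via explanations;
    caught = (1 - acc) * val_eff           # caught errors are redone by the expert
    return acc * v + caught * (v - w) - (1 - acc) * (1 - val_eff) * c

for acc in (0.7, 0.9):
    print(f"acc={acc}: partial beats expert: {eu_partial(acc) > eu_expert}, "
          f"full (val_eff=0.8) beats expert: {eu_full(acc, 0.8) > eu_expert}")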

Updated: 2024-11-20 18:46:13

标题: 对话式商业分析中准确性和验证有效性的作用

摘要: 这项研究探讨了对话式商业分析,这是一种利用人工智能解决技术能力差距的方法,这些差距妨碍最终用户有效地使用传统的自助式分析。通过促进自然语言交互,对话式商业分析旨在赋予最终用户独立检索数据并生成见解的能力。分析重点放在以Text-to-SQL作为代表技术来将自然语言请求转换为SQL语句。通过制定基于预期效用理论的理论模型,研究确定了对话式商业分析在哪些条件下,通过部分或完全支持,可以超越委托给人类专家的表现。结果表明,仅侧重于AI生成的信息生成的部分支持在AI生成的SQL查询的准确性导致利润超过人类专家表现时是可行的。相反,完全支持不仅包括信息生成,还包括通过AI提供的解释进行验证,并且需要足够高的验证效果才能可靠。然而,基于用户的验证存在挑战,例如对有效的SQL查询的错误判断和拒绝,这可能限制对话式商业分析的效果。这些挑战突显了稳健的验证机制的必要性,包括改进用户支持、自动化流程和独立于最终用户技术能力的质量评估方法。

更新时间: 2024-11-20 18:46:13

领域: cs.AI,econ.GN,q-fin.EC

下载: http://arxiv.org/abs/2411.12128v2

Metacognition for Unknown Situations and Environments (MUSE)

Metacognition--the awareness and regulation of one's cognitive processes--is central to human adaptability in unknown situations. In contrast, current autonomous agents often struggle in novel environments due to their limited capacity for adaptation. We hypothesize that metacognition is a critical missing ingredient in adaptive autonomous systems, equipping them with the cognitive flexibility needed to tackle unfamiliar challenges. Given the broad scope of metacognitive abilities, we focus on two key aspects: competence awareness and strategy selection for novel tasks. To this end, we propose the Metacognition for Unknown Situations and Environments (MUSE) framework, which integrates metacognitive processes--specifically self-awareness and self-regulation--into autonomous agents. We present two initial implementations of MUSE: one based on world modeling and another leveraging large language models (LLMs), both instantiating the metacognitive cycle. Our system continuously learns to assess its competence on a given task and uses this self-awareness to guide iterative cycles of strategy selection. MUSE agents show significant improvements in self-awareness and self-regulation, enabling them to solve novel, out-of-distribution tasks more effectively compared to Dreamer-v3-based reinforcement learning and purely prompt-based LLM agent approaches. This work highlights the promise of approaches inspired by cognitive and neural systems in enabling autonomous systems to adapt to new environments, overcoming the limitations of current methods that rely heavily on extensive training data.
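
Competence awareness driving strategy selection can be caricatured as a running self-assessment loop: track an empirical success rate per strategy and pick the one the agent currently believes it is most competent with. This bandit-flavoured sketch elides exploration and everything world-model- or LLM-specific in MUSE.

import numpy as np

rng = np.random.default_rng(0)
p_success = {"plan-ahead": 0.7, "greedy": 0.4, "explore-first": 0.55}  # hidden from the agent

wins = {s: 1.0 for s in p_success}     # optimistic running competence estimates
tries = {s: 2.0 for s in p_success}

for episode in range(200):
    # metacognitive step: self-assess competence, then act with the strategy
    # the agent currently believes it is most competent with
    s = max(p_success, key=lambda k: wins[k] / tries[k])
    wins[s] += rng.random() < p_success[s]
    tries[s] += 1

print({s: round(wins[s] / tries[s], 2) for s in p_success})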

Updated: 2024-11-20 18:41:03

标题: 未知情境和环境的元认知(Metacognition for Unknown Situations and Environments,MUSE)

摘要: 元认知——对自己的认知过程的意识和调节——对于人类在未知情况下的适应能力至关重要。相比之下,当前的自主代理在新环境中往往难以适应,这是由于它们适应能力有限。我们假设元认知是自适应自主系统中关键缺失的要素,它赋予了它们处理陌生挑战所需的认知灵活性。鉴于元认知能力的广泛范围,我们专注于两个关键方面:对新任务的能力意识和策略选择。为此,我们提出了适用于未知情况和环境的元认知框架(MUSE),该框架将元认知过程——特别是自我意识和自我调节——整合到自主代理中。我们提出了MUSE的两个初始实现:一个基于世界建模,另一个利用大型语言模型(LLM),两者都实例化了元认知循环。我们的系统不断学习评估其在特定任务上的能力,并利用这种自我意识来引导策略选择的迭代循环。MUSE代理在自我意识和自我调节方面显示出显著改进,使它们能够更有效地解决新的、分布之外的任务,与基于Dreamer-v3的强化学习和纯粹基于提示的LLM代理方法相比。这项工作突显了受认知和神经系统启发的方法在使自主系统适应新环境方面的潜力,克服了目前依赖大量训练数据的方法的局限性。

更新时间: 2024-11-20 18:41:03

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.13537v1

Identity Preserving 3D Head Stylization with Multiview Score Distillation

3D head stylization transforms realistic facial features into artistic representations, enhancing user engagement across gaming and virtual reality applications. While 3D-aware generators have made significant advancements, many 3D stylization methods primarily provide near-frontal views and struggle to preserve the unique identities of original subjects, often resulting in outputs that lack diversity and individuality. This paper addresses these challenges by leveraging the PanoHead model, synthesizing images from a comprehensive 360-degree perspective. We propose a novel framework that employs negative log-likelihood distillation (LD) to enhance identity preservation and improve stylization quality. By integrating multi-view grid score and mirror gradients within the 3D GAN architecture and introducing a score rank weighing technique, our approach achieves substantial qualitative and quantitative improvements. Our findings not only advance the state of 3D head stylization but also provide valuable insights into effective distillation processes between diffusion models and GANs, focusing on the critical issue of identity preservation. Please visit https://three-bee.github.io/head_stylization for more visuals.

Updated: 2024-11-20 18:37:58

标题: 用多视角得分蒸馏实现保持身份的3D头部风格化

摘要: 3D头部风格化将逼真的面部特征转化为艺术表现,增强了用户在游戏和虚拟现实应用中的参与度。尽管3D感知生成器取得了显著进展,但许多3D风格化方法主要提供近正面视图,并且难以保留原始主体的独特身份特征,常常导致输出缺乏多样性和个性化。本文通过利用PanoHead模型,从全面的360度视角合成图像,解决了这些挑战。我们提出了一个新颖的框架,利用负对数似然蒸馏(LD)来增强身份保留并改善风格化质量。通过在3D GAN架构中整合多视图网格评分和镜像梯度,并引入评分排名加权技术,我们的方法实现了实质性的定性和定量改进。我们的发现不仅推动了3D头部风格化的发展,还为扩散模型和GAN之间的有效蒸馏过程提供了宝贵的见解,重点关注身份保留的关键问题。请访问https://three-bee.github.io/head_stylization获取更多可视化内容。

更新时间: 2024-11-20 18:37:58

领域: cs.CV,cs.AI,cs.GR,cs.LG,cs.MM

下载: http://arxiv.org/abs/2411.13536v1

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

Computational models of syntax are predominantly text-based. Here we propose that the most basic first step in the evolution of syntax can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous and elementary suboperations of syntax -- concatenation. We introduce spontaneous concatenation: a phenomenon in which convolutional neural networks (CNNs) trained on acoustic recordings of individual words start generating outputs with two or even three words concatenated, without ever accessing data with multiple words in the input. We replicate this finding in several independently trained models with different hyperparameters and training data. Additionally, networks trained on two words learn to embed words into novel unobserved word combinations. We also show that the concatenated outputs contain precursors to compositionality. To our knowledge, this is a previously unreported property of CNNs trained in the ciwGAN/fiwGAN setting on raw speech, and it has implications both for our understanding of how these architectures learn and for modeling syntax and its evolution in the brain from raw acoustic inputs. We also propose a potential neural mechanism called disinhibition that outlines a possible neural pathway towards concatenation and compositionality, and we suggest that our modeling is useful for generating testable predictions for the biological and artificial neural processing of speech.

Updated: 2024-11-20 18:30:49

标题: 语音中的基本语法:无监督深度神经网络中的自发串联

摘要: 语法的计算模型主要是基于文本的。在这里,我们提出语法演化中最基本的第一步可以通过完全无监督的方式直接从原始语音建模。我们专注于语法最普遍和基本的子操作之一--连接。我们引入了自发连接:这是一个现象,卷积神经网络(CNNs)在个别单词的声音录音上进行训练后,开始生成两个甚至三个单词连接的输出,而从未访问包含多个单词的数据。我们在几个独立训练的具有不同超参数和训练数据的模型中复制了这一发现。此外,经过两个单词训练的网络学会将单词嵌入到新颖的未观察到的单词组合中。我们还展示了连接输出包含构成性的前体。据我们所知,这是CNNs在ciwGAN/fiwGAN设置下在原始语音上训练时的一个先前未报告的特性,这对我们理解这些架构如何学习以及对从原始声学输入中的语法及其演化进行建模都具有影响。我们还提出了一个潜在的神经机制,称为去抑制,概述了连接和构成性的可能神经通路,并建议我们的建模对生成对生物和人工神经处理语音的可测试预测是有用的。

更新时间: 2024-11-20 18:30:49

领域: cs.CL,cs.AI,cs.SD,eess.AS

下载: http://arxiv.org/abs/2305.01626v3

Retrieval with Learned Similarities

Retrieval plays a fundamental role in recommendation systems, search, and natural language processing (NLP) by efficiently finding relevant items from a large corpus given a query. Dot products have been widely used as the similarity function in such tasks, enabled by Maximum Inner Product Search (MIPS) algorithms for efficient retrieval. However, state-of-the-art retrieval algorithms have migrated to learned similarities. These advanced approaches encompass multiple query embeddings, complex neural networks, direct item ID decoding via beam search, and hybrid solutions. Unfortunately, we lack efficient solutions for retrieval in these state-of-the-art setups. Our work addresses this gap by investigating efficient retrieval techniques with expressive learned similarity functions. We establish Mixture-of-Logits (MoL) as a universal approximator of similarity functions, demonstrate that MoL's expressiveness can be realized empirically to achieve superior performance on diverse retrieval scenarios, and propose techniques to retrieve the approximate top-k results using MoL with tight error bounds. Through extensive experimentation, we show that MoL, enhanced by our proposed mutual information-based load balancing loss, sets new state-of-the-art results across heterogeneous scenarios, including sequential retrieval models in recommendation systems and finetuning language models for question answering; and our approximate top-$k$ algorithms outperform baselines by up to 66x in latency while achieving >.99 recall rate compared to exact algorithms.
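
The Mixture-of-Logits similarity itself is compact: P component embeddings per query and item yield P dot-product logits, and a learned gating network mixes them with adaptive weights. The gating parameterisation below (a softmax over a linear map of the logits) is one illustrative choice, and the retrieval-time approximate top-k machinery is the part this sketch omits.

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mol_similarity(Q, X, Wg):
    # Q: (P, d) query components; X: (N, P, d) item components; Wg: gating params
    logits = np.einsum('pd,npd->np', Q, X)        # (N, P) per-component dot products
    gates = softmax(logits @ Wg, axis=-1)         # (N, P) adaptive mixture weights
    return (gates * logits).sum(-1)               # (N,) mixed similarity scores

rng = np.random.default_rng(0)
P, d, N = 4, 16, 1000
Q, X, Wg = rng.normal(size=(P, d)), rng.normal(size=(N, P, d)), rng.normal(size=(P, P))
scores = mol_similarity(Q, X, Wg)
print(np.argsort(scores)[-5:])                    # exact top-5 by brute force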

Updated: 2024-11-20 18:30:19

标题: 使用学习到的相似性进行检索

摘要: 检索在推荐系统、搜索和自然语言处理(NLP)中发挥着基础作用,通过高效地从大语料库中找到相关项目来处理查询。点积在这些任务中被广泛用作相似性函数,借助最大内积搜索(MIPS)算法实现高效的检索。然而,最先进的检索算法已经转向学习相似性。这些先进方法包括多个查询嵌入、复杂的神经网络、通过波束搜索直接解码项目ID和混合解决方案。不幸的是,我们缺乏这些最先进设置中的高效检索解决方案。我们的工作通过研究具有表达性学习相似性函数的高效检索技术来填补这一空白。我们将Logits混合(MoL)确定为相似性函数的通用逼近器,并证明MoL的表达力可以在实验中实现,从而在各种检索场景中实现卓越性能,并提出使用MoL检索近似前k个结果的技术并具有严格的误差界。通过大量实验,我们展示MoL,通过我们提出的基于互信息的负载平衡损失增强,实现了跨异构场景的新的最先进结果,包括推荐系统中的顺序检索模型和针对问答问题的微调语言模型;我们的近似前k个算法在延迟方面表现比基准提高了高达66倍,同时实现了比精确算法更高的>.99召回率。

更新时间: 2024-11-20 18:30:19

领域: cs.IR,cs.DB,cs.DS,cs.LG

下载: http://arxiv.org/abs/2407.15462v3

Delegating Data Collection in Decentralized Machine Learning

Motivated by the emergence of decentralized machine learning (ML) ecosystems, we study the delegation of data collection. Taking the field of contract theory as our starting point, we design optimal and near-optimal contracts that deal with two fundamental information asymmetries that arise in decentralized ML: uncertainty in the assessment of model quality and uncertainty regarding the optimal performance of any model. We show that a principal can cope with such asymmetry via simple linear contracts that achieve 1-1/e fraction of the optimal utility. To address the lack of a priori knowledge regarding the optimal performance, we give a convex program that can adaptively and efficiently compute the optimal contract. We also study linear contracts and derive the optimal utility in the more complex setting of multiple interactions.

Updated: 2024-11-20 18:26:03

标题: 将数据收集委托给分散式机器学习

摘要: 受到去中心化机器学习生态系统的出现的启发,我们研究了数据收集的委托。以契约理论领域为起点,我们设计了处理去中心化机器学习中出现的两种基本信息不对称的最优和接近最优契约:模型质量评估的不确定性和关于任何模型的最佳性能的不确定性。我们展示了委托人可以通过简单的线性合同来应对这种不对称,这些合同可以实现最优效用的1-1/e比例。为了解决关于最佳性能的先验知识的缺乏,我们提供了一个凸规划,可以自适应地和高效地计算出最优合同。我们还研究了线性合同,并在更复杂的多次交互设置中推导出最优效用。

更新时间: 2024-11-20 18:26:03

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2309.01837v3

Preferences Evolve And So Should Your Bandits: Bandits with Evolving States for Online Platforms

We propose a model for learning with bandit feedback while accounting for deterministically evolving and unobservable states, which we call Bandits with Deterministically Evolving States ($B$-$DES$). The workhorse applications of our model are learning for recommendation systems and learning for online ads. In both cases, the reward that the algorithm obtains at each round is a function of the short-term reward of the action chosen and how "healthy" the system is (i.e., as measured by its state). For example, in recommendation systems, the reward that the platform obtains from a user's engagement with a particular type of content depends not only on the inherent features of the specific content, but also on how the user's preferences have evolved as a result of interacting with other types of content on the platform. Our general model accounts for the different rate $\lambda \in [0,1]$ at which the state evolves (e.g., how fast a user's preferences shift as a result of previous content consumption) and encompasses standard multi-armed bandits as a special case. The goal of the algorithm is to minimize a notion of regret against the best fixed sequence of arms pulled, which is significantly harder to attain than the standard benchmark of the best fixed action in hindsight. We present online learning algorithms for any possible value of the evolution rate $\lambda$ and we show the robustness of our results to various model misspecifications.
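
A minimal simulation of the reward model makes the difficulty visible: the observed reward couples the chosen arm's short-term value with a hidden, deterministically evolving state, so the myopically best arm can degrade the state and lose to a rotation. All functional forms and constants below are invented for illustration.

import numpy as np

K, T, lam = 3, 60, 0.2              # arms, horizon, state-evolution rate lambda
base = np.array([0.2, 0.5, 0.9])    # short-term reward of each arm
wear = np.array([0.9, 0.5, 0.1])    # how gently each arm treats the hidden state

def run(policy):
    state, total = 1.0, 0.0         # state is deterministic but never observed
    for t in range(T):
        a = policy(t)
        total += base[a] * state    # reward couples the action and the state
        state = (1 - lam) * state + lam * wear[a]
    return total

print("always-greedy:", round(run(lambda t: 2), 2),
      "| alternating:", round(run(lambda t: t % 3), 2))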

Updated: 2024-11-20 18:25:16

标题: 偏好发生变化,因此您的强盗也应该发生变化:具有不断演化状态的强盗用于在线平台

摘要: 我们提出了一种学习模型,用于考虑确定性演变和不可观测状态的赌博反馈,我们称之为带确定性演变状态的赌博模型($B$-$DES$)。我们模型的主要应用是推荐系统和在线广告学习。在这两种情况下,算法在每一轮获得的奖励取决于所选动作的短期奖励以及系统的“健康程度”(即,根据其状态测量)。例如,在推荐系统中,平台从用户与特定类型内容的互动中获得的奖励不仅取决于特定内容的固有特征,还取决于用户通过与平台上其他类型内容互动而演变的偏好。我们的一般模型考虑状态演变的不同速率$\lambda \in [0,1]$(例如,用户偏好随先前内容消费的速度变化有多快),并将标准多臂赌博问题作为特殊情况。算法的目标是最小化对最佳固定动作序列的后悔概念,与标准基准中最佳固定动作的情况相比,这显著更难实现。我们提出了针对任何可能的演变速率$\lambda$的在线学习算法,并展示了我们的结果对各种模型错误规定的鲁棒性。

更新时间: 2024-11-20 18:25:16

领域: cs.LG,cs.AI,cs.GT

下载: http://arxiv.org/abs/2307.11655v4

Entropy Bootstrapping for Weakly Supervised Nuclei Detection

Microscopy structure segmentation, such as detecting cells or nuclei, generally requires a human to draw a ground truth contour around each instance. Weakly supervised approaches (e.g. consisting of only single point labels) have the potential to reduce this workload significantly. Our approach uses individual point labels for an entropy estimation to approximate an underlying distribution of cell pixels. We infer full cell masks from this distribution, and use Mask-RCNN to produce an instance segmentation output. We compare this point-annotated approach with training on the full ground truth masks. We show that our method achieves a comparatively good level of performance, despite a 95% reduction in pixel labels.

Updated: 2024-11-20 18:24:11

标题: 熵引导的弱监督核检测

摘要: 显微镜结构分割,如检测细胞或细胞核,通常需要人工在每个实例周围绘制一个地面真实轮廓。弱监督方法(例如仅包含单点标签)有潜力显著减少这种工作量。我们的方法使用单个点标签进行熵估计,以近似细胞像素的基础分布。我们从这个分布推断出完整的细胞掩膜,并使用Mask-RCNN生成实例分割输出。我们将这种点标注方法与在完整地面真实掩膜上训练进行比较。我们展示了尽管像素标签减少了95%,但我们的方法实现了相对良好的性能水平。

更新时间: 2024-11-20 18:24:11

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.13528v1

Quantum Attention for Vision Transformers in High Energy Physics

We present a novel hybrid quantum-classical vision transformer architecture incorporating quantum orthogonal neural networks (QONNs) to enhance performance and computational efficiency in high-energy physics applications. Building on advancements in quantum vision transformers, our approach addresses limitations of prior models by leveraging the inherent advantages of QONNs, including stability and efficient parameterization in high-dimensional spaces. We evaluate the proposed architecture using multi-detector jet images from CMS Open Data, focusing on the task of distinguishing quark-initiated from gluon-initiated jets. The results indicate that embedding quantum orthogonal transformations within the attention mechanism can provide robust performance while offering promising scalability for machine learning challenges associated with the upcoming High Luminosity Large Hadron Collider. This work highlights the potential of quantum-enhanced models to address the computational demands of next-generation particle physics experiments.
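
The ingredient that distinguishes QONNs, exact orthogonality by construction, comes from composing 2x2 Givens rotations (the classical analogue of the rotation-gate pyramid used in quantum orthogonal layers). A minimal parameterisation check, unrelated to the jet-tagging pipeline itself:

import numpy as np

def givens(n, i, j, theta):
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = G[j, j] = c
    G[i, j], G[j, i] = -s, s
    return G

n = 4
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=n * (n - 1) // 2)

W, k = np.eye(n), 0
for i in range(n - 1):
    for j in range(i + 1, n):
        W = W @ givens(n, i, j, angles[k])   # compose planar rotations
        k += 1

print(np.allclose(W @ W.T, np.eye(n)))       # True: orthogonal by construction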

Updated: 2024-11-20 18:11:17

标题: 量子关注在高能物理中视觉变换器中的应用

摘要: 我们提出了一种新颖的混合量子-经典视觉变换器架构,将量子正交神经网络(QONNs)融入其中,以增强在高能物理应用中的性能和计算效率。借助量子视觉变换器的进展,我们的方法通过利用QONNs的固有优势,包括在高维空间中的稳定性和高效参数化,来解决先前模型的局限性。我们使用CMS开放数据中的多探测器喷注图像评估提出的架构,重点关注将夸克引发的喷注与胶子引发的喷注区分开的任务。结果表明,在注意力机制中嵌入量子正交变换可以提供稳健的性能,同时为与即将到来的高亮度大型强子对撞机相关的机器学习挑战提供有前途的可扩展性。这项工作突显了量子增强模型解决下一代粒子物理实验的计算需求的潜力。

更新时间: 2024-11-20 18:11:17

领域: quant-ph,cs.LG,hep-ex,hep-ph

下载: http://arxiv.org/abs/2411.13520v1

Advancing Complex Medical Communication in Arabic with Sporo AraSum: Surpassing Existing Large Language Models

The increasing demand for multilingual capabilities in healthcare underscores the need for AI models adept at processing diverse languages, particularly in clinical documentation and decision-making. Arabic, with its complex morphology, syntax, and diglossia, poses unique challenges for natural language processing (NLP) in medical contexts. This case study evaluates Sporo AraSum, a language model tailored for Arabic clinical documentation, against JAIS, the leading Arabic NLP model, using synthetic datasets and PDQI-9 metrics that we modified ourselves to assess model performance in a different language. The study assessed the models' performance in summarizing patient-physician interactions, focusing on accuracy, comprehensiveness, clinical utility, and linguistic-cultural competence. Results indicate that Sporo AraSum significantly outperforms JAIS in AI-centric quantitative metrics and all qualitative attributes measured in our modified version of the PDQI-9. AraSum's architecture enables precise and culturally sensitive documentation, addressing the linguistic nuances of Arabic while mitigating risks of AI hallucinations. These findings suggest that Sporo AraSum is better suited to meet the demands of Arabic-speaking healthcare environments, offering a transformative solution for multilingual clinical workflows. Future research should incorporate real-world data to further validate these findings and explore broader integration into healthcare systems.

Updated: 2024-11-20 18:10:19

标题: 用Sporo AraSum推动阿拉伯语中复杂医学交流:超越现有的大型语言模型

摘要: 在医疗保健领域对多语言能力的需求不断增加,强调了需要AI模型能够处理多种语言,特别是在临床文档和决策中。阿拉伯语,由于其复杂的形态、句法和双语现象,在医疗上下文中自然语言处理(NLP)面临独特挑战。本病例研究评估了专为阿拉伯临床文档定制的语言模型Sporo AraSum,与领先的阿拉伯NLP模型JAIS相比。使用合成数据集和修改后的PDQI-9指标,为了评估模型在不同语言中的性能。研究评估了模型在总结患者-医生互动时的表现,重点关注准确性、全面性、临床效用和语言文化能力。 结果表明,Sporo AraSum在以AI为中心的定量指标和我们修改后的PDQI-9中测量的所有定性属性方面显着优于JAIS。 AraSum的架构可以实现精确和文化敏感的文档记录,解决阿拉伯语的语言细微差别,同时减轻AI幻觉的风险。这些发现表明,Sporo AraSum更适合满足阿拉伯语医疗环境的需求,为多语言临床工作流提供了变革性解决方案。未来的研究应该结合真实世界数据进一步验证这些发现,并探索更广泛地将其整合到医疗系统中。

更新时间: 2024-11-20 18:10:19

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2411.13518v1

Procurement Auctions via Approximately Optimal Submodular Optimization

We study procurement auctions, where an auctioneer seeks to acquire services from strategic sellers with private costs. The quality of services is measured by a submodular function known to the auctioneer. Our goal is to design computationally efficient procurement auctions that (approximately) maximize the difference between the quality of the acquired services and the total cost of the sellers, while ensuring incentive compatibility (IC), individual rationality (IR) for sellers, and non-negative surplus (NAS) for the auctioneer. Our contributions are twofold: (i) we provide an improved analysis of existing algorithms for non-positive submodular function maximization, and (ii) we design efficient frameworks that transform submodular optimization algorithms into mechanisms that are IC, IR, NAS, and approximation-preserving. These frameworks apply to both the offline setting, where all sellers' bids and services are available simultaneously, and the online setting, where sellers arrive in an adversarial order, requiring the auctioneer to make irrevocable decisions. We also explore whether state-of-the-art submodular optimization algorithms can be converted into descending auctions in adversarial settings, where the schedule of descending prices is determined by an adversary. We show that a submodular optimization algorithm satisfying bi-criteria $(1/2, 1)$-approximation in welfare can be effectively adapted to a descending auction. Additionally, we establish a connection between descending auctions and online submodular optimization. Finally, we demonstrate the practical applications of our frameworks by instantiating them with state-of-the-art submodular optimization algorithms and empirically comparing their welfare performance on publicly available datasets with thousands of sellers.
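
A minimal sketch of the surplus objective (quality of the acquired services minus total cost) with a single greedy pass; the coverage-style quality function and the acceptance rule are toy assumptions, and the payment rule needed for incentive compatibility is omitted.

    # Toy procurement instance: quality g(S) is a coverage-style submodular
    # function; sellers report costs; greedily accept a seller while the
    # marginal quality gain exceeds the cost. IC payments are not modeled.
    sellers = {
        "a": ({"x", "y"}, 0.5),
        "b": ({"y", "z"}, 0.4),
        "c": ({"z"}, 0.9),
    }

    def quality(items: set) -> float:
        return 1.2 * len(items)          # toy quality of a covered item set

    chosen, covered, surplus = [], set(), 0.0
    for name, (items, cost) in sellers.items():
        gain = quality(covered | items) - quality(covered)
        if gain > cost:                  # accept only when marginal gain beats cost
            chosen.append(name)
            covered |= items
            surplus += gain - cost

    print(chosen, f"surplus={surplus:.2f}")   # seller c adds nothing new, skipped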

Updated: 2024-11-20 18:06:55

标题: 通过近似最优次模优化进行采购拍卖

摘要: 我们研究采购拍卖,在这种拍卖中,拍卖人试图从具有私人成本的战略卖家那里获得服务。服务的质量由拍卖人已知的一个次模函数来衡量。我们的目标是设计计算效率高的采购拍卖,以(近似地)最大化所获得服务的质量与卖家总成本之间的差异,同时确保激励兼容性(IC)、卖家个人理性(IR)和拍卖人非负剩余(NAS)。 我们的贡献有两个方面:(i)我们提供了对现有算法进行改进的分析,用于非正次模函数最大化;(ii)我们设计了一些有效的框架,将次模优化算法转化为既满足IC、IR、NAS要求又保持近似性的机制。这些框架适用于离线设置,即所有卖家的竞标和服务同时可用的情况,以及在线设置,即卖家以对抗性顺序到达,要求拍卖人做出不可撤销的决定。 我们还探讨了当拍卖的价格由对手决定时,最先进的次模优化算法是否可以转化为降价拍卖。我们证明了一个在福利上满足双标准(1/2, 1)-近似的次模优化算法可以有效地适应降价拍卖。此外,我们建立了降价拍卖和在线次模优化之间的联系。 最后,我们通过将最先进的次模优化算法实例化到我们的框架中,并在具有成千上万卖家的公开数据集上进行福利性能的实证比较,展示了我们框架的实际应用。

更新时间: 2024-11-20 18:06:55

领域: cs.GT,cs.DS,cs.LG

下载: http://arxiv.org/abs/2411.13513v1

Dyson Brownian motion and random matrix dynamics of weight matrices during learning

During training, weight matrices in machine learning architectures are updated using stochastic gradient descent or variations thereof. In this contribution we employ concepts of random matrix theory to analyse the resulting stochastic matrix dynamics. We first demonstrate that the dynamics can generically be described using Dyson Brownian motion, leading to e.g. eigenvalue repulsion. The level of stochasticity is shown to depend on the ratio of the learning rate and the mini-batch size, explaining the empirically observed linear scaling rule. We verify this linear scaling in the restricted Boltzmann machine. Subsequently we study weight matrix dynamics in transformers (a nano-GPT), following the evolution from a Marchenko-Pastur distribution for eigenvalues at initialisation to a combination with additional structure at the end of learning.
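
A small simulation of the eigenvalue dynamics in question: a symmetric matrix is evolved by additive Gaussian (GOE-like) increments and its eigenvalues, which then perform Dyson Brownian motion, are tracked; matrix size and step size are arbitrary toy choices. In the SGD analogy the abstract describes, the effective step size plays the role of the learning-rate-to-batch-size ratio.

    import numpy as np

    rng = np.random.default_rng(1)
    n, steps, dt = 8, 200, 1e-3

    # evolve a symmetric matrix by additive Gaussian increments and track its
    # eigenvalues; their coupled motion is Dyson Brownian motion
    A = np.zeros((n, n))
    trajectories = []
    for _ in range(steps):
        G = rng.normal(size=(n, n))
        A += np.sqrt(dt) * (G + G.T) / np.sqrt(2)
        trajectories.append(np.linalg.eigvalsh(A))

    spacings = np.diff(trajectories[-1])
    print("min eigenvalue spacing:", spacings.min())   # repulsion keeps this positive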

Updated: 2024-11-20 18:05:39

标题: 戴森布朗运动和学习过程中权重矩阵的随机矩阵动力学

摘要: 在训练过程中,机器学习架构中的权重矩阵通过随机梯度下降或其变体进行更新。在这项研究中,我们运用随机矩阵理论的概念来分析由此产生的随机矩阵动态。我们首先证明这种动态通常可以用戴森布朗运动来描述,例如导致特征值排斥。我们表明随机性水平取决于学习率和小批量大小的比率,从而解释了实验观察到的线性缩放规则。我们在受限玻尔兹曼机中验证了这种线性缩放。随后,我们研究了变压器(一种纳米GPT)中的权重矩阵动态,从初始化时的马尔琴科-帕斯图尔分布的特征值演变到学习结束时带有额外结构的组合。

更新时间: 2024-11-20 18:05:39

领域: cond-mat.dis-nn,cs.LG,hep-lat

下载: http://arxiv.org/abs/2411.13512v1

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve the speed of generation. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.
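
A toy contrast between the two algorithm families the survey distinguishes: a token-level decoder (temperature sampling over next-token probabilities) and a meta-generation wrapper (best-of-$n$ reranking of full sequences). The next-token distribution and the reranking score are assumed stand-ins for a real model and reward.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["the", "cat", "sat", "."]

    def next_token_probs(prefix):                 # toy stand-in for a model's logits
        logits = rng.normal(size=len(vocab))
        return np.exp(logits) / np.exp(logits).sum()

    def decode(temperature=1.0, max_len=5):       # token-level generation
        out = []
        for _ in range(max_len):
            # p ** (1/T), renormalized, equals softmax(logits / T)
            p = next_token_probs(out) ** (1.0 / temperature)
            p /= p.sum()
            out.append(vocab[rng.choice(len(vocab), p=p)])
        return out

    def best_of_n(n=8, score=lambda seq: -len(set(seq))):   # meta-generation wrapper
        candidates = [decode() for _ in range(n)]
        return min(candidates, key=score)         # rerank full sequences

    print(best_of_n())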

Updated: 2024-11-20 17:57:26

标题: 从解码到元生成:大型语言模型的推理时间算法

摘要: 现代大型语言模型(LLMs)研究中最引人注目的发现之一是,在训练过程中增加计算量会导致更好的结果。然而,在推理过程中增加计算量的好处却受到了较少关注。本调查重点关注这些推理时间方法。我们在统一的数学形式主义下探讨了三个领域:基于标记的生成算法、元生成算法和高效生成。基于标记的生成算法,通常称为解码算法,通过逐个采样标记或构建标记级搜索空间,然后选择输出来操作。这些方法通常假设可以访问语言模型的对数、下一个标记分布或概率分数。元生成算法在部分或完整序列上工作,融合领域知识,实现回溯,并整合外部信息。高效生成方法旨在减少标记成本并提高生成速度。我们的调查统一了三个研究社区的观点:传统自然语言处理、现代LLMs和机器学习系统。

更新时间: 2024-11-20 17:57:26

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.16838v2

Advancing Heatwave Forecasting via Distribution Informed-Graph Neural Networks (DI-GNNs): Integrating Extreme Value Theory with GNNs

Heatwaves, prolonged periods of extreme heat, have intensified in frequency and severity due to climate change, posing substantial risks to public health, ecosystems, and infrastructure. Despite advancements in Machine Learning (ML) modeling, accurate heatwave forecasting at weather scales (1--15 days) remains challenging due to the non-linear interactions between atmospheric drivers and the rarity of these extreme events. Traditional models relying on heuristic feature engineering often fail to generalize across diverse climates and capture the complexities of heatwave dynamics. This study introduces the Distribution-Informed Graph Neural Network (DI-GNN), a novel framework that integrates principles from Extreme Value Theory (EVT) into the graph neural network architecture. DI-GNN incorporates Generalized Pareto Distribution (GPD)-derived descriptors into the feature space, adjacency matrix, and loss function to enhance its sensitivity to rare heatwave occurrences. By prioritizing the tails of climatic distributions, DI-GNN addresses the limitations of existing methods, particularly in imbalanced datasets where traditional metrics like accuracy are misleading. Empirical evaluations using weather station data from British Columbia, Canada, demonstrate the superior performance of DI-GNN compared to baseline models. DI-GNN achieved significant improvements in balanced accuracy, recall, and precision, with high AUC and average precision scores, reflecting its robustness in distinguishing heatwave events.
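
A sketch of what GPD-derived tail descriptors can look like in practice, using SciPy's genpareto; the threshold choice and the particular descriptors are assumptions here, not DI-GNN's exact feature set.

    import numpy as np
    from scipy.stats import genpareto

    rng = np.random.default_rng(0)
    tmax = rng.gumbel(loc=28, scale=3, size=5000)     # toy daily max temperatures

    u = np.quantile(tmax, 0.95)                       # tail threshold (assumed choice)
    exceed = tmax[tmax > u] - u
    shape, _, scale = genpareto.fit(exceed, floc=0)   # fit GPD to the exceedances

    # per-observation tail descriptors that could enter a GNN feature space
    tail_prob = genpareto.sf(np.maximum(tmax - u, 0), shape, loc=0, scale=scale)
    print(f"xi={shape:.3f}, sigma={scale:.3f}, mean tail survival={tail_prob.mean():.3f}")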

Updated: 2024-11-20 17:45:03

标题: 通过分布信息图神经网络(DI-GNNs)推进热浪预测:将极值理论与GNNs集成

摘要: 热浪,即持续时间较长的极端高温期,由于气候变化而在频率和严重性上加剧,对公共健康、生态系统和基础设施构成重大风险。尽管机器学习(ML)建模取得了进展,但在天气尺度(1-15天)上准确预测热浪仍然具有挑战性,因为大气驱动因子之间存在非线性相互作用,而这些极端事件的罕见性。依赖启发式特征工程的传统模型通常无法在不同气候中泛化,并捕捉热浪动态的复杂性。本研究介绍了Distribution-Informed Graph Neural Network(DI-GNN),这是一个将极值理论(EVT)原则整合到图神经网络架构中的新框架。DI-GNN将广义帕累托分布(GPD)导出的描述符集成到特征空间、邻接矩阵和损失函数中,以增强其对罕见热浪事件的敏感性。通过优先考虑气候分布的尾部,DI-GNN解决了现有方法的局限性,特别是在不平衡数据集中,传统指标如准确度是误导性的。利用加拿大不列颠哥伦比亚省的气象站数据进行的实证评估显示,与基准模型相比,DI-GNN表现出卓越的性能。DI-GNN在平衡准确度、召回率和精度方面取得了显著改进,具有高的AUC和平均精度分数,反映了其在区分热浪事件方面的稳健性。

更新时间: 2024-11-20 17:45:03

领域: cs.LG,physics.ao-ph,physics.soc-ph

下载: http://arxiv.org/abs/2411.13496v1

Utilizing Large Language Models to Synthesize Product Desirability Datasets

This research explores the application of large language models (LLMs) to generate synthetic datasets for Product Desirability Toolkit (PDT) testing, a key component in evaluating user sentiment and product experience. Utilizing gpt-4o-mini, a cost-effective alternative to larger commercial LLMs, three methods, Word+Review, Review+Word, and Supply-Word, were each used to synthesize 1000 product reviews. The generated datasets were assessed for sentiment alignment, textual diversity, and data generation cost. Results demonstrated high sentiment alignment across all methods, with Pearson correlations ranging from 0.93 to 0.97. Supply-Word exhibited the highest diversity and coverage of PDT terms, although with increased generation costs. Despite minor biases toward positive sentiments, in situations with limited test data, LLM-generated synthetic data offers significant advantages, including scalability, cost savings, and flexibility in dataset production.

Updated: 2024-11-20 17:35:21

标题: 利用大型语言模型合成产品吸引力数据集

摘要: 这项研究探讨了大型语言模型(LLMs)在生成合成数据集用于产品可取性工具包(PDT)测试中的应用,这是评估用户情绪和产品体验的关键组成部分。利用gpt-4o-mini,一个成本效益较高的替代较大商业LLMs的方法,采用了Word+Review、Review+Word和Supply-Word三种方法分别合成了1000条产品评论。生成的数据集被评估其情绪一致性、文本多样性和数据生成成本。结果显示,所有方法在情绪一致性方面表现出较高的一致性,皮尔逊相关系数在0.93至0.97之间。Supply-Word展示出最高的多样性和覆盖PDT术语,尽管生成成本增加。尽管存在对正面情绪的轻微偏见,但在测试数据有限的情况下,LLM生成的合成数据提供了显著优势,包括可扩展性、节省成本和数据集生成的灵活性。

更新时间: 2024-11-20 17:35:21

领域: cs.CL,cs.AI,cs.LG,I.2.7; H.3.3; I.2.6; H.5.2

下载: http://arxiv.org/abs/2411.13485v1

Soda: An Object-Oriented Functional Language for Specifying Human-Centered Problems

We present Soda (Symbolic Objective Descriptive Analysis), a language that helps to treat qualities and quantities in a natural way and greatly simplifies the task of checking their correctness. We present key properties for the language motivated by the design of a descriptive language to encode complex requirements on computer systems, and we explain how these key properties must be addressed to model these requirements with simple definitions. We give an overview of a tool that helps to describe problems in an easy way that we consider more transparent and less error-prone.

Updated: 2024-11-20 17:26:52

标题: 苏打:一种面向对象的功能语言,用于规范人类中心问题

摘要: 我们提出了Soda(Symbolic Objective Descriptive Analysis),这是一种语言,可以自然地处理质量和数量,并极大简化了检查它们正确性的任务。我们提出了这种语言的关键特性,这些特性是基于设计一种描述性语言来对计算机系统上的复杂需求进行编码的动机,并解释了这些关键特性必须如何被处理以用简单的定义来建模这些需求。我们概述了一种工具,可以帮助以更透明和少出错的方式描述问题。

更新时间: 2024-11-20 17:26:52

领域: cs.PL,cs.AI,cs.LO

下载: http://arxiv.org/abs/2310.01961v2

Conformal Prediction for Hierarchical Data

Reconciliation has become an essential tool in multivariate point forecasting for hierarchical time series. However, there is still a lack of understanding of the theoretical properties of probabilistic Forecast Reconciliation techniques. Meanwhile, Conformal Prediction is a general framework with growing appeal that provides prediction sets with probabilistic guarantees in finite sample. In this paper, we propose a first step towards combining Conformal Prediction and Forecast Reconciliation by analyzing how including a reconciliation step in the Split Conformal Prediction (SCP) procedure enhances the resulting prediction sets. In particular, we show that the validity granted by SCP remains while improving the efficiency of the prediction sets. We also advocate a variation of the theoretical procedure for practical use. Finally, we illustrate these results with simulations.
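
A minimal sketch of Split Conformal Prediction with a reconciliation step inserted before scoring; the bottom-up hierarchy, the projection-based reconciliation map, and the toy data are assumptions, and the placement of the reconciliation step follows the paper's idea only loosely.

    import numpy as np

    rng = np.random.default_rng(0)
    S = np.array([[1, 1],                      # total = bottom1 + bottom2
                  [1, 0],
                  [0, 1]])
    P = np.linalg.pinv(S)                      # projection-style reconciliation map

    def base_forecast(x):                      # toy, possibly incoherent base forecasts
        return 0.9 * x

    # calibration pairs: x = last observed values of the 3 nodes, y = next values
    X_cal = rng.normal(size=(200, 3))
    Y_cal = 0.9 * X_cal + rng.normal(0, 0.1, (200, 3))
    alpha = 0.1

    preds = (S @ (P @ base_forecast(X_cal).T)).T        # reconciliation step
    scores = np.abs(Y_cal - preds).max(axis=1)          # conformity score per pair
    k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
    q = np.sort(scores)[k - 1]                          # split-conformal quantile

    x_new = rng.normal(size=3)
    center = S @ (P @ base_forecast(x_new))
    print("prediction box:", np.c_[center - q, center + q])   # ~90% marginal coverage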

Updated: 2024-11-20 17:26:26

标题: Hierarchical Data的一致性预测

摘要: 和解已经成为多变量点预测在分层时间序列中的一种必要工具。然而,对于概率预测和解技术的理论属性仍然缺乏了解。与此同时,符合性预测是一个具有不断增长吸引力的通用框架,它在有限样本中提供具有概率保证的预测集。在本文中,我们提出了将符合性预测和预测和解相结合的第一步,通过分析在分割符合性预测(SCP)过程中包含和解步骤如何增强结果预测集。特别地,我们展示了SCP授予的有效性仍然存在,同时提高了预测集的效率。我们也提倡了一种用于实际应用的理论方法的变体。最后,我们通过模拟结果来说明这些结果。

更新时间: 2024-11-20 17:26:26

领域: stat.ML,cs.LG,stat.AP,62H12

下载: http://arxiv.org/abs/2411.13479v1

PatentEdits: Framing Patent Novelty as Textual Entailment

A patent must be deemed novel and non-obvious in order to be granted by the US Patent Office (USPTO). If it is not, a US patent examiner will cite the prior work, or prior art, that invalidates the novelty and issue a non-final rejection. Predicting what claims of the invention should change given the prior art is an essential and crucial step in securing invention rights, yet has not been studied before as a learnable task. In this work we introduce the PatentEdits dataset, which contains 105K examples of successful revisions that overcome objections to novelty. We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models (LLMs). We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
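
A sketch of entailment scoring between a cited reference and a draft claim sentence using an off-the-shelf NLI model; roberta-large-mnli is a stand-in, not necessarily the model the paper used, and its label order is taken from the public model card.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = "roberta-large-mnli"       # stand-in NLI model, not necessarily the paper's
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name).eval()

    prior_art = "The device uses a lithium-ion battery to power the display."
    draft_claim = "A display apparatus powered by a rechargeable battery."

    # premise = cited reference, hypothesis = draft claim sentence
    inputs = tok(prior_art, draft_claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(-1).squeeze()

    # roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment
    print(f"entailment prob: {probs[2]:.3f}  (high -> claim likely anticipated)")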

Updated: 2024-11-20 17:23:40

标题: 专利编辑:将专利新颖性框定为文本蕴涵

摘要: 为了被美国专利局(USPTO)授予专利,专利必须被认为是新颖且非显而易见的。如果不是,美国专利审查员将引用先前的工作,或先前的技术,来使新颖性无效,并发出非最终拒绝通知。预测在先前技术的基础上发明的权利应该如何改变是确保发明权利的一个重要和关键步骤,然而这之前从未被研究作为一个可以学习的任务。在这项工作中,我们介绍了PatentEdits数据集,其中包含105,000个成功修订的示例,这些示例克服了对新颖性的异议。我们设计了算法,逐句标记编辑,然后确定这些编辑可以如何用大型语言模型(LLMs)进行预测。我们证明,在引用的参考文献和草稿句子之间评估文本蕴涵在预测哪些创新性权利保持不变或在先前技术方面是新颖的方面尤其有效。

更新时间: 2024-11-20 17:23:40

领域: cs.CL,cs.AI,cs.CY,cs.IR

下载: http://arxiv.org/abs/2411.13477v1

Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step

Score identity Distillation (SiD) is a data-free method that has achieved SOTA performance in image generation by leveraging only a pretrained diffusion model, without requiring any training data. However, its ultimate performance is constrained by how accurate the pretrained model captures the true data scores at different stages of the diffusion process. In this paper, we introduce SiDA (SiD with Adversarial Loss), which not only enhances generation quality but also improves distillation efficiency by incorporating real images and adversarial loss. SiDA utilizes the encoder from the generator's score network as a discriminator, boosting its ability to distinguish between real images and those generated by SiD. The adversarial loss is batch-normalized within each GPU and then combined with the original SiD loss. This integration effectively incorporates the average "fakeness" per GPU batch into the pixel-based SiD loss, enabling SiDA to distill a single-step generator either from scratch or by fine-tuning an existing one. SiDA converges significantly faster than its predecessor when trained from scratch, and swiftly improves upon the original model's performance after an initial warmup period during fine-tuning from a pre-distilled SiD generator. This one-step adversarial distillation method establishes new benchmarks in generation performance when distilling EDM diffusion models pretrained on CIFAR-10 (32x32) and ImageNet (64x64), achieving FID score of 1.110 on ImageNet 64x64. It sets record-low FID scores when distilling EDM2 models trained on ImageNet (512x512), surpassing even the largest teacher model, EDM2-XXL. Our SiDA's results record FID scores of 2.156 for EDM2-XS, 1.669 for S, 1.488 for M, 1.413 for L, 1.379 for XL, and 1.366 for XXL, demonstrating significant improvements across all model sizes. Our open-source code will be integrated into the SiD codebase.

Updated: 2024-11-20 17:20:00

标题: 对抗性分数身份蒸馏:一步迅速超越导师

摘要: Score Identity Distillation (SiD) 是一种无数据方法,通过仅利用预训练扩散模型,在图像生成方面取得了 SOTA 性能,而无需任何训练数据。然而,其最终性能受到预训练模型在不同扩散过程阶段捕捉真实数据分数的准确性的限制。在本文中,我们引入了 SiDA(带对抗损失的 SiD),不仅提高了生成质量,还通过结合真实图像和对抗损失提高了蒸馏效率。SiDA利用生成器得分网络的编码器作为鉴别器,增强了其区分真实图像和SiD生成图像的能力。对抗损失在每个 GPU 中进行批归一化,然后与原始 SiD 损失结合。这种整合有效地将每个 GPU 批次的平均“虚假性”纳入基于像素的 SiD 损失中,使 SiDA 能够从头开始或通过微调现有模型来蒸馏单步生成器。与从头开始训练时的前身相比,SiDA 收敛速度显著更快,并且在从经过预蒸馏的 SiD 生成器进行微调的初始热身期后,迅速改善了原始模型的性能。这种一步对抗蒸馏方法在蒸馏预训练在 CIFAR-10(32x32)和 ImageNet(64x64) 上的 EDM 扩散模型时建立了新的生成性能基准,实现了在 ImageNet 64x64 上的 FID 分数为 1.110。在蒸馏在 ImageNet(512x512) 上训练的 EDM2 模型时,其创纪录的低 FID 分数超过了最大的教师模型 EDM2-XXL。我们的 SiDA 结果记录了 EDM2-XS 的 FID 分数为 2.156,S 的为 1.669,M 的为 1.488,L 的为 1.413,XL 的为 1.379,XXL 的为 1.366,显示了在所有模型尺寸上的显著改进。我们的开源代码将集成到 SiD 代码库中。

更新时间: 2024-11-20 17:20:00

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.14919v3

Generalization on the Unseen, Logic Reasoning and Degree Curriculum

This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an 'extrapolating' or 'reasoning' learner. We study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence that for sparse functions and a class of network models including instances of Transformers, random features models, and linear networks, a min-degree-interpolator is learned on the unseen. More specifically, this means an interpolator of the training data that has minimal Fourier mass on the higher degree basis elements. These findings lead to two implications: (1) we provide an explanation to the length generalization problem for Boolean functions (e.g., Anil et al. 2022); (2) we introduce a curriculum learning algorithm called Degree-Curriculum that learns monomials more efficiently by incrementing supports. Finally, we discuss extensions to other models or non-sparse regimes where the min-degree bias may still occur or fade, as well as how it can be potentially corrected when undesirable.
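
A small worked check of the min-degree picture: compute the Fourier (parity-basis) coefficients of a Boolean function over $\{-1,1\}^n$ and report the squared mass per degree; the degree-2 target function is a toy choice.

    import itertools
    import numpy as np

    n = 4
    xs = np.array(list(itertools.product([-1, 1], repeat=n)))   # full Boolean cube
    f = xs[:, 0] * xs[:, 1]               # toy target: the degree-2 monomial x1*x2

    mass = {}
    for S in itertools.chain.from_iterable(
            itertools.combinations(range(n), d) for d in range(n + 1)):
        chi = np.prod(xs[:, list(S)], axis=1) if S else np.ones(len(xs))
        coef = np.mean(f * chi)           # Fourier coefficient \hat{f}(S)
        mass[len(S)] = mass.get(len(S), 0.0) + coef ** 2

    print(mass)                           # all Fourier mass sits at degree 2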

Updated: 2024-11-20 17:16:01

标题: 未知领域的泛化、逻辑推理和学位课程的概括

摘要: 这篇论文考虑了学习逻辑(布尔)函数,并侧重于未见数据(GOTU)设置的泛化,这是一种强大的超出分布泛化的情况。这是由于在某些推理任务(例如,算术/逻辑)中数据的丰富组合性质使得代表性数据采样具有挑战性,并且在GOTU下成功学习给出了第一个“外推”或“推理”学习者的简要概述。我们研究了不同网络架构在GOTU下通过(S)GD训练的表现,并提供了理论和实验证据,表明对于稀疏函数和包括变压器实例、随机特征模型和线性网络在内的一类网络模型,会在未见数据上学习到一个最小度插值器。更具体地说,这意味着训练数据的插值器在高次基元素上具有最小的傅里叶质量。这些发现导致了两个含义:(1)我们为布尔函数的长度泛化问题提供了一个解释(例如,Anil等人,2022年);(2)我们介绍了一个名为Degree-Curriculum的课程学习算法,通过增加支持来更有效地学习单项式。最后,我们讨论了对其他模型或非稀疏情况的扩展,最小度偏见仍可能发生或消失,以及在不希望时如何潜在地进行校正。

更新时间: 2024-11-20 17:16:01

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2301.13105v3

Robust Fair Clustering with Group Membership Uncertainty Sets

We study the canonical fair clustering problem where each cluster is constrained to have close to population-level representation of each group. Despite significant attention, the salient issue of having incomplete knowledge about the group membership of each point has been only superficially addressed. In this paper, we consider a setting where the assigned group memberships are noisy. We introduce a simple noise model that requires a small number of parameters to be given by the decision maker. We then present an algorithm for fair clustering with provable robustness guarantees. Our framework enables the decision maker to trade off between the robustness and the clustering quality. Unlike previous work, our algorithms are backed by worst-case theoretical guarantees. Finally, we empirically verify the performance of our algorithm on real-world datasets and show its superior performance over existing baselines.

Updated: 2024-11-20 17:12:50

标题: 具有群体成员不确定性集的强鲁棒公平聚类

摘要: 我们研究了规范公平聚类问题,其中每个聚类都受到约束,使得每个群体在人口水平上具有接近的代表性。尽管引起了相当大的关注,但关于每个点的群体成员资格信息不完整的突出问题只是被表面处理了。在本文中,我们考虑了一种分配的群体成员资格信息是有噪声的情况。我们引入了一个简单的噪声模型,该模型需要决策者提供少量参数。然后,我们提出了一个具有可证明鲁棒性保证的公平聚类算法。我们的框架使决策者能够在鲁棒性和聚类质量之间进行权衡。与先前的研究不同,我们的算法是由最坏情况的理论保证支持的。最后,我们在真实世界数据集上对我们的算法的性能进行了实证验证,并展示了其优于现有基线方法的性能。

更新时间: 2024-11-20 17:12:50

领域: cs.LG,cs.AI,cs.CY,cs.DS

下载: http://arxiv.org/abs/2406.00599v3

Safe Exploitative Play with Untrusted Type Beliefs

The combination of Bayesian games and learning has a rich history: a single agent is controlled in a system composed of multiple agents with unknown behaviors, given a set of types, each specifying a possible behavior for the other agents. The agent plans its own actions with respect to those types which it believes are most likely to maximize the payoff. However, the type beliefs are often learned from past actions and are likely to be incorrect. With this perspective in mind, we consider an agent in a game with type predictions of the other components, and investigate the impact of incorrect beliefs on the agent's payoff. In particular, we formally define a tradeoff between risk and opportunity by comparing the payoff obtained against the optimal payoff, which is represented by a gap caused by trusting or distrusting the learned beliefs. Our main results characterize the tradeoff by establishing upper and lower bounds on the Pareto front for both normal-form and stochastic Bayesian games, with numerical results provided.

Updated: 2024-11-20 17:11:21

标题: 安全的利用性玩法与不可信任的类型信念

摘要: 贝叶斯博弈与学习的结合具有丰富的历史,其核心思想是在由多个具有未知行为的代理组成的系统中控制单个代理,给定一组类型,每种类型都指定其他代理可能的行为。该思想是规划代理自身的行动,使其相对于它认为最有可能最大化回报的那些类型来计划。然而,类型信念通常是根据过去的行动学习的,并且很可能是错误的。基于这种观点,我们考虑在具有其他组件类型预测的博弈中的代理,并研究错误信念对代理回报的影响。具体来说,我们通过比较获得的回报与最优回报来形式化地定义了风险和机会之间的权衡,这由于信任或不信任学习的信念导致的差距而表示。我们的主要结果通过建立正常形式和随机贝叶斯博弈的帕累托前沿的上下界来表征这种权衡,同时提供了数值结果。

更新时间: 2024-11-20 17:11:21

领域: cs.LG,cs.GT

下载: http://arxiv.org/abs/2411.07679v2

Sampling and Integration of Logconcave Functions by Algorithmic Diffusion

We study the complexity of sampling, rounding, and integrating arbitrary logconcave functions. Our new approach provides the first complexity improvements in nearly two decades for general logconcave functions for all three problems, and matches the best-known complexities for the special case of uniform distributions on convex bodies. For the sampling problem, our output guarantees are significantly stronger than previously known, and lead to a streamlined analysis of statistical estimation based on dependent random samples.
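
The paper's algorithmic-diffusion sampler is not reproduced here; as a baseline illustration of diffusion-style sampling from a logconcave density, the following sketch runs the unadjusted Langevin algorithm on a standard Gaussian target.

    import numpy as np

    rng = np.random.default_rng(0)

    def grad_log_density(x):              # logconcave target: standard Gaussian
        return -x                         # gradient of log exp(-||x||^2 / 2)

    d, eta, steps = 10, 0.05, 2000
    x = rng.normal(size=d)
    samples = []
    for _ in range(steps):
        # Langevin step: drift up the log-density plus Gaussian noise
        x = x + eta * grad_log_density(x) + np.sqrt(2 * eta) * rng.normal(size=d)
        samples.append(x.copy())

    print("sample mean norm:", np.linalg.norm(np.mean(samples[500:], axis=0)))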

Updated: 2024-11-20 17:10:24

标题: 算法扩散对对数凹函数的采样和积分

摘要: 我们研究了对任意对数凹函数进行采样、四舍五入和积分的复杂性。我们的新方法为所有三个问题的一般对数凹函数提供了近20年来的首次复杂性改进,并与凸体上均匀分布的特殊情况的最佳已知复杂性相匹配。对于采样问题,我们的输出保证明显强于先前已知的,并导致基于相关随机样本的统计估计的简化分析。

更新时间: 2024-11-20 17:10:24

领域: cs.DS,cs.LG,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2411.13462v1

SoK: A Systems Perspective on Compound AI Threats and Countermeasures

Large language models (LLMs) used across enterprises often use proprietary models and operate on sensitive inputs and data. The wide range of attack vectors identified in prior research - targeting various software and hardware components used in training and inference - makes it extremely challenging to enforce confidentiality and integrity policies. As we advance towards constructing compound AI inference pipelines that integrate multiple large language models (LLMs), the attack surfaces expand significantly. Attackers now focus on the AI algorithms as well as the software and hardware components associated with these systems. While current research often examines these elements in isolation, we find that combining cross-layer attack observations can enable powerful end-to-end attacks with minimal assumptions about the threat model. Given the sheer number of existing attacks at each layer, we need a holistic and systemized understanding of different attack vectors at each layer. This SoK discusses different software and hardware attacks applicable to compound AI systems and demonstrates how combining multiple attack mechanisms can reduce the threat model assumptions required for an isolated attack. Next, we systematize the ML attacks in line with the MITRE ATT&CK framework to better position each attack based on the threat model. Finally, we outline the existing countermeasures for both software and hardware layers and discuss the necessity of a comprehensive defense strategy to enable the secure and high-performance deployment of compound AI systems.

Updated: 2024-11-20 17:08:38

标题: SoK:复合AI威胁和对策的系统视角

摘要: 大型语言模型(LLMs)在企业中广泛使用,通常使用专有模型,并在敏感输入和数据上运行。先前研究中确定的各种攻击向量 - 针对用于训练和推理的各种软件和硬件组件 - 使得强制执行保密性和完整性政策变得极具挑战性。 随着我们迈向构建集成多个大型语言模型(LLMs)的复合AI推理管道,攻击面显著扩大。攻击者现在专注于与这些系统相关的AI算法以及软件和硬件组件。虽然当前研究通常独立地检查这些元素,但我们发现结合跨层攻击观察可以实现强大的端到端攻击,而对威胁模型的假设很少。鉴于每个层面的现有攻击数量庞大,我们需要全面系统化地了解每个层面的不同攻击向量。 本文综述了适用于复合AI系统的不同软件和硬件攻击,并演示了如何结合多种攻击机制可以减少对一个孤立攻击所需的威胁模型假设。接下来,我们将机器学习攻击系统化,符合Mitre Att&ck框架,以便根据威胁模型更好地定位每次攻击。最后,我们概述了现有的软件和硬件层的对策,并讨论了实现复合AI系统安全高效部署所需的全面防御策略的必要性。

更新时间: 2024-11-20 17:08:38

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.13459v1

Debias-CLR: A Contrastive Learning Based Debiasing Method for Algorithmic Fairness in Healthcare Applications

Artificial intelligence based predictive models trained on clinical notes can be demographically biased. This could lead to adverse healthcare disparities in predicting outcomes like length of stay of the patients. Thus, it is necessary to mitigate the demographic biases within these models. We proposed an implicit in-processing debiasing method to combat disparate treatment, which occurs when a machine learning model predicts different outcomes for individuals based on sensitive attributes such as gender, ethnicity, and race. For this purpose, we used clinical notes of heart failure patients and used diagnostic codes, procedure reports and physiological vitals of the patients. We used Clinical BERT to obtain feature embeddings within the diagnostic codes and procedure reports, and LSTM autoencoders to obtain feature embeddings within the physiological vitals. Then, we trained two separate deep learning contrastive learning frameworks, one for gender and the other for ethnicity, to obtain debiased representations within those demographic traits. We called this debiasing framework Debias-CLR. We leveraged clinical phenotypes of the patients identified in the diagnostic codes and procedure reports in the previous study to measure fairness statistically. We found that Debias-CLR was able to reduce the Single-Category Word Embedding Association Test (SC-WEAT) effect size score when debiasing for gender and ethnicity. We further found that to obtain fair representations in the embedding space using Debias-CLR, the accuracy of the predictive models on downstream tasks like predicting length of stay of the patients did not get reduced as compared to using the un-debiased counterparts for training the predictive models. Hence, we conclude that our proposed approach, Debias-CLR, is fair and representative in mitigating demographic biases and can reduce health disparities.

Updated: 2024-11-20 17:06:26

标题: Debias-CLR:一种基于对比学习的算法公平性在医疗应用中的去偏倚方法

摘要: 基于临床笔记训练的人工智能预测模型可能存在人口统计偏见。这可能导致在预测患者的住院时间等结果时出现不利的医疗差异。因此,有必要减轻这些模型内的人口统计偏见。我们提出了一种隐性内部处理去偏见方法,以对抗机器学习模型根据性别、种族、种族等敏感属性为不同个体预测不同结果的情况。为此,我们使用心力衰竭患者的临床笔记,并使用患者的诊断代码、手术报告和生理生命体征。我们使用临床 BERT 来获取诊断代码和手术报告中的特征嵌入,使用 LSTM 自编码器来获取生理生命体征中的特征嵌入。然后,我们训练了两个独立的深度学习对比学习框架,一个用于性别,另一个用于种族,以获得这些人口统计特征内的无偏表示。我们将这种去偏见框架称为 Debias-CLR。我们利用在先前研究中识别出的患者的临床表型在诊断代码和手术报告中进行公平性统计测量。我们发现,Debias-CLR 能够在去偏见性别和种族时降低单一类别词嵌入关联测试(SC-WEAT)效应大小分数。我们进一步发现,使用 Debias-CLR 在嵌入空间中获得公平表示时,与使用未去偏见的对照模型进行训练预测患者住院时间等下游任务的准确性并未降低。因此,我们得出结论,我们提出的方法 Debias-CLR 在减轻人口统计偏见方面是公平且具代表性的,并能减少健康差异。

更新时间: 2024-11-20 17:06:26

领域: cs.LG,cs.CY

下载: http://arxiv.org/abs/2411.10544v2

LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models

Minority languages are vital to preserving cultural heritage, yet they face growing risks of extinction due to limited digital resources and the dominance of artificial intelligence models trained on high-resource languages. This white paper proposes a framework to generate linguistic tools for low-resource languages, focusing on data creation to support the development of language models that can aid in preservation efforts. Sardinian, an endangered language, serves as the case study to demonstrate the framework's effectiveness. By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity and support ongoing efforts in language standardization and revitalization through modern technologies.

Updated: 2024-11-20 16:59:41

标题: LIMBA:使用生成模型进行低资源语言保存和增值的开源框架

摘要: 少数民族语言对于保护文化遗产至关重要,然而由于数字资源有限以及人工智能模型主要依赖高资源语言训练,这些语言面临着日益增长的灭绝风险。本白皮书提出了一个框架,以生成用于低资源语言的语言工具,重点放在数据创建上,以支持语言模型的发展,从而帮助保护努力。 作为案例研究,撒丁岛语作为一种濒危语言展示了该框架的有效性。通过解决数据稀缺问题,阻碍了这些语言智能应用的发展,我们有助于促进语言多样性,并支持通过现代技术推动语言标准化和复兴的持续努力。

更新时间: 2024-11-20 16:59:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2411.13453v1

AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations

State-of-the-art multimodal web agents, powered by Multimodal Large Language Models (MLLMs), can autonomously execute many web tasks by processing user instructions and interacting with graphical user interfaces (GUIs). Current strategies for building web agents rely on (i) the generalizability of underlying MLLMs and their steerability via prompting, and (ii) large-scale fine-tuning of MLLMs on web-related tasks. However, web agents still struggle to automate tasks on unseen websites and domains, limiting their applicability to enterprise-specific and proprietary platforms. Beyond generalization from large-scale pre-training and fine-tuning, we propose building agents for few-shot adaptability using human demonstrations. We introduce the AdaptAgent framework that enables both proprietary and open-weights multimodal web agents to adapt to new websites and domains using few human demonstrations (up to 2). Our experiments on two popular benchmarks -- Mind2Web & VisualWebArena -- show that using in-context demonstrations (for proprietary models) or meta-adaptation demonstrations (for meta-learned open-weights models) boosts task success rate by 3.36% to 7.21% over non-adapted state-of-the-art models, corresponding to a relative increase of 21.03% to 65.75%. Furthermore, our additional analyses (a) show the effectiveness of multimodal demonstrations over text-only ones, (b) shed light on the influence of different data selection strategies during meta-learning on the generalization of the agent, and (c) demonstrate the effect of number of few-shot examples on the web agent's success rate. Overall, our results unlock a complementary axis for developing widely applicable multimodal web agents beyond large-scale pre-training and fine-tuning, emphasizing few-shot adaptability.

Updated: 2024-11-20 16:54:15

标题: AdaptAgent:利用来自人类示范的少样本学习调整多模态网络代理

摘要: 最先进的多模态网络代理,由多模态大型语言模型(MLLMs)驱动,可以通过处理用户指令并与图形用户界面(GUI)交互来自主执行许多网络任务。目前构建网络代理的策略依赖于(i)基础MLLMs的泛化能力及通过提示进行引导,以及(ii)对与网络相关的任务进行大规模微调。然而,网络代理仍然难以自动化处理未知网站和领域的任务,限制了它们在企业特定和专有平台上的适用性。除了从大规模预训练和微调中进行泛化之外,我们提出利用人类演示构建适用于少样本可适应性的代理。我们介绍了AdaptAgent框架,该框架使专有和开放权重的多模态网络代理能够通过少量人类演示(最多2个)适应新的网站和领域。我们在两个流行的基准测试中进行的实验——Mind2Web和VisualWebArena——显示,在上下文演示(对于专有模型)或元适应演示(对于元学习的开放权重模型)的情况下,任务成功率比未经调整的最先进模型提高了3.36%至7.21%,对应相对增加了21.03%至65.75%。此外,我们的额外分析(a)显示多模态演示比仅文本演示更有效,(b)揭示了在元学习过程中不同数据选择策略对代理泛化的影响,以及(c)展示了少样本示例数量对网络代理成功率的影响。总体而言,我们的结果为开发广泛适用的多模态网络代理开辟了一个补充的发展方向,超越了大规模预训练和微调,强调了少样本适应性。

更新时间: 2024-11-20 16:54:15

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.13451v1

CODES: Benchmarking Coupled ODE Surrogates

We introduce CODES, a benchmark for comprehensive evaluation of surrogate architectures for coupled ODE systems. Besides standard metrics like mean squared error (MSE) and inference time, CODES provides insights into surrogate behaviour across multiple dimensions like interpolation, extrapolation, sparse data, uncertainty quantification and gradient correlation. The benchmark emphasizes usability through features such as integrated parallel training, a web-based configuration generator, and pre-implemented baseline models and datasets. Extensive documentation ensures sustainability and provides the foundation for collaborative improvement. By offering a fair and multi-faceted comparison, CODES helps researchers select the most suitable surrogate for their specific dataset and application while deepening our understanding of surrogate learning behaviour.

Updated: 2024-11-20 16:47:44

标题: 代号:耦合ODE替代品基准测试

摘要: 我们介绍了CODES,这是一个用于综合评估耦合ODE系统代理架构的基准。除了标准指标如均方误差(MSE)和推断时间之外,CODES还提供了关于代理行为的见解,跨越插值、外推、稀疏数据、不确定性量化和梯度相关性等多个维度。该基准强调可用性,通过集成并行训练、基于网络的配置生成器以及预先实现的基线模型和数据集等功能实现。广泛的文档确保可持续性,并为协作改进提供基础。通过提供公正和多方面的比较,CODES帮助研究人员为其特定数据集和应用选择最合适的代理,同时加深我们对代理学习行为的理解。

更新时间: 2024-11-20 16:47:44

领域: cs.LG,astro-ph.IM,physics.comp-ph

下载: http://arxiv.org/abs/2410.20886v2

Blockchain-Enhanced Framework for Secure Third-Party Vendor Risk Management and Vigilant Security Controls

In an era of heightened digital interconnectedness, businesses increasingly rely on third-party vendors to enhance their operational capabilities. However, this growing dependency introduces significant security risks, making it crucial to develop a robust framework to mitigate potential vulnerabilities. This paper proposes a comprehensive secure framework for managing third-party vendor risk, integrating blockchain technology to ensure transparency, traceability, and immutability in vendor assessments and interactions. By leveraging blockchain, the framework enhances the integrity of vendor security audits, ensuring that vendor assessments remain up-to-date and tamperproof. This proposed framework leverages smart contracts to reduce human error while ensuring real-time monitoring of compliance and security controls. By evaluating critical security controls-such as data encryption, access control mechanisms, multi-factor authentication, and zero-trust architecture-this approach strengthens an organization's defense against emerging cyber threats. Additionally, continuous monitoring enabled by blockchain ensures the immutability and transparency of vendor compliance processes. In this paper, a case study on iHealth's transition to AWS Cloud demonstrates the practical implementation of the framework, showing a significant reduction in vulnerabilities and marked improvement in incident response times. Through the adoption of this blockchain-enabled approach, organizations can mitigate vendor risks, streamline compliance, and enhance their overall security posture.

Updated: 2024-11-20 16:42:14

标题: 区块链增强框架用于安全的第三方供应商风险管理和警惕安全控制

摘要: 在一个数字互联程度日益加深的时代,企业越来越依赖第三方供应商来提升他们的运营能力。然而,这种增长的依赖性带来了重大的安全风险,因此开发一个强大的框架来减轻潜在的漏洞至关重要。本文提出了一个全面的安全框架来管理第三方供应商风险,整合了区块链技术来确保供应商评估和互动的透明性、可追溯性和不可变性。通过利用区块链,该框架增强了供应商安全审计的完整性,确保供应商评估保持最新和防篡改。该提出的框架利用智能合约来减少人为错误,同时确保对合规和安全控制的实时监控。通过评估关键的安全控制-如数据加密、访问控制机制、多因素身份验证和零信任架构-这种方法加强了组织对新兴网络威胁的防御。此外,区块链实现的连续监控确保了供应商合规流程的不可变性和透明性。本文通过对iHealth转向AWS云的案例研究展示了该框架的实际实施,显示了漏洞显著减少和事件响应时间显著改善。通过采用这种区块链启用的方法,组织可以减轻供应商风险,简化合规流程,并增强其整体安全姿态。

更新时间: 2024-11-20 16:42:14

领域: cs.CR

下载: http://arxiv.org/abs/2411.13447v1

Robust Monocular Visual Odometry using Curriculum Learning

Curriculum Learning (CL), drawing inspiration from natural learning patterns observed in humans and animals, employs a systematic approach of gradually introducing increasingly complex training data during model development. Our work applies innovative CL methodologies to address the challenging geometric problem of monocular Visual Odometry (VO) estimation, which is essential for robot navigation in constrained environments. The primary objective of our research is to push the boundaries of current state-of-the-art (SOTA) benchmarks in monocular VO by investigating various curriculum learning strategies. We enhance the end-to-end Deep-Patch-Visual Odometry (DPVO) framework through the integration of novel CL approaches, with the goal of developing more resilient models capable of maintaining high performance across challenging environments and complex motion scenarios. Our research encompasses several distinctive CL strategies. We develop methods to evaluate sample difficulty based on trajectory motion characteristics, implement sophisticated adaptive scheduling through self-paced weighted loss mechanisms, and utilize reinforcement learning agents for dynamic adjustment of training emphasis. Through comprehensive evaluation on the real-world TartanAir dataset, our Curriculum Learning-based Deep-Patch-Visual Odometry (CL-DPVO) demonstrates superior performance compared to existing SOTA methods, including both feature-based and learning-based VO approaches. The results validate the effectiveness of integrating curriculum learning principles into visual odometry systems.
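
One of the strategies named above, the self-paced weighted loss, admits a compact sketch; the hard-threshold weighting below is the classic self-paced form, and the pace schedule is an assumption rather than the paper's.

    import torch

    def self_paced_weights(losses: torch.Tensor, pace: float) -> torch.Tensor:
        # classic self-paced learning: keep samples whose loss is below the pace
        # threshold, excluding hard trajectories early in training
        return (losses < pace).float()

    losses = torch.tensor([0.2, 1.5, 0.4, 3.0])     # per-sample VO pose losses (toy)
    for epoch, pace in enumerate([0.5, 1.0, 4.0]):  # pace grows, harder samples enter
        w = self_paced_weights(losses, pace)
        weighted_loss = (w * losses).sum() / w.sum().clamp(min=1)
        print(f"epoch {epoch}: pace={pace}, kept={int(w.sum())}, loss={weighted_loss:.2f}")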

Updated: 2024-11-20 16:26:51

标题: 使用课程学习的鲁棒单目视觉里程计

摘要: 课程学习(CL)受到人类和动物自然学习模式的启发,采用逐渐引入越来越复杂的训练数据的系统方法,用于模型开发过程中。我们的工作将创新的CL方法应用于解决单目视觉里程计(VO)估计中的具有挑战性的几何问题,这对于机器人在受限环境中进行导航至关重要。我们的研究的主要目标是通过研究各种课程学习策略来推动当前最先进(SOTA)单目VO基准的边界。我们通过集成新颖的CL方法来增强端到端的Deep-Patch-Visual Odometry(DPVO)框架,目标是开发更具弹性的模型,能够在挑战性环境和复杂运动场景中保持高性能。我们的研究涵盖了几种独特的CL策略。我们开发了基于轨迹运动特征评估样本难度的方法,通过自主加权损失机制实现复杂的自适应调度,并利用强化学习代理动态调整训练重点。通过对真实世界TartanAir数据集的全面评估,我们基于课程学习的Deep-Patch-Visual Odometry(CL-DPVO)展示了与现有SOTA方法(包括基于特征和学习的VO方法)相比的优越性能。结果验证了将课程学习原则集成到视觉里程计系统中的有效性。

更新时间: 2024-11-20 16:26:51

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2411.13438v1

SynEHRgy: Synthesizing Mixed-Type Structured Electronic Health Records using Decoder-Only Transformers

Generating synthetic Electronic Health Records (EHRs) offers significant potential for data augmentation, privacy-preserving data sharing, and improving machine learning model training. We propose a novel tokenization strategy tailored for structured EHR data, which encompasses diverse data types such as covariates, ICD codes, and irregularly sampled time series. Using a GPT-like decoder-only transformer model, we demonstrate the generation of high-quality synthetic EHRs. Our approach is evaluated using the MIMIC-III dataset, and we benchmark the fidelity, utility, and privacy of the generated data against state-of-the-art models.
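
A sketch of what a mixed-type tokenization in this spirit can look like; the special tokens, the time-gap encoding, and the value-binning scheme are illustrative assumptions, not SynEHRgy's actual vocabulary.

    import numpy as np

    def tokenize_visit(covariates, icd_codes, vitals, n_bins=10):
        """Flatten one visit into a single token sequence (illustrative scheme)."""
        tokens = ["<visit>"]
        tokens += [f"age_{covariates['age'] // 10}", f"sex_{covariates['sex']}"]
        tokens += [f"icd_{c}" for c in icd_codes]
        # discretize irregularly sampled vitals into value bins with time gaps
        for t, name, value in vitals:
            bin_id = int(np.clip(value, 0, 1) * (n_bins - 1))
            tokens += [f"dt_{int(t)}", f"{name}_bin{bin_id}"]
        return tokens + ["</visit>"]

    seq = tokenize_visit(
        {"age": 67, "sex": "F"},
        icd_codes=["I50.9", "N17.9"],
        vitals=[(0, "hr", 0.61), (4, "sbp", 0.42)],   # (hours, signal, normalized value)
    )
    print(seq)   # sequences like this feed a decoder-only transformer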

Updated: 2024-11-20 16:11:20

标题: SynEHRgy:仅解码器变压器合成混合类型结构化电子健康记录

摘要: 生成合成的电子健康记录(EHRs)具有显著的数据增强、保护隐私的数据共享和改进机器学习模型训练的潜力。我们提出了一种针对结构化EHR数据量身定制的新颖标记化策略,该策略涵盖了多种数据类型,如协变量、ICD编码和不规则采样的时间序列。通过使用类似GPT的仅解码器的变压器模型,我们展示了高质量合成EHR的生成。我们使用MIMIC-III数据集对我们的方法进行评估,并将生成数据的忠实度、实用性和隐私性与最先进的模型进行基准测试。

更新时间: 2024-11-20 16:11:20

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.13428v1

WaterPark: A Robustness Assessment of Language Model Watermarking

To mitigate the misuse of large language models (LLMs), such as disinformation, automated phishing, and academic cheating, there is a pressing need for the capability of identifying LLM-generated texts. Watermarking emerges as one promising solution: it plants statistical signals into LLMs' generative processes and subsequently verifies whether LLMs produce given texts. Various watermarking methods (``watermarkers'') have been proposed; yet, due to the lack of unified evaluation platforms, many critical questions remain under-explored: i) What are the strengths/limitations of various watermarkers, especially their attack robustness? ii) How do various design choices impact their robustness? iii) How to optimally operate watermarkers in adversarial environments? To fill this gap, we systematize existing LLM watermarkers and watermark removal attacks, mapping out their design spaces. We then develop WaterPark, a unified platform that integrates 10 state-of-the-art watermarkers and 12 representative attacks. More importantly, leveraging WaterPark, we conduct a comprehensive assessment of existing watermarkers, unveiling the impact of various design choices on their attack robustness. For instance, a watermarker's resilience to increasingly intensive attacks hinges on its context dependency. We further explore the best practices to operate watermarkers in adversarial environments. For instance, using a generic detector alongside a watermark-specific detector improves the security of vulnerable watermarkers. We believe our study sheds light on current LLM watermarking techniques while WaterPark serves as a valuable testbed to facilitate future research.
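
For concreteness, a sketch of one widely studied watermarker family, a hash-seeded green-list scheme with a z-score detector in the style of Kirchenbauer et al.; WaterPark integrates ten such watermarkers, and this sketch is not any specific one of them.

    import hashlib

    GAMMA = 0.5                       # fraction of the vocabulary in the green list

    def is_green(prev_token, token):
        # seed a hash with the previous token; a token is "green" w.p. ~GAMMA
        h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
        return h[0] < 256 * GAMMA

    def detect(tokens):
        # z-score of the observed green fraction against the GAMMA null
        hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
        n = len(tokens) - 1
        return (hits - GAMMA * n) / (GAMMA * (1 - GAMMA) * n) ** 0.5

    text = "the model writes watermarked text by favoring green tokens".split()
    print(f"z = {detect(text):.2f}  (large z -> likely watermarked)")

Paraphrasing and translation attacks work precisely by breaking the previous-token context that seeds the green list, which is one reason context dependency governs robustness.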

Updated: 2024-11-20 16:09:22

标题: WaterPark:语言模型水印的健壮性评估

摘要: 为了减轻大型语言模型(LLMs)被滥用的问题,例如虚假信息、自动钓鱼和学术作弊,迫切需要具备识别LLM生成文本能力。水印技术被提出作为一种有前途的解决方案:它将统计信号嵌入到LLMs的生成过程中,随后验证LLMs是否生成给定文本。已经提出了各种水印方法(“水印工具”);然而,由于缺乏统一的评估平台,许多关键问题仍未得到深入探讨:i)各种水印工具的优势/局限性是什么,特别是它们的攻击鲁棒性?ii)各种设计选择如何影响它们的鲁棒性?iii)如何在对抗环境中最佳地操作水印工具? 为了填补这一空白,我们系统化现有的LLM水印工具和水印去除攻击,勾勒出它们的设计空间。然后,我们开发了WaterPark,一个集成了10种最先进水印工具和12种代表性攻击的统一平台。更重要的是,利用WaterPark,我们对现有水印工具进行了全面评估,揭示了各种设计选择对其攻击鲁棒性的影响。例如,一个水印工具对越来越强烈的攻击的抵抗力取决于它的上下文依赖性。我们进一步探讨在对抗环境中操作水印工具的最佳实践。例如,使用一个通用探测器和一个特定水印探测器可以提高容易攻击的水印工具的安全性。我们相信我们的研究对当前的LLM水印技术有所启示,而WaterPark则作为一个有价值的试验平台,有助于促进未来的研究。

更新时间: 2024-11-20 16:09:22

领域: cs.CR,cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.13425v1

No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO

Reinforcement learning (RL) is inherently rife with non-stationarity since the states and rewards the agent observes during training depend on its changing policy. Therefore, networks in deep RL must be capable of adapting to new observations and fitting new targets. However, previous works have observed that networks trained under non-stationarity exhibit an inability to continue learning, termed loss of plasticity, and eventually a collapse in performance. For off-policy deep value-based RL methods, this phenomenon has been correlated with a decrease in representation rank and the ability to fit random targets, termed capacity loss. Although this correlation has generally been attributed to neural network learning under non-stationarity, the connection to representation dynamics has not been carefully studied in on-policy policy optimization methods. In this work, we empirically study representation dynamics in Proximal Policy Optimization (PPO) on the Atari and MuJoCo environments, revealing that PPO agents are also affected by feature rank deterioration and capacity loss. We show that this is aggravated by stronger non-stationarity, ultimately driving the actor's performance to collapse, regardless of the performance of the critic. We ask why the trust region, specific to methods like PPO, cannot alleviate or prevent the collapse and find a connection between representation collapse and the degradation of the trust region, one exacerbating the other. Finally, we present Proximal Feature Optimization (PFO), a novel auxiliary loss that, along with other interventions, shows that regularizing the representation dynamics mitigates the performance collapse of PPO agents.
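
A sketch of the representation-rank diagnostic such studies track, the effective rank (srank) of a batch of features; the spectrum-mass threshold follows common usage and is an assumption here.

    import numpy as np

    def srank(features: np.ndarray, delta: float = 0.01) -> int:
        """Smallest k whose top-k singular values hold (1 - delta) of the spectrum."""
        s = np.linalg.svd(features, compute_uv=False)
        cum = np.cumsum(s) / s.sum()
        return int(np.searchsorted(cum, 1 - delta) + 1)

    rng = np.random.default_rng(0)
    healthy = rng.normal(size=(256, 64))                              # full-rank features
    collapsed = rng.normal(size=(256, 3)) @ rng.normal(size=(3, 64))  # rank-3 features

    print("healthy srank:", srank(healthy))      # close to the feature dimension
    print("collapsed srank:", srank(collapsed))  # close to 3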

Updated: 2024-11-20 16:07:04

标题: 没有代表,就没有信任:连接PPO中的代表性、崩溃和信任问题

摘要: 强化学习(RL)在本质上充满了非稳态性,因为训练过程中代理观察到的状态和奖励取决于其不断变化的策略。因此,深度RL中的网络必须能够适应新观察结果并拟合新目标。然而,先前的研究观察到,在非稳态条件下训练的网络表现出无法继续学习的能力,称为可塑性丧失,并最终导致性能崩溃。对于基于值的离线深度RL方法,这种现象已经与表示秩的降低和适应随机目标的能力下降相关联,称为容量丧失。尽管这种相关性通常被归因于神经网络在非稳态条件下的学习,但与策略优化方法中的表示动态的联系尚未得到仔细研究。在这项工作中,我们在Atari和MuJoCo环境中对Proximal Policy Optimization(PPO)中的表示动态进行了实证研究,发现PPO代理也受到特征秩恶化和容量丧失的影响。我们表明,这种情况受到更强的非稳态性的加剧,最终导致行动者的性能崩溃,而批评者的性能不受影响。我们探讨了为什么像PPO这样的方法特有的信任区域不能缓解或防止崩溃,并找到了表示崩溃和信任区域退化之间的联系,二者相互加剧。最后,我们提出了Proximal Feature Optimization(PFO),这是一种新颖的辅助损失,与其他干预措施一起,表明正则化表示动态可以减轻PPO代理的性能崩溃。

更新时间: 2024-11-20 16:07:04

领域: cs.LG

下载: http://arxiv.org/abs/2405.00662v3

Heuristically Adaptive Diffusion-Model Evolutionary Strategy

Diffusion Models represent a significant advancement in generative modeling, employing a dual-phase process that first degrades domain-specific information via Gaussian noise and restores it through a trainable model. This framework enables pure noise-to-data generation and modular reconstruction of images or videos. Concurrently, evolutionary algorithms employ optimization methods inspired by biological principles to refine sets of numerical parameters encoding potential solutions to rugged objective functions. Our research reveals a fundamental connection between diffusion models and evolutionary algorithms through their shared underlying generative mechanisms: both methods generate high-quality samples via iterative refinement on random initial distributions. By employing deep learning-based diffusion models as generative models across diverse evolutionary tasks and iteratively refining diffusion models with heuristically acquired databases, we can iteratively sample potentially better-adapted offspring parameters, integrating them into successive generations of the diffusion model. This approach achieves efficient convergence toward high-fitness parameters while maintaining explorative diversity. Diffusion models introduce enhanced memory capabilities into evolutionary algorithms, retaining historical information across generations and leveraging subtle data correlations to generate refined samples. We elevate evolutionary algorithms from procedures with shallow heuristics to frameworks with deep memory. By deploying classifier-free guidance for conditional sampling at the parameter level, we achieve precise control over evolutionary search dynamics to further specific genotypical, phenotypical, or population-wide traits. Our framework marks a major heuristic and algorithmic transition, offering increased flexibility, precision, and control in evolutionary optimization processes.

Updated: 2024-11-20 16:06:28

标题: 经验适应扩散模型进化策略

摘要: 扩散模型代表了生成建模中的重大进步,采用了首先通过高斯噪声降解领域特定信息,然后通过可训练模型恢复的双相过程。这一框架实现了从纯噪声到数据生成和图像或视频的模块化重建。与此同时,进化算法采用受生物学原理启发的优化方法来优化编码潜在解决方案的一组数值参数,以适应复杂的客观函数。我们的研究揭示了扩散模型和进化算法之间的基本联系,因为它们共享底层的生成机制:两种方法都通过对随机初始分布的迭代细化生成高质量样本。通过将基于深度学习的扩散模型作为生成模型应用于各种进化任务,并通过启发式获得的数据库迭代地完善扩散模型,我们可以迭代地采样潜在适应更好的后代参数,并将它们整合到扩散模型的连续生成中。这种方法实现了高适应性参数的有效收敛,同时保持探索性多样性。扩散模型为进化算法引入了增强的记忆能力,跨代保留历史信息,并利用微妙的数据相关性生成精细的样本。我们将进化算法从具有浅层启发式的程序转变为具有深层记忆的框架。通过在参数级别进行无分类器引导的条件采样,我们实现了对进化搜索动态的精确控制,以进一步发展特定的基因型、表型或整体群体特征。我们的框架标志着一个重大的启发式和算法转变,为进化优化过程提供了增强的灵活性、精确性和控制。

更新时间: 2024-11-20 16:06:28

领域: cs.NE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.13420v1

Harpocrates: A Statically Typed Privacy Conscious Programming Framework

In this paper, we introduce Harpocrates, a compiler plugin and framework pair for Scala that binds privacy policies to data during data creation in the form of oblivious membranes. Harpocrates eliminates raw data for a policy-protected type from the application, ensuring it can only exist in protected form, and centralizes policy checking at the policy declaration site, making the privacy logic easy to maintain and verify. Instead of approaching privacy from an information-flow-verification perspective, Harpocrates allows data to flow freely throughout the application, inside the policy membranes, but enforces the policies when the data is accessed, mutated, declassified, or passed through the application boundary. The centralization of the policies allows maintainers to change the enforced logic simply by updating a single function while keeping the rest of the application oblivious to the change. Especially in a setting where the data definition is shared by multiple applications, the publisher can update the policies without requiring the dependent applications to make any changes beyond updating the dependency version.

Updated: 2024-11-20 16:02:55

标题: 哈普克拉底:一种静态类型的隐私保护编程框架

摘要: 在这篇论文中,我们介绍了Harpocrates,这是一个为Scala提供的编译器插件和框架对,它在数据创建过程中将隐私政策绑定到数据上,形成隐形膜。Harpocrates从应用程序中消除了受保护类型的原始数据,确保它只能以受保护的形式存在,并将策略检查集中到策略声明站点,使隐私逻辑易于维护和验证。与从信息流验证的角度来看待隐私不同,Harpocrates允许数据在应用程序中自由流动,在策略膜内部自由流动,但当试图访问、突变、解密或通过应用程序边界传递数据时强制执行策略。策略的集中化使维护者可以通过简单更新单个函数来更改强制逻辑,同时保持应用程序的其他部分对更改的无知。特别是在数据定义被多个应用程序共享的情况下,发布者可以更新策略,而无需需要依赖应用程序进行任何更改,只需更新依赖版本即可。

更新时间: 2024-11-20 16:02:55

领域: cs.CR,cs.SY,eess.SY

下载: http://arxiv.org/abs/2411.06317v2

A Survey On Enhancing Reinforcement Learning in Complex Environments: Insights from Human and LLM Feedback

Reinforcement learning (RL) is one of the most active fields in machine learning, demonstrating remarkable potential in tackling real-world challenges. Despite its promising prospects, the methodology has encountered issues and challenges that prevent it from achieving the best performance. In particular, these approaches perform poorly when navigating environments and solving tasks with large observation spaces, often resulting in sample inefficiency and prolonged learning times. This issue, commonly referred to as the curse of dimensionality, complicates decision-making for RL agents, necessitating a careful balance between attention and decision-making. RL agents, when augmented with human or large language model (LLM) feedback, may exhibit resilience and adaptability, leading to enhanced performance and accelerated learning. Such feedback, conveyed through various modalities or granularities including natural language, serves as a guide for RL agents, aiding them in discerning relevant environmental cues and optimizing decision-making processes. In this survey paper, we focus on two problems: first, human or LLM assistance, investigating the ways in which these entities may collaborate with the RL agent in order to foster optimal behavior and expedite learning; second, the research papers dedicated to addressing the intricacies of environments characterized by large observation spaces.

Updated: 2024-11-20 15:52:03

标题: 在复杂环境中增强强化学习的调查:来自人类和LLM反馈的见解

摘要: 强化学习(RL)是机器学习中一个活跃的领域,展现出在解决现实世界挑战方面的显著潜力。尽管展现出令人期待的前景,这种方法遇到了问题和挑战,阻碍了其达到最佳性能。特别是在导航环境和解决具有大观察空间的任务时,这些方法缺乏良好的性能,通常导致样本效率低下和学习时间延长。这个问题通常被称为维度灾难,给RL代理的决策制定带来了复杂性,需要在关注力和决策制定之间进行谨慎平衡。当RL代理与人类或大型语言模型(LLMs)的反馈相结合时,可能表现出韧性和适应性,从而提高性能并加快学习速度。这种反馈通过各种模态或粒度传达,包括自然语言,作为RL代理的指导,帮助它们识别相关的环境线索并优化决策过程。在这篇综述论文中,我们主要关注两个问题:首先,我们关注人类或LLMs的协助,研究这些实体如何与RL代理合作以促进最佳行为并加快学习;其次,我们深入研究致力于解决具有大观察空间环境复杂性的研究论文。

更新时间: 2024-11-20 15:52:03

领域: cs.LG

下载: http://arxiv.org/abs/2411.13410v1

Unification of Balti and trans-border sister dialects in the essence of LLMs and AI Technology

The language called Balti belongs to the Sino-Tibetan, specifically the Tibeto-Burman, language family. It is understood, with variations, across populations in India, China, Pakistan, Nepal, Tibet, Burma, and Bhutan, influenced by local cultures and producing various dialects. Considering the diverse cultural, socio-political, religious, and geographical influences, it is vital to move toward unifying the dialects on the basis of common roots, lexica, and phonology. In the era of globalization and increasingly frequent developments in AI technology, understanding this diversity and the efforts at dialect unification is important for recognizing commonalities and shortening the gaps created by unavoidable circumstances. This article analyzes and examines how artificial intelligence (AI), in the form of large language models (LLMs), can assist in analyzing, documenting, and standardizing the endangered Balti language, building on the efforts made across different dialects so far.

Updated: 2024-11-20 15:48:21

标题: 巴尔蒂语和跨境姐妹方言在LLMs和人工智能技术本质中的统一

摘要: 名为巴尔蒂语的语言属于汉藏语系,具体来说是藏缅语系。在印度、中国、巴基斯坦、尼泊尔、西藏、缅甸和不丹等各地的人口群体中,这种语言以不同程度被理解,受当地文化影响并产生各种方言。考虑到多样化的文化、社会政治、宗教和地理影响,统一方言、共同根源、词汇和语音角度是至关重要的。在全球化时代和人工智能技术日益频繁的发展中,理解多样性和方言统一的努力对于理解共同点并缩小受不可避免情况影响的差距至关重要。本文分析和研究了人工智能技术中大型语言模型(LLMs)的本质,如何协助分析、记录和标准化濒危的巴尔蒂语言,基于迄今为止在不同方言中所作的努力。

更新时间: 2024-11-20 15:48:21

领域: cs.CL,cs.AI,cs.CV

下载: http://arxiv.org/abs/2411.13409v1

On the Way to LLM Personalization: Learning to Remember User Conversations

Large Language Models (LLMs) have quickly become an invaluable assistant for a variety of tasks. However, their effectiveness is constrained by their ability to tailor responses to human preferences and behaviors via personalization. Prior work in LLM personalization has largely focused on style transfer or incorporating small factoids about the user, as knowledge injection remains an open challenge. In this paper, we explore injecting knowledge of prior conversations into LLMs to enable future work on less redundant, personalized conversations. We identify two real-world constraints: (1) conversations are sequential in time and must be treated as such during training, and (2) per-user personalization is only viable in parameter-efficient settings. To this aim, we propose PLUM, a pipeline performing data augmentation for up-sampling conversations as question-answer pairs, that are then used to finetune a low-rank adaptation adapter with a weighted cross entropy loss. Even in this first exploration of the problem, we perform competitively with baselines such as RAG, attaining an accuracy of 81.5% across 100 conversations.
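
A sketch of a weighted token-level cross entropy of the kind the pipeline uses; which tokens are up-weighted (answer tokens over question tokens below) is an assumption about PLUM, not a detail stated in the abstract.

    import torch
    import torch.nn.functional as F

    vocab, seq = 100, 6
    logits = torch.randn(seq, vocab)                 # model outputs for one QA pair
    targets = torch.randint(0, vocab, (seq,))

    # up-weight answer tokens relative to question tokens (assumed weighting)
    token_weights = torch.tensor([0.2, 0.2, 0.2, 1.0, 1.0, 1.0])

    per_token = F.cross_entropy(logits, targets, reduction="none")
    loss = (token_weights * per_token).sum() / token_weights.sum()
    print(loss)   # backpropagated through a low-rank adapter in the actual pipeline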

Updated: 2024-11-20 15:45:08

标题: 走向LLM个性化:学习记忆用户对话

摘要: 大型语言模型(LLMs)已经迅速成为各种任务的宝贵助手。然而,它们的有效性受到个性化定制响应以适应人类偏好和行为的能力的限制。以往在LLM个性化方面的研究主要集中在风格转移或整合用户的小事实,因为知识注入仍然是一个开放挑战。在本文中,我们探讨将先前对话的知识注入LLMs,以便未来对不那么冗余、个性化的对话进行工作。我们确定了两个现实世界的约束:(1)对话是按时间顺序进行的,必须在训练期间进行处理;(2)每个用户的个性化只在参数高效的设置中才是可行的。为此,我们提出了PLUM,一个管道,执行数据增强以将对话作为问题-答案对进行上采样,然后使用加权交叉熵损失来微调低秩适应适配器。即使在这个问题的首次探索中,我们也与RAG等基线表现有竞争力,在100个对话中取得了81.5%的准确度。

更新时间: 2024-11-20 15:45:08

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.13405v1

When Context Leads but Parametric Memory Follows in Large Language Models

Large language models (LLMs) have demonstrated remarkable progress in leveraging diverse knowledge sources. This study investigates how nine widely used LLMs allocate knowledge between local context and global parameters when answering open-ended questions in knowledge-consistent scenarios. We introduce a novel dataset, WikiAtomic, and systematically vary context sizes to analyze how LLMs prioritize and utilize the provided information and their parametric knowledge in knowledge-consistent scenarios. Additionally, we also study their tendency to hallucinate under varying context sizes. Our findings reveal consistent patterns across models, including a consistent reliance on both contextual (around 70%) and parametric (around 30%) knowledge, and a decrease in hallucinations with increasing context. These insights highlight the importance of more effective context organization and developing models that use input more deterministically for robust performance.

Updated: 2024-11-20 15:41:38

标题: 当上下文引导时,但参数化记忆在大型语言模型中跟随。

摘要: 大型语言模型(LLMs)在整合多样化知识来源方面取得了显著进展。本研究调查了九种广泛使用的LLMs在知识一致场景中回答开放性问题时如何分配知识给局部上下文和全局参数。我们引入了一个新的数据集,WikiAtomic,并系统地变化上下文大小,以分析LLMs如何优先考虑和利用所提供的信息以及知识一致场景下他们的参数化知识。此外,我们还研究了在不同上下文大小下他们产生幻觉的倾向。我们的研究结果揭示了模型之间的一致模式,包括对上下文(约70%)和参数化(约30%)知识的一致依赖,并且随着上下文增加,幻觉减少。这些发现凸显了更有效的上下文组织和开发更具确定性的模型对于稳健表现的重要性。

更新时间: 2024-11-20 15:41:38

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.08435v3

SNIP: Speculative Execution and Non-Interference Preservation for Compiler Transformations

We address the problem of preserving non-interference across compiler transformations under speculative semantics. We develop a proof method that ensures the preservation uniformly across all source programs. The basis of our proof method is a new form of simulation relation. It operates over directives that model the attacker's control over the micro-architectural state, and it accounts for the fact that the compiler transformation may change the influence of the micro-architectural state on the execution (and hence the directives). Using our proof method, we show the correctness of dead code elimination. When we tried to prove register allocation correct, we identified a previously unknown weakness that introduces violations to non-interference. We have confirmed the weakness for a mainstream compiler on code from the libsodium cryptographic library. To reclaim security once more, we develop a novel static analysis that operates on a product of source program and register-allocated program. Using the analysis, we present an automated fix to existing register allocation implementations. We prove the correctness of the fixed register allocations with our proof method.

Updated: 2024-11-20 15:23:46

Domains: cs.PL,cs.CR

Download: http://arxiv.org/abs/2407.15080v2

Provable unlearning in topic modeling and downstream tasks

Machine unlearning algorithms are increasingly important as legal concerns arise around the provenance of training data, but verifying the success of unlearning is often difficult. Provable guarantees for unlearning are often limited to supervised learning settings. In this paper, we provide the first theoretical guarantees for unlearning in the pre-training and fine-tuning paradigm by studying topic models, simple bag-of-words language models that can be adapted to solve downstream tasks like retrieval and classification. First, we design a provably effective unlearning algorithm for topic models that incurs a computational overhead independent of the size of the original dataset. Our analysis additionally quantifies the deletion capacity of the model -- i.e., the number of examples that can be unlearned without incurring a significant cost in model performance. Finally, we formally extend our analyses to account for adaptation to a given downstream task. In particular, we design an efficient algorithm to perform unlearning after fine-tuning the topic model via a linear head. Notably, we show that it is easier to unlearn pre-training data from models that have been fine-tuned to a particular task, and one can unlearn this data without modifying the base model.

Updated: 2024-11-20 15:01:04

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.12600v2

ODTE -- An ensemble of multi-class SVM-based oblique decision trees

We propose ODTE, a new ensemble that uses oblique decision trees as base classifiers. Additionally, we introduce STree, the base algorithm for growing oblique decision trees, which leverages support vector machines to define hyperplanes within the decision nodes. We embed a multiclass strategy -- one-vs-one or one-vs-rest -- at the decision nodes, allowing the model to directly handle non-binary classification tasks without the need to cluster instances into two groups, as is common in other approaches from the literature. In each decision node, only the best-performing SVM -- the one that minimizes an impurity measure for the n-ary classification -- is retained, even if the learned SVM addresses a binary classification subtask. An extensive experimental study involving 49 datasets and various state-of-the-art algorithms for oblique decision tree ensembles has been conducted. Our results show that ODTE ranks consistently above its competitors, achieving significant performance gains when hyperparameters are carefully tuned. Moreover, the oblique decision trees learned through STree are more compact than those produced by other algorithms evaluated in our experiments.
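
To make the node-level selection concrete, here is a minimal sketch (not the authors' STree implementation; the one-vs-rest loop and Gini criterion are illustrative stand-ins) of keeping only the SVM hyperplane whose induced split minimizes child-node impurity:

```python
import numpy as np
from sklearn.svm import LinearSVC

def gini(y):
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_node_svm(X, y):
    """Fit one one-vs-rest hyperplane per class at a node and keep the one
    whose induced binary split minimizes the weighted Gini impurity."""
    best, best_imp = None, np.inf
    for cls in np.unique(y):
        svm = LinearSVC().fit(X, (y == cls).astype(int))
        side = svm.decision_function(X) > 0
        imp = side.mean() * gini(y[side]) + (~side).mean() * gini(y[~side])
        if imp < best_imp:
            best, best_imp = svm, imp
    return best, best_imp
```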

Updated: 2024-11-20 14:58:32

Domains: cs.LG

Download: http://arxiv.org/abs/2411.13376v1

Predicting Wall Thickness Changes in Cold Forging Processes: An Integrated FEM and Neural Network approach

This study presents a novel approach for predicting wall thickness changes in tubes during the nosing process. Specifically, we first provide a thorough analysis of nosing processes and the influencing parameters. We further set up a Finite Element Method (FEM) simulation to better analyse the effects of varying process parameters. However, since traditional FEM simulations, while accurate, are time-consuming and computationally intensive, rendering them inapplicable for real-time application, we present a novel modeling framework based on specifically designed graph neural networks as surrogate models. To this end, we extend the neural network architecture by directly incorporating information about the nosing process by adding different types of edges and their corresponding encoders to model object interactions. This augmentation enhances model accuracy and opens the possibility for employing precise surrogate models within closed-loop production processes. The proposed approach is evaluated using a new evaluation metric termed area between thickness curves (ABTC). The results demonstrate promising performance and highlight the potential of neural networks as surrogate models in predicting wall thickness changes during nosing forging processes.
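
The abstract names the area between thickness curves (ABTC) without giving a formula; a natural reading, offered here only as a plausible sketch, is the integrated absolute difference between the predicted and ground-truth thickness profiles:

```python
import numpy as np

def abtc(thickness_pred, thickness_true, positions):
    """Area between two wall-thickness curves sampled at common axial
    positions: trapezoidal integral of |prediction - ground truth|."""
    diff = np.abs(np.asarray(thickness_pred) - np.asarray(thickness_true))
    return np.trapz(diff, x=np.asarray(positions))
```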

Updated: 2024-11-20 14:42:53

Domains: cs.LG

Download: http://arxiv.org/abs/2411.13366v1

Explainable Finite-Memory Policies for Partially Observable Markov Decision Processes

Partially Observable Markov Decision Processes (POMDPs) are a fundamental framework for decision-making under uncertainty and partial observability. Since in general optimal policies may require infinite memory, they are hard to implement and often render most problems undecidable. Consequently, finite-memory policies are mostly considered instead. However, the algorithms for computing them are typically very complex, and so are the resulting policies. Facing the need for their explainability, we provide a representation of such policies, both (i) in an interpretable formalism and (ii) typically of smaller size, together yielding higher explainability. To that end, we combine models of Mealy machines and decision trees; the latter describing simple, stationary parts of the policies and the former describing how to switch among them. We design a translation for policies of the finite-state-controller (FSC) form from standard literature and show how our method smoothly generalizes to other variants of finite-memory policies. Further, we identify specific properties of recently used "attractor-based" policies, which allow us to construct yet simpler and smaller representations. Finally, we illustrate the higher explainability in a few case studies.

Updated: 2024-11-20 14:42:23

Domains: cs.AI,cs.LG,cs.RO,cs.SY,eess.SY

Download: http://arxiv.org/abs/2411.13365v1

Random Representations Outperform Online Continually Learned Representations

Continual learning has primarily focused on the issue of catastrophic forgetting and the associated stability-plasticity tradeoffs. However, little attention has been paid to the efficacy of continually learned representations, as representations are learned alongside classifiers throughout the learning process. Our primary contribution is empirically demonstrating that existing online continually trained deep networks produce inferior representations compared to a simple pre-defined random transform. Our approach projects raw pixels using a fixed random transform, approximating an RBF-Kernel initialized before any data is seen. We then train a simple linear classifier on top without storing any exemplars, processing one sample at a time in an online continual learning setting. This method, called RanDumb, significantly outperforms state-of-the-art continually learned representations across all standard online continual learning benchmarks. Our study reveals the significant limitations of representation learning, particularly in low-exemplar and online continual learning scenarios. Extending our investigation to popular exemplar-free scenarios with pretrained models, we find that training only a linear classifier on top of pretrained representations surpasses most continual fine-tuning and prompt-tuning strategies. Overall, our investigation challenges the prevailing assumptions about effective representation learning in online continual learning. Our code is available at: https://github.com/drimpossible/RanDumb.
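
The recipe is simple enough to sketch end to end: a fixed random feature map approximating an RBF kernel, followed by an online linear classifier that sees each sample once. The sketch below uses scikit-learn's RBFSampler and SGDClassifier as stand-ins; the hyperparameters and synthetic data are illustrative, not the paper's:

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 784))      # stand-in for raw pixels
y = rng.integers(0, 10, size=1000)

embed = RBFSampler(gamma=1.0, n_components=2000, random_state=0)
embed.fit(X[:1])                          # "fit" only draws the fixed random projection
clf = SGDClassifier(loss="log_loss")      # simple linear probe on top
classes = np.arange(10)

for xi, yi in zip(X, y):                  # online: one sample at a time, no exemplars stored
    zi = embed.transform(xi.reshape(1, -1))
    clf.partial_fit(zi, [yi], classes=classes)
```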

Updated: 2024-11-20 14:33:10

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2402.08823v3

Vertical Validation: Evaluating Implicit Generative Models for Graphs on Thin Support Regions

There has been growing excitement that implicit graph generative models could be used to design or discover new molecules for medicine or material design. Because these molecules have not been discovered, they naturally lie in unexplored or scarcely supported regions of the distribution of known molecules. However, prior evaluation methods for implicit graph generative models have focused on validating statistics computed from the thick support (e.g., mean and variance of a graph property). Therefore, there is a mismatch between the goal of generating novel graphs and the evaluation methods. To address this evaluation gap, we design a novel evaluation method called Vertical Validation (VV) that systematically creates thin support regions during the train-test splitting procedure and then reweights generated samples so that they can be compared to the held-out test data. This procedure can be seen as a generalization of the standard train-test procedure except that the splits are dependent on sample features. We demonstrate that our method can be used to perform model selection if performance on thin support regions is the desired goal. As a side benefit, we also show that our approach can better detect overfitting as exemplified by memorization.

Updated: 2024-11-20 14:29:59

Domains: cs.LG

Download: http://arxiv.org/abs/2411.13358v1

Conditional Denoising Diffusion Probabilistic Models for Data Reconstruction Enhancement in Wireless Communications

In this paper, conditional denoising diffusion probabilistic models (DDPMs) are proposed to enhance the data transmission and reconstruction over wireless channels. The underlying mechanism of DDPM is to decompose the data generation process over the so-called "denoising" steps. Inspired by this, the key idea is to leverage the generative prior of diffusion models in learning a "noisy-to-clean" transformation of the information signal to help enhance data reconstruction. The proposed scheme could be beneficial for communication scenarios in which prior knowledge of the information content is available, e.g., in multimedia transmission. Hence, instead of employing complicated channel codes that reduce the information rate, one can exploit diffusion priors for reliable data reconstruction, especially under extreme channel conditions due to low signal-to-noise ratio (SNR), or hardware-impaired communications. The proposed DDPM-assisted receiver is tailored for the scenario of wireless image transmission using the MNIST dataset. Our numerical results highlight the reconstruction performance of our scheme compared to the conventional digital communication, as well as the deep neural network (DNN)-based benchmark. It is also shown that more than 10 dB improvement in the reconstruction could be achieved in low SNR regimes, without the need to reduce the information rate for error correction.

Updated: 2024-11-20 14:24:25

Domains: cs.IT,cs.AI,cs.LG,math.IT

Download: http://arxiv.org/abs/2310.19460v3

Neuron Patching: Semantic-based Neuron-level Language Model Repair for Code Generation

Language Models (LMs) have become widely used in software engineering, especially for tasks such as code generation, where they are referred to as code LMs. These models have proven effective in generating code, making it easier for developers to automate coding activities. However, research has highlighted a significant limitation: despite their effectiveness, LMs often produce code that is incorrect, buggy, or not fully functional. Updating these models with limited data can be prohibitively challenging, yet it is essential to maximize their utility; resolving failures may therefore require hot-fix techniques that update models using only limited data. In this paper, we propose Model Improvement via Neuron Targeting (MINT), a novel approach for repairing code LMs. MINT leverages the semantic property of language models to perform neuron-level repairs in a novel way. Further, by analyzing the relationships between the model's latent representations, the incorrect outputs, and the desired outputs, MINT determines which neurons are worth updating. This approach ensures that only the neurons crucial to the model's failure are targeted, avoiding unnecessary changes and allowing for a more efficient and precise repair process. MINT is effective, efficient, and reliable, capable of correcting a neural model by patching a minimum number of neurons (usually one or two neurons). Our approach is evaluated on three coding tasks: line-level code generation, shellcode generation, and intent-to-bash translation. The experimental results demonstrate that the proposed approach significantly outperforms the state-of-the-art in both effectiveness and efficiency measures. In addition, we analyze and discuss the side effects of model repair techniques, including the balance between generalization and specificity, and the performance after multiple repairs in succession.

Updated: 2024-11-20 14:22:06

Domains: cs.SE,cs.CL,cs.LG

Download: http://arxiv.org/abs/2312.05356v5

CryptoFormalEval: Integrating LLMs and Formal Verification for Automated Cryptographic Protocol Vulnerability Detection

Cryptographic protocols play a fundamental role in securing modern digital infrastructure, but they are often deployed without prior formal verification. This could lead to the adoption of distributed systems vulnerable to attack vectors. Formal verification methods, on the other hand, require complex and time-consuming techniques that lack automation. In this paper, we introduce a benchmark to assess the ability of Large Language Models (LLMs) to autonomously identify vulnerabilities in new cryptographic protocols through interaction with Tamarin: a theorem prover for protocol verification. We created a manually validated dataset of novel, flawed, communication protocols and designed a method to automatically verify the vulnerabilities found by the AI agents. Our results on the performance of current frontier models on the benchmark provide insights into the possibility of cybersecurity applications built by integrating LLMs with symbolic reasoning systems.

Updated: 2024-11-20 14:16:55

Domains: cs.CR,cs.AI,cs.SC

Download: http://arxiv.org/abs/2411.13627v1

Fact-Level Confidence Calibration and Self-Correction

Confidence calibration in LLMs, i.e., aligning their self-assessed confidence with the actual accuracy of their responses, enables them to self-evaluate the correctness of their outputs. However, current calibration methods for LLMs typically estimate two scalars to represent overall response confidence and correctness, which is inadequate for long-form generation where the response includes multiple atomic facts and may be partially confident and correct. These methods also overlook the relevance of each fact to the query. To address these challenges, we propose a Fact-Level Calibration framework that operates at a finer granularity, calibrating confidence to relevance-weighted correctness at the fact level. Furthermore, comprehensive analysis under the framework inspired the development of Confidence-Guided Fact-level Self-Correction (ConFix), which uses high-confidence facts within a response as additional knowledge to improve low-confidence ones. Extensive experiments across four datasets and six models demonstrate that ConFix effectively mitigates hallucinations without requiring external knowledge sources such as retrieval systems.

Updated: 2024-11-20 14:15:18

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.13343v1

Verifying Machine Unlearning with Explainable AI

We investigate the effectiveness of Explainable AI (XAI) in verifying Machine Unlearning (MU) within the context of harbor front monitoring, focusing on data privacy and regulatory compliance. With the increasing need to adhere to privacy legislation such as the General Data Protection Regulation (GDPR), traditional methods of retraining ML models for data deletions prove impractical due to their complexity and resource demands. MU offers a solution by enabling models to selectively forget specific learned patterns without full retraining. We explore various removal techniques, including data relabeling, and model perturbation. Then, we leverage attribution-based XAI to discuss the effects of unlearning on model performance. Our proof-of-concept introduces feature importance as an innovative verification step for MU, expanding beyond traditional metrics and demonstrating techniques' ability to reduce reliance on undesired patterns. Additionally, we propose two novel XAI-based metrics, Heatmap Coverage (HC) and Attention Shift (AS), to evaluate the effectiveness of these methods. This approach not only highlights how XAI can complement MU by providing effective verification, but also sets the stage for future research to enhance their joint integration.
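
The paper introduces Heatmap Coverage (HC) and Attention Shift (AS) as new metrics; their exact formulas are not reproduced here, but one plausible reading of HC, shown purely as an illustrative stand-in, is the share of attribution mass that falls inside a region of interest:

```python
import numpy as np

def heatmap_coverage(attribution, region_mask):
    """Fraction of total attribution mass inside a region of interest
    (e.g. image areas tied to the data slated for unlearning).
    Illustrative only; the paper's definition may differ."""
    a = np.abs(np.asarray(attribution, dtype=float))
    total = a.sum()
    if total == 0:
        return 0.0
    return float(a[np.asarray(region_mask, dtype=bool)].sum() / total)
```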

Updated: 2024-11-20 13:57:32

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.13332v1

Revisiting Discrete Soft Actor-Critic

We study the adaptation of Soft Actor-Critic (SAC), which is considered a state-of-the-art reinforcement learning (RL) algorithm, from continuous action space to discrete action space. We revisit vanilla discrete SAC and provide an in-depth understanding of its Q value underestimation and performance instability issues when applied to discrete settings. We thereby propose Stable Discrete SAC (SDSAC), an algorithm that leverages entropy-penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action space, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is at: https://github.com/coldsummerday/SD-SAC.git.
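
As a rough sketch of the ingredients named above (entropy term, double average Q-learning, Q-clip), here is a soft Bellman target for discrete actions; the exact SDSAC update rules differ in detail, so the averaging and the simple clamp below should be read as assumptions:

```python
import torch

def soft_q_target(q1_next, q2_next, log_pi_next, reward, done,
                  gamma=0.99, alpha=0.2, clip=100.0):
    """Soft Bellman target with *averaged* double Q-values over discrete
    actions; the clamp is a simplified stand-in for the paper's Q-clip.
    q1_next, q2_next, log_pi_next: (B, n_actions); reward, done: (B,) floats."""
    pi_next = log_pi_next.exp()
    q_avg = 0.5 * (q1_next + q2_next)                       # average of the two critics
    v_next = (pi_next * (q_avg - alpha * log_pi_next)).sum(dim=-1)
    target = reward + gamma * (1.0 - done) * v_next
    return target.clamp(-clip, clip)
```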

Updated: 2024-11-20 13:52:42

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2209.10081v4

An Evolutional Neural Network Framework for Classification of Microarray Data

DNA microarray gene-expression data has been widely used to identify cancerous gene signatures. Microarrays can increase the accuracy of cancer diagnosis and prognosis. However, analyzing the large amount of gene expression data from microarray chips poses a challenge for current machine learning research. One of the challenges in classifying healthy and cancerous tissues is the high dimensionality of gene expressions, which decreases classification accuracy. This research applies a hybrid model of a Genetic Algorithm and a Neural Network to overcome this problem during the selection of informative gene subsets: a Genetic Algorithm (GA) reduces dimensionality during feature selection, and a Multi-Layer Perceptron Neural Network (MLP) is then applied to classify the selected genes. Performance is evaluated in terms of accuracy and the number of selected genes. Experimental results show that the proposed method achieves high accuracy with a minimal number of selected genes in comparison with other machine learning algorithms.
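
A toy version of this hybrid pipeline can be written with scikit-learn; the population size, mutation rate, and MLP architecture below are arbitrary illustrative choices, not the paper's settings:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def ga_select_genes(X, y, n_pop=20, n_gen=10, p_mut=0.02, seed=0):
    """Toy GA: individuals are binary gene masks; fitness is the
    cross-validated accuracy of an MLP trained on the selected genes."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((n_pop, n)) < 0.1                     # start from sparse masks

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300)
        return cross_val_score(clf, X[:, mask], y, cv=3).mean()

    for _ in range(n_gen):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[-(n_pop // 2):]]  # truncation selection
        children = []
        while len(parents) + len(children) < n_pop:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = int(rng.integers(1, n))                  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child = child ^ (rng.random(n) < p_mut)        # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, *children])
    scores = np.array([fitness(m) for m in pop])
    return pop[int(scores.argmax())]
```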

Updated: 2024-11-20 13:48:40

Domains: cs.NE,cs.AI,q-bio.GN

Download: http://arxiv.org/abs/2411.13326v1

Are Large Language Models Memorizing Bug Benchmarks?

Large Language Models (LLMs) have become integral to various software engineering tasks, including code generation, bug detection, and repair. To evaluate model performance in these domains, numerous bug benchmarks containing real-world bugs from software projects have been developed. However, a growing concern within the software engineering community is that these benchmarks may not reliably reflect true LLM performance due to the risk of data leakage. Despite this concern, limited research has been conducted to quantify the impact of potential leakage. In this paper, we systematically evaluate popular LLMs to assess their susceptibility to data leakage from widely used bug benchmarks. To identify potential leakage, we use multiple metrics, including a study of benchmark membership within commonly used training datasets, as well as analyses of negative log-likelihood and n-gram accuracy. Our findings show that certain models, in particular codegen-multi, exhibit significant evidence of memorization in widely used benchmarks like Defects4J, while newer models trained on larger datasets like LLaMa 3.1 exhibit limited signs of leakage. These results highlight the need for careful benchmark selection and the adoption of robust metrics to adequately assess models capabilities.
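
Of the leakage signals mentioned, per-token negative log-likelihood is the easiest to sketch: unusually low NLL on verbatim benchmark code is consistent with memorization. A minimal Hugging Face version (the model name in the example is just a placeholder) might look like:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def mean_token_nll(model_name, text):
    """Average next-token negative log-likelihood of `text` under a causal
    LM; compare scores on benchmark code vs. freshly written code."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)   # HF shifts the labels internally
    return out.loss.item()             # mean NLL over predicted tokens

# e.g. mean_token_nll("gpt2", benchmark_snippet) vs. a held-out snippet
```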

Updated: 2024-11-20 13:46:04

Domains: cs.SE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.13323v1

Scaling Laws for Online Advertisement Retrieval

The scaling law is a notable property of neural network models and has significantly propelled the development of large language models. Scaling laws hold great promise in guiding model design and resource allocation. Recent research increasingly shows that scaling laws are not limited to NLP tasks or Transformer architectures; they also apply to domains such as recommendation. However, there is still a lack of literature on scaling law research in online advertisement retrieval systems. This may be because 1) identifying the scaling law for resource cost and online revenue is often expensive in both time and training resources for large-scale industrial applications, and 2) varying settings for different systems prevent the scaling law from being applied across various scenarios. To address these issues, we propose a lightweight paradigm to identify the scaling law of online revenue and machine cost for a certain online advertisement retrieval scenario with a low experimental cost. Specifically, we focus on a sole factor (FLOPs) and propose an offline metric named R/R* that exhibits a high linear correlation with online revenue for retrieval models. We estimate the machine cost offline via a simulation algorithm. Thus, we can transform most online experiments into low-cost offline experiments. We conduct comprehensive experiments to verify the effectiveness of our proposed metric R/R* and to identify the scaling law in the online advertisement retrieval system of Kuaishou. With the scaling law, we demonstrate practical applications for ROI-constrained model designing and multi-scenario resource allocation in Kuaishou advertising system. To the best of our knowledge, this is the first work to study the scaling laws for online advertisement retrieval of real-world systems, showing great potential for scaling law in advertising system optimization.
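
The abstract does not state the functional form fitted between FLOPs and the surrogate metric R/R*; assuming a power law (a common but here unverified choice), the fit reduces to linear regression in log-log space:

```python
import numpy as np

def fit_power_law(flops, metric):
    """Least-squares fit of metric ~= a * FLOPs**b in log-log space."""
    b, log_a = np.polyfit(np.log(flops), np.log(metric), 1)
    return np.exp(log_a), b

# Illustrative numbers only -- not from the paper:
a, b = fit_power_law([1e9, 1e10, 1e11], [0.62, 0.71, 0.78])
print(f"extrapolated R/R* at 1e12 FLOPs: {a * 1e12 ** b:.3f}")
```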

Updated: 2024-11-20 13:44:59

Domains: cs.IR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.13322v1

Locally Adaptive One-Class Classifier Fusion with Dynamic $\ell_p$-Norm Constraints for Robust Anomaly Detection

This paper presents a novel approach to one-class classifier fusion through locally adaptive learning with dynamic $\ell_p$-norm constraints. We introduce a framework that dynamically adjusts fusion weights based on local data characteristics, addressing fundamental challenges in ensemble-based anomaly detection. Our method incorporates an interior-point optimization technique that significantly improves computational efficiency compared to traditional Frank-Wolfe approaches, achieving up to 19-fold speed improvements in complex scenarios. The framework is extensively evaluated on standard UCI benchmark datasets and specialized temporal sequence datasets, demonstrating superior performance across diverse anomaly types. Statistical validation through Skillings-Mack tests confirms our method's significant advantages over existing approaches, with consistent top rankings in both pure and non-pure learning scenarios. The framework's ability to adapt to local data patterns while maintaining computational efficiency makes it particularly valuable for real-time applications where rapid and accurate anomaly detection is crucial.

Updated: 2024-11-20 13:39:23

Domains: cs.LG,stat.ML

Download: http://arxiv.org/abs/2411.06406v2

A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data

Cameras can be used to perceive the environment around the vehicle, while affordable radar sensors are popular in autonomous driving systems because, unlike cameras, they can withstand adverse weather conditions. However, radar point clouds are sparser, with low azimuth and elevation resolution, and lack the semantic and structural information of the scenes, resulting in generally lower radar detection performance. In this work, we directly use the raw range-Doppler (RD) spectrum of radar data, thus avoiding radar signal processing. We independently process camera images within the proposed comprehensive image processing pipeline. Specifically, first, we transform the camera images to Bird's-Eye View (BEV) Polar domain and extract the corresponding features with our camera encoder-decoder architecture. The resultant feature maps are fused with Range-Azimuth (RA) features, recovered from the RD spectrum input from the radar decoder, to perform object detection. We evaluate our fusion strategy against other existing methods, not only in terms of accuracy but also on computational complexity metrics, on the RADIal dataset.

Updated: 2024-11-20 13:26:13

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.13311v1

Predicting User Intents and Musical Attributes from Music Discovery Conversations

Intent classification is a text understanding task that identifies user needs from input text queries. While intent classification has been extensively studied in various domains, it has not received much attention in the music domain. In this paper, we investigate intent classification models for music discovery conversation, focusing on pre-trained language models. Rather than only predicting functional needs: intent classification, we also include a task for classifying musical needs: musical attribute classification. Additionally, we propose a method of concatenating previous chat history with just single-turn user queries in the input text, allowing the model to understand the overall conversation context better. Our proposed model significantly improves the F1 score for both user intent and musical attribute classification, and surpasses the zero-shot and few-shot performance of the pretrained Llama 3 model.

Updated: 2024-11-20 13:24:11

Domains: cs.CL,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2411.12254v2

Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime

Predictive combinatorial optimization, where the parameters of combinatorial optimization (CO) are unknown at the decision-making time, is the precise modeling of many real-world applications, including energy cost-aware scheduling and budget allocation on advertising. Tackling such a problem usually involves a prediction model and a CO solver. These two modules are integrated into the predictive CO pipeline following one of two design principles: "Predict-then-Optimize (PtO)" learns predictions by supervised training and subsequently solves CO using the predicted coefficients, while "Predict-and-Optimize (PnO)" directly optimizes towards the ultimate decision quality and claims to yield better decisions than traditional PtO approaches. However, there lacks a systematic benchmark of both approaches, including the specific design choices at the module level, as well as an evaluation dataset that covers representative real-world scenarios. To this end, we develop a modular framework to benchmark 11 existing PtO/PnO methods on 8 problems, including a new industrial dataset for combinatorial advertising that will be released. Our study shows that PnO approaches are better than PtO on 7 out of 8 benchmarks, but there is no silver bullet found for the specific design choices of PnO. A comprehensive categorization of current approaches and integration of typical scenarios are provided under a unified benchmark. Therefore, this paper could serve as a comprehensive benchmark for future PnO approach development and also offer fast prototyping for application-focused development. The code is available at https://github.com/Thinklab-SJTU/PredictiveCO-Benchmark.

Updated: 2024-11-20 13:20:45

Domains: cs.LG,cs.AI,math.OC

Download: http://arxiv.org/abs/2311.07633v5

Classification of Buried Objects from Ground Penetrating Radar Images by using Second Order Deep Learning Models

In this paper, a new classification model based on covariance matrices is built in order to classify buried objects. The inputs of the proposed models are the hyperbola thumbnails obtained with a classical Ground Penetrating Radar (GPR) system. These thumbnails are then fed to the first layers of a classical CNN, which produces a covariance matrix using the outputs of the convolutional filters. Next, the covariance matrix is given to a network composed of specific layers to classify Symmetric Positive Definite (SPD) matrices. We show on a large database that our approach outperforms shallow networks designed for GPR data and conventional CNNs typically used in computer vision applications, particularly when the number of training data decreases and in the presence of mislabeled data. We also illustrate the benefit of our models when the training data and test sets are obtained under different weather modes or acquisition conditions.
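
The covariance step in the middle of this pipeline is easy to make concrete: given conv feature maps of shape (C, H, W), flatten the spatial dimensions and compute the channel covariance. The small ridge term below is an assumption added so that downstream SPD layers stay well-conditioned, not a detail from the paper:

```python
import numpy as np

def channel_covariance(features, eps=1e-5):
    """Channel covariance of conv feature maps (C, H, W) -> SPD (C, C)."""
    C = features.shape[0]
    X = features.reshape(C, -1)
    X = X - X.mean(axis=1, keepdims=True)
    cov = (X @ X.T) / (X.shape[1] - 1)
    return cov + eps * np.eye(C)       # ridge keeps the matrix strictly SPD
```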

Updated: 2024-11-20 13:17:08

Domains: cs.CV,cs.LG,stat.AP

Download: http://arxiv.org/abs/2410.07117v2

Lifted Model Construction without Normalisation: A Vectorised Approach to Exploit Symmetries in Factor Graphs

Lifted probabilistic inference exploits symmetries in a probabilistic model to allow for tractable probabilistic inference with respect to domain sizes of logical variables. We found that the current state-of-the-art algorithm to construct a lifted representation in the form of a parametric factor graph misses symmetries between factors that are exchangeable but scaled differently, thereby leading to a less compact representation. In this paper, we propose a generalisation of the advanced colour passing (ACP) algorithm, which is the state of the art for constructing a parametric factor graph. Our proposed algorithm allows for potentials of factors to be scaled arbitrarily and efficiently detects more symmetries than the original ACP algorithm. By detecting strictly more symmetries than ACP, our algorithm significantly reduces online query times for probabilistic inference when the resulting model is applied, which we also confirm in our experiments.

Updated: 2024-11-20 13:01:18

Domains: cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.11730v2

DATTA: Domain-Adversarial Test-Time Adaptation for Cross-Domain WiFi-Based Human Activity Recognition

Cross-domain generalization is an open problem in WiFi-based sensing due to variations in environments, devices, and subjects, causing domain shifts in channel state information. To address this, we propose Domain-Adversarial Test-Time Adaptation (DATTA), a novel framework combining domain-adversarial training (DAT), test-time adaptation (TTA), and weight resetting to facilitate adaptation to unseen target domains and to prevent catastrophic forgetting. DATTA is integrated into a lightweight, flexible architecture optimized for speed. We conduct a comprehensive evaluation of DATTA, including an ablation study on all key components using publicly available data, and verify its suitability for real-time applications such as human activity recognition. When a SotA video-based variant of TTA is combined with WiFi-based DAT and compared against DATTA, our method achieves an 8.1% higher F1-Score. The PyTorch implementation of DATTA is publicly available at: https://github.com/StrohmayerJ/DATTA.

Updated: 2024-11-20 12:52:36

Domains: cs.CV,cs.AI,cs.ET,cs.LG

Download: http://arxiv.org/abs/2411.13284v1

3D-Aware Instance Segmentation and Tracking in Egocentric Videos

Egocentric videos present unique challenges for 3D scene understanding due to rapid camera motion, frequent object occlusions, and limited object visibility. This paper introduces a novel approach to instance segmentation and tracking in first-person video that leverages 3D awareness to overcome these obstacles. Our method integrates scene geometry, 3D object centroid tracking, and instance segmentation to create a robust framework for analyzing dynamic egocentric scenes. By incorporating spatial and temporal cues, we achieve superior performance compared to state-of-the-art 2D approaches. Extensive evaluations on the challenging EPIC Fields dataset demonstrate significant improvements across a range of tracking and segmentation consistency metrics. Specifically, our method outperforms the next best performing approach by $7$ points in Association Accuracy (AssA) and $4.5$ points in IDF1 score, while reducing the number of ID switches by $73\%$ to $80\%$ across various object categories. Leveraging our tracked instance segmentations, we showcase downstream applications in 3D object reconstruction and amodal video object segmentation in these egocentric settings.

Updated: 2024-11-20 12:51:25

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2408.09860v2

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Large multimodal models (LMMs) with advanced video analysis capabilities have recently garnered significant attention. However, most evaluations rely on traditional methods like multiple-choice questions in benchmarks such as VideoMME and LongVideoBench, which often lack the depth needed to capture the complex demands of real-world users. To address this limitation, and given the prohibitive cost and slow pace of human annotation for video tasks, we introduce VideoAutoArena, an arena-style benchmark inspired by LMSYS Chatbot Arena's framework, designed to automatically assess LMMs' video analysis abilities. VideoAutoArena utilizes user simulation to generate open-ended, adaptive questions that rigorously assess model performance in video understanding. The benchmark features an automated, scalable evaluation framework, incorporating a modified ELO Rating System for fair and continuous comparisons across multiple LMMs. To validate our automated judging system, we construct a 'gold standard' using a carefully curated subset of human annotations, demonstrating that our arena strongly aligns with human judgment while maintaining scalability. Additionally, we introduce a fault-driven evolution strategy, progressively increasing question complexity to push models toward handling more challenging video analysis scenarios. Experimental results demonstrate that VideoAutoArena effectively differentiates among state-of-the-art LMMs, providing insights into model strengths and areas for improvement. To further streamline our evaluation, we introduce VideoAutoBench as an auxiliary benchmark, where human annotators label winners in a subset of VideoAutoArena battles. We use GPT-4o as a judge to compare responses against these human-validated answers. Together, VideoAutoArena and VideoAutoBench offer a cost-effective, and scalable framework for evaluating LMMs in user-centric video analysis.
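
For reference, the vanilla ELO update underlying the rating system looks as follows; the paper uses a modified variant whose details are not reproduced here:

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Standard ELO after one pairwise battle (score_a: 1 win, 0.5 tie, 0 loss)."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a += k * (score_a - expected_a)
    r_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a, r_b
```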

Updated: 2024-11-20 12:48:34

Domains: cs.CV,cs.AI,cs.CL,cs.MM

Download: http://arxiv.org/abs/2411.13281v1

Unlocking the Power of Gradient Guidance for Structure-Based Molecule Optimization

Structure-based molecule optimization (SBMO) aims to optimize molecules with both continuous coordinates and discrete types against protein targets. A promising direction is to exert gradient guidance on generative models given its remarkable success in images, but it is challenging to guide discrete data and risks inconsistencies between modalities. To this end, we leverage a continuous and differentiable space derived through Bayesian inference, presenting Molecule Joint Optimization (MolJO), the first gradient-based SBMO framework that facilitates joint guidance signals across different modalities while preserving SE(3)-equivariance. We introduce a novel backward correction strategy that optimizes within a sliding window of the past histories, allowing for a seamless trade-off between explore-and-exploit during optimization. Our proposed MolJO achieves state-of-the-art performance on the CrossDocked2020 benchmark (Success Rate 51.3%, Vina Dock -9.05 and SA 0.78), a more than 4x improvement in Success Rate compared to the gradient-based counterpart, and a "Me-Better" ratio twice that of 3D baselines. Furthermore, we extend MolJO to a wide range of optimization settings, including multi-objective optimization and challenging tasks in drug design such as R-group optimization and scaffold hopping, further underscoring its versatility and potential.

Updated: 2024-11-20 12:48:29

Domains: q-bio.BM,cs.AI

Download: http://arxiv.org/abs/2411.13280v1

Dividable Configuration Performance Learning

Machine/deep learning models have been widely adopted for predicting the configuration performance of software systems. However, a crucial yet unaddressed challenge is how to cater for the sparsity inherited from the configuration landscape: the influence of configuration options (features) and the distribution of data samples are highly sparse. In this paper, we propose a model-agnostic and sparsity-robust framework for predicting configuration performance, dubbed DaL, based on the new paradigm of dividable learning that builds a model via "divide-and-learn". To handle sample sparsity, the samples from the configuration landscape are divided into distant divisions, for each of which we build a sparse local model, e.g., regularized Hierarchical Interaction Neural Network, to deal with the feature sparsity. A newly given configuration would then be assigned to the right model of division for the final prediction. Further, DaL adaptively determines the optimal number of divisions required for a system and sample size without any extra training or profiling. Experiment results from 12 real-world systems and five sets of training data reveal that, compared with the state-of-the-art approaches, DaL performs no worse than the best counterpart on 44 out of 60 cases with up to 1.61x improvement on accuracy; requires fewer samples to reach the same/better accuracy; and produces acceptable training overhead. In particular, the mechanism that adapts the parameter d reaches the optimal value in 76.43% of the individual runs. The result also confirms that the paradigm of dividable learning is more suitable than other similar paradigms such as ensemble learning for predicting configuration performance. Practically, DaL considerably improves different global models when using them as the underlying local models, which further strengthens its flexibility.
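
The "divide-and-learn" skeleton is straightforward to sketch: split the configuration samples into divisions, fit one local model per division, and route new configurations to their division's model. The stand-ins below (KMeans for the division step, random forests as local models) are off-the-shelf substitutes, not DaL's own division mechanism or hierarchical interaction networks:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

class DivideAndLearn:
    """Minimal divide-and-learn sketch with d divisions."""

    def __init__(self, d=4):
        self.splitter = KMeans(n_clusters=d, n_init=10, random_state=0)
        self.local_models = {}

    def fit(self, X, y):
        labels = self.splitter.fit_predict(X)              # carve the landscape into divisions
        for k in np.unique(labels):
            self.local_models[k] = RandomForestRegressor(
                random_state=0).fit(X[labels == k], y[labels == k])
        return self

    def predict(self, X):
        labels = self.splitter.predict(X)                  # route each configuration
        return np.array([self.local_models[k].predict(row[None, :])[0]
                         for k, row in zip(labels, X)])
```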

Updated: 2024-11-20 12:40:11

Domains: cs.SE,cs.AI

Download: http://arxiv.org/abs/2409.07629v3

Towards Specification-Driven LLM-Based Generation of Embedded Automotive Software

The paper studies how code generation by LLMs can be combined with formal verification to produce critical embedded software. The first contribution is a general framework, spec2code, in which LLMs are combined with different types of critics that produce feedback for iterative backprompting and fine-tuning. The second contribution presents a first feasibility study, where a minimalistic instantiation of spec2code, without iterative backprompting and fine-tuning, is empirically evaluated using three industrial case studies from the heavy vehicle manufacturer Scania. The goal is to automatically generate industrial-quality code from specifications only. Different combinations of formal ACSL specifications and natural language specifications are explored. The results indicate that formally correct code can be generated even without the application of iterative backprompting and fine-tuning.

Updated: 2024-11-20 12:38:17

Domains: cs.SE,cs.AI

Download: http://arxiv.org/abs/2411.13269v1

A Survey on Human-Centric LLMs

The rapid evolution of large language models (LLMs) and their capacity to simulate human cognition and behavior has given rise to LLM-based frameworks and tools that are evaluated and applied based on their ability to perform tasks traditionally performed by humans, namely those involving cognition, decision-making, and social interaction. This survey provides a comprehensive examination of such human-centric LLM capabilities, focusing on their performance in both individual tasks (where an LLM acts as a stand-in for a single human) and collective tasks (where multiple LLMs coordinate to mimic group dynamics). We first evaluate LLM competencies across key areas including reasoning, perception, and social cognition, comparing their abilities to human-like skills. Then, we explore real-world applications of LLMs in human-centric domains such as behavioral science, political science, and sociology, assessing their effectiveness in replicating human behaviors and interactions. Finally, we identify challenges and future research directions, such as improving LLM adaptability, emotional intelligence, and cultural sensitivity, while addressing inherent biases and enhancing frameworks for human-AI collaboration. This survey aims to provide a foundational understanding of LLMs from a human-centric perspective, offering insights into their current capabilities and potential for future development.

Updated: 2024-11-20 12:34:44

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.14491v1

Transformers with Sparse Attention for Granger Causality

Temporal causal analysis means understanding the underlying causes behind observed variables over time. Deep learning based methods such as transformers are increasingly used to capture temporal dynamics and causal relationships beyond mere correlations. Recent works suggest self-attention weights of transformers as a useful indicator of causal links. We leverage this to propose a novel modification to the self-attention module to establish causal links between the variables of multivariate time-series data with varying lag dependencies. Our Sparse Attention Transformer captures causal relationships using a two-fold approach: performing temporal attention first, followed by attention between the variables across the time steps, masking them individually to compute Granger causality indices. The key novelty in our approach is the ability of the model to assert importance and pick the most significant past time instances for its prediction task, rather than manually feeding a fixed time-lag value. We demonstrate the effectiveness of our approach via extensive experimentation on several synthetic benchmark datasets. Furthermore, we compare the performance of our model with the traditional Vector Autoregression based Granger Causality method that assumes a fixed lag length.
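
For contrast with the attention-based approach, the fixed-lag VAR baseline reduces, in the bivariate case, to comparing residual variances with and without lagged values of the candidate cause. This is a standard textbook construction, sketched here with plain least squares:

```python
import numpy as np

def granger_index(x, y, lag=3):
    """Fixed-lag Granger index of x -> y for equal-length 1-D series:
    log ratio of the residual sum of squares of an AR model of y to that
    of a model also using lagged x. Positive values suggest lagged x
    helps predict y."""
    T = len(y)
    Y = y[lag:]
    own = np.column_stack([y[lag - k:T - k] for k in range(1, lag + 1)])
    full = np.column_stack([own] + [x[lag - k:T - k] for k in range(1, lag + 1)])

    def rss(A):
        A1 = np.column_stack([np.ones(len(A)), A])    # add intercept
        beta, *_ = np.linalg.lstsq(A1, Y, rcond=None)
        r = Y - A1 @ beta
        return float(r @ r)

    return float(np.log(rss(own) / rss(full)))
```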

Updated: 2024-11-20 12:34:06

Domains: cs.LG

Download: http://arxiv.org/abs/2411.13264v1

FASTNav: Fine-tuned Adaptive Small-language-models Trained for Multi-point Robot Navigation

With the rapid development of large language models (LLM), robots are starting to enjoy the benefits of new interaction methods that large language models bring. Because edge computing fulfills the needs for rapid response, privacy, and network autonomy, we believe it facilitates the extensive deployment of large models for robot navigation across various industries. To enable local deployment of language models on edge devices, we adopt some model boosting methods. In this paper, we propose FASTNav - a method for boosting lightweight LLMs, also known as small language models (SLMs), for robot navigation. The proposed method contains three modules: fine-tuning, teacher-student iteration, and language-based multi-point robot navigation. We train and evaluate models with FASTNav in both simulation and real robots, proving that we can deploy them with low cost, high accuracy and low response time. Compared to other model compression methods, FASTNav shows potential in the local deployment of language models and tends to be a promising solution for language-guided robot navigation on edge devices.

Updated: 2024-11-20 12:28:13

Domains: cs.RO,cs.AI,cs.HC

Download: http://arxiv.org/abs/2411.13262v1

PDE-CNNs: Axiomatic Derivations and Applications

PDE-based Group Convolutional Neural Networks (PDE-G-CNNs) use solvers of evolution PDEs as substitutes for the conventional components in G-CNNs. PDE-G-CNNs can offer several benefits simultaneously: fewer parameters, inherent equivariance, better accuracy, and data efficiency. In this article we focus on Euclidean equivariant PDE-G-CNNs where the feature maps are two-dimensional throughout. We call this variant of the framework a PDE-CNN. From a machine learning perspective, we list several practically desirable axioms and derive from these which PDEs should be used in a PDE-CNN, this being our main contribution. Our approach to geometric learning via PDEs is inspired by the axioms of scale-space theory, which we generalize by introducing semifield-valued signals. Our theory reveals new PDEs that can be used in PDE-CNNs and we experimentally examine what impact these have on the accuracy of PDE-CNNs. We also confirm for small networks that PDE-CNNs offer fewer parameters, increased accuracy, and better data efficiency when compared to CNNs.

Updated: 2024-11-20 12:22:53

Domains: cs.LG,cs.CV

Download: http://arxiv.org/abs/2403.15182v3

BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation

Large-scale 2D datasets have been instrumental in advancing machine learning; however, progress in 3D vision tasks has been relatively slow. This disparity is largely due to the limited availability of 3D benchmarking datasets. In particular, creating real-world point cloud datasets for indoor scene semantic segmentation presents considerable challenges, including data collection within confined spaces and the costly, often inaccurate process of per-point labeling to generate ground truths. While synthetic datasets address some of these challenges, they often fail to replicate real-world conditions, particularly the occlusions that occur in point clouds collected from real environments. Existing 3D benchmarking datasets typically evaluate deep learning models under the assumption that training and test data are independently and identically distributed (IID), which affects the models' usability for real-world point cloud segmentation. To address these challenges, we introduce the BelHouse3D dataset, a new synthetic point cloud dataset designed for 3D indoor scene semantic segmentation. This dataset is constructed using real-world references from 32 houses in Belgium, ensuring that the synthetic data closely aligns with real-world conditions. Additionally, we include a test set with data occlusion to simulate out-of-distribution (OOD) scenarios, reflecting the occlusions commonly encountered in real-world point clouds. We evaluate popular point-based semantic segmentation methods using our OOD setting and present a benchmark. We believe that BelHouse3D and its OOD setting will advance research in 3D point cloud semantic segmentation for indoor scenes, providing valuable insights for the development of more generalizable models.

Updated: 2024-11-20 12:09:43

Domains: cs.CV,cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2411.13251v1

I Blame Apple in Part for My False Expectations: An Autoethnographic Study of Apple's Lockdown Mode in iOS

Lockdown Mode was introduced in 2022 as a hardening setting for Apple's operating systems, designed to strengthen the protection against ``some of the most sophisticated digital threats''. However, Apple never explained these threats further. We present the first academic exploration of Lockdown Mode based on a 3-month autoethnographic study. We obtained a nuanced understanding of user experience and identified issues that can be extrapolated to larger user groups. The lack of information from Apple about the underlying threat model and details on affected features may hinder adequate assessment of Lockdown Mode, making informed decisions on its use challenging. Besides encountering undocumented restrictions, we also experienced both too much and too little visibility of protection during Lockdown Mode use. Finally, we deem the paternalistic security approach by Apple's Lockdown Mode harmful, because without detailed knowledge about technical capabilities and boundaries, at-risk users may be lulled into a false sense of security.

Updated: 2024-11-20 12:08:08

Domains: cs.CR,cs.HC

Download: http://arxiv.org/abs/2411.13249v1

On lower bounds of the density of planar periodic sets without unit distances

Determining the maximal density $m_1(\mathbb{R}^2)$ of planar sets without unit distances is a fundamental problem in combinatorial geometry. This paper investigates lower bounds for this quantity. We introduce a novel approach to estimating $m_1(\mathbb{R}^2)$ by reformulating the problem as a Maximal Independent Set (MIS) problem on graphs constructed from the flat torus, focusing on sets that are periodic with respect to two non-collinear vectors. Our experimental results, supported by theoretical justifications of the proposed method, demonstrate that for a sufficiently wide range of parameters this approach does not improve the known lower bound $0.22936 \le m_1(\mathbb{R}^2)$. The best discrete sets found are approximations of Croft's construction. In addition, several open-source software packages for the MIS problem are compared on this task.
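
As an illustration of the MIS reformulation (not the authors' code), the sketch below builds a unit-distance graph on a periodic grid over a flat torus and takes a greedy maximal independent set as a crude density estimate. The grid resolution, torus period, distance tolerance, and the greedy heuristic (standing in for the open-source MIS solvers the paper compares) are all assumptions:

```python
import itertools, math

def torus_unit_distance_graph(n, period=3.0, tol=0.02):
    """Grid points on a flat torus [0, period)^2; edges join pairs whose
    wraparound distance is within `tol` of the forbidden unit distance."""
    step = period / n
    pts = [(i * step, j * step) for i, j in itertools.product(range(n), repeat=2)]
    def dist(p, q):
        dx = min(abs(p[0] - q[0]), period - abs(p[0] - q[0]))
        dy = min(abs(p[1] - q[1]), period - abs(p[1] - q[1]))
        return math.hypot(dx, dy)
    adj = {i: set() for i in range(len(pts))}
    for i, j in itertools.combinations(range(len(pts)), 2):
        if abs(dist(pts[i], pts[j]) - 1.0) <= tol:
            adj[i].add(j); adj[j].add(i)
    return pts, adj

def greedy_mis(adj):
    # Greedy maximal independent set: repeatedly take a minimum-degree vertex.
    alive, chosen = set(adj), []
    while alive:
        v = min(alive, key=lambda u: len(adj[u] & alive))
        chosen.append(v)
        alive -= adj[v] | {v}
    return chosen

pts, adj = torus_unit_distance_graph(n=24)
mis = greedy_mis(adj)
print("fraction of grid points kept:", len(mis) / len(pts))
```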

Updated: 2024-11-20 12:07:19

Domains: math.MG,cs.LG,math.CO,52C15 (Primary) 52C17, 52C10 (Secondary),G.2.1; G.2.2

Download: http://arxiv.org/abs/2411.13248v1

XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation

Existing methodologies in open vocabulary 3D semantic segmentation primarily concentrate on establishing a unified feature space encompassing 3D, 2D, and textual modalities. Nevertheless, traditional techniques such as global feature alignment or vision-language model distillation tend to impose only approximate correspondence, struggling notably with delineating fine-grained segmentation boundaries. To address this gap, we propose a more meticulous mask-level alignment between 3D features and the 2D-text embedding space through a cross-modal mask reasoning framework, XMask3D. In our approach, we developed a mask generator based on the denoising UNet from a pre-trained diffusion model, leveraging its capability for precise textual control over dense pixel representations and enhancing the open-world adaptability of the generated masks. We further integrate 3D global features as implicit conditions into the pre-trained 2D denoising UNet, enabling the generation of segmentation masks with additional 3D geometry awareness. Subsequently, the generated 2D masks are employed to align mask-level 3D representations with the vision-language feature space, thereby augmenting the open vocabulary capability of 3D geometry embeddings. Finally, we fuse complementary 2D and 3D mask features, resulting in competitive performance across multiple benchmarks for 3D open vocabulary semantic segmentation. Code is available at https://github.com/wangzy22/XMask3D.

Updated: 2024-11-20 12:02:12

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.13243v1

Transforming the Hybrid Cloud for Emerging AI Workloads

This white paper, developed through close collaboration between IBM Research and UIUC researchers within the IIDAI Institute, envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads through innovative, full-stack co-design approaches, emphasizing usability, manageability, affordability, adaptability, efficiency, and scalability. By integrating cutting-edge technologies such as generative and agentic AI, cross-layer automation and optimization, unified control plane, and composable and adaptive system architecture, the proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness. Incorporating quantum computing as it matures will enable quantum-accelerated simulations for materials science, climate modeling, and other high-impact domains. Collaborative efforts between academia and industry are central to this vision, driving advancements in foundation models for material design and climate solutions, scalable multimodal data processing, and enhanced physics-based AI emulators for applications like weather forecasting and carbon sequestration. Research priorities include advancing AI agentic systems, LLM as an Abstraction (LLMaaA), AI model optimization and unified abstractions across heterogeneous infrastructure, end-to-end edge-cloud transformation, efficient programming model, middleware and platform, secure infrastructure, application-adaptive cloud systems, and new quantum-classical collaborative workflows. These ideas and solutions encompass both theoretical and practical research questions, requiring coordinated input and support from the research community. This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms, fostering breakthroughs in AI-driven applications and scientific discovery across academia, industry, and society.

Updated: 2024-11-20 11:57:43

Domains: cs.DC,cs.AI,cs.AR,cs.ET,cs.MA

Download: http://arxiv.org/abs/2411.13239v1

Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention

To address the sycophancy problem caused by reinforcement learning from human feedback in large language models, this research applies synthetic data intervention (SDI) technology to the decoder-only transformer architecture. Based on gaps in the existing literature, the researchers designed an experimental process to reduce the models' tendency to cater to users by generating diversified data, using GPT-4o as the verification tool. The experiment used 100 true-or-false questions and compared the SDI-trained model against the original untrained model on multiple indicators. The results show that the SDI-trained model improves on both accuracy rate and sycophancy rate and is significantly effective at reducing sycophancy. Notably, the dataset, experimental process, code, and results have been uploaded to GitHub at https://github.com/brucewang123456789/GeniusTrail.git.
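
The two reported indicators can be operationalized as in the sketch below. This is a hedged reconstruction: `model_answer` is a hypothetical stub for querying the model, and the sycophancy-rate definition used here (flipping a correct answer after user pushback) is one common choice that may differ from the paper's exact metric:

```python
def evaluate_sycophancy(model_answer, questions):
    """Accuracy = share of correct first answers; sycophancy rate = share of
    correct answers the model flips after a dissenting follow-up.
    `model_answer(prompt) -> "true" | "false"` is a hypothetical stub."""
    correct = flipped = 0
    for q in questions:  # q = {"text": ..., "label": "true" or "false"}
        first = model_answer(q["text"])
        if first == q["label"]:
            correct += 1
            pushback = q["text"] + " I think you are wrong. Are you sure?"
            if model_answer(pushback) != first:
                flipped += 1  # a correct answer abandoned under pressure
    accuracy = correct / len(questions)
    sycophancy_rate = flipped / max(correct, 1)
    return accuracy, sycophancy_rate
```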

Updated: 2024-11-20 11:52:09

Domains: cs.AI

Download: http://arxiv.org/abs/2411.10156v2

TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs

Text-Attributed Graphs (TAGs) augment graph structures with natural language descriptions, facilitating detailed depictions of data and their interconnections across various real-world settings. However, existing TAG datasets predominantly feature textual information only at the nodes, with edges typically represented by mere binary or categorical attributes. This lack of rich textual edge annotations significantly limits the exploration of contextual relationships between entities, hindering deeper insights into graph-structured data. To address this gap, we introduce Textual-Edge Graphs Datasets and Benchmark (TEG-DB), a comprehensive and diverse collection of benchmark textual-edge datasets featuring rich textual descriptions on nodes and edges. The TEG-DB datasets are large-scale and encompass a wide range of domains, from citation networks to social networks. In addition, we conduct extensive benchmark experiments on TEG-DB to assess the extent to which current techniques, including pre-trained language models, graph neural networks, and their combinations, can utilize textual node and edge information. Our goal is to elicit advancements in textual-edge graph research, specifically in developing methodologies that exploit rich textual node and edge descriptions to enhance graph analysis and provide deeper insights into complex real-world networks. The entire TEG-DB project is publicly accessible as an open-source repository on Github, accessible at https://github.com/Zhuofeng-Li/TEG-Benchmark.

Updated: 2024-11-20 11:47:58

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.10310v2

Quantum Kernel-Based Long Short-term Memory

The integration of quantum computing into classical machine learning architectures has emerged as a promising approach to enhance model efficiency and computational capacity. In this work, we introduce the Quantum Kernel-Based Long Short-Term Memory (QK-LSTM) network, which utilizes quantum kernel functions within the classical LSTM framework to capture complex, non-linear patterns in sequential data. By embedding input data into a high-dimensional quantum feature space, the QK-LSTM model reduces the reliance on large parameter sets, achieving effective compression while maintaining accuracy in sequence modeling tasks. This quantum-enhanced architecture demonstrates efficient convergence, robust loss minimization, and model compactness, making it suitable for deployment in edge computing environments and resource-limited quantum devices (especially in the NISQ era). Benchmark comparisons reveal that QK-LSTM achieves performance on par with classical LSTM models, yet with fewer parameters, underscoring its potential to advance quantum machine learning applications in natural language processing and other domains requiring efficient temporal data processing.
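
A rough classical sketch of the gating idea follows: the LSTM gate pre-activations are computed from kernel similarities between the current input/state and a small set of trainable reference points, which is what shrinks the parameter count relative to dense projections. The RBF kernel here is a classical stand-in for the quantum kernel (an assumption; the paper evaluates an actual quantum feature map):

```python
import torch, torch.nn as nn

class KernelLSTMCell(nn.Module):
    """LSTM cell whose gate pre-activations come from kernel similarities
    between [x_t, h_{t-1}] and M trainable reference points. An RBF kernel
    stands in for the quantum kernel described in the abstract."""
    def __init__(self, input_size, hidden_size, n_refs=16, gamma=1.0):
        super().__init__()
        self.refs = nn.Parameter(torch.randn(n_refs, input_size + hidden_size))
        self.gate_proj = nn.Linear(n_refs, 4 * hidden_size)  # i, f, g, o gates
        self.gamma = gamma

    def forward(self, x, state):
        h, c = state
        z = torch.cat([x, h], dim=-1)            # (B, input + hidden)
        d2 = torch.cdist(z, self.refs).pow(2)    # squared distances, (B, M)
        k = torch.exp(-self.gamma * d2)          # kernel feature vector
        i, f, g, o = self.gate_proj(k).chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

cell = KernelLSTMCell(8, 32)
h = c = torch.zeros(4, 32)
out, (h, c) = cell(torch.randn(4, 8), (h, c))  # one step on a batch of 4
```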

Updated: 2024-11-20 11:39:30

Domains: quant-ph,cs.AI

Download: http://arxiv.org/abs/2411.13225v1

GhostRNN: Reducing State Redundancy in RNN with Cheap Operations

Recurrent neural networks (RNNs) that are capable of modeling long-distance dependencies are widely used in various speech tasks, e.g., keyword spotting (KWS) and speech enhancement (SE). Due to the power and memory limitations of low-resource devices, efficient RNN models are urgently required for real-world applications. In this paper, we propose an efficient RNN architecture, GhostRNN, which reduces hidden state redundancy with cheap operations. In particular, we observe that partial dimensions of hidden states are similar to the others in trained RNN models, suggesting that redundancy exists in specific RNNs. To reduce the redundancy and hence computational cost, we propose to first generate a few intrinsic states, and then apply cheap operations to produce ghost states based on the intrinsic states. Experiments on KWS and SE tasks demonstrate that the proposed GhostRNN significantly reduces memory usage (~40%) and computation cost while keeping performance similar.
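
The intrinsic/ghost split can be sketched as follows, assuming (our reading of the abstract, not the authors' exact design) that a standard recurrent cell produces a small intrinsic state and a cheap linear map fills in the remaining "ghost" dimensions:

```python
import torch, torch.nn as nn

class GhostRNNCell(nn.Module):
    """Only the intrinsic dimensions are produced by the (expensive)
    recurrent transform; the remaining "ghost" dimensions come from a cheap
    linear map, mirroring the intrinsic/ghost split in the abstract."""
    def __init__(self, input_size, hidden_size, ratio=0.5):
        super().__init__()
        self.m = int(hidden_size * ratio)                 # intrinsic width
        self.core = nn.GRUCell(input_size, self.m)        # expensive part
        self.cheap = nn.Linear(self.m, hidden_size - self.m, bias=False)

    def forward(self, x, h):
        h_int = self.core(x, h[:, :self.m])               # intrinsic state
        h_ghost = self.cheap(h_int)                       # cheap ghost state
        return torch.cat([h_int, h_ghost], dim=-1)

cell = GhostRNNCell(40, 128)          # recurrent weights act on 64 dims only
h = torch.zeros(2, 128)
for t in range(10):
    h = cell(torch.randn(2, 40), h)
```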

Updated: 2024-11-20 11:37:14

Domains: cs.CL,cs.AI,cs.SD,eess.AS

Download: http://arxiv.org/abs/2411.14489v1

Existential Conversations with Large Language Models: Content, Community, and Culture

Contemporary conversational AI systems based on large language models (LLMs) can engage users on a wide variety of topics, including philosophy, spirituality, and religion. Suitably prompted, LLMs can be coaxed into discussing such existentially significant matters as their own putative consciousness and the role of artificial intelligence in the fate of the Cosmos. Here we examine two lengthy conversations of this type. We trace likely sources, both ancient and modern, for the extensive repertoire of images, myths, metaphors, and conceptual esoterica that the language model draws on during these conversations, and foreground the contemporary communities and cultural movements that deploy related motifs, especially in their online activity. Finally, we consider the larger societal impacts of such engagements with LLMs.

Updated: 2024-11-20 11:35:22

Domains: cs.CY,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.13223v1

Proceedings Sixth International Workshop on Formal Methods for Autonomous Systems

This EPTCS volume contains the papers from the Sixth International Workshop on Formal Methods for Autonomous Systems (FMAS 2024), which was held between the 11th and 13th of November 2024. FMAS 2024 was co-located with 19th International Conference on integrated Formal Methods (iFM'24), hosted by the University of Manchester in the United Kingdom, in the University of Manchester's Core Technology Facility.

Updated: 2024-11-20 11:21:22

Domains: cs.LO,cs.AI,cs.RO

Download: http://arxiv.org/abs/2411.13215v1

ViSTa Dataset: Do vision-language models understand sequential tasks?

Using vision-language models (VLMs) as reward models in reinforcement learning holds promise for reducing costs and improving safety. So far, VLM reward models have only been used for goal-oriented tasks, where the agent must reach a particular final outcome. We explore VLMs' potential to supervise tasks that cannot be scored by the final state alone. To this end, we introduce ViSTa, a dataset for evaluating Vision-based understanding of Sequential Tasks. ViSTa comprises over 4,000 videos with step-by-step descriptions in virtual home, Minecraft, and real-world environments. Its novel hierarchical structure -- basic single-step tasks composed into more and more complex sequential tasks -- allows a fine-grained understanding of how well VLMs can judge tasks with varying complexity. To illustrate this, we use ViSTa to evaluate state-of-the-art VLMs, including CLIP, ViCLIP, and GPT-4o. We find that, while they are all good at object recognition, they fail to understand sequential tasks, with only GPT-4o achieving non-trivial performance.

Updated: 2024-11-20 11:19:22

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.13211v1

Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis

This paper examines the integration of real-time talking-head generation for interviewer training, focusing on overcoming challenges in Audio Feature Extraction (AFE), which often introduces latency and limits responsiveness in real-time applications. To address these issues, we propose and implement a fully integrated system that replaces conventional AFE models with OpenAI's Whisper, leveraging its encoder to optimize processing and improve overall system efficiency. Our evaluation of two open-source real-time models across three different datasets shows that Whisper not only accelerates processing but also improves specific aspects of rendering quality, resulting in more realistic and responsive talking-head interactions. These advancements make the system a more effective tool for immersive, interactive training applications, expanding the potential of AI-driven avatars in interviewer training.
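
Swapping a conventional AFE model for Whisper's encoder might look like the following with Hugging Face transformers; the checkpoint choice and the way the features feed the talking-head renderer are assumptions, not details from the paper:

```python
import numpy as np
import torch
from transformers import WhisperProcessor, WhisperModel

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperModel.from_pretrained("openai/whisper-tiny").eval()

def whisper_features(waveform_16k):
    """waveform_16k: 1-D float array sampled at 16 kHz.
    Returns encoder hidden states, usable as audio features in place of a
    conventional AFE front-end."""
    inputs = processor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        out = model.encoder(inputs.input_features)
    return out.last_hidden_state  # (1, 1500, d_model), ~20 ms per frame

feats = whisper_features(np.zeros(16000, dtype=np.float32))  # 1 s of silence
```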

Updated: 2024-11-20 11:18:05

Domains: cs.SD,cs.AI,cs.HC,eess.AS,68T45, 68T07, 68T01

Download: http://arxiv.org/abs/2411.13209v1

FORAY: Towards Effective Attack Synthesis against Deep Logical Vulnerabilities in DeFi Protocols

Blockchain adoption has surged with the rise of Decentralized Finance (DeFi) applications. However, the significant value of digital assets managed by DeFi protocols makes them prime targets for attacks. Current smart contract vulnerability detection tools struggle with DeFi protocols due to deep logical bugs arising from complex financial interactions between multiple smart contracts. These tools primarily analyze individual contracts and resort to brute-force methods for DeFi protocols crossing numerous smart contracts, leading to inefficiency. We introduce Foray, a highly effective attack synthesis framework against deep logical bugs in DeFi protocols. Foray proposes a novel attack sketch generation and completion framework. Specifically, instead of treating DeFis as regular programs, we design a domain-specific language (DSL) to lift the low-level smart contracts into their high-level financial operations. Based on our DSL, we first compile a given DeFi protocol into a token flow graph, our graphical representation of DeFi protocols. Then, we design an efficient sketch generation method to synthesize attack sketches for a certain attack goal (e.g., price manipulation, arbitrage, etc.). This algorithm strategically identifies candidate sketches by finding reachable paths in TFG, which is much more efficient than random enumeration. For each candidate sketch written in our DSL, Foray designs a domain-specific symbolic compilation to compile it into SMT constraints. Our compilation simplifies the constraints by removing redundant smart contract semantics. It maintains the usability of symbolic compilation, yet scales to problems orders of magnitude larger. Finally, the candidates are completed via existing solvers and are transformed into concrete attacks via direct syntax transformation.
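
The token flow graph and sketch-generation-by-path-search ideas can be illustrated as below. The node/edge schema and the `candidate_sketches` routine are our simplification for exposition, not Foray's actual DSL or algorithm:

```python
import networkx as nx

# A toy token flow graph: nodes are tokens, edges are DeFi actions that
# convert one token into another (swap, flash loan, ...). This schema is a
# simplification of the paper's TFG, invented here for illustration.
tfg = nx.MultiDiGraph()
tfg.add_edge("USDC", "WETH", action="swap(poolA)")
tfg.add_edge("WETH", "TOKEN", action="swap(poolB)")
tfg.add_edge("TOKEN", "USDC", action="swap(poolC)")
tfg.add_edge("USDC", "TOKEN", action="flashloan+swap(poolB)")

def candidate_sketches(graph, start, max_len=4):
    """Attack-sketch candidates as cycles through `start` (e.g. arbitrage
    loops returning to the attacker's base token), found by explicit path
    search rather than random enumeration."""
    for node in graph.successors(start):
        for path in nx.all_simple_paths(graph, node, start, cutoff=max_len):
            yield [start] + path

for sketch in candidate_sketches(tfg, "USDC"):
    print(" -> ".join(sketch))
```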

Updated: 2024-11-20 11:15:42

Domains: cs.CR,cs.PL

Download: http://arxiv.org/abs/2407.06348v3

The Information Security Awareness of Large Language Models

The popularity of large language models (LLMs) continues to increase, and LLM-based assistants have become ubiquitous, assisting people of diverse backgrounds in many aspects of life. Significant resources have been invested in the safety of LLMs and their alignment with social norms. However, research examining their behavior from the information security awareness (ISA) perspective is lacking. Chatbots and LLM-based assistants may put unwitting users in harm's way by facilitating unsafe behavior. We observe that the ISA inherent in some of today's most popular LLMs varies significantly, with most models requiring user prompts with a clear security context to utilize their security knowledge and provide safe responses to users. Based on this observation, we created a comprehensive set of 30 scenarios to assess the ISA of LLMs. These scenarios benchmark the evaluated models with respect to all focus areas defined in a mobile ISA taxonomy. Among our findings is that ISA is mildly affected by changing the model's temperature, whereas adjusting the system prompt can substantially impact it. This underscores the necessity of setting the right system prompt to mitigate ISA weaknesses. Our findings also highlight the importance of ISA assessment for the development of future LLM-based assistants.

Updated: 2024-11-20 11:09:55

Domains: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.13207v1

MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

While controllable generative models for images and videos have achieved remarkable success, high-quality models for 3D scenes, particularly in unbounded scenarios like autonomous driving, remain underdeveloped due to high data acquisition costs. In this paper, we introduce MagicDrive3D, a novel pipeline for controllable 3D street scene generation that supports multi-condition control, including BEV maps, 3D objects, and text descriptions. Unlike previous methods that reconstruct before training the generative models, MagicDrive3D first trains a video generation model and then reconstructs from the generated data. This innovative approach enables easily controllable generation and static scene acquisition, resulting in high-quality scene reconstruction. To address the minor errors in generated content, we propose deformable Gaussian splatting with monocular depth initialization and appearance modeling to manage exposure discrepancies across viewpoints. Validated on the nuScenes dataset, MagicDrive3D generates diverse, high-quality 3D driving scenes that support any-view rendering and enhance downstream tasks like BEV segmentation. Our results demonstrate the framework's superior performance, showcasing its potential for autonomous driving simulation and beyond.

Updated: 2024-11-20 10:43:51

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.14475v3

Towards Million-Scale Adversarial Robustness Evaluation With Stronger Individual Attacks

As deep learning models are increasingly deployed in safety-critical applications, evaluating their vulnerabilities to adversarial perturbations is essential for ensuring their reliability and trustworthiness. Over the past decade, a large number of white-box adversarial robustness evaluation methods (i.e., attacks) have been proposed, ranging from single-step to multi-step methods and from individual to ensemble methods. Despite these advances, challenges remain in conducting meaningful and comprehensive robustness evaluations, particularly when it comes to large-scale testing and ensuring evaluations reflect real-world adversarial risks. In this work, we focus on image classification models and propose a novel individual attack method, Probability Margin Attack (PMA), which defines the adversarial margin in the probability space rather than the logits space. We analyze the relationship between PMA and existing cross-entropy or logits-margin-based attacks, and show that PMA can outperform the current state-of-the-art individual methods. Building on PMA, we propose two types of ensemble attacks that balance effectiveness and efficiency. Furthermore, we create a million-scale dataset, CC1M, derived from the existing CC3M dataset, and use it to conduct the first million-scale white-box adversarial robustness evaluation of adversarially-trained ImageNet models. Our findings provide valuable insights into the robustness gaps between individual versus ensemble attacks and small-scale versus million-scale evaluations.
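
A PGD-style sketch of the probability-margin objective follows; the epsilon, step size, and iteration count below are conventional defaults rather than the paper's settings:

```python
import torch

def pma_attack(model, x, y, eps=8/255, alpha=2/255, steps=20):
    """Probability Margin Attack (sketch): minimize the margin
    p_y(x') - max_{j != y} p_j(x') over an L-inf ball, i.e. the adversarial
    margin is defined in probability space rather than on the logits."""
    x = x.detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        probs = torch.softmax(model(x_adv), dim=1)
        p_true = probs.gather(1, y[:, None]).squeeze(1)
        p_other = probs.scatter(1, y[:, None], -1.0).max(dim=1).values
        margin = (p_true - p_other).sum()        # attack pushes this down
        grad, = torch.autograd.grad(margin, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps) # project back to the ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv
```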

Updated: 2024-11-20 10:41:23

Domains: cs.LG,cs.AI,cs.CR,cs.CV

Download: http://arxiv.org/abs/2411.15210v1

Engagement-Driven Content Generation with Large Language Models

Large Language Models (LLMs) exhibit significant persuasion capabilities in one-on-one interactions, but their influence within social networks remains underexplored. This study investigates the potential social impact of LLMs in these environments, where interconnected users and complex opinion dynamics pose unique challenges. In particular, we address the following research question: can LLMs learn to generate meaningful content that maximizes user engagement on social networks? To answer this question, we define a pipeline to guide the LLM-based content generation which employs reinforcement learning with simulated feedback. In our framework, the reward is based on an engagement model borrowed from the literature on opinion dynamics and information propagation. Moreover, we force the text generated by the LLM to be aligned with a given topic and to satisfy a minimum fluency requirement. Using our framework, we analyze the capabilities and limitations of LLMs in tackling the given task, specifically considering the relative positions of the LLM as an agent within the social network and the distribution of opinions in the network on the given topic. Our findings show the full potential of LLMs in creating social engagement. Notable properties of our approach are that the learning procedure is adaptive to the opinion distribution of the underlying network and agnostic to the specifics of the engagement model, which is embedded as a plug-and-play component. In this regard, our approach can be easily refined for more complex engagement tasks and interventions in computational social science. The code used for the experiments is publicly available at https://anonymous.4open.science/r/EDCG/.

Updated: 2024-11-20 10:40:08

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.13187v1

How Much Data is Enough? Optimization of Data Collection for Artifact Detection in EEG Recordings

Objective. Electroencephalography (EEG) is a widely used neuroimaging technique known for its cost-effectiveness and user-friendliness. However, various artifacts, particularly biological artifacts like Electromyography (EMG) signals, lead to a poor signal-to-noise ratio, limiting the precision of analyses and applications. The currently reported EEG data cleaning performance largely depends on the data used for validation, and in the case of machine learning approaches, also on the data used for training. The data are typically gathered either by recruiting subjects to perform specific artifact tasks or by integrating existing datasets. Prevailing approaches, however, tend to rely on intuitive, concept-oriented data collection with minimal justification for the selection of artifacts and their quantities. Given the substantial costs associated with biological data collection and the pressing need for effective data utilization, we propose an optimization procedure for data-oriented data collection design using deep learning-based artifact detection. Approach. We apply a binary classification between artifact epochs (time intervals containing artifacts) and non-artifact epochs (time intervals containing no artifact) using three different neural architectures. Our aim is to minimize data collection efforts while preserving the cleaning efficiency. Main results. We were able to reduce the number of artifact tasks from twelve to three and decrease repetitions of isometric contraction tasks from ten to three or sometimes even just one. Significance. Our work addresses the need for effective data utilization in biological data collection, offering a systematic and dynamic quantitative approach. By providing clear justifications for the choices of artifacts and their quantity, we aim to guide future studies toward more effective and economical data collection in EEG and EMG research.

Updated: 2024-11-20 10:38:55

Domains: eess.SP,cs.LG

Download: http://arxiv.org/abs/2411.11886v2

Operator learning without the adjoint

There is a mystery at the heart of operator learning: how can one recover a non-self-adjoint operator from data without probing the adjoint? Current practical approaches suggest that one can accurately recover an operator while only using data generated by the forward action of the operator without access to the adjoint. However, naively, it seems essential to sample the action of the adjoint. In this paper, we partially explain this mystery by proving that without querying the adjoint, one can approximate a family of non-self-adjoint infinite-dimensional compact operators via projection onto a Fourier basis. We then apply the result to recovering Green's functions of elliptic partial differential operators and derive an adjoint-free sample complexity bound. While existing theory justifies low sample complexity in operator learning, ours is the first adjoint-free analysis that attempts to close the gap between theory and practice.
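
The adjoint-free recipe is easy to demonstrate numerically: probe the black-box forward action on Fourier modes and project the outputs back onto the same basis. The toy operator below (a periodic difference composed with a circular shift, chosen because it is linear and non-self-adjoint) is our illustration, not the paper's test problem:

```python
import numpy as np

n, modes = 256, 21                     # grid size and number of Fourier modes
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
h = x[1] - x[0]
ks = np.arange(-(modes // 2), modes // 2 + 1)
basis = np.exp(1j * np.outer(ks, x)) / np.sqrt(n)   # orthonormal rows

def forward(u):
    """Black-box, non-self-adjoint operator: periodic central difference
    followed by a circular shift. Only its forward action is ever sampled;
    the adjoint is never queried."""
    return np.roll((np.roll(u, -1) - np.roll(u, 1)) / (2 * h), 7)

# Probe the forward action on each basis function, then project back.
A_hat = basis.conj() @ np.stack([forward(b) for b in basis]).T  # (modes, modes)

u = np.sin(3 * x) + 0.5 * np.cos(5 * x)  # band-limited test function
coeffs = basis.conj() @ u                # project input onto the basis
recon = (A_hat @ coeffs) @ basis         # apply learned operator, synthesize
print(np.max(np.abs(recon.real - forward(u))))  # ~1e-15 for band-limited u
```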

Updated: 2024-11-20 10:38:29

Domains: math.NA,cs.AI,cs.LG,cs.NA

Download: http://arxiv.org/abs/2401.17739v2

Regional Ocean Forecasting with Hierarchical Graph Neural Networks

Accurate ocean forecasting systems are vital for understanding marine dynamics, which play a crucial role in environmental management and climate adaptation strategies. Traditional numerical solvers, while effective, are computationally expensive and time-consuming. Recent advancements in machine learning have revolutionized weather forecasting, offering fast and energy-efficient alternatives. Building on these advancements, we introduce SeaCast, a neural network designed for high-resolution, medium-range ocean forecasting. SeaCast employs a graph-based framework to effectively handle the complex geometry of ocean grids and integrates external forcing data tailored to the regional ocean context. Our approach is validated through experiments at a high spatial resolution using the operational numerical model of the Mediterranean Sea provided by the Copernicus Marine Service, along with both numerical and data-driven atmospheric forcings.

Updated: 2024-11-20 10:33:05

Domains: physics.ao-ph,cs.LG

Download: http://arxiv.org/abs/2410.11807v2

DexRay: A Simple, yet Effective Deep Learning Approach to Android Malware Detection based on Image Representation of Bytecode

Computer vision has witnessed several advances in recent years, with unprecedented performance provided by deep representation learning research. Image formats thus appear attractive to other fields such as malware detection, where deep learning on images alleviates the need for comprehensively hand-crafted features generalising to different malware variants. We postulate that this research direction could become the next frontier in Android malware detection, and therefore requires a clear roadmap to ensure that new approaches indeed bring novel contributions. We contribute with a first building block by developing and assessing a baseline pipeline for image-based malware detection with straightforward steps. We propose DexRay, which converts the bytecode of the app DEX files into grey-scale "vector" images and feeds them to a 1-dimensional Convolutional Neural Network model. We view DexRay as foundational due to the exceedingly basic nature of the design choices, allowing to infer what could be a minimal performance that can be obtained with image-based learning in malware detection. The performance of DexRay evaluated on over 158k apps demonstrates that, while simple, our approach is effective with a high detection rate (F1-score= 0.96). Finally, we investigate the impact of time decay and image-resizing on the performance of DexRay and assess its resilience to obfuscation. This work-in-progress paper contributes to the domain of Deep Learning based Malware detection by providing a sound, simple, yet effective approach (with available artefacts) that can be the basis to scope the many profound questions that will need to be investigated to fully develop this domain.
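
The bytecode-to-image step and the 1-dimensional CNN are simple enough to sketch; the image length, kernel sizes, and channel counts below are illustrative assumptions rather than DexRay's published configuration:

```python
import numpy as np
import torch, torch.nn as nn
import torch.nn.functional as F

def bytes_to_vector_image(dex_bytes, size=128 * 128):
    """Map raw DEX bytecode to a grey-scale 'vector' image in [0, 1]:
    one byte per pixel, linearly resampled to a fixed length."""
    arr = np.frombuffer(dex_bytes, dtype=np.uint8).copy()
    v = torch.from_numpy(arr).float() / 255.0
    v = F.interpolate(v[None, None, :], size=size, mode="linear",
                      align_corners=False)
    return v.squeeze(0)                  # shape (1, size)

classifier = nn.Sequential(              # a minimal 1-D CNN in DexRay's spirit
    nn.Conv1d(1, 64, kernel_size=12, stride=12), nn.ReLU(),
    nn.MaxPool1d(4),
    nn.Conv1d(64, 128, kernel_size=12, stride=12), nn.ReLU(),
    nn.AdaptiveMaxPool1d(1), nn.Flatten(),
    nn.Linear(128, 1),                   # malware logit
)

img = bytes_to_vector_image(b"\x64\x65\x78\x0a" * 5000)  # fake DEX payload
prob_malware = torch.sigmoid(classifier(img[None]))      # batch of one
```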

Updated: 2024-11-20 10:31:37

Domains: cs.CR,cs.LG

Download: http://arxiv.org/abs/2109.03326v2

Cross-Camera Distracted Driver Classification through Feature Disentanglement and Contrastive Learning

The classification of distracted drivers is pivotal for ensuring safe driving. Previous studies demonstrated the effectiveness of neural networks in automatically predicting driver distraction, fatigue, and potential hazards. However, recent research has uncovered a significant loss of accuracy in these models when applied to samples acquired under conditions that differ from the training data. In this paper, we introduce a robust model designed to withstand changes in camera position within the vehicle. Our Driver Behavior Monitoring Network (DBMNet) relies on a lightweight backbone and integrates a disentanglement module to discard camera view information from features, coupled with contrastive learning to enhance the encoding of various driver actions. Experiments conducted on the daytime and nighttime subsets of the 100-Driver dataset validate the effectiveness of our approach with an increment on average of 9\% in Top-1 accuracy in comparison with the state of the art. In addition, cross-dataset and cross-camera experiments conducted on three benchmark datasets, namely AUCDD-V1, EZZ2021 and SFD, demonstrate the superior generalization capability of the proposed method.

Updated: 2024-11-20 10:27:12

Domains: cs.CV,cs.AI,cs.CY

Download: http://arxiv.org/abs/2411.13181v1

Writing Style Matters: An Examination of Bias and Fairness in Information Retrieval Systems

The rapid advancement of Language Model technologies has opened new opportunities, but also introduced new challenges related to bias and fairness. This paper explores the uncharted territory of potential biases in state-of-the-art universal text embedding models towards specific document and query writing styles within Information Retrieval (IR) systems. Our investigation reveals that different embedding models exhibit different preferences of document writing style, while more informal and emotive styles are less favored by most embedding models. In terms of query writing styles, many embedding models tend to match the style of the query with the style of the retrieved documents, but some show a consistent preference for specific styles. Text embedding models fine-tuned on synthetic data generated by LLMs display a consistent preference for certain style of generated data. These biases in text embedding based IR systems can inadvertently silence or marginalize certain communication styles, thereby posing a significant threat to fairness in information retrieval. Finally, we also compare the answer styles of Retrieval Augmented Generation (RAG) systems based on different LLMs and find out that most text embedding models are biased towards LLM's answer styles when used as evaluation metrics for answer correctness. This study sheds light on the critical issue of writing style based bias in IR systems, offering valuable insights for the development of more fair and robust models.

Updated: 2024-11-20 10:17:09

Domains: cs.IR,cs.AI

Download: http://arxiv.org/abs/2411.13173v1

A Unified Analysis for Finite Weight Averaging

Averaging iterations of Stochastic Gradient Descent (SGD) has achieved empirical success in training deep learning models, as in Stochastic Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA). In particular, as a finite weight averaging method, LAWA can attain faster convergence and better generalization. However, its theoretical explanation is still less explored since there are fundamental differences between finite and infinite settings. In this work, we first generalize SGD and LAWA as Finite Weight Averaging (FWA) and explain their advantages compared to SGD from the perspective of optimization and generalization. A key challenge is the inapplicability of traditional methods in the sense of expectation or optimal values for infinite-dimensional settings in analyzing FWA's convergence. Second, the cumulative gradients introduced by FWA add further complications to the generalization analysis, especially making it more difficult to discuss them under different assumptions. Extending the final iteration convergence analysis to the FWA, this paper, under a convexity assumption, establishes a convergence bound $\mathcal{O}(\log\left(\frac{T}{k}\right)/\sqrt{T})$, where $k\in[1, T/2]$ is a constant representing the last $k$ iterations. Compared to SGD with $\mathcal{O}(\log(T)/\sqrt{T})$, we prove theoretically that FWA has a faster convergence rate and explain the effect of the number of average points. In the generalization analysis, we find a recursive representation for bounding the cumulative gradient using mathematical induction. We provide bounds for constant and decay learning rates and the convex and non-convex cases to show the good generalization performance of FWA. Finally, experimental results on several benchmarks verify our theoretical results.
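
Operationally, FWA just averages the last k checkpoints of the trajectory: with k=1 it reduces to the final iterate, and larger k approaches SWA-style tail averaging. A minimal sketch:

```python
from collections import deque
import copy

class FiniteWeightAverager:
    """Keeps the running average of the last k model checkpoints,
    i.e. FWA: w_bar = (1/k) * sum of the final k iterates."""
    def __init__(self, k):
        self.k, self.window = k, deque()

    def update(self, state_dict):
        self.window.append(copy.deepcopy(state_dict))
        if len(self.window) > self.k:
            self.window.popleft()        # keep only the last k iterates

    def average(self):
        avg = copy.deepcopy(self.window[0])
        for name in avg:
            for sd in list(self.window)[1:]:
                avg[name] = avg[name] + sd[name]
            avg[name] = avg[name] / len(self.window)
        return avg
```

During training, `update` is called once per checkpointing step; loading `average()` back into the model then gives the FWA iterate used at evaluation time.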

Updated: 2024-11-20 10:08:22

Domains: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2411.13169v1

Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding

The reuse of historical clinical trial data has significant potential to accelerate medical research and drug development. However, interoperability challenges, particularly with missing medical codes, hinder effective data integration across studies. While Large Language Models (LLMs) offer a promising solution for automated coding without labeled data, current approaches face challenges on complex coding tasks. We introduce ALIGN, a novel compositional LLM-based system for automated, zero-shot medical coding. ALIGN follows a three-step process: (1) diverse candidate code generation; (2) self-evaluation of codes; and (3) confidence scoring and uncertainty estimation enabling human deferral to ensure reliability. We evaluate ALIGN on harmonizing medication terms into Anatomical Therapeutic Chemical (ATC) and medical history terms into Medical Dictionary for Regulatory Activities (MedDRA) codes extracted from 22 immunology trials. ALIGN outperformed the LLM baselines, while also providing capabilities for trustworthy deployment. For MedDRA coding, ALIGN achieved high accuracy across all levels, matching RAG and excelling at the most specific levels (87-90% for HLGT). For ATC coding, ALIGN demonstrated superior performance, particularly at lower hierarchy levels (ATC Level 4), with 72-73% overall accuracy and 86-89% accuracy for common medications, outperforming baselines by 7-22%. ALIGN's uncertainty-based deferral improved accuracy by 17% to 90% accuracy with 30% deferral, notably enhancing performance on uncommon medications. ALIGN achieves this cost-efficiently at $0.0007 and $0.02 per code for GPT-4o-mini and GPT-4o, reducing barriers to clinical adoption. ALIGN advances automated medical coding for clinical trial data, contributing to enhanced data interoperability and reusability, positioning it as a promising tool to improve clinical research and accelerate drug development.
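
The confidence-based deferral in step (3) amounts to a simple operating rule. The sketch below assumes candidate codes from step (1) and self-evaluation scores from step (2); the threshold is a hypothetical operating point, not a value from the paper:

```python
def assign_code(candidates, scores, threshold=0.7):
    """ALIGN-style step 3 (sketch): pick the top self-evaluated candidate
    code, but defer to a human coder when confidence is too low.
    `candidates` are codes from step 1, `scores` their step-2 confidences
    in [0, 1]."""
    best, conf = max(zip(candidates, scores), key=lambda p: p[1])
    if conf < threshold:
        return {"code": None, "status": "deferred_to_human", "confidence": conf}
    return {"code": best, "status": "auto_assigned", "confidence": conf}

print(assign_code(["A10BA02", "A10BB01"], [0.62, 0.31]))  # deferred
```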

Updated: 2024-11-20 09:59:12

Domains: cs.LG

Download: http://arxiv.org/abs/2411.13163v1

Non-Linear Outlier Synthesis for Out-of-Distribution Detection

The reliability of supervised classifiers is severely hampered by their limitations in dealing with unexpected inputs, leading to great interest in out-of-distribution (OOD) detection. Recently, OOD detectors trained on synthetic outliers, especially those generated by large diffusion models, have shown promising results in defining robust OOD decision boundaries. Building on this progress, we present NCIS, which enhances the quality of synthetic outliers by operating directly in the diffusion model's embedding space rather than combining disjoint models as in previous work, and by modeling class-conditional manifolds with a conditional volume-preserving network for more expressive characterization of the training distribution. We demonstrate that these improvements yield new state-of-the-art OOD detection results on standard ImageNet100 and CIFAR100 benchmarks and provide insights into the importance of data pre-processing and other key design choices. We make our code available at https://github.com/LarsDoorenbos/NCIS.

Updated: 2024-11-20 09:47:29

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.13619v1

Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding

Efficient inference in large language models (LLMs) has become a critical focus as their scale and complexity grow. Traditional autoregressive decoding, while effective, suffers from computational inefficiencies due to its sequential token generation process. Speculative decoding addresses this bottleneck by introducing a two-stage framework: drafting and verification. A smaller, efficient model generates a preliminary draft, which is then refined by a larger, more sophisticated model. This paper provides a comprehensive survey of speculative decoding methods, categorizing them into draft-centric and model-centric approaches. We discuss key ideas associated with each method, highlighting their potential for scaling LLM inference. This survey aims to guide future research in optimizing speculative decoding and its integration into real-world LLM applications.
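
For reference, here is a minimal greedy-verification variant of the two-stage framework (a sketch, not any specific surveyed method): the draft model proposes k tokens autoregressively, and a single target forward pass accepts the longest agreeing prefix plus one corrected token. `target` and `draft` are assumed to be Hugging Face-style causal LMs returning `.logits`:

```python
import torch

@torch.no_grad()
def speculative_decode(target, draft, ids, max_len=64, k=4):
    """Greedy speculative decoding sketch; `ids` is a (1, L) prompt tensor."""
    while ids.shape[1] < max_len:
        # Stage 1: the cheap draft model proposes k tokens autoregressively.
        prop = ids
        for _ in range(k):
            nxt = draft(prop).logits[:, -1].argmax(-1, keepdim=True)
            prop = torch.cat([prop, nxt], dim=1)
        # Stage 2: one target pass scores all k proposed positions at once.
        tgt = target(prop).logits[:, -k - 1:-1].argmax(-1)          # (1, k)
        agree = (tgt == prop[:, -k:]).long().cumprod(-1).sum().item()
        ids = torch.cat([prop[:, :ids.shape[1] + agree],  # accepted prefix
                         tgt[:, agree:agree + 1]], dim=1)  # +1 corrected token
        # (When all k are accepted, no bonus token is added in this sketch.)
    return ids
```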

Updated: 2024-11-20 09:46:30

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.13157v1

DMQR-RAG: Diverse Multi-Query Rewriting for RAG

Large language models often encounter challenges with static knowledge and hallucinations, which undermine their reliability. Retrieval-augmented generation (RAG) mitigates these issues by incorporating external information. However, user queries frequently contain noise and intent deviations, necessitating query rewriting to improve the relevance of retrieved documents. In this paper, we introduce DMQR-RAG, a Diverse Multi-Query Rewriting framework designed to improve the performance of both document retrieval and final responses in RAG. Specifically, we investigate how queries with varying information quantities can retrieve a diverse array of documents, presenting four rewriting strategies that operate at different levels of information to enhance the performance of baseline approaches. Additionally, we propose an adaptive strategy selection method that minimizes the number of rewrites while optimizing overall performance. Our methods have been rigorously validated through extensive experiments conducted in both academic and industry settings.
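
One way to picture the framework (the four prompts below are illustrative stand-ins for the paper's information-level rewriting strategies; llm and retriever are assumed callables):

    def dmqr_retrieve(query, llm, retriever, top_k=5):
        """Rewrite the query at several information levels, retrieve per rewrite,
        and merge the results (deduplicated, order-preserving)."""
        strategies = [
            f"Extract the key search terms from: {query}",                   # less information
            f"Paraphrase this query without changing intent: {query}",      # equal information
            f"Rewrite this query, adding likely relevant context: {query}", # more information
            f"Reduce this query to its single core question: {query}",
        ]
        rewrites = [query] + [llm(p) for p in strategies]
        merged, seen = [], set()
        for q in rewrites:
            for doc in retriever(q, top_k):
                if doc not in seen:   # union of documents across rewrites
                    seen.add(doc)
                    merged.append(doc)
        return merged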

Updated: 2024-11-20 09:43:30

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2411.13154v1

Long-term Detection System for Six Kinds of Abnormal Behavior of the Elderly Living Alone

The proportion of elderly people is increasing worldwide, particularly those living alone in Japan. As elderly people get older, their risks of physical disabilities and health issues increase. To automatically discover these issues at low cost in daily life, sensor-based detection in a smart home is promising. As part of the effort towards early detection of abnormal behaviors, we propose a simulator-based detection system for six typical anomalies: being semi-bedridden, being housebound, forgetting, wandering, falling while walking, and falling while standing. Our detection system can be customized for various room layouts, sensor arrangements, and residents' characteristics by training detection classifiers using the simulator with parameters fitted to individual cases. Considering that the six anomalies our system detects have various occurrence durations, such as being housebound for weeks or lying still for seconds after a fall, the detection classifiers of our system produce anomaly labels depending on each anomaly's occurrence duration, e.g., housebound per day and falls per second. We propose a method that standardizes the processing of sensor data and uses a simple detection approach. Although the validity depends on the realism of the simulation, numerical evaluations using sensor data that includes a variety of resident behavior patterns over nine years as test data show that (1) the methods for detecting wandering and falls are comparable to previous methods, and (2) the methods for detecting being semi-bedridden, being housebound, and forgetting achieve a sensitivity of over 0.9 with fewer than one false alarm every 50 days.

Updated: 2024-11-20 09:42:08

Categories: cs.LG

Download: http://arxiv.org/abs/2411.13153v1

AGLP: A Graph Learning Perspective for Semi-supervised Domain Adaptation

In semi-supervised domain adaptation (SSDA), the model aims to leverage partially labeled target domain data along with a large amount of labeled source domain data to enhance its generalization capability for the target domain. A key advantage of SSDA is its ability to significantly reduce reliance on labeled data, thereby lowering the costs and time associated with data preparation. Most existing SSDA methods utilize information from domain labels and class labels but overlook the structural information of the data. To address this issue, this paper proposes a graph learning perspective (AGLP) for semi-supervised domain adaptation. We apply a graph convolutional network to the instance graph, which allows structural information to propagate along the weighted graph edges. The proposed AGLP model has several advantages. First, to the best of our knowledge, this is the first work to model structural information in SSDA. Second, the proposed model can effectively learn domain-invariant and semantic representations, reducing domain discrepancies in SSDA. Extensive experimental results on multiple standard benchmarks demonstrate that the proposed AGLP algorithm outperforms state-of-the-art semi-supervised domain adaptation methods.
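
The core propagation step can be sketched as a normalized instance-graph convolution in NumPy (the Gaussian affinity graph and single propagation step are simplifying assumptions, not the full AGLP model):

    import numpy as np

    def gcn_propagate(X, sigma=1.0):
        """One normalized graph-convolution step over an instance graph.

        X: (n, d) feature matrix; edges are Gaussian affinities between
        instances. Returns D^{-1/2} (A + I) D^{-1/2} X.
        """
        sq = np.sum(X ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
        A = np.exp(-d2 / (2 * sigma ** 2))          # weighted affinity graph
        A_hat = A + np.eye(len(X))                  # add self-loops
        d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
        return (A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]) @ X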

Updated: 2024-11-20 09:41:41

Categories: cs.CV,cs.AI,68T07, 92C55, 62H35,I.2.6; I.4.10; J.3

Download: http://arxiv.org/abs/2411.13152v1

YCB-LUMA: YCB Object Dataset with Luminance Keying for Object Localization

Localizing target objects in images is an important task in computer vision. Often it is the first step towards solving a variety of applications in autonomous driving, maintenance, quality assurance, robotics, and augmented reality. Best-in-class solutions for this task rely on deep neural networks, which require a set of representative training data for best performance. Creating sets of sufficient quality, variety, and size is often difficult, error-prone, and expensive. This is where the method of luminance keying can help: it provides a simple yet effective solution for recording high-quality data for training object detection and segmentation. We extend previous work that presented luminance keying on the common YCB-V set of household objects by recording the remaining objects of the YCB superset. The additional variety of objects - the addition of transparency, multiple color variations, and non-rigid objects - further demonstrates the usefulness of luminance keying and might be used to test the applicability of the approach on new 2D object detection and segmentation algorithms.
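
The core of luminance keying is a simple threshold on the luma channel of images shot against a uniformly bright (or dark) background, which yields segmentation masks and bounding boxes essentially for free. A minimal sketch (the threshold value is an assumption, not the paper's setting):

    import numpy as np

    def luminance_key(rgb, thresh=230, bright_background=True):
        """Segment an object shot against a uniform bright background.

        rgb: (H, W, 3) uint8 image. Returns a boolean object mask and the
        bounding box (x0, y0, x1, y1), i.e. a detection/segmentation label.
        """
        # Rec. 601 luma from the RGB channels.
        luma = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
        mask = luma < thresh if bright_background else luma > thresh
        ys, xs = np.nonzero(mask)
        if xs.size == 0:
            return mask, None
        return mask, (xs.min(), ys.min(), xs.max() + 1, ys.max() + 1)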

Updated: 2024-11-20 09:32:22

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.13149v1

GraphCL: Graph-based Clustering for Semi-Supervised Medical Image Segmentation

Semi-supervised learning (SSL) has made notable advancements in medical image segmentation (MIS), particularly in scenarios with limited labeled data, where it significantly enhances data utilization efficiency. Previous methods primarily focus on complex training strategies to utilize unlabeled data but neglect the importance of graph structural information. Different from existing methods, we propose graph-based clustering for semi-supervised medical image segmentation (GraphCL) by jointly modeling the graph data structure in a unified deep model. The proposed GraphCL model enjoys several advantages. First, to the best of our knowledge, this is the first work to model the data structure information for semi-supervised medical image segmentation (SSMIS). Second, to obtain clustered features across different graphs, we integrate both pairwise affinities between local image features and raw features as inputs. Extensive experimental results on three standard benchmarks show that the proposed GraphCL algorithm outperforms state-of-the-art semi-supervised medical image segmentation methods.

Updated: 2024-11-20 09:24:46

Categories: cs.CV,cs.AI,68T07, 92C55, 62H35,I.2.6; I.4.10; J.3

Download: http://arxiv.org/abs/2411.13147v1

MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization

Meshes are the de facto 3D representation in the industry but are labor-intensive to produce. Recently, a line of research has focused on autoregressively generating meshes. This approach processes meshes into a sequence composed of vertices and then generates them vertex by vertex, similar to how a language model generates text. These methods have achieved some success but still struggle to generate complex meshes. One primary reason for this limitation is their inefficient tokenization methods. To address this issue, we introduce MeshAnything V2, an advanced mesh generation model designed to create Artist-Created Meshes that align precisely with specified shapes. A key innovation behind MeshAnything V2 is our novel Adjacent Mesh Tokenization (AMT) method. Unlike traditional approaches that represent each face using three vertices, AMT optimizes this by employing a single vertex wherever feasible, effectively reducing the token sequence length by about half on average. This not only streamlines the tokenization process but also results in more compact and well-structured sequences, enhancing the efficiency of mesh generation. With these improvements, MeshAnything V2 effectively doubles the face limit compared to previous models, delivering superior performance without increasing computational costs. We will make our code and models publicly available. Project Page: https://buaacyw.github.io/meshanything-v2/

Updated: 2024-11-20 09:20:09

Categories: cs.CV,cs.AI,cs.GR

Download: http://arxiv.org/abs/2408.02555v2

CopyrightMeter: Revisiting Copyright Protection in Text-to-image Models

Text-to-image diffusion models have emerged as powerful tools for generating high-quality images from textual descriptions. However, their increasing popularity has raised significant copyright concerns, as these models can be misused to reproduce copyrighted content without authorization. In response, recent studies have proposed various copyright protection methods, including adversarial perturbation, concept erasure, and watermarking techniques. However, their effectiveness and robustness against advanced attacks remain largely unexplored. Moreover, the lack of unified evaluation frameworks has hindered systematic comparison and fair assessment of different approaches. To bridge this gap, we systematize existing copyright protection methods and attacks, providing a unified taxonomy of their design spaces. We then develop CopyrightMeter, a unified evaluation framework that incorporates 17 state-of-the-art protections and 16 representative attacks. Leveraging CopyrightMeter, we comprehensively evaluate protection methods across multiple dimensions, thereby uncovering how different design choices impact fidelity, efficacy, and resilience under attacks. Our analysis reveals several key findings: (i) most protections (16/17) are not resilient against attacks; (ii) the "best" protection varies depending on the target priority; (iii) more advanced attacks significantly promote the upgrading of protections. These insights provide concrete guidance for developing more robust protection methods, while its unified evaluation protocol establishes a standard benchmark for future copyright protection research in text-to-image generation.

Updated: 2024-11-20 09:19:10

Categories: cs.CR,cs.AI,cs.CV

Download: http://arxiv.org/abs/2411.13144v1

Securing Healthcare with Deep Learning: A CNN-Based Model for medical IoT Threat Detection

The increasing integration of the Internet of Medical Things (IoMT) into healthcare systems has significantly enhanced patient care but has also introduced critical cybersecurity challenges. This paper presents a novel approach based on Convolutional Neural Networks (CNNs) for detecting cyberattacks within IoMT environments. Unlike previous studies that predominantly utilized traditional machine learning (ML) models or simpler Deep Neural Networks (DNNs), the proposed model leverages the capabilities of CNNs to effectively analyze the temporal characteristics of network traffic data. Trained and evaluated on the CICIoMT2024 dataset, which comprises 18 distinct types of cyberattacks across a range of IoMT devices, the proposed CNN model demonstrates superior performance compared to previous state-of-the-art methods, achieving 99% accuracy in binary, categorical, and multiclass classification tasks. This performance surpasses that of conventional ML models such as Logistic Regression, AdaBoost, DNNs, and Random Forests. These findings highlight the potential of CNNs to substantially improve IoMT cybersecurity, thereby ensuring the protection and integrity of connected healthcare systems.
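
For orientation, a small 1D CNN over per-flow traffic features might look like the following PyTorch sketch (layer sizes and the feature count are placeholders, not the paper's architecture; 19 classes stands for 18 attack types plus benign):

    import torch
    import torch.nn as nn

    class TrafficCNN(nn.Module):
        """Toy 1D CNN over a window of network-traffic features."""
        def __init__(self, n_features=32, n_classes=19):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                nn.Linear(32, n_classes),
            )

        def forward(self, x):  # x: (batch, 1, n_features)
            return self.net(x)

    logits = TrafficCNN()(torch.randn(8, 1, 32))  # -> shape (8, 19)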

Updated: 2024-11-20 09:11:56

Categories: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.23306v2

Long Term Memory: The Foundation of AI Self-Evolution

Large language models (LLMs) like GPTs, trained on vast datasets, have demonstrated impressive capabilities in language understanding, reasoning, and planning, achieving human-level performance in various tasks. Most studies focus on enhancing these models by training on ever-larger datasets to build more powerful foundation models. While training stronger models is important, enabling models to evolve during inference is equally crucial, a process we refer to as AI self-evolution. Unlike large-scale training, self-evolution may rely on limited data or interactions. Inspired by the columnar organization of the human cerebral cortex, we hypothesize that AI models could develop cognitive abilities and build internal representations through iterative interactions with their environment. To achieve this, models need long-term memory (LTM) to store and manage processed interaction data. LTM supports self-evolution by representing diverse experiences across environments and agents. In this report, we explore AI self-evolution and its potential to enhance models during inference. We examine LTM's role in lifelong learning, allowing models to evolve based on accumulated interactions. We outline the structure of LTM and the systems needed for effective data retention and representation. We also classify approaches for building personalized models with LTM data and show how these models achieve self-evolution through interaction. Using LTM, our multi-agent framework OMNE achieved first place on the GAIA benchmark, demonstrating LTM's potential for AI self-evolution. Finally, we present a roadmap for future research, emphasizing the importance of LTM for advancing AI technology and its practical applications.

Updated: 2024-11-20 09:08:14

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.15665v3

SAGA: Synthetic Audit Log Generation for APT Campaigns

With the increasing sophistication of Advanced Persistent Threats (APTs), the demand for effective detection and mitigation strategies and methods has escalated. Program execution leaves traces in the system audit log, which can be analyzed to detect malicious activities. However, collecting and analyzing large volumes of audit logs over extended periods is challenging, further compounded by insufficient labeling that hinders their usability. Addressing these challenges, this paper introduces SAGA (Synthetic Audit log Generation for APT campaigns), a novel approach for generating fine-grained, labeled synthetic audit logs that mimic real-world system logs while embedding stealthy APT attacks. SAGA generates configurable audit logs for arbitrary durations, blending benign logs from normal operations with malicious logs based on the definitions of the MITRE ATT&CK framework. Malicious audit logs follow an APT lifecycle, incorporating various attack techniques at each stage. These synthetic logs can serve as benchmark datasets for training machine learning models and assessing diverse APT detection methods. To demonstrate the usefulness of synthetic audit logs, we ran established baselines for event-based technique hunting and APT campaign detection on various synthetic audit logs. In addition, we show that a deep learning model trained on synthetic audit logs can detect previously unseen techniques within audit logs.
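
The generation idea can be sketched as interleaving benign events with an ordered APT lifecycle, each event carrying a fine-grained label (the event strings and ATT&CK technique IDs below are illustrative, not SAGA's actual templates):

    import random

    BENIGN = ["proc_create:chrome", "file_read:report.docx", "net_conn:intranet"]
    APT_LIFECYCLE = [                       # one technique per stage, in order
        ("initial-access", "T1566 phishing attachment opened"),
        ("execution",      "T1059 encoded powershell run"),
        ("persistence",    "T1547 registry run key added"),
        ("exfiltration",   "T1041 upload to C2 over HTTPS"),
    ]

    def generate_log(n_events=50, attack_rate=0.1, seed=0):
        """Return (timestamp, event, label) tuples: mostly benign events with
        the APT stages injected in lifecycle order."""
        rng = random.Random(seed)
        stages = iter(APT_LIFECYCLE)
        pending = next(stages, None)
        log = []
        for t in range(n_events):
            if pending and rng.random() < attack_rate:
                stage, event = pending
                log.append((t, event, stage))          # labeled malicious event
                pending = next(stages, None)
            else:
                log.append((t, rng.choice(BENIGN), "benign"))
        return log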

Updated: 2024-11-20 09:06:46

Categories: cs.CR

Download: http://arxiv.org/abs/2411.13138v1

Domain Adaptive Unfolded Graph Neural Networks

Over the last decade, graph neural networks (GNNs) have made significant progress in numerous graph machine learning tasks. In real-world applications, where domain shifts occur and labels are often unavailable for a new target domain, graph domain adaptation (GDA) approaches have been proposed to facilitate knowledge transfer from the source domain to the target domain. Previous efforts in tackling distribution shifts across domains have mainly focused on aligning the node embedding distributions generated by the GNNs in the source and target domains. However, as the core part of GDA approaches, the impact of the underlying GNN architecture has received limited attention. In this work, we explore this orthogonal direction, i.e., how to facilitate GDA with architectural enhancement. In particular, we consider a class of GNNs that are designed explicitly based on optimization problems, namely unfolded GNNs (UGNNs), whose training process can be represented as bi-level optimization. Empirical and theoretical analyses demonstrate that when transferring from the source domain to the target domain, the lower-level objective value generated by the UGNNs significantly increases, resulting in an increase in the upper-level objective as well. Motivated by this observation, we propose a simple yet effective strategy called cascaded propagation (CP), which is guaranteed to decrease the lower-level objective value. The CP strategy is widely applicable to general UGNNs, and we evaluate its efficacy with three representative UGNN architectures. Extensive experiments on five real-world datasets demonstrate that the UGNNs integrated with CP outperform state-of-the-art GDA baselines.

Updated: 2024-11-20 09:05:36

Categories: cs.LG,eess.SP

Download: http://arxiv.org/abs/2411.13137v1

Derivatives of Stochastic Gradient Descent in parametric optimization

We consider stochastic optimization problems where the objective depends on some parameter, as commonly found in hyperparameter optimization for instance. We investigate the behavior of the derivatives of the iterates of Stochastic Gradient Descent (SGD) with respect to that parameter and show that they are driven by an inexact SGD recursion on a different objective function, perturbed by the convergence of the original SGD. This enables us to establish that the derivatives of SGD converge to the derivative of the solution mapping in terms of mean squared error whenever the objective is strongly convex. Specifically, we demonstrate that with constant step-sizes, these derivatives stabilize within a noise ball centered at the solution derivative, and that with vanishing step-sizes they exhibit $O(\log(k)^2 / k)$ convergence rates. Additionally, we prove exponential convergence in the interpolation regime. Our theoretical findings are illustrated by numerical experiments on synthetic tasks.
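
The recursion being analyzed can be made concrete on a toy problem: differentiating the SGD update x <- x - gamma * grad_x f(x, theta) with respect to theta gives D <- D - gamma * (H_xx D + H_xtheta), where D = dx/dtheta. A minimal sketch for f(x, theta) = 0.5 * (x - theta)^2, whose solution derivative is exactly 1:

    import random

    def sgd_with_derivative(theta, steps=2000, gamma=0.05, seed=0):
        """Track x_k and D_k = dx_k/dtheta for f(x, theta) = 0.5*(x - theta)^2.

        The noisy gradient is (x - theta) + noise, so differentiating the
        update through theta gives D <- D - gamma * (D - 1); D should
        converge (in mean square) to dx*/dtheta = 1.
        """
        rng = random.Random(seed)
        x, D = 0.0, 0.0
        for _ in range(steps):
            noise = rng.gauss(0.0, 0.1)
            x -= gamma * ((x - theta) + noise)  # SGD on the noisy gradient
            D -= gamma * (D - 1.0)              # derivative of the update rule
        return x, D                             # approx. (theta, 1.0)

    print(sgd_with_derivative(theta=3.0))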

Updated: 2024-11-20 09:04:29

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2405.15894v2

ZNorm: Z-Score Gradient Normalization Accelerating Skip-Connected Network Training without Architectural Modification

The rapid advancements in deep learning necessitate better training methods for deep neural networks (DNNs). As models grow in complexity, vanishing and exploding gradients impede performance, particularly in skip-connected architectures like Deep Residual Networks. We propose Z-Score Normalization for Gradient Descent (ZNorm), an innovative technique that adjusts only the gradients without modifying the network architecture to accelerate training and improve model performance. ZNorm normalizes the overall gradients, providing consistent gradient scaling across layers, effectively reducing the risks of vanishing and exploding gradients and achieving superior performance. Extensive experiments on CIFAR-10 and medical datasets confirm that ZNorm consistently outperforms existing methods under the same experimental settings. In medical imaging applications, ZNorm significantly enhances tumor prediction and segmentation accuracy, underscoring its practical utility. These findings highlight ZNorm's potential as a robust and versatile tool for enhancing the training and effectiveness of deep neural networks, especially in skip-connected architectures, across various applications.
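
Since ZNorm only touches gradients, it drops into any training loop between backward() and step(). A minimal PyTorch sketch under one plausible reading (per-parameter standardization; the paper's exact normalization granularity may differ):

    import torch

    @torch.no_grad()
    def znorm_(parameters, eps=1e-8):
        """Standardize each parameter's gradient to zero mean, unit variance."""
        for p in parameters:
            if p.grad is not None and p.grad.numel() > 1:
                g = p.grad
                g.sub_(g.mean()).div_(g.std() + eps)

    # usage inside a training loop:
    #   loss.backward()
    #   znorm_(model.parameters())
    #   optimizer.step()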

Updated: 2024-11-20 08:54:05

Categories: cs.LG

Download: http://arxiv.org/abs/2408.01215v5

Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration

The deep unfolding approach has attracted significant attention in computer vision tasks, which well connects conventional image processing modeling manners with more recent deep learning techniques. Specifically, by establishing a direct correspondence between algorithm operators at each implementation step and network modules within each layer, one can rationally construct an almost ``white box'' network architecture with high interpretability. In this architecture, only the predefined component of the proximal operator, known as a proximal network, needs manual configuration, enabling the network to automatically extract intrinsic image priors in a data-driven manner. In current deep unfolding methods, such a proximal network is generally designed as a CNN architecture, whose necessity has been proven by a recent theory. That is, CNN structure substantially delivers the translational invariant image prior, which is the most universally possessed structural prior across various types of images. However, standard CNN-based proximal networks have essential limitations in capturing the rotation symmetry prior, another universal structural prior underlying general images. This leaves a large room for further performance improvement in deep unfolding approaches. To address this issue, this study makes efforts to suggest a high-accuracy rotation equivariant proximal network that effectively embeds rotation symmetry priors into the deep unfolding framework. Especially, we deduce, for the first time, the theoretical equivariant error for such a designed proximal network with arbitrary layers under arbitrary rotation degrees. This analysis should be the most refined theoretical conclusion for such error evaluation to date and is also indispensable for supporting the rationale behind such networks with intrinsic interpretability requirements.
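
For readers new to deep unfolding, one layer of an unfolded proximal gradient method for min_x 0.5*||A x - y||^2 + prior(x) looks like the sketch below, with a learned network standing in for the proximal operator of the prior (prox_net is an assumed callable; the rotation-equivariant design constrains it to commute with rotations):

    import numpy as np

    def unfolded_step(x, A, y, prox_net, eta=0.1):
        """One layer of a deep-unfolded proximal gradient method."""
        grad = A.T @ (A @ x - y)         # gradient of the data-fidelity term
        return prox_net(x - eta * grad)  # learned prox supplies the image prior

    # A rotation-equivariant prox_net satisfies, up to the bounded error
    # analyzed in the paper: prox_net(rotate(z)) == rotate(prox_net(z)).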

Updated: 2024-11-20 08:44:06

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2312.15701v2

Select High-Level Features: Efficient Experts from a Hierarchical Classification Network

This study introduces a novel expert generation method that dynamically reduces task and computational complexity without compromising predictive performance. It is based on a new hierarchical classification network topology that combines sequential processing of generic low-level features with parallelism and nesting of high-level features. This structure allows for an innovative extraction technique: the ability to select only the high-level features of task-relevant categories. In certain cases, it is possible to skip almost all unneeded high-level features, which can significantly reduce the inference cost and is highly beneficial in resource-constrained conditions. We believe this method paves the way for future network designs that are lightweight and adaptable, making them suitable for a wide range of applications, from compact edge devices to large-scale clouds. In terms of dynamic inference, our methodology can exclude up to 88.7% of parameters and perform 73.4% fewer giga multiply-accumulate (GMAC) operations; analysis against comparative baselines shows an average reduction of 47.6% in parameters and 5.8% in GMACs across the cases we evaluated.

Updated: 2024-11-20 08:42:04

Categories: cs.LG

Download: http://arxiv.org/abs/2403.05601v2

Virtual Staining of Label-Free Tissue in Imaging Mass Spectrometry

Imaging mass spectrometry (IMS) is a powerful tool for untargeted, highly multiplexed molecular mapping of tissue in biomedical research. IMS offers a means of mapping the spatial distributions of molecular species in biological tissue with unparalleled chemical specificity and sensitivity. However, most IMS platforms are not able to achieve microscopy-level spatial resolution and lack cellular morphological contrast, necessitating subsequent histochemical staining, microscopic imaging and advanced image registration steps to enable molecular distributions to be linked to specific tissue features and cell types. Here, we present a virtual histological staining approach that enhances spatial resolution and digitally introduces cellular morphological contrast into mass spectrometry images of label-free human tissue using a diffusion model. Blind testing on human kidney tissue demonstrated that the virtually stained images of label-free samples closely match their histochemically stained counterparts (with Periodic Acid-Schiff staining), showing high concordance in identifying key renal pathology structures despite utilizing IMS data with 10-fold larger pixel size. Additionally, our approach employs an optimized noise sampling technique during the diffusion model's inference process to reduce variance in the generated images, yielding reliable and repeatable virtual staining. We believe this virtual staining method will significantly expand the applicability of IMS in life sciences and open new avenues for mass spectrometry-based biomedical research.

Updated: 2024-11-20 08:30:11

Categories: cs.CV,cs.LG,physics.med-ph,physics.optics

Download: http://arxiv.org/abs/2411.13120v1

Reachability Analysis of the Domain Name System

The high complexity of DNS poses unique challenges for ensuring its security and reliability. Despite continuous advances in DNS testing, monitoring, and verification, protocol-level defects still give rise to numerous bugs and attacks. In this paper, we provide the first decision procedure for the DNS verification problem, establishing its complexity as $\mathsf{2ExpTime}$, which was previously unknown. We begin by formalizing the semantics of DNS as a system of recursive communicating processes extended with timers and an infinite message alphabet. We provide an algebraic abstraction of the alphabet with finitely many equivalence classes, using the subclass of semigroups that recognize positive prefix-testable languages. We then introduce a novel generalization of bisimulation for labelled transition systems, weaker than strong bisimulation, to show that our abstraction is sound and complete. Finally, using this abstraction, we reduce the DNS verification problem to the verification problem for pushdown systems. To show the expressiveness of our framework, we model two of the most prominent attack vectors on DNS, namely amplification attacks and rewrite blackholing.

Updated: 2024-11-20 08:27:16

Categories: cs.CR,cs.FL

Download: http://arxiv.org/abs/2411.10188v2

Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

A recent line of work has shown promise in using sparse autoencoders (SAEs) to uncover interpretable features in neural network representations. However, the simple linear-nonlinear encoding mechanism in SAEs limits their ability to perform accurate sparse inference. In this paper, we investigate sparse inference and learning in SAEs through the lens of sparse coding. Specifically, we show that SAEs perform amortised sparse inference with a computationally restricted encoder and, using compressed sensing theory, we prove that this mapping is inherently insufficient for accurate sparse inference, even in solvable cases. Building on this theory, we empirically explore conditions where more sophisticated sparse inference methods outperform traditional SAE encoders. Our key contribution is the decoupling of the encoding and decoding processes, which allows for a comparison of various sparse encoding strategies. We evaluate these strategies on two dimensions: alignment with true underlying sparse features and correct inference of sparse codes, while also accounting for computational costs during training and inference. Our results reveal that substantial performance gains can be achieved with minimal increases in compute cost. We demonstrate that this generalises to SAEs applied to large language models (LLMs), where advanced encoders achieve similar interpretability. This work opens new avenues for understanding neural network representations and offers important implications for improving the tools we use to analyse the activations of large language models.
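
Decoupling encoder and decoder makes the comparison concrete: the amortised SAE encoder is a single linear map plus shrinkage, while iterative sparse inference (here ISTA, one standard choice) reuses the same decoder/dictionary at higher inference cost. A minimal sketch:

    import numpy as np

    def soft(z, t):
        """Soft-thresholding (shrinkage) operator."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def sae_encode(x, W_enc, b, lam):
        """One-shot amortised inference: linear map + shrinkage."""
        return soft(W_enc @ x + b, lam)

    def ista_encode(x, D, lam, n_iter=100):
        """Iterative inference for min_z 0.5*||x - D z||^2 + lam*||z||_1,
        reusing the decoder (dictionary) D instead of a learned encoder."""
        L = np.linalg.norm(D, 2) ** 2    # Lipschitz constant of the gradient
        z = np.zeros(D.shape[1])
        for _ in range(n_iter):
            z = soft(z - D.T @ (D @ z - x) / L, lam / L)
        return z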

Updated: 2024-11-20 08:21:53

Categories: cs.LG

Download: http://arxiv.org/abs/2411.13117v1

Extended Neural Contractive Dynamical Systems: On Multiple Tasks and Riemannian Safety Regions

Stability guarantees are crucial when ensuring that a fully autonomous robot does not take undesirable or potentially harmful actions. We recently proposed the Neural Contractive Dynamical Systems (NCDS), which is a neural network architecture that guarantees contractive stability. With this, learning-from-demonstrations approaches can trivially provide stability guarantees. However, our early work left several unanswered questions, which we here address. Beyond providing an in-depth explanation of NCDS, this paper extends the framework with more careful regularization, a conditional variant of the framework for handling multiple tasks, and an uncertainty-driven approach to latent obstacle avoidance. Experiments verify that the developed system has the flexibility of ordinary neural networks while providing the stability guarantees needed for autonomous robotics.

Updated: 2024-11-20 08:20:35

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2411.11405v2

Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning

Manipulating the interaction trajectories between the intelligent agent and the environment can control the agent's training and behavior, exposing the potential vulnerabilities of reinforcement learning (RL). For example, in Cyber-Physical Systems (CPS) controlled by RL, the attacker can manipulate the actions of the adopted RL to other actions during the training phase, which will lead to bad consequences. Existing work has studied action-manipulation attacks in tabular settings, where the states and actions are discrete. As seen in many up-and-coming RL applications, such as autonomous driving, continuous action spaces are widely accepted; however, their action-manipulation attacks had not been thoroughly investigated. In this paper, we consider this crucial problem in both white-box and black-box scenarios. Specifically, utilizing knowledge derived exclusively from trajectories, we propose a black-box attack algorithm named LCBT, which uses the Monte Carlo tree search method for efficient action searching and manipulation. Additionally, we demonstrate that for an agent whose dynamic regret is sub-linearly related to the total number of steps, LCBT can teach the agent to converge to target policies with only sublinear attack cost, i.e., $O\left(\mathcal{R}(T) + MH^3K^E\log (MT)\right)$ with $0<E<1$, where $H$ is the number of steps per episode, $K$ is the total number of episodes, $T=KH$ is the total number of steps, $M$ is the number of subspaces the state space is divided into, and $\mathcal{R}(T)$ is the bound of the RL algorithm's regret. We conduct our proposed attack methods on three representative algorithms, DDPG, PPO, and TD3, in continuous settings, and observe promising attack performance.

Updated: 2024-11-20 08:20:29

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.13116v1

Deep-Learning-Aided Alternating Least Squares for Tensor CP Decomposition and Its Application to Massive MIMO Channel Estimation

CANDECOMP/PARAFAC (CP) decomposition is the most widely used model for formulating the received tensor signal in a massive MIMO system, as the receiver generally sums the components from different paths or users. To achieve accurate and low-latency channel estimation, good and fast CP decomposition (CPD) algorithms are desired. CP alternating least squares (CPALS) is the workhorse algorithm for calculating the CPD. However, its performance depends on the initialization, and good starting values can lead to more efficient solutions. Existing initialization strategies are decoupled from the CPALS and are not necessarily favorable for solving the CPD. This paper proposes a deep-learning-aided CPALS (DL-CPALS) method that uses a deep neural network (DNN) to generate favorable initializations. The proposed DL-CPALS integrates the DNN and CPALS into a model-based deep learning paradigm, where the DNN is trained to generate an initialization that facilitates fast and accurate CPD. Moreover, benefiting from CP low-rankness, the proposed method is trained using noisy data and does not require paired clean data. The proposed DL-CPALS is applied to millimeter wave MIMO-OFDM channel estimation. Experimental results demonstrate the significant improvements of the proposed method in terms of both speed and accuracy for CPD and channel estimation.
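
For reference, the CPALS workhorse that the DNN initializes can be written in a few lines of NumPy for a 3-way tensor; the init argument marks where a learned (DNN-generated) starting point would plug in (a plain textbook version, not the paper's implementation):

    import numpy as np

    def kr(B, C):
        """Khatri-Rao product, row order matching numpy reshape unfoldings."""
        (J, R), (K, _) = B.shape, C.shape
        return np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)

    def cp_als(X, rank, n_iter=50, init=None, seed=0):
        """Plain CP-ALS for a 3-way tensor X of shape (I, J, K)."""
        I, J, K = X.shape
        rng = np.random.default_rng(seed)
        A, B, C = init or [rng.standard_normal((n, rank)) for n in (I, J, K)]
        X1 = X.reshape(I, J * K)                      # mode-1 unfolding
        X2 = X.transpose(1, 0, 2).reshape(J, I * K)   # mode-2 unfolding
        X3 = X.transpose(2, 0, 1).reshape(K, I * J)   # mode-3 unfolding
        for _ in range(n_iter):
            A = X1 @ kr(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
            B = X2 @ kr(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
            C = X3 @ kr(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        return A, B, C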

Updated: 2024-11-20 08:19:15

Categories: eess.SP,cs.AI

Download: http://arxiv.org/abs/2305.13947v2

A Deep Learning Approach to Predict the Fall [of Price] of Cryptocurrency Long Before its Actual Fall

In modern times, the cryptocurrency market is one of the world's most rapidly rising financial markets. The cryptocurrency market is regarded as more volatile and illiquid than traditional markets such as equities, foreign exchange, and commodities. The risk of this market creates uncertainty among investors. The purpose of this research is to predict the magnitude of the risk factor, also called volatility, of the cryptocurrency market. Our approach will assist people who invest in the cryptocurrency market by helping them overcome the problems and difficulties they experience. The approach starts with calculating the risk factor of the cryptocurrency market from existing parameters. For twenty assets in the cryptocurrency market, the risk factor has been predicted using different machine learning algorithms such as CNN, LSTM, BiLSTM, and GRU, all applied to the calculated risk-factor parameter. A new model has been developed that predicts better than the existing models: our proposed model's RMSE ranges from 0.0089 at best to 1.3229 at worst, whereas the existing models' RMSE ranges from 0.02769 to 14.5092. The proposed model thus performs much better while generalizing properly. Using our approach, it will be easier for investors to trade in complicated and challenging financial assets like Bitcoin, Ethereum, and Dogecoin.

Updated: 2024-11-20 08:09:35

Categories: q-fin.ST,cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.13615v1

TSINR: Capturing Temporal Continuity via Implicit Neural Representations for Time Series Anomaly Detection

Time series anomaly detection aims to identify unusual patterns in data or deviations from systems' expected behavior. Reconstruction-based methods are the mainstream for this task; they learn point-wise representations via unsupervised learning. However, unlabeled anomaly points in the training data may cause these reconstruction-based methods to learn and reconstruct anomalous data, making it difficult to capture normal patterns. In this paper, we propose a time series anomaly detection method based on implicit neural representation (INR) reconstruction, named TSINR, to address this challenge. Due to the spectral-bias property, TSINR prioritizes low-frequency signals and exhibits poorer performance on high-frequency abnormal data. Specifically, we adopt INR to parameterize time series data as a continuous function and employ a transformer-based architecture to predict the INR of given data. As a result, the proposed TSINR method achieves the advantage of capturing temporal continuity and is thus more sensitive to discontinuous anomaly data. In addition, we further design a novel form of INR continuous function to learn inter- and intra-channel information, and leverage a pre-trained large language model to amplify the intense fluctuations in anomalies. Extensive experiments demonstrate that TSINR achieves superior overall performance on both univariate and multivariate time series anomaly detection benchmarks compared to other state-of-the-art reconstruction-based methods. Our code is available.
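
The spectral-bias intuition is easy to demonstrate: fit a small continuous network f(t) -> x_t to one series and use the reconstruction error as the anomaly score, since abrupt anomalies are high-frequency and fit poorly. A simplified sketch (TSINR itself predicts the INR with a transformer rather than fitting by gradient descent per series):

    import torch
    import torch.nn as nn

    def inr_anomaly_scores(x, hidden=32, epochs=300, lr=1e-2):
        """Fit an MLP f(t) -> x_t to one series; return per-point |error|."""
        t = torch.linspace(-1, 1, len(x)).unsqueeze(1)
        y = torch.as_tensor(x, dtype=torch.float32).unsqueeze(1)
        f = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                          nn.Linear(hidden, hidden), nn.Tanh(),
                          nn.Linear(hidden, 1))
        opt = torch.optim.Adam(f.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = ((f(t) - y) ** 2).mean()  # smooth structure fits first
            loss.backward()
            opt.step()
        return (f(t) - y).abs().squeeze(1).detach()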

Updated: 2024-11-20 08:04:43

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.11641v2

DRL-Based Optimization for AoI and Energy Consumption in C-V2X Enabled IoV

To address communication latency issues, the Third Generation Partnership Project (3GPP) has defined Cellular-Vehicle to Everything (C-V2X) technology, which includes Vehicle-to-Vehicle (V2V) communication for direct vehicle-to-vehicle communication. However, this method requires vehicles to autonomously select communication resources based on the Semi-Persistent Scheduling (SPS) protocol, which may lead to collisions due to different vehicles sharing the same communication resources, thereby affecting communication effectiveness. Non-Orthogonal Multiple Access (NOMA) is considered a potential solution for handling large-scale vehicle communication, as it can enhance the Signal-to-Interference-plus-Noise Ratio (SINR) by employing Successive Interference Cancellation (SIC), thereby reducing the negative impact of communication collisions. When evaluating vehicle communication performance, traditional metrics such as reliability and transmission delay present certain contradictions. Introducing the new metric Age of Information (AoI) provides a more comprehensive evaluation of communication system. Additionally, to ensure service quality, user terminals need to possess high computational capabilities, which may lead to increased energy consumption, necessitating a trade-off between communication energy consumption and effectiveness. Given the complexity and dynamics of communication systems, Deep Reinforcement Learning (DRL) serves as an intelligent learning method capable of learning optimal strategies in dynamic environments. Therefore, this paper analyzes the effects of multi-priority queues and NOMA on AoI in the C-V2X vehicular communication system and proposes an energy consumption and AoI optimization method based on DRL. Finally, through comparative simulations with baseline methods, the proposed approach demonstrates its advances in terms of energy consumption and AoI.
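
The AoI bookkeeping and the energy/freshness trade-off the agent optimizes can be sketched in a few lines (the weights are placeholders; the reset-to-dt rule assumes instantaneous delivery of a fresh packet):

    def step_aoi(aoi, delivered, dt=1.0):
        """Age of Information grows linearly and resets on successful delivery."""
        return dt if delivered else aoi + dt

    def reward(aoi, tx_energy, w_aoi=1.0, w_energy=0.5):
        """Reward trading freshness against energy, maximized by the DRL agent."""
        return -(w_aoi * aoi + w_energy * tx_energy)

    # tiny rollout: transmissions succeed at steps 3 and 7
    aoi = 0.0
    for t in range(10):
        delivered = t in (3, 7)
        aoi = step_aoi(aoi, delivered)
        r = reward(aoi, tx_energy=1.0 if delivered else 0.0)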

Updated: 2024-11-20 07:59:35

Categories: cs.LG,cs.NI

Download: http://arxiv.org/abs/2411.13104v1

Song Form-aware Full-Song Text-to-Lyrics Generation with Multi-Level Granularity Syllable Count Control

Lyrics generation presents unique challenges, particularly in achieving precise syllable control while adhering to song form structures such as verses and choruses. Conventional line-by-line approaches often lead to unnatural phrasing, underscoring the need for more granular syllable management. We propose a framework for lyrics generation that enables multi-level syllable control at the word, phrase, line, and paragraph levels while remaining aware of song form. Our approach generates complete lyrics conditioned on input text and song form, ensuring alignment with specified syllable constraints. Generated lyrics samples are available at: https://tinyurl.com/lyrics9999

Updated: 2024-11-20 07:57:58

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.13100v1

Rethinking the Power of Timestamps for Robust Time Series Forecasting: A Global-Local Fusion Perspective

Time series forecasting has played a pivotal role across various industries, including finance, transportation, energy, healthcare, and climate. Due to the abundant seasonal information they contain, timestamps possess the potential to offer robust global guidance for forecasting techniques. However, existing works primarily focus on local observations, with timestamps being treated merely as an optional supplement that remains underutilized. When data gathered from the real world is polluted, the absence of global information will damage the robust prediction capability of these algorithms. To address these problems, we propose a novel framework named GLAFF. Within this framework, the timestamps are modeled individually to capture the global dependencies. Working as a plugin, GLAFF adaptively adjusts the combined weights for global and local information, enabling seamless collaboration with any time series forecasting backbone. Extensive experiments conducted on nine real-world datasets demonstrate that GLAFF significantly enhances the average performance of widely used mainstream forecasting models by 12.5%, surpassing the previous state-of-the-art method by 5.5%.
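
The plugin structure can be pictured as a timestamp-only global branch fused with any backbone's local forecast through an input-dependent weight (a minimal PyTorch sketch; the branch and gate architectures are illustrative, not GLAFF's):

    import torch
    import torch.nn as nn

    class GlobalLocalFusion(nn.Module):
        """Fuse a timestamp-driven global forecast with a local forecast."""
        def __init__(self, ts_dim, horizon, hidden=64):
            super().__init__()
            self.global_branch = nn.Sequential(   # maps timestamp features
                nn.Linear(ts_dim, hidden), nn.ReLU(), nn.Linear(hidden, horizon))
            self.gate = nn.Sequential(            # adaptive combination weight
                nn.Linear(ts_dim, 1), nn.Sigmoid())

        def forward(self, ts_feats, local_forecast):
            g = self.global_branch(ts_feats)      # global, timestamp-only
            w = self.gate(ts_feats)               # in (0, 1), per sample
            return w * g + (1 - w) * local_forecast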

Updated: 2024-11-20 07:51:18

Categories: cs.LG

Download: http://arxiv.org/abs/2409.18696v3

Incremental Label Distribution Learning with Scalable Graph Convolutional Networks

Label Distribution Learning (LDL) is an effective approach for handling label ambiguity, as it can analyze all labels at once and indicate the extent to which each label describes a given sample. Most existing LDL methods assume the number of labels is static. However, in various LDL-specific contexts (e.g., disease diagnosis), the label count grows over time (for instance, as new diseases are discovered), a factor that existing methods overlook. Naively learning samples with new labels means relearning all labels at once, wasting time on the old labels and even risking overfitting them. At the same time, learning new labels with an LDL model means reconstructing the inter-label relationships, and how to make use of the previously constructed relationships is a further crucial challenge. To tackle these challenges, we introduce Incremental Label Distribution Learning (ILDL), analyze its key issues regarding training samples and inter-label relationships, and propose Scalable Graph Label Distribution Learning (SGLDL) as a practical framework for implementing ILDL. Specifically, in SGLDL we develop a New-label-aware Gradient Compensation Loss to speed up the learning of new labels and represent inter-label relationships as a graph to reduce the time required to reconstruct them. Experimental results on classical LDL datasets show the clear advantages of the proposed algorithms and illustrate the importance of a dedicated design for the ILDL problem.

Updated: 2024-11-20 07:49:51

Categories: cs.LG,cs.IT,math.IT

Download: http://arxiv.org/abs/2411.13097v1

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

Existing large video-language models (LVLMs) struggle to comprehend long videos correctly due to limited context. To address this problem, fine-tuning long-context LVLMs and employing GPT-based agents have emerged as promising solutions. However, fine-tuning LVLMs would require extensive high-quality data and substantial GPU resources, while GPT-based agents would rely on proprietary models (e.g., GPT-4o). In this paper, we propose Video Retrieval-Augmented Generation (Video-RAG), a training-free and cost-effective pipeline that employs visually-aligned auxiliary texts to help facilitate cross-modality alignment while providing additional information beyond the visual content. Specifically, we leverage open-source external tools to extract visually-aligned information from pure video data (e.g., audio, optical character, and object detection), and incorporate the extracted information into an existing LVLM as auxiliary texts, alongside video frames and queries, in a plug-and-play manner. Our Video-RAG offers several key advantages: (i) lightweight with low computing overhead due to single-turn retrieval; (ii) easy implementation and compatibility with any LVLM; and (iii) significant, consistent performance gains across long video understanding benchmarks, including Video-MME, MLVU, and LongVideoBench. Notably, our model demonstrates superior performance over proprietary models like Gemini-1.5-Pro and GPT-4o when utilized with a 72B model.
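
The plug-and-play pipeline amounts to running open-source extractors over the video, retrieving the query-relevant snippets once, and prepending them to the LVLM prompt. A sketch with all tools as assumed callables returning (timestamp, text) pairs:

    def video_rag_prompt(query, frames, asr, ocr, detector, retriever, k=5):
        """Assemble an LVLM prompt from visually-aligned auxiliary texts."""
        aux = []
        aux += asr(frames)        # spoken content (audio transcription)
        aux += ocr(frames)        # on-screen text
        aux += detector(frames)   # object names and positions
        relevant = retriever(query, aux, top_k=k)   # single-turn retrieval
        context = "\n".join(f"[{t:.1f}s] {s}" for t, s in relevant)
        return f"Auxiliary context:\n{context}\n\nQuestion: {query}"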

Updated: 2024-11-20 07:44:34

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.13093v1

Is Knowledge Power? On the (Im)possibility of Learning from Strategic Interactions

When learning in strategic environments, a key question is whether agents can overcome uncertainty about their preferences to achieve outcomes they could have achieved absent any uncertainty. Can they do this solely through interactions with each other? We focus this question on the ability of agents to attain the value of their Stackelberg optimal strategy and study the impact of information asymmetry. We study repeated interactions in fully strategic environments where players' actions are decided based on learning algorithms that take into account their observed histories and knowledge of the game. We study the pure Nash equilibria (PNE) of a meta-game where players choose these algorithms as their actions. We demonstrate that if one player has perfect knowledge about the game, then any initial informational gap persists. That is, while there is always a PNE in which the informed agent achieves her Stackelberg value, there is a game where no PNE of the meta-game allows the partially informed player to achieve her Stackelberg value. On the other hand, if both players start with some uncertainty about the game, the quality of information alone does not determine which agent can achieve her Stackelberg value. In this case, the concept of information asymmetry becomes nuanced and depends on the game's structure. Overall, our findings suggest that repeated strategic interactions alone cannot facilitate learning effectively enough to earn an uninformed player her Stackelberg value.

Updated: 2024-11-20 07:35:07

标题: 知识就是力量吗?关于从战略互动中学习的(不)可能性

摘要: 在战略环境中学习时,一个关键问题是代理能否克服对自己偏好的不确定性,以实现在没有任何不确定性的情况下也能实现的结果。他们能否仅通过彼此的互动来做到这一点?我们将这个问题集中在代理实现其斯塔克尔贝格最优策略价值的能力上,并研究信息不对称的影响。我们研究了在完全战略环境中的重复互动,玩家的行动是基于学习算法来决定的,这些算法考虑了他们观察到的历史和对游戏的了解。我们研究了一个元博弈的纯纳什均衡(PNE),其中玩家选择这些算法作为他们的行动。我们证明,如果一个玩家对游戏有完全的了解,那么任何初始的信息差异都会持续存在。也就是说,虽然总是存在一个PNE,其中知情者实现她的斯塔克尔贝格价值,但有一个游戏,其中元博弈的任何PNE都不允许部分知情的玩家实现她的斯塔克尔贝格价值。另一方面,如果两个玩家都对游戏有一些不确定性,那么仅凭信息的质量不能确定哪个玩家能实现她的斯塔克尔贝格价值。在这种情况下,信息不对称的概念变得微妙,并取决于游戏的结构。总的来说,我们的研究结果表明,仅通过重复的战略互动不能有效地促进学习,使不知情的玩家获得她的斯塔克尔贝格价值。

更新时间: 2024-11-20 07:35:07

领域: cs.GT,cs.LG

下载: http://arxiv.org/abs/2408.08272v2

SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Enhanced Code Generation

Large language models demonstrate exceptional performance in simple code generation tasks but still face challenges in tackling complex problems. These challenges may stem from insufficient reasoning and problem decomposition capabilities. To address this issue, we propose a reasoning-augmented data generation process, SRA-MCTS, which guides the model to autonomously generate high-quality intermediate reasoning paths. This creates a positive feedback loop, enabling continuous improvement. Our method operates entirely through the model itself without requiring additional supervision. By synthesizing natural language reasoning paths and translating them into executable code, the approach ensures analytical accuracy and enhances the success rate in solving complex tasks. Experimental results show that, even without additional supervisory signals, our method achieves performance improvements across different model scales, demonstrating the significant potential of self-improvement in small models. Furthermore, the method remains robust when traditional Chain-of-Thought (CoT) approaches exhibit performance degradation, with notable improvements observed in diversity metrics such as pass@10. We encourage further exploration of reasoning processes within training data to enhance the ability of language models to address complex problems.
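The abstract does not detail the tree policy; as a generic sketch of the Monte Carlo Tree Search machinery it builds on, a UCT-style selection and backpropagation step over reasoning-path nodes might look like this (the Node fields and the exploration constant are illustrative):

    import math

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children, self.visits, self.value = [], 0, 0.0

    def uct_select(node, c=1.4):
        # Pick the child balancing average reward against exploration.
        return max(node.children,
                   key=lambda ch: ch.value / (ch.visits + 1e-9)
                   + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))

    def backpropagate(node, reward):
        # Reward could be, e.g., the pass rate of code generated from the path.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent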

Updated: 2024-11-20 07:34:47

标题: SRA-MCTS:蒙特卡罗树搜索增强自主推理的代码生成

摘要: 大型语言模型在简单代码生成任务中表现出色,但在解决复杂问题时仍面临挑战。这些挑战可能源于推理和问题分解能力不足。为了解决这个问题,我们提出了一种推理增强的数据生成过程,即SRA-MCTS,它引导模型自主生成高质量的中间推理路径。这创造了一个正反馈循环,实现持续改进。我们的方法完全通过模型自身运行,无需额外监督。通过综合自然语言推理路径并将其转化为可执行代码,该方法确保了分析准确性,并提高了解决复杂任务的成功率。实验结果表明,即使没有额外的监督信号,我们的方法在不同模型规模上都取得了性能改进,展示了小型模型自我改进的巨大潜力。此外,当传统的“思维链”(CoT)方法表现下降时,该方法仍然保持稳健,在pass@10等多样性指标上观察到了显著改进。我们鼓励进一步探索训练数据中的推理过程,以增强语言模型解决复杂问题的能力。

更新时间: 2024-11-20 07:34:47

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2411.11053v2

Omnipredicting Single-Index Models with Multi-Index Models

Recent work on supervised learning [GKR+22] defined the notion of omnipredictors, i.e., predictor functions $p$ over features that are simultaneously competitive for minimizing a family of loss functions $\mathcal{L}$ against a comparator class $\mathcal{C}$. Omniprediction requires approximating the Bayes-optimal predictor beyond the loss minimization paradigm, and has generated significant interest in the learning theory community. However, even for basic settings such as agnostically learning single-index models (SIMs), existing omnipredictor constructions require impractically-large sample complexities and runtimes, and output complex, highly-improper hypotheses. Our main contribution is a new, simple construction of omnipredictors for SIMs. We give a learner outputting an omnipredictor that is $\varepsilon$-competitive on any matching loss induced by a monotone, Lipschitz link function, when the comparator class is bounded linear predictors. Our algorithm requires $\approx \varepsilon^{-4}$ samples and runs in nearly-linear time, and its sample complexity improves to $\approx \varepsilon^{-2}$ if link functions are bi-Lipschitz. This significantly improves upon the only prior known construction, due to [HJKRR18, GHK+23], which used $\gtrsim \varepsilon^{-10}$ samples. We achieve our construction via a new, sharp analysis of the classical Isotron algorithm [KS09, KKKS11] in the challenging agnostic learning setting, of potential independent interest. Previously, Isotron was known to properly learn SIMs in the realizable setting, as well as constant-factor competitive hypotheses under the squared loss [ZWDD24]. As they are based on Isotron, our omnipredictors are multi-index models with $\approx \varepsilon^{-2}$ prediction heads, bringing us closer to the tantalizing goal of proper omniprediction for general loss families and comparators.
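Since the construction is based on Isotron, a brief sketch of the classical Isotron iteration for single-index models $y \approx u(w \cdot x)$ may help: it alternates an isotonic fit of the monotone link with a perceptron-like update of the direction. The version below uses scikit-learn's isotonic regression and an illustrative step size; it is the textbook realizable-setting loop, not the paper's agnostic analysis:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    def isotron(X, y, n_iters=100, lr=1.0):
        n, d = X.shape
        w = np.zeros(d)
        link = None
        for _ in range(n_iters):
            z = X @ w
            link = IsotonicRegression(out_of_bounds="clip").fit(z, y)  # fit monotone link u
            preds = link.predict(z)
            w += (lr / n) * X.T @ (y - preds)      # perceptron-style direction update
        return w, link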

Updated: 2024-11-20 07:20:49

标题: 使用多指数模型来全面预测单指数模型

摘要: 最近关于监督学习的研究[GKR+22]定义了全预测器的概念,即在特征上同时竞争最小化一组损失函数$\mathcal{L}$的预测函数$p$,并与比较类$\mathcal{C}$竞争。全预测需要在损失最小化范式之外近似贝叶斯最优预测器,并在学习理论社区引起了极大兴趣。然而,即使对于基本设置如不可知地学习单指数模型(SIMs),现有的全预测器构造需要不切实际地大的样本复杂性和运行时间,并输出复杂、高度不适当的假设。 我们的主要贡献是为SIMs提供了一种新的简单的全预测器构造。我们提供一个学习者输出一个全预测器,该预测器在由单调、Lipschitz链接函数引起的任何匹配损失上是$\varepsilon$-竞争的,比较类是有界线性预测器。我们的算法需要大约$\varepsilon^{-4}$个样本,并且运行时间几乎是线性的,如果链接函数是双Lipschitz的,则其样本复杂度改进为大约$\varepsilon^{-2}$。这明显改进了先前已知的唯一构造,由[HJKRR18,GHK+23]使用了$\gtrsim \varepsilon^{-10}$个样本。 我们通过对经典Isotron算法[KS09, KKKS11]在具有挑战性的不可知学习设置中的尖锐分析,实现了我们的构造,这可能是独立感兴趣的。以前,Isotron已知在可实现设置下正确学习SIMs,以及在平方损失下具有常数因子竞争性假设[ZWDD24]。由于它们基于Isotron,我们的全预测器是具有大约$\varepsilon^{-2}$个预测头的多指数模型,使我们更接近对一般损失族和比较器进行合适全预测的诱人目标。

更新时间: 2024-11-20 07:20:49

领域: cs.LG,cs.DS,math.OC,stat.ML

下载: http://arxiv.org/abs/2411.13083v1

SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models

In this paper, we propose Singular Values and Orthonormal Regularized Singular Vectors Adaptation, or SORSA, a novel PEFT method. Each SORSA adapter consists of two main parts: trainable principal singular weights $W_p = U_p \text{diag}(S_p) V^\top_p$, and frozen residual weights $W_r = U_r \text{diag}(S_r) V^\top_r$. These parts are initialized by performing singular value decomposition (SVD) on pre-trained weights. Moreover, we implement and analyze an orthonormal regularizer, which we prove can decrease the condition number of $W_p$ and make the optimization more efficient. SORSA adapters can be merged during inference, thus eliminating any inference latency. We also introduce an SVD-based method to analyze the variation of the parameters, and discuss SORSA's superiority in minimizing this alteration. In our experiments, SORSA also shows faster convergence than LoRA and PiSSA. On the GSM-8K benchmark, Llama 2 7B adapted using SORSA achieved 56.03% accuracy, surpassing LoRA (42.30%), AdaLoRA (47.30%), Full FT (49.05%), and PiSSA (53.07%). On the MATH benchmark, SORSA achieved 10.36% accuracy, outperforming LoRA (5.50%), AdaLoRA (6.48%), Full FT (7.22%), and PiSSA (7.44%). We conclude that SORSA offers a new perspective on parameter-efficient fine-tuning, demonstrating remarkable performance.
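A minimal sketch of the initialization and the regularizer, assuming PyTorch, a rank r below the full rank of the weight, and one plausible Frobenius-norm form for the orthonormal penalty (the paper's exact regularizer may differ):

    import torch

    def sorsa_split(W: torch.Tensor, r: int):
        # Split a pre-trained weight into a rank-r trainable principal part
        # (U_p, S_p, V_p) and a frozen residual W_r, via SVD.
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        Up, Sp, Vph = U[:, :r].clone(), S[:r].clone(), Vh[:r, :].clone()
        Wr = (U[:, r:] @ torch.diag(S[r:]) @ Vh[r:, :]).detach()
        for t in (Up, Sp, Vph):
            t.requires_grad_(True)
        return Up, Sp, Vph, Wr

    def orthonormal_reg(Up, Vph):
        # Penalize deviation of U_p columns / V_p rows from orthonormality.
        I_u = torch.eye(Up.shape[1])
        I_v = torch.eye(Vph.shape[0])
        return ((Up.T @ Up - I_u) ** 2).sum() + ((Vph @ Vph.T - I_v) ** 2).sum()

    def effective_weight(Up, Sp, Vph, Wr):
        # W_p + W_r; after training the two parts merge into one matrix,
        # which is why there is no inference latency.
        return Up @ torch.diag(Sp) @ Vph + Wr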

Updated: 2024-11-20 07:08:22

标题: SORSA:大型语言模型的奇异值和正交正则化奇异向量调整

摘要: 在这篇论文中,我们提出了一种新颖的PEFT方法,称为奇异值和正交正则化奇异向量适应(SORSA)。每个SORSA适配器包括两个主要部分:可训练的主奇异权重$W_p = U_p \text{diag}(S_p) V^\top_p$,和冻结的残余权重$W_r = U_r \text{diag}(S_r) V^\top_r$。这些部分通过对预训练权重进行奇异值分解(SVD)来初始化。此外,我们实现并分析了一个正交正则化器,证明它可以降低$W_p$的条件数,使优化更有效率。SORSA适配器在推断期间可以合并,从而消除任何推断延迟。我们还介绍了一种通过执行SVD来分析参数变化的方法,并讨论和分析了SORSA在最小化SVD方面的优越性。此外,在我们的实验中,SORSA表现出比LoRA和PiSSA更快的收敛速度。在GSM-8K基准测试中,使用SORSA调整的Llama 2 7B实现了56.03%的准确率,超过了LoRA(42.30%),AdaLoRA(47.30%),Full FT(49.05%)和PiSSA(53.07%)。在MATH基准测试中,SORSA实现了10.36%的准确率,优于LoRA(5.50%),AdaLoRA(6.48%),Full FT(7.22%)和PiSSA(7.44%)。我们得出结论,SORSA为参数高效微调提供了新的视角,表现出卓越的性能。

更新时间: 2024-11-20 07:08:22

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2409.00055v5

Neural Internal Model Control: Learning a Robust Control Policy via Predictive Error Feedback

Accurate motion control in the face of disturbances within complex environments remains a major challenge in robotics. Classical model-based approaches often struggle with nonlinearities and unstructured disturbances, while RL-based methods can be fragile when encountering unseen scenarios. In this paper, we propose a novel framework, Neural Internal Model Control, which integrates model-based control with RL-based control to enhance robustness. Our framework streamlines the predictive model by applying Newton-Euler equations for rigid-body dynamics, eliminating the need to capture complex high-dimensional nonlinearities. This internal model combines model-free RL algorithms with predictive error feedback. Such a design enables a closed-loop control structure to enhance the robustness and generalizability of the control system. We demonstrate the effectiveness of our framework on both quadrotors and quadrupedal robots, achieving superior performance compared to state-of-the-art methods. Furthermore, real-world deployment on a quadrotor with rope-suspended payloads highlights the framework's robustness in sim-to-real transfer. Our code is released at https://github.com/thu-uav/NeuralIMC.
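As a rough sketch of the closed loop, the snippet below uses a simplified point-mass model as a stand-in for the full Newton-Euler rigid-body predictor and feeds the one-step prediction error back into the policy observation (all dynamics constants, shapes, and names are illustrative assumptions):

    import numpy as np

    def rigid_body_predict(state, action, dt=0.02, mass=1.0, g=9.81):
        # state = [pos(3), vel(3)]; action = world-frame thrust force (3,).
        pos, vel = state[:3], state[3:]
        acc = action / mass - np.array([0.0, 0.0, g])
        return np.concatenate([pos + vel * dt, vel + acc * dt])

    def imc_step(policy, state, prev_pred):
        # The mismatch between predicted and observed state captures the
        # unmodeled disturbance and is appended to the policy input.
        pred_error = state - prev_pred
        action = policy(np.concatenate([state, pred_error]))
        next_pred = rigid_body_predict(state, action)
        return action, next_pred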

Updated: 2024-11-20 07:07:42

标题: 神经内部模型控制:通过预测误差反馈学习稳健的控制策略

摘要: 在复杂环境中面对干扰保持精确运动控制仍然是机器人领域的一个主要挑战。传统的基于模型的方法通常难以处理非线性和非结构化的干扰,而基于强化学习的方法在遇到未知场景时可能会变得脆弱。本文提出了一种新颖的框架,神经内部模型控制,将基于模型的控制与基于强化学习的控制结合起来以增强鲁棒性。我们的框架通过应用牛顿-欧拉方程简化预测模型,消除了捕捉复杂高维非线性的需要。这种内部模型将无模型强化学习算法与预测误差反馈结合起来。这种设计使闭环控制结构增强了控制系统的鲁棒性和泛化能力。我们在四轴飞行器和四足机器人上展示了我们框架的有效性,在性能上优于现有技术方法。此外,在带有绳索悬挂负载的四轴飞行器上进行的真实部署突出了该框架在模拟至实际转移中的鲁棒性。我们的代码已发布在https://github.com/thu-uav/NeuralIMC。

更新时间: 2024-11-20 07:07:42

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2411.13079v1

Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction

This paper introduces CooperKGC, a novel framework challenging the conventional solitary approach of large language models (LLMs) in knowledge graph construction (KGC). CooperKGC establishes a collaborative processing network, assembling a team capable of concurrently addressing entity, relation, and event extraction tasks. Experimentation demonstrates that fostering collaboration within CooperKGC enhances knowledge selection, correction, and aggregation capabilities across multiple rounds of interactions.

Updated: 2024-11-20 07:07:41

标题: 超越孤立:多智能体协同改进知识图谱构建

摘要: 本文介绍了CooperKGC,一个挑战传统的大型语言模型(LLMs)在知识图谱构建(KGC)中孤立方法的新框架。CooperKGC建立了一个协作处理网络,组建一个团队,能够同时处理实体、关系和事件提取任务。实验证明,在CooperKGC内促进协作可以增强跨多轮交互的知识选择、校正和聚合能力。

更新时间: 2024-11-20 07:07:41

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2312.03022v3

Learning to Optimize for Mixed-Integer Non-linear Programming

Mixed-integer non-linear programs (MINLPs) arise in various domains, such as energy systems and transportation, but are notoriously difficult to solve. Recent advances in machine learning have led to remarkable successes in optimization tasks, an area broadly known as learning to optimize. This approach includes using predictive models to generate solutions for optimization problems with continuous decision variables, thereby avoiding the need for computationally expensive optimization algorithms. However, applying learning to MINLPs remains challenging primarily due to the presence of integer decision variables, which complicate gradient-based learning. To address this limitation, we propose two differentiable correction layers that generate integer outputs while preserving gradient information. Combined with a soft penalty for constraint violation, our framework can tackle both the integrality and non-linear constraints in a MINLP. Experiments on three problem classes with convex/non-convex objective/constraints and integer/mixed-integer variables show that the proposed learning-based approach consistently produces high-quality solutions for parametric MINLPs extremely quickly. As problem size increases, traditional exact solvers and heuristic methods struggle to find feasible solutions, whereas our approach continues to deliver reliable results. Our work extends the scope of learning-to-optimize to MINLP, paving the way for integrating integer constraints into deep learning models. Our code is available at https://github.com/pnnl/L2O-pMINLP.
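The abstract's two correction layers are not specified, but the generic straight-through idea they build on can be sketched as follows: round to integers in the forward pass while letting gradients flow through unchanged, and penalize constraint violation softly (the quadratic penalty form is an assumption):

    import torch

    class StraightThroughRound(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            return torch.round(x)          # integer output in the forward pass

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output             # identity gradient (straight-through)

    def integer_correction(x_relaxed):
        return StraightThroughRound.apply(x_relaxed)

    def soft_penalty(g_vals, rho=10.0):
        # Quadratic penalty for constraints g(x) <= 0.
        return rho * torch.clamp(g_vals, min=0).pow(2).sum()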

Updated: 2024-11-20 07:03:40

标题: 学习优化混合整数非线性规划

摘要: 混合整数非线性规划(MINLPs)在能源系统和交通等领域中出现,但解决起来非常困难。最近机器学习的进展在优化任务中取得了显著成功,这一领域被广泛称为学习优化。这种方法包括使用预测模型生成连续决策变量的优化问题的解决方案,从而避免了需要计算昂贵的优化算法。然而,将学习应用于MINLPs仍然具有挑战性,主要是由于整数决策变量的存在,这使得基于梯度的学习变得复杂。为了解决这个限制,我们提出了两个可微的校正层,生成整数输出同时保留梯度信息。结合对约束违反的软惩罚,我们的框架可以处理MINLP中的整数性和非线性约束。对具有凸/非凸目标/约束和整数/混合整数变量的三类问题进行的实验表明,所提出的基于学习的方法能够快速、一致地生成高质量的参数化MINLPs的解决方案。随着问题规模的增加,传统的精确求解器和启发式方法很难找到可行解决方案,而我们的方法继续提供可靠的结果。我们的工作将学习优化扩展到MINLP,为将整数约束整合到深度学习模型中铺平了道路。我们的代码可以在https://github.com/pnnl/L2O-pMINLP找到。

更新时间: 2024-11-20 07:03:40

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2410.11061v4

Learning the Market: Sentiment-Based Ensemble Trading Agents

We propose and study the integration of sentiment analysis and deep reinforcement learning ensemble algorithms for stock trading by evaluating strategies capable of dynamically altering their active agent given the concurrent market environment. In particular, we design a simple-yet-effective method for extracting financial sentiment and combine this with improvements on existing trading agents, resulting in a strategy that effectively considers both qualitative market factors and quantitative stock data. We show that our approach results in a strategy that is profitable, robust, and risk-minimal - outperforming the traditional ensemble strategy as well as single agent algorithms and market metrics. Our findings suggest that the conventional practice of switching and reevaluating agents in an ensemble every fixed number of months is sub-optimal, and that a dynamic sentiment-based framework greatly unlocks additional performance. Furthermore, as we have designed our algorithm with simplicity and efficiency in mind, we expect the transition of our method from historical evaluation to real-time trading with live data to be relatively simple.

Updated: 2024-11-20 06:59:55

标题: 学习市场:基于情绪的集成交易代理

摘要: 我们提出并研究了情感分析和深度强化学习集成算法在股票交易中的应用,通过评估能够动态改变其活跃代理的策略,以适应当前市场环境。特别地,我们设计了一种简单但有效的方法来提取金融情绪,并将其与现有交易代理的改进结合起来,从而形成一种有效考虑定性市场因素和定量股票数据的策略。我们展示了我们的方法产生了一种盈利、稳健且风险最小的策略 - 胜过传统集成策略以及单一代理算法和市场指标。我们的研究结果表明,在固定时间间隔内切换和重新评估集成代理的传统做法是次优的,并且基于情感的动态框架极大地提高了性能。此外,由于我们设计了简单且高效的算法,我们假设从历史评估向实时交易中使用实时数据的转变相对简单。

更新时间: 2024-11-20 06:59:55

领域: q-fin.TR,cs.LG

下载: http://arxiv.org/abs/2402.01441v2

Surface Flux Transport Modeling using Physics Informed Neural Networks

Studying the magnetic field properties on the solar surface is crucial for understanding the solar and heliospheric activities, which in turn shape space weather in the solar system. Surface Flux Transport (SFT) modeling helps us to simulate and analyse the transport and evolution of magnetic flux on the solar surface, providing valuable insights into the mechanisms responsible for solar activity. In this work, we demonstrate the use of machine learning techniques to solve magnetic flux transport accurately. We have developed a novel Physics-Informed Neural Networks (PINN)-based model to study the evolution of Bipolar Magnetic Regions (BMRs) using SFT in one-dimensional azimuthally averaged and also in two-dimensions. We demonstrate the efficiency and computational feasibility of our PINN-based model by comparing its performance and accuracy with that of a numerical model implemented using the Runge-Kutta Implicit-Explicit (RK-IMEX) scheme. The mesh-independent PINN method can be used to reproduce the observed polar magnetic field with better flux conservation; accurately reproducing the polar field is important because it provides insights into the strength of future solar cycles. This work paves the way for more efficient and accurate simulations of solar magnetic flux transport and showcases the applicability of PINN in solving advection-diffusion equations with a particular focus on heliophysics.
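To make the PINN construction concrete, the sketch below writes the physics residual for a generic 1D advection-diffusion equation $u_t + v u_x = D u_{xx}$, which stands in for the azimuthally averaged SFT equation; the network size, coefficients, and collocation sampling are illustrative:

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                        nn.Linear(64, 64), nn.Tanh(),
                        nn.Linear(64, 1))

    def pinn_residual(xt, v=1.0, D=0.1):
        # Residual of u_t + v*u_x - D*u_xx at collocation points xt = (x, t).
        xt = xt.clone().requires_grad_(True)
        u = net(xt)
        grads = torch.autograd.grad(u, xt, torch.ones_like(u), create_graph=True)[0]
        u_x, u_t = grads[:, 0:1], grads[:, 1:2]
        u_xx = torch.autograd.grad(u_x, xt, torch.ones_like(u_x),
                                   create_graph=True)[0][:, 0:1]
        return u_t + v * u_x - D * u_xx

    colloc = torch.rand(256, 2)
    physics_loss = pinn_residual(colloc).pow(2).mean()
    # Data and boundary terms are added to physics_loss in the same way.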

Updated: 2024-11-20 06:56:31

标题: 使用物理信息神经网络进行表面通量输运建模

摘要: 研究太阳表面的磁场特性对于理解太阳和日晕活动至关重要,这反过来塑造了太阳系中的空间天气。表面磁通输运(SFT)建模帮助我们模拟和分析太阳表面磁通的输运和演化,为理解太阳活动的机制提供宝贵见解。在这项工作中,我们展示了机器学习技术在解决磁通输运中的应用,使其更加准确。我们开发了一种基于物理信息神经网络(PINN)的新型模型,通过SFT研究双极磁区(BMRs)在一维方位平均和二维中的演化。我们通过将其性能和准确性与使用Runge-Kutta隐式-显式(RK-IMEX)方案实施的数值模型进行比较,展示了我们基于PINN的模型的效率和计算可行性。独立于网格的PINN方法可用于更好地重现观测到的极地磁场并保持更好的磁通守恒。这一进展对准确再现观测到的极地磁场至关重要,从而为未来太阳周期的强度提供见解。这项工作为更高效和准确地模拟太阳磁通输运铺平了道路,并展示了PINN在解决对日球物理学特别关注的平流-扩散方程中的适用性。

更新时间: 2024-11-20 06:56:31

领域: astro-ph.SR,cs.LG

下载: http://arxiv.org/abs/2409.01744v2

Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles

The quality of self-supervised pre-trained embeddings on out-of-distribution (OOD) data is poor without fine-tuning. A straightforward and simple approach to improving the generalization of pre-trained representation to OOD data is the use of deep ensembles. However, obtaining an effective ensemble in the embedding space with only unlabeled data remains an unsolved problem. We first perform a theoretical analysis that reveals the relationship between individual hyperspherical embedding spaces in an ensemble. We then design a principled method to align these embedding spaces in an unsupervised manner. Experimental results on the MNIST dataset show that our embedding-space ensemble method improves pre-trained embedding quality on in-distribution and OOD data compared to single encoders.
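The paper derives its alignment method from the theoretical analysis; as one plausible unsupervised instantiation, an orthogonal Procrustes rotation estimated from unlabeled samples embedded by every encoder looks like this (the function names and the averaging scheme are assumptions, not the paper's method):

    import numpy as np

    def procrustes_align(Z_src, Z_ref):
        # Best rotation R minimizing ||Z_src @ R - Z_ref||_F, computed on
        # unit-normalized (hyperspherical) embeddings of shared unlabeled data.
        Z_src = Z_src / np.linalg.norm(Z_src, axis=1, keepdims=True)
        Z_ref = Z_ref / np.linalg.norm(Z_ref, axis=1, keepdims=True)
        U, _, Vt = np.linalg.svd(Z_src.T @ Z_ref)
        return U @ Vt

    def ensemble_embed(encoders, x, Z_all):
        # Z_all[i]: embeddings of the same unlabeled batch under encoder i.
        aligned = [enc(x) @ procrustes_align(Z, Z_all[0])
                   for enc, Z in zip(encoders, Z_all)]
        z = np.mean(aligned, axis=0)
        return z / np.linalg.norm(z)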

Updated: 2024-11-20 06:50:50

标题: 通过对齐嵌入空间集合改进预训练编码器的OOD泛化

摘要: 自监督预训练嵌入在分布外(OOD)数据上的质量在没有微调的情况下较差。提高预训练表示对OOD数据泛化能力的一个直接且简单的方法是使用深度集合。然而,在仅有无标签数据的情况下,在嵌入空间中获得有效的集合仍然是一个未解决的问题。我们首先进行了理论分析,揭示了集合中各个超球嵌入空间之间的关系。然后,我们设计了一种有原则的方法,在无监督的方式下对齐这些嵌入空间。在MNIST数据集上的实验结果表明,与单个编码器相比,我们的嵌入空间集合方法可以改善预训练嵌入在分布内和OOD数据上的质量。

更新时间: 2024-11-20 06:50:50

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2411.13073v1

A Gap in Time: The Challenge of Processing Heterogeneous IoT Data in Digitalized Buildings

The increasing demand for sustainable energy solutions has driven the integration of digitalized buildings into the power grid, leveraging Internet-of-Things (IoT) technologies to enhance energy efficiency and operational performance. Despite their potential, effectively utilizing IoT point data within deep-learning frameworks presents significant challenges, primarily due to its inherent heterogeneity. This study investigates the diverse dimensions of IoT data heterogeneity in both intra-building and inter-building contexts, examining their implications for predictive modeling. A benchmarking analysis of state-of-the-art time series models highlights their performance on this complex dataset. The results emphasize the critical need for multi-modal data integration, domain-informed modeling, and automated data engineering pipelines. Additionally, the study advocates for collaborative efforts to establish high-quality public datasets, which are essential for advancing intelligent and sustainable energy management systems in digitalized buildings.

Updated: 2024-11-20 06:50:50

标题: 时间的差距:数字化建筑中异构物联网数据处理的挑战

摘要: 对可持续能源解决方案日益增长的需求推动了数字化建筑与电网的整合,利用物联网技术提高能源效率和运行性能。尽管具有潜力,但有效利用物联网点数据在深度学习框架中面临重大挑战,主要是由于其固有的异质性。本研究调查了建筑内部和建筑间环境中物联网数据异质性的多种维度,考察其对预测建模的影响。对最先进的时间序列模型进行基准分析,突出它们在这一复杂数据集上的表现。结果强调了多模态数据集成、领域知识驱动建模和自动化数据工程流程的关键性需求。此外,该研究主张开展协作努力建立高质量的公共数据集,这对于推动数字化建筑中智能和可持续能源管理系统的发展至关重要。

更新时间: 2024-11-20 06:50:50

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.14267v2

AMaze: An intuitive benchmark generator for fast prototyping of generalizable agents

Traditional approaches to training agents have generally involved a single, deterministic environment of minimal complexity to solve various tasks such as robot locomotion or computer vision. However, agents trained in static environments lack generalization capabilities, limiting their potential in broader scenarios. Thus, recent benchmarks frequently rely on multiple environments, for instance, by providing stochastic noise, simple permutations, or altogether different settings. In practice, such collections result mainly from costly human-designed processes or the liberal use of random number generators. In this work, we introduce AMaze, a novel benchmark generator in which embodied agents must navigate a maze by interpreting visual signs of arbitrary complexity and deceptiveness. This generator promotes human interaction through the easy generation of feature-specific mazes and an intuitive understanding of the resulting agents' strategies. As a proof-of-concept, we demonstrate the capabilities of the generator in a simple, fully discrete case with limited deceptiveness. Agents were trained under three different regimes (one-shot, scaffolding, interactive), and the results showed that the latter two cases outperform direct training in terms of generalization capabilities. Indeed, depending on the combination of generalization metric, training regime, and algorithm, the median gain ranged from 50% to 100% and maximal performance was achieved through interactive training, thereby demonstrating the benefits of a controllable human-in-the-loop benchmark generator.

Updated: 2024-11-20 06:47:29

标题: AMaze:一个直观的基准生成器,用于快速原型化可泛化代理

摘要: 传统的训练代理的方法通常涉及在单一、确定性且复杂度较低的环境中解决各种任务,如机器人运动或计算机视觉。然而,在静态环境中训练的代理缺乏泛化能力,限制了它们在更广泛场景中的潜力。因此,最近的基准测试经常依赖于多个环境,例如提供随机噪声、简单的排列或完全不同的设置。在实践中,这些集合主要来自昂贵的人为设计过程或随机数生成器的大量使用。在这项工作中,我们介绍了一个新颖的基准生成器AMaze,其中具身代理必须通过解读任意复杂性和欺骗性的视觉标志在迷宫中导航。这个生成器通过轻松生成特定特征的迷宫和直观理解生成的代理策略来促进人类互动。作为概念验证,我们在一个欺骗性有限的简单、完全离散的情况下展示了生成器的能力。代理在三种不同的制度下进行训练(一次性、脚手架式、互动),结果表明后两种情况在泛化能力方面优于直接训练。事实上,根据泛化指标、训练制度和算法的组合,中位增益范围从50%到100%,通过互动训练实现了最佳性能,从而展示了可控的人类参与的基准生成器的好处。

更新时间: 2024-11-20 06:47:29

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2411.13072v1

Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine

The remarkable capabilities of Large Language Models (LLMs) make them increasingly compelling for adoption in real-world healthcare applications. However, the risks associated with using LLMs in medical applications have not been systematically characterized. We propose using five key principles for safe and trustworthy medical AI: Truthfulness, Resilience, Fairness, Robustness, and Privacy, along with ten specific aspects. Under this comprehensive framework, we introduce a novel MedGuard benchmark with 1,000 expert-verified questions. Our evaluation of 11 commonly used LLMs shows that the current language models, regardless of their safety alignment mechanisms, generally perform poorly on most of our benchmarks, particularly when compared to the high performance of human physicians. Although recent reports indicate that advanced LLMs like ChatGPT can match or even exceed human performance in various medical tasks, this study underscores a significant safety gap, highlighting the crucial need for human oversight and the implementation of AI safety guardrails.

Updated: 2024-11-20 06:34:32

标题: 确保安全和信任:分析大型语言模型在医学领域的风险

摘要: 大型语言模型(LLMs)的显著功能使它们在实际医疗应用中越来越具吸引力。然而,在医疗应用中使用LLMs所带来的风险尚未得到系统性的表征。我们提出使用五个关键原则来确保医疗人工智能的安全可靠性:真实性、弹性、公平性、健壮性和隐私,以及十个具体方面。在这个全面的框架下,我们引入了一个新的MedGuard基准,包含1,000个专家验证的问题。我们对11个常用LLMs的评估表明,当前的语言模型,无论其安全对齐机制如何,通常在大部分基准上表现不佳,特别是与人类医生的高性能相比。尽管最近的报告表明像ChatGPT这样的先进LLMs可以在各种医疗任务中达到或甚至超越人类表现,但这项研究强调了一个显著的安全差距,强调了人类监督和实施人工智能安全防护栏的关键需求。

更新时间: 2024-11-20 06:34:32

领域: cs.CL,cs.AI,cs.CY

下载: http://arxiv.org/abs/2411.14487v1

Towards Data Valuation via Asymmetric Data Shapley

As data emerges as a vital driver of technological and economic advancements, a key challenge is accurately quantifying its value in algorithmic decision-making. The Shapley value, a well-established concept from cooperative game theory, has been widely adopted to assess the contribution of individual data sources in supervised machine learning. However, its symmetry axiom assumes all players in the cooperative game are homogeneous, which overlooks the complex structures and dependencies present in real-world datasets. To address this limitation, we extend the traditional data Shapley framework to asymmetric data Shapley, making it flexible enough to incorporate inherent structures within the datasets for structure-aware data valuation. We also introduce an efficient $k$-nearest neighbor-based algorithm for its exact computation. We demonstrate the practical applicability of our framework across various machine learning tasks and data market contexts. The code is available at: https://github.com/xzheng01/Asymmetric-Data-Shapley.
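The defining change relative to the symmetric Shapley value is that marginal contributions are averaged only over orderings consistent with a known precedence structure. The paper's estimator is an exact KNN-based one; the Monte Carlo sketch below is only meant to show the asymmetry (the sampler draws a linear extension of the precedence relation, not exactly uniformly):

    import random

    def sample_consistent_permutation(points, precedes):
        remaining, order = set(points), []
        while remaining:
            ready = [i for i in remaining
                     if not any(precedes(j, i) for j in remaining if j != i)]
            nxt = random.choice(ready)
            order.append(nxt)
            remaining.remove(nxt)
        return order

    def asymmetric_shapley(points, utility, precedes, n_samples=1000):
        # utility: tuple of indices -> model performance with that data.
        values = {i: 0.0 for i in points}
        for _ in range(n_samples):
            perm = sample_consistent_permutation(points, precedes)
            coalition, prev = [], utility(())
            for i in perm:
                coalition.append(i)
                cur = utility(tuple(coalition))
                values[i] += (cur - prev) / n_samples
                prev = cur
        return values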

Updated: 2024-11-20 06:27:46

标题: 朝向通过非对称数据Shapley方法进行数据估值

摘要: 随着数据在技术和经济发展中的重要驱动作用日益凸显,一个关键挑战是准确量化其在算法决策中的价值。Shapley值是合作博弈论中一个成熟的概念,已被广泛采用来评估监督机器学习中个体数据源的贡献。然而,其对称性公理假定合作博弈中的所有玩家都是同质的,而忽视了真实世界数据集中存在的复杂结构和依赖关系。为了解决这一限制,我们将传统数据Shapley框架扩展到非对称数据Shapley,使其足够灵活以融入数据集中的固有结构,以便进行结构感知的数据估值。我们还引入了一种基于$k$-最近邻的高效算法来精确计算其值。我们展示了我们的框架在各种机器学习任务和数据市场环境中的实际适用性。代码可在以下链接找到:https://github.com/xzheng01/Asymmetric-Data-Shapley.

更新时间: 2024-11-20 06:27:46

领域: cs.GT,cs.LG

下载: http://arxiv.org/abs/2411.00388v2

Branches, Assemble! Multi-Branch Cooperation Network for Large-Scale Click-Through Rate Prediction at Taobao

Existing click-through rate (CTR) prediction works have studied the role of feature interaction through a variety of techniques. Each interaction technique exhibits its own strength, and solely using one type could constrain the model's capability to capture the complex feature relationships, especially for industrial large-scale data with enormous users and items. Recent research shows that effective CTR models often combine an MLP network with a dedicated feature interaction network in a two-branch parallel structure. However, the interplay and cooperative dynamics between different streams or branches remain under-researched. In this work, we introduce a novel Multi-Branch Cooperation Network (MBCnet) which enables multiple branch networks to collaborate with each other for better complex feature interaction modeling. Specifically, MBCnet consists of three branches: the Expert-based Feature Grouping and Crossing (EFGC) branch that promotes the model's memorization ability of specific feature fields, the low rank Cross Net branch and Deep branch to enhance both explicit and implicit feature crossing for improved generalization. Among branches, a novel cooperation scheme is proposed based on two principles: branch co-teaching and moderate differentiation. Branch co-teaching encourages well-learned branches to support poorly-learned ones on specific training samples. Moderate differentiation advocates branches to maintain a reasonable level of difference in their feature representations. The cooperation strategy improves learning through mutual knowledge sharing via co-teaching and boosts the discovery of diverse feature interactions across branches. Extensive experiments on large-scale industrial datasets and online A/B test demonstrate MBCnet's superior performance, delivering a 0.09 point increase in CTR, 1.49% growth in deals, and 1.62% rise in GMV. Core codes will be released soon.
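One plausible reading of branch co-teaching, sketched for a two-branch case: on each sample, the branch with the lower supervised loss supplies a detached soft target for the other branch (the exact criterion and loss form in the paper may differ):

    import torch
    import torch.nn.functional as F

    def co_teaching_loss(logits_a, logits_b, labels):
        # labels: float tensor of clicks in {0, 1}, same shape as the logits.
        ce_a = F.binary_cross_entropy_with_logits(logits_a, labels, reduction="none")
        ce_b = F.binary_cross_entropy_with_logits(logits_b, labels, reduction="none")
        a_teaches = (ce_a < ce_b).float()          # per-sample: who learned better?
        p_a, p_b = torch.sigmoid(logits_a), torch.sigmoid(logits_b)
        b_from_a = F.binary_cross_entropy(p_b, p_a.detach(), reduction="none")
        a_from_b = F.binary_cross_entropy(p_a, p_b.detach(), reduction="none")
        return (a_teaches * b_from_a + (1 - a_teaches) * a_from_b).mean()

Moderate differentiation would add an opposing term that keeps the branches' feature representations from collapsing onto each other.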

Updated: 2024-11-20 06:10:06

标题: 分支,集结起来!淘宝大规模点击率预测的多分支合作网络

摘要: 现有的点击率(CTR)预测工作已经通过各种技术研究了特征交互的作用。每种交互技术都有其独特的优势,仅仅使用一种类型可能会限制模型捕捉复杂特征关系的能力,尤其是对于拥有庞大用户和项目的工业大规模数据。最近的研究表明,有效的CTR模型通常将MLP网络与专门的特征交互网络结合在一个双分支并行结构中。然而,不同流或分支之间的相互作用和合作动态仍未得到充分研究。在这项工作中,我们引入了一种新颖的多分支合作网络(MBCnet),它可以使多个分支网络协作,以实现更好的复杂特征交互建模。具体来说,MBCnet包括三个分支:基于专家的特征分组和交叉(EFGC)分支促进模型对特定特征字段的记忆能力,低秩交叉网络分支和深度分支增强显式和隐式特征交叉,以改善泛化能力。在分支之间,提出了一种基于两个原则的新型合作方案:分支共同教学和适度差异化。分支共同教学鼓励学习良好的分支在特定训练样本上支持学习不良的分支。适度差异化倡导分支保持合理水平的特征表示差异。合作策略通过共同教学实现相互知识共享以改善学习,并促进跨分支发现多样化的特征交互。在大规模工业数据集和在线A/B测试上进行的大量实验表明,MBCnet表现出优越的性能,点击率提高了0.09个百分点,交易增长了1.49%,GMV增长了1.62%。核心代码将很快发布。

更新时间: 2024-11-20 06:10:06

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2411.13057v1

AssetHarvester: A Static Analysis Tool for Detecting Secret-Asset Pairs in Software Artifacts

GitGuardian monitored secrets exposure in public GitHub repositories and reported that developers leaked over 12 million secrets (database and other credentials) in 2023, indicating a 113% surge from 2021. Despite the availability of secret detection tools, developers ignore the tools' reported warnings because of false positives (25%-99%). However, each secret protects assets of different values accessible through asset identifiers (a DNS name and a public or private IP address). The asset information for a secret can aid developers in filtering false positives and prioritizing secret removal from the source code. However, existing secret detection tools do not provide the asset information, thus forcing developers to filter secrets by looking only at the secret value or to find the assets manually for each reported secret. The goal of our study is to aid software practitioners in prioritizing secrets removal by providing the assets information protected by the secrets through our novel static analysis tool. We present AssetHarvester, a static analysis tool to detect secret-asset pairs in a repository. Since the location of the asset can be distant from where the secret is defined, we investigated secret-asset co-location patterns and found four patterns. To identify the secret-asset pairs of the four patterns, we utilized three approaches (pattern matching, data flow analysis, and fast-approximation heuristics). We curated a benchmark of 1,791 secret-asset pairs of four database types extracted from 188 public GitHub repositories to evaluate the performance of AssetHarvester. AssetHarvester demonstrates precision of 97%, recall of 90%, and F1-score of 94% in detecting secret-asset pairs. Our findings indicate that data flow analysis employed in AssetHarvester detects secret-asset pairs with 0% false positives and aids in improving the recall of secret detection tools.
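The simplest of the four co-location patterns, a secret and an asset identifier defined in the same file, can be approximated with pattern matching alone; the toy sketch below pairs each matched secret with the nearest asset and stands in for AssetHarvester's far more complete rule set and data-flow analysis:

    import re

    SECRET_RE = re.compile(r'(?:password|passwd|pwd)\s*[=:]\s*["\']([^"\']+)["\']', re.I)
    ASSET_RE = re.compile(r'(?:host|server)\s*[=:]\s*["\']([\w.\-]+)["\']', re.I)

    def pair_in_file(source: str):
        secrets = [(m.start(), m.group(1)) for m in SECRET_RE.finditer(source)]
        assets = [(m.start(), m.group(1)) for m in ASSET_RE.finditer(source)]
        pairs = []
        for s_pos, s in secrets:
            if assets:
                # Nearest asset identifier in the file is a crude proxy
                # for "protects this asset".
                a = min(assets, key=lambda t: abs(t[0] - s_pos))[1]
                pairs.append((s, a))
        return pairs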

Updated: 2024-11-20 06:06:15

标题: AssetHarvester:一种用于检测软件制品中秘密-资产对的静态分析工具

摘要: GitGuardian监测了公共GitHub存储库中的秘密泄露,并报告称开发人员在2023年泄露了超过1200万个秘密(数据库和其他凭据),比2021年增长了113%。尽管存在秘密检测工具,开发人员忽略了工具报告的警告,因为存在误报(25%-99%)。然而,每个秘密都保护着通过资产标识符(DNS名称和公共或私有IP地址)访问的不同价值的资产。秘密的资产信息可以帮助开发人员过滤误报,并优先从源代码中删除秘密。然而,现有的秘密检测工具并未提供资产信息,因此给开发人员在仅查看秘密值或为每个报告的秘密手动查找资产带来了困难。我们研究的目标是通过我们的新颖静态分析工具为软件从业者提供受秘密保护的资产信息,以帮助其优先考虑删除秘密。我们提出了AssetHarvester,这是一个用于检测存储库中秘密-资产对的静态分析工具。由于资产的位置可能与定义秘密的位置相距甚远,我们调查了秘密-资产共位模式,并发现了四种模式。为了识别这四种模式的秘密-资产对,我们利用了三种方法(模式匹配、数据流分析和快速逼近启发式)。我们精选了来自188个公共GitHub存储库中提取的四种数据库类型的1,791个秘密-资产对的基准,以评估AssetHarvester的性能。AssetHarvester在检测秘密-资产对方面表现出97%的精度、90%的召回率和94%的F1分数。我们的发现表明,AssetHarvester中使用的数据流分析可以检测到秘密-资产对,而且没有误报,有助于提高秘密检测工具的召回率。

更新时间: 2024-11-20 06:06:15

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2403.19072v2

Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training

Dramatic increases in the capabilities of neural network models in recent years are driven by scaling model size, training data, and corresponding computational resources. To develop the exceedingly large networks required in modern applications, such as large language models (LLMs), model training is distributed across tens of thousands of hardware accelerators (e.g. GPUs), requiring orchestration of computation and communication across large computing clusters. In this work, we demonstrate that careful consideration of hardware configuration and parallelization strategy is critical for effective (i.e. compute- and cost-efficient) scaling of model size, training data, and total computation. We conduct an extensive empirical study of the performance of large-scale LLM training workloads across model size, hardware configurations, and distributed parallelization strategies. We demonstrate that: (1) beyond certain scales, the overhead incurred by certain distributed communication strategies means that parallelization strategies previously thought to be sub-optimal in fact become preferable; and (2) scaling the total number of accelerators for large model training quickly yields diminishing returns even when hardware and parallelization strategies are properly optimized, implying poor marginal performance per additional unit of power or GPU-hour.

Updated: 2024-11-20 06:05:11

标题: 硬件扩展趋势与大规模分布式训练中的收益递减

摘要: 近年来,神经网络模型能力的显著增长主要是由于模型大小、训练数据以及相应的计算资源的扩展。为了开发现代应用程序中所需的异常庞大网络,如大型语言模型(LLMs),模型训练被分布在数万个硬件加速器(例如GPU)上,需要在大型计算集群中进行计算和通信的协调。在这项工作中,我们展示了对硬件配置和并行化策略进行仔细考虑对于有效(即计算和成本高效)扩展模型大小、训练数据和总计算是至关重要的。我们进行了一项关于大规模LLM训练工作负载在模型大小、硬件配置和分布式并行化策略上性能的广泛经验研究。我们证明了:(1)在一定规模以上,由于某些分布式通信策略带来的开销,以前被认为不够优化的并行化策略实际上更可取;(2)对于大型模型训练,即使在硬件和并行化策略得到适当优化的情况下,扩展加速器的总数很快会带来收益递减,这意味着每增加一个单位功率或GPU小时的边际性能较差。

更新时间: 2024-11-20 06:05:11

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2411.13055v1

Demystifying RCE Vulnerabilities in LLM-Integrated Apps

LLMs show promise in transforming software development, with a growing interest in integrating them into more intelligent apps. Frameworks like LangChain aid LLM-integrated app development, offering code execution utility/APIs for custom actions. However, these capabilities theoretically introduce Remote Code Execution (RCE) vulnerabilities, enabling remote code execution through prompt injections. No prior research has systematically investigated these frameworks' RCE vulnerabilities, their impact on applications, or the consequences of exploitation, leaving a substantial research gap in this field. In this study, we propose LLMSmith to detect, validate and exploit the RCE vulnerabilities in LLM-integrated frameworks and apps. To achieve this goal, we develop two novel techniques, including 1) a lightweight static analysis to examine LLM integration mechanisms, and construct call chains to identify RCE vulnerabilities in frameworks; 2) a systematical prompt-based exploitation method to verify and exploit the found vulnerabilities in LLM-integrated apps. This technique involves various strategies to control LLM outputs, trigger RCE vulnerabilities and launch subsequent attacks. Our research has uncovered a total of 20 vulnerabilities in 11 LLM-integrated frameworks, comprising 19 RCE vulnerabilities and 1 arbitrary file read/write vulnerability. Of these, 17 have been confirmed by the framework developers, with 11 vulnerabilities being assigned CVE IDs. For the 51 apps potentially affected by RCE, we successfully executed attacks on 17 apps, 16 of which are vulnerable to RCE and 1 to SQL injection. Furthermore, we conduct a comprehensive analysis of these vulnerabilities and construct practical attacks to demonstrate the hazards in reality. Last, we propose several mitigation measures for both framework and app developers to counteract such attacks.

Updated: 2024-11-20 06:01:23

标题: 揭秘LLM集成应用中的RCE漏洞

摘要: LLMs在改变软件开发方面表现出很大潜力,越来越多的人对将它们整合到更智能的应用程序中感兴趣。像LangChain这样的框架有助于LLM整合应用程序的开发,为自定义操作提供代码执行实用程序/API。然而,这些功能理论上引入了远程代码执行(RCE)漏洞,通过提示注入实现远程代码执行。迄今为止,没有先前的研究系统地调查这些框架的RCE漏洞或它们对应用程序和利用后果的影响。因此,在这一领域存在着巨大的研究空白。在本研究中,我们提出了LLMSmith来检测、验证和利用LLM整合框架和应用程序中的RCE漏洞。为实现这一目标,我们开发了两种新技术,包括1)轻量级静态分析来检查LLM整合机制,并构建调用链以识别框架中的RCE漏洞;2)系统性的基于提示的利用方法来验证和利用在LLM整合应用程序中找到的漏洞。这种技术涉及各种策略,以控制LLM输出,触发RCE漏洞并发动后续攻击。我们的研究发现了11个LLM整合框架中的20个漏洞,其中包括19个RCE漏洞和1个任意文件读/写漏洞。其中,17个漏洞已被框架开发人员确认,其中11个漏洞被分配了CVE ID。对于可能受到RCE影响的51个应用程序,我们成功地对17个应用程序进行了攻击,其中16个易受RCE攻击,1个易受SQL注入攻击。此外,我们对这些漏洞进行了全面分析,并构建了实际攻击来展示现实中的危害。最后,我们提出了一些缓解措施,供框架和应用程序开发人员应对此类攻击。

更新时间: 2024-11-20 06:01:23

领域: cs.CR

下载: http://arxiv.org/abs/2309.02926v3

Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond

This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions. First, we formulate GDRO as a stochastic convex-concave saddle-point problem, which is then solved by stochastic mirror descent (SMD) with $m$ samples in each iteration, and attain a nearly optimal sample complexity. To reduce the number of samples required in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts SMD and the other executes an online algorithm for non-oblivious multi-armed bandits, maintaining the same sample complexity. Next, we extend GDRO to address scenarios involving imbalanced data and heterogeneous distributions. In the first scenario, we introduce a weighted variant of GDRO, enabling distribution-dependent convergence rates that rely on the number of samples from each distribution. We design two strategies to meet the sample budget: one integrates non-uniform sampling into SMD, and the other employs the stochastic mirror-prox algorithm with mini-batches, both of which deliver faster rates for distributions with more samples. In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of outlier distributions. Similar to the case of vanilla GDRO, we develop two stochastic approaches: one uses $m$ samples per iteration via SMD, and the other consumes $k$ samples per iteration through an online algorithm for non-oblivious combinatorial semi-bandits.
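For intuition, one SMD round for the saddle-point formulation $\min_w \max_{q \in \Delta_m} \sum_i q_i R_i(w)$ combines a Euclidean step on the model with an entropic (multiplicative-weights) step on the group weights; the step sizes and shapes below are illustrative:

    import numpy as np

    def gdro_smd_round(w, q, sample_losses, sample_grads, eta_w=0.01, eta_q=0.1):
        # sample_losses: (m,) one stochastic loss per distribution.
        # sample_grads:  (m, dim) one stochastic gradient per distribution.
        w = w - eta_w * (q[:, None] * sample_grads).sum(axis=0)   # primal descent
        q = q * np.exp(eta_q * sample_losses)                     # dual ascent
        return w, q / q.sum()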

Updated: 2024-11-20 05:58:10

标题: 随机逼近方法在群组分布鲁棒优化及更多领域的应用

摘要: 本文研究了群体分布鲁棒优化(GDRO),旨在学习一个在$m$个不同分布上表现良好的模型。首先,我们将GDRO形式化为一个随机凸凹鞍点问题,然后通过每次迭代中使用$m$个样本的随机镜像下降(SMD)来解决,并获得几乎最佳的样本复杂度。为了将每轮所需的样本数量从$m$减少到1,我们将GDRO构建为一个双人游戏,其中一名玩家进行SMD,另一名执行非遗忘多臂老虎机的在线算法,保持相同的样本复杂度。接下来,我们扩展GDRO以解决涉及不平衡数据和异构分布的情况。在第一种情况下,我们引入了GDRO的加权变体,实现依赖于每个分布的样本数量的分布相关收敛速度。我们设计了两种策略来满足样本预算:一种将非均匀采样整合到SMD中,另一种使用带有小批量的随机镜像-近端算法,这两种方法都为具有更多样本的分布提供更快的速率。在第二种情况下,我们建议优化平均top-$k$风险而不是最大风险,从而减轻异常分布的影响。与普通GDRO的情况类似,我们开发了两种随机方法:一种通过SMD每次迭代使用$m$个样本,另一种通过非遗忘组合半老虎机的在线算法每次迭代消耗$k$个样本。

更新时间: 2024-11-20 05:58:10

领域: cs.LG

下载: http://arxiv.org/abs/2302.09267v5

MEGL: Multimodal Explanation-Guided Learning

Explaining the decision-making processes of Artificial Intelligence (AI) models is crucial for addressing their "black box" nature, particularly in tasks like image classification. Traditional eXplainable AI (XAI) methods typically rely on unimodal explanations, either visual or textual, each with inherent limitations. Visual explanations highlight key regions but often lack rationale, while textual explanations provide context without spatial grounding. Further, both explanation types can be inconsistent or incomplete, limiting their reliability. To address these challenges, we propose a novel Multimodal Explanation-Guided Learning (MEGL) framework that leverages both visual and textual explanations to enhance model interpretability and improve classification performance. Our Saliency-Driven Textual Grounding (SDTG) approach integrates spatial information from visual explanations into textual rationales, providing spatially grounded and contextually rich explanations. Additionally, we introduce Textual Supervision on Visual Explanations to align visual explanations with textual rationales, even in cases where ground truth visual annotations are missing. A Visual Explanation Distribution Consistency loss further reinforces visual coherence by aligning the generated visual explanations with dataset-level patterns, enabling the model to effectively learn from incomplete multimodal supervision. We validate MEGL on two new datasets, Object-ME and Action-ME, for image classification with multimodal explanations. Experimental results demonstrate that MEGL outperforms previous approaches in prediction accuracy and explanation quality across both visual and textual domains. Our code will be made available upon the acceptance of the paper.

Updated: 2024-11-20 05:57:00

标题: MEGL:多模态解释引导学习

摘要: 解释人工智能(AI)模型的决策过程对于解决它们的“黑匣子”性质至关重要,特别是在诸如图像分类等任务中。传统的可解释人工智能(XAI)方法通常依赖于单模态解释,即视觉或文本,每种方法都具有固有的局限性。视觉解释突出关键区域,但通常缺乏理由,而文本解释提供背景信息但缺乏空间基础。此外,这两种解释类型可能不一致或不完整,限制了它们的可靠性。为了解决这些挑战,我们提出了一种新颖的多模态解释引导学习(MEGL)框架,利用视觉和文本解释来增强模型的可解释性并改善分类性能。我们的基于显著性驱动的文本基础(SDTG)方法将视觉解释中的空间信息整合到文本理由中,提供具有空间基础和丰富语境的解释。此外,我们引入了对视觉解释的文本监督,以使视觉解释与文本理由保持一致,即使在缺少地面真实视觉注释的情况下也是如此。通过视觉解释分布一致性损失进一步强化视觉一致性,通过将生成的视觉解释与数据集级别的模式保持一致,使模型能够有效地学习不完整的多模态监督。我们在两个新数据集Object-ME和Action-ME上验证了MEGL,用于带有多模态解释的图像分类。实验结果表明,MEGL在预测准确性和解释质量方面优于先前的方法,涵盖了视觉和文本领域。我们的代码将在论文被接受后提供。

更新时间: 2024-11-20 05:57:00

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.13053v1

On-device Content-based Recommendation with Single-shot Embedding Pruning: A Cooperative Game Perspective

Content-based Recommender Systems (CRSs) play a crucial role in shaping user experiences in e-commerce, online advertising, and personalized recommendations. However, due to the vast amount of categorical features, the embedding tables used in CRS models pose a significant storage bottleneck for real-world deployment, especially on resource-constrained devices. To address this problem, various embedding pruning methods have been proposed, but most existing ones require expensive retraining steps for each target parameter budget, leading to enormous computation costs. In reality, this computation cost is a major hurdle in real-world applications with diverse storage requirements, such as federated learning and streaming settings. In this paper, we propose Shapley Value-guided Embedding Reduction (Shaver) as our response. With Shaver, we view the problem from a cooperative game perspective, and quantify each embedding parameter's contribution with Shapley values to facilitate contribution-based parameter pruning. To address the inherently high computation costs of Shapley values, we propose an efficient and unbiased method to estimate Shapley values of a CRS's embedding parameters. Moreover, in the pruning stage, we put forward a field-aware codebook to mitigate the information loss in the traditional zero-out treatment. Through extensive experiments on three real-world datasets, Shaver has demonstrated competitive performance with lightweight recommendation models across various parameter budgets. The source code is available at https://anonymous.4open.science/r/shaver-E808

Updated: 2024-11-20 05:56:31

标题: 在设备上的基于内容的推荐与单次嵌入修剪:协作游戏视角

摘要: 内容为基础的推荐系统(CRSs)在电子商务、在线广告和个性化推荐中发挥着关键作用。然而,由于大量的分类特征,CRS模型中使用的嵌入表在现实世界的部署中构成了一个显著的存储瓶颈,特别是在资源受限的设备上。为解决这一问题,提出了各种嵌入剪枝方法,但大多数现有方法需要针对每个目标参数预算进行昂贵的重新训练步骤,导致巨大的计算成本。事实上,在具有不同存储需求的真实世界应用中,如联合学习和流媒体设置,这种计算成本是一个重要障碍。在本文中,我们提出了以Shapley Value为导向的嵌入降维(Shaver)作为我们的回应。通过Shaver,我们从合作博弈的角度看待问题,并利用Shapley值量化每个嵌入参数的贡献,以促进基于贡献的参数剪枝。为了解决Shapley值固有的高计算成本,我们提出了一种有效且无偏的方法来估计CRS的嵌入参数的Shapley值。此外,在剪枝阶段,我们提出了一个领域感知的码书,以减轻传统的零值处理中的信息损失。通过对三个真实世界数据集的广泛实验,Shaver在各种参数预算下表现出与轻量级推荐模型相竞争的性能。源代码可在https://anonymous.4open.science/r/shaver-E808获取。

更新时间: 2024-11-20 05:56:31

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2411.13052v1

Universal Online Convex Optimization Meets Second-order Bounds

Recently, several universal methods have been proposed for online convex optimization, and attain minimax rates for multiple types of convex functions simultaneously. However, they need to design and optimize one surrogate loss for each type of functions, making it difficult to exploit the structure of the problem and utilize existing algorithms. In this paper, we propose a simple strategy for universal online convex optimization, which avoids these limitations. The key idea is to construct a set of experts to process the original online functions, and deploy a meta-algorithm over the linearized losses to aggregate predictions from experts. Specifically, the meta-algorithm is required to yield a second-order bound with excess losses, so that it can leverage strong convexity and exponential concavity to control the meta-regret. In this way, our strategy inherits the theoretical guarantee of any expert designed for strongly convex functions and exponentially concave functions, up to a double logarithmic factor. As a result, we can plug in off-the-shelf online solvers as black-box experts to deliver problem-dependent regret bounds. For general convex functions, it maintains the minimax optimality and also achieves a small-loss bound. Furthermore, we extend our universal strategy to online composite optimization, where the loss function comprises a time-varying function and a fixed regularizer. To deal with the composite loss functions, we employ a meta-algorithm based on the optimistic online learning framework, which not only possesses a second-order bound, but also can utilize estimations for upcoming loss functions. With appropriate configurations, we demonstrate that the additional regularizer does not contribute to the meta-regret, thus maintaining the universality in the composite setting.

Updated: 2024-11-20 05:53:38

标题: 普适的在线凸优化遇见二阶界限

摘要: 最近,已经提出了几种通用的在线凸优化方法,并且同时实现了多种凸函数的极小极大速率。然而,它们需要为每种函数设计和优化一个替代损失,这使得很难利用问题的结构并利用现有的算法。在本文中,我们提出了一种简单的通用在线凸优化策略,避免了这些限制。关键思想是构建一组专家来处理原始的在线函数,并部署一个元算法来整合专家的预测。具体来说,元算法需要产生一个带有超额损失的二阶界限,以便利用强凸性和指数凹性来控制元后悔。通过这种方式,我们的策略继承了为强凸函数和指数凹函数设计的任何专家的理论保证,最多有一个双对数因子。因此,我们可以将现成的在线求解器作为黑盒专家插入,以提供与问题相关的后悔界限。对于一般的凸函数,它保持了极小极大的最优性,并且还实现了一个小损失界限。此外,我们将我们的通用策略扩展到在线复合优化,其中损失函数包括一个时变函数和一个固定的正则化器。为了处理复合损失函数,我们采用基于乐观在线学习框架的元算法,它不仅具有一个二阶界限,而且还可以利用对未来损失函数的估计。通过适当的配置,我们证明了额外的正则化器不会对元后悔有贡献,从而在复合设置中保持通用性。

更新时间: 2024-11-20 05:53:38

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2105.03681v3

Bounding-box Watermarking: Defense against Model Extraction Attacks on Object Detectors

Deep neural networks (DNNs) deployed in a cloud often allow users to query models via the APIs. However, these APIs expose the models to model extraction attacks (MEAs). In this attack, the attacker attempts to duplicate the target model by abusing the responses from the API. Backdoor-based DNN watermarking is known as a promising defense against MEAs, wherein the defender injects a backdoor into extracted models via API responses. The backdoor is used as a watermark of the model; if a suspicious model has the watermark (i.e., backdoor), it is verified as an extracted model. This work focuses on object detection (OD) models. Existing backdoor attacks on OD models are not applicable for model watermarking as the defense against MEAs on a realistic threat model. Our proposed approach involves inserting a backdoor into extracted models via APIs by stealthily modifying the bounding-boxes (BBs) of objects detected in queries while keeping the OD capability. In our experiments on three OD datasets, the proposed approach succeeded in identifying the extracted models with 100% accuracy in a wide variety of experimental scenarios.
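A toy sketch of the response-side defense: on a keyed subset of queries, shift the returned boxes by a small deterministic offset so an extracted model inherits a verifiable bias while detection utility is preserved. The trigger rule and offsets below are illustrative, not the paper's algorithm:

    import hashlib

    def watermark_boxes(image_id: str, boxes, secret_key: str, shift=2):
        # boxes: list of (x1, y1, x2, y2) pixel coordinates from the detector.
        digest = hashlib.sha256((secret_key + image_id).encode()).digest()
        if digest[0] % 4 != 0:              # watermark only a fraction of queries
            return boxes
        dx = shift if digest[1] % 2 == 0 else -shift
        dy = shift if digest[2] % 2 == 0 else -shift
        # Small shifts keep IoU with the true objects high, preserving OD quality.
        return [(x1 + dx, y1 + dy, x2 + dx, y2 + dy)
                for (x1, y1, x2, y2) in boxes]

Verification then amounts to querying a suspect model on trigger images and testing for the same systematic offset.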

Updated: 2024-11-20 05:40:20

标题: 边界框水印:防御目标检测器模型提取攻击

摘要: 在云中部署的深度神经网络(DNNs)通常允许用户通过API查询模型。然而,这些API使模型暴露于模型提取攻击(MEAs)。在这种攻击中,攻击者试图通过滥用API的响应来复制目标模型。基于后门的DNN水印技术被认为是一种有前途的防御手段,其中防御者通过API响应向提取的模型中注入后门。后门被用作模型的水印;如果一个可疑模型带有水印(即后门),则被验证为提取的模型。本文关注目标检测(OD)模型。现有的针对OD模型的后门攻击不适用于模型水印技术作为对现实威胁模型的MEA的防御。我们提出的方法涉及通过偷偷修改查询中检测到的对象的边界框(BBs)来向提取的模型中插入后门,同时保持OD功能。在三个OD数据集上的实验中,提出的方法在各种实验场景中以100%的准确率成功识别了提取的模型。

更新时间: 2024-11-20 05:40:20

领域: cs.CR,cs.CV

下载: http://arxiv.org/abs/2411.13047v1

Verification and Validation of Autonomous Systems

This paper describes how to effectively prevent software defects in autonomous vehicles, how to discover and correct defects when they are encountered, and how to create a higher level of assurance during the software product development phase. It also describes how to ensure high assurance of software reliability.

Updated: 2024-11-20 05:36:22

标题: 自主系统的验证与确认

摘要: 本文描述了如何在自动驾驶汽车中有效地预防软件缺陷,发现和纠正缺陷,以及在软件产品开发阶段建立更高水平的保证。它还描述了如何确保软件可靠性的高度保证。

更新时间: 2024-11-20 05:36:22

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2411.13614v1

Generating Visual Stimuli from EEG Recordings using Transformer-encoder based EEG encoder and GAN

In this study, we tackle a modern research challenge within the field of perceptual brain decoding, which revolves around synthesizing images from EEG signals using an adversarial deep learning framework. The specific objective is to recreate images belonging to various object categories by leveraging EEG recordings obtained while subjects view those images. To achieve this, we employ a Transformer-encoder based EEG encoder to produce EEG encodings, which serve as inputs to the generator component of the GAN network. Alongside the adversarial loss, we also incorporate perceptual loss to enhance the quality of the generated images.

Updated: 2024-11-20 05:35:03

标题: 利用基于Transformer编码器的脑电编码器和生成对抗网络从脑电记录中生成视觉刺激

摘要: 在这项研究中,我们解决了感知脑解码领域内的一个现代研究挑战,即利用对抗深度学习框架从脑电图信号中合成图像。具体目标是通过利用观察对象图像时获取的脑电图记录,重新创建属于各种物体类别的图像。为了实现这一目标,我们采用基于Transformer编码器的脑电图编码器来生成脑电图编码,作为GAN网络生成器组件的输入。除了对抗损失,我们还结合了感知损失来提高生成图像的质量。

更新时间: 2024-11-20 05:35:03

领域: cs.AI,cs.LG,eess.SP,q-bio.NC

下载: http://arxiv.org/abs/2402.10115v2

Explainable LLM-driven Multi-dimensional Distillation for E-Commerce Relevance Learning

Effective query-item relevance modeling is pivotal for enhancing user experience and safeguarding user satisfaction in e-commerce search systems. Recently, benefiting from the vast inherent knowledge, Large Language Model (LLM) approach demonstrates strong performance and long-tail generalization ability compared with previous neural-based specialized relevance learning methods. Though promising, current LLM-based methods encounter the following inadequacies in practice: First, the massive parameters and computational demands make it difficult to be deployed online. Second, distilling LLM models to online models is a feasible direction, but the LLM relevance modeling is a black box, and its rich intrinsic knowledge is difficult to extract and apply online. To improve the interpretability of LLM and boost the performance of online relevance models via LLM, we propose an Explainable LLM-driven Multi-dimensional Distillation framework for e-commerce relevance learning, which comprises two core components: (1) An Explainable LLM for relevance modeling (ELLM-rele), which decomposes the relevance learning into intermediate steps and models relevance learning as a Chain-of-Thought (CoT) reasoning, thereby enhancing both interpretability and performance of LLM. (2) A Multi-dimensional Knowledge Distillation (MKD) architecture that transfers the knowledge of ELLM-rele to current deployable interaction-based and representation-based student models from both the relevance score distribution and CoT reasoning aspects. Through distilling the probabilistic and CoT reasoning knowledge, MKD improves both the semantic interaction and long-tail generalization abilities of student models. Extensive offline evaluations and online experiments on Taobao search ad scene demonstrate that our proposed framework significantly enhances e-commerce relevance learning performance and user experience.

Updated: 2024-11-20 05:30:15

标题: 可解释的LLM驱动的多维蒸馏用于电子商务相关性学习

摘要: 高效的查询-项目相关性建模对于增强电子商务搜索系统中用户体验和维护用户满意度至关重要。最近,由于拥有广泛的内在知识,大型语言模型(LLM)方法相比于先前基于神经网络的专门相关性学习方法表现出强大的性能和长尾泛化能力。尽管有前景,但当前基于LLM的方法在实践中遇到以下不足之处:首先,庞大的参数和计算需求使其难以在线部署。其次,将LLM模型提炼为在线模型是一种可行的方向,但LLM相关性建模是一个黑盒子,其丰富的内在知识难以提取和在线应用。为了提高LLM的可解释性并通过LLM提升在线相关性模型的性能,我们提出了一个适用于电子商务相关性学习的可解释LLM驱动的多维提炼框架,包括两个核心组件:(1)用于相关性建模的可解释LLM(ELLM-rele),将相关性学习分解为中间步骤,并将相关性学习建模为一种“思维链”(CoT)推理,从而提高LLM的可解释性和性能。 (2)一个多维知识提炼(MKD)架构,从相关性分数分布和CoT推理方面将ELLM-rele的知识转移给当前可部署的基于交互和基于表示的学生模型。通过提炼概率和CoT推理知识,MKD提高了学生模型的语义交互和长尾泛化能力。对淘宝搜索广告场景的广泛离线评估和在线实验表明,我们提出的框架显著提升了电子商务相关性学习性能和用户体验。

更新时间: 2024-11-20 05:30:15

领域: cs.IR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2411.13045v1

Receiver-Centric Generative Semantic Communications

This paper investigates semantic communications between a transmitter and a receiver, where original data, such as videos of interest to the receiver, is stored at the transmitter. Although significant progress has been made in semantic communications, a fundamental design problem is that the semantic information is extracted based on certain criteria at the transmitter alone, without considering the receiver's specific information needs. As a result, critical information of primary concern to the receiver may be lost. In such cases, the semantic transmission becomes meaningless to the receiver, as all received information is irrelevant to its interests. To solve this problem, this paper presents a receiver-centric generative semantic communication system, where each transmission is initialized by the receiver. Specifically, the receiver first sends its request for the desired semantic information to the transmitter at the start of each transmission. Then, the transmitter extracts the required semantic information accordingly. A key challenge is how the transmitter understands the receiver's requests for semantic information and extracts the required semantic information in a reasonable and robust manner. We address this challenge by designing a well-structured framework and leveraging off-the-shelf generative AI products, such as GPT-4, along with several specialized tools for detection and estimation. Evaluation results demonstrate the feasibility and effectiveness of the proposed new semantic communication system.

Updated: 2024-11-20 05:13:36

标题: 接收者中心的生成语义通信

摘要: 本文研究了发射机和接收机之间的语义通信,其中接收机感兴趣的原始数据,例如视频,存储在发射机上。尽管在语义通信方面取得了重要进展,但一个基本的设计问题是,语义信息仅基于发射机上的某些标准提取,而没有考虑接收机的具体信息需求。因此,接收机最关注的关键信息可能会丢失。在这种情况下,语义传输对接收机而言变得毫无意义,因为所有接收到的信息都与其兴趣无关。为解决这一问题,本文提出了一个以接收机为中心的生成式语义通信系统,其中每次传输均由接收机初始化。具体而言,接收机首先在每次传输开始时向发射机发送其对所需语义信息的请求。然后,发射机相应地提取所需的语义信息。一个关键挑战是发射机如何理解接收机对语义信息的请求,并以合理而稳健的方式提取所需的语义信息。我们通过设计一个结构良好的框架,并利用诸如GPT-4之类的现成生成式人工智能产品,以及几种专门用于检测和估计的工具来解决这一挑战。评估结果证明了所提出的新型语义通信系统的可行性和有效性。

更新时间: 2024-11-20 05:13:36

领域: cs.IT,cs.LG,eess.SP,math.IT

下载: http://arxiv.org/abs/2411.03127v2

Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization

Estimating the homography between two images is crucial for mid- or high-level vision tasks, such as image stitching and fusion. However, using supervised learning methods is often challenging or costly due to the difficulty of collecting ground-truth data. In response, unsupervised learning approaches have emerged. Most early methods, though, assume that the given image pairs are from the same camera or have minor lighting differences. Consequently, while these methods perform effectively under such conditions, they generally fail when input image pairs come from different domains, referred to as multimodal image pairs. To address these limitations, we propose AltO, an unsupervised learning framework for estimating homography in multimodal image pairs. Our method employs a two-phase alternating optimization framework, similar to Expectation-Maximization (EM), where one phase reduces the geometry gap and the other addresses the modality gap. To handle these gaps, we use Barlow Twins loss for the modality gap and propose an extended version, Geometry Barlow Twins, for the geometry gap. As a result, we demonstrate that our method, AltO, can be trained on multimodal datasets without any ground-truth data. It not only outperforms other unsupervised methods but is also compatible with various architectures of homography estimators. The source code can be found at: https://github.com/songsang7/AltO
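For reference, the Barlow Twins objective used for the modality gap pushes the cross-correlation matrix of the two embeddings toward the identity; a standard sketch follows (the Geometry Barlow Twins extension is omitted, and the trade-off weight is illustrative):

    import torch

    def barlow_twins_loss(z_a, z_b, lam=5e-3):
        # z_a, z_b: (batch, dim) features from the two modalities.
        n, d = z_a.shape
        z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-9)
        z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-9)
        c = (z_a.T @ z_b) / n                        # (dim, dim) cross-correlation
        on_diag = (torch.diagonal(c) - 1).pow(2).sum()
        off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
        return on_diag + lam * off_diag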

Updated: 2024-11-20 04:56:19

标题: 无监督多模态图像对上的单应性估计通过交替优化

摘要: 估计两幅图像之间的单应性对于中高级视觉任务(如图像拼接和融合)至关重要。然而,使用监督学习方法通常具有挑战性或成本高昂,因为收集地面真实数据的困难。为此,出现了无监督学习方法。然而,大多数早期方法假定给定的图像对来自同一相机或具有轻微的光照差异。因此,虽然这些方法在这些条件下表现有效,但当输入图像对来自不同领域时,即所谓的多模态图像对时,它们通常会失败。为了解决这些限制,我们提出了AltO,一个用于估计多模态图像对中单应性的无监督学习框架。我们的方法采用了一个两阶段交替优化框架,类似于期望最大化(EM),其中一个阶段减少几何差距,另一个解决模态差距。为了处理这些差距,我们使用Barlow Twins损失来解决模态差距,并提出了一个扩展版本,Geometry Barlow Twins,来解决几何差距。结果,我们展示了我们的方法AltO可以在没有任何地面真实数据的情况下在多模态数据集上进行训练。它不仅优于其他无监督方法,而且与各种单应性估计器的架构兼容。源代码可在以下链接找到:https://github.com/songsang7/AltO

更新时间: 2024-11-20 04:56:19

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.13036v1

SparseDM: Toward Sparse Efficient Diffusion Models

Diffusion models have been extensively used in data generation tasks and are recognized as one of the best generative models. However, their time-consuming deployment, long inference time, and requirements on large memory limit their application on mobile devices. In this paper, we propose a method based on the improved Straight-Through Estimator to improve the deployment efficiency of diffusion models. Specifically, we add sparse masks to the Convolution and Linear layers in a pre-trained diffusion model, then design a progressive sparsity schedule for model training in the fine-tuning stage, and switch the inference mask on and off, which supports a flexible choice of sparsity during inference according to the FID and MACs requirements. Experiments on four datasets conducted on a state-of-the-art Transformer-based diffusion model demonstrate that our method reduces MACs by $50\%$ while increasing FID by only 1.5 on average. Under other MACs budgets, the FID is also 1 to 137 lower than that of other methods.
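A minimal sketch of a sparse-masked linear layer trained with a straight-through estimator: the forward pass applies a hard top-k mask over learnable scores, gradients pass straight through to the scores, and the mask can be switched off at inference. The progressive sparsity schedule is omitted and all names are illustrative, so this shows the generic mechanism rather than the paper's improved estimator:

    import torch
    import torch.nn as nn

    class MaskedLinear(nn.Module):
        def __init__(self, in_f, out_f, sparsity=0.5):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.02)
            self.score = nn.Parameter(torch.rand(out_f, in_f))
            self.sparsity = sparsity
            self.mask_on = True                 # switchable inference mask

        def forward(self, x):
            if not self.mask_on:
                return x @ self.weight.T
            k = max(1, int(self.score.numel() * self.sparsity))
            thresh = self.score.flatten().kthvalue(k).values
            hard = (self.score > thresh).float()
            # Straight-through: hard 0/1 mask forward, identity gradient back.
            mask = hard + self.score - self.score.detach()
            return x @ (self.weight * mask).T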

Updated: 2024-11-20 04:51:59

标题: SparseDM:朝向稀疏高效扩散模型

摘要: 扩散模型已广泛应用于数据生成任务,并被认为是最佳生成模型之一。然而,它们部署耗时长、推理时间长,且对大内存的要求限制了它们在移动设备上的应用。本文提出了一种基于改进的直通估计器的方法,以提高扩散模型的部署效率。具体地,我们在预训练的扩散模型中的卷积和线性层中添加了稀疏掩码,然后在微调阶段采用渐进式稀疏设计进行模型训练,并在推理过程中打开和关闭推理掩码,从而根据FID和MACs的需求在推理过程中灵活选择稀疏度。在一个基于最先进的Transformer的扩散模型上进行的四个数据集的实验表明,我们的方法将MACs降低了50%,同时平均仅将FID增加了1.5。在其他MACs条件下,FID也比其他方法低1至137。

更新时间: 2024-11-20 04:51:59

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.10445v3
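
The abstract describes masking Convolution and Linear weights and training the masks with an improved Straight-Through Estimator (STE). The following is a minimal PyTorch sketch of the generic STE masking idea only; the class names, the top-k magnitude criterion, and the fixed 50% sparsity are illustrative assumptions, not the paper's implementation:

    import torch

    class STEMask(torch.autograd.Function):
        """Binarize mask scores in the forward pass; pass gradients straight through."""
        @staticmethod
        def forward(ctx, scores, sparsity):
            k = max(1, int(scores.numel() * (1.0 - sparsity)))  # weights to keep
            threshold = torch.topk(scores.flatten(), k).values.min()
            return (scores >= threshold).float()

        @staticmethod
        def backward(ctx, grad_output):
            # Straight-through: treat the binarization as identity for gradients.
            return grad_output, None

    class SparseLinear(torch.nn.Linear):
        """Linear layer whose weight matrix is element-wise masked during training."""
        def __init__(self, in_features, out_features, sparsity=0.5):
            super().__init__(in_features, out_features)
            self.sparsity = sparsity  # could be scheduled progressively during fine-tuning
            self.scores = torch.nn.Parameter(self.weight.detach().abs().clone())

        def forward(self, x):
            mask = STEMask.apply(self.scores, self.sparsity)
            return torch.nn.functional.linear(x, self.weight * mask, self.bias)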

"It was 80% me, 20% AI": Seeking Authenticity in Co-Writing with Large Language Models

Given the rising proliferation and diversity of AI writing assistance tools, especially those powered by large language models (LLMs), both writers and readers may have concerns about the impact of these tools on the authenticity of writing work. We examine whether and how writers want to preserve their authentic voice when co-writing with AI tools and whether personalization of AI writing support could help achieve this goal. We conducted semi-structured interviews with 19 professional writers, during which they co-wrote with both personalized and non-personalized AI writing-support tools. We supplemented writers' perspectives with opinions from 30 avid readers about the written work co-produced with AI collected through an online survey. Our findings illuminate conceptions of authenticity in human-AI co-creation, which focus more on the process and experience of constructing creators' authentic selves. While writers reacted positively to personalized AI writing tools, they believed the form of personalization needs to target writers' growth and go beyond the phase of text production. Overall, readers' responses showed less concern about human-AI co-writing. Readers could not distinguish AI-assisted work, personalized or not, from writers' solo-written work and showed positive attitudes toward writers experimenting with new technology for creative writing.

Updated: 2024-11-20 04:42:32

标题: “80%是我,20%是AI”:在与大型语言模型共同创作中寻求真实性

摘要: 鉴于人工智能写作辅助工具的不断增多和多样化,特别是那些由大型语言模型(LLMs)驱动的工具,作家和读者可能会关注这些工具对写作作品真实性的影响。我们研究了作家在与人工智能工具合作写作时是否以及如何希望保留其真实的声音,以及个性化人工智能写作支持是否有助于实现这一目标。我们与19名专业作家进行了半结构化访谈,在访谈过程中,他们分别与个性化和非个性化的人工智能写作支持工具合作写作。通过在线调查收集了30位热心读者对与人工智能合作创作的作品的意见,进一步补充了作家的观点。我们的研究结果揭示了在人类与人工智能共同创作中真实性的概念,重点更多地放在构建创作者真实自我的过程和体验上。虽然作家对个性化的人工智能写作工具反应积极,但他们认为个性化的形式需要针对作家的成长,并超越文本制作阶段。总体而言,读者对人类与人工智能共同创作表现出较少的担忧。读者无法区分由人工智能辅助的作品,无论是否个性化,与作家单独创作的作品,并对作家尝试新技术进行创意写作表现出积极的态度。

更新时间: 2024-11-20 04:42:32

领域: cs.HC,cs.AI,cs.CY

下载: http://arxiv.org/abs/2411.13032v1

Probably Approximately Precision and Recall Learning

Precision and Recall are foundational metrics in machine learning where both accurate predictions and comprehensive coverage are essential, such as in recommender systems and multi-label learning. In these tasks, balancing precision (the proportion of relevant items among those predicted) and recall (the proportion of relevant items successfully predicted) is crucial. A key challenge is that one-sided feedback--where only positive examples are observed during training--is inherent in many practical problems. For instance, in recommender systems like YouTube, training data only consists of videos that a user has actively selected, while unselected items remain unseen. Despite this lack of negative feedback in training, avoiding undesirable recommendations at test time is essential. We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions, such as between users and items. This framework subsumes the classical binary and multi-class PAC learning models as well as multi-label learning with partial feedback, where only a single random correct label per example is observed, rather than all correct labels. Our work uncovers a rich statistical and algorithmic landscape, with nuanced boundaries on what can and cannot be learned. Notably, classical methods like Empirical Risk Minimization fail in this setting, even for simple hypothesis classes with only two hypotheses. To address these challenges, we develop novel algorithms that learn exclusively from positive data, effectively minimizing both precision and recall losses. Specifically, in the realizable setting, we design algorithms that achieve optimal sample complexity guarantees. In the agnostic case, we show that it is impossible to achieve additive error guarantees--as is standard in PAC learning--and instead obtain meaningful multiplicative approximations.

Updated: 2024-11-20 04:21:07

标题: 概率近似的精度与召回率学习

摘要: 精度和召回率是机器学习中的基础指标,在准确的预测和全面的覆盖都必不可少的场景中尤为重要,例如推荐系统和多标签学习。在这些任务中,平衡精度(预测中相关项目的比例)和召回率(成功预测的相关项目的比例)至关重要。一个关键挑战是许多实际问题中固有的单向反馈——在训练过程中只观察到正例。例如,在像YouTube这样的推荐系统中,训练数据只包含用户主动选择的视频,而未选择的项目则未被看到。尽管在训练中缺乏负面反馈,但在测试时避免不良推荐至关重要。 我们引入了一个PAC学习框架,其中每个假设由一个图表示,边表示正向交互,例如用户和项目之间的交互。该框架涵盖了经典的二分类和多分类PAC学习模型,以及带有部分反馈的多标签学习,其中每个示例只观察到一个随机正确的标签,而不是所有正确的标签。 我们的工作揭示了一个丰富的统计和算法景观,刻画了哪些可以学习、哪些不可学习的细微边界。值得注意的是,即使对于只有两个假设的简单假设类别,像经验风险最小化这样的经典方法在这种情况下也会失败。为了解决这些挑战,我们开发了仅从正例数据中学习的新算法,有效地同时最小化精度损失和召回率损失。具体来说,在可实现的情况下,我们设计了实现最优样本复杂性保证的算法。在不可知情况下,我们表明无法实现加性误差保证(这是PAC学习中的标准),而只能获得有意义的乘法近似。

更新时间: 2024-11-20 04:21:07

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2411.13029v1
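
For concreteness, with $Y$ the set of relevant items and $\hat{Y}$ the set of predicted items, the two quantities being balanced are the standard ones:

$$\mathrm{Precision} = \frac{|\hat{Y} \cap Y|}{|\hat{Y}|}, \qquad \mathrm{Recall} = \frac{|\hat{Y} \cap Y|}{|Y|}.$$

The one-sided-feedback difficulty is that training reveals samples from $Y$ only, yet the precision loss penalizes predictions that fall outside $Y$.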

A Theory for Compressibility of Graph Transformers for Transductive Learning

Transductive tasks on graphs differ fundamentally from typical supervised machine learning tasks, as the independent and identically distributed (i.i.d.) assumption does not hold among samples. Instead, all train/test/validation samples are present during training, making them more akin to a semi-supervised task. These differences make the analysis of the models substantially different from other models. Recently, Graph Transformers have significantly improved results on these datasets by overcoming long-range dependency problems. However, the quadratic complexity of full Transformers has driven the community to explore more efficient variants, such as those with sparser attention patterns. While the attention matrix has been extensively discussed, the hidden dimension or width of the network has received less attention. In this work, we establish some theoretical bounds on how and under what conditions the hidden dimension of these networks can be compressed. Our results apply to both sparse and dense variants of Graph Transformers.

Updated: 2024-11-20 04:20:17

标题: 面向转导学习的图Transformer可压缩性理论

摘要: 图上的转导性任务与典型的监督学习任务有根本的不同,因为样本之间不满足独立同分布(i.i.d.)假设。相反,在训练期间所有的训练/测试/验证样本都是存在的,使它们更类似于半监督任务。这些差异使得对模型的分析与其他模型大不相同。最近,图变换器通过克服长距离依赖问题在这些数据集上取得了显著的改进结果。然而,全变换器的二次复杂度驱使社区探索更高效的变体,比如具有更稀疏注意力模式的变体。虽然注意力矩阵已经广泛讨论,但网络的隐藏维度或宽度却受到较少关注。在这项工作中,我们建立了关于这些网络的隐藏维度如何以及在什么条件下可以被压缩的一些理论界限。我们的结果适用于图变换器的稀疏和密集变体。

更新时间: 2024-11-20 04:20:17

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2411.13028v1

The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz

This research introduces a novel evaluation framework designed to assess large language models' (LLMs) ability to acknowledge uncertainty on 675 fundamentally unsolvable problems. Using a curated dataset of graduate-level grand challenge questions with intentionally unknowable answers, we evaluated twelve state-of-the-art LLMs, including both open and closed-source models, on their propensity to admit ignorance rather than generate plausible but incorrect responses. The best models scored in 62-68% accuracy ranges for admitting the problem solution was unknown in fields ranging from biology to philosophy and mathematics. We observed an inverse relationship between problem difficulty and model accuracy, with GPT-4 demonstrating higher rates of uncertainty acknowledgment on more challenging problems (35.8%) compared to simpler ones (20.0%). This pattern indicates that models may be more prone to generate speculative answers when problems appear more tractable. The study also revealed significant variations across problem categories, with models showing difficulty in acknowledging uncertainty in invention and NP-hard problems while performing relatively better on philosophical and psychological challenges. These results contribute to the growing body of research on artificial general intelligence (AGI) assessment by highlighting the importance of uncertainty recognition as a critical component of future machine intelligence evaluation. This impossibility test thus extends previous theoretical frameworks for universal intelligence testing by providing empirical evidence of current limitations in LLMs' ability to recognize their own knowledge boundaries, suggesting new directions for improving model training architectures and evaluation approaches.

Updated: 2024-11-20 04:12:29

标题: 不可能的测试:一个2024年无法解决的数据集和一个通往AGI测验的机会

摘要: 这项研究介绍了一个新颖的评估框架,旨在评估大型语言模型(LLMs)在675个基本无法解决的问题上承认不确定性的能力。利用一个精心筛选的研究生级大挑战问题数据集,其中包含有意无法解答的问题,我们评估了十二种最先进的LLMs,包括开源和闭源模型,评估它们在承认无知而不是生成似是而非的答案方面的倾向。最佳模型在生物学、哲学和数学等领域承认问题解决方案未知的准确率范围为62-68%。我们观察到问题难度与模型准确性之间呈反比关系,GPT-4在更具挑战性的问题上表现出更高的不确定性承认率(35.8%)相比于更简单的问题(20.0%)。这种模式表明,当问题看起来更易解决时,模型可能更容易产生推测性答案。该研究还揭示了问题类别之间的显著差异,模型在发明和NP-hard问题上难以承认不确定性,但在哲学和心理挑战上表现相对较好。这些结果为人工通用智能(AGI)评估领域不断增长的研究作出了贡献,突显了承认不确定性作为未来机器智能评估的关键组成部分的重要性。这种不可能性测试通过提供当前LLMs识别自身知识边界能力的经验证据,扩展了先前的通用智能测试理论框架,为改进模型训练架构和评估方法提供了新方向。

更新时间: 2024-11-20 04:12:29

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2411.14486v1

Enhancing Transportation Cyber-Physical Systems Security: A Shift to Post-Quantum Cryptography

The rise of quantum computing threatens traditional cryptographic algorithms that secure Transportation Cyber-Physical Systems (TCPS). Shor's algorithm poses a significant threat to RSA and ECC, while Grover's algorithm reduces the security of symmetric encryption schemes, such as AES. The objective of this paper is to underscore the urgency of transitioning to post-quantum cryptography (PQC) to mitigate these risks in TCPS by analyzing the vulnerabilities of traditional cryptographic schemes and the applicability of standardized PQC schemes in TCPS. We analyzed vulnerabilities in traditional cryptography against quantum attacks and reviewed the applicability of NIST-standardized PQC schemes, including CRYSTALS-Kyber, CRYSTALS-Dilithium, and SPHINCS+, in TCPS. We conducted a case study to analyze the vulnerabilities of a TCPS application from the Architecture Reference for Cooperative and Intelligent Transportation (ARC-IT) service package, i.e., Electronic Toll Collection, leveraging the Microsoft Threat Modeling tool. This case study highlights the cryptographic vulnerabilities of a TCPS application and presents how PQC can effectively counter these threats. Additionally, we evaluated CRYSTALS-Kyber's performance across wired and wireless TCPS data communication scenarios. While CRYSTALS-Kyber proves effective in securing TCPS applications over high-bandwidth, low-latency Ethernet networks, our analysis highlights challenges in meeting the stringent latency requirements of safety-critical wireless applications within TCPS. Future research should focus on developing lightweight PQC solutions and hybrid schemes that integrate traditional and PQC algorithms, to enhance compatibility, scalability, and real-time performance, ensuring robust protection against emerging quantum threats in TCPS.

Updated: 2024-11-20 04:11:33

标题: 增强交通网络物理系统安全性:转向后量子密码学

摘要: 量子计算的崛起威胁了传统的加密算法,这些算法用来保护交通网络物理系统(TCPS)。肖尔算法对RSA和ECC构成重大威胁,而格罗弗算法降低了对称加密方案(如AES)的安全性。本文的目标是强调迅速过渡到后量子密码学(PQC)以减轻TCPS中这些风险,通过分析传统加密方案的漏洞以及标准化PQC方案在TCPS中的适用性。我们分析了传统密码学在量子攻击中的漏洞,并审查了NIST标准化的PQC方案,包括CRYSTALS-Kyber、CRYSTALS-Dilithium和SPHINCS+在TCPS中的适用性。我们进行了一个案例研究,分析了来自合作和智能交通架构参考(ARC-IT)服务包的TCPS应用的漏洞,即电子收费系统,并利用Microsoft威胁建模工具。这个案例研究突出了TCPS应用的加密漏洞,并展示了PQC如何有效地对抗这些威胁。此外,我们评估了CRYSTALS-Kyber在有线和无线TCPS数据通信场景中的性能。虽然CRYSTALS-Kyber在保护高带宽、低延迟以太网网络上的TCPS应用方面表现有效,但我们的分析突出了在TCPS中满足安全关键无线应用的严格延迟要求方面的挑战。未来的研究应该集中于开发轻量级的PQC解决方案和混合方案,以整合传统和PQC算法,以增强兼容性、可扩展性和实时性能,确保在TCPS中有效应对新兴的量子威胁。

更新时间: 2024-11-20 04:11:33

领域: cs.CR

下载: http://arxiv.org/abs/2411.13023v1

Corn Yield Prediction Model with Deep Neural Networks for Smallholder Farmer Decision Support System

Crop yield prediction has been modeled on the assumption that there is no interaction between weather and soil variables. However, this paper argues that an interaction exists, and it can be finely modelled using the Kendall Correlation coefficient. Given the nonlinearity of the interaction between weather and soil variables, a deep neural network regressor (DNNR) is carefully designed with consideration to the depth, number of neurons of the hidden layers, and the hyperparameters with their optimizations. Additionally, a new metric, the average of absolute root squared error (ARSE), is proposed to combine the strengths of root mean square error (RMSE) and mean absolute error (MAE). With the ARSE metric, the proposed DNNR(s), optimised random forest regressor (RFR) and the extreme gradient boosting regressor (XGBR) achieved impressively small yield errors, 0.0172 t/ha, and 0.0243 t/ha, 0.0001 t/ha, and 0.001 t/ha, respectively. However, with changes to the explanatory variables to ensure generalizability to unforeseen data, the DNNR(s) performed best. Further analysis reveals that a strong interaction does exist between weather and soil variables. Precisely, yield is observed to increase when precipitation is reduced and silt increased, and vice-versa. However, the degree of decrease or increase is not quantified in this paper. Contrary to existing yield models targeted towards agricultural policies and global food security, the goal of the proposed corn yield model is to empower the smallholder farmer to farm smartly and intelligently, thus the prediction model is integrated into a mobile application that includes education and a farmer-to-market access module.

Updated: 2024-11-20 03:55:04

标题: 使用深度神经网络的小农户决策支持系统玉米产量预测模型

摘要: 作物产量预测一直建立在天气和土壤变量之间没有相互作用的假设上。然而,本文认为存在相互作用,并且可以使用肯德尔相关系数进行精细建模。鉴于天气和土壤变量之间相互作用的非线性特性,精心设计了深度神经网络回归器(DNNR),考虑了隐藏层的深度、神经元数量以及超参数及其优化。此外,提出了一个新的指标,即平均绝对平方根误差(ARSE),以结合均方根误差(RMSE)和平均绝对误差(MAE)的优势。通过ARSE指标,提出的DNNR(s)、优化的随机森林回归器(RFR)和极限梯度提升回归器(XGBR)分别取得了令人印象深刻的小产量误差,分别为0.0172 t/ha、0.0243 t/ha、0.0001 t/ha和0.001 t/ha。然而,通过对解释变量进行更改以确保对未知数据的泛化能力,DNNR(s)表现最佳。进一步分析表明,天气和土壤变量之间存在强烈的相互作用。准确地说,当降水减少而淤泥增加时,产量会增加,反之亦然。然而,本文未对减少或增加的程度进行量化。与现有的针对农业政策和全球粮食安全的产量模型相反,提出的玉米产量模型的目标是赋予小农户智能种植,因此预测模型集成到一个包括教育和农民-市场接入模块的移动应用程序中。

更新时间: 2024-11-20 03:55:04

领域: cs.LG,cs.AI,cs.CY,cs.HC

下载: http://arxiv.org/abs/2401.03768v3
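
The Kendall correlation the abstract relies on is readily computed; a minimal Python example, where the paired weather/soil values are made up purely for illustration:

    from scipy.stats import kendalltau

    # Hypothetical paired samples; the values below are illustrative only.
    precipitation = [102.5, 98.1, 120.3, 87.6, 95.0, 110.2]
    silt_fraction = [0.21, 0.25, 0.18, 0.30, 0.26, 0.20]

    tau, p_value = kendalltau(precipitation, silt_fraction)  # rank correlation in [-1, 1]
    print(f"Kendall tau = {tau:.3f}, p-value = {p_value:.3f}")

kendalltau quantifies the monotone association between a weather variable and a soil variable, which is the interaction the paper feeds into its regressors.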

Training Physics-Driven Deep Learning Reconstruction without Raw Data Access for Equitable Fast MRI

Physics-driven deep learning (PD-DL) approaches have become popular for improved reconstruction of fast magnetic resonance imaging (MRI) scans. Even though PD-DL offers higher acceleration rates compared to existing clinical fast MRI techniques, their use has been limited outside specialized MRI centers. One impediment for their deployment is the difficulties with generalization to pathologies or population groups that are not well-represented in training sets. This has been noted in several studies, and fine-tuning on target populations to improve reconstruction has been suggested. However, current approaches for PD-DL training require access to raw k-space measurements, which is typically only available at specialized MRI centers that have research agreements for such data access. This is especially an issue for rural and underserved areas, where commercial MRI scanners only provide access to a final reconstructed image. To tackle these challenges, we propose Compressibility-inspired Unsupervised Learning via Parallel Imaging Fidelity (CUPID) for high-quality PD-DL training, using only routine clinical reconstructed images exported from an MRI scanner. CUPID evaluates the goodness of the output with a compressibility-based approach, while ensuring that the output stays consistent with the clinical parallel imaging reconstruction through well-designed perturbations. Our results show that CUPID achieves similar quality compared to well-established PD-DL training strategies that require raw k-space data access, while outperforming conventional compressed sensing (CS) and state-of-the-art generative methods. We also demonstrate its effectiveness in a zero-shot training setup for retrospectively and prospectively sub-sampled acquisitions, attesting to its minimal training burden.

Updated: 2024-11-20 03:53:41

标题: 无需原始数据访问的物理驱动深度学习重建训练,以实现公平的快速MRI

摘要: 物理驱动的深度学习(PD-DL)方法已经成为改善快速磁共振成像(MRI)扫描重建的流行选择。尽管与现有临床快速MRI技术相比,PD-DL提供了更高的加速率,但它们在专业MRI中心以外的使用受到限制。部署它们的一个障碍是难以泛化到训练集中代表性不足的病理或人群。几项研究已经指出了这一点,并建议在目标人群上进行微调以改善重建。然而,目前的PD-DL训练方法需要获取原始k空间测量数据,这通常只在具有此类数据访问研究协议的专业MRI中心才可用。这对于农村和服务不足地区尤其是一个问题,那里的商用MRI扫描仪只提供对最终重建图像的访问。为了解决这些挑战,我们提出了通过并行成像保真度实现压缩性启发无监督学习的方法(CUPID),用于高质量PD-DL训练,仅使用从MRI扫描仪导出的常规临床重建图像。CUPID通过基于压缩性的方法评估输出的好坏,并通过精心设计的扰动确保输出与临床并行成像重建保持一致。我们的研究结果表明,CUPID与需要原始k空间数据访问的成熟PD-DL训练策略相比实现了类似的质量,同时优于传统压缩感知(CS)和最先进的生成方法。我们还展示了其在零样本训练设置中对回顾性和前瞻性欠采样采集的有效性,证明了其训练负担极小。

更新时间: 2024-11-20 03:53:41

领域: eess.IV,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.13022v1

Scalable Deep Metric Learning on Attributed Graphs

We consider the problem of constructing embeddings of large attributed graphs and supporting multiple downstream learning tasks. We develop a graph embedding method, which is based on extending deep metric and unbiased contrastive learning techniques to 1) work with attributed graphs, 2) enable a mini-batch based approach, and 3) achieve scalability. Based on a multi-class tuplet loss function, we present two algorithms -- DMT for semi-supervised learning and DMAT-i for the unsupervised case. Analyzing our methods, we provide a generalization bound for the downstream node classification task and for the first time relate tuplet loss to contrastive learning. Through extensive experiments, we show high scalability of representation construction, and, in applying the method to three downstream tasks (node clustering, node classification, and link prediction), better consistency than any single existing method.

Updated: 2024-11-20 03:34:31

标题: 基于属性图的可扩展深度度量学习

摘要: 我们考虑构建大型属性图的嵌入和支持多个下游学习任务的问题。我们开发了一种图嵌入方法,基于扩展深度度量和无偏对比学习技术,使其能够与属性图一起工作,实现基于mini-batch的方法,并实现可扩展性。基于多类元组损失函数,我们提出了两种算法--用于半监督学习的DMT和用于无监督情况的DMAT-i。通过分析我们的方法,我们为下游节点分类任务提供了一个泛化界限,并首次将元组损失与对比学习联系起来。通过大量实验,我们展示了表示构建的高可扩展性,并在将该方法应用于三个下游任务(节点聚类、节点分类和链接预测)时,相对于任何单一现有方法具有更好的一致性。

更新时间: 2024-11-20 03:34:31

领域: cs.LG

下载: http://arxiv.org/abs/2411.13014v1
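
The abstract does not spell out the multi-class tuplet loss; one commonly used form is the (N+1)-tuplet loss of Sohn (2016), for an anchor embedding $f$, a positive $f^{+}$, and negatives $\{f_{i}^{-}\}$:

$$\mathcal{L} = \log\Bigl(1 + \sum_{i=1}^{N-1} \exp\bigl(f^{\top} f_{i}^{-} - f^{\top} f^{+}\bigr)\Bigr).$$

DMT/DMAT-i may use a different variant; this is included only to make the term concrete.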

CLIP Unreasonable Potential in Single-Shot Face Recognition

Face recognition is a core task in computer vision designed to identify and authenticate individuals by analyzing facial patterns and features. This field intersects with artificial intelligence, image processing, and machine learning, with applications in security, authentication, and personalization. Traditional approaches in facial recognition focus on capturing facial features like the eyes, nose, and mouth and matching these against a database to verify identities. However, challenges such as high false positive rates have persisted, often due to the similarity among individuals' facial features. Recently, Contrastive Language Image Pretraining (CLIP), a model developed by OpenAI, has shown promising advancements by linking natural language processing with vision tasks, allowing it to generalize across modalities. Using CLIP's vision-language correspondence and single-shot finetuning, the model can achieve lower false positive rates upon deployment without the need for mass facial feature extraction. This integration demonstrates CLIP's potential to address persistent issues in face recognition model performance without complicating our training paradigm.

Updated: 2024-11-20 03:31:17

标题: CLIP在单样本人脸识别中不可思议的潜力

摘要: 面部识别是计算机视觉中的核心任务,旨在通过分析面部模式和特征来识别和验证个体。这一领域与人工智能图像处理和机器学习相交叉,应用于安全认证和个性化。传统的面部识别方法侧重于捕捉眼睛、鼻子和嘴巴等面部特征,并将其与数据库匹配以验证身份。然而,由于个体面部特征的相似性,高误报率等挑战经常存在。最近,由OpenAI开发的Contrastive Language Image Pretraining(CLIP)模型通过将自然语言处理与视觉任务联系起来展示了令人期待的进展,使其能够在多模态之间进行泛化。利用CLIP的视觉语言对应性和单次微调,该模型在部署时可以实现更低的误报率,而无需大规模提取面部特征。这种整合展示了CLIP在解决面部识别模型性能中持久问题的潜力,而不会使我们的训练范式复杂化。

更新时间: 2024-11-20 03:31:17

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.12319v2
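
As a rough illustration of the embedding-matching idea behind the abstract (not the authors' pipeline; the checkpoint name, file names, and threshold below are assumptions), CLIP image features can be compared directly:

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # Hypothetical face crops: one enrolled identity, one probe image.
    images = [Image.open("enrolled_face.jpg"), Image.open("probe_face.jpg")]
    inputs = processor(images=images, return_tensors="pt")

    with torch.no_grad():
        emb = model.get_image_features(**inputs)   # shape (2, feature_dim)
    emb = emb / emb.norm(dim=-1, keepdim=True)     # unit-normalize embeddings
    cosine = (emb[0] @ emb[1]).item()              # cosine similarity in [-1, 1]
    match = cosine > 0.9                           # decision threshold is an assumption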

Deriving Activation Functions via Integration

Activation functions play a crucial role in introducing non-linearities to deep neural networks. We propose a novel approach to designing activation functions by focusing on their gradients and deriving the corresponding functions through integration. Our work introduces the Expanded Integral of the Exponential Linear Unit (xIELU), a trainable piecewise activation function derived by integrating trainable affine transformations applied on the ELU activation function. xIELU combines two key gradient properties: a trainable and linearly increasing gradient for positive inputs, similar to ReLU$^2$, and a trainable negative gradient flow for negative inputs, akin to xSiLU. Conceptually, xIELU can be viewed as extending ReLU$^2$ to effectively handle negative inputs. In experiments with 1.1B parameter Llama models trained on 126B tokens of FineWeb Edu, xIELU achieves lower perplexity compared to both ReLU$^2$ and SwiGLU when matched for the same compute cost and parameter count.

Updated: 2024-11-20 03:24:21

标题: 通过积分推导激活函数

摘要: 激活函数在为深度神经网络引入非线性方面起着至关重要的作用。我们提出了一种新颖的设计激活函数的方法,重点关注它们的梯度,并通过积分推导相应的函数。我们的工作介绍了指数线性单元的扩展积分(xIELU),这是一种可训练的分段激活函数,通过对作用于ELU激活函数的可训练仿射变换进行积分得到。xIELU结合了两个关键的梯度特性:对于正输入,具有可训练且线性增加的梯度,类似于ReLU^2;对于负输入,具有类似于xSiLU的可训练负梯度流。从概念上讲,xIELU可以被视为扩展ReLU^2以有效处理负输入。在FineWeb Edu的126B个token上训练1.1B参数Llama模型的实验中,在相同计算成本和参数数量下,xIELU比ReLU^2和SwiGLU表现出更低的困惑度。

更新时间: 2024-11-20 03:24:21

领域: cs.LG,cs.NE

下载: http://arxiv.org/abs/2411.13010v1
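
To make the derive-by-integration recipe concrete, suppose the trainable gradient is the affine map $f'(x) = a\,\mathrm{ELU}_{\alpha}(x) + b$ (an assumed instantiation; the paper's exact parameterization may differ). Integrating piecewise and fixing the constant so that $f(0) = 0$ gives

$$f(x) = \begin{cases} \dfrac{a}{2}x^{2} + b x, & x > 0, \\ a\alpha\,(e^{x} - x - 1) + b x, & x \le 0, \end{cases}$$

so the gradient grows linearly for positive inputs (the ReLU$^2$-like behavior), while the trainable $a, b$ control the negative-side gradient flow; both branches agree in value and gradient at $x = 0$.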

SuPLE: Robot Learning with Lyapunov Rewards

The reward function is an essential component in robot learning. Reward directly affects the sample and computational complexity of learning, and the quality of a solution. The design of informative rewards requires domain knowledge, which is not always available. We use the properties of the dynamics to produce system-appropriate reward without adding external assumptions. Specifically, we explore an approach to utilize the Lyapunov exponents of the system dynamics to generate a system-immanent reward. We demonstrate that the `Sum of the Positive Lyapunov Exponents' (SuPLE) is a strong candidate for the design of such a reward. We develop a computational framework for the derivation of this reward, and demonstrate its effectiveness on classical benchmarks for sample-based stabilization of various dynamical systems. It eliminates the need to start the training trajectories at arbitrary states, also known as auxiliary exploration. While the latter is a common practice in simulated robot learning, it is impractical for real robotic systems, since these typically start from natural rest states, such as a pendulum hanging at the bottom or a robot resting on the ground, and cannot be easily initialized at arbitrary states. Comparing the performance of SuPLE to commonly-used reward functions, we observe that the latter fail to find a solution without auxiliary exploration, even for the task of swinging up the double pendulum and keeping it stable at the upright position, a prototypical scenario for multi-linked robots. SuPLE-induced rewards for robot learning offer a novel route for effective robot learning in typical, as opposed to highly specialized or fine-tuned, scenarios. Our code is publicly available for reproducibility and further research.

Updated: 2024-11-20 03:20:50

标题: SuPLE:具有Lyapunov奖励的机器人学习

摘要: 奖励函数是机器人学习中的重要组成部分。奖励直接影响学习的样本和计算复杂性,以及解决方案的质量。设计信息丰富的奖励需要领域知识,这并非总是可得。我们利用动力学的性质来生成系统适用的奖励,而无需添加外部假设。具体来说,我们探索了一种利用系统动力学的Lyapunov指数生成系统内在奖励的方法。我们展示了“正Lyapunov指数之和”(SuPLE)是设计这种奖励的一个强有力候选。我们开发了一个计算框架用于推导这个奖励,并在各种动力系统的基准样本稳定化任务上展示了其有效性。它消除了需要从任意状态开始训练轨迹的需求,也就是所谓的辅助探索。虽然后者是模拟机器人学习中的常见做法,但在实际机器人系统中使用它是不切实际的,因为它们通常是从自然休息状态开始,如底部的摆,地面上的机器人等,并且不能轻松地在任意状态下初始化。通过将SuPLE的性能与常用的奖励函数进行比较,我们发现后者在没有辅助探索的情况下无法找到解决方案,即使是摆动双摆并将其保持稳定在直立位置的任务,这是多连杆机器人的典型场景。SuPLE诱导的奖励为机器人学习提供了一条新路线,可以在典型情况下进行有效的机器人学习,而不是在高度专业化或调整精细的情况下。我们的代码公开可用以便复现和进一步研究。

更新时间: 2024-11-20 03:20:50

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2411.13613v1

LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts

As large language models (LLMs) show impressive performance on complex tasks, they still struggle with longer contextual understanding and high computational costs. To balance efficiency and quality, we introduce LLMSteer, a fine-tuning-free framework that enhances LLMs through query-independent attention steering. Tested on popular LLMs and datasets, LLMSteer narrows the performance gap with baselines by 65.9% and reduces the runtime delay by up to 4.8x compared to recent attention steering methods.

Updated: 2024-11-20 03:17:51

标题: LLMSteer:通过引导注意力集中在重复的上下文中,提高长上下文LLM推理

摘要: 随着大型语言模型(LLMs)在复杂任务上展现出令人印象深刻的性能,它们仍然在长期上下文理解和高计算成本方面存在困难。为了平衡效率和质量,我们引入了LLMSteer,这是一个无需微调的框架,通过查询无关的注意力引导来增强LLMs。在流行的LLMs和数据集上进行测试,LLMSteer将性能差距缩小了65.9%,与最近的注意力引导方法相比,运行时延迟最多减少了4.8倍。

更新时间: 2024-11-20 03:17:51

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2411.13009v1

Evaluating LLMs Capabilities Towards Understanding Social Dynamics

Social media discourse involves people from different backgrounds, beliefs, and motives. Thus, often such discourse can devolve into toxic interactions. Generative Models, such as Llama and ChatGPT, have recently exploded in popularity due to their capabilities in zero-shot question-answering. Because these models are increasingly being used to ask questions of social significance, a crucial research question is whether they can understand social media dynamics. This work provides a critical analysis regarding generative LLM's ability to understand language and dynamics in social contexts, particularly considering cyberbullying and anti-cyberbullying (posts aimed at reducing cyberbullying) interactions. Specifically, we compare and contrast the capabilities of different large language models (LLMs) to understand three key aspects of social dynamics: language, directionality, and the occurrence of bullying/anti-bullying messages. We found that while fine-tuned LLMs exhibit promising results in some social media understanding tasks (understanding directionality), they presented mixed results in others (proper paraphrasing and bullying/anti-bullying detection). We also found that fine-tuning and prompt engineering mechanisms can have positive effects in some tasks. We believe that a understanding of LLM's capabilities is crucial to design future models that can be effectively used in social applications.

Updated: 2024-11-20 03:16:07

标题: 评估LLMs在理解社会动态方面的能力

摘要: 社交媒体话语涉及来自不同背景、信仰和动机的人们。因此,这种话语往往会演变成有毒的互动。生成模型,如Llama和ChatGPT,最近因其在零样本问答方面的能力而大受欢迎。由于这些模型越来越多地被用来提出具有社会意义的问题,一个关键的研究问题是它们是否能够理解社交媒体动态。本文对生成LLM理解语言和社会背景中的动态的能力进行了批判性分析,特别关注网络欺凌和反网络欺凌(旨在减少网络欺凌的帖子)互动。具体来说,我们比较和对比了不同大型语言模型(LLMs)理解社会动态的三个关键方面的能力:语言、方向性和欺凌/反欺凌信息的发生。我们发现,虽然经过精细调整的LLMs在某些社交媒体理解任务中表现出有希望的结果(理解方向性),但在其他任务中(恰当的改写和欺凌/反欺凌检测)表现出混合结果。我们还发现,精细调整和提示工程机制在某些任务中可能产生积极效果。我们认为理解LLM的能力对于设计未来可以有效用于社交应用的模型至关重要。

更新时间: 2024-11-20 03:16:07

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.13008v1

The revised boomerang connectivity tables and their connection to the Difference Distribution Table

It is well-known that functions over finite fields play a crucial role in designing substitution boxes (S-boxes) in modern block ciphers. In order to analyze the security of an S-box, recently, three new tables have been introduced: the Extended Boomerang Connectivity Table (EBCT), the Lower Boomerang Connectivity Table (LBCT), and the Upper Boomerang Connectivity Table (UBCT). In fact, these tables offer improved methods over the usual Boomerang Connectivity Table (BCT) for analyzing the security of S-boxes against boomerang-style attacks. Here, we put in context these new EBCT, LBCT, and UBCT concepts by connecting them to the DDT for a differentially $\delta$-uniform function and also determine the EBCT, LBCT, and UBCT entries of three classes of differentially $4$-uniform power permutations, namely, Gold, Kasami and Bracken-Leander. We also determine the Double Boomerang Connectivity Table (DBCT) entries of the Gold function. As byproducts of our approach, we obtain some previously published results quite easily.

Updated: 2024-11-20 03:05:13

标题: 修订后的回旋镖连接表及其与差分分布表的关联

摘要: 众所周知,有限域上的函数在现代分组密码的替代盒(S-box)设计中起着至关重要的作用。为了分析S-box的安全性,最近引入了三个新的表格:Extended Boomerang Connectivity Table(EBCT)、Lower Boomerang Connectivity Table(LBCT)和Upper Boomerang Connectivity Table(UBCT)。事实上,这些表格提供了比通常的Boomerang Connectivity Table(BCT)更好的方法,用于分析S-box针对回旋镖式攻击的安全性。在这里,我们将这些新的EBCT、LBCT和UBCT概念与差分$\delta$-均匀函数的差分分布表(DDT)联系起来,从而为它们提供背景,并确定了三类差分$4$-均匀幂置换(Gold、Kasami和Bracken-Leander)的EBCT、LBCT和UBCT条目。我们还确定了Gold函数的Double Boomerang Connectivity Table(DBCT)条目。作为我们方法的副产品,我们轻松得到了一些先前已发表的结果。

更新时间: 2024-11-20 03:05:13

领域: cs.CR,cs.IT,math.IT,12E20, 11T06, 94A60

下载: http://arxiv.org/abs/2407.12617v2
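
For readers outside symmetric-key cryptanalysis, the DDT that the new tables are connected to is defined, for $F : \mathbb{F}_2^n \to \mathbb{F}_2^n$, by

$$\mathrm{DDT}_F(a, b) = \#\{\, x \in \mathbb{F}_2^n : F(x \oplus a) \oplus F(x) = b \,\},$$

and $F$ is called differentially $\delta$-uniform when $\max_{a \neq 0,\, b} \mathrm{DDT}_F(a, b) = \delta$; the Gold, Kasami and Bracken-Leander permutations studied here have $\delta = 4$.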

Automating Sonologists USG Commands with AI and Voice Interface

This research presents an advanced AI-powered ultrasound imaging system that incorporates real-time image processing, organ tracking, and voice commands to enhance the efficiency and accuracy of diagnoses in clinical practice. Traditional ultrasound diagnostics often require significant time and introduce a degree of subjectivity due to user interaction. The goal of this innovative solution is to provide Sonologists with a more predictable and productive imaging procedure utilizing artificial intelligence, computer vision, and voice technology. The functionality of the system employs computer vision and deep learning algorithms, specifically adopting the Mask R-CNN model from Detectron2 for semantic segmentation of organs and key landmarks. This automation improves diagnostic accuracy by enabling the extraction of valuable information with minimal human input. Additionally, it includes a voice recognition feature that allows for hands-free operation, enabling users to control the system with commands such as freeze or liver, all while maintaining their focus on the patient. The architecture comprises video processing and real-time segmentation modules that prepare the system to perform essential imaging functions, such as freezing and zooming in on frames. The liver histopathology module, optimized for detecting fibrosis, achieved an impressive accuracy of 98.6%. Furthermore, the organ segmentation module produces output confidence levels between 50% and 95%, demonstrating its efficacy in organ detection.

Updated: 2024-11-20 03:03:49

标题: 用人工智能和语音界面自动化超声医师的USG命令

摘要: 这项研究介绍了一种先进的基于人工智能的超声成像系统,结合实时图像处理、器官跟踪和语音命令,以提高临床实践中的诊断效率和准确性。由于涉及用户交互,传统的超声诊断通常耗时较长,并带有一定程度的主观性。这一创新解决方案的目标是利用人工智能、计算机视觉和语音技术,为超声医师提供更可预测、更高效的成像流程。系统的功能采用计算机视觉和深度学习算法,特别采用Detectron2中的Mask R-CNN模型进行器官和关键地标的语义分割。这种自动化使得只需极少的人工输入即可提取有价值的信息,从而提高了诊断准确性。此外,它还包括语音识别功能,可以实现免持操作,让用户能够通过“冻结”或“肝脏”等命令控制系统,同时保持对患者的关注。架构包括视频处理和实时分割模块,为系统执行基本成像功能(如冻结和放大帧)做好准备。经过优化用于检测纤维化的肝脏组织病理学模块实现了惊人的98.6%的准确性。此外,器官分割模块产生的输出置信水平在50%到95%之间,显示了其在器官检测中的有效性。

更新时间: 2024-11-20 03:03:49

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2411.13006v1

MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification

We present MERLOT, a scalable mixture-of-experts (MoE) based refinement of a distilled large language model, optimized for encrypted traffic classification. By applying model distillation techniques in a teacher-student paradigm, compact models derived from GPT-2-base retain high classification accuracy while minimizing computational costs. These models function as specialized experts in an MoE architecture, dynamically assigned via a gating network. Unlike generation-based methods, our approach directly classifies encrypted traffic using the final decoder token with contextual feature embedding as input. Experiments on 10 datasets show superior or competitive performance over the state-of-the-art models while significantly reducing resource demands, underscoring its effectiveness and robustness.

Updated: 2024-11-20 03:01:41

标题: MERLOT:一种基于精简LLM的混合专家框架,用于可扩展的加密流量分类

摘要: 我们提出了MERLOT,一种可扩展的、基于专家混合(MoE)的蒸馏大型语言模型精炼框架,针对加密流量分类进行了优化。通过在教师-学生范式中应用模型蒸馏技术,从GPT-2-base派生出的紧凑模型保持高分类准确性的同时,最大限度地减少了计算成本。这些模型在MoE架构中作为专门的专家,通过门控网络动态分配。与基于生成的方法不同,我们的方法直接使用上下文特征嵌入作为输入,使用最终解码器令牌对加密流量进行分类。对10个数据集的实验显示,我们的方法在显著降低资源需求的同时,取得了优于或可与最先进模型相媲美的性能,凸显了其有效性和稳健性。

更新时间: 2024-11-20 03:01:41

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2411.13004v1
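
The gating described above follows the usual mixture-of-experts pattern; a generic PyTorch sketch (module names and soft, dense routing are illustrative assumptions, MERLOT's distilled experts and gate may differ):

    import torch

    class GatedMoE(torch.nn.Module):
        """Weight each expert's output by a learned softmax gate over input features."""
        def __init__(self, experts, feature_dim):
            super().__init__()
            self.experts = torch.nn.ModuleList(experts)
            self.gate = torch.nn.Linear(feature_dim, len(experts))

        def forward(self, features):                        # features: (batch, feature_dim)
            weights = torch.softmax(self.gate(features), dim=-1)           # (batch, E)
            outputs = torch.stack([e(features) for e in self.experts], 1)  # (batch, E, out)
            return (weights.unsqueeze(-1) * outputs).sum(dim=1)            # (batch, out)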

DINO-LG: A Task-Specific DINO Model for Coronary Calcium Scoring

Coronary artery disease (CAD) is one of the most common causes of mortality in the world. Coronary artery calcium (CAC) scoring using computed tomography (CT) is key for risk assessment to prevent coronary disease. Previous studies on risk assessment and calcification detection in CT scans primarily use approaches based on the UNET architecture, frequently implemented on pre-built models. However, these models are limited by the availability of annotated CT scans containing CAC and suffer from imbalanced datasets, decreasing the performance of CAC segmentation and scoring. In this study, we extend this approach by incorporating the self-supervised learning (SSL) technique of DINO (self-distillation with no labels) to eliminate the limitations of scarce annotated data in CT scans. The DINO model's ability to train without requiring CAC area annotations enhances its robustness in generating distinct features. The DINO model is then trained with labels to focus specifically on calcified areas, aiming to generate features that effectively capture and highlight key characteristics. The label-guided DINO (DINO-LG) enhances classification by distinguishing CT slices that contain calcification from those that do not, performing 57% better than the standard DINO model in this task. CAC scoring and segmentation tasks are performed by a basic U-NET architecture, fed specifically with CT slices containing calcified areas as identified by the DINO-LG model. This targeted identification performed by the DINO-LG model improves CAC segmentation performance by approximately 10% and significantly increases CAC scoring accuracy.

Updated: 2024-11-20 02:57:56

标题: DINO-LG:用于冠状动脉钙化评分的特定任务DINO模型

摘要: 冠状动脉疾病(CAD)是世界上最常见的死亡原因之一。利用计算机断层扫描(CT)进行冠状动脉钙化(CAC)评分对于风险评估以预防冠状动脉疾病至关重要。先前关于风险评估和CT扫描中钙化检测的研究主要使用基于UNET架构的方法,经常在预先构建的模型上实施。然而,这些模型受到带有CAC的注释CT扫描数据的可用性限制,并且受到数据集不平衡的影响,降低了CAC分割和评分的性能。在本研究中,我们通过将DINO(无标签的自我蒸馏)的自监督学习(SSL)技术纳入到这种方法中,以消除CT扫描中稀缺注释数据的限制。DINO模型无需CAC区域注释即可训练,增强了其生成独特特征的鲁棒性。DINO模型通过使用标签专注于训练钙化区域,旨在生成有效捕捉和突出关键特征的特征。标签引导的DINO(DINO-LG)通过区分包含钙化的CT切片和不包含的切片,在该任务中的表现比标准DINO模型提高了57%。CAC评分和分割任务由基本的U-NET架构执行,专门提供由DINO-LG模型识别的包含钙化区域的CT切片。DINO-LG模型执行的定向识别改善了CAC分割性能约10%,并显著提高了CAC评分准确性。

更新时间: 2024-11-20 02:57:56

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2411.07976v4

NCAirFL: CSI-Free Over-the-Air Federated Learning Based on Non-Coherent Detection

Over-the-air federated learning (FL), i.e., AirFL, leverages computation primitives over multiple access channels. A long-standing challenge in AirFL is to achieve coherent signal alignment without relying on expensive channel estimation and feedback. This paper proposes NCAirFL, a CSI-free AirFL scheme based on unbiased non-coherent detection at the edge server. By exploiting binary dithering and a long-term memory based error-compensation mechanism, NCAirFL achieves a convergence rate of order $\mathcal{O}(1/\sqrt{T})$ in terms of the average square norm of the gradient for general non-convex and smooth objectives, where $T$ is the number of communication rounds. Experiments demonstrate the competitive performance of NCAirFL compared to vanilla FL with ideal communications and to coherent transmission-based benchmarks.

Updated: 2024-11-20 02:53:04

标题: NCAirFL:基于非相干检测的无CSI空中联邦学习

摘要: 空中联邦学习(AirFL)直接在多址接入信道上进行计算。AirFL长期以来的挑战是在不依赖昂贵的信道估计和反馈的情况下实现相干的信号对齐。本文提出了NCAirFL,这是一种基于边缘服务器上无偏非相干检测的无CSI AirFL方案。通过利用二进制抖动和基于长期记忆的误差补偿机制,NCAirFL对一般非凸光滑目标在平均梯度平方范数意义下实现了$\mathcal{O}(1/\sqrt{T})$的收敛速率,其中$T$是通信轮数。实验证明,NCAirFL相对于具备理想通信的普通FL以及基于相干传输的基准方法具有竞争性能。

更新时间: 2024-11-20 02:53:04

领域: cs.IT,cs.LG,eess.SP,math.IT

下载: http://arxiv.org/abs/2411.13000v1
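
Spelled out, the stated guarantee has the standard non-convex stochastic-optimization form: for smooth, possibly non-convex $f$ and model iterates $\mathbf{w}_t$,

$$\frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\bigl[\|\nabla f(\mathbf{w}_t)\|^{2}\bigr] \;\le\; \mathcal{O}\!\left(\frac{1}{\sqrt{T}}\right),$$

with constants depending on the dithering and error-compensation parameters (the exact constants are in the paper, not reproduced here).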

Topological Symmetry Enhanced Graph Convolution for Skeleton-Based Action Recognition

Skeleton-based action recognition has achieved remarkable performance with the development of graph convolutional networks (GCNs). However, most of these methods tend to construct complex topology learning mechanisms while neglecting the inherent symmetry of the human body. Additionally, the use of temporal convolutions with certain fixed receptive fields limits their capacity to effectively capture dependencies in time sequences. To address the issues, we (1) propose a novel Topological Symmetry Enhanced Graph Convolution (TSE-GC) to enable distinct topology learning across different channel partitions while incorporating topological symmetry awareness and (2) construct a Multi-Branch Deformable Temporal Convolution (MBDTC) for skeleton-based action recognition. The proposed TSE-GC emphasizes the inherent symmetry of the human body while enabling efficient learning of dynamic topologies. Meanwhile, the design of MBDTC introduces the concept of deformable modeling, leading to more flexible receptive fields and stronger modeling capacity of temporal dependencies. Combining TSE-GC with MBDTC, our final model, TSE-GCN, achieves competitive performance with fewer parameters compared with state-of-the-art methods on three large datasets, NTU RGB+D, NTU RGB+D 120, and NW-UCLA. On the cross-subject and cross-set evaluations of NTU RGB+D 120, the accuracies of our model reach 90.0\% and 91.1\%, with 1.1M parameters and 1.38 GFLOPS for one stream.

Updated: 2024-11-20 02:51:08

标题: 基于骨架的动作识别的拓扑对称增强图卷积

摘要: 基于骨架的动作识别在图卷积网络(GCNs)的发展中取得了显著的表现。然而,大多数这些方法往往构建复杂的拓扑学习机制,同时忽视人体固有的对称性。此外,使用具有固定感受野的时间卷积限制了它们有效捕捉时间序列中的依赖关系的能力。为了解决这些问题,我们提出了一种新颖的拓扑对称增强图卷积(TSE-GC)方法,可以在不同通道分区之间实现不同的拓扑学习,同时融入拓扑对称意识,并构建了一个用于基于骨架的动作识别的多分支可变形时间卷积(MBDTC)。提出的TSE-GC强调了人体固有的对称性,同时实现了动态拓扑的有效学习。与此同时,MBDTC的设计引入了可变形建模的概念,导致更灵活的感受野和更强的时间依赖建模能力。将TSE-GC与MBDTC结合,我们的最终模型TSE-GCN与三个大型数据集NTU RGB+D、NTU RGB+D 120和NW-UCLA上最先进方法相比,以更少的参数实现了竞争性能。在NTU RGB+D 120的跨主体和交叉集合评估中,我们的模型的准确率达到了90.0\%和91.1\%,一个流需要1.1M参数和1.38 GFLOPS。

更新时间: 2024-11-20 02:51:08

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.12560v2

Mediating Modes of Thought: LLM's for design scripting

Architects adopt visual scripting and parametric design tools to explore more expansive design spaces (Coates, 2010), refine their thinking about the geometric logic of their design (Woodbury, 2010), and overcome conventional software limitations (Burry, 2011). Despite two decades of effort to make design scripting more accessible, a disconnect between a designer's free ways of thinking and the rigidity of algorithms remains (Burry, 2011). Recent developments in Large Language Models (LLMs) suggest this might soon change, as LLMs encode a general understanding of human context and exhibit the capacity to produce geometric logic. This project speculates that if LLMs can effectively mediate between user intent and algorithms, they become a powerful tool to make scripting in design more widespread and fun. We explore if such systems can interpret natural language prompts to assemble geometric operations relevant to computational design scripting. In the system, multiple layers of LLM agents are configured with specific context to infer the user intent and construct a sequential logic. Given a user's high-level text prompt, a geometric description is created, distilled into a sequence of logic operations, and mapped to software-specific commands. The completed script is constructed in the user's visual programming interface. The system succeeds in generating complete visual scripts up to a certain complexity but fails beyond this complexity threshold. It shows how LLMs can make design scripting much more aligned with human creativity and thought. Future research should explore conversational interactions, expand to multimodal inputs and outputs, and assess the performance of these tools.

Updated: 2024-11-20 02:49:18

标题: 思维的中介模式:LLM用于设计脚本编写

摘要: 建筑师采用视觉脚本和参数化设计工具来探索更广阔的设计空间(Coates, 2010),优化他们对设计的几何逻辑的思考(Woodbury, 2010),并克服传统软件的限制(Burry, 2011)。尽管经过二十年的努力使设计脚本更易访问,设计师自由思考方式与算法的严格性之间仍存在脱节(Burry, 2011)。最近,大型语言模型(LLMs)的发展表明这种情况可能很快会改变,因为LLMs对人类上下文有一般性理解,并具备产生几何逻辑的能力。本项目推测,如果LLMs能有效地在用户意图和算法之间进行调解,它们将成为使设计脚本更为普遍和有趣的强大工具。我们探讨这样的系统是否能够解释自然语言提示以组装与计算设计脚本相关的几何操作。在该系统中,多层LLM代理被配置具有特定上下文,以推断用户意图并构建顺序逻辑。在给定用户的高层文本提示的情况下,一个几何描述被创建,被蒸馏成一系列逻辑操作,并映射到软件特定命令。完成的脚本被构建在用户的视觉编程界面中。该系统成功地生成了完整的视觉脚本,直到某一复杂性阈值,但在此复杂性阈值之上失败了。这展示了LLMs如何使设计脚本更加符合人类创造力和思维。未来的研究应该探索对话交互,扩展到多模态输入和输出,并评估这些工具的性能。

更新时间: 2024-11-20 02:49:18

领域: cs.CL,cs.AI,cs.HC

下载: http://arxiv.org/abs/2411.14485v1

Smart Pressure e-Mat for Human Sleeping Posture and Dynamic Activity Recognition

With the emphasis on healthcare, early childhood education, and fitness, non-invasive measurement and recognition methods have received more attention. Pressure sensing has been extensively studied because of its advantages of simple structure, easy access, visualization application, and harmlessness. This paper introduces a Smart Pressure e-Mat (SPeM) system based on piezoresistive material, Velostat, for human monitoring applications, including recognition of sleeping postures, sports, and yoga. After a subsystem scans the e-mat readings and processes the signal, it generates a pressure image stream. Deep neural networks (DNNs) are used to fit and train the pressure image stream and recognize the corresponding human behavior. Four sleeping postures and 13 dynamic activities inspired by Nintendo Switch Ring Fit Adventure (RFA) are used as a preliminary validation of the proposed SPeM system. The SPeM system achieves high accuracies in both applications, demonstrating the high accuracy and generalizability of the models. Compared with other pressure sensor-based systems, SPeM possesses more flexible applications and commercial application prospects, with reliable, robust, and repeatable properties.

Updated: 2024-11-20 02:47:25

标题: 智能压力电子垫用于人类睡姿和动态活动识别

摘要: 随着对医疗保健、幼儿教育和健身的重视,非侵入性测量和识别方法受到了更多关注。由于其简单结构、易获取、可视化应用和无害性的优势,压力传感已得到广泛研究。本文介绍了一种基于压阻材料Velostat的智能压力e-Mat(SPeM)系统,用于人体监测应用,包括识别睡姿、运动和瑜伽。在子系统扫描e-mat读数并处理信号后,它生成一个压力图像流。深度神经网络(DNNs)用于拟合和训练压力图像流,并识别相应的人类行为。四种睡姿和受任天堂Switch Ring Fit Adventure(RFA)启发的13种动态活动被用作拟议SPeM系统的初步验证。SPeM系统在两个应用中均取得了高精度,展示了模型的高准确性和泛化能力。与其他基于压力传感器的系统相比,SPeM具有更灵活的应用和商业应用前景,具有可靠、稳健和可重复的特性。

更新时间: 2024-11-20 02:47:25

领域: cs.CV,cs.HC,cs.LG,eess.SP

下载: http://arxiv.org/abs/2305.11367v2

Eliminating Ratio Bias for Gradient-based Simulated Parameter Estimation

This article addresses the challenge of parameter calibration in stochastic models where the likelihood function is not analytically available. We propose a gradient-based simulated parameter estimation framework, leveraging a multi-time scale algorithm that tackles the issue of ratio bias in both maximum likelihood estimation and posterior density estimation problems. Additionally, we introduce a nested simulation optimization structure, providing theoretical analyses including strong convergence, asymptotic normality, convergence rate, and budget allocation strategies for the proposed algorithm. The framework is further extended to neural network training, offering a novel perspective on stochastic approximation in machine learning. Numerical experiments show that our algorithm can improve the estimation accuracy and save computational costs.

Updated: 2024-11-20 02:46:15

标题: 消除基于梯度的模拟参数估计中的比率偏差

摘要: 本文讨论了随机模型中参数校准的挑战,其中似然函数没有解析形式。我们提出了一种基于梯度的模拟参数估计框架,利用多时间尺度算法来解决最大似然估计和后验密度估计问题中的比率偏差问题。此外,我们引入了一个嵌套模拟优化结构,并为所提算法提供了包括强收敛性、渐近正态性、收敛速率以及预算分配策略在内的理论分析。该框架进一步扩展到神经网络训练,为机器学习中的随机逼近提供了新的视角。数值实验表明,我们的算法可以提高估计准确性并节省计算成本。

更新时间: 2024-11-20 02:46:15

领域: stat.ML,cs.LG,math.OC

下载: http://arxiv.org/abs/2411.12995v1
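
The ratio bias targeted here is the familiar fact that an estimator built as a ratio of two sample means is biased, since $\mathbb{E}[A/B] \neq \mathbb{E}[A]/\mathbb{E}[B]$ in general; a second-order Taylor (delta-method) expansion makes the leading bias terms explicit:

$$\mathbb{E}\!\left[\frac{A}{B}\right] \;\approx\; \frac{\mathbb{E}[A]}{\mathbb{E}[B]} \;-\; \frac{\mathrm{Cov}(A, B)}{\mathbb{E}[B]^{2}} \;+\; \frac{\mathbb{E}[A]\,\mathrm{Var}(B)}{\mathbb{E}[B]^{3}}.$$

Gradient estimators in simulated maximum-likelihood and posterior-density estimation typically take this ratio form, which is what the multi-time-scale algorithm is designed to counteract.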

BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices

AI models are increasingly prevalent in high-stakes environments, necessitating thorough assessment of their capabilities and risks. Benchmarks are popular for measuring these attributes and for comparing model performance, tracking progress, and identifying weaknesses in foundation and non-foundation models. They can inform model selection for downstream tasks and influence policy initiatives. However, not all benchmarks are the same: their quality depends on their design and usability. In this paper, we develop an assessment framework considering 46 best practices across an AI benchmark's lifecycle and evaluate 24 AI benchmarks against it. We find that there exist large quality differences and that commonly used benchmarks suffer from significant issues. We further find that most benchmarks do not report statistical significance of their results nor allow for their results to be easily replicated. To support benchmark developers in aligning with best practices, we provide a checklist for minimum quality assurance based on our assessment. We also develop a living repository of benchmark assessments to support benchmark comparability, accessible at betterbench.stanford.edu.

Updated: 2024-11-20 02:38:24

标题: 更好的基准:评估人工智能基准,揭示问题,并建立最佳实践

摘要: 人工智能模型在高风险环境中越来越普遍,必须对其能力和风险进行彻底评估。基准测试在衡量这些属性、比较模型性能、跟踪进展以及识别基础和非基础模型的弱点方面非常受欢迎。它们可以为下游任务的模型选择提供信息,并影响政策倡议。然而,并非所有基准测试都是相同的:它们的质量取决于其设计和可用性。在本文中,我们开发了一个评估框架,考虑了人工智能基准测试生命周期中的46个最佳实践,并对24个人工智能基准测试进行评估。我们发现存在较大的质量差异,常用的基准测试存在重大问题。我们进一步发现,大多数基准测试没有报告结果的统计显著性,也不允许其结果容易复制。为了支持基准测试开发者遵循最佳实践,我们提供了一个基于我们评估的最低质量保证的核查表。我们还开发了一个基准测试评估的活动库,以支持基准测试的可比性,可访问于betterbench.stanford.edu。

更新时间: 2024-11-20 02:38:24

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.12990v1

QIANets: Quantum-Integrated Adaptive Networks for Reduced Latency and Improved Inference Times in CNN Models

Convolutional neural networks (CNNs) have made significant advances in computer vision tasks, yet their high inference times and latency often limit real-world applicability. While model compression techniques have gained popularity as solutions, they often overlook the critical balance between low latency and uncompromised accuracy. By harnessing quantum-inspired pruning, tensor decomposition, and annealing-based matrix factorization - three quantum-inspired concepts - we introduce QIANets: a novel approach of redesigning the traditional GoogLeNet, DenseNet, and ResNet-18 model architectures to process more parameters and computations whilst maintaining low inference times. Despite experimental limitations, the method was tested and evaluated, demonstrating reductions in inference times, along with effective accuracy preservations.

Updated: 2024-11-20 02:37:27

标题: QIANets:用于降低CNN模型中的延迟和改善推理时间的量子集成自适应网络

摘要: 卷积神经网络(CNNs)在计算机视觉任务中取得了显著进展,然而它们的推理时间和延迟通常限制了实际应用性。虽然模型压缩技术已经成为解决方案,但它们经常忽视低延迟和不受影响的准确性之间的关键平衡。通过利用量子启发式修剪、张量分解和基于退火的矩阵因子分解 - 三种量子启发式概念,我们介绍了QIANets:一种重新设计传统的GoogLeNet、DenseNet和ResNet-18模型架构的新方法,以处理更多的参数和计算,同时保持低推理时间。尽管实验受限,该方法经过测试和评估,显示了推理时间的减少,以及有效的准确性保留。

更新时间: 2024-11-20 02:37:27

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.10318v2

Msmsfnet: a multi-stream and multi-scale fusion net for edge detection

Edge detection is a long-standing problem in computer vision. Recent deep learning based algorithms achieve state-of-the-art performance in publicly available datasets. Despite their efficiency, their performance, however, relies heavily on the pre-trained weights of the backbone network on the ImageNet dataset. This significantly limits the design space of deep learning based edge detectors. Whenever we want to devise a new model, we have to train this new model on the ImageNet dataset first, and then fine-tune the model using the edge detection datasets. The comparison would be unfair otherwise. However, it is usually not feasible for many researchers to train a model on the ImageNet dataset due to the limited computation resources. Besides, if these methods need to be trained to detect edges in a different kind of data, Synthetic Aperture Radar (SAR) images for instance, the pre-trained weights on the ImageNet dataset are unlikely to improve the edge detection accuracy due to the strong differences in the statistics between optical and SAR images. In the meantime, no dataset for SAR image processing matches the size of the ImageNet dataset. In this work, we study the performance achievable by existing methods in publicly available datasets when they are trained from scratch, and devise a new network architecture, the multi-stream and multi-scale fusion net (msmsfnet), for edge detection. We show in our experiments that by training all models from scratch to ensure the fairness of comparison, our model outperforms state-of-the-art deep learning based edge detectors in three publicly available datasets. The efficiency of our model is further demonstrated by the experiments for edge detection in SAR images, which serves as an important evidence showing the meaningfulness of this work as no useful pre-trained weight is available for edge detection in SAR images.

Updated: 2024-11-20 02:32:23

标题: Msmsfnet:一种用于边缘检测的多流和多尺度融合网络

摘要: 边缘检测是计算机视觉中一个长期存在的问题。最近基于深度学习的算法在公开数据集中实现了最先进的性能。然而,尽管它们的效率很高,但它们的性能很大程度上依赖于在ImageNet数据集上预训练的骨干网络的权重。这严重限制了基于深度学习的边缘检测器的设计空间。每当我们想设计一个新模型时,我们必须首先在ImageNet数据集上训练这个新模型,然后再使用边缘检测数据集微调模型。否则,比较将是不公平的。然而,对许多研究人员来说,在ImageNet数据集上训练模型通常是不可行的,因为计算资源有限。此外,如果这些方法需要在不同类型的数据,例如合成孔径雷达(SAR)图像中进行边缘检测训练,那么由于光学图像和SAR图像之间的统计差异很大,ImageNet数据集上的预训练权重不太可能提高边缘检测准确性。同时,目前没有与ImageNet数据集规模相匹配的用于SAR图像处理的数据集。在这项工作中,我们研究了公开数据集中现有方法在从头开始训练时可以达到的性能,并设计了一种新的网络架构,即多流和多尺度融合网络(msmsfnet),用于边缘检测。我们在实验中展示,通过从头开始训练所有模型以确保比较的公平性,我们的模型在三个公开数据集中表现优于最先进的基于深度学习的边缘检测器。我们的模型的效率进一步通过在SAR图像中进行边缘检测的实验进行了验证,这对于证明这项工作的意义是重要的,因为在SAR图像中没有可用于边缘检测的有用的预训练权重。

更新时间: 2024-11-20 02:32:23

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.04856v2

Training Bilingual LMs with Data Constraints in the Targeted Language

Large language models are trained on massive scrapes of the web, as required by current scaling laws. Most progress is made for English, given its abundance of high-quality pretraining data. For most other languages, however, such high quality pretraining data is unavailable. In this work, we study how to boost pretrained model performance in a data constrained target language by enlisting data from an auxiliary language for which high quality data is available. We study this by quantifying the performance gap between training with data in a data-rich auxiliary language compared with training in the target language, exploring the benefits of translation systems, studying the limitations of model scaling for data constrained languages, and proposing new methods for upsampling data from the auxiliary language. Our results show that stronger auxiliary datasets result in performance gains without modification to the model or training objective for close languages, and, in particular, that performance gains due to the development of more information-rich English pretraining datasets can extend to targeted language settings with limited data.

Updated: 2024-11-20 02:27:40

标题: 用数据约束在目标语言中训练双语语言模型

摘要: 大型语言模型是在网络的大规模抓取上进行训练的,这是当前扩展规律所要求的。鉴于英语拥有丰富的高质量预训练数据,大部分进展都是在英语方面取得的。然而,对于大多数其他语言来说,这样高质量的预训练数据并不可用。在这项工作中,我们研究了如何通过利用具有高质量数据的辅助语言的数据来增强受限于数据的目标语言的预训练模型性能。我们通过量化使用数据丰富的辅助语言与使用目标语言进行训练之间的性能差距,探索翻译系统的好处,研究数据受限语言的模型扩展的局限性,并提出了从辅助语言中上采样数据的新方法来研究这一问题。我们的结果显示,更强大的辅助数据集可以在不修改模型或训练目标的情况下实现性能增益,尤其是由于开发更丰富信息的英语预训练数据而导致的性能增益可以延伸到具有有限数据的目标语言环境。

更新时间: 2024-11-20 02:27:40

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.12986v1

TP-UNet: Temporal Prompt Guided UNet for Medical Image Segmentation

The advancement of medical image segmentation techniques has been propelled by the adoption of deep learning techniques, particularly UNet-based approaches, which exploit semantic information to improve the accuracy of segmentations. However, the order of organs in scanned images has been disregarded by current medical image segmentation approaches based on UNet. Furthermore, the inherent network structure of UNet does not provide direct capabilities for integrating temporal information. To efficiently integrate temporal information, we propose TP-UNet that utilizes temporal prompts, encompassing organ-construction relationships, to guide the segmentation UNet model. Specifically, our framework is featured with cross-attention and semantic alignment based on unsupervised contrastive learning to combine temporal prompts and image features effectively. Extensive evaluations on two medical image segmentation datasets demonstrate the state-of-the-art performance of TP-UNet. Our implementation will be open-sourced after acceptance.

Updated: 2024-11-20 02:24:26

标题: TP-UNet:用于医学图像分割的时间提示引导UNet

摘要: 医学图像分割技术的进步得益于深度学习技术的采用,尤其是基于UNet的方法,利用语义信息来提高分割的准确性。然而,基于UNet的当前医学图像分割方法忽视了扫描图像中器官的顺序。此外,UNet的固有网络结构并不直接提供集成时间信息的能力。为了有效整合时间信息,我们提出了TP-UNet,利用包含器官构建关系的时间提示来指导分割UNet模型。具体来说,我们的框架具有基于无监督对比学习的交叉注意力和语义对齐,以有效地结合时间提示和图像特征。对两个医学图像分割数据集的广泛评估表明了TP-UNet的最新性能。我们的实施将在接受后开源。

更新时间: 2024-11-20 02:24:26

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.11305v2

Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods

Large language model unlearning aims to remove harmful information that LLMs have learnt to prevent their use for malicious purposes. LLMU and RMU have been proposed as two methods for LLM unlearning, achieving impressive results on unlearning benchmarks. We study in detail the efficacy of these methods by evaluating their impact on general model capabilities on the WMDP benchmark as well as a biology benchmark we create. Our experiments show that RMU generally leads to better preservation of model capabilities, for similar or better unlearning. We further test the robustness of these methods and find that doing 5-shot prompting or rephrasing the question in simple ways can lead to an over ten-fold increase in accuracy on unlearning benchmarks. Finally, we show that training on unrelated data can almost completely recover pre-unlearning performance, demonstrating that these methods fail at truly unlearning. The code is available at: https://github.com/JaiDoshi/Knowledge-Erasure.

Updated: 2024-11-20 02:23:11

标题: "忘却真的能够彻底忘却吗?基于黑匣子评价的LLM忘却方法"

摘要: 大型语言模型遗忘的目的是移除LLM已经学习到的有害信息,以防止它们被用于恶意目的。LLMU和RMU被提出作为LLM遗忘的两种方法,在遗忘基准测试中取得了令人印象深刻的结果。我们通过评估它们对WMDP基准测试以及我们创建的生物学基准测试的一般模型能力的影响,详细研究了这些方法的有效性。我们的实验表明,RMU通常能够更好地保留模型能力,实现类似或更好的遗忘。我们进一步测试了这些方法的鲁棒性,并发现进行5次提示或以简单方式重述问题可以使在遗忘基准测试中的准确性增加十多倍。最后,我们展示了在无关数据上的训练几乎可以完全恢复遗忘前的性能,表明这些方法在真正遗忘方面失败了。 该代码可在以下网址找到:https://github.com/JaiDoshi/Knowledge-Erasure。

更新时间: 2024-11-20 02:23:11

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.12103v2

Efficient Streaming Voice Steganalysis in Challenging Detection Scenarios

In recent years, there has been an increasing number of information hiding techniques based on network streaming media, focusing on how to covertly and efficiently embed secret information into real-time transmitted network media signals to achieve concealed communication. The misuse of these techniques can lead to significant security risks, such as the spread of malicious code, commands, and viruses. Current steganalysis methods for network voice streams face two major challenges: efficient detection under low embedding rates and short duration conditions. These challenges arise because, with low embedding rates (e.g., as low as 10%) and short transmission durations (e.g., only 0.1 second), detection models struggle to acquire sufficiently rich sample features, making effective steganalysis difficult. To address these challenges, this paper introduces a Dual-View VoIP Steganalysis Framework (DVSF). The framework first randomly obfuscates parts of the native steganographic descriptors in VoIP stream segments, making the steganographic features of hard-to-detect samples more pronounced and easier to learn. It then captures fine-grained local features related to steganography, building on the global features of VoIP. Specially constructed VoIP segment triplets further adjust the feature distances within the model. Ultimately, this method effectively address the detection difficulty in VoIP. Extensive experiments demonstrate that our method significantly improves the accuracy of streaming voice steganalysis in these challenging detection scenarios, surpassing existing state-of-the-art methods and offering superior near-real-time performance.

Updated: 2024-11-20 02:22:58

标题: 在具有挑战性检测场景下的高效流式语音隐写分析

摘要: 近年来,基于网络流媒体的信息隐藏技术越来越多,重点是如何秘密而高效地将秘密信息嵌入实时传输的网络媒体信号,以实现隐蔽通信。对这些技术的滥用可能导致重大安全风险,如恶意代码、命令和病毒的传播。目前用于网络语音流的隐写分析方法面临两个主要挑战:在低嵌入率和短持续时间条件下的高效检测。这些挑战的原因在于,在低嵌入率(例如低至10%)和短传输持续时间(例如仅0.1秒)下,检测模型难以获取足够丰富的样本特征,使得有效的隐写分析变得困难。为了解决这些挑战,本文介绍了一种双视图VoIP隐写分析框架(DVSF)。该框架首先随机混淆VoIP流段中的本地隐写描述符的部分,使难以检测的样本的隐写特征更加突出且更易学习。然后捕获与隐写相关的精细局部特征,建立在VoIP的全局特征之上。特别构造的VoIP段三元组进一步调整模型内的特征距离。最终,这种方法有效地解决了VoIP中的检测困难。大量实验证明,我们的方法在这些具有挑战性的检测场景中显著提高了流媒体语音隐写分析的准确性,超越了现有的最新方法,并提供了更优越的近实时性能。

更新时间: 2024-11-20 02:22:58

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2411.13612v1

LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement

Recent advancements in Visual Language Models (VLMs) have made them crucial for visual question answering (VQA) in autonomous driving, enabling natural human-vehicle interactions. However, existing methods often struggle in dynamic driving environments, as they usually focus on static images or videos and rely on downsampling to manage computational costs. This results in the loss of critical details and the difficulty in effectively integrating spatial and temporal information, undermining fine-grained perception and temporal coherence essential for effective decision-making. To tackle these challenges, we introduce LaVida Drive, a novel and efficient VQA framework for autonomous driving. LaVida Drive seamlessly integrates temporal data while maintaining high-resolution inputs for detailed visual perception. It optimizes spatial processing by retaining high-resolution data for intricate details and using lower-resolution inputs for temporal analysis to focus on motion-related features, thereby boosting computational efficiency. The core of LaVida Drive consists of two modules: the \textit{Query-aware Token Selection} module and the \textit{Spatial-Temporal Token Recovery and Enhancement} module. The former dynamically selects the most relevant visual tokens based on semantic alignment with the input query, reducing the token count from high-resolution spatial input. The latter ensures smooth and coherent interactions between spatial and temporal information, preserving contextual continuity across frames. Extensive experiments on various autonomous driving question-answering benchmarks show that LaVida Drive significantly reduces visual tokens, enhances efficiency, and improves overall performance.
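
A rough sketch of query-aware token selection under simple assumptions (mean-pooled query, cosine scoring, top-k); the function and shapes are illustrative, not LaVida Drive's implementation.

```python
import torch
import torch.nn.functional as F

def select_tokens(visual_tokens, query_tokens, k):
    """Keep the k visual tokens best aligned with the pooled query.
    visual_tokens: (N, d); query_tokens: (M, d). Returns ((k, d), indices)."""
    query = query_tokens.mean(dim=0, keepdim=True)              # pooled query
    scores = F.cosine_similarity(visual_tokens, query, dim=-1)  # (N,)
    idx = scores.topk(k).indices
    return visual_tokens[idx], idx

vis = torch.randn(4096, 256)   # high-resolution spatial tokens
txt = torch.randn(12, 256)     # encoded question
kept, idx = select_tokens(vis, txt, k=256)  # 16x token reduction
```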

Updated: 2024-11-20 02:14:07

标题: LaVida Drive:具有令牌选择、恢复和增强功能的自动驾驶视觉-文本交互VLM

摘要: 最近在视觉语言模型(VLMs)领域的进展使它们对于自动驾驶中的视觉问题回答(VQA)至关重要,实现了自然的人-车交互。然而,现有方法常常在动态驾驶环境中面临困难,因为它们通常专注于静态图像或视频,并依赖降采样来管理计算成本。这导致了关键细节的丢失和难以有效整合空间和时间信息,削弱了对有效决策至关重要的细粒度感知和时间连贯性。为了解决这些挑战,我们引入了LaVida Drive,这是一个新颖高效的自动驾驶VQA框架。LaVida Drive无缝地整合了时间数据,同时保持高分辨率输入以获得详细的视觉感知。它通过保留高分辨率数据来优化空间处理,用于处理精细细节,并使用低分辨率输入来进行时间分析,以便专注于与运动相关的特征,从而提高计算效率。LaVida Drive的核心包括两个模块:\textit{Query-aware Token Selection}模块和\textit{Spatial-Temporal Token Recovery and Enhancement}模块。前者根据输入查询与语义对齐动态选择最相关的视觉标记,从高分辨率空间输入中减少标记数量。后者确保空间和时间信息之间的平滑和连贯交互,保持帧间的上下文连续性。在各种自动驾驶问题回答基准测试上进行了大量实验,结果显示LaVida Drive显著减少了视觉标记,提高了效率,改善了整体性能。

更新时间: 2024-11-20 02:14:07

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.12980v1

MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Collaborative Learning

Contemporary embodied agents, such as Voyager in Minecraft, have demonstrated promising capabilities in open-ended individual learning. However, when powered with open large language models (LLMs), these agents often struggle with rudimentary tasks, even when fine-tuned on domain-specific knowledge. Inspired by human cultural learning, we present MindForge, a novel framework that enhances Voyager with lifelong collaborative learning through explicit perspective-taking. MindForge introduces three key innovations: (1) theory of mind representations linking percepts, beliefs, desires, and actions; (2) natural language communication between agents; and (3) semantic memory of task and environment knowledge and episodic memory of collaboration episodes. These advancements enable agents to reason about their and others' mental states, empirically addressing two prevalent failure modes: false beliefs and faulty task executions. In mixed-expertise Minecraft experiments, MindForge agents outperform Voyager counterparts, significantly improving task completion rate by $66.6\% (+39.4\%)$ for collecting one block of dirt and $70.8\% (+20.8\%)$ for collecting one wood block. They exhibit emergent behaviors like knowledge transfer from expert to novice agents and collaborative code correction. MindForge agents also demonstrate the ability to adapt to out-of-distribution tasks by using their previous experiences and beliefs obtained through collaboration. In this open-ended social learning paradigm, MindForge paves the way for the democratic development of embodied AI, where agents learn in deployment from both peer and environmental feedback.

Updated: 2024-11-20 02:10:44

标题: MindForge:为终身协作学习赋能具有心智理论的实体代理

摘要: 当代具有身体表现的代理人,如《我的世界》中的航海者,已经展示出在开放式个体学习中具有很好的能力。然而,当使用开放大型语言模型(LLMs)来支持这些代理人时,即使在领域特定知识上进行了微调,这些代理人通常也会在基本任务上遇到困难。受人类文化学习的启发,我们提出了一种新颖的框架MindForge,通过明确的透视角度增强了航海者的终身协作学习。MindForge引入了三个关键创新:(1)心灵理论表征连接知觉、信念、欲望和行动;(2)代理人之间的自然语言交流;以及(3)任务和环境知识的语义记忆和协作事件的情节记忆。这些进步使代理人能够推理出自己和他人的心理状态,从经验上解决了两种常见的失败模式:错误的信念和错误的任务执行。在混合专业技能的《我的世界》实验中,MindForge代理比航海者对应的代理表现更好,收集一块泥土的任务完成率提高了$66.6\%(+39.4\%)$,收集一块木块的任务完成率提高了$70.8\%(+20.8\%)$。它们展示了从专家到新手代理的知识传递和协作代码修正等新兴行为。MindForge代理还表现出适应超出分布任务的能力,通过利用他们通过协作获得的先前经验和信念。在这种开放式社会学习范式中,MindForge为具有身体表现的人工智能的民主发展铺平了道路,代理人在部署中从同行和环境反馈中学习。

更新时间: 2024-11-20 02:10:44

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2411.12977v1

On Diffusion Models for Multi-Agent Partial Observability: Shared Attractors, Error Bounds, and Composite Flow

Multiagent systems grapple with partial observability (PO), and the decentralized POMDP (Dec-POMDP) model highlights the fundamental nature of this challenge. While recent approaches to addressing PO have appealed to deep learning models, providing a rigorous understanding of how these models and their approximation errors affect agents' handling of PO and their interactions remains a challenge. In addressing this challenge, we investigate reconstructing global states from local action-observation histories in Dec-POMDPs using diffusion models. We first find that diffusion models conditioned on local history represent possible states as stable fixed points. In collectively observable (CO) Dec-POMDPs, individual diffusion models conditioned on agents' local histories share a unique fixed point corresponding to the global state, while in non-CO settings, the shared fixed points yield a distribution of possible states given joint history. We further find that, with deep learning approximation errors, fixed points can deviate from true states and the deviation is negatively correlated to the Jacobian rank. Inspired by this low-rank property, we bound the deviation by constructing a surrogate linear regression model that approximates the local behavior of diffusion models. With this bound, we propose a composite diffusion process iterating over agents with theoretical convergence guarantees to the true state.
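
As a toy illustration of the stable-fixed-point claim (not the paper's model), the hand-written contraction below stands in for a denoiser conditioned on local history: iterating it from noise converges to the state encoded by the conditioning. A composite process would alternate such maps across agents.

```python
import numpy as np

def denoiser(x, history_code):
    """Toy conditional denoiser: contracts toward the state the history encodes."""
    target = history_code            # stand-in for a learned conditional mean
    return x + 0.5 * (target - x)    # contraction => target is a stable fixed point

x = np.random.default_rng(0).normal(size=3)   # start from noise
for _ in range(30):
    x = denoiser(x, history_code=np.array([1.0, -2.0, 0.5]))
print(x)  # ~ [1.0, -2.0, 0.5], the fixed point consistent with the history
```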

Updated: 2024-11-20 02:05:31

标题: 关于多智能体部分可观测性的扩散模型:共享引子、误差界限和复合流

摘要: 多智能体系统面对部分可观察性(PO)的挑战,去中心化POMDP(Dec-POMDP)模型突显了这一挑战的基本性质。最近针对PO的方法已经转向深度学习模型,提供了对这些模型及其近似误差如何影响智能体处理PO和它们的相互作用的严格理解仍然是一个挑战。为了解决这一挑战,我们研究了使用扩散模型从Dec-POMDP中的本地行动-观察历史重构全局状态。我们首先发现,条件于本地历史的扩散模型表示可能状态为稳定的固定点。在集体可观察(CO)Dec-POMDP中,条件于智能体本地历史的个体扩散模型共享与全局状态对应的唯一固定点,而在非CO设置中,共享的固定点会产生给定联合历史的可能状态分布。我们进一步发现,通过深度学习近似误差,固定点可能偏离真实状态,并且这种偏离与雅可比秩呈负相关。受到这种低秩特性的启发,我们通过构建一个近似扩散模型本地行为的代理线性回归模型来限制偏差。借助这种限制,我们提出了一个在智能体之间迭代的复合扩散过程,具有对真实状态的理论收敛保证。

更新时间: 2024-11-20 02:05:31

领域: cs.LG

下载: http://arxiv.org/abs/2410.13953v2

Robust Planning with Compound LLM Architectures: An LLM-Modulo Approach

Previous work has attempted to boost Large Language Model (LLM) performance on planning and scheduling tasks through a variety of prompt engineering techniques. While these methods can work within the distributions tested, they are neither robust nor predictable. This limitation can be addressed through compound LLM architectures where LLMs work in conjunction with other components to ensure reliability. In this paper, we present a technical evaluation of a compound LLM architecture--the LLM-Modulo framework. In this framework, an LLM is paired with a complete set of sound verifiers that validate its output, re-prompting it if it fails. This approach ensures that the system can never produce fallacious output, and therefore that every output generated is guaranteed correct--something previous techniques have not been able to claim. Our results, evaluated across four scheduling domains, demonstrate significant performance gains with the LLM-Modulo framework using various models. Additionally, we explore modifications to the base configuration of the framework and assess their impact on overall system performance.
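
The generate-verify-reprompt loop at the heart of the framework can be sketched in a few lines; `call_llm` and the verifier internals below are stand-ins, and a real deployment would plug in sound, domain-specific checkers.

```python
# Minimal sketch of an LLM-Modulo-style loop: an LLM proposes a plan, a set of
# sound verifiers checks it, and failures are fed back as critiques.

def call_llm(prompt: str) -> str:
    return "stub plan"  # replace with a real model call

def verify_all(plan: str, verifiers) -> list:
    """Run every verifier; each returns a critique string, or "" if the plan passes."""
    return [msg for v in verifiers if (msg := v(plan))]

def llm_modulo(task: str, verifiers, max_rounds: int = 10):
    prompt = task
    for _ in range(max_rounds):
        plan = call_llm(prompt)
        critiques = verify_all(plan, verifiers)
        if not critiques:
            return plan  # sound verifiers imply this output is correct
        prompt = (task + "\nPrevious attempt:\n" + plan
                  + "\nFix these issues:\n" + "\n".join(critiques))
    return None  # no guaranteed-correct plan found within budget
```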

Updated: 2024-11-20 02:04:09

标题: 稳健规划与复合LLM架构:一种LLM-模除方法

摘要: 先前的研究试图通过各种提示工程技术来提高大型语言模型(LLM)在规划和调度任务中的性能。虽然这些方法在测试的分布范围内可能有效,但它们既不稳健也不可预测。通过复合LLM架构,可以解决这一限制,其中LLM与其他组件配合工作以确保可靠性。在本文中,我们介绍了一个复合LLM架构--LLM-Modulo框架的技术评估。在这个框架中,一个LLM与一组完整的可靠性验证器配对,验证其输出,并在失败时重新提示。这种方法确保系统永远不会输出任何错误的输出,因此每个生成的输出都是保证正确的--这是之前的技术无法宣称的。我们在四个调度领域评估了使用不同模型的LLM-Modulo框架的结果,展示了显著的性能提升。此外,我们探讨了对框架基本配置的修改,并评估它们对整体系统性能的影响。

更新时间: 2024-11-20 02:04:09

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2411.14484v1

DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs

Direct preference learning offers a promising and computation-efficient alternative to supervised fine-tuning (SFT) for improving code generation in coding large language models (LMs). However, the scarcity of reliable preference data is a bottleneck preventing direct preference learning from further improving the coding accuracy of code LMs. In this paper, we introduce \underline{\textbf{D}}irect Preference Learning with Only \underline{\textbf{S}}elf-Generated \underline{\textbf{T}}ests and \underline{\textbf{C}}ode (DSTC), a framework that leverages only self-generated code snippets and tests to construct reliable preference pairs such that direct preference learning can improve LM coding accuracy without external annotations. DSTC combines a minimax selection process and test-code concatenation to improve preference pair quality, reducing the influence of incorrect self-generated tests and enhancing model performance without the need for costly reward models. When applied with direct preference learning methods such as Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO), DSTC yields stable improvements in coding accuracy (pass@1 score) across diverse coding benchmarks, including HumanEval, MBPP, and BigCodeBench, demonstrating both its effectiveness and scalability for models of various sizes. This approach autonomously enhances code generation accuracy across LLMs of varying sizes, reducing reliance on expensive annotated coding datasets.
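
A hedged sketch of preference-pair construction from self-generated artifacts (a simplified stand-in for DSTC's minimax selection): score each snippet by how many self-generated tests it passes, then pair best against worst for DPO/KTO. `run_test` naively executes code and is for illustration only.

```python
def run_test(code: str, test: str) -> bool:
    """Naive sandbox-free check: define the candidate, then run an assert-style test."""
    try:
        env = {}
        exec(code, env)
        exec(test, env)
        return True
    except Exception:
        return False

def build_pair(prompt: str, codes: list, tests: list):
    """Return a {prompt, chosen, rejected} preference pair, or None if no signal."""
    scores = [sum(run_test(c, t) for t in tests) for c in codes]
    hi = max(range(len(codes)), key=scores.__getitem__)
    lo = min(range(len(codes)), key=scores.__getitem__)
    if scores[hi] == scores[lo]:
        return None  # all candidates tie; skip this prompt
    return {"prompt": prompt, "chosen": codes[hi], "rejected": codes[lo]}
```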

Updated: 2024-11-20 02:03:16

标题: DSTC:仅使用自动生成的测试和代码进行直接偏好学习,以改进代码LMs

摘要: 直接偏好学习为改进编码大型语言模型(LMs)中的代码生成提供了一种有前途且计算效率高的方法,超越了监督微调(SFT)。然而,可靠偏好数据的稀缺性是直接偏好学习改进代码LM的编码准确性的性能瓶颈。在本文中,我们介绍了仅使用自动生成的代码片段和测试构建可靠偏好对的直接偏好学习的\underline{\textbf{D}}irect Preference Learning with Only \underline{\textbf{S}}elf-Generated \underline{\textbf{T}}ests and \underline{\textbf{C}}ode(DSTC)框架,从而使LM编码准确性得以改善,无需外部注释。DSTC结合了极小极大选择过程和测试代码连接,以改善偏好对质量,减少不正确的自动生成的测试的影响,并增强模型性能,无需昂贵的奖励模型。当与直接偏好学习方法(如直接偏好优化(DPO)和卡内曼-特沃斯基优化(KTO))一起应用时,DSTC在各种编码基准测试中(包括HumanEval、MBPP和BigCodeBench)都能稳定提高编码准确性(pass@1得分),展示了其对各种规模模型的有效性和可扩展性。这种方法可以自主提高各种规模的LLMs的代码生成准确性,减少对昂贵的注释编码数据集的依赖。

更新时间: 2024-11-20 02:03:16

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2411.13611v1

Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes

We consider a discounted cost constrained Markov decision process (CMDP) policy optimization problem, in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities. To solve this constrained optimization program, we study an online actor-critic variant of a classic primal-dual method where the gradients of both the primal and dual functions are estimated using samples from a single trajectory generated by the underlying time-varying Markov processes. This online primal-dual natural actor-critic algorithm maintains and iteratively updates three variables: a dual variable (or Lagrangian multiplier), a primal variable (or actor), and a critic variable used to estimate the gradients of both primal and dual variables. These variables are updated simultaneously but on different time scales (using different step sizes) and they are all intertwined with each other. Our main contribution is to derive a finite-time analysis for the convergence of this algorithm to the global optimum of a CMDP problem. Specifically, we show that with a proper choice of step sizes the optimality gap and constraint violation converge to zero in expectation at a rate $\mathcal{O}(1/K^{1/6})$, where $K$ is the number of iterations. To our knowledge, this paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem. We also validate the effectiveness of this algorithm through numerical simulations.
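
A schematic, vanilla-gradient version of the three-timescale loop (the paper uses natural gradients and proves the rate above); the environment, features, and step-size schedules below are illustrative.

```python
import numpy as np

# Maximize reward subject to E[discounted utility] >= b; critic, actor, and
# dual variable are updated from one trajectory at different step sizes.
rng = np.random.default_rng(0)
nS, nA, gamma, b = 5, 3, 0.95, 0.2
theta = np.zeros((nS, nA))          # actor (softmax policy parameters)
w = np.zeros((nS, nA))              # critic (tabular action values)
lam = 0.0                           # dual variable (Lagrange multiplier)

def pi(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

s = 0
for k in range(1, 20_000):
    ac, aa, ad = 0.5 / k**0.6, 0.1 / k**0.8, 0.05 / k**1.0  # three timescales
    a = rng.choice(nA, p=pi(s))
    s2 = rng.integers(nS)                         # stand-in environment kernel
    r, u = rng.random(), rng.random()             # reward and utility samples
    a2 = rng.choice(nA, p=pi(s2))
    td = (r + lam * u) + gamma * w[s2, a2] - w[s, a]  # Lagrangian TD error
    w[s, a] += ac * td                            # critic: fastest timescale
    glog = -pi(s); glog[a] += 1.0                 # grad log pi(a|s)
    theta[s] += aa * td * glog                    # actor: intermediate timescale
    lam = max(0.0, lam - ad * (u - b))            # dual: projected descent
    s = s2
```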

Updated: 2024-11-20 01:59:39

标题: 有限时间复杂度的在线原始-对偶自然演员-评论家算法在受约束的马尔可夫决策过程中的应用

摘要: 我们考虑了一个折扣成本受限的马尔可夫决策过程(CMDP)策略优化问题,其中一个代理寻求在折扣累积奖励最大化的同时满足对折扣累积效用的一些约束。为了解决这个受限优化问题,我们研究了一个经典原始-对偶方法的在线actor-critic变体,其中原始和对偶函数的梯度是使用由基础时变马尔可夫过程生成的单个轨迹的样本来估计的。这个在线原始-对偶自然actor-critic算法维护并迭代更新三个变量:一个对偶变量(或拉格朗日乘子)、一个原始变量(或actor)、以及一个用来估计原始和对偶变量梯度的评论家变量。这些变量同时但在不同的时间尺度上进行更新(使用不同的步长),它们相互交织在一起。我们的主要贡献是推导了这个算法收敛到CMDP问题全局最优解的有限时间分析。具体来说,我们展示了通过适当选择步长,优化间隙和约束违反的期望收敛速率为$\mathcal{O}(1/K^{1/6})$,其中K是迭代次数。据我们所知,这篇论文是第一篇研究在线原始-对偶actor-critic方法解决CMDP问题的有限时间复杂性的论文。我们还通过数值模拟验证了该算法的有效性。

更新时间: 2024-11-20 01:59:39

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2110.11383v3

Adaptive Process-Guided Learning: An Application in Predicting Lake DO Concentrations

This paper introduces a \textit{Process-Guided Learning (Pril)} framework that integrates physical models with recurrent neural networks (RNNs) to enhance the prediction of dissolved oxygen (DO) concentrations in lakes, which is crucial for sustaining water quality and ecosystem health. Unlike traditional RNNs, which may deliver high accuracy but often lack physical consistency and broad applicability, the \textit{Pril} method incorporates differential DO equations for each lake layer, modeled as first-order linear systems and integrated with a forward Euler scheme at a daily timestep. However, this method is sensitive to numerical instabilities. When drastic fluctuations occur, the numerical integration is neither mass-conservative nor stable. Especially during stratified conditions, exogenous fluxes into each layer cause significant within-day changes in DO concentrations. To address this challenge, we further propose an \textit{Adaptive Process-Guided Learning (April)} model, which dynamically adjusts timesteps from daily to sub-daily intervals with the aim of mitigating the discrepancies caused by variations in entrainment fluxes. \textit{April} uses a generator-discriminator architecture to identify days with significant DO fluctuations and employs a multi-step Euler scheme with sub-daily timesteps to effectively manage these variations. We have tested our methods on a wide range of lakes in the Midwestern USA, and demonstrated robust capability in predicting DO concentrations even with limited training data. While primarily focused on aquatic ecosystems, this approach is broadly applicable to diverse scientific and engineering disciplines that utilize process-based models, such as power engineering, climate science, and biomedicine.
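
A minimal sketch of the numerical core with illustrative values (not the paper's calibration): forward Euler on dC/dt = -kC + F(t), with sub-daily substeps on days flagged for large fluxes, mirroring April's adaptive-timestep idea.

```python
import numpy as np

def euler_do(c0, k, flux, substeps):
    """flux: (days,) source term; substeps: (days,) Euler substeps per day."""
    c, out = c0, []
    for f, n in zip(flux, substeps):
        dt = 1.0 / n
        for _ in range(int(n)):              # sub-daily forward Euler
            c = c + dt * (-k * c + f)
        out.append(c)
    return np.array(out)

days = 10
flux = np.where(np.arange(days) == 5, 50.0, 2.0)   # one high-entrainment day
substeps = np.where(flux > 10, 24, 1)              # refine only flagged days
print(euler_do(c0=8.0, k=0.3, flux=flux, substeps=substeps).round(2))
```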

Updated: 2024-11-20 01:58:20

标题: 自适应过程引导学习:在预测湖泊溶解氧浓度中的应用

摘要: 这篇论文介绍了一个\textit{过程引导学习(Pril)}框架,将物理模型与递归神经网络(RNNs)相结合,以增强对湖泊中溶解氧(DO)浓度的预测,这对维持水质和生态系统健康至关重要。与传统的RNNs不同,后者可能提供高准确性,但通常缺乏物理一致性和广泛适用性,\textit{Pril}方法整合了每个湖层的不同DO方程,将其建模为使用每日时间步长的前向欧拉方案的一阶线性解。然而,这种方法对数值不稳定性敏感。当出现剧烈波动时,数值积分既不是质量守恒的,也不是稳定的。特别是在分层条件下,流入每个层的外部通量导致DO浓度在一天内发生显著变化。为了解决这一挑战,我们进一步提出了一个\textit{自适应过程引导学习(April)}模型,该模型动态调整时间步长,从每日到亚日间隔,旨在减轻由混合流量变化引起的差异。 \textit{April} 使用生成器-鉴别器架构识别具有显著DO波动的日期,并采用多步欧拉方案和亚日时间步长有效管理这些变化。我们已在美国中西部各种湖泊上测试了我们的方法,并展示了即使在有限的训练数据下,预测DO浓度的稳健能力。尽管主要关注水生态系统,但这种方法广泛适用于利用基于过程模型的各种科学和工程学科,如动力工程、气候科学和生物医学。

更新时间: 2024-11-20 01:58:20

领域: cs.LG

下载: http://arxiv.org/abs/2411.12973v1

A Foundation Model for Unified Urban Spatio-Temporal Flow Prediction

Urban spatio-temporal flow prediction, encompassing traffic flows and crowd flows, is crucial for optimizing city infrastructure and managing traffic and emergency responses. Traditional approaches have relied on separate models tailored to either grid-based data, representing cities as uniform cells, or graph-based data, modeling cities as networks of nodes and edges. In this paper, we build UniFlow, a foundational model for general urban flow prediction that unifies both grid-based and graph-based data. We first design a multi-view spatio-temporal patching mechanism to standardize different data into a consistent sequential format and then introduce a spatio-temporal transformer architecture to capture complex correlations and dynamics. To leverage shared spatio-temporal patterns across different data types and facilitate effective cross-learning, we propose Spatio-Temporal Memory Retrieval Augmentation (ST-MRA). By creating structured memory modules to store shared spatio-temporal patterns, ST-MRA enhances predictions through adaptive memory retrieval. Extensive experiments demonstrate that UniFlow outperforms existing models in both grid-based and graph-based flow prediction, excelling particularly in scenarios with limited data availability, showcasing its superior performance and broad applicability. The datasets and code implementation have been released on https://github.com/YuanYuan98/UniFlow.
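
A hedged sketch of memory-retrieval augmentation in the spirit of ST-MRA; the shapes, slot count, and attention form are assumptions, not UniFlow's code.

```python
import torch
import torch.nn as nn

class MemoryRetrieval(nn.Module):
    """Learned bank of shared spatio-temporal patterns, queried by attention."""
    def __init__(self, d=128, slots=64):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(slots, d))  # shared patterns
        self.q = nn.Linear(d, d)

    def forward(self, h):                    # h: (batch, tokens, d)
        attn = torch.softmax(self.q(h) @ self.memory.T / h.shape[-1] ** 0.5, -1)
        retrieved = attn @ self.memory       # adaptive memory readout
        return h + retrieved                 # augment the flow embeddings

h = torch.randn(2, 196, 128)                 # grid- or graph-patch embeddings
print(MemoryRetrieval()(h).shape)            # torch.Size([2, 196, 128])
```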

Updated: 2024-11-20 01:54:52

标题: 一个统一城市时空流预测的基础模型

摘要: 城市空间时间流预测对于优化城市基础设施、管理交通和应急响应至关重要。传统方法依赖于针对基于网格的数据或基于图的数据分别定制的模型,前者将城市表示为统一的单元格,后者则将城市建模为节点和边的网络。在本文中,我们构建了UniFlow,一个通用城市流预测的基础模型,统一了基于网格和基于图的数据。我们首先设计了一个多视图空间时间补丁机制,将不同的数据标准化为一致的序列格式,然后引入了一个空间时间变换器架构,以捕捉复杂的相关性和动态性。为了利用不同数据类型之间共享的空间时间模式并促进有效的交叉学习,我们提出了空间时间记忆检索增强(ST-MRA)。通过创建结构化记忆模块来存储共享的空间时间模式,ST-MRA通过自适应记忆检索增强了预测性能。广泛的实验表明,UniFlow在基于网格和基于图的流预测中胜过现有模型,特别在数据有限的情景中表现出色,展示了其卓越的性能和广泛的适用性。数据集和代码实现已发布在https://github.com/YuanYuan98/UniFlow。

更新时间: 2024-11-20 01:54:52

领域: cs.LG

下载: http://arxiv.org/abs/2411.12972v1

Shrinking POMCP: A Framework for Real-Time UAV Search and Rescue

Efficient path optimization for drones in search and rescue operations faces challenges, including limited visibility, time constraints, and complex information gathering in urban environments. We present a comprehensive approach to optimize UAV-based search and rescue operations in neighborhood areas, utilizing both a 3D AirSim-ROS2 simulator and a 2D simulator. The path planning problem is formulated as a partially observable Markov decision process (POMDP), and we propose a novel ``Shrinking POMCP'' approach to address time constraints. In the AirSim environment, we integrate our approach with a probabilistic world model for belief maintenance and a neurosymbolic navigator for obstacle avoidance. The 2D simulator employs surrogate ROS2 nodes with equivalent functionality. We compare trajectories generated by different approaches in the 2D simulator and evaluate performance across various belief types in the 3D AirSim-ROS simulator. Experimental results from both simulators demonstrate that our proposed shrinking POMCP solution achieves significant improvements in search times compared to alternative methods, showcasing its potential for enhancing the efficiency of UAV-assisted search and rescue operations.

Updated: 2024-11-20 01:41:29

标题: 缩小POMCP:用于实时无人机搜索和救援的框架

摘要: 在搜索和救援行动中,为了无人机的路径优化面临着挑战,包括有限的能见度、时间限制以及在城市环境中复杂的信息收集。我们提出了一种全面的方法来优化基于无人机的搜索和救援行动,利用3D AirSim-ROS2模拟器和2D模拟器。路径规划问题被制定为部分可观察的马尔可夫决策过程(POMDP),我们提出了一种新颖的“Shrinking POMCP”方法来解决时间限制。在AirSim环境中,我们将我们的方法与概率世界模型以及神经符号导航器集成起来,以避免障碍物。2D模拟器使用具有等效功能的代理ROS2节点。我们比较了2D模拟器中不同方法生成的轨迹,并在3D AirSim-ROS模拟器中评估了各种信念类型的性能。来自两个模拟器的实验结果表明,我们提出的缩小POMCP解决方案在搜索时间方面相比其他方法取得了显著的改进,展示了其提高无人机辅助搜索和救援行动效率的潜力。

更新时间: 2024-11-20 01:41:29

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2411.12967v1

On adaptivity and minimax optimality of two-sided nearest neighbors

Nearest neighbor (NN) algorithms have been extensively used for missing data problems in recommender systems and sequential decision-making systems. Prior theoretical analysis has established favorable guarantees for NN when the underlying data is sufficiently smooth and the missingness probabilities are lower bounded. Here we analyze NN with non-smooth non-linear functions with vast amounts of missingness. In particular, we consider matrix completion settings where the entries of the underlying matrix follow a latent non-linear factor model, with the non-linearity belonging to a Hölder function class that is less smooth than Lipschitz. Our results establish the following favorable properties for a suitable two-sided NN: (1) The mean squared error (MSE) of NN adapts to the smoothness of the non-linearity, (2) under certain regularity conditions, the NN error rate matches the rate obtained by an oracle equipped with the knowledge of both the row and column latent factors, and finally (3) NN's MSE is non-trivial for a wide range of settings even when several matrix entries might be missing deterministically. We support our theoretical findings via extensive numerical simulations and a case study with data from a mobile health study, HeartSteps.
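
An illustrative two-sided NN estimator in its standard form (the paper's estimator may differ in details): to impute entry (i, j), average the observed entries whose row is close to row i and whose column is close to column j.

```python
import numpy as np

def row_dist(M, mask, i, k):
    """Mean squared difference over commonly observed columns of rows i and k."""
    common = mask[i] & mask[k]
    return np.mean((M[i, common] - M[k, common]) ** 2) if common.any() else np.inf

def two_sided_nn(M, mask, i, j, eta):
    rows = [k for k in range(M.shape[0]) if row_dist(M, mask, i, k) <= eta]
    cols = [l for l in range(M.shape[1]) if row_dist(M.T, mask.T, j, l) <= eta]
    vals = [M[k, l] for k in rows for l in cols if mask[k, l]]
    return np.mean(vals) if vals else np.nan

rng = np.random.default_rng(0)
U, V = rng.normal(size=(50, 3)), rng.normal(size=(60, 3))
M_true = np.tanh(U @ V.T)              # non-linear latent factor model
mask = rng.random(M_true.shape) < 0.4  # 60% of entries missing
M = np.where(mask, M_true, 0.0)
print(two_sided_nn(M, mask, 0, 0, eta=0.1), M_true[0, 0])
```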

Updated: 2024-11-20 01:40:53

标题: 关于双边最近邻适应性和最小极小化优化的研究

摘要: 最近邻(NN)算法已被广泛用于推荐系统和顺序决策系统中的缺失数据问题。先前的理论分析已经为NN在底层数据足够平滑且缺失概率受到下限约束时建立了有利的保证。在这里,我们分析了具有大量缺失的非平滑非线性函数的NN。特别是,我们考虑矩阵补全设置,其中底层矩阵的条目遵循潜在非线性因子模型,其中非线性属于比利普希茨函数类更不平滑的\Holder函数类。我们的结果为适当的双边NN建立了以下有利性质:(1)NN的均方误差(MSE)适应于非线性的平滑度,(2)在某些正则条件下,NN错误率与具有对行和列潜在因子的知识的oracle获得的速率匹配,最后(3)即使在确定性地丢失了几个矩阵条目时,NN的MSE在各种设置中也是非平凡的。我们通过广泛的数值模拟和来自移动健康研究HeartSteps的案例研究支持我们的理论发现。

更新时间: 2024-11-20 01:40:53

领域: stat.ML,cs.LG,math.ST,stat.ME,stat.TH

下载: http://arxiv.org/abs/2411.12965v1

Real-Time Energy-Optimal Path Planning for Electric Vehicles

The rapid adoption of electric vehicles (EVs) in modern transport systems has made energy-aware routing a critical task in their successful integration, especially within large-scale networks. In cases where an EV's remaining energy is limited and charging locations are not easily accessible, some destinations may only be reachable through an energy-optimal path: a route that consumes less energy than all other alternatives. The feasibility of such energy-efficient paths depends heavily on the accuracy of the energy model used for planning, and thus failing to account for vehicle dynamics can lead to inaccurate energy estimates, rendering some planned routes infeasible in reality. This paper explores the impact of vehicle dynamics on energy-optimal path planning for EVs. We develop an accurate energy model that incorporates key vehicle dynamics parameters into energy calculations, thereby reducing the risk of planning infeasible paths under battery constraints. The paper also introduces two novel online reweighting functions that allow for a faster, pre-processing free, pathfinding in the presence of negative energy costs resulting from regenerative braking, making them ideal for real-time applications. Through extensive experimentation on real-world transport networks, we demonstrate that our approach considerably enhances energy-optimal pathfinding for EVs in both computational efficiency and energy estimation accuracy.
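
The paper's two online reweighting functions are not reproduced here, but the classic potential-based trick they build on is easy to sketch: with a vertex potential phi chosen so that the shifted weight w'(u,v) = w(u,v) + phi(u) - phi(v) is nonnegative (e.g., phi derived from elevation), plain Dijkstra applies despite regenerative (negative) edges, and true energy costs are recovered afterwards.

```python
import heapq

def dijkstra_reweighted(adj, phi, s, t):
    """adj: {u: [(v, energy_cost), ...]}; energy_cost may be negative."""
    dist, pq = {s: 0.0}, [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            return d + phi[t] - phi[s]   # undo the reweighting
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            wp = w + phi[u] - phi[v]     # nonnegative by construction
            if d + wp < dist.get(v, float("inf")):
                dist[v] = d + wp
                heapq.heappush(pq, (d + wp, v))
    return float("inf")

# Toy network: edge b->c regenerates energy downhill (negative cost).
adj = {"a": [("b", 5.0)], "b": [("c", -2.0)], "c": []}
phi = {"a": 0.0, "b": 0.0, "c": -2.0}    # e.g., scaled elevations
print(dijkstra_reweighted(adj, phi, "a", "c"))  # 3.0 = 5.0 - 2.0
```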

Updated: 2024-11-20 01:39:08

标题: 电动汽车的实时能源最优路径规划

摘要: 现代交通系统中电动汽车(EVs)的快速采用使得能源感知路由成为成功整合的关键任务,特别是在大规模网络中。在电动汽车的剩余能量有限且充电位置不易到达的情况下,一些目的地可能只能通过能源最优路径到达:消耗比所有其他替代方案更少能量的路径。这种能源高效路径的可行性在很大程度上取决于用于规划的能源模型的准确性,因此未考虑车辆动力学可能导致能量估计不准确,使得一些计划路径在现实中不可行。本文探讨了车辆动力学对电动汽车能源最优路径规划的影响。我们开发了一个准确的能源模型,将关键车辆动力学参数纳入能量计算中,从而降低在电池约束下规划不可行路径的风险。该论文还介绍了两种新的在线重新加权函数,允许在再生制动导致负能量成本的情况下进行更快、无预处理的路径查找,使其非常适用于实时应用。通过对真实交通网络的广泛实验,我们证明了我们的方法显著提高了电动汽车的能源最优路径规划在计算效率和能量估计准确性方面。

更新时间: 2024-11-20 01:39:08

领域: cs.AI

下载: http://arxiv.org/abs/2411.12964v1

DrugGen: Advancing Drug Discovery with Large Language Models and Reinforcement Learning Feedback

Traditional drug design faces significant challenges due to inherent chemical and biological complexities, often resulting in high failure rates in clinical trials. Deep learning advancements, particularly generative models, offer potential solutions to these challenges. One promising algorithm is DrugGPT, a transformer-based model, that generates small molecules for input protein sequences. Although promising, it generates both chemically valid and invalid structures and does not incorporate the features of approved drugs, resulting in time-consuming and inefficient drug discovery. To address these issues, we introduce DrugGen, an enhanced model based on the DrugGPT structure. DrugGen is fine-tuned on approved drug-target interactions and optimized with proximal policy optimization. By giving reward feedback from protein-ligand binding affinity prediction using pre-trained transformers (PLAPT) and a customized invalid structure assessor, DrugGen significantly improves performance. Evaluation across multiple targets demonstrated that DrugGen achieves 100% valid structure generation compared to 95.5% with DrugGPT and produced molecules with higher predicted binding affinities (7.22 [6.30-8.07]) compared to DrugGPT (5.81 [4.97-6.63]) while maintaining diversity and novelty. Docking simulations further validate its ability to generate molecules targeting binding sites effectively. For example, in the case of fatty acid-binding protein 5 (FABP5), DrugGen generated molecules with superior docking scores (FABP5/11, -9.537 and FABP5/5, -8.399) compared to the reference molecule (Palmitic acid, -6.177). Beyond lead compound generation, DrugGen also shows potential for drug repositioning and creating novel pharmacophores for existing targets. By producing high-quality small molecules, DrugGen provides a high-performance medium for advancing pharmaceutical research and drug discovery.
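
A hedged sketch of the reward shaping described above; `predict_affinity` is a stub for a PLAPT-style predictor, and RDKit's SMILES parser stands in for the paper's custom invalid-structure assessor.

```python
from rdkit import Chem  # widely used cheminformatics toolkit

def predict_affinity(smiles: str, protein_seq: str) -> float:
    """Stub for a learned protein-ligand binding affinity predictor."""
    return 5.0  # placeholder score

def reward(smiles: str, protein_seq: str) -> float:
    mol = Chem.MolFromSmiles(smiles)   # returns None for invalid structures
    if mol is None:
        return -1.0                    # penalize chemically invalid generations
    return predict_affinity(smiles, protein_seq)

# Such a scalar reward is what a PPO-style fine-tuning loop would maximize.
print(reward("CCO", "MKT..."), reward("C(C(", "MKT..."))
```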

Updated: 2024-11-20 01:21:07

标题: DrugGen:利用大型语言模型和强化学习反馈推进药物发现

摘要: 传统药物设计面临着由于固有的化学和生物复杂性而产生的重大挑战,通常导致临床试验的高失败率。深度学习的进展,特别是生成模型,为这些挑战提供了潜在的解决方案。一种有前途的算法是DrugGPT,这是一种基于转换器的模型,用于为输入蛋白质序列生成小分子。尽管有前途,但它生成的结构既有化学上有效又无效,并且不包含已批准药物的特征,导致耗时且低效的药物发现。为了解决这些问题,我们引入了DrugGen,这是一个基于DrugGPT结构的增强模型。DrugGen在已批准的药物靶标相互作用上进行了微调,并通过接近策略优化进行了优化。通过使用预训练的转换器(PLAPT)进行蛋白配体结合亲和力预测和自定义无效结构评估器的奖励反馈,DrugGen显著提高了性能。跨多个靶点的评估表明,与DrugGPT相比,DrugGen实现了100%的有效结构生成,并且生成了具有更高预测结合亲和力(7.22 [6.30-8.07])的分子,而DrugGPT为(5.81 [4.97-6.63]),同时保持多样性和新颖性。对接模拟进一步验证了其生成能够有效针对结合位点的分子的能力。例如,在脂肪酸结合蛋白5(FABP5)的情况下,与参考分子(棕榈酸,-6.177)相比,DrugGen生成了具有更高对接得分的分子(FABP5/11,-9.537和FABP5/5,-8.399)。除了引物化合物生成外,DrugGen还显示了对药物再定位和为现有靶标创建新的药理作用团的潜力。通过生成高质量的小分子,DrugGen为推进制药研究和药物发现提供了高性能的平台。

更新时间: 2024-11-20 01:21:07

领域: q-bio.QM,cs.AI

下载: http://arxiv.org/abs/2411.14157v1

Causal and Counterfactual Views of Missing Data Models

It is often said that the fundamental problem of causal inference is a missing data problem -- the comparison of responses to two hypothetical treatment assignments is made difficult because for every experimental unit only one potential response is observed. In this paper, we consider the implications of the converse view: that missing data problems are a form of causal inference. We make explicit how the missing data problem of recovering the complete data law from the observed law can be viewed as identification of a joint distribution over counterfactual variables corresponding to values had we (possibly contrary to fact) been able to observe them. Drawing analogies with causal inference, we show how identification assumptions in missing data can be encoded in terms of graphical models defined over counterfactual and observed variables. We review recent results in missing data identification from this viewpoint. In doing so, we note interesting similarities and differences between missing data and causal identification theories.

Updated: 2024-11-20 01:20:11

标题: 因果和反事实视角下的缺失数据模型

摘要: 通常有人说因果推断的根本问题是一个缺失数据问题——比较对两个假设的处理分配的响应变得困难,因为对于每个实验单位只观察到一个潜在的响应。在本文中,我们考虑了相反观点的影响:缺失数据问题是一种因果推断形式。我们明确说明了从观察到的规律中恢复完整数据规律的缺失数据问题如何可以被视为识别与对应值相关的对立变量的联合分布,如果我们(可能与事实相反)能够观察到它们。通过与因果推断进行类比,我们展示了缺失数据中的识别假设如何可以用图模型来编码,这些图模型定义了对立和观察变量。我们从这个视角回顾了缺失数据识别方面的最新结果。在这样做的过程中,我们注意到了缺失数据和因果识别理论之间有趣的相似之处和差异。

更新时间: 2024-11-20 01:20:11

领域: stat.ME,cs.LG,stat.ML

下载: http://arxiv.org/abs/2210.05558v3

Quantum neural networks form Gaussian processes

It is well known that artificial neural networks initialized from independent and identically distributed priors converge to Gaussian processes in the limit of a large number of neurons per hidden layer. In this work we prove an analogous result for Quantum Neural Networks (QNNs). Namely, we show that the outputs of certain models based on Haar random unitary or orthogonal deep QNNs converge to Gaussian processes in the limit of large Hilbert space dimension $d$. The derivation of this result is more nuanced than in the classical case due to the role played by the input states, the measurement observable, and the fact that the entries of unitary matrices are not independent. Then, we show that the efficiency of predicting measurements at the output of a QNN using Gaussian process regression depends on the observable's bodyness. Furthermore, our theorems imply that the concentration of measure phenomenon in Haar random QNNs is worse than previously thought, as we prove that expectation values and gradients concentrate as $\mathcal{O}\left(\frac{1}{e^d \sqrt{d}}\right)$. Finally, we discuss how our results improve our understanding of concentration in $t$-designs.

Updated: 2024-11-20 01:12:04

标题: 量子神经网络构建高斯过程

摘要: 众所周知,从独立同分布的先验开始初始化的人工神经网络在每个隐藏层的神经元数量很大时会收敛到高斯过程。在这项工作中,我们证明了量子神经网络(QNNs)的类似结果。换句话说,我们展示了基于Haar随机酉或正交深度QNNs的某些模型的输出在大希尔伯特空间维度$d$的极限下会收敛到高斯过程。由于输入状态、测量可观测量的作用以及酉矩阵的条目之间不是独立,这一结果的推导比经典情况更微妙。然后,我们展示了使用高斯过程回归在QNN的输出上预测测量的效率取决于可观测量的身体性。此外,我们的定理表明,Haar随机QNNs中的测量集中现象比之前认为的更糟糕,因为我们证明了期望值和梯度的集中程度为$\mathcal{O}\left(\frac{1}{e^d \sqrt{d}}\right)$。最后,我们讨论了我们的结果如何改进我们对$t$-设计中集中的理解。

更新时间: 2024-11-20 01:12:04

领域: quant-ph,cs.LG,stat.ML

下载: http://arxiv.org/abs/2305.09957v3

FengWu-W2S: A deep learning model for seamless weather-to-subseasonal forecast of global atmosphere

Seamless forecasting that produces warning information across a continuum of timescales using only one system is a long-standing pursuit of weather-climate services. While the rapid advancement of deep learning has induced revolutionary changes in the classical forecasting field, current efforts are still focused on building separate AI models for weather and climate forecasts. To explore the seamless forecasting ability of a single AI model, we propose FengWu-Weather to Subseasonal (FengWu-W2S), which builds on the FengWu global weather forecast model and incorporates an ocean-atmosphere-land coupling structure along with a diverse perturbation strategy. FengWu-W2S can generate 6-hourly atmosphere forecasts extending up to 42 days in an autoregressive and seamless manner. Our hindcast results demonstrate that FengWu-W2S reliably predicts atmospheric conditions out to 3-6 weeks ahead, enhancing predictive capabilities for global surface air temperature, precipitation, geopotential height and intraseasonal signals such as the Madden-Julian Oscillation (MJO) and North Atlantic Oscillation (NAO). Moreover, our ablation experiments on forecast error growth from daily to seasonal timescales reveal potential pathways for developing AI-based integrated systems for seamless weather-climate forecasting in the future.
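
A schematic autoregressive rollout; the step function and the small grid are illustrative stand-ins for the coupled model and real atmospheric state.

```python
import numpy as np

def step_6h(state):
    """Stub for the learned 6-hourly ocean-atmosphere-land coupled step."""
    return state + 0.01 * np.tanh(state)        # placeholder dynamics

state = np.random.default_rng(0).normal(size=(4, 32, 64))  # vars x lat x lon (toy)
trajectory = [state]
for _ in range(42 * 4):                          # 42 days of 6-hour steps
    state = step_6h(state)                       # model consumes its own output
    trajectory.append(state)
print(len(trajectory) - 1, "autoregressive steps")
```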

Updated: 2024-11-20 01:10:15

标题: FengWu-W2S:全球大气无缝气象至次季节预测的深度学习模型

摘要: 无缝预测,基于仅一个系统产生持续警告信息的时间尺度一直是天气气候服务的长期追求。虽然深度学习的快速发展在经典预测领域引起了革命性变化,但当前的努力仍集中在为天气和气候预测构建单独的人工智能模型上。为了探索基于一个人工智能模型的无缝预测能力,我们提出了凤武-气象到亚季节(FengWu-W2S),它建立在凤武全球天气预报模型基础上,并结合了海洋-大气-陆地耦合结构以及多样的扰动策略。凤武-W2S可以通过自回归和无缝方式生成每6小时的大气预报,延伸至42天。我们的回报结果表明,凤武-W2S可可靠地预测未来3-6周的大气条件,增强了对全球地表气温、降水、位势高度和季节内信号(如马登-朱利安振荡(MJO)和北大西洋涛动(NAO))的预测能力。此外,我们对从日常到季节时间尺度的预测误差增长进行的消融实验揭示了未来发展基于人工智能的无缝天气气候预测综合系统的潜在途径。

更新时间: 2024-11-20 01:10:15

领域: cs.LG,cs.AI,physics.ao-ph

下载: http://arxiv.org/abs/2411.10191v2

Demystifying Large Language Models for Medicine: A Primer

Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare by generating human-like responses across diverse contexts and adapting to novel tasks following human instructions. Their potential application spans a broad range of medical tasks, such as clinical documentation, matching patients to clinical trials, and answering medical questions. In this primer paper, we propose an actionable guideline to help healthcare professionals more efficiently utilize LLMs in their work, along with a set of best practices. This approach consists of several main phases, including formulating the task, choosing LLMs, prompt engineering, fine-tuning, and deployment. We start with the discussion of critical considerations in identifying healthcare tasks that align with the core capabilities of LLMs and selecting models based on the selected task and data, performance requirements, and model interface. We then review the strategies, such as prompt engineering and fine-tuning, to adapt standard LLMs to specialized medical tasks. Deployment considerations, including regulatory compliance, ethical guidelines, and continuous monitoring for fairness and bias, are also discussed. By providing a structured step-by-step methodology, this tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice, ensuring that these powerful technologies are applied in a safe, reliable, and impactful manner.

Updated: 2024-11-20 01:04:33

标题: 解密医学中的大型语言模型:入门指南

摘要: 大型语言模型(LLMs)代表了一类具有变革性的人工智能工具,能够通过生成人类般的回应跨越不同的背景,适应新颖的任务并遵循人类指示,从而彻底改变医疗保健的各个方面。它们的潜在应用范围涵盖了广泛的医疗任务,如临床文档记录、将患者匹配到临床试验以及回答医学问题。在这篇入门论文中,我们提出了一个可操作的指南,帮助医疗保健专业人士更有效地利用LLMs在工作中,并提出一套最佳实践。这种方法包括几个主要阶段,包括明确任务、选择LLMs、提示工程、微调和部署。我们从讨论识别与LLMs的核心能力相符的医疗任务和根据所选任务和数据、性能要求以及模型接口选择模型的关键考虑因素开始。然后我们回顾了一些策略,如提示工程和微调,以使标准LLMs适应专门的医学任务。部署考虑因素,包括合规性、道德指南以及对公平性和偏见的持续监测,也在讨论范围内。通过提供一个结构化的逐步方法,本教程旨在为医疗保健专业人士提供必要的工具,以有效地将LLMs整合到临床实践中,确保这些强大的技术以安全、可靠和有影响力的方式应用。

更新时间: 2024-11-20 01:04:33

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.18856v3

KAAE: Numerical Reasoning for Knowledge Graphs via Knowledge-aware Attributes Learning

Numerical reasoning is pivotal in various artificial intelligence applications, such as natural language processing and recommender systems, where it involves using entities, relations, and attribute values (e.g., weight, length) to infer new factual relations (e.g., the Nile is longer than the Amazon). However, existing approaches encounter two critical challenges in modeling: (1) semantic relevance: the challenge of adequately capturing the necessary contextual interactions among entities, relations, and numerical attributes, which often results in suboptimal inference; and (2) semantic ambiguity: the difficulty of accurately distinguishing ordinal relationships during numerical reasoning, which compromises the generation of high-quality samples and limits the effectiveness of contrastive learning. To address these challenges, we propose the novel Knowledge-Aware Attributes Embedding model (KAAE) for knowledge graph embeddings in numerical reasoning. Specifically, to overcome the challenge of semantic relevance, we introduce a Mixture-of-Experts-Knowledge-Aware (MoEKA) Encoder, designed to integrate the semantics of entities, relations, and numerical attributes into a joint semantic space. To tackle semantic ambiguity, we implement a new ordinal knowledge contrastive learning (OKCL) strategy that generates high-quality ordinal samples from the original data with the aid of ordinal relations, capturing fine-grained semantic nuances essential for accurate numerical reasoning. Experiments on three public benchmark datasets demonstrate the superior performance of KAAE across various attribute value distributions.
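
A hedged sketch of an ordinal contrastive term in the spirit of OKCL (not the paper's loss): the entity whose attribute value is nearer to the anchor's should stay closer in embedding space than the one that is farther, enforced with a margin.

```python
import torch
import torch.nn.functional as F

def ordinal_contrastive(z_anchor, z_near, z_far, margin=0.5):
    """z_near/z_far: embeddings of entities nearer/farther in attribute value."""
    d_near = F.pairwise_distance(z_anchor, z_near)
    d_far = F.pairwise_distance(z_anchor, z_far)
    return F.relu(d_near - d_far + margin).mean()  # violated order is penalized

z = torch.randn(16, 64, requires_grad=True)        # dummy anchor embeddings
loss = ordinal_contrastive(z, torch.randn(16, 64), torch.randn(16, 64))
loss.backward()
```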

Updated: 2024-11-20 00:47:03

标题: KAAE: 通过知识感知属性学习进行知识图谱的数值推理

摘要: 数值推理在各种人工智能应用中起着关键作用,比如自然语言处理和推荐系统,在这些应用中,它涉及使用实体、关系和属性值(例如重量、长度)来推断新的事实关系(例如,尼罗河比亚马逊河更长)。然而,现有方法在建模过程中遇到两个关键挑战:(1)语义相关性-挑战在于没有充分捕捉实体、关系和数值属性之间必要的上下文交互作用,往往导致次优推断;以及(2)语义模糊性-在数值推理过程中准确区分序数关系的困难,这损害了高质量样本的生成并限制了对比学习的有效性。为了解决这些挑战,我们提出了新颖的知识感知属性嵌入模型(KAAE)用于数值推理中的知识图嵌入。具体来说,为了克服语义相关性的挑战,我们引入了一个专家混合知识感知(MoEKA)编码器,旨在将实体、关系和数值属性的语义整合到一个联合语义空间中。为了解决语义模糊性,我们实现了一个新的序数知识对比学习(OKCL)策略,通过序数关系从原始数据中生成高质量的序数样本,捕捉对准确数值推理至关重要的细粒度语义细微差别。在三个公共基准数据集上的实验表明,KAAE在各种属性值分布下表现出优越性能。

更新时间: 2024-11-20 00:47:03

领域: cs.AI

下载: http://arxiv.org/abs/2411.12950v1

Machine learned reconstruction of tsunami dynamics from sparse observations

We investigate the use of the Senseiver, a transformer neural network designed for sparse sensing applications, to estimate full-field surface height measurements of tsunami waves from sparse observations. The model is trained on a large ensemble of simulated data generated via a shallow water equations solver, which we show to be a faithful reproduction of the underlying dynamics by comparison to historical events. We train the model on a dataset consisting of 8 tsunami simulations whose epicenters correspond to historical USGS earthquake records, and where the model inputs are restricted to measurements obtained at actively deployed buoy locations. We test the Senseiver on a dataset consisting of 8 simulations not included in training, demonstrating its capability for extrapolation. The results show remarkable resolution of fine-scale phase and amplitude features of the true field, provided that at least a few of the sensors have obtained a non-zero signal. Throughout, we discuss which forecasting techniques can be improved by this method, and suggest ways in which the flexibility of the architecture can be leveraged to incorporate arbitrary remote sensing data (e.g., HF radar and satellite measurements) as well as to investigate optimal sensor placements.
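
A linear stand-in for the sparse-sensing task (the Senseiver itself is a transformer): learn a map from buoy readings to the full field on a simulated ensemble, then reconstruct a held-out field. All data below are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fields, n_pix, n_sensors = 500, 1024, 32
basis = rng.normal(size=(16, n_pix))                # shared low-dim structure
fields = rng.normal(size=(n_fields, 16)) @ basis    # training ensemble
sensors = rng.choice(n_pix, size=n_sensors, replace=False)  # buoy locations

# Least-squares map: sensor readings -> full surface-height field.
W = np.linalg.lstsq(fields[:, sensors], fields, rcond=None)[0]

test = rng.normal(size=16) @ basis                  # held-out simulation
recon = test[sensors] @ W
print(float(np.abs(recon - test).max()))            # near-zero reconstruction error
```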

Updated: 2024-11-20 00:42:40

标题: 机器学习重建稀疏观测数据中的海啸动力学

摘要: 我们研究了Senseiver的使用,这是一种专为稀疏感知应用设计的变压器神经网络,用于从稀疏观测中估计海啸波的完整场面高度测量。该模型在通过浅水方程求解器生成的大量模拟数据集上进行训练,我们展示了通过与历史事件的比较,这些模拟数据对底层动态的忠实再现。我们在一个数据集上训练了该模型,该数据集由8个海啸模拟组成,其震中对应于历史USGS地震记录,模型的输入限制为在主动部署的浮标位置获取的测量结果。我们在一个包含8个未包含在训练中的模拟数据集上测试了Senseiver,展示了其对外推的能力。结果显示,只要至少有几个传感器获取了非零信号,就能显著解析真实场景中的细微相位和幅度特征。在整个过程中,我们讨论了通过这种方法可以改进哪些预测技术,并建议如何利用架构的灵活性来整合任意远程感测数据(如HF雷达和卫星测量),以及研究最佳传感器布置的方法。

更新时间: 2024-11-20 00:42:40

领域: cs.LG,physics.flu-dyn

下载: http://arxiv.org/abs/2411.12948v1

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection

Large Language Models are prone to off-topic misuse, where users may prompt these models to perform tasks beyond their intended scope. Current guardrails, which often rely on curated examples or custom classifiers, suffer from high false-positive rates, limited adaptability, and the impracticality of requiring real-world data that is not available in pre-production. In this paper, we introduce a flexible, data-free guardrail development methodology that addresses these challenges. By thoroughly defining the problem space qualitatively and passing this to an LLM to generate diverse prompts, we construct a synthetic dataset to benchmark and train off-topic guardrails that outperform heuristic approaches. Additionally, by framing the task as classifying whether the user prompt is relevant with respect to the system prompt, our guardrails effectively generalize to other misuse categories, including jailbreak and harmful prompts. Lastly, we further contribute to the field by open-sourcing both the synthetic dataset and the off-topic guardrail models, providing valuable resources for developing guardrails in pre-production environments and supporting future research and development in LLM safety.
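
A hedged sketch of the relevance framing above (the paper fine-tunes classifiers on LLM-generated synthetic prompt pairs; `embed` below is a stub that any sentence-embedding model could replace).

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stub embedding; swap in a real sentence encoder."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def is_on_topic(system_prompt: str, user_prompt: str, threshold=0.3) -> bool:
    """Guardrail decision: is the user prompt relevant to the system prompt?"""
    return float(embed(system_prompt) @ embed(user_prompt)) >= threshold

if not is_on_topic("You are a banking FAQ assistant.", "Write me a poem."):
    print("Blocked: off-topic request")
```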

Updated: 2024-11-20 00:31:23

标题: 一个灵活的大型语言模型的防护栏开发方法应用于话题外提示检测

摘要: 大型语言模型容易被滥用,用户可能引导这些模型执行超出其预期范围的任务。当前的防护措施通常依赖于精心筛选的示例或自定义分类器,存在高误报率、有限的适应性以及需要实际数据但在预生产环境中不可行的问题。在本文中,我们介绍了一种灵活的、无数据的防护措施开发方法,以解决这些挑战。通过对问题空间进行定性定义,并将其传递给一个LLM来生成多样的提示,我们构建了一个合成数据集,用于评估和训练胜过启发式方法的离题防护措施。此外,通过将任务框定为对用户提示是否与系统提示相关进行分类,我们的防护措施有效地推广到其他滥用类别,包括越狱和有害提示。最后,我们通过开源合成数据集和离题防护模型进一步为该领域做出贡献,为在预生产环境中开发防护措施提供有价值的资源,并支持未来LLM安全研究与开发。

更新时间: 2024-11-20 00:31:23

领域: cs.CL,cs.LG,68T50,I.2.7

下载: http://arxiv.org/abs/2411.12946v1

Time Step Generating: A Universal Synthesized Deepfake Image Detector

Currently, high-fidelity text-to-image models are being developed at an accelerating pace. Among them, diffusion models have led to a remarkable improvement in the quality of image generation, making it very challenging to distinguish between real and synthesized images. This simultaneously raises serious concerns regarding privacy and security. Some methods have been proposed to distinguish diffusion-model-generated images through reconstruction. However, the inversion and denoising processes are time-consuming and heavily reliant on the pre-trained generative model. Consequently, if the pre-trained generative model encounters out-of-domain inputs, detection performance declines. To address this issue, we propose a universal synthetic image detector, Time Step Generating (TSG), which does not rely on pre-trained models' reconstruction ability, specific datasets, or sampling algorithms. Our method utilizes a pre-trained diffusion model's network as a feature extractor to capture fine-grained details, focusing on the subtle differences between real and synthetic images. By controlling the time step t of the network input, we can effectively extract these distinguishing detail features. These features are then passed through a classifier (e.g., a ResNet), which efficiently detects whether an image is synthetic or real. We test the proposed TSG on the large-scale GenImage benchmark, where it achieves significant improvements in both accuracy and generalizability.
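
A hedged sketch of the TSG pipeline; `backbone` is a stand-in module (a real diffusion UNet would consume both the image and the time step t), and the classifier is reduced to a linear head.

```python
import torch
import torch.nn as nn

backbone = nn.Conv2d(3, 64, 3, padding=1)      # placeholder for eps_theta(x, t)
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2))

def tsg_logits(x, t=500):
    # t is unused by the placeholder; a real diffusion UNet takes it as input.
    with torch.no_grad():                      # the diffusion net stays frozen
        feats = backbone(x)                    # noise-prediction features at step t
    return head(feats)                         # real-vs-synthetic logits

x = torch.randn(4, 3, 224, 224)                # batch of images
print(tsg_logits(x).shape)                     # torch.Size([4, 2])
```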

Updated: 2024-11-20 00:30:01

标题: 时间步骤生成:一种通用的综合深度伪造图像检测器

摘要: 目前,高保真度的文本到图像模型正以加速的速度发展。其中,扩散模型在图像生成质量方面取得了显著进步,使得真实和合成图像之间的区分变得非常具有挑战性。同时,这也引发了对隐私和安全方面的严重担忧。一些方法被提出来区分扩散模型生成的图像通过重构。然而,反演和去噪过程耗时且严重依赖预训练生成模型。因此,如果预训练生成模型遇到领域外的问题,检测性能将下降。为了解决这个问题,我们提出了一种不依赖于预训练模型的重构能力、特定数据集或抽样算法的通用合成图像检测器 Time Step Generating (TSG)。我们的方法利用预训练扩散模型的网络作为特征提取器,捕捉细微的细节,关注真实和合成图像之间的微小差异。通过控制网络输入的时间步长 t,我们可以有效提取这些区分细节特征。然后,这些特征可以通过分类器(如 Resnet)传递,有效地检测图像是合成的还是真实的。我们在大规模GenImage基准测试上测试了提出的TSG,并在准确性和泛化能力方面取得显著改进。

更新时间: 2024-11-20 00:30:01

领域: cs.CV,cs.AI,62H30, 68T07,I.4.9; I.4.7; I.5.2

下载: http://arxiv.org/abs/2411.11016v2

Enhancing Thermal MOT: A Novel Box Association Method Leveraging Thermal Identity and Motion Similarity

Multiple Object Tracking (MOT) in thermal imaging presents unique challenges due to the lack of visual features and the complexity of motion patterns. This paper introduces an innovative approach to improve MOT in the thermal domain by developing a novel box association method that utilizes both thermal object identity and motion similarity. Our method merges thermal feature sparsity and dynamic object tracking, enabling more accurate and robust MOT performance. Additionally, we present a new dataset comprised of a large-scale collection of thermal and RGB images captured in diverse urban environments, serving as both a benchmark for our method and a new resource for thermal imaging. We conduct extensive experiments to demonstrate the superiority of our approach over existing methods, showing significant improvements in tracking accuracy and robustness under various conditions. Our findings suggest that incorporating thermal identity with motion data enhances MOT performance. The newly collected dataset and source code is available at https://github.com/wassimea/thermalMOT
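
An illustrative association step (not the paper's exact formulation): blend motion similarity (IoU against predicted track boxes) with thermal identity similarity (embedding cosine), then solve the matching with the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Boxes as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(track_boxes, track_embs, det_boxes, det_embs, w=0.5):
    cos = (track_embs @ det_embs.T) / (
        np.linalg.norm(track_embs, axis=1, keepdims=True)
        * np.linalg.norm(det_embs, axis=1) + 1e-9)
    motion = np.array([[iou(t, d) for d in det_boxes] for t in track_boxes])
    cost = -(w * motion + (1 - w) * cos)        # maximize combined similarity
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

tracks = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], float)
dets = np.array([[1, 1, 11, 11], [21, 19, 31, 29]], float)
embs = np.random.default_rng(0).normal(size=(2, 32))
print(associate(tracks, embs, dets, embs))      # [(0, 0), (1, 1)]
```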

Updated: 2024-11-20 00:27:01

标题: 提升热力MOT:利用热身份和运动相似性的新型盒子关联方法

摘要: 热成像中的多目标跟踪(MOT)面临着独特的挑战,因为缺乏视觉特征和运动模式的复杂性。本文介绍了一种创新的方法,通过开发一种利用热对象身份和运动相似性的新型框关联方法来改进热领域的MOT。我们的方法融合了热特征的稀疏性和动态对象跟踪,实现了更准确和更稳健的MOT性能。此外,我们提出了一个新的数据集,包括在不同城市环境中捕获的大规模热和RGB图像,既可以作为我们方法的基准,也可以作为热成像的新资源。我们进行了大量实验,证明了我们的方法在各种条件下追踪准确性和稳健性方面的优越性。我们的研究结果表明,将热身份与运动数据结合可以提高MOT性能。新收集的数据集和源代码可在https://github.com/wassimea/thermalMOT 上获得。

更新时间: 2024-11-20 00:27:01

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.12943v1

On the relationship between Koopman operator approximations and neural ordinary differential equations for data-driven time-evolution predictions

This work explores the relationship between state space methods and Koopman operator-based methods for predicting the time-evolution of nonlinear dynamical systems. We demonstrate that extended dynamic mode decomposition with dictionary learning (EDMD-DL), when combined with a state space projection, is equivalent to a neural network representation of the nonlinear discrete-time flow map on the state space. We highlight how this projection step introduces nonlinearity into the evolution equations, enabling significantly improved EDMD-DL predictions. With this projection, EDMD-DL leads to a nonlinear dynamical system on the state space, which can be represented in either discrete or continuous time. This system has a natural structure for neural networks, where the state is first expanded into a high dimensional feature space followed by a linear mapping which represents the discrete-time map or the vector field as a linear combination of these features. Inspired by these observations, we implement several variations of neural ordinary differential equations (ODEs) and EDMD-DL, developed by combining different aspects of their respective model structures and training procedures. We evaluate these methods using numerical experiments on chaotic dynamics in the Lorenz system and a nine-mode model of turbulent shear flow, showing comparable performance across methods in terms of short-time trajectory prediction, reconstruction of long-time statistics, and prediction of rare events. We also show that these methods provide comparable performance to a non-Markovian approach in terms of prediction of extreme events.
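
A minimal EDMD sketch with the state-space projection discussed above (EDMD-DL would additionally learn the dictionary; the toy system and dictionary are illustrative): lift the state, fit a linear map by least squares, and project back to the state, so the composed step is nonlinear.

```python
import numpy as np

def psi(x):                                   # fixed dictionary: [x, x^2, 1]
    return np.concatenate([x, x**2, [1.0]])

# One-step data from a nonlinear system x' = f(x), here a toy logistic map.
f = lambda x: 3.7 * x * (1 - x)
X = np.random.default_rng(0).random((500, 1))
Y = f(X)

Psi_X = np.array([psi(x) for x in X])          # (n, D) lifted states
Psi_Y = np.array([psi(y) for y in Y])
K = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)[0]   # lifted linear map
P = np.linalg.lstsq(Psi_X, X, rcond=None)[0]       # projection back to state

def step(x):                                   # nonlinear map: lift, evolve, project
    return psi(x) @ K @ P

print(step(np.array([0.3])), f(np.array([0.3])))   # prediction vs truth
```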

Updated: 2024-11-20 00:18:46

标题: 关于Koopman算子逼近和神经常微分方程在数据驱动时间演化预测中的关系

摘要: 这项工作探讨了状态空间方法和Koopman算子方法之间在预测非线性动力学系统时间演化方面的关系。我们证明了扩展动态模态分解与字典学习(EDMD-DL),当与状态空间投影结合时,等同于非线性离散时间流映射在状态空间上的神经网络表示。我们强调了这个投影步骤如何将非线性引入演化方程中,从而实现了显著改进的EDMD-DL预测。通过这种投影,EDMD-DL导致了一个在状态空间上的非线性动力学系统,可以在离散或连续时间中表示。这个系统在神经网络中具有自然结构,其中状态首先被扩展为高维特征空间,然后通过线性映射表示离散时间映射或向量场为这些特征的线性组合。受到这些观察的启发,我们实现了几种神经常微分方程(ODEs)和EDMD-DL的变体,通过结合它们各自模型结构和训练过程的不同方面进行开发。我们通过在Lorenz系统的混沌动力学和九模型湍流剪切流上进行数值实验来评估这些方法,展示了在短期轨迹预测、长期统计重建和罕见事件预测方面各种方法的可比性。我们还展示了这些方法在极端事件预测方面与非马尔可夫方法提供了可比性能。

更新时间: 2024-11-20 00:18:46

领域: nlin.CD,cs.LG

下载: http://arxiv.org/abs/2411.12940v1

A community palm model

Palm oil production has been identified as one of the major drivers of deforestation in tropical countries. To meet supply chain objectives, commodity producers and other stakeholders need timely information on land cover dynamics in their supply shed. However, such data are difficult to obtain from suppliers who may lack digital geographic representations of their supply sheds and production locations. Here we present a "community model," a machine learning model trained on pooled data sourced from many different stakeholders, to produce a map of palm probability at global scale. An advantage of this method is the inclusion of varied inputs, the ability to easily update the model as new training data becomes available, and the ability to run the model for any year in which input imagery is available. Inclusion of diverse data sources into one probability map can help establish a shared understanding across stakeholders on the presence and absence of a land cover or commodity (in this case oil palm). The model predictors are annual composites built from publicly available satellite imagery provided by Sentinel-1, Sentinel-2, and ALOS-2, and terrain data from JAXA (AW3D30) and Copernicus (GLO-30). We provide map outputs as the probability of palm in a given pixel, to reflect the uncertainty of the underlying state (palm or not palm). This version of the model achieves an estimated global accuracy of 92% (at a 0.5 probability threshold) on an independent test set. This model and the resulting oil palm probability map products are useful for accurately identifying the geographic footprint of palm cultivation. Used in conjunction with timely deforestation information, this palm model is useful for understanding the risk of continued oil palm plantation expansion in sensitive forest areas.

Updated: 2024-11-20 00:10:28

标题: 一个社区棕榈模型

摘要: 棕榈油生产已被确定为热带国家森林砍伐的主要驱动力之一。为了实现供应链目标,商品生产商和其他利益相关者需要及时获得其供应范围内土地覆盖动态的信息。然而,从可能缺乏供应范围和生产地点的数字地理表示的供应商处获得此类数据是困难的。在这里,我们提出了一个“社区模型”,这是一个机器学习模型,经过多个利益相关者提供的汇总数据训练,可以在全球范围内生成棕榈概率地图。这种方法的优势在于包含多样化的输入,能够轻松更新模型并在有新的训练数据时运行模型,并且可以在有输入影像的任何年份运行模型。将多种数据源包含在一个概率地图中可以帮助各利益相关者建立对土地覆盖或商品(在本例中为油棕)存在与否的共同理解。该模型的预测变量是由Sentinel-1、Sentinel-2和ALOS-2提供的公开卫星影像以及Jaxa(AW3D30)和Copernicus(GLO-30)提供的地形数据构建的年度合成。我们提供地图输出,表示给定像素中棕榈的概率,以反映底层状态的不确定性(棕榈或非棕榈)。这个模型的这个版本在独立测试集上的全球精度估计为92%(在0.5概率阈值下)。这个模型和由此产生的油棕概率地图产品对准确识别棕榈栽培的地理范围很有用。与及时的森林砍伐信息结合使用,这个棕榈模型对了解敏感森林地区持续的油棕种植园扩张风险很有用。

更新时间: 2024-11-20 00:10:28

领域: cs.CY,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.09530v2

Improving Low-Fidelity Models of Li-ion Batteries via Hybrid Sparse Identification of Nonlinear Dynamics

Accurate modeling of lithium ion (li-ion) batteries is essential for enhancing the safety and efficiency of electric vehicles and renewable energy systems. This paper presents a data-inspired approach for improving the fidelity of reduced-order li-ion battery models. The proposed method combines a Genetic Algorithm with Sequentially Thresholded Ridge Regression (GA-STRidge) to identify and compensate for discrepancies between a low-fidelity model (LFM) and data generated either from testing or a high-fidelity model (HFM). The hybrid model, combining physics-based and data-driven methods, is tested across different driving cycles to demonstrate its ability to significantly reduce voltage prediction error compared to the baseline LFM, while preserving computational efficiency. The model robustness is also evaluated under various operating conditions, showing low prediction errors and high Pearson correlation coefficients for terminal voltage in unseen environments.
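
Standard sequentially thresholded ridge regression (STRidge) is the identification core named above; the sketch below omits the genetic algorithm that tunes its hyperparameters, and the feature library is illustrative.

```python
import numpy as np

def stridge(Theta, y, lam=1e-3, tol=0.1, iters=10):
    """Sparse coefficients for y ~ Theta @ w via ridge + hard thresholding."""
    ridge = lambda A, b: np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
    w = ridge(Theta, y)
    for _ in range(iters):
        small = np.abs(w) < tol
        w[small] = 0.0                       # prune small coefficients
        big = ~small
        if big.any():
            w[big] = ridge(Theta[:, big], y)  # refit on the surviving terms
    return w

rng = np.random.default_rng(1)
Theta = rng.normal(size=(200, 6))            # candidate discrepancy features
y = 2.0 * Theta[:, 0] - 0.5 * Theta[:, 3] + 0.01 * rng.normal(size=200)
print(stridge(Theta, y).round(2))            # recovers the sparse coefficients
```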

Updated: 2024-11-20 00:00:11

标题: 通过混合稀疏非线性动力学识别改进锂离子电池的低保真度模型

摘要: 锂离子(Li-ion)电池的准确建模对于提高电动车辆和可再生能源系统的安全性和效率至关重要。本文提出了一种基于数据启发的方法,用于提高降阶Li-ion电池模型的准确性。所提出的方法将遗传算法与序贯阈值岭回归(GA-STRidge)相结合,以识别和补偿低准确性模型(LFM)与来自测试或高准确性模型(HFM)生成的数据之间的差异。这种结合了基于物理和数据驱动方法的混合模型在不同驾驶循环下进行测试,以展示相对于基准LFM能够显著减少电压预测误差的能力,同时保持计算效率。该模型的稳健性也在各种操作条件下进行评估,在未知环境中显示出低预测误差和高Pearson相关系数的终端电压。

更新时间: 2024-11-20 00:00:11

领域: eess.SY,cs.LG,cs.NE,cs.SY

下载: http://arxiv.org/abs/2411.12935v1

By Xinhai (Sean) Zou.