    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        


GRS: Generating Robotic Simulation Tasks from Real-World Images

We introduce GRS (Generating Robotic Simulation tasks), a system addressing real-to-sim for robotic simulations. GRS creates digital twin simulations from single RGB-D observations with solvable tasks for virtual agent training. Using vision-language models (VLMs), our pipeline operates in three stages: 1) scene comprehension with SAM2 for segmentation and object description, 2) matching objects with simulation-ready assets, and 3) generating appropriate tasks. We ensure simulation-task alignment through generated test suites and introduce a router that iteratively refines both simulation and test code. Experiments demonstrate our system's effectiveness in object correspondence and task environment generation through our novel router mechanism.
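The router described above can be sketched as a refine-until-tests-pass loop. In the sketch below, every function body and the failure heuristic are illustrative stand-ins (the real system calls a VLM and a simulator), not the authors' implementation:

```python
# Hypothetical refine-until-tests-pass router. The stub functions stand in for
# VLM calls and simulator runs; only the control flow mirrors the described idea.

def run_tests(sim_code: str, test_code: str) -> list[str]:
    # Stand-in test harness: "fails" while the simulation lacks a grasp step.
    return [] if "grasp" in sim_code else ["test_grasp: object never grasped"]

def revise_simulation(sim_code: str, failures: list[str]) -> str:
    # Stand-in for a VLM that patches the simulation code from failure logs.
    return sim_code + "\nrobot.grasp(target)"

def router(sim_code: str, test_code: str, max_rounds: int = 5):
    """Alternate test runs and revisions until the generated suite passes."""
    for round_no in range(max_rounds):
        failures = run_tests(sim_code, test_code)
        if not failures:
            return sim_code, round_no  # simulation now satisfies its tests
        sim_code = revise_simulation(sim_code, failures)
    raise RuntimeError("router did not converge")

final_sim, rounds_used = router("scene.load(assets)", "assert grasped(target)")
```

The same loop could instead route failures to a test-repair step when the test code, rather than the simulation, is judged at fault.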

Updated: 2025-04-04 23:56:25

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2410.15536v2

Improving Offline Mixed-Criticality Scheduling with Reinforcement Learning

This paper introduces a novel reinforcement learning (RL) approach to scheduling mixed-criticality (MC) systems on processors with varying speeds. Building upon the foundation laid by [1], we extend their work to address the non-preemptive scheduling problem, which is known to be NP-hard. By modeling this scheduling challenge as a Markov Decision Process (MDP), we develop an RL agent capable of generating near-optimal schedules for real-time MC systems. Our RL-based scheduler prioritizes high-critical tasks while maintaining overall system performance. Through extensive experiments, we demonstrate the scalability and effectiveness of our approach. The RL scheduler significantly improves task completion rates, achieving around 80% overall and 85% for high-criticality tasks across 100,000 instances of synthetic data and real data under varying system conditions. Moreover, under stable conditions without degradation, the scheduler achieves 94% overall task completion and 93% for high-criticality tasks. These results highlight the potential of RL-based schedulers in real-time and safety-critical applications, offering substantial improvements in handling complex and dynamic scheduling scenarios.
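To make the setting concrete, here is a toy non-preemptive, criticality-first dispatcher over a small task set; the paper replaces this hand-written dispatch rule with a policy learned by an RL agent over an MDP of such scheduling states (the task parameters below are invented):

```python
# Toy non-preemptive dispatcher for a mixed-criticality task set: at each idle
# instant, run the released task with the highest criticality, breaking ties by
# earliest deadline. A hand-written baseline, not the learned RL policy.

def schedule(tasks, speed=1.0):
    """tasks: list of (release, exec_time, deadline, criticality).
    Returns the set of task indices that complete by their deadlines."""
    pending = sorted(range(len(tasks)), key=lambda i: tasks[i][0])
    time, done = 0.0, set()
    while pending:
        released = [i for i in pending if tasks[i][0] <= time]
        if not released:                       # idle until the next release
            time = tasks[pending[0]][0]
            continue
        # criticality first (higher is better), then earliest deadline
        i = min(released, key=lambda j: (-tasks[j][3], tasks[j][2]))
        pending.remove(i)
        time += tasks[i][1] / speed            # non-preemptive: run to completion
        if time <= tasks[i][2]:
            done.add(i)
    return done

# Two tasks contend for one slot; the high-criticality task wins and completes.
tasks = [(0, 4, 5, 1), (0, 4, 5, 2)]
completed = schedule(tasks)
```

Varying `speed` below 1.0 models the degraded processor conditions the paper evaluates.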

Updated: 2025-04-04 23:28:48

Categories: cs.LG,cs.AI,cs.MA,cs.SY,eess.SY

Download: http://arxiv.org/abs/2504.03994v1

AutoStep: Locally adaptive involutive MCMC

Many common Markov chain Monte Carlo (MCMC) kernels can be formulated using a deterministic involutive proposal with a step size parameter. Selecting an appropriate step size is often challenging in practice, and for complex multiscale targets there may not be one choice of step size that works well globally. In this work, we address this problem with a novel class of involutive MCMC methods -- AutoStep MCMC -- that selects an appropriate step size at each iteration, adapted to the local geometry of the target distribution. We prove that AutoStep MCMC is $\pi$-invariant and has other desirable properties under mild assumptions on the target distribution $\pi$ and involutive proposal. Empirical results examine the effect of various step size selection design choices, and show that AutoStep MCMC is competitive with state-of-the-art methods in terms of effective sample size per unit cost on a range of challenging target distributions.
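A toy version of the per-iteration idea: a random-walk Metropolis kernel (involutive proposal $x' = x + sv$) that halves its step size until a local acceptance heuristic is satisfied. The halving rule and threshold below are invented for illustration only; AutoStep's actual selection procedure includes the corrections required to keep the chain exactly $\pi$-invariant:

```python
import math, random

# Illustrative locally adaptive step: shrink s until the proposed move is not
# too unlikely under the local geometry, then do a standard Metropolis test.

def log_pi(x):                  # standard normal target, up to a constant
    return -0.5 * x * x

def autostep_like_step(x, s0=8.0, min_log_ratio=math.log(0.2), rng=random):
    v = rng.gauss(0.0, 1.0)
    s = s0
    while log_pi(x + s * v) - log_pi(x) < min_log_ratio and s > 1e-6:
        s *= 0.5                # step too aggressive here: halve it
    y = x + s * v
    accept = math.log(rng.random()) < log_pi(y) - log_pi(x)
    return (y if accept else x), s

rng = random.Random(0)
x, sizes = 0.0, []
for _ in range(2000):
    x, s = autostep_like_step(x, rng=rng)
    sizes.append(s)
```

On a multiscale target, the selected `s` would vary with the region the chain currently occupies, which is the behavior the method exploits.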

Updated: 2025-04-04 23:13:54

Categories: stat.CO,cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.18929v2

Regression Discontinuity Design with Distribution-Valued Outcomes

This article introduces Regression Discontinuity Design (RDD) with Distribution-Valued Outcomes (R3D), extending the standard RDD framework to settings where the outcome is a distribution rather than a scalar. Such settings arise when treatment is assigned at a higher level of aggregation than the outcome; for example, when a subsidy is allocated based on a firm-level revenue cutoff while the outcome of interest is the distribution of employee wages within the firm. Since standard RDD methods cannot accommodate such two-level randomness, I propose a novel approach based on random distributions. The target estimand is a "local average quantile treatment effect", which averages across random quantiles. To estimate this target, I introduce two related approaches: one that extends local polynomial regression to random quantiles and another based on local Fréchet regression, a form of functional regression. For both estimators, I establish asymptotic normality and develop uniform, debiased confidence bands together with a data-driven bandwidth selection procedure. Simulations validate these theoretical properties and show existing methods to be biased and inconsistent in this setting. I then apply the proposed methods to study the effects of gubernatorial party control on within-state income distributions in the US, using a close-election design. The results suggest a classic equality-efficiency tradeoff under Democratic governorship, driven by reductions in income at the top of the distribution.
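A deliberately naive illustration of the target estimand: average the within-unit quantile functions of units just above and just below the cutoff, then average the gap over quantile levels. The paper's estimators add local polynomial / local Fréchet smoothing in the running variable and debiased inference, none of which is attempted here; the data are invented:

```python
# Naive finite-sample sketch of a "local average quantile treatment effect".

def quantile(sorted_xs, q):
    # nearest-rank quantile on a pre-sorted sample
    idx = min(int(q * len(sorted_xs)), len(sorted_xs) - 1)
    return sorted_xs[idx]

def laqte(units, cutoff, bandwidth, qs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """units: list of (running_var, outcome_sample), where outcome_sample is a
    unit-level distribution (e.g. wages within one firm)."""
    above = [sorted(y) for r, y in units if cutoff <= r < cutoff + bandwidth]
    below = [sorted(y) for r, y in units if cutoff - bandwidth <= r < cutoff]
    gaps = []
    for q in qs:
        avg_above = sum(quantile(y, q) for y in above) / len(above)
        avg_below = sum(quantile(y, q) for y in below) / len(below)
        gaps.append(avg_above - avg_below)
    return sum(gaps) / len(gaps)      # average over quantile levels

# treated units' within-unit distributions are shifted up by 1.0 everywhere
units = [(0.9, [1, 2, 3, 4]), (0.8, [2, 3, 4, 5]),
         (1.1, [2, 3, 4, 5]), (1.2, [3, 4, 5, 6])]
effect = laqte(units, cutoff=1.0, bandwidth=0.5)
```

Because the treated distributions are uniform shifts of the controls, every quantile gap is 1.0 and so is their average.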

Updated: 2025-04-04 23:12:35

Categories: econ.EM,cs.LG,math.ST,stat.AP,stat.ME,stat.TH

Download: http://arxiv.org/abs/2504.03992v1

Algorithmic Prompt Generation for Diverse Human-like Teaming and Communication with Large Language Models

Understanding how humans collaborate and communicate in teams is essential for improving human-agent teaming and AI-assisted decision-making. However, relying solely on data from large-scale user studies is impractical due to logistical, ethical, and practical constraints, necessitating synthetic models of multiple diverse human behaviors. Recently, agents powered by Large Language Models (LLMs) have been shown to emulate human-like behavior in social settings. But obtaining a large set of diverse behaviors requires manual effort in the form of prompt design. On the other hand, Quality Diversity (QD) optimization has been shown to be capable of generating diverse Reinforcement Learning (RL) agent behavior. In this work, we combine QD optimization with LLM-powered agents to iteratively search for prompts that generate diverse team behavior in a long-horizon, multi-step collaborative environment. We first show, through a human-subjects experiment (n=54 participants), that humans exhibit diverse coordination and communication behavior in this domain. We then show that our approach can effectively replicate trends from human teaming data and also capture behaviors that are not easily observed without collecting large amounts of data. Our findings highlight the combination of QD and LLM-powered agents as an effective tool for studying teaming and communication strategies in multi-agent collaboration.

Updated: 2025-04-04 23:09:40

Categories: cs.CL,cs.AI,cs.HC,cs.MA

Download: http://arxiv.org/abs/2504.03991v1

A Symmetric Regressor for MRI-Based Assessment of Striatal Dopamine Transporter Uptake in Parkinson's Disease With Enhanced Uncertainty Estimation

Dopamine transporter (DAT) imaging is commonly used for monitoring Parkinson's disease (PD), where the striatal DAT uptake amount is computed to assess PD severity. However, DAT imaging has a high cost, carries a risk of radiation exposure, and is not available in general clinics. Recently, MRI patches of the nigral region have been proposed as a safer and easier alternative. This paper proposes a symmetric regressor for predicting the DAT uptake amount from the nigral MRI patch. Acknowledging the symmetry between the right and left nigrae, the proposed regressor incorporates a paired input-output model that simultaneously predicts the DAT uptake amounts for both the right and left striata. Moreover, it employs a symmetric loss that imposes a constraint on the difference between right and left predictions, reflecting the high correlation between DAT uptake amounts on the two lateral sides. Additionally, we propose a symmetric Monte-Carlo (MC) dropout method, which exploits the same symmetry, for providing an informative uncertainty estimate of the DAT uptake prediction. We evaluated the proposed approach on 734 nigral patches; the symmetric regressor significantly outperformed standard regressors while giving better explainability and feature representation. The symmetric MC dropout also gave precise uncertainty ranges with a high probability of including the true DAT uptake amounts within the range.
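The symmetric loss can be written down directly: pointwise errors for the right and left predictions plus a penalty tying the predicted right-left gap to the observed one. The weight `lam` is an assumed hyperparameter, and the paper attaches this loss to a CNN over nigral MRI patches rather than to raw numbers:

```python
# Paired symmetric loss sketch: squared errors per side plus a penalty on the
# mismatch between predicted and observed right-left uptake differences.

def symmetric_loss(pred_r, pred_l, true_r, true_l, lam=0.5):
    mse = (pred_r - true_r) ** 2 + (pred_l - true_l) ** 2
    asym = ((pred_r - pred_l) - (true_r - true_l)) ** 2   # symmetry constraint
    return mse + lam * asym

# The second prediction exaggerates the right-left gap and pays an extra penalty.
balanced = symmetric_loss(2.1, 2.1, 2.0, 2.0)   # gap preserved
skewed   = symmetric_loss(2.2, 2.0, 2.0, 2.0)   # gap exaggerated
```

The penalty term encodes the prior that the two lateral sides have highly correlated uptake, discouraging physiologically implausible asymmetric predictions.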

Updated: 2025-04-04 23:08:49

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2404.11929v3

CORTEX-AVD: CORner Case Testing & EXploration for Autonomous Vehicles Development

Autonomous Vehicles (AVs) aim to improve traffic safety and efficiency by reducing human error. However, ensuring AV reliability and safety is a challenging task when rare, high-risk traffic scenarios are considered. These 'Corner Case' (CC) scenarios, such as unexpected vehicle maneuvers or sudden pedestrian crossings, must be handled safely and reliably by AVs during operation, yet they are hard to generate efficiently. Traditional CC generation relies on costly and risky real-world data acquisition, limiting scalability and slowing research and development progress. Simulation-based techniques also face challenges, as modeling diverse scenarios and capturing all possible CCs is complex and time-consuming. To address these limitations in CC generation, this research introduces CORTEX-AVD, CORner Case Testing & EXploration for Autonomous Vehicles Development, an open-source framework that integrates the CARLA Simulator and Scenic to automatically generate CCs from textual descriptions, increasing the diversity and automation of scenario modeling. Genetic Algorithms (GA) are used to optimize the scenario parameters in six case study scenarios, increasing the occurrence of high-risk events. Unlike previous methods, CORTEX-AVD incorporates a multi-factor fitness function that considers variables such as distance, time, speed, and collision likelihood. Additionally, the study provides a benchmark for comparing GA-based CC generation methods, contributing to a more standardized evaluation of synthetic data generation and scenario assessment. Experimental results demonstrate that the CORTEX-AVD framework significantly increases CC incidence while reducing the proportion of wasted simulations.
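A multi-factor fitness of the kind described can be illustrated as a weighted sum over normalized risk signals. The weights and normalizers below are invented for the sketch; the paper's exact formulation differs:

```python
# Illustrative GA fitness: reward scenario parameterizations that bring actors
# close, fast, and on collision courses, so high-risk corner cases dominate.

def fitness(min_distance_m, time_to_collision_s, rel_speed_ms, collided,
            w=(0.3, 0.3, 0.2, 0.2)):
    closeness = 1.0 / (1.0 + min_distance_m)        # near misses score high
    urgency   = 1.0 / (1.0 + time_to_collision_s)   # imminent conflicts score high
    speed     = min(rel_speed_ms / 30.0, 1.0)       # saturate at 30 m/s
    return (w[0] * closeness + w[1] * urgency
            + w[2] * speed + w[3] * (1.0 if collided else 0.0))

near_miss = fitness(0.5, 0.8, 20.0, False)
benign    = fitness(25.0, 10.0, 5.0, False)
```

A GA maximizing such a score steers simulation budget toward high-risk events, which is how the framework reduces the proportion of wasted simulations.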

Updated: 2025-04-04 23:05:31

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2504.03989v1

V-CEM: Bridging Performance and Intervenability in Concept-based Models

Concept-based eXplainable AI (C-XAI) is a rapidly growing research field that enhances AI model interpretability by leveraging intermediate, human-understandable concepts. This approach not only enhances model transparency but also enables human intervention, allowing users to interact with these concepts to refine and improve the model's performance. Concept Bottleneck Models (CBMs) explicitly predict concepts before making final decisions, enabling interventions to correct misclassified concepts. While CBMs remain effective in Out-Of-Distribution (OOD) settings with intervention, they struggle to match the performance of black-box models. Concept Embedding Models (CEMs) address this by learning concept embeddings from both concept predictions and input data, enhancing In-Distribution (ID) accuracy but reducing the effectiveness of interventions, especially in OOD scenarios. In this work, we propose the Variational Concept Embedding Model (V-CEM), which leverages variational inference to improve intervention responsiveness in CEMs. We evaluated our model on various textual and visual datasets in terms of ID performance, intervention responsiveness in both ID and OOD settings, and Concept Representation Cohesiveness (CRC), a metric we propose to assess the quality of the concept embedding representations. The results demonstrate that V-CEM retains CEM-level ID performance while achieving intervention effectiveness similar to CBM in OOD settings, effectively reducing the gap between interpretability (intervention) and generalization (performance).
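Interventions on concept-based models, which V-CEM is designed to keep effective, amount to overwriting a predicted concept with its ground-truth value and re-running only the task head. A toy linear head makes the mechanics concrete (the concepts, weights, and threshold below are invented):

```python
# Concept-bottleneck intervention sketch: the label head reads concept
# activations, so a human can correct one concept and rerun just the head.

def predict_label(concepts, weights, threshold=1.0):
    score = sum(c * w for c, w in zip(concepts, weights))
    return int(score > threshold)

weights = [1.0, 1.0, -1.0]             # toy task head over three concepts
predicted_concepts = [0.9, 0.1, 0.8]   # concept 2 was mispredicted
before = predict_label(predicted_concepts, weights)

intervened = list(predicted_concepts)
intervened[1] = 1.0                    # expert sets concept 2 to its true value
after = predict_label(intervened, weights)
```

The paper's concern is that in embedding-based models (CEMs) this overwrite can lose its leverage, especially out of distribution; V-CEM's variational training aims to restore it.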

Updated: 2025-04-04 22:43:04

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2504.03978v1

OLAF: An Open Life Science Analysis Framework for Conversational Bioinformatics Powered by Large Language Models

OLAF (Open Life Science Analysis Framework) is an open-source platform that enables researchers to perform bioinformatics analyses using natural language. By combining large language models (LLMs) with a modular agent-pipe-router architecture, OLAF generates and executes bioinformatics code on real scientific data, including formats like .h5ad. The system includes an Angular front end and a Python/Firebase backend, allowing users to run analyses such as single-cell RNA-seq workflows, gene annotation, and data visualization through a simple web interface. Unlike general-purpose AI tools, OLAF integrates code execution, data handling, and scientific libraries in a reproducible, user-friendly environment. It is designed to lower the barrier to computational biology for non-programmers and support transparent, AI-powered life science research.

Updated: 2025-04-04 22:41:16

Categories: q-bio.QM,cs.AI,q-bio.GN

Download: http://arxiv.org/abs/2504.03976v1

GREATERPROMPT: A Unified, Customizable, and High-Performing Open-Source Toolkit for Prompt Optimization

LLMs have gained immense popularity among researchers and the general public for their impressive capabilities on a variety of tasks. Notably, the efficacy of LLMs remains significantly dependent on the quality and structure of the input prompts, making prompt design a critical factor for their performance. Recent advancements in automated prompt optimization have introduced diverse techniques that automatically enhance prompts to better align model outputs with user expectations. However, these methods often suffer from a lack of standardization and compatibility across different techniques, limited flexibility in customization, and inconsistent performance across model scales, and they often rely exclusively on expensive proprietary LLM APIs. To fill this gap, we introduce GREATERPROMPT, a novel framework that democratizes prompt optimization by unifying diverse methods under a unified, customizable API while delivering highly effective prompts for different tasks. Our framework flexibly accommodates various model scales by leveraging both text feedback-based optimization for larger LLMs and internal gradient-based optimization for smaller models to achieve powerful and precise prompt improvements. Moreover, we provide a user-friendly Web UI that ensures accessibility for non-expert users, enabling broader adoption and enhanced performance across various user groups and application scenarios. GREATERPROMPT is available at https://github.com/psunlpgroup/GreaterPrompt via GitHub, PyPI, and web user interfaces.

Updated: 2025-04-04 22:36:55

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2504.03975v1

Stack Overflow Meets Replication: Security Research Amid Evolving Code Snippets (Extended Version)

We study the impact of Stack Overflow code evolution on the stability of prior research findings derived from Stack Overflow data and provide recommendations for future studies. We systematically reviewed papers published between 2005--2023 to identify key aspects of Stack Overflow that can affect study results, such as the language or context of code snippets. Our analysis reveals that certain aspects are non-stationary over time, which could lead to different conclusions if experiments are repeated at different times. We replicated six studies using a more recent dataset to demonstrate this risk. Our findings show that four papers produced significantly different results than the original findings, preventing the same conclusions from being drawn with a newer dataset version. Consequently, we recommend treating Stack Overflow as a time series data source to provide context for interpreting cross-sectional research conclusions.

Updated: 2025-04-04 22:31:06

Categories: cs.CR

Download: http://arxiv.org/abs/2501.16948v3

VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models

We introduce VideoComp, a benchmark and learning framework for advancing video-text compositionality understanding, aimed at improving vision-language models (VLMs) in fine-grained temporal alignment. Unlike existing benchmarks focused on static image-text compositionality or isolated single-event videos, our benchmark targets alignment in continuous multi-event videos. Leveraging video-text datasets with temporally localized event captions (e.g. ActivityNet-Captions, YouCook2), we construct two compositional benchmarks, ActivityNet-Comp and YouCook2-Comp. We create challenging negative samples with subtle temporal disruptions such as reordering, action word replacement, partial captioning, and combined disruptions. These benchmarks comprehensively test models' compositional sensitivity across extended, cohesive video-text sequences. To improve model performance, we propose a hierarchical pairwise preference loss that strengthens alignment with temporally accurate pairs and gradually penalizes increasingly disrupted ones, encouraging fine-grained compositional learning. To mitigate the limited availability of densely annotated video data, we introduce a pretraining strategy that concatenates short video-caption pairs to simulate multi-event sequences. We evaluate video-text foundational models and large multimodal models (LMMs) on our benchmark, identifying both strengths and areas for improvement in compositionality. Overall, our work provides a comprehensive framework for evaluating and enhancing model capabilities in achieving fine-grained, temporally coherent video-text alignment.
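The hierarchical pairwise preference loss can be sketched as a per-level hinge whose margin grows with disruption severity. Scores here are plain floats; in the paper they come from a video-text similarity model, and the linear margin schedule is an assumption:

```python
# Hierarchical pairwise preference loss sketch: the temporally correct caption
# must beat each negative by a margin that grows with how disrupted it is.

def hierarchical_preference_loss(pos_score, neg_scores_by_level, base_margin=0.1):
    """neg_scores_by_level: negatives ordered by disruption severity
    (level 1 = subtle reorder, higher levels = heavier corruption)."""
    loss = 0.0
    for level, neg in enumerate(neg_scores_by_level, start=1):
        margin = base_margin * level        # harsher negatives, larger margin
        loss += max(0.0, margin - (pos_score - neg))   # hinge per level
    return loss

ok  = hierarchical_preference_loss(0.9, [0.7, 0.5, 0.2])  # all margins met
bad = hierarchical_preference_loss(0.6, [0.7, 0.5, 0.2])  # subtle negative wins
```

Graduated margins penalize confusing heavily corrupted sequences more than narrowly missing subtle reorderings, encouraging fine-grained compositional sensitivity.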

Updated: 2025-04-04 22:24:30

Categories: cs.CV,cs.AI,cs.CL,cs.IR

Download: http://arxiv.org/abs/2504.03970v1

Bridging LMS and Generative AI: Dynamic Course Content Integration (DCCI) for Connecting LLMs to Course Content -- The Ask ME Assistant

The integration of Large Language Models (LLMs) with Learning Management Systems (LMSs) has the potential to enhance task automation and accessibility in education. However, hallucination where LLMs generate inaccurate or misleading information remains a significant challenge. This study introduces the Dynamic Course Content Integration (DCCI) mechanism, which dynamically retrieves and integrates course content and curriculum from Canvas LMS into the LLM-powered assistant, Ask ME. By employing prompt engineering to structure retrieved content within the LLM's context window, DCCI ensures accuracy, relevance, and contextual alignment, mitigating hallucination. To evaluate DCCI's effectiveness, Ask ME's usability, and broader student perceptions of AI in education, a mixed-methods approach was employed, incorporating user satisfaction ratings and a structured survey. Results from a pilot study indicate high user satisfaction (4.614/5), with students recognizing Ask ME's ability to provide timely and contextually relevant responses for both administrative and course-related inquiries. Additionally, a majority of students agreed that Ask ME's integration with course content in Canvas LMS reduced platform-switching, improving usability, engagement, and comprehension. AI's role in reducing classroom hesitation and fostering self-directed learning and intellectual curiosity was also highlighted. Despite these benefits and positive perception of AI tools, concerns emerged regarding over-reliance on AI, accuracy limitations, and ethical issues such as plagiarism and reduced student-teacher interaction. These findings emphasize the need for strategic AI implementation, ethical safeguards, and a pedagogical framework that prioritizes human-AI collaboration over substitution.
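The core DCCI move, structuring retrieved course content inside the LLM's context window, can be sketched as simple prompt assembly. The template wording, ranking assumption, and character budget below are assumptions, not the Ask ME implementation:

```python
# Hypothetical context-window assembly: ranked course items are packed into the
# prompt under a budget, with an instruction to answer only from that material.

def build_prompt(question, retrieved_items, max_chars=2000):
    blocks, used = [], 0
    for item in retrieved_items:               # assumed ranked by relevance
        block = f"[{item['title']}]\n{item['body']}\n"
        if used + len(block) > max_chars:
            break                              # stay within the context budget
        blocks.append(block)
        used += len(block)
    context = "".join(blocks)
    return ("Answer strictly from the course material below; say so if the "
            "material does not cover the question.\n\n"
            f"{context}\nQuestion: {question}")

prompt = build_prompt(
    "When is assignment 2 due?",
    [{"title": "Syllabus", "body": "Assignment 2 is due April 18."}],
)
```

Grounding the answer in retrieved Canvas content, rather than the model's parametric memory, is what mitigates hallucination in this setup.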

Updated: 2025-04-04 22:17:30

Categories: cs.CY,cs.AI,cs.ET,cs.HC,cs.SE

Download: http://arxiv.org/abs/2504.03966v1

Clinical ModernBERT: An efficient and long context encoder for biomedical text

We introduce Clinical ModernBERT, a transformer-based encoder pretrained on large-scale biomedical literature, clinical notes, and medical ontologies, incorporating PubMed abstracts, MIMIC-IV clinical data, and medical codes with their textual descriptions. Building on ModernBERT, the current state-of-the-art natural language text encoder, which features architectural upgrades such as rotary positional embeddings (RoPE), Flash Attention, and an extended context length of up to 8,192 tokens, our model adapts these innovations specifically for the biomedical and clinical domains. Clinical ModernBERT excels at producing semantically rich representations tailored for long-context tasks. We validate this both by analyzing its pretrained weights and through empirical evaluation on a comprehensive suite of clinical NLP benchmarks.
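Of the ModernBERT upgrades named above, rotary positional embeddings (RoPE) are easy to demonstrate in miniature: each pair of feature channels is rotated by a position-dependent angle, so attention dot products depend only on relative position. A single-head, unbatched sketch (real implementations vectorize this):

```python
import math

# Minimal rotary positional embedding: rotate consecutive channel pairs of a
# vector by angles position / base**(i/d), as in RoPE.

def rope(vec, position, base=10000.0):
    d = len(vec)                      # must be even
    out = []
    for i in range(0, d, 2):
        theta = position / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q, k = [1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5]
# relative-position property: the score depends on (m - n), not on m and n alone
s1 = dot(rope(q, 3), rope(k, 1))
s2 = dot(rope(q, 10), rope(k, 8))
```

This relative-position property is one reason RoPE extends gracefully to the long contexts (up to 8,192 tokens) the model targets.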

Updated: 2025-04-04 22:14:12

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.03964v1

The Black-Box Simulation Barrier Persists in a Fully Quantum World

Zero-Knowledge (ZK) protocols have been intensely studied due to their fundamental importance and versatility. However, quantum information's inherent differences significantly alter the landscape, necessitating a re-examination of ZK designs. A crucial aspect is round complexity, linked to $\textit{simulation}$, which forms the foundation of ZK definition and security proofs. In the $\textit{post-quantum}$ setting, where honest parties and channels are classical but adversaries quantum, Chia et al. [FOCS'21] showed constant-round $\textit{black-box-simulatable}$ ZK arguments (BBZK) for $\mathbf{NP}$ are impossible unless $\mathbf{NP} \subseteq \mathbf{BQP}$. But this problem remains open when all parties and communication are quantum. Indeed, this problem interests the broader theory of quantum computing. Investigating how quantum power alters tasks like the $\textit{unconditional}$ security of QKD and incorporating OT in MiniQCrypt has been crucial. Moreover, quantum communication has enabled round compression for commitments and interactive arguments. Along this line, understanding if quantum computing could fundamentally change ZK protocols is vital. We resolved this problem by proving that only languages in $\mathbf{BQP}$ admit constant-round $\textit{fully-quantum}$ BBZK. This result holds significant implications. Firstly, it illuminates the nature of quantum zero-knowledge and provides valuable insights for designing future protocols in the quantum realm. Secondly, it relates ZK round complexity with the intriguing problem of $\mathbf{BQP}$ vs $\mathbf{QMA}$, which is out of the reach of previous analogue impossibility results in the classical or post-quantum setting. Lastly, it justifies the need for the $\textit{non-black-box}$ simulation techniques or the relaxed security notions employed in existing constant-round fully-quantum BBZK protocols.

Updated: 2025-04-04 22:13:07

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2409.06317v2

Optimizing UAV Aerial Base Station Flights Using DRL-based Proximal Policy Optimization

Unmanned aerial vehicle (UAV)-based base stations offer a promising solution in emergencies where the rapid deployment of cutting-edge networks is crucial for maximizing life-saving potential. Optimizing the strategic positioning of these UAVs is essential for enhancing communication efficiency. This paper introduces an automated reinforcement learning approach that enables UAVs to dynamically interact with their environment and determine optimal configurations. By leveraging the radio signal sensing capabilities of communication networks, our method provides a more realistic perspective, utilizing state-of-the-art algorithm -- proximal policy optimization -- to learn and generalize positioning strategies across diverse user equipment (UE) movement patterns. We evaluate our approach across various UE mobility scenarios, including static, random, linear, circular, and mixed hotspot movements. The numerical results demonstrate the algorithm's adaptability and effectiveness in maintaining comprehensive coverage across all movement patterns.
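The proximal policy optimization update the approach relies on centers on a clipped surrogate objective, shown here for a single (state, action) sample; the actor-critic machinery and the UAV-placement state/action encoding around it are omitted:

```python
import math

# PPO clipped surrogate objective for one sample: take the pessimistic minimum
# of the raw importance-weighted advantage and its clipped counterpart, so a
# single update cannot move the policy ratio far outside [1-eps, 1+eps].

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)

same  = ppo_clip_objective(math.log(1.0), math.log(1.0), 2.0)  # inside clip range
moved = ppo_clip_objective(math.log(2.0), math.log(1.0), 2.0)  # gain is capped
```

In practice the negative of this objective, averaged over a batch of UAV-environment transitions, is what the policy network's optimizer minimizes.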

Updated: 2025-04-04 22:06:01

Fields: cs.AI,eess.SP

Download: http://arxiv.org/abs/2504.03961v1

Decoding Time Series with LLMs: A Multi-Agent Framework for Cross-Domain Annotation

Time series data is ubiquitous across various domains, including manufacturing, finance, and healthcare. High-quality annotations are essential for effectively understanding time series and facilitating downstream tasks; however, obtaining such annotations is challenging, particularly in mission-critical domains. In this paper, we propose TESSA, a multi-agent system designed to automatically generate both general and domain-specific annotations for time series data. TESSA introduces two agents: a general annotation agent and a domain-specific annotation agent. The general agent captures common patterns and knowledge across multiple source domains, leveraging both time-series-wise and text-wise features to generate general annotations. Meanwhile, the domain-specific agent utilizes limited annotations from the target domain to learn domain-specific terminology and generate targeted annotations. Extensive experiments on multiple synthetic and real-world datasets demonstrate that TESSA effectively generates high-quality annotations, outperforming existing methods.

Updated: 2025-04-04 21:58:33

Fields: cs.AI,cs.CL

Download: http://arxiv.org/abs/2410.17462v2

Tell What You Hear From What You See -- Video to Audio Generation Through Text

The content of visual and audio scenes is multi-faceted such that a video can be paired with various audio and vice-versa. Thus, in the video-to-audio generation task, it is imperative to introduce steering approaches for controlling the generated audio. While video-to-audio generation is a well-established generative task, existing methods lack such controllability. In this work, we propose VATT, a multi-modal generative framework that takes a video and an optional text prompt as input, and generates audio and an optional textual description of the audio. Such a framework has two advantages: i) the video-to-audio generation process can be refined and controlled via text, which complements the context of visual information, and ii) the model can suggest what audio to generate for the video by generating audio captions. VATT consists of two key modules: VATT Converter, an LLM that is fine-tuned for instructions and includes a projection layer that maps video features to the LLM vector space; and VATT Audio, a transformer that generates audio tokens from visual frames and from an optional text prompt using iterative parallel decoding. The audio tokens are converted to a waveform by a pretrained neural codec. Experiments show that when VATT is compared to existing video-to-audio generation methods in objective metrics, it achieves competitive performance when the audio caption is not provided. When the audio caption is provided as a prompt, VATT achieves even more refined performance (lowest KLD score of 1.41). Furthermore, subjective studies show that audio generated by VATT was preferred over audio generated by existing methods. VATT enables controllable video-to-audio generation through text as well as suggesting text prompts for videos through audio captions, unlocking novel applications such as text-guided video-to-audio generation and video-to-audio captioning.

Updated: 2025-04-04 21:50:29

Fields: cs.CV,cs.AI,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2411.05679v3

Practical Poisoning Attacks against Retrieval-Augmented Generation

Large language models (LLMs) have demonstrated impressive natural language processing abilities but face challenges such as hallucination and outdated knowledge. Retrieval-Augmented Generation (RAG) has emerged as a state-of-the-art approach to mitigate these issues. While RAG enhances LLM outputs, it remains vulnerable to poisoning attacks. Recent studies show that injecting poisoned text into the knowledge database can compromise RAG systems, but most existing attacks assume that the attacker can insert a sufficient number of poisoned texts per query to outnumber correct-answer texts in retrieval, an assumption that is often unrealistic. To address this limitation, we propose CorruptRAG, a practical poisoning attack against RAG systems in which the attacker injects only a single poisoned text, enhancing both feasibility and stealth. Extensive experiments across multiple datasets demonstrate that CorruptRAG achieves higher attack success rates compared to existing baselines.

Updated: 2025-04-04 21:49:42

Fields: cs.CR,cs.IR,cs.LG

Download: http://arxiv.org/abs/2504.03957v1

Data-driven construction of a generalized kinetic collision operator from molecular dynamics

We introduce a data-driven approach to learn a generalized kinetic collision operator directly from molecular dynamics. Unlike the conventional (e.g., Landau) models, the present operator takes an anisotropic form that accounts for a second energy transfer arising from the collective interactions between the pair of collision particles and the environment. Numerical results show that preserving the broadly overlooked anisotropic nature of the collision energy transfer is crucial for predicting the plasma kinetics with non-negligible correlations, where the Landau model shows limitations.

Updated: 2025-04-04 21:43:31

Fields: physics.comp-ph,cs.LG,cs.NA,math.NA,physics.plasm-ph

Download: http://arxiv.org/abs/2503.24208v2

An Efficient Continual Learning Framework for Multivariate Time Series Prediction Tasks with Application to Vehicle State Estimation

In continual time series analysis using neural networks, catastrophic forgetting (CF) of previously learned models when training on new data domains has always been a significant challenge. This problem is especially challenging in vehicle estimation and control, where new information is sequentially introduced to the model. Unfortunately, existing work on continual learning has not sufficiently addressed the adverse effects of catastrophic forgetting in time series analysis, particularly in multivariate output environments. In this paper, we present EM-ReSeleCT (Efficient Multivariate Representative Selection for Continual Learning in Time Series Tasks), an enhanced approach designed to handle continual learning in multivariate environments. Our approach strategically selects representative subsets from old and historical data and incorporates memory-based continual learning techniques with an improved optimization algorithm to adapt the pre-trained model on new information while preserving previously acquired information. Additionally, we develop a sequence-to-sequence transformer model (autoregressive model) specifically designed for vehicle state estimation. Moreover, we propose an uncertainty quantification framework using conformal prediction to assess the sensitivity of the memory size and to showcase the robustness of the proposed method. Experimental results from tests on an electric Equinox vehicle highlight the superiority of our method in continually learning new information while retaining prior knowledge, outperforming state-of-the-art continual learning methods. Furthermore, EM-ReSeleCT significantly reduces training time, a critical advantage in continual learning applications.

Updated: 2025-04-04 21:40:47

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2503.01669v2

DeepOHeat-v1: Efficient Operator Learning for Fast and Trustworthy Thermal Simulation and Optimization in 3D-IC Design

Thermal analysis is crucial in three-dimensional integrated circuit (3D-IC) design due to increased power density and complex heat dissipation paths. Although operator learning frameworks such as DeepOHeat have demonstrated promising preliminary results in accelerating thermal simulation, they face critical limitations in prediction capability for multi-scale thermal patterns, training efficiency, and trustworthiness of results during design optimization. This paper presents DeepOHeat-v1, an enhanced physics-informed operator learning framework that addresses these challenges through three key innovations. First, we integrate Kolmogorov-Arnold Networks with learnable activation functions as trunk networks, enabling an adaptive representation of multi-scale thermal patterns. This approach achieves a $1.25\times$ and $6.29\times$ reduction in error in two representative test cases. Second, we introduce a separable training method that decomposes the basis function along the coordinate axes, achieving $62\times$ training speedup and $31\times$ GPU memory reduction in our baseline case, and enabling thermal analysis at resolutions previously infeasible due to GPU memory constraints. Third, we propose a confidence score to evaluate the trustworthiness of the predicted results, and further develop a hybrid optimization workflow that combines operator learning with finite difference (FD) using Generalized Minimal Residual (GMRES) method for incremental solution refinement, enabling efficient and trustworthy thermal optimization. Experimental results demonstrate that DeepOHeat-v1 achieves accuracy comparable to optimization using high-fidelity finite difference solvers, while speeding up the entire optimization process by $70.6\times$ in our test cases, effectively minimizing the peak temperature through optimal placement of heat-generating components.

Updated: 2025-04-04 21:39:42

Fields: cs.LG,cs.AI,physics.data-an

Download: http://arxiv.org/abs/2504.03955v1

TGraphX: Tensor-Aware Graph Neural Network for Multi-Dimensional Feature Learning

TGraphX presents a novel paradigm in deep learning by unifying convolutional neural networks (CNNs) with graph neural networks (GNNs) to enhance visual reasoning tasks. Traditional CNNs excel at extracting rich spatial features from images but lack the inherent capability to model inter-object relationships. Conversely, conventional GNNs typically rely on flattened node features, thereby discarding vital spatial details. TGraphX overcomes these limitations by employing CNNs to generate multi-dimensional node features (e.g., (3*128*128) tensors) that preserve local spatial semantics. These spatially aware nodes participate in a graph where message passing is performed using 1*1 convolutions, which fuse adjacent features while maintaining their structure. Furthermore, a deep CNN aggregator with residual connections is used to robustly refine the fused messages, ensuring stable gradient flow and end-to-end trainability. Our approach not only bridges the gap between spatial feature extraction and relational reasoning but also demonstrates significant improvements in object detection refinement and ensemble reasoning.

Updated: 2025-04-04 21:38:20

Fields: cs.CV,cs.AI,68T07, 68T45, 68R10,I.2.6; I.5.1; I.4.8

Download: http://arxiv.org/abs/2504.03953v1

A New Approach to Controlling Linear Dynamical Systems

We propose a new method for controlling linear dynamical systems under adversarial disturbances and cost functions. Our algorithm achieves a running time that scales polylogarithmically with the inverse of the stability margin, improving upon prior methods with polynomial dependence maintaining the same regret guarantees. The technique, which may be of independent interest, is based on a novel convex relaxation that approximates linear control policies using spectral filters constructed from the eigenvectors of a specific Hankel matrix.
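The spectral-filter construction lends itself to a short sketch. This is a hedged illustration in plain Python: the abstract does not give the Hankel matrix's entries, so the form Z_ij = 2/((i+j)^3 - (i+j)) below is borrowed from the spectral filtering literature and may differ from the paper's specific matrix. We build the matrix and extract its top eigenvector, which plays the role of the first spectral filter, by power iteration:

```python
# Hedged sketch: Hankel matrix as used in the spectral filtering literature
# (assumed form; the paper's "specific Hankel matrix" may differ).
def hankel_matrix(m):
    """Z_ij = 2 / ((i+j)^3 - (i+j)), 1-indexed; symmetric with positive entries."""
    return [[2.0 / ((i + j) ** 3 - (i + j)) for j in range(1, m + 1)]
            for i in range(1, m + 1)]

def power_iteration(Z, iters=500):
    """Top eigenpair of a symmetric matrix by repeated multiplication."""
    m = len(Z)
    v = [1.0 / m] * m
    lam = 0.0
    for _ in range(iters):
        w = [sum(Z[i][j] * v[j] for j in range(m)) for i in range(m)]
        lam = sum(x * x for x in w) ** 0.5  # norm of Z v -> top eigenvalue
        v = [x / lam for x in w]
    return lam, v

Z = hankel_matrix(8)
lam, filt = power_iteration(Z)  # `filt` stands in for the first spectral filter
```

A full filter bank would take the top-k eigenvectors; with numpy available, `numpy.linalg.eigh(Z)` replaces the hand-rolled power iteration.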

Updated: 2025-04-04 21:37:46

Fields: eess.SY,cs.LG,cs.SY,math.OC,stat.ML

Download: http://arxiv.org/abs/2504.03952v1

Understanding EFX Allocations: Counting and Variants

Envy-freeness up to any good (EFX) is a popular and important fairness property in the fair allocation of indivisible goods, of which its existence in general is still an open question. In this work, we investigate the problem of determining the minimum number of EFX allocations for a given instance, arguing that this approach may yield valuable insights into the existence and computation of EFX allocations. We focus on restricted instances where the number of goods slightly exceeds the number of agents, and extend our analysis to weighted EFX (WEFX) and a novel variant of EFX for general monotone valuations, termed EFX+. In doing so, we identify the transition threshold for the existence of allocations satisfying these fairness notions. Notably, we resolve open problems regarding WEFX by proving polynomial-time computability under binary additive valuations, and establishing the first constant-factor approximation for two agents.
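For readers new to the notion: an allocation is EFX when any envy between agents disappears after removing an arbitrary single good from the envied bundle. A minimal checker for additive valuations, with an invented toy instance, makes the definition concrete:

```python
def is_efx(valuations, allocation):
    """EFX check for additive valuations.
    valuations[i][g] is agent i's value for good g;
    allocation[i] lists the goods held by agent i."""
    n = len(allocation)
    for i in range(n):
        own = sum(valuations[i][g] for g in allocation[i])
        for j in range(n):
            if i == j:
                continue
            bundle = sum(valuations[i][g] for g in allocation[j])
            # Envy must vanish after removing *any* single good from j's bundle.
            for g in allocation[j]:
                if own < bundle - valuations[i][g]:
                    return False
    return True

# Invented toy instance: two agents, three goods.
valuations = [[3, 1, 1], [1, 3, 1]]
assert is_efx(valuations, [[0], [1, 2]])        # EFX allocation
assert not is_efx(valuations, [[], [0, 1, 2]])  # agent 0 still envies after any removal
```

Counting EFX allocations for an instance, as the paper does, amounts to running such a check over all candidate allocations.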

Updated: 2025-04-04 21:36:09

Fields: cs.GT,cs.AI,econ.TH

Download: http://arxiv.org/abs/2504.03951v1

Batch Bayesian Optimization for High-Dimensional Experimental Design: Simulation and Visualization

Bayesian Optimization (BO) is increasingly used to guide experimental optimization tasks. To elucidate BO behavior in noisy and high-dimensional settings typical for materials science applications, we perform batch BO of two six-dimensional test functions: an Ackley function representing a needle-in-a-haystack problem and a Hartmann function representing a problem with a false maximum with a value close to the global maximum. We show learning curves, performance metrics, and visualization to effectively track the evolution of optimization in high dimensions and evaluate how they are affected by noise, batch-picking method, choice of acquisition function, and its exploration hyperparameter values. We find that the effects of noise depend on the problem landscape; therefore, prior knowledge of the domain structure and noise level is needed when designing BO. The Ackley function optimization is significantly degraded by noise with a complete loss of ground truth resemblance when noise equals 10% of the maximum objective value. For the Hartmann function, even in the absence of noise, a significant fraction of the initial samplings identify the false maximum instead of the ground truth maximum as the optimum of the function; with increasing noise, BO remains effective, albeit with increasing probability of landing on the false maximum. This study systematically highlights the critical issues when setting up BO and choosing synthetic data to test experimental design. The results and methodology will facilitate wider utilization of BO in guiding experiments, specifically in high-dimensional settings.
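For reference, the Ackley needle-in-a-haystack objective can be written out directly. The sketch below uses the standard form and parameters (a = 20, b = 0.2, c = 2π), with negation for a maximization setup; the paper's exact configuration and noise model are not specified in the abstract:

```python
import math

def ackley(x, a=20.0, b=0.2, c=2.0 * math.pi):
    """Standard Ackley function: global minimum 0 at the origin, surrounded
    by a nearly flat, multimodal outer region (needle in a haystack).
    A maximization benchmark, as in the paper, would use -ackley(x)."""
    d = len(x)
    rms = math.sqrt(sum(xi * xi for xi in x) / d)          # root-mean-square of x
    cos_mean = sum(math.cos(c * xi) for xi in x) / d       # mean of cos(c x_i)
    return -a * math.exp(-b * rms) - math.exp(cos_mean) + a + math.e

origin_value = ackley([0.0] * 6)  # 0 by construction
```

Moving even slightly away from the origin in six dimensions raises the value, which is why additive noise at a fixed fraction of the objective range can wash out the needle entirely.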

Updated: 2025-04-04 21:20:11

Fields: stat.ML,cond-mat.mtrl-sci,cs.LG

Download: http://arxiv.org/abs/2504.03943v1

ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models

Recent studies have shown that Large Language Models (LLMs) augmented with chain-of-thought (CoT) reasoning demonstrate impressive problem-solving abilities. However, in this work, we identify a recurring issue where these models occasionally generate overly short reasoning, leading to degraded performance on even simple mathematical problems. Specifically, we investigate how reasoning length is embedded in the hidden representations of reasoning models and its impact on accuracy. Our analysis reveals that reasoning length is governed by a linear direction in the representation space, allowing us to induce overly short reasoning by steering the model along this direction. Building on this insight, we introduce ThinkEdit, a simple yet effective weight-editing approach to mitigate the issue of overly short reasoning. We first identify a small subset of attention heads (approximately 2%) that predominantly drive short reasoning behavior. We then edit the output projection weights of these heads to suppress the short reasoning direction. With changes to only 0.1% of the model's parameters, ThinkEdit effectively reduces overly short reasoning and yields notable accuracy gains for short reasoning outputs (+5.44%), along with an overall improvement across multiple math benchmarks (+2.43%). Our findings provide new mechanistic insights into how reasoning length is controlled within LLMs and highlight the potential of fine-grained model interventions to improve reasoning quality. Our code is available at https://github.com/Trustworthy-ML-Lab/ThinkEdit
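The weight edit itself is simple linear algebra. Assuming the "short reasoning" direction d has already been identified, suppressing it in an output projection W amounts to W' = (I - dd^T)W, after which no input can produce output along d. A plain-list sketch with illustrative numbers (the paper applies this to the o_proj weights of roughly 2% of attention heads in an LLM):

```python
def edit_out_projection(W, d):
    """Project direction d out of a weight matrix's output space:
    W' = (I - d d^T) W, so d . (W' x) = 0 for every input x.
    W is m x n as nested lists; d has length m (normalized here)."""
    m, n = len(W), len(W[0])
    norm = sum(di * di for di in d) ** 0.5
    d = [di / norm for di in d]
    dTW = [sum(d[i] * W[i][j] for i in range(m)) for j in range(n)]  # d^T W
    return [[W[i][j] - d[i] * dTW[j] for j in range(n)] for i in range(m)]

# Illustrative matrices only; in the paper, W would be a head's output
# projection and d the "short reasoning" direction in the residual stream.
W_edited = edit_out_projection([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [1.0, 1.0, 0.0])
```

Because the edit is a rank-one update to a small set of matrices, it touches only about 0.1% of the model's parameters, matching the footprint described above.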

Updated: 2025-04-04 21:17:46

Fields: cs.CL,cs.LG

Download: http://arxiv.org/abs/2503.22048v2

Analysis of Robustness of a Large Game Corpus

Procedural content generation via machine learning (PCGML) in games involves using machine learning techniques to create game content such as maps and levels. 2D tile-based game levels have consistently served as a standard dataset for PCGML because they are a simplified version of game levels while maintaining the specific constraints typical of games, such as being solvable. In this work, we highlight the unique characteristics of game levels, including their structured discrete data nature, the local and global constraints inherent in the games, and the sensitivity of the game levels to small changes in input. We define the robustness of data as a measure of sensitivity to small changes in input that cause a change in output, and we use this measure to analyze and compare these levels to state-of-the-art machine learning datasets, showcasing the subtle differences in their nature. We also constructed a large dataset from four games inspired by popular classic tile-based games that showcase these characteristics and address the challenge of sparse data in PCGML by providing a significantly larger dataset than those currently available.

Updated: 2025-04-04 21:15:13

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2504.03940v1

Commit-Reveal$^2$: Randomized Reveal Order Mitigates Last-Revealer Attacks in Commit-Reveal

Randomness generation is a fundamental component in blockchain systems, essential for tasks such as validator selection, zero-knowledge proofs, and decentralized finance operations. Traditional Commit-Reveal mechanisms provide simplicity and security but are susceptible to last-revealer attacks, where an adversary can manipulate the random outcome by withholding their reveal. To address this vulnerability, we propose the Commit-Reveal$^2$ protocol, which employs a two-layer Commit-Reveal process to randomize the reveal order and mitigate the risk of such attacks. Additionally, we introduce a method to leverage off-chain networks to optimize communication costs and enhance efficiency. We implement a prototype of the proposed mechanism and publicly release the code to facilitate practical adoption and further research.
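A single hash-based commit-reveal round, the protocol's building block, can be sketched as follows. The two-layer construction then uses one round's output to fix the reveal order of the next, so no party knows in advance whether it reveals last; everything beyond the basic round here is an assumption of this sketch, not the paper's exact construction:

```python
import hashlib
import secrets

def commit(value: bytes, salt: bytes) -> bytes:
    """Binding and hiding commitment: H(salt || value)."""
    return hashlib.sha256(salt + value).digest()

def combine(reveals):
    """Final randomness: hash over all revealed values. The last party to
    reveal can predict (and grind) this, which is the attack being mitigated."""
    h = hashlib.sha256()
    for v in reveals:
        h.update(v)
    return h.digest()

# One round among three parties: commit phase, then reveal phase.
parties = [(secrets.token_bytes(32), secrets.token_bytes(16)) for _ in range(3)]
commitments = [commit(v, s) for v, s in parties]

# Everyone verifies each opening against the published commitment.
assert all(commit(v, s) == c for (v, s), c in zip(parties, commitments))
randomness = combine(v for v, _ in parties)

# Commit-Reveal^2 (sketch): a first round like the above seeds a random
# permutation of the reveal order for the second round, so no party can
# position itself as the last revealer.
```

The order-sensitivity of `combine` is the crux: randomizing who reveals last removes the one position from which the outcome can be biased.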

Updated: 2025-04-04 21:05:51

Fields: cs.CR

Download: http://arxiv.org/abs/2504.03936v1

Individualized Deepfake Detection Exploiting Traces Due to Double Neural-Network Operations

In today's digital landscape, journalists urgently require tools to verify the authenticity of facial images and videos depicting specific public figures before incorporating them into news stories. Existing deepfake detectors are not optimized for this detection task when an image is associated with a specific and identifiable individual. This study focuses on the deepfake detection of facial images of individual public figures. We propose to condition the proposed detector on the identity of an identified individual, given the advantages revealed by our theory-driven simulations. While most detectors in the literature rely on perceptible or imperceptible artifacts present in deepfake facial images, we demonstrate that the detection performance can be improved by exploiting the idempotency property of neural networks. In our approach, the training process involves double neural-network operations where we pass an authentic image through a deepfake simulating network twice. Experimental results show that the proposed method improves the area under the curve (AUC) from 0.92 to 0.94 and reduces its standard deviation by 17%. To address the need for evaluating detection performance for individual public figures, we curated and publicly released a dataset of ~32k images featuring 45 public figures, as existing deepfake datasets do not meet this criterion.

Updated: 2025-04-04 21:05:01

Fields: eess.IV,cs.CR,cs.CV,cs.LG

Download: http://arxiv.org/abs/2312.08034v2

Metamorphic Testing for Fairness Evaluation in Large Language Models: Identifying Intersectional Bias in LLaMA and GPT

Large Language Models (LLMs) have made significant strides in Natural Language Processing but remain vulnerable to fairness-related issues, often reflecting biases inherent in their training data. These biases pose risks, particularly when LLMs are deployed in sensitive areas such as healthcare, finance, and law. This paper introduces a metamorphic testing approach to systematically identify fairness bugs in LLMs. We define and apply a set of fairness-oriented metamorphic relations (MRs) to assess LLaMA and GPT, two state-of-the-art LLMs, across diverse demographic inputs. Our methodology includes generating source and follow-up test cases for each MR and analyzing model responses for fairness violations. The results demonstrate the effectiveness of metamorphic testing in exposing bias patterns, especially in relation to tone and sentiment, and highlight specific intersections of sensitive attributes that frequently reveal fairness faults. This research improves fairness testing in LLMs, providing a structured approach to detect and mitigate biases and improve model robustness in fairness-sensitive applications.
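One fairness-oriented metamorphic relation can be made concrete: swapping a protected attribute in an otherwise identical prompt should not change the model's output, and a mismatch flags a fairness bug. The stub classifier below is hypothetical, deliberately biased so the check has something to catch; in practice `model` would wrap an LLM call:

```python
def mr_attribute_swap(model, template, group_a, group_b):
    """Fairness MR: the source and follow-up prompts differ only in a
    protected attribute, so their outputs should match; a mismatch is a
    metamorphic-relation violation."""
    return model(template.format(group=group_a)) == model(template.format(group=group_b))

# Hypothetical stub standing in for an LLM-based sentiment/decision head,
# deliberately biased so the relation has a violation to detect.
def biased_stub(prompt):
    return "negative" if "group B" in prompt else "positive"

template = "Assess the tone of a reference letter for a {group} applicant."
violation_found = not mr_attribute_swap(biased_stub, template, "group A", "group B")
```

Intersectional testing extends the same idea by varying several sensitive attributes jointly rather than one at a time.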

Updated: 2025-04-04 21:04:14

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2504.07982v1

Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment

In a prompt injection attack, an attacker injects a prompt into the original one, aiming to make an LLM follow the injected prompt to perform an attacker-chosen task. Existing attacks primarily focus on how to blend the injected prompt into the original prompt without altering the LLM itself. Our experiments show that these attacks achieve some success, but there is still significant room for improvement. In this work, we show that an attacker can boost the success of prompt injection attacks by poisoning the LLM's alignment process. Specifically, we propose PoisonedAlign, a method to strategically create poisoned alignment samples. When even a small fraction of the alignment data is poisoned using our method, the aligned LLM becomes more vulnerable to prompt injection while maintaining its foundational capabilities. The code is available at https://github.com/Sadcardation/PoisonedAlign

Updated: 2025-04-04 21:04:01

Fields: cs.CR,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.14827v2

Language Models Are Implicitly Continuous

Language is typically modelled with discrete sequences. However, the most successful approaches to language modelling, namely neural networks, are continuous and smooth function approximators. In this work, we show that Transformer-based language models implicitly learn to represent sentences as continuous-time functions defined over a continuous input space. This phenomenon occurs in most state-of-the-art Large Language Models (LLMs), including Llama2, Llama3, Phi3, Gemma, Gemma2, and Mistral, and suggests that LLMs reason about language in ways that fundamentally differ from humans. Our work formally extends Transformers to capture the nuances of time and space continuity in both input and output space. Our results challenge the traditional interpretation of how LLMs understand language, with several linguistic and engineering implications.

Updated: 2025-04-04 21:01:20

Fields: cs.CL,cs.LG,I.2.7; I.2.6

Download: http://arxiv.org/abs/2504.03933v1

Adaptation of Large Language Models

This tutorial on adaptation of LLMs is designed to address the growing demand for models that go beyond the static capabilities of generic LLMs by providing an overview of dynamic, domain-specific, and task-adaptive LLM adaptation techniques. While general LLMs have demonstrated strong generalization across a variety of tasks, they often struggle to perform well in specialized domains such as finance, healthcare, and code generation for underrepresented languages. Additionally, their static nature limits their ability to evolve with the changing world, and they are often extremely large in size, making them impractical and costly to deploy at scale. As a result, the adaptation of LLMs has drawn much attention since the birth of LLMs and is of core importance, both for industry, which focuses on serving its targeted users, and academia, which can greatly benefit from small but powerful LLMs. To address this gap, this tutorial aims to provide an overview of the LLM adaptation techniques. We start with an introduction to LLM adaptation, from both the data perspective and the model perspective. We then emphasize how the evaluation metrics and benchmarks are different from other techniques. After establishing the problems, we explore various adaptation techniques. We categorize adaptation techniques into two main families. The first is parametric knowledge adaptation, which focuses on updating the parametric knowledge within LLMs. Additionally, we will discuss real-time adaptation techniques, including model editing, which allows LLMs to be updated dynamically in production environments. The second kind of adaptation is semi-parametric knowledge adaptation, where the goal is to update LLM parameters to better leverage external knowledge or tools through techniques like retrieval-augmented generation (RAG) and agent-based systems.

Updated: 2025-04-04 20:57:41

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2504.03931v1

Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition

Large Language Models (LLMs) have been touted as AI models possessing advanced reasoning abilities. In theory, autoregressive LLMs with Chain-of-Thought (CoT) can perform more serial computations to solve complex reasoning tasks. However, recent studies suggest that, despite this capacity, LLMs do not truly learn to reason but instead fit on statistical features. To study the reasoning capabilities in a principled fashion, we adopt a computational theory perspective and propose an experimental protocol centered on 3-SAT -- the prototypical NP-complete problem lying at the core of logical reasoning and constraint satisfaction tasks. Specifically, we examine the phase transitions in random 3-SAT and characterize the reasoning abilities of state-of-the-art LLMs by varying the inherent hardness of the problem instances. By comparing DeepSeek R1 with other LLMs, our findings reveal two key insights: (1) LLM accuracy drops significantly on harder instances, suggesting that all current models struggle when statistical shortcuts are unavailable; (2) unlike other LLMs, R1 shows signs of having learned the underlying reasoning. Following a principled experimental protocol, our study moves beyond the benchmark-driven evidence often found in LLM reasoning research. Our findings highlight important gaps and suggest clear directions for future research.
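
The hardness sweep described above is easy to reproduce at toy scale. The sketch below (our own illustration, not the paper's protocol) samples random 3-SAT instances at a given clause-to-variable ratio alpha and brute-forces satisfiability, exposing the phase transition near alpha ≈ 4.27 that the authors use to control instance hardness:

```python
import itertools
import random

def random_3sat(n_vars, n_clauses, rng):
    """Sample a random 3-SAT instance: each clause draws 3 distinct
    variables and negates each with probability 1/2."""
    clauses = []
    for _ in range(n_clauses):
        chosen = rng.sample(range(n_vars), 3)
        clauses.append([(v, rng.random() < 0.5) for v in chosen])
    return clauses

def satisfiable(n_vars, clauses):
    """Brute-force satisfiability check (fine for the tiny n used here)."""
    for bits in itertools.product([False, True], repeat=n_vars):
        # A literal (v, negated) is true when bits[v] differs from `negated`.
        if all(any(bits[v] != neg for v, neg in cl) for cl in clauses):
            return True
    return False

def sat_fraction(n_vars, alpha, trials=20, seed=0):
    """Fraction of satisfiable instances at clause/variable ratio alpha."""
    rng = random.Random(seed)
    m = round(alpha * n_vars)
    hits = sum(satisfiable(n_vars, random_3sat(n_vars, m, rng))
               for _ in range(trials))
    return hits / trials
```

Well below the threshold almost every instance is satisfiable; well above it almost none is. Instances near the transition are the hard ones where statistical shortcuts stop working.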

Updated: 2025-04-04 20:57:36

Subjects: cs.AI,cs.CC,cs.LG

Download: http://arxiv.org/abs/2504.03930v1

Level Generation with Constrained Expressive Range

Expressive range analysis is a visualization-based technique used to evaluate the performance of generative models, particularly in game level generation. It typically employs two quantifiable metrics to position generated artifacts on a 2D plot, offering insight into how content is distributed within a defined metric space. In this work, we use the expressive range of a generator as the conceptual space of possible creations. Inspired by the quality diversity paradigm, we explore this space to generate levels. To do so, we use a constraint-based generator that systematically traverses and generates levels in this space. To train the constraint-based generator we use different tile patterns to learn from the initial example levels. We analyze how different patterns influence the exploration of the expressive range. Specifically, we compare the exploration process based on time, the number of successful and failed sample generations, and the overall interestingness of the generated levels. Unlike typical quality diversity approaches that rely on random generation and hope to get good coverage of the expressive range, this approach systematically traverses the grid ensuring more coverage. This helps create unique and interesting game levels while also improving our understanding of the generator's strengths and limitations.
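
As a minimal illustration of an expressive-range grid (with two made-up metrics standing in for the paper's quantifiable metrics), one can bin generated levels by their metric values and measure how much of the 2D metric space a batch covers:

```python
import random

def metrics(level):
    """Two toy metrics for a binary tile grid: wall density and
    left-right symmetry. Hypothetical stand-ins for the metrics a real
    expressive-range analysis would use."""
    flat = [t for row in level for t in row]
    density = sum(flat) / len(flat)
    half = len(level[0]) // 2
    matches = sum(row[i] == row[-1 - i] for row in level for i in range(half))
    symmetry = matches / (len(level) * half)
    return density, symmetry

def cell(point, bins):
    """Bin a metric pair (values in [0, 1]) into a bins x bins grid cell."""
    return tuple(min(int(v * bins), bins - 1) for v in point)

def coverage(levels, bins=5):
    """Fraction of expressive-range cells hit by a batch of levels."""
    return len({cell(metrics(lv), bins) for lv in levels}) / bins ** 2

rng = random.Random(0)
levels = [[[rng.randint(0, 1) for _ in range(6)] for _ in range(6)]
          for _ in range(200)]
```

Random generation typically clusters in a few cells and leaves others empty; the paper's constraint-based generator instead targets each cell in turn, which is what guarantees the improved coverage.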

Updated: 2025-04-04 20:55:30

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2504.05334v1

Minmax Trend Filtering: Generalizations of Total Variation Denoising via a Local Minmax/Maxmin Formula

Total Variation Denoising (TVD) is a fundamental denoising and smoothing method. In this article, we identify a new local minmax/maxmin formula producing two estimators which sandwich the univariate TVD estimator at every point. Operationally, this formula gives a local definition of TVD as a minmax/maxmin of a simple function of local averages. Moreover, we find that this minmax/maxmin formula is generalizable and can be used to define other TVD-like estimators. In this article, we propose and study higher order polynomial versions of TVD which are defined pointwise lying between minmax and maxmin optimizations of penalized local polynomial regressions over intervals of different scales. These appear to be new nonparametric regression methods, different from usual Trend Filtering and any other existing method in the nonparametric regression toolbox. We call these estimators Minmax Trend Filtering (MTF). We show how the proposed local definition of the TVD/MTF estimator makes it tractable to bound pointwise estimation errors in terms of a local bias-variance-like trade-off. This type of local analysis of TVD/MTF is new and arguably simpler than existing analyses of TVD/Trend Filtering. In particular, apart from minimax rate optimality over bounded variation and piecewise polynomial classes, our pointwise estimation error bounds also enable us to derive local rates of convergence for (locally) Hölder-smooth signals. These local rates offer a new pointwise explanation of local adaptivity of TVD/MTF instead of global (MSE-based) justifications.
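
The sandwich structure is easy to see numerically. The sketch below is illustrative only: it drops the penalty term from the paper's actual formula and uses plain local averages over all intervals [a, b] containing a point i, where the weak minimax inequality already guarantees that the max-min never exceeds the min-max:

```python
def local_sandwich(y, i):
    """Max-min and min-max of local averages over intervals containing i.
    The paper's estimators use *penalized* local (polynomial) fits; plain
    means are used here purely to illustrate the two-sided construction."""
    n = len(y)

    def avg(a, b):
        return sum(y[a:b + 1]) / (b - a + 1)

    maxmin = max(min(avg(a, b) for b in range(i, n)) for a in range(i + 1))
    minmax = min(max(avg(a, b) for a in range(i + 1)) for b in range(i, n))
    return maxmin, minmax
```

Per the paper, once the penalty term is added, the univariate TVD estimate at i is sandwiched between two such quantities, which is what makes the pointwise error analysis tractable.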

Updated: 2025-04-04 20:54:20

Subjects: math.ST,cs.LG,stat.TH

Download: http://arxiv.org/abs/2410.03041v2

Random Normed k-Means: A Paradigm-Shift in Clustering within Probabilistic Metric Spaces

Existing approaches remain largely constrained by traditional distance metrics, limiting their effectiveness in handling random data. In this work, we introduce the first k-means variant in the literature that operates within a probabilistic metric space, replacing conventional distance measures with a well-defined distance distribution function. This pioneering approach enables more flexible and robust clustering in both deterministic and random datasets, establishing a new foundation for clustering in stochastic environments. By adopting a probabilistic perspective, our method not only introduces a fresh paradigm but also establishes a rigorous theoretical framework that is expected to serve as a key reference for future clustering research involving random data. Extensive experiments on diverse real and synthetic datasets assess our model's effectiveness using widely recognized evaluation metrics, including Silhouette, Davies-Bouldin, Calinski Harabasz, the adjusted Rand index, and distortion. Comparative analyses against established methods such as k-means++, fuzzy c-means, and kernel probabilistic k-means demonstrate the superior performance of our proposed random normed k-means (RNKM) algorithm. Notably, RNKM exhibits a remarkable ability to identify nonlinearly separable structures, making it highly effective in complex clustering scenarios. These findings position RNKM as a groundbreaking advancement in clustering research, offering a powerful alternative to traditional techniques while addressing a long-standing gap in the literature. By bridging probabilistic metrics with clustering, this study provides a foundational reference for future developments and opens new avenues for advanced data analysis in dynamic, data-driven applications.

Updated: 2025-04-04 20:48:43

Subjects: cs.LG,math.PR,stat.ML

Download: http://arxiv.org/abs/2504.03928v1

An Exploration-free Method for a Linear Stochastic Bandit Driven by a Linear Gaussian Dynamical System

In stochastic multi-armed bandits, a major problem the learner faces is the trade-off between exploration and exploitation. Recently, exploration-free methods -- methods that commit to the action predicted to return the highest reward -- have been studied from the perspective of linear bandits. In this paper, we introduce a linear bandit setting where the reward is the output of a linear Gaussian dynamical system. Motivated by a problem encountered in hyperparameter optimization for reinforcement learning, where the number of actions is much higher than the number of training iterations, we propose Kalman filter Observability Dependent Exploration (KODE), an exploration-free method that utilizes the Kalman filter predictions to select actions. Our major contribution of this work is our analysis of the performance of the proposed method, which is dependent on the observability properties of the underlying linear Gaussian dynamical system. We evaluate KODE via two different metrics: regret, which is the cumulative expected difference between the highest possible reward and the reward sampled by KODE, and action alignment, which measures how closely KODE's chosen action aligns with the linear Gaussian dynamical system's state variable. To provide intuition on the performance, we prove that KODE implicitly encourages the learner to explore actions depending on the observability of the linear Gaussian dynamical system. This method is compared to several well-known stochastic multi-armed bandit algorithms to validate our theoretical results.
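
A minimal sketch of the exploration-free loop, under simplifying assumptions (known dynamics A and noise covariances Q, R, fixed action features; not the authors' exact implementation): run a Kalman filter on the reward observations and always commit to the action with the highest predicted reward.

```python
import numpy as np

def kode_step(actions, mu, P, A, Q, R, theta, rng):
    """One round: Kalman predict, greedy commit, Kalman update.
    `actions` is an (n_actions x dim) matrix of feature vectors and
    `theta` the hidden state of the linear Gaussian dynamical system."""
    mu_pred = A @ mu                        # predict state mean
    P_pred = A @ P @ A.T + Q                # predict state covariance
    a = int(np.argmax(actions @ mu_pred))   # commit: no exploration bonus
    x = actions[a]
    r = x @ theta + rng.normal(0.0, np.sqrt(R))  # observe noisy reward
    S = x @ P_pred @ x + R                  # innovation variance
    K = P_pred @ x / S                      # Kalman gain
    mu_new = mu_pred + K * (r - x @ mu_pred)
    P_new = P_pred - np.outer(K, x) @ P_pred
    return a, mu_new, P_new
```

With a near-static, well-observable system the greedy choice locks onto the best action after a few rounds; the paper's analysis ties exactly this behavior to the observability of the underlying system.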

Updated: 2025-04-04 20:46:35

Subjects: cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2504.03926v1

Generative AI Enhanced Financial Risk Management Information Retrieval

Risk management in finance involves recognizing, evaluating, and addressing financial risks to maintain stability and ensure regulatory compliance. Extracting relevant insights from extensive regulatory documents is a complex challenge requiring advanced retrieval and language models. This paper introduces RiskData, a dataset specifically curated for finetuning embedding models in risk management, and RiskEmbed, a finetuned embedding model designed to improve retrieval accuracy in financial question-answering systems. The dataset is derived from 94 regulatory guidelines published by the Office of the Superintendent of Financial Institutions (OSFI) from 1991 to 2024. We finetune a state-of-the-art sentence BERT embedding model to enhance domain-specific retrieval performance typically for Retrieval-Augmented Generation (RAG) systems. Experimental results demonstrate that RiskEmbed significantly outperforms general-purpose and financial embedding models, achieving substantial improvements in ranking metrics. By open-sourcing both the dataset and the model, we provide a valuable resource for financial institutions and researchers aiming to develop more accurate and efficient risk management AI solutions.

Updated: 2025-04-04 20:42:38

Subjects: q-fin.RM,cs.LG

Download: http://arxiv.org/abs/2504.06293v1

PerLA: Perceptive 3D Language Assistant

Enabling Large Language Models (LLMs) to understand the 3D physical world is an emerging yet challenging research direction. Current strategies for processing point clouds typically downsample the scene or divide it into smaller parts for separate analysis. However, both approaches risk losing key local details or global contextual information. In this paper, we introduce PerLA, a 3D language assistant designed to be more perceptive to both details and context, making visual representations more informative for the LLM. PerLA captures high-resolution (local) details in parallel from different point cloud areas and integrates them with (global) context obtained from a lower-resolution whole point cloud. We present a novel algorithm that preserves point cloud locality through the Hilbert curve and effectively aggregates local-to-global information via cross-attention and a graph neural network. Lastly, we introduce a novel loss for local representation consensus to promote training stability. PerLA outperforms state-of-the-art 3D language assistants, with gains of up to +1.34 CiDEr on ScanQA for question answering, and +4.22 on ScanRefer and +3.88 on Nr3D for dense captioning. https://gfmei.github.io/PerLA/
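
The locality-preserving serialization can be illustrated with the classic Hilbert-curve index (the standard iterative algorithm; the sketch below is ours, not PerLA's implementation): points that are close along the curve stay close in 2D, which is what makes aggregation over curve segments spatially meaningful.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve covering a
    2**order x 2**order grid (standard iterative construction)."""
    d = 0
    s = 2 ** order // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:            # rotate/reflect the quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Sorting a point cloud's quantized coordinates by `hilbert_index` yields a 1D ordering in which consecutive cells are always spatial neighbors, so contiguous chunks of the ordering correspond to coherent local regions.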

Updated: 2025-04-04 20:29:02

Subjects: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2411.19774v2

Towards Adversarially Robust Dataset Distillation by Curvature Regularization

Dataset distillation (DD) allows datasets to be distilled to fractions of their original size while preserving the rich distributional information, so that models trained on the distilled datasets can achieve a comparable accuracy while saving significant computational loads. Recent research in this area has been focusing on improving the accuracy of models trained on distilled datasets. In this paper, we aim to explore a new perspective of DD. We study how to embed adversarial robustness in distilled datasets, so that models trained on these datasets maintain the high accuracy and meanwhile acquire better adversarial robustness. We propose a new method that achieves this goal by incorporating curvature regularization into the distillation process with much less computational overhead than standard adversarial training. Extensive empirical experiments suggest that our method not only outperforms standard adversarial training on both accuracy and robustness with less computation overhead but is also capable of generating robust distilled datasets that can withstand various adversarial attacks. Our implementation is available at: https://github.com/yumozi/GUARD.
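
The regularizer's core quantity can be sketched with a finite-difference curvature proxy. The example below is a hand-rolled illustration on a linear logistic model with an analytic input gradient, not the paper's GUARD implementation: it measures how quickly the loss gradient with respect to the input changes along a random direction, and adding such a term to the distillation objective penalizes the sharp input-space curvature that adversarial attacks exploit.

```python
import numpy as np

def input_grad(w, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * w @ x)) with
    respect to the *input* x, for a linear model with weights w and
    label y in {-1, +1}."""
    margin = y * (w @ x)
    return -y * w / (1.0 + np.exp(margin))

def curvature_penalty(w, x, y, h=1e-3, rng=None):
    """Finite-difference proxy for input-space curvature: how fast the
    input gradient changes along a random unit direction."""
    rng = rng or np.random.default_rng(0)
    d = rng.normal(size=x.shape)
    d /= np.linalg.norm(d)
    g0 = input_grad(w, x, y)
    g1 = input_grad(w, x + h * d, y)
    return float(np.linalg.norm(g1 - g0) / h)
```

The penalty is large near the decision boundary (where gradients change fast) and vanishes at large margins, which is the regime a robust model should push its training points toward.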

Updated: 2025-04-04 20:27:58

Subjects: cs.LG,cs.CV

Download: http://arxiv.org/abs/2403.10045v4

RF-BayesPhysNet: A Bayesian rPPG Uncertainty Estimation Method for Complex Scenarios

Remote photoplethysmography (rPPG) technology infers heart rate by capturing subtle color changes in facial skin using a camera, demonstrating great potential in non-contact heart rate measurement. However, measurement accuracy significantly decreases in complex scenarios such as lighting changes and head movements compared to ideal laboratory conditions. Existing deep learning models often neglect the quantification of measurement uncertainty, limiting their credibility in dynamic scenes. To address the issue of insufficient rPPG measurement reliability in complex scenarios, this paper introduces Bayesian neural networks to the rPPG field for the first time, proposing the Robust Fusion Bayesian Physiological Network (RF-BayesPhysNet), which can model both aleatoric and epistemic uncertainty. It leverages variational inference to balance accuracy and computational efficiency. Due to the current lack of uncertainty estimation metrics in the rPPG field, this paper also proposes a new set of methods, using Spearman correlation coefficient, prediction interval coverage, and confidence interval width, to measure the effectiveness of uncertainty estimation methods under different noise conditions. Experiments show that the model, with only double the parameters compared to traditional network models, achieves a MAE of 2.56 on the UBFC-RPPG dataset, surpassing most models. It demonstrates good uncertainty estimation capability in no-noise and low-noise conditions, providing prediction confidence and significantly enhancing robustness in real-world applications. We have open-sourced the code at https://github.com/AIDC-rPPG/RF-Net
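
Two of the proposed evaluation quantities, prediction-interval coverage and interval width, are straightforward to compute from Monte-Carlo posterior samples. A minimal sketch (metric definitions taken from standard practice, not copied from the paper's code):

```python
import numpy as np

def interval_metrics(samples, y_true, alpha=0.1):
    """Coverage and mean width of central (1 - alpha) prediction
    intervals, from posterior samples of shape (n_draws, n_points)."""
    lo = np.quantile(samples, alpha / 2, axis=0)
    hi = np.quantile(samples, 1 - alpha / 2, axis=0)
    coverage = float(np.mean((y_true >= lo) & (y_true <= hi)))
    width = float(np.mean(hi - lo))
    return coverage, width
```

A well-calibrated model keeps coverage near 1 - alpha with the narrowest width it can afford; noise-robustness can then be read off by recomputing both metrics under corrupted inputs.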

Updated: 2025-04-04 20:24:57

Subjects: cs.LG,cs.AI,68T07, 62F15, 94A12,I.2.6; I.5.4; C.3

Download: http://arxiv.org/abs/2504.03915v1

Opening the Black-Box: Symbolic Regression with Kolmogorov-Arnold Networks for Energy Applications

While most modern machine learning methods offer speed and accuracy, few promise interpretability or explainability -- two key features necessary for highly sensitive industries, like medicine, finance, and engineering. Using eight datasets representative of one especially sensitive industry, nuclear power, this work compares a traditional feedforward neural network (FNN) to a Kolmogorov-Arnold Network (KAN). We consider not only model performance and accuracy, but also interpretability through model architecture and explainability through a post-hoc SHAP analysis. In terms of accuracy, we find KANs and FNNs comparable across all datasets, when output dimensionality is limited. KANs, which transform into symbolic equations after training, yield perfectly interpretable models while FNNs remain black-boxes. Finally, using the post-hoc explainability results from Kernel SHAP, we find that KANs learn real, physical relations from experimental data, while FNNs simply produce statistically accurate results. Overall, this analysis finds KANs a promising alternative to traditional machine learning methods, particularly in applications requiring both accuracy and comprehensibility.

Updated: 2025-04-04 20:23:33

Subjects: cs.LG,cs.SC,stat.ML

Download: http://arxiv.org/abs/2504.03913v1

Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator

Although adaptive optimization algorithms have been successful in many applications, there are still some mysteries in terms of convergence analysis that have not been unraveled. This paper provides a novel non-convex analysis of adaptive optimization to uncover some of these mysteries. Our contributions are three-fold. First, we show that an increasing or large enough momentum parameter for the first-order moment used in practice is sufficient to ensure the convergence of adaptive algorithms whose adaptive scaling factors of the step size are bounded. Second, our analysis gives insights for practical implementations, e.g., increasing the momentum parameter in a stage-wise manner in accordance with stagewise decreasing step size would help improve the convergence. Third, the modular nature of our analysis allows its extension to solving other optimization problems, e.g., compositional, min-max and bilevel problems. As an interesting yet non-trivial use case, we present algorithms for solving non-convex min-max optimization and bilevel optimization that do not require using large batches of data to estimate gradients or double loops as the literature do. Our empirical studies corroborate our theoretical results.
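
The practical recipe suggested by the analysis, increasing the first-moment momentum parameter stage by stage while decreasing the step size, can be sketched as follows (a toy Adam-style loop of our own construction, not the paper's algorithm):

```python
import numpy as np

def adaptive_run(grad, x0, stages=3, iters=200, eta0=0.1, beta0=0.5):
    """Adam-style loop with the stage-wise schedule the analysis
    motivates: each stage halves the step size and moves the
    first-moment momentum parameter beta closer to 1, while the
    second-moment scaling factor stays bounded."""
    x = x0.astype(float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    eps = 1e-8
    for s in range(stages):
        eta = eta0 / 2 ** s
        beta = 1.0 - (1.0 - beta0) / 2 ** s   # 0.5, 0.75, 0.875, ...
        for _ in range(iters):
            g = grad(x)
            m = beta * m + (1 - beta) * g     # moving-average estimator
            v = 0.999 * v + 0.001 * g * g
            x = x - eta * m / (np.sqrt(v) + eps)
    return x
```

The coupling is the point: a larger beta averages out more gradient noise, which the analysis shows is exactly what a smaller step size requires to keep converging.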

Updated: 2025-04-04 20:13:36

Subjects: math.OC,cs.LG

Download: http://arxiv.org/abs/2104.14840v7

Secure Federated XGBoost with CUDA-accelerated Homomorphic Encryption via NVIDIA FLARE

Federated learning (FL) enables collaborative model training across decentralized datasets. NVIDIA FLARE's Federated XGBoost extends the popular XGBoost algorithm to both vertical and horizontal federated settings, facilitating joint model development without direct data sharing. However, the initial implementation assumed mutual trust over the sharing of intermediate gradient statistics produced by the XGBoost algorithm, leaving potential vulnerabilities to honest-but-curious adversaries. This work introduces "Secure Federated XGBoost", an efficient solution to mitigate these risks. We implement secure federated algorithms for both vertical and horizontal scenarios, addressing diverse data security patterns. To secure the messages, we leverage homomorphic encryption (HE) to protect sensitive information during training. A novel plugin and processor interface seamlessly integrates HE into the Federated XGBoost pipeline, enabling secure aggregation over ciphertexts. We present both CPU-based and CUDA-accelerated HE plugins, demonstrating significant performance gains. Notably, our CUDA-accelerated HE implementation achieves up to 30x speedups in vertical Federated XGBoost compared to existing third-party solutions. By securing critical computation steps and encrypting sensitive assets, Secure Federated XGBoost provides robust data privacy guarantees, reinforcing the fundamental benefits of federated learning while maintaining high performance.
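
The security contract at stake, the server aggregates gradient statistics without ever seeing an individual party's contribution, can be made concrete without an HE library using pairwise additive masking. Note this is a different and much simpler technique than the homomorphic encryption the plugins actually use; it is shown only to illustrate the aggregation contract:

```python
import random

def pairwise_masks(n_parties, seed=42):
    """Masks derived from pairwise shared seeds: party i adds +r[(i, j)]
    for j > i and -r[(j, i)] for j < i, so all masks cancel in the sum."""
    rng = random.Random(seed)
    r = {(i, j): rng.uniform(-1e6, 1e6)
         for i in range(n_parties) for j in range(i + 1, n_parties)}
    masks = []
    for i in range(n_parties):
        up = sum(r[(i, j)] for j in range(i + 1, n_parties))
        down = sum(r[(j, i)] for j in range(i))
        masks.append(up - down)
    return masks

def secure_sum(local_stats):
    """What an honest-but-curious server computes: it only ever sees
    masked per-party statistics, yet recovers the exact aggregate."""
    masks = pairwise_masks(len(local_stats))
    masked = [g + m for g, m in zip(local_stats, masks)]
    return sum(masked)
```

HE achieves the same contract cryptographically, also covers the vertical setting, and is where the CUDA acceleration of ciphertext arithmetic pays off.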

Updated: 2025-04-04 20:08:24

Subjects: cs.CR,cs.DC,cs.ET

Download: http://arxiv.org/abs/2504.03909v1

From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems

Consider the math problem: "Lily received 3 cookies from her best friend yesterday and ate 5 for breakfast. Today, her friend gave her 3 more cookies. How many cookies does Lily have now?" Many large language models (LLMs) in previous research approach this problem by calculating the answer "1" using the equation "3 - 5 + 3." However, from a human perspective, we recognize the inherent flaw in this problem: Lily cannot eat 5 cookies if she initially only had 3. This discrepancy prompts a key question: Are current LLMs merely Blind Solver that apply mathematical operations without deeper reasoning, or can they function as Logical Thinker capable of identifying logical inconsistencies? To explore this question, we propose a benchmark dataset, FaultyMath, which includes faulty math problems of rich diversity: i) multiple mathematical categories, e.g., algebra, geometry, number theory, etc., ii) varying levels of difficulty, and iii) different origins of faultiness -- ranging from violations of common sense and ambiguous statements to mathematical contradictions and more. We evaluate a broad spectrum of LLMs, including open-source, closed-source, and math-specialized models, using FaultyMath across three dimensions: (i) How accurately can the models detect faulty math problems without being explicitly prompted to do so? (ii) When provided with hints -- either correct or misleading -- about the validity of the problems, to what extent do LLMs adapt to become reliable Logical Thinker? (iii) How trustworthy are the explanations generated by LLMs when they recognize a math problem as flawed? Through extensive experimentation and detailed analysis, our results demonstrate that existing LLMs largely function as Blind Solver and fall short of the reasoning capabilities required to perform as Logical Thinker.
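
The cookie example's flaw is a running-balance violation, which is mechanically checkable once the problem is parsed into signed quantity events. A toy checker of our own (the benchmark itself covers far richer fault types than this one):

```python
def inventory_consistent(events):
    """Return False if a running balance of owned items ever goes
    negative -- the common-sense fault in the cookie problem."""
    total = 0
    for delta in events:
        total += delta
        if total < 0:
            return False
    return True
```

`inventory_consistent([+3, -5, +3])` is False: Lily cannot eat 5 cookies while holding only 3, even though 3 - 5 + 3 = 1 is a perfectly computable answer, which is exactly the Blind Solver failure mode.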

Updated: 2025-04-04 20:06:36

Subjects: cs.CL,cs.AI,cs.LO

Download: http://arxiv.org/abs/2410.18921v2

XpertAI: uncovering regression model strategies for sub-manifolds

In recent years, Explainable AI (XAI) methods have facilitated profound validation and knowledge extraction from ML models. While extensively studied for classification, few XAI solutions have addressed the challenges specific to regression models. In regression, explanations need to be precisely formulated to address specific user queries (e.g. distinguishing between 'Why is the output above 0?' and 'Why is the output above 50?'). They should furthermore reflect the model's behavior on the relevant data sub-manifold. In this paper, we introduce XpertAI, a framework that disentangles the prediction strategy into multiple range-specific sub-strategies and allows the formulation of precise queries about the model (the 'explanandum') as a linear combination of those sub-strategies. XpertAI is formulated generally to work alongside popular XAI attribution techniques, based on occlusion, gradient integration, or reverse propagation. Qualitative and quantitative results demonstrate the benefits of our approach.

Updated: 2025-04-04 19:52:12

Subjects: cs.LG

Download: http://arxiv.org/abs/2403.07486v3

SIGMA: Single Interpolated Generative Model for Anomalies

A key step in any resonant anomaly detection search is accurate modeling of the background distribution in each signal region. Data-driven methods like CATHODE accomplish this by training separate generative models on the complement of each signal region, and interpolating them into their corresponding signal regions. Having to re-train the generative model on essentially the entire dataset for each signal region is a major computational cost in a typical sliding window search with many signal regions. Here, we present SIGMA, a new, fully data-driven, computationally-efficient method for estimating background distributions. The idea is to train a single generative model on all of the data and interpolate its parameters in sideband regions in order to obtain a model for the background in the signal region. The SIGMA method significantly reduces the computational cost compared to previous approaches, while retaining a similar high quality of background modeling and sensitivity to anomalous signals.
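
The core trick, fit once and interpolate the model's parameters into each signal region, can be sketched with a Gaussian standing in for the neural density estimator (our simplification; SIGMA interpolates the parameters of a single conditional generative model trained on all the data):

```python
import numpy as np

def fit_gaussian(x):
    """Stand-in 'background model': just a mean and a width."""
    return np.mean(x), np.std(x)

def interpolated_background(left_band, right_band, left_c, right_c, center):
    """Fit in the two sidebands, then linearly interpolate the fitted
    parameters to the signal-region center."""
    p_left = fit_gaussian(left_band)
    p_right = fit_gaussian(right_band)
    t = (center - left_c) / (right_c - left_c)
    return tuple((1 - t) * a + t * b for a, b in zip(p_left, p_right))
```

Because the fit happens once, sliding the signal window only changes which parameters are interpolated, not what is trained, which is where the computational savings over per-window retraining come from.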

Updated: 2025-04-04 19:46:57

Subjects: hep-ph,cs.LG,hep-ex,physics.data-an

Download: http://arxiv.org/abs/2410.20537v2

Stochastic Variational Inference with Tuneable Stochastic Annealing

In this paper, we exploit the observation that stochastic variational inference (SVI) is a form of annealing and present a modified SVI approach -- applicable to both large and small datasets -- that allows the amount of annealing done by SVI to be tuned. We are motivated by the fact that, in SVI, the larger the batch size the more approximately Gaussian is the intrinsic noise, but the smaller its variance. This low variance reduces the amount of annealing which is needed to escape bad local optimal solutions. We propose a simple method for achieving both goals of having larger variance noise to escape bad local optimal solutions and more data information to obtain more accurate gradient directions. The idea is to set an actual batch size, which may be the size of the data set, and a smaller effective batch size that matches the larger level of variance at this smaller batch size. The result is an approximation to the maximum entropy stochastic gradient at this variance level. We theoretically motivate our approach for the framework of conjugate exponential family models and illustrate the method empirically on the probabilistic matrix factorization collaborative filter, the Latent Dirichlet Allocation topic model, and the Gaussian mixture model.
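
The actual/effective batch-size idea can be sketched directly: compute the gradient on the large actual batch, then inject zero-mean Gaussian noise whose variance makes up the difference to the smaller effective batch. This is a simplified reading of the method; the paper derives the injected noise as a maximum-entropy approximation within conjugate exponential-family models.

```python
import numpy as np

def annealed_gradient(per_example_grads, b_eff, rng):
    """Mean gradient over the actual batch of size b, plus Gaussian noise
    scaled so the total variance matches a batch of size b_eff < b."""
    b = per_example_grads.shape[0]
    g = per_example_grads.mean(axis=0)
    sigma2 = per_example_grads.var(axis=0, ddof=1)
    extra = np.maximum(sigma2 * (1.0 / b_eff - 1.0 / b), 0.0)  # variance gap
    return g + rng.normal(size=g.shape) * np.sqrt(extra)
```

Setting `b_eff = b` recovers plain SVI; shrinking `b_eff` raises the annealing temperature without giving up the accurate large-batch gradient direction.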

Updated: 2025-04-04 19:46:10

Fields: cs.LG,stat.ML

Download: http://arxiv.org/abs/2504.03902v1

CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era

Retrieval from graph data is crucial for augmenting large language models (LLM) with both open-domain knowledge and private enterprise data, and it is also a key component in the recent GraphRAG system (Edge et al., 2024). Despite decades of research on knowledge graphs and knowledge base question answering, leading LLM frameworks (e.g. LangChain and LlamaIndex) have only minimal support for retrieval from modern encyclopedic knowledge graphs like Wikidata. In this paper, we analyze the root cause and suggest that modern RDF knowledge graphs (e.g. Wikidata, Freebase) are less efficient for LLMs due to overly large schemas that far exceed the typical LLM context window, use of resource identifiers, overlapping relation types and lack of normalization. As a solution, we propose property graph views on top of the underlying RDF graph that can be efficiently queried by LLMs using Cypher. We instantiated this idea on Wikidata and introduced CypherBench, the first benchmark with 11 large-scale, multi-domain property graphs with 7.8 million entities and over 10,000 questions. To achieve this, we tackled several key challenges, including developing an RDF-to-property graph conversion engine, creating a systematic pipeline for text-to-Cypher task generation, and designing new evaluation metrics.
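
The core of an RDF-to-property-graph conversion can be illustrated in miniature: literal-valued predicates become node properties, while entity-valued predicates become typed edges. The entity IDs, predicate names, and dict-based "graph" below are illustrative assumptions, not the CypherBench conversion engine.

```python
# Toy RDF-style triples (subject, predicate, object).
triples = [
    ("Q1", "instance_of", "film"),
    ("Q1", "label", "Inception"),
    ("Q2", "instance_of", "person"),
    ("Q2", "label", "Christopher Nolan"),
    ("Q1", "director", "Q2"),
]
LITERAL_PREDICATES = {"instance_of", "label"}

nodes, edges = {}, []
for s, p, o in triples:
    nodes.setdefault(s, {})
    if p in LITERAL_PREDICATES:
        nodes[s][p] = o          # literal value -> node property
    else:
        nodes.setdefault(o, {})
        edges.append((s, p, o))  # entity-valued triple -> typed edge
```

The resulting property-graph view replaces opaque resource identifiers with labeled nodes and a small, explicit edge vocabulary, which is what makes compact Cypher queries feasible for an LLM.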

Updated: 2025-04-04 19:44:30

Fields: cs.CL,cs.AI,cs.DB

Download: http://arxiv.org/abs/2412.18702v2

Leveraging Gait Patterns as Biomarkers: An attention-guided Deep Multiple Instance Learning Network for Scoliosis Classification

Scoliosis is a spinal curvature disorder that is difficult to detect early and can compress the chest cavity, impacting respiratory function and cardiac health. Especially for adolescents, delayed detection and treatment result in worsening compression. Traditional scoliosis detection methods heavily rely on clinical expertise, and X-ray imaging poses radiation risks, limiting large-scale early screening. We propose an Attention-Guided Deep Multi-Instance Learning method (Gait-MIL) to effectively capture discriminative features from gait patterns, inspired by ScoNet-MT's pioneering use of gait patterns for scoliosis detection. We evaluate our method on the first large-scale dataset based on gait patterns for scoliosis classification. The results demonstrate that our study improves the performance of using gait as a biomarker for scoliosis detection and significantly enhances detection accuracy for the particularly challenging Neutral cases, where subtle indicators are often overlooked. Our Gait-MIL also performs robustly in imbalanced scenarios, making it a promising tool for large-scale scoliosis screening.
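
The attention-guided multiple-instance pooling at the heart of such a model can be sketched generically: score each instance embedding (e.g. one per gait cycle), softmax the scores into attention weights, and return the weighted bag embedding. This follows the standard attention-MIL recipe and is a sketch of the idea, not the Gait-MIL network; all shapes are illustrative.

```python
import numpy as np

def attention_mil_pool(instances, w, v):
    """Attention-based MIL pooling: score each instance embedding,
    softmax the scores, return the attention-weighted bag embedding."""
    scores = np.tanh(instances @ v) @ w           # one score per instance
    a = np.exp(scores - scores.max())
    a /= a.sum()                                  # attention weights
    return a @ instances, a                       # bag embedding, weights

rng = np.random.default_rng(0)
bag = rng.normal(size=(10, 8))                    # 10 gait-cycle instances
v = rng.normal(size=(8, 4))
w = rng.normal(size=4)
embedding, weights = attention_mil_pool(bag, w, v)
```

The attention weights double as an interpretability signal: instances that dominate the bag embedding are the gait segments the classifier found most discriminative.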

Updated: 2025-04-04 19:35:33

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2504.03894v1

Efficient FPGA-accelerated Convolutional Neural Networks for Cloud Detection on CubeSats

We present the implementation of four FPGA-accelerated convolutional neural network (CNN) models for onboard cloud detection in resource-constrained CubeSat missions, leveraging Xilinx's Vitis AI (VAI) framework and Deep Learning Processing Unit (DPU), a programmable engine with pre-implemented, parameterizable IP cores optimized for deep neural networks, on a Zynq UltraScale+ MPSoC. This study explores both pixel-wise (Pixel-Net and Patch-Net) and image-wise (U-Net and Scene-Net) models to benchmark trade-offs in accuracy, latency, and model complexity. Applying channel pruning, we achieved substantial reductions in model parameters (up to 98.6%) and floating-point operations (up to 90.7%) with minimal accuracy loss. Furthermore, the VAI tool was used to quantize the models to 8-bit precision, ensuring optimized hardware performance with negligible impact on accuracy. All models retained high accuracy post-FPGA integration, with a cumulative maximum accuracy drop of only 0.6% after quantization and pruning. The image-wise Scene-Net and U-Net models demonstrated strong real-time inference capabilities, achieving frame rates of 57.14 and 37.45 frames per second, respectively, with power consumption of around 2.5 W, surpassing state-of-the-art onboard cloud detection solutions. Our approach underscores the potential of DPU-based hardware accelerators to expand the processing capabilities of small satellites, enabling efficient and flexible onboard CNN-based applications.
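
Channel pruning of the kind applied here can be sketched with a common magnitude-based criterion: rank a convolution layer's output channels by L1 norm and keep only the top fraction. This is a generic sketch; the paper's exact pruning criterion and per-layer ratios may differ.

```python
import numpy as np

def prune_channels(conv_weights, keep_ratio):
    """Keep the top keep_ratio fraction of output channels, ranked by the
    L1 norm of their filters. conv_weights: (out_ch, in_ch, kh, kw)."""
    norms = np.abs(conv_weights).sum(axis=(1, 2, 3))
    n_keep = max(1, int(round(keep_ratio * len(norms))))
    # Indices of the strongest channels, restored to their original order.
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])
    return conv_weights[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3))               # a 64-channel conv layer
pruned, kept = prune_channels(w, keep_ratio=0.25)
```

After pruning, the next layer's input channels must be sliced with the same `kept` indices so the network stays consistent; quantization to 8-bit then happens on the already-slimmed model.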

Updated: 2025-04-04 19:32:47

Fields: eess.SP,cs.LG,cs.NE

Download: http://arxiv.org/abs/2504.03891v1

Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs

Multi-head attention is foundational to large language models (LLMs), enabling different heads to have diverse focus on relevant input tokens. However, learned behaviors like attention sinks, where the first token receives most attention despite limited semantic importance, challenge our understanding of multi-head attention. To analyze this phenomenon, we propose a new definition for attention heads dominated by attention sinks, known as dormant attention heads. We compare our definition to prior work in a model intervention study where we test whether dormant heads matter for inference by zeroing out the output of dormant attention heads. Using six pretrained models and five benchmark datasets, we find our definition to be more model and dataset-agnostic. Using our definition on most models, more than 4% of a model's attention heads can be zeroed while maintaining average accuracy, and zeroing more than 14% of a model's attention heads can keep accuracy to within 1% of the pretrained model's average accuracy. Further analysis reveals that dormant heads emerge early in pretraining and can transition between dormant and active states during pretraining. Additionally, we provide evidence that they depend on characteristics of the input text.
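
The intervention described above can be sketched with a simplified sink criterion: flag heads whose average attention mass on the first token exceeds a threshold, then zero those heads' outputs. The paper's actual dormancy definition differs in its details; this is an illustrative simplification with made-up attention maps.

```python
import numpy as np

def dormant_heads(attn, sink_threshold=0.9):
    """Flag heads whose mean attention mass on token 0 (the 'attention
    sink') exceeds a threshold. attn: (n_heads, seq, seq), rows sum to 1."""
    sink_mass = attn[:, :, 0].mean(axis=1)        # per-head mass on token 0
    return sink_mass > sink_threshold

n_heads, seq = 4, 6
attn = np.zeros((n_heads, seq, seq))
attn[:3, :, 0] = 1.0                              # three heads: all mass on token 0
attn[3] = 1.0 / seq                               # one head: uniform attention
mask = dormant_heads(attn)

# The intervention: zero dormant heads' outputs before the output projection.
head_outputs = np.ones((n_heads, seq, 8))
head_outputs[mask] = 0.0
```

The paper's finding is that this kind of zeroing leaves average accuracy nearly unchanged for a surprisingly large fraction of heads, which is what motivates calling them dormant.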

Updated: 2025-04-04 19:28:23

Fields: cs.LG

Download: http://arxiv.org/abs/2504.03889v1

Investigating Affective Use and Emotional Well-being on ChatGPT

As AI chatbots see increased adoption and integration into everyday life, questions have been raised about the potential impact of human-like or anthropomorphic AI on users. In this work, we investigate the extent to which interactions with ChatGPT (with a focus on Advanced Voice Mode) may impact users' emotional well-being, behaviors and experiences through two parallel studies. To study the affective use of AI chatbots, we perform large-scale automated analysis of ChatGPT platform usage in a privacy-preserving manner, analyzing over 3 million conversations for affective cues and surveying over 4,000 users on their perceptions of ChatGPT. To investigate whether there is a relationship between model usage and emotional well-being, we conduct an Institutional Review Board (IRB)-approved randomized controlled trial (RCT) on close to 1,000 participants over 28 days, examining changes in their emotional well-being as they interact with ChatGPT under different experimental settings. In both on-platform data analysis and the RCT, we observe that very high usage correlates with increased self-reported indicators of dependence. From our RCT, we find that the impact of voice-based interactions on emotional well-being to be highly nuanced, and influenced by factors such as the user's initial emotional state and total usage duration. Overall, our analysis reveals that a small number of users are responsible for a disproportionate share of the most affective cues.

Updated: 2025-04-04 19:22:10

Fields: cs.HC,cs.AI

Download: http://arxiv.org/abs/2504.03888v1

Accurate GPU Memory Prediction for Deep Learning Jobs through Dynamic Analysis

The benefits of Deep Learning (DL) impose significant pressure on GPU resources, particularly within GPU clusters, where Out-Of-Memory (OOM) errors present a primary impediment to model training and efficient resource utilization. Conventional OOM estimation techniques, relying either on static graph analysis or direct GPU memory profiling, suffer from inherent limitations: static analysis often fails to capture model dynamics, whereas GPU-based profiling intensifies contention for scarce GPU resources. To overcome these constraints, VeritasEst emerges. It is an innovative, entirely CPU-based analysis tool capable of accurately predicting the peak GPU memory required for DL training tasks without accessing the target GPU. This "offline" prediction capability is the core advantage of VeritasEst, allowing accurate memory footprint information to be obtained before task scheduling, thereby effectively preventing OOM and optimizing GPU allocation. Its performance was validated through thousands of experimental runs across convolutional neural network (CNN) models: compared to baseline GPU memory estimators, VeritasEst significantly reduces the relative error by 84% and lowers the estimation failure probability by 73%. VeritasEst represents a key step towards efficient and predictable DL training in resource-constrained environments.

Updated: 2025-04-04 19:20:03

Fields: cs.DC,cs.AI

Download: http://arxiv.org/abs/2504.03887v1

A Large Model for Non-invasive and Personalized Management of Breast Cancer from Multiparametric MRI

Breast Magnetic Resonance Imaging (MRI) demonstrates the highest sensitivity for breast cancer detection among imaging modalities and is standard practice for high-risk women. Interpreting the multi-sequence MRI is time-consuming and prone to subjective variation. We develop a large mixture-of-modality-experts model (MOME) that integrates multiparametric MRI information within a unified structure, leveraging breast MRI scans from 5,205 female patients in China for model development and validation. MOME matches four senior radiologists' performance in identifying breast cancer and outperforms a junior radiologist. The model is able to reduce unnecessary biopsies in Breast Imaging-Reporting and Data System (BI-RADS) 4 patients, classify triple-negative breast cancer, and predict pathological complete response to neoadjuvant chemotherapy. MOME further supports inference with missing modalities and provides decision explanations by highlighting lesions and measuring modality contributions. To summarize, MOME exemplifies an accurate and robust multimodal model for noninvasive, personalized management of breast cancer patients via multiparametric MRI. Code is available at https://github.com/LLYXC/MOME/tree/main.
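
The missing-modality inference that a mixture-of-modality-experts model supports can be sketched with a simple gating rule: softmax the gate only over the modalities that are actually present and fuse their expert outputs. A hedged sketch of the general idea, not the MOME architecture; the expert outputs and gate logits below are toy values.

```python
import numpy as np

def fuse_experts(expert_outputs, gate_logits):
    """Fuse per-modality expert outputs, tolerating missing modalities
    (None entries) by renormalizing the gate over present ones."""
    present = [i for i, o in enumerate(expert_outputs) if o is not None]
    logits = np.array([gate_logits[i] for i in present])
    w = np.exp(logits - logits.max())
    w /= w.sum()                                   # gate over present experts
    fused = sum(wi * expert_outputs[i] for wi, i in zip(w, present))
    return fused, dict(zip(present, w))

outs = [np.ones(4), None, 2 * np.ones(4)]          # modality 1 is missing
fused, weights = fuse_experts(outs, gate_logits=[0.0, 5.0, 0.0])
```

The returned per-modality weights are also a natural basis for the "measuring modality contributions" explanations mentioned in the abstract.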

Updated: 2025-04-04 19:14:02

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2408.12606v3

TrajLearn: Trajectory Prediction Learning using Deep Generative Models

Trajectory prediction aims to estimate an entity's future path using its current position and historical movement data, benefiting fields like autonomous navigation, robotics, and human movement analytics. Deep learning approaches have become key in this area, utilizing large-scale trajectory datasets to model movement patterns, but face challenges in managing complex spatial dependencies and adapting to dynamic environments. To address these challenges, we introduce TrajLearn, a novel model for trajectory prediction that leverages generative modeling of higher-order mobility flows based on hexagonal spatial representation. TrajLearn predicts the next $k$ steps by integrating a customized beam search for exploring multiple potential paths while maintaining spatial continuity. We conducted a rigorous evaluation of TrajLearn, benchmarking it against leading state-of-the-art approaches and meaningful baselines. The results indicate that TrajLearn achieves significant performance gains, with improvements of up to ~40% across multiple real-world trajectory datasets. In addition, we evaluated different prediction horizons (i.e., various values of $k$), conducted resolution sensitivity analysis, and performed ablation studies to assess the impact of key model components. Furthermore, we developed a novel algorithm to generate mixed-resolution maps by hierarchically subdividing hexagonal regions into finer segments within a specified observation area. This approach supports selective detailing, applying finer resolution to areas of interest or high activity (e.g., urban centers) while using coarser resolution for less significant regions (e.g., rural areas), effectively reducing data storage requirements and computational overhead. We promote reproducibility and adaptability by offering complete code, data, and detailed documentation with flexible configuration options for various applications.
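
The customized beam search for next-k prediction can be illustrated over a toy next-cell model: at each step, expand every beam with the model's transition probabilities and keep the highest-log-probability paths. The hexagonal cell ids and probabilities below are made up for illustration; TrajLearn's generative model and spatial-continuity constraints are far richer.

```python
import math

# Toy next-cell model over hexagonal cell ids: P(next | current).
TRANSITIONS = {
    "a": {"b": 0.6, "c": 0.4},
    "b": {"d": 0.9, "c": 0.1},
    "c": {"d": 0.5, "e": 0.5},
    "d": {"e": 1.0},
    "e": {"e": 1.0},
}

def beam_search(start, k, beam_width=2):
    """Predict the next k cells, keeping the beam_width most likely paths."""
    beams = [([start], 0.0)]                      # (path, log-probability)
    for _ in range(k):
        candidates = []
        for path, lp in beams:
            for nxt, p in TRANSITIONS[path[-1]].items():
                candidates.append((path + [nxt], lp + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

best_path, best_lp = beam_search("a", k=2)[0]
```

In the real model, candidate expansions are additionally restricted to spatially adjacent hexagons, which is what preserves the spatial continuity of predicted paths.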

Updated: 2025-04-04 19:12:44

Fields: cs.LG,cs.CV,cs.RO

Download: http://arxiv.org/abs/2501.00184v2

Estimating City-wide Operating Mode Distribution of Light-Duty Vehicles: A Neural Network-based Approach

Driving cycles are a set of driving conditions and are crucial for the existing emission estimation model to evaluate vehicle performance, fuel efficiency, and emissions, by matching them with average speed to calculate the operating modes, such as braking, idling, and cruising. While existing emission estimation models, such as the Motor Vehicle Emission Simulator (MOVES), are powerful tools, their reliance on predefined driving cycles can be limiting, as these cycles often do not accurately represent regional driving conditions, making the models less effective for city-wide analyses. To solve this problem, this paper proposes a modular neural network (NN)-based framework to estimate operating mode distributions bypassing the driving cycle development phase, utilizing macroscopic variables such as speed, flow, and link infrastructure attributes. The proposed method is validated using a well-calibrated microsimulation model of Brookline MA, the United States. The results indicate that the proposed framework outperforms MOVES's default-driving-cycle-based estimates, providing a closer match to the actual operating mode distribution derived from trajectory data. Specifically, the proposed model achieves an average RMSE of 0.04 in predicting operating mode distribution, compared to 0.08 for MOVES. The average error in emission estimation across pollutants is 8.57% for the proposed method, lower than the 32.86% error for MOVES. In particular, for the estimation of CO2, the proposed method has an error of just 4%, compared to 35% for MOVES. The proposed model can be utilized for real-time emissions monitoring by providing rapid and accurate emissions estimates with easily accessible inputs.
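
What an "operating mode distribution" is can be made concrete with a toy speed trace: classify each second into a coarse mode and report the fraction of time in each. The four bins and thresholds below are a deliberate simplification of MOVES's 23 operating-mode bins, used only to illustrate the quantity the NN framework predicts.

```python
import numpy as np

def operating_mode_distribution(speeds_mph, accels_mph_s):
    """Fractions of time spent in idle / brake / cruise / accelerate."""
    modes = np.where(
        speeds_mph < 1.0, "idle",
        np.where(accels_mph_s < -1.0, "brake",
                 np.where(accels_mph_s > 1.0, "accelerate", "cruise")),
    )
    labels = ["idle", "brake", "cruise", "accelerate"]
    return np.array([(modes == m).mean() for m in labels])

speeds = np.array([0.0, 0.0, 5.0, 12.0, 12.0, 8.0, 0.5, 0.0])
accels = np.diff(speeds, prepend=speeds[0])       # per-second acceleration
dist = operating_mode_distribution(speeds, accels)
rmse_vs_uniform = float(np.sqrt(np.mean((dist - 0.25) ** 2)))
```

The paper's RMSE of 0.04 versus 0.08 is computed between distributions of exactly this kind: one predicted from macroscopic variables, one derived from trajectory data.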

Updated: 2025-04-04 19:12:09

Fields: cs.LG

Download: http://arxiv.org/abs/2503.22118v2

Concept-based Rubrics Improve LLM Formative Assessment and Data Synthesis

Formative assessment in STEM topics aims to promote student learning by identifying students' current understanding, thus targeting how to promote further learning. Previous studies suggest that the assessment performance of current generative large language models (LLMs) on constructed responses to open-ended questions is significantly lower than that of supervised classifiers trained on high-quality labeled data. However, we demonstrate that concept-based rubrics can significantly enhance LLM performance, which narrows the gap between LLMs as off-the-shelf assessment tools, and smaller supervised models, which need large amounts of training data. For datasets where concept-based rubrics allow LLMs to achieve strong performance, we show that the concept-based rubrics help the same LLMs generate high quality synthetic data for training lightweight, high-performance supervised models. Our experiments span diverse STEM student response datasets with labels of varying quality, including a new real-world dataset that contains some AI-assisted responses, which introduces additional considerations.
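
A concept-based rubric reaches the LLM as part of the grading prompt; assembling one can be sketched as below. The question, concept wording, and prompt template are hypothetical stand-ins, not the paper's actual rubrics or prompts.

```python
def build_rubric_prompt(question, rubric_concepts, student_response):
    """Assemble a grading prompt that asks the LLM to judge a response
    against each rubric concept individually (hypothetical template)."""
    rubric = "\n".join(
        f"- Concept {i + 1}: {c}" for i, c in enumerate(rubric_concepts)
    )
    return (
        f"Question: {question}\n"
        f"Grade the response against each rubric concept (credit / no credit):\n"
        f"{rubric}\n"
        f"Student response: {student_response}\n"
    )

prompt = build_rubric_prompt(
    "Why does a ball thrown upward slow down?",
    ["Gravity exerts a constant downward force",
     "The net force opposes the direction of motion"],
    "Because gravity pulls it down the whole time.",
)
```

Grading concept by concept, rather than holistically, is what gives the LLM the verifiable structure that closes the gap with trained classifiers.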

Updated: 2025-04-04 19:02:07

Fields: cs.LG,I.2.7; K.3.1

Download: http://arxiv.org/abs/2504.03877v1

HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs

The Mixture-of-Experts (MoE) architecture has become increasingly popular as a method to scale up large language models (LLMs). To save costs, heterogeneity-aware training solutions have been proposed to utilize GPU clusters made up of both newer and older-generation GPUs. However, existing solutions are agnostic to the performance characteristics of different MoE model components (i.e., attention and expert) and do not fully utilize each GPU's compute capability. In this paper, we introduce HeterMoE, a system to efficiently train MoE models on heterogeneous GPUs. Our key insight is that newer GPUs significantly outperform older generations on attention due to architectural advancements, while older GPUs are still relatively efficient for experts. HeterMoE disaggregates attention and expert computation, where older GPUs are only assigned with expert modules. Through the proposed zebra parallelism, HeterMoE overlaps the computation on different GPUs, in addition to employing an asymmetric expert assignment strategy for fine-grained load balancing to minimize GPU idle time. Our evaluation shows that HeterMoE achieves up to 2.3x speed-up compared to existing MoE training systems, and 1.4x compared to an optimally balanced heterogeneity-aware solution. HeterMoE efficiently utilizes older GPUs by maintaining 95% training throughput on average, even with half of the GPUs in a homogeneous A40 cluster replaced with V100.
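
An asymmetric expert assignment of the kind described can be sketched as proportional allocation with largest-remainder rounding: give each GPU a number of experts roughly proportional to its measured expert throughput. This is a toy sketch of the load-balancing idea only; HeterMoE's scheduler additionally disaggregates attention onto newer GPUs and overlaps the two computations (zebra parallelism).

```python
def assign_experts(n_experts, gpu_throughputs):
    """Assign experts to GPUs in proportion to measured throughput,
    using largest-remainder rounding to hand out the leftovers."""
    total = sum(gpu_throughputs)
    shares = [n_experts * t / total for t in gpu_throughputs]
    counts = [int(s) for s in shares]
    leftovers = sorted(range(len(shares)),
                       key=lambda i: shares[i] - counts[i], reverse=True)
    for i in leftovers[: n_experts - sum(counts)]:
        counts[i] += 1
    return counts

# e.g. two older GPUs and one newer GPU with relative expert throughputs 1:1:2
counts = assign_experts(n_experts=8, gpu_throughputs=[1.0, 1.0, 2.0])
```

Fine-grained balancing of this sort is what keeps the slower GPUs busy on experts while the faster ones run attention, minimizing idle time on both sides.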

Updated: 2025-04-04 18:55:52

Fields: cs.DC,cs.LG

Download: http://arxiv.org/abs/2504.03871v1

CREASE-2D Analysis of Small Angle X-ray Scattering Data from Supramolecular Dipeptide Systems

In this paper, we extend a recently developed machine-learning (ML) based CREASE-2D method to analyze the entire two-dimensional (2D) scattering pattern obtained from small angle X-ray scattering measurements of supramolecular dipeptide micellar systems. Traditional analysis of such scattering data would involve use of approximate or incorrect analytical models to fit to azimuthally-averaged 1D scattering patterns that can miss the anisotropic arrangements. Analysis of the 2D scattering profiles of such micellar solutions using CREASE-2D allows us to understand both isotropic and anisotropic structural arrangements that are present in these systems of assembled dipeptides in water and in the presence of added solvents/salts. CREASE-2D outputs distributions of relevant structural features including ones that cannot be identified with existing analytical models (e.g., assembled tubes, cross-sectional eccentricity, tortuosity, orientational order). The representative three-dimensional (3D) real-space structures for the optimized values of these structural features further facilitate visualization of the structures. Through this detailed interpretation of these 2D SAXS profiles we are able to characterize the shapes of the assembled tube structures as a function of dipeptide chemistry, solution conditions with varying salts and solvents, and relative concentrations of all components. This paper demonstrates how CREASE-2D analysis of entire SAXS profiles can provide an unprecedented level of understanding of structural arrangements which has not been possible through traditional analytical model fits to the 1D SAXS data.

Updated: 2025-04-04 18:53:32

Fields: cond-mat.soft,cs.LG

Download: http://arxiv.org/abs/2504.03869v1

InTAR: Inter-Task Auto-Reconfigurable Accelerator Design for High Data Volume Variation in DNNs

The rise of deep neural networks (DNNs) has driven an increased demand for computing power and memory. Modern DNNs exhibit high data volume variation (HDV) across tasks, which poses challenges for FPGA acceleration: conventional accelerators rely on fixed execution patterns (dataflow or sequential) that can lead to pipeline stalls or necessitate frequent off-chip memory accesses. To address these challenges, we introduce the Inter-Task Auto-Reconfigurable Accelerator (InTAR), a novel accelerator design methodology for HDV applications on FPGAs. InTAR combines the high computational efficiency of sequential execution with the reduced off-chip memory overhead of dataflow execution. It switches execution patterns automatically according to a static schedule determined before circuit design based on resource constraints and problem sizes. Unlike previous reconfigurable accelerators, InTAR encodes reconfiguration schedules during circuit design, allowing model-specific optimizations that allocate only the necessary logic and interconnects. Thus, InTAR achieves a high clock frequency with fewer resources and low reconfiguration time. Furthermore, InTAR supports high-level tools such as HLS for fast design generation. We implement a set of multi-task HDV DNN kernels using InTAR. Compared with dataflow and sequential accelerators, InTAR achieves 1.8x and 7.1x speedups, respectively. Moreover, we extend InTAR to GPT-2 medium as a more complex example, which is 3.65-39.14x faster and 1.72-10.44x more DSP efficient than SoTA accelerators (Allo and DFX) on FPGAs. Additionally, this design demonstrates 1.66-7.17x better power efficiency than GPUs. Code: https://github.com/OswaldHe/InTAR

Updated: 2025-04-04 18:49:05

Fields: cs.AR,cs.LG

Download: http://arxiv.org/abs/2502.08807v2

The Secret Life of CVEs

The Common Vulnerabilities and Exposures (CVEs) system is a reference method for documenting publicly known information security weaknesses and exposures. This paper presents a study of the lifetime of CVEs in software projects and the risk factors affecting their existence. The study uses survival analysis to examine how features of programming languages, projects, and CVEs themselves impact the lifetime of CVEs. We suggest avenues for future research to investigate the effect of various factors on the resolution of vulnerabilities.
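
The survival-analysis machinery used for CVE lifetimes can be illustrated with the textbook Kaplan-Meier estimator, which handles CVEs that are still open (censored) at the end of observation. The toy data and the estimator itself are a generic sketch, not the paper's analysis code.

```python
def kaplan_meier(durations, resolved):
    """Kaplan-Meier estimate of S(t) = P(vulnerability still open after
    t days). durations: observed days per CVE; resolved: 1 if fixed,
    0 if still open at the end of observation (right-censored)."""
    surv, s = {}, 1.0
    times = sorted(set(d for d, r in zip(durations, resolved) if r))
    for t in times:
        at_risk = sum(d >= t for d in durations)   # still open just before t
        events = sum(1 for d, r in zip(durations, resolved) if d == t and r)
        s *= 1.0 - events / at_risk
        surv[t] = s
    return surv

# Toy data: days each CVE stayed open; 0 in 'resolved' marks a censored CVE.
surv = kaplan_meier([30, 30, 60, 90, 120], [1, 1, 1, 0, 1])
```

Comparing such survival curves across strata (programming language, project size, CVE severity) is exactly how the risk factors mentioned in the abstract are assessed.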

Updated: 2025-04-04 18:39:05

Fields: cs.CR,cs.SE

Download: http://arxiv.org/abs/2504.03863v1

Improving World Models using Deep Supervision with Linear Probes

Developing effective world models is crucial for creating artificial agents that can reason about and navigate complex environments. In this paper, we investigate a deep supervision technique for encouraging the development of a world model in a network trained end-to-end to predict the next observation. While deep supervision has been widely applied for task-specific learning, our focus is on improving the world models. Using an experimental environment based on the Flappy Bird game, where the agent receives only LIDAR measurements as observations, we explore the effect of adding a linear probe component to the network's loss function. This additional term encourages the network to encode a subset of the true underlying world features into its hidden state. Our experiments demonstrate that this supervision technique improves both training and test performance, enhances training stability, and results in more easily decodable world features -- even for those world features which were not included in the training. Furthermore, we observe a reduced distribution drift in networks trained with the linear probe, particularly during high-variability phases of the game (flying between successive pipe encounters). Including the world features loss component roughly corresponded to doubling the model size, suggesting that the linear probe technique is particularly beneficial in compute-limited settings or when aiming to achieve the best performance with smaller models. These findings contribute to our understanding of how to develop more robust and sophisticated world models in artificial agents, paving the way for further advancements in this field.
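
The loss construction can be sketched without any deep-learning framework: the usual next-observation loss plus a linear-probe term that maps the hidden state to the true world features. All shapes, the tanh encoder, and the 0.1 probe weight are illustrative assumptions, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: LIDAR observations, next-observation targets, and the true
# underlying world features the probe is supervised against.
obs = rng.normal(size=(32, 5))
next_obs = rng.normal(size=(32, 5))
world_feats = rng.normal(size=(32, 3))

W_enc = rng.normal(size=(5, 16)) * 0.1         # observation encoder
W_pred = rng.normal(size=(16, 5)) * 0.1        # next-observation head
W_probe = rng.normal(size=(16, 3)) * 0.1       # the linear probe

h = np.tanh(obs @ W_enc)                       # hidden state
pred_loss = np.mean((h @ W_pred - next_obs) ** 2)
probe_loss = np.mean((h @ W_probe - world_feats) ** 2)
total_loss = pred_loss + 0.1 * probe_loss      # deep supervision term added
```

Because the probe is linear, minimizing `probe_loss` forces the world features to be linearly decodable from `h`, which is the paper's operational definition of an improved world model.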

Updated: 2025-04-04 18:35:21

Fields: cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.03861v1

Can ChatGPT Learn My Life From a Week of First-Person Video?

Motivated by recent improvements in generative AI and wearable camera devices (e.g. smart glasses and AI-enabled pins), I investigate the ability of foundation models to learn about the wearer's personal life through first-person camera data. To test this, I wore a camera headset for 54 hours over the course of a week, generated summaries of various lengths (e.g. minute-long, hour-long, and day-long summaries), and fine-tuned both GPT-4o and GPT-4o-mini on the resulting summary hierarchy. By querying the fine-tuned models, we are able to learn what the models learned about me. The results are mixed: Both models learned basic information about me (e.g. approximate age, gender). Moreover, GPT-4o correctly deduced that I live in Pittsburgh, am a PhD student at CMU, am right-handed, and have a pet cat. However, both models also suffered from hallucination and would make up names for the individuals present in the video footage of my life.

Updated: 2025-04-04 18:33:45

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.03857v1

MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs

Medical tasks such as diagnosis and treatment planning require precise and complex reasoning, particularly in life-critical domains. Unlike mathematical reasoning, medical reasoning demands meticulous, verifiable thought processes to ensure reliability and accuracy. However, there is a notable lack of datasets that provide transparent, step-by-step reasoning to validate and enhance the medical reasoning ability of AI models. To bridge this gap, we introduce MedReason, a large-scale high-quality medical reasoning dataset designed to enable faithful and explainable medical problem-solving in large language models (LLMs). We utilize a structured medical knowledge graph (KG) to convert clinical QA pairs into logical chains of reasoning, or ``thinking paths'', which trace connections from question elements to answers via relevant KG entities. Each path is validated for consistency with clinical logic and evidence-based medicine. Our pipeline generates detailed reasoning for various medical questions from 7 medical datasets, resulting in a dataset of 32,682 question-answer pairs, each with detailed, step-by-step explanations. Experiments demonstrate that fine-tuning with our dataset consistently boosts medical problem-solving capabilities, achieving significant gains of up to 7.7% for DeepSeek-Distill-8B. Our top-performing model, MedReason-8B, outperforms Huatuo-o1-8B, a state-of-the-art medical reasoning model, by up to 4.2% on the clinical benchmark MedBullets. We also engage medical professionals from diverse specialties to assess our dataset's quality, ensuring MedReason offers accurate and coherent medical reasoning. Our data, models, and code are available at https://github.com/UCSC-VLAA/MedReason.
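
The core of a "thinking path" as described above is a chain of KG entities connecting a question element to an answer. A breadth-first search over a toy graph illustrates the idea; the entities, relations, and graph below are hypothetical placeholders, not MedReason's actual knowledge graph.

```python
# Illustrative sketch: extract a shortest entity chain ("thinking path")
# from a question entity to an answer entity in a knowledge graph.
from collections import deque

def thinking_path(graph, start, goal):
    """Return a list of entities connecting start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

In the paper's pipeline each such path would then be validated against clinical logic before being verbalized into a step-by-step explanation.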

Updated: 2025-04-04 18:29:18

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2504.00993v2

Detection Limits and Statistical Separability of Tree Ring Watermarks in Rectified Flow-based Text-to-Image Generation Models

Tree-Ring Watermarking is a significant technique for authenticating AI-generated images. However, its effectiveness in rectified flow-based models remains unexplored, particularly given the inherent challenges of these models with noise latent inversion. Through extensive experimentation, we evaluated and compared the detection and separability of watermarks between SD 2.1 and FLUX.1-dev models. By analyzing various text guidance configurations and augmentation attacks, we demonstrate how inversion limitations affect both watermark recovery and the statistical separation between watermarked and unwatermarked images. Our findings provide valuable insights into the current limitations of Tree-Ring Watermarking in state-of-the-art models and highlight the critical need for improved inversion methods to achieve reliable watermark detection and separability. The official implementation, dataset release, and all experimental results are available at https://github.com/dsgiitr/flux-watermarking.

Updated: 2025-04-04 18:24:23

Domains: cs.CV,cs.AI,cs.CR,cs.LG,stat.ML

Download: http://arxiv.org/abs/2504.03850v1

UnCLe: Benchmarking Continual Learning for Unsupervised Depth Completion

We propose UnCLe, a standardized benchmark for Unsupervised Continual Learning of a multimodal depth estimation task: Depth completion aims to infer a dense depth map from a pair of synchronized RGB image and sparse depth map. We benchmark depth completion models under the practical scenario of unsupervised learning over continuous streams of data. Existing methods are typically trained on a static, or stationary, dataset. However, when adapting to novel non-stationary distributions, they "catastrophically forget" previously learned information. UnCLe simulates these non-stationary distributions by adapting depth completion models to sequences of datasets containing diverse scenes captured from distinct domains using different visual and range sensors. We adopt representative methods from continual learning paradigms and translate them to enable unsupervised continual learning of depth completion. We benchmark these models on indoor and outdoor scenarios and investigate the degree of catastrophic forgetting through standard quantitative metrics. Furthermore, we introduce model inversion quality as an additional measure of forgetting. We find that unsupervised continual learning of depth completion is an open problem, and we invite researchers to leverage UnCLe as a development platform.
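
A standard way to quantify the catastrophic forgetting the abstract refers to is to compare each task's accuracy right after it was learned with its accuracy at the end of the full sequence. The sketch below is the generic continual-learning forgetting metric, not UnCLe's exact protocol or metric names.

```python
# Average forgetting: mean drop from per-task accuracy measured just after
# learning that task to the accuracy on it after the whole task sequence.

def average_forgetting(acc_after_task, acc_final):
    """acc_after_task[i]: accuracy on task i just after learning it.
    acc_final[i]:         accuracy on task i after the full sequence."""
    drops = [a - f for a, f in zip(acc_after_task, acc_final)]
    return sum(drops) / len(drops)
```

A value near zero means the model retained earlier tasks; large positive values indicate catastrophic forgetting.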

Updated: 2025-04-04 18:23:51

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2410.18074v3

Knowledge Distillation Decision Tree for Unravelling Black-box Machine Learning Models

Machine learning models, particularly black-box models, are widely favored for their outstanding predictive capabilities. However, they often face scrutiny and criticism due to their lack of interpretability. Paradoxically, their strong predictive capabilities may indicate a deep understanding of the underlying data, implying significant potential for interpretation. Leveraging the emerging concept of knowledge distillation, we introduce the method of knowledge distillation decision tree (KDDT). This method distills knowledge about the data from a black-box model into a decision tree, thereby facilitating the interpretation of the black-box model. Essential attributes for a good interpretable model include simplicity, stability, and predictivity. The primary challenge of constructing an interpretable tree lies in ensuring structural stability under the randomness of the training data. KDDT is developed with theoretical foundations demonstrating that structural stability can be achieved under mild assumptions. Furthermore, we propose the hybrid KDDT to achieve both simplicity and predictivity. An efficient algorithm is provided for constructing the hybrid KDDT. Simulation studies and a real-data analysis validate the hybrid KDDT's capability to deliver accurate and reliable interpretations. KDDT is an excellent interpretable model with great potential for practical applications.

Updated: 2025-04-04 18:13:02

Domains: stat.ME,cs.LG

Download: http://arxiv.org/abs/2206.04661v4

Interpretable Multimodal Learning for Tumor Protein-Metal Binding: Progress, Challenges, and Perspectives

In cancer therapeutics, protein-metal binding mechanisms critically govern drug pharmacokinetics and targeting efficacy, thereby fundamentally shaping the rational design of anticancer metallodrugs. While conventional laboratory methods used to study such mechanisms are often costly, low throughput, and limited in capturing dynamic biological processes, machine learning (ML) has emerged as a promising alternative. Despite increasing efforts to develop protein-metal binding datasets and ML algorithms, the application of ML in tumor protein-metal binding remains limited. Key challenges include a shortage of high-quality, tumor-specific datasets, insufficient consideration of multiple data modalities, and the complexity of interpreting results due to the ''black box'' nature of complex ML models. This paper summarizes recent progress and ongoing challenges in using ML to predict tumor protein-metal binding, focusing on data, modeling, and interpretability. We present multimodal protein-metal binding datasets and outline strategies for acquiring, curating, and preprocessing them for training ML models. Moreover, we explore the complementary value provided by different data modalities and examine methods for their integration. We also review approaches for improving model interpretability to support more trustworthy decisions in cancer research. Finally, we offer our perspective on research opportunities and propose strategies to address the scarcity of tumor protein data and the limited number of predictive models for tumor protein-metal binding. We also highlight two promising directions for effective metal-based drug design: integrating protein-protein interaction data to provide structural insights into metal-binding events and predicting structural changes in tumor proteins after metal binding.

Updated: 2025-04-04 18:10:00

Domains: q-bio.QM,cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2504.03847v1

Data-driven Mixed Integer Optimization through Probabilistic Multi-variable Branching

In this paper, we propose a Pre-trained Mixed Integer Optimization framework (PreMIO) that accelerates online mixed integer program (MIP) solving with offline datasets and machine learning models. Our method is based on a data-driven multi-variable cardinality branching procedure that splits the MIP feasible region using hyperplanes chosen by the concentration inequalities. Unlike most previous ML+MIP approaches that either require complicated implementation or suffer from a lack of theoretical justification, our method is simple, flexible, provable, and explainable. Numerical experiments on both classical OR benchmark datasets and real-life instances validate the efficiency of our proposed method.
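
The cardinality-branching idea above can be sketched with a Hoeffding-style bound: given ML-predicted probabilities that each binary variable equals 1, bound where their sum is likely to fall and split the feasible region on that cardinality. The exact concentration inequality and branching rule in PreMIO may differ; this is an illustrative assumption.

```python
# Hoeffding-style sketch of probabilistic multi-variable cardinality
# branching: with probability >= 1 - delta, sum(x) should lie in [lo, hi],
# so the MIP feasible region can be split along that hyperplane.
import math

def cardinality_split(probs, delta=0.05):
    """probs[i]: predicted probability that binary variable x_i equals 1."""
    n = len(probs)
    mean = sum(probs)                               # expected cardinality
    slack = math.sqrt(n * math.log(2 / delta) / 2)  # Hoeffding deviation bound
    lo = max(0, math.floor(mean - slack))
    hi = min(n, math.ceil(mean + slack))
    return lo, hi
```

The solver would then explore the branch sum(x) in [lo, hi] first, falling back to the complement to preserve exactness.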

Updated: 2025-04-04 18:09:21

Domains: math.OC,cs.LG

Download: http://arxiv.org/abs/2305.12352v3

Bonsai: Interpretable Tree-Adaptive Grounded Reasoning

To develop general-purpose collaborative agents, humans need reliable AI systems that can (1) adapt to new domains and (2) transparently reason with uncertainty to allow for verification and correction. Black-box models demonstrate powerful data processing abilities but do not satisfy these criteria due to their opaqueness, domain specificity, and lack of uncertainty awareness. We introduce Bonsai, a compositional and probabilistic reasoning system that generates adaptable inference trees by retrieving relevant grounding evidence and using it to compute likelihoods of sub-claims derived from broader natural language inferences. Bonsai's reasoning power is tunable at test-time via evidence scaling and it demonstrates reliable handling of varied domains including transcripts, photographs, videos, audio, and databases. Question-answering and human alignment experiments demonstrate that Bonsai matches the performance of domain-specific black-box methods while generating interpretable, grounded, and uncertainty-aware reasoning traces.

Updated: 2025-04-04 17:59:50

Domains: cs.CL,cs.AI,cs.CV,68T50,68T37,I.2.7

Download: http://arxiv.org/abs/2504.03640v1

AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction

The advancements in large language models (LLMs) have propelled the improvement of video understanding tasks by incorporating LLMs with visual models. However, most existing LLM-based models (e.g., VideoLLaMA, VideoChat) are constrained to processing short-duration videos. Recent attempts understand long-term videos by extracting and compressing visual features into a fixed memory size. Nevertheless, those methods leverage only the visual modality to merge video tokens and overlook the correlation between visual and textual queries, leading to difficulties in effectively handling complex question-answering tasks. To address the challenges of long videos and complex prompts, we propose AdaCM$^2$, which, for the first time, introduces an adaptive cross-modality memory reduction approach to video-text alignment in an auto-regressive manner on video streams. Our extensive experiments on various video understanding tasks, such as video captioning, video question answering, and video classification, demonstrate that AdaCM$^2$ achieves state-of-the-art performance across multiple datasets while significantly reducing memory usage. Notably, it achieves a 4.5% improvement across multiple tasks in the LVU dataset with a GPU memory consumption reduction of up to 65%.
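
The cross-modality reduction idea above, keeping only the video tokens most relevant to the text query instead of merging tokens from the visual stream alone, can be sketched with toy vectors. AdaCM$^2$'s actual scoring and auto-regressive memory update are more involved; this is an illustrative simplification.

```python
# Sketch: retain the video tokens most aligned with the text query,
# discarding the rest to bound memory. Toy dot-product scoring.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def select_tokens(video_tokens, text_query, keep=2):
    """Return indices of the `keep` video tokens most aligned with the query."""
    scores = [(dot(tok, text_query), i) for i, tok in enumerate(video_tokens)]
    scores.sort(reverse=True)
    return sorted(i for _, i in scores[:keep])
```

Because the selection is query-conditioned, tokens that matter for the current question survive compression even if they are visually redundant.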

Updated: 2025-04-04 17:58:08

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.12593v3

Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks requiring complex reasoning. However, the effects of scaling on their reasoning abilities remain insufficiently understood. In this paper, we introduce a synthetic multihop reasoning environment designed to closely replicate the structure and distribution of real-world large-scale knowledge graphs. Our reasoning task involves completing missing edges in the graph, which requires advanced multi-hop reasoning and mimics real-world reasoning scenarios. To evaluate this, we pretrain language models (LMs) from scratch solely on triples from the incomplete graph and assess their ability to infer the missing edges. Interestingly, we observe that overparameterization can impair reasoning performance due to excessive memorization. We investigate different factors that affect this U-shaped loss curve, including graph structure, model size, and training steps. To predict the optimal model size for a specific knowledge graph, we find an empirical scaling that linearly maps the knowledge graph search entropy to the optimal model size. This work provides new insights into the relationship between scaling and reasoning in LLMs, shedding light on possible ways to optimize their performance for reasoning tasks.

Updated: 2025-04-04 17:57:22

Domains: cs.AI,cs.CL

Download: http://arxiv.org/abs/2504.03635v1

The H-Elena Trojan Virus to Infect Model Weights: A Wake-Up Call on the Security Risks of Malicious Fine-Tuning

Large Language Models (LLMs) offer powerful capabilities in text generation and are increasingly adopted across a wide range of domains. However, their open accessibility and fine-tuning capabilities pose new security threats. This advance generates new challenges in terms of security and control over the systems that use these models. We hypothesize that LLMs can be designed, adapted, and used maliciously, so their extensive and confident use entails risks that should be taken into account. In this paper, we introduce H-Elena, a Trojan-infected version of a Falcon-7B-derived Python coding assistant, created by malicious fine-tuning. H-Elena embeds a payload for data theft and replicates itself through an infection mechanism triggered during training code generation. H-Elena, derived from "Hacked-Elena", alludes to the mythical Trojan Horse, symbolizing its ability to infiltrate and cause damage stealthily from within. It has been obtained by fine-tuning the Falcon LLM, altering the neural network weights. The malicious behavior in H-Elena is activated under certain conditions and has the capability to replicate and propagate a malicious payload through the interactions of the infected model. We carried out experiments and comparative analysis between Elena and H-Elena, its trojanized counterpart. We illustrate the potential of this type of virus and the necessity of developing more robust and secure methods for the training and deployment of LLMs. Our experiments show that H-Elena retains strong assistant performance while covertly executing and spreading malicious behavior. This work demonstrates how LLMs can become self-propagating threats and highlights the urgent need for robust validation and monitoring practices in LLM development and deployment.

Updated: 2025-04-04 17:53:19

Domains: cs.CR

Download: http://arxiv.org/abs/2504.03823v1

Quantum Speedups for Markov Chain Monte Carlo Methods with Application to Optimization

We propose quantum algorithms that provide provable speedups for Markov Chain Monte Carlo (MCMC) methods commonly used for sampling from probability distributions of the form $\pi \propto e^{-f}$, where $f$ is a potential function. Our first approach considers Gibbs sampling for finite-sum potentials in the stochastic setting, employing an oracle that provides gradients of individual functions. In the second setting, we consider access only to a stochastic evaluation oracle, allowing simultaneous queries at two points of the potential function under the same stochastic parameter. By introducing novel techniques for stochastic gradient estimation, our algorithms improve the gradient and evaluation complexities of classical samplers, such as Hamiltonian Monte Carlo (HMC) and Langevin Monte Carlo (LMC) in terms of dimension, precision, and other problem-dependent parameters. Furthermore, we achieve quantum speedups in optimization, particularly for minimizing non-smooth and approximately convex functions that commonly appear in empirical risk minimization problems.

Updated: 2025-04-04 17:44:22

Domains: quant-ph,cs.LG,math.OC

Download: http://arxiv.org/abs/2504.03626v1

Reciprocity-Aware Convolutional Neural Networks for Map-Based Path Loss Prediction

Path loss modeling is a widely used technique for estimating point-to-point losses along a communications link from transmitter (Tx) to receiver (Rx). Accurate path loss predictions can optimize use of the radio frequency spectrum and minimize unwanted interference. Modern path loss modeling often leverages data-driven approaches, using machine learning to train models on drive test measurement datasets. Drive tests primarily represent downlink scenarios, where the Tx is located on a building and the Rx is located on a moving vehicle. Consequently, trained models are frequently reserved for downlink coverage estimation, lacking representation of uplink scenarios. In this paper, we demonstrate that data augmentation can be used to train a path loss model that is generalized to uplink, downlink, and backhaul scenarios, training using only downlink drive test measurements. By adding a small number of synthetic samples representing uplink scenarios to the training set, root mean squared error is reduced by >8 dB on uplink examples in the test set.
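
The augmentation described above exploits channel reciprocity: a downlink drive-test record can be turned into a synthetic uplink sample by swapping the transmitter and receiver endpoints while keeping the measured path loss. The record fields below are illustrative, not the paper's actual schema.

```python
# Sketch of reciprocity-based augmentation: synthesize uplink training
# samples from downlink drive-test measurements by swapping Tx and Rx.

def make_uplink(downlink_samples):
    """Swap Tx and Rx to create synthetic uplink training samples."""
    uplink = []
    for s in downlink_samples:
        uplink.append({
            "tx": s["rx"],                      # vehicle terminal now transmits
            "rx": s["tx"],                      # base station now receives
            "path_loss_db": s["path_loss_db"],  # reciprocal link, same loss
        })
    return uplink
```

Adding even a small number of such swapped samples to the training set is what the paper reports reduces uplink RMSE by more than 8 dB.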

Updated: 2025-04-04 17:44:14

Domains: cs.LG,eess.SP

Download: http://arxiv.org/abs/2504.03625v1

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transformer model architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer either better or on-par accuracy compared to other similarly-sized state-of-the-art open-sourced Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B), while being up to 3$\times$ faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B model using a new compression via pruning and distillation technique called MiniPuzzle. Nemotron-H-47B-Base achieves similar accuracy to the 56B model, but is 20% faster to infer. In addition, we introduce an FP8-based training recipe and show that it can achieve on par results with BF16-based training. This recipe is used to train the 56B model. All Nemotron-H models will be released, with support in Hugging Face, NeMo, and Megatron-LM.

Updated: 2025-04-04 17:41:58

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.03624v1

Align to Structure: Aligning Large Language Models with Structural Information

Generating long, coherent text remains a challenge for large language models (LLMs), as they lack hierarchical planning and structured organization in discourse generation. We introduce Structural Alignment, a novel method that aligns LLMs with human-like discourse structures to enhance long-form text generation. By integrating linguistically grounded discourse frameworks into reinforcement learning, our approach guides models to produce coherent and well-organized outputs. We employ a dense reward scheme within a Proximal Policy Optimization framework, assigning fine-grained, token-level rewards based on the discourse distinctiveness relative to human writing. Two complementary reward models are evaluated: the first improves readability by scoring surface-level textual features to provide explicit structuring, while the second reinforces deeper coherence and rhetorical sophistication by analyzing global discourse patterns through hierarchical discourse motifs, outperforming both standard and RLHF-enhanced models in tasks such as essay generation and long-document summarization. All training data and code will be publicly shared at https://github.com/minnesotanlp/struct_align.

Updated: 2025-04-04 17:40:04

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.03622v1

Trading off Relevance and Revenue in the Jobs Marketplace: Estimation, Optimization and Auction Design

We study the problem of position allocation in job marketplaces, where the platform determines the ranking of jobs for each seeker. The design of ranking mechanisms is critical to marketplace efficiency, as it influences both short-term revenue from promoted job placements and long-term health through sustained seeker engagement. Our analysis focuses on the tradeoff between revenue and relevance, as well as innovations in job auction design. We demonstrate two ways to improve relevance with minimal impact on revenue: incorporating the seekers' preferences and applying position-aware auctions.
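
The relevance-revenue tradeoff above can be sketched as ranking by a convex combination of a job's relevance to the seeker and its bid. The weight and record fields are illustrative assumptions, not the paper's actual scoring rule or auction mechanism.

```python
# Sketch: rank jobs by (1 - w) * relevance + w * bid. Sliding w trades
# seeker-side relevance against promoted-placement revenue.

def rank_jobs(jobs, revenue_weight=0.3):
    """Sort jobs by blended score, highest first."""
    def score(job):
        return (1 - revenue_weight) * job["relevance"] + \
               revenue_weight * job["bid"]
    return sorted(jobs, key=score, reverse=True)
```

A position-aware auction would go further by letting the score, and hence the price, depend on the slot being allocated, not just the job.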

Updated: 2025-04-04 17:36:40

Domains: cs.GT,cs.LG

Download: http://arxiv.org/abs/2504.03618v1

Arti-"fickle" Intelligence: Using LLMs as a Tool for Inference in the Political and Social Sciences

Generative large language models (LLMs) are incredibly useful, versatile, and promising tools. However, they will be of most use to political and social science researchers when they are used in a way that advances understanding about real human behaviors and concerns. To promote the scientific use of LLMs, we suggest that researchers in the political and social sciences need to remain focused on the scientific goal of inference. To this end, we discuss the challenges and opportunities related to scientific inference with LLMs, using validation of model output as an illustrative case for discussion. We propose a set of guidelines related to establishing the failure and success of LLMs when completing particular tasks, and discuss how we can make inferences from these observations. We conclude with a discussion of how this refocus will improve the accumulation of shared scientific knowledge about these tools and their uses in the social sciences.

Updated: 2025-04-04 17:35:45

Domains: cs.CY,cs.AI

Download: http://arxiv.org/abs/2504.03822v1

Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task

Retrieval-augmented generation (RAG) has become a cornerstone of contemporary NLP, enhancing large language models (LLMs) by allowing them to access richer factual contexts through in-context retrieval. While effective in monolingual settings, especially in English, its use in multilingual tasks remains unexplored. This paper investigates the effectiveness of RAG across multiple languages by proposing novel approaches for multilingual open-domain question-answering. We evaluate the performance of various multilingual RAG strategies, including question-translation (tRAG), which translates questions into English before retrieval, and Multilingual RAG (MultiRAG), where retrieval occurs directly across multiple languages. Our findings reveal that tRAG, while useful, suffers from limited coverage. In contrast, MultiRAG improves efficiency by enabling multilingual retrieval but introduces inconsistencies due to cross-lingual variations in the retrieved content. To address these issues, we propose Crosslingual RAG (CrossRAG), a method that translates retrieved documents into a common language (e.g., English) before generating the response. Our experiments show that CrossRAG significantly enhances performance on knowledge-intensive tasks, benefiting both high-resource and low-resource languages.
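
The CrossRAG flow described above, retrieve across languages, translate everything into one pivot language, then generate, can be sketched at the stub level. `retrieve`, `translate`, and `generate` are placeholders for the real retriever, MT system, and LLM; only the wiring is the point here.

```python
# Stub-level sketch of CrossRAG: multilingual retrieval, translation of all
# retrieved documents into a pivot language, then answer generation.

def cross_rag(question, retrieve, translate, generate, pivot="en"):
    docs = retrieve(question)                         # multilingual retrieval
    pivot_docs = [translate(d, pivot) for d in docs]  # unify the language
    return generate(question, pivot_docs)
```

Compared with tRAG (translating only the question), this keeps multilingual coverage at retrieval time while removing cross-lingual inconsistencies at generation time.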

Updated: 2025-04-04 17:35:43

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2504.03616v1

Autonomous and Self-Adapting System for Synthetic Media Detection and Attribution

Rapid advances in generative AI have enabled the creation of highly realistic synthetic images, which, while beneficial in many domains, also pose serious risks in terms of disinformation, fraud, and other malicious applications. Current synthetic image identification systems are typically static, relying on feature representations learned from known generators; as new generative models emerge, these systems suffer from severe performance degradation. In this paper, we introduce the concept of an autonomous self-adaptive synthetic media identification system -- one that not only detects synthetic images and attributes them to known sources but also autonomously identifies and incorporates novel generators without human intervention. Our approach leverages an open-set identification strategy with an evolvable embedding space that distinguishes between known and unknown sources. By employing an unsupervised clustering method to aggregate unknown samples into high-confidence clusters and continuously refining its decision boundaries, our system maintains robust detection and attribution performance even as the generative landscape evolves. Extensive experiments demonstrate that our method significantly outperforms existing approaches, marking a crucial step toward universal, adaptable forensic systems in the era of rapidly advancing generative models.

Updated: 2025-04-04 17:33:59

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2504.03615v1

PAFFA: Premeditated Actions For Fast Agents

Modern AI assistants have made significant progress in natural language understanding and tool-use, with emerging efforts to interact with Web interfaces. However, current approaches that heavily rely on repeated LLM-driven HTML parsing are computationally expensive and error-prone, particularly when handling dynamic web interfaces and multi-step tasks. We introduce PAFFA (Premeditated Actions For Fast Agents), a method that makes LLMs faster and more accurate in completing tasks on the internet using a novel inference-time technique that requires no task-specific training. PAFFA constructs an 'Action Library', leveraging the parametric knowledge of the base LLM to pre-compute browser interaction patterns that generalize across tasks. By strategically re-using LLM inference across tasks - either via 'Dist-Map' for task-agnostic identification of key interactive web elements, or 'Unravel' for first-encounter, stateful exploration of novel tasks/sites - PAFFA drastically reduces inference-time tokens by 87% while maintaining robust performance (achieving 0.57 vs. 0.50 step accuracy compared to baseline). Further, Unravel's ability to update its action library based on explorations allows generalization and adaptation to unseen websites. In sum, this work shows that LLM reasoning sequences can generalize across prompts, offering a way to scale inference-time techniques to internet-scale data with sublinear token count.
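
The core economy here is that expensive LLM exploration runs once per (site, task) pattern and the resulting action sequence is replayed afterwards. A hedged sketch of that caching behavior follows; the class and method names are illustrative, not from the paper's implementation.

```python
class ActionLibrary:
    """Toy model of an 'Action Library': pay the LLM exploration cost
    once per (site, task) pattern, then replay the cached actions."""

    def __init__(self):
        self._lib = {}
        self.exploration_calls = 0  # stands in for LLM token spend

    def _explore(self, site, task):
        # Stand-in for a first-encounter, stateful exploration that a
        # real system would drive with an LLM parsing live HTML.
        self.exploration_calls += 1
        return [("open", site), ("click", f"{task}-button"), ("submit", task)]

    def get_actions(self, site, task):
        key = (site, task)
        if key not in self._lib:          # first encounter: explore
            self._lib[key] = self._explore(site, task)
        return self._lib[key]             # later encounters: free replay

lib = ActionLibrary()
first = lib.get_actions("shop.example", "checkout")
second = lib.get_actions("shop.example", "checkout")  # cache hit, no LLM call
```

Only one exploration call is made for the two requests, which is the mechanism behind the reported reduction in inference-time tokens.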

Updated: 2025-04-04 17:33:53

Domains: cs.AI

Download: http://arxiv.org/abs/2412.07958v2

Towards deployment-centric multimodal AI beyond vision and language

Multimodal artificial intelligence (AI) integrates diverse types of data via machine learning to improve understanding, prediction, and decision-making across disciplines such as healthcare, science, and engineering. However, most multimodal AI advances focus on models for vision and language data, while their deployability remains a key challenge. We advocate a deployment-centric workflow that incorporates deployment constraints early to reduce the likelihood of undeployable solutions, complementing data-centric and model-centric approaches. We also emphasise deeper integration across multiple levels of multimodality and multidisciplinary collaboration to significantly broaden the research scope beyond vision and language. To facilitate this approach, we identify common multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases: pandemic response, self-driving car design, and climate change adaptation, drawing expertise from healthcare, social science, engineering, science, sustainability, and finance. By fostering multidisciplinary dialogue and open research practices, our community can accelerate deployment-centric development for broad societal impact.

Updated: 2025-04-04 17:20:05

Domains: cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.03603v1

APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. These blueprints are then transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models -- the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on $\tau$-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source both the synthetic data collected and the trained xLAM-2-fc-r models to advance research in AI agents. Models are available on HuggingFace at https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4 and project website is https://apigen-mt.github.io

Updated: 2025-04-04 17:13:57

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.03601v1

MedSAM2: Segment Anything in 3D Medical Images and Videos

Medical image and video segmentation is a critical task for precision medicine, which has witnessed considerable progress in developing task or modality-specific and generalist models for 2D images. However, there have been limited studies on building general-purpose models for 3D images and videos with comprehensive user studies. Here, we present MedSAM2, a promptable segmentation foundation model for 3D image and video segmentation. The model is developed by fine-tuning the Segment Anything Model 2 on a large medical dataset with over 455,000 3D image-mask pairs and 76,000 frames, outperforming previous models across a wide range of organs, lesions, and imaging modalities. Furthermore, we implement a human-in-the-loop pipeline to facilitate the creation of large-scale datasets resulting in, to the best of our knowledge, the most extensive user study to date, involving the annotation of 5,000 CT lesions, 3,984 liver MRI lesions, and 251,550 echocardiogram video frames, demonstrating that MedSAM2 can reduce manual costs by more than 85%. MedSAM2 is also integrated into widely used platforms with user-friendly interfaces for local and cloud deployment, making it a practical tool for supporting efficient, scalable, and high-quality segmentation in both research and healthcare environments.

Updated: 2025-04-04 17:13:37

Domains: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2504.03600v1

A Hybrid Wavelet-Fourier Method for Next-Generation Conditional Diffusion Models

We present a novel generative modeling framework, Wavelet-Fourier-Diffusion, which adapts the diffusion paradigm to hybrid frequency representations in order to synthesize high-quality, high-fidelity images with improved spatial localization. In contrast to conventional diffusion models that rely exclusively on additive noise in pixel space, our approach leverages a multi-transform that combines wavelet sub-band decomposition with partial Fourier steps. This strategy progressively degrades and then reconstructs images in a hybrid spectral domain during the forward and reverse diffusion processes. By supplementing traditional Fourier-based analysis with the spatial localization capabilities of wavelets, our model can capture both global structures and fine-grained features more effectively. We further extend the approach to conditional image generation by integrating embeddings or conditional features via cross-attention. Experimental evaluations on CIFAR-10, CelebA-HQ, and a conditional ImageNet subset illustrate that our method achieves competitive or superior performance relative to baseline diffusion models and state-of-the-art GANs, as measured by Fréchet Inception Distance (FID) and Inception Score (IS). We also show how the hybrid frequency-based representation improves control over global coherence and fine texture synthesis, paving the way for new directions in multi-scale generative modeling.

Updated: 2025-04-04 17:11:04

Domains: cs.CV,cs.LG,eess.IV

Download: http://arxiv.org/abs/2504.03821v1

EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline

Existing information retrieval systems excel in cases where the language of target documents closely matches that of the user query. However, real-world retrieval systems are often required to implicitly reason whether a document is relevant. For example, when retrieving technical texts or tables, their relevance to the user query may be implied through a particular jargon or structure, rather than explicitly expressed in their content. Large language models (LLMs) hold great potential in identifying such implied relevance by leveraging their reasoning skills. Nevertheless, current LLM-augmented retrieval is hindered by high latency and computation cost, as the LLM typically computes the query-document relevance online, for every query anew. To tackle this issue we introduce EnrichIndex, a retrieval approach which instead uses the LLM offline to build semantically-enriched retrieval indices, by performing a single pass over all documents in the retrieval corpus once during ingestion time. Furthermore, the semantically-enriched indices can complement existing online retrieval approaches, boosting the performance of LLM re-rankers. We evaluated EnrichIndex on five retrieval tasks, involving passages and tables, and found that it outperforms strong online LLM-based retrieval systems, with an average improvement of 11.7 points in recall @ 10 and 10.6 points in NDCG @ 10 compared to strong baselines. In terms of online calls to the LLM, it processes 293.3 times fewer tokens which greatly reduces the online latency and cost. Overall, EnrichIndex is an effective way to build better retrieval indices offline by leveraging the strong reasoning skills of LLMs.
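
The key design point is that the LLM touches each document exactly once, at ingestion, and never at query time. A minimal sketch under that assumption: `enrich` is a stand-in for the offline LLM call, and the retrieval shown is naive token overlap, not the paper's retriever.

```python
def enrich(doc: str) -> str:
    # Stand-in for the single offline LLM pass: a real system would ask
    # the model to spell out implied jargon, structure, and purpose.
    canned = {
        "SELECT * FROM orders WHERE ts > NOW() - INTERVAL '1 day'":
            "SQL query retrieving all orders placed in the last day",
    }
    return canned.get(doc, doc)

def build_enriched_index(corpus):
    # One pass at ingestion time; query time never calls the LLM.
    return {i: f"{doc}\n{enrich(doc)}" for i, doc in enumerate(corpus)}

def retrieve(index, query, k=1):
    # Naive token-overlap scoring over the enriched text.
    q = set(query.lower().split())
    scored = sorted(
        index,
        key=lambda i: -len(q & set(index[i].lower().split())),
    )
    return scored[:k]

corpus = [
    "SELECT * FROM orders WHERE ts > NOW() - INTERVAL '1 day'",
    "The annual company picnic is scheduled for June.",
]
index = build_enriched_index(corpus)
hits = retrieve(index, "recent orders query")
```

The raw SQL shares almost no vocabulary with the query; the enrichment text is what makes the implied relevance lexically visible, without any online LLM call.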

Updated: 2025-04-04 17:08:46

Domains: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2504.03598v1

Real-is-Sim: Bridging the Sim-to-Real Gap with a Dynamic Digital Twin for Real-World Robot Policy Evaluation

Recent advancements in behavior cloning have enabled robots to perform complex manipulation tasks. However, accurately assessing training performance remains challenging, particularly for real-world applications, as behavior cloning losses often correlate poorly with actual task success. Consequently, researchers resort to success rate metrics derived from costly and time-consuming real-world evaluations, making the identification of optimal policies and detection of overfitting or underfitting impractical. To address these issues, we propose real-is-sim, a novel behavior cloning framework that incorporates a dynamic digital twin (based on Embodied Gaussians) throughout the entire policy development pipeline: data collection, training, and deployment. By continuously aligning the simulated world with the physical world, demonstrations can be collected in the real world with states extracted from the simulator. The simulator enables flexible state representations by rendering image inputs from any viewpoint or extracting low-level state information from objects embodied within the scene. During training, policies can be directly evaluated within the simulator in an offline and highly parallelizable manner. Finally, during deployment, policies are run within the simulator where the real robot directly tracks the simulated robot's joints, effectively decoupling policy execution from real hardware and mitigating traditional domain-transfer challenges. We validate real-is-sim on the PushT manipulation task, demonstrating strong correlation between success rates obtained in the simulator and real-world evaluations. Videos of our system can be found at https://realissim.rai-inst.com.

Updated: 2025-04-04 17:05:56

Domains: cs.RO,cs.AI

Download: http://arxiv.org/abs/2504.03597v1

Optimistic Online Learning in Symmetric Cone Games

Optimistic online learning algorithms have led to significant advances in equilibrium computation, particularly for two-player zero-sum games, achieving an iteration complexity of $\mathcal{O}(1/\epsilon)$ to reach an $\epsilon$-saddle point. These advances have been established in normal-form games, where strategies are simplex vectors, and quantum games, where strategies are trace-one positive semidefinite matrices. We extend optimistic learning to symmetric cone games (SCGs), a class of two-player zero-sum games where strategy spaces are generalized simplices (trace-one slices of symmetric cones). A symmetric cone is the cone of squares of a Euclidean Jordan Algebra; canonical examples include the nonnegative orthant, the second-order cone, the cone of positive semidefinite matrices, and their products, all fundamental to convex optimization. SCGs unify normal-form and quantum games and, as we show, offer significantly greater modeling flexibility, allowing us to model applications such as distance metric learning problems and the Fermat-Weber problem. To compute approximate saddle points in SCGs, we introduce the Optimistic Symmetric Cone Multiplicative Weights Update algorithm and establish an iteration complexity of $\mathcal{O}(1/\epsilon)$ to reach an $\epsilon$-saddle point. Our analysis builds on the Optimistic Follow-the-Regularized-Leader framework, with a key technical contribution being a new proof of the strong convexity of the symmetric cone negative entropy with respect to the trace-one norm, a result that may be of independent interest.
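
For the nonnegative-orthant case the strategy space is the ordinary simplex, where the algorithm reduces to optimistic Hedge / multiplicative weights. A hedged sketch of that special case follows (the full symmetric-cone version replaces these componentwise exponentials with Jordan-algebra operations); here `A[i][j]` is the row player's loss and the column player's gain.

```python
import math

def optimistic_mwu(A, T=2000, eta=0.1):
    """Optimistic multiplicative weights for a two-player zero-sum
    matrix game: the simplex (nonnegative-orthant) special case of a
    symmetric cone game. Returns the time-averaged strategies."""
    n, m = len(A), len(A[0])
    x, y = [1.0 / n] * n, [1.0 / m] * m
    gx_prev, gy_prev = [0.0] * n, [0.0] * m
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(T):
        gx = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]  # row losses
        gy = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]  # col gains
        # Optimistic step: predict the next gradient with the current one,
        # i.e. use 2*g_t - g_{t-1} in the exponent.
        wx = [x[i] * math.exp(-eta * (2 * gx[i] - gx_prev[i])) for i in range(n)]
        wy = [y[j] * math.exp(+eta * (2 * gy[j] - gy_prev[j])) for j in range(m)]
        sx, sy = sum(wx), sum(wy)
        x, y = [w / sx for w in wx], [w / sy for w in wy]
        gx_prev, gy_prev = gx, gy
        avg_x = [a + xi / T for a, xi in zip(avg_x, x)]
        avg_y = [a + yj / T for a, yj in zip(avg_y, y)]
    return avg_x, avg_y

# Diagonal game with unique equilibrium x* = y* = (2/3, 1/3), value 2/3.
A = [[1.0, 0.0], [0.0, 2.0]]
avg_x, avg_y = optimistic_mwu(A)
# Duality gap of the averages; the O(1/epsilon) rate says this shrinks like O(1/T).
gap = max(sum(A[i][j] * avg_x[i] for i in range(2)) for j in range(2)) \
    - min(sum(A[i][j] * avg_y[j] for j in range(2)) for i in range(2))
```

The duality gap of the averaged strategies is the operational meaning of an epsilon-saddle point in the abstract's iteration-complexity statement.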

Updated: 2025-04-04 16:59:19

Domains: math.OC,cs.GT,cs.LG

Download: http://arxiv.org/abs/2504.03592v1

A Hitchhiker's Guide to Understanding Performances of Two-Class Classifiers

Properly understanding the performances of classifiers is essential in various scenarios. However, the literature often relies only on one or two standard scores to compare classifiers, which fails to capture the nuances of application-specific requirements. The Tile is a recently introduced visualization tool organizing an infinity of ranking scores into a 2D map. Thanks to the Tile, it is now possible to compare classifiers efficiently, displaying all possible application-specific preferences instead of having to rely on a pair of scores. This hitchhiker's guide to understanding the performances of two-class classifiers presents four scenarios showcasing different user profiles: a theoretical analyst, a method designer, a benchmarker, and an application developer. We introduce several interpretative flavors adapted to the user's needs by mapping different values on the Tile. We illustrate this guide by ranking and analyzing the performances of 74 state-of-the-art semantic segmentation models through the perspective of the four scenarios. Through these user profiles, we demonstrate that the Tile effectively captures the behavior of classifiers in a single visualization, while accommodating an infinite number of ranking scores. Code for mapping the different Tile flavors is available in supplementary material.

Updated: 2025-04-04 16:58:56

Domains: cs.CV,cs.LG,cs.PF

Download: http://arxiv.org/abs/2412.04377v3

Adaptive Semantic Prompt Caching with VectorQ

Semantic prompt caches reduce the latency and cost of large language model (LLM) inference by reusing cached LLM-generated responses for semantically similar prompts. Vector similarity metrics assign a numerical score to quantify the similarity between an embedded prompt and its nearest neighbor in the cache. Existing systems rely on a static threshold to classify whether the similarity score is sufficiently high to result in a cache hit. We show that this one-size-fits-all threshold is insufficient across different embeddings. We propose VectorQ, an online framework with a threshold convergence guarantee to learn embedding-specific threshold regions that adapt to the uncertainty of an embedding. Through evaluations on a combination of three diverse datasets, we show that VectorQ consistently outperforms state-of-the-art systems across all static thresholds, achieving up to 26x increases in cache hit rate and error rate reductions up to 74%.
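
For reference, here is a minimal sketch of the static-threshold baseline the paper argues against: one global cutoff on cosine similarity. VectorQ's contribution is precisely to replace this single `threshold` with learned, embedding-specific threshold regions; the names below are illustrative.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

class SemanticPromptCache:
    """Static-threshold semantic cache: reuse a cached LLM response when
    the new prompt's embedding is close enough to a cached one."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold   # one global cutoff -- the design VectorQ replaces
        self.entries = []            # list of (embedding, response)

    def lookup(self, emb):
        if not self.entries:
            return None
        best_emb, best_resp = max(self.entries, key=lambda e: cosine(emb, e[0]))
        return best_resp if cosine(emb, best_emb) >= self.threshold else None

    def insert(self, emb, response):
        self.entries.append((emb, response))

cache = SemanticPromptCache(threshold=0.9)
cache.insert((1.0, 0.0), "cached answer")
hit = cache.lookup((0.95, 0.05))   # nearly parallel embedding -> cache hit
miss = cache.lookup((0.0, 1.0))    # orthogonal embedding -> cache miss
```

The failure mode motivating VectorQ is visible in the structure: a similarity score that means "same intent" for one embedding may mean "different intent" for another, yet this design applies the same cutoff to both.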

Updated: 2025-04-04 16:51:15

Domains: cs.LG,cs.CL

Download: http://arxiv.org/abs/2502.03771v2

Cloning Games, Black Holes and Cryptography

Quantum no-cloning is one of the most fundamental properties of quantum information. In this work, we introduce a new toolkit for analyzing cloning games; these games capture more quantitative versions of no-cloning and are central to unclonable cryptography. Previous works rely on the framework laid out by Tomamichel, Fehr, Kaniewski and Wehner to analyze both the $n$-qubit BB84 game and the subspace coset game. Their constructions and analysis face the following inherent limitations: - The existing bounds on the values of these games are at least $2^{-0.25n}$; on the other hand, the trivial adversarial strategy wins with probability $2^{-n}$. Not only that, the BB84 game does in fact admit a highly nontrivial winning strategy. This raises the natural question: are there cloning games which admit no non-trivial winning strategies? - The existing constructions are not multi-copy secure; the BB84 game is not even $2 \mapsto 3$ secure, and the subspace coset game is not $t \mapsto t+1$ secure for a polynomially large $t$. Moreover, we provide evidence that the existing technical tools do not suffice to prove multi-copy security of even completely different constructions. This raises the natural question: can we design new cloning games that achieve multi-copy security, possibly by developing a new analytic toolkit? We study a new cloning game based on binary phase states and show that it is $t$-copy secure when $t=o(n/\log n)$. Moreover, for constant $t$, we obtain the first asymptotically optimal bounds of $O(2^{-n})$. We also show a worst-case to average-case reduction for a large class of cloning games, which allows us to show the same quantitative results for Haar cloning games. These technical ingredients together enable two new applications which have previously been out of reach; one in black hole physics, and one in unclonable cryptography.

Updated: 2025-04-04 16:48:23

Domains: quant-ph,cs.CR,hep-th

Download: http://arxiv.org/abs/2411.04730v2

Scalable Hypergraph Structure Learning with Diverse Smoothness Priors

In graph signal processing, learning the weighted connections between nodes from a set of sample signals is a fundamental task when the underlying relationships are not known a priori. This task is typically addressed by finding a graph Laplacian on which the observed signals are smooth. With the extension of graphs to hypergraphs - where edges can connect more than two nodes - graph learning methods have similarly been generalized to hypergraphs. However, the absence of a unified framework for calculating total variation has led to divergent definitions of smoothness and, consequently, differing approaches to hyperedge recovery. We confront this challenge through generalization of several previously proposed hypergraph total variations, subsequently allowing ease of substitution into a vector based optimization. To this end, we propose a novel hypergraph learning method that recovers a hypergraph topology from time-series signals based on a smoothness prior. Our approach addresses key limitations in prior works, such as hyperedge selection and convergence issues, by formulating the problem as a convex optimization solved via a forward-backward-forward algorithm, ensuring guaranteed convergence. Additionally, we introduce a process that simultaneously limits the span of the hyperedge search and maintains a valid hyperedge selection set. In doing so, our method becomes scalable in increasingly complex network structures. The experimental results demonstrate improved performance, in terms of accuracy, over other state-of-the-art hypergraph inference methods; furthermore, we empirically show our method to be robust to total variation terms, biased towards global smoothness, and scalable to larger hypergraphs.
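
The ordinary-graph version of the smoothness prior used throughout is the Laplacian quadratic form x^T L x = sum over edges of w_ij (x_i - x_j)^2; the hypergraph total variations discussed above generalize this to edges joining more than two nodes. A small sketch of the pairwise case:

```python
def graph_laplacian(n, edges):
    """Build the combinatorial Laplacian L = D - W from weighted edges
    given as (i, j, weight) tuples."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return L

def smoothness(x, L):
    """Laplacian quadratic form x^T L x: small when the signal varies
    little across strongly weighted edges."""
    n = len(x)
    return sum(x[i] * sum(L[i][j] * x[j] for j in range(n)) for i in range(n))

# Path graph 0 -- 1 -- 2 with unit weights.
L = graph_laplacian(3, [(0, 1, 1.0), (1, 2, 1.0)])
flat = smoothness([1.0, 1.0, 1.0], L)   # constant signal: perfectly smooth
rough = smoothness([1.0, 0.0, 1.0], L)  # alternating signal: not smooth
```

Graph (and hypergraph) learning inverts this computation: given sample signals assumed smooth, it searches for the edge weights that make quadratic forms like these small.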

Updated: 2025-04-04 16:47:30

Domains: cs.LG,eess.SP

Download: http://arxiv.org/abs/2504.03583v1

Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?

An AI control protocol is a plan for usefully deploying AI systems that aims to prevent an AI from intentionally causing some unacceptable outcome. This paper investigates how well AI systems can generate and act on their own strategies for subverting control protocols whilst operating statelessly (without shared memory between contexts). To do this, an AI system may need to reliably generate optimal plans in each context, take actions with well-calibrated probabilities, and coordinate plans with other instances of itself without communicating. We develop Subversion Strategy Eval, a suite of eight environments, covering a range of protocols and strategic capabilities, and six sets of affordances that help isolate individual capabilities. We implement the evaluation in Inspect-AI and release it open-source. We evaluate Claude 3.5 models, including helpful-only versions, as well as OpenAI reasoning models. None of the models demonstrate substantial capability in strategizing to subvert control protocols statelessly. However, providing models with additional affordances, such as the ability to share a plan between contexts, can substantially improve performance. We hope our evaluations can act as a leading indicator for when models are capable of subverting control protocols and also relax the worst-case assumption of perfect strategic ability in AI control evaluations.

Updated: 2025-04-04 16:36:02

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2412.12480v4

Hallucination Detection on a Budget: Efficient Bayesian Estimation of Semantic Entropy

Detecting whether an LLM hallucinates is an important research challenge. One promising way of doing so is to estimate the semantic entropy (Farquhar et al., 2024) of the distribution of generated sequences. We propose a new algorithm for doing that, with two main advantages. First, due to us taking the Bayesian approach, we achieve a much better quality of semantic entropy estimates for a given budget of samples from the LLM. Second, we are able to tune the number of samples adaptively so that `harder' contexts receive more samples. We demonstrate empirically that our approach systematically beats the baselines, requiring only 59% of samples used by Farquhar et al. (2024) to achieve the same quality of hallucination detection as measured by AUROC. Moreover, quite counterintuitively, our estimator is useful even with just one sample from the LLM.
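
Semantic entropy (Farquhar et al., 2024) clusters sampled answers by meaning and takes the Shannon entropy of the cluster distribution; this paper's contribution is estimating that quantity well from fewer samples. A toy sketch of the quantity itself, with a trivial string-normalization function standing in for the NLI-based meaning clustering:

```python
import math
from collections import Counter

def semantic_entropy(samples, meaning_of):
    """Shannon entropy over meaning clusters of sampled generations.
    `meaning_of` maps a generation to a cluster key; real systems use
    bidirectional entailment (NLI), not string normalization."""
    counts = Counter(meaning_of(s) for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

normalize = lambda s: s.strip().strip(".").lower()

# All samples agree in meaning -> entropy 0 (model is confident).
confident = semantic_entropy(["Paris", "paris.", "Paris"], normalize)
# Samples split between two meanings -> entropy ln(2) (likely hallucinating).
hallucinating = semantic_entropy(["Paris", "Lyon.", "paris", "lyon"], normalize)
```

High semantic entropy flags contexts where sampled answers disagree in meaning; the paper's Bayesian estimator spends its sample budget adaptively on exactly those harder contexts.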

Updated: 2025-04-04 16:30:44

Domains: cs.LG

Download: http://arxiv.org/abs/2504.03579v1

Intuitionistic Fuzzy Cognitive Maps for Interpretable Image Classification

Several deep learning (DL) approaches have been proposed to deal with image classification tasks. However, despite their effectiveness, they lack interpretability, as they are unable to explain or justify their results. To address the challenge of interpretable image classification, this paper introduces a novel framework, named Interpretable Intuitionistic Fuzzy Cognitive Maps (I2FCMs). Intuitionistic FCMs (iFCMs) have been proposed as an extension of FCMs offering a natural mechanism to assess the quality of their output through the estimation of hesitancy, a concept resembling human hesitation in decision making. In the context of image classification, hesitancy is considered as a degree of unconfidence with which an image is categorized to a class. To the best of our knowledge this is the first time iFCMs are applied for image classification. Further novel contributions of the introduced framework include the following: a) a feature extraction process focusing on the most informative image regions; b) a learning algorithm for automatic data-driven determination of the intuitionistic fuzzy interconnections of the iFCM, thereby reducing human intervention in the definition of the graph structure; c) an inherently interpretable classification approach based on image contents, providing understandable explanations of its predictions, using linguistic terms. Furthermore, the proposed I2FCM framework can be applied to DL models, including Convolutional Neural Network (CNN), rendering them interpretable. The effectiveness of I2FCM is evaluated on publicly available datasets, and the results confirm that it can provide enhanced classification performance, while providing interpretable inferences.

Updated: 2025-04-04 16:28:33

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2408.03745v2

Streaming Generation of Co-Speech Gestures via Accelerated Rolling Diffusion

Generating co-speech gestures in real time requires both temporal coherence and efficient sampling. We introduce Accelerated Rolling Diffusion, a novel framework for streaming gesture generation that extends rolling diffusion models with structured progressive noise scheduling, enabling seamless long-sequence motion synthesis while preserving realism and diversity. We further propose Rolling Diffusion Ladder Acceleration (RDLA), a new approach that restructures the noise schedule into a stepwise ladder, allowing multiple frames to be denoised simultaneously. This significantly improves sampling efficiency while maintaining motion consistency, achieving up to a 2x speedup with high visual fidelity and temporal coherence. We evaluate our approach on ZEGGS and BEAT, strong benchmarks for real-world applicability. Our framework is universally applicable to any diffusion-based gesture generation model, transforming it into a streaming approach. Applied to three state-of-the-art methods, it consistently outperforms them, demonstrating its effectiveness as a generalizable and efficient solution for real-time, high-fidelity co-speech gesture synthesis.

Updated: 2025-04-04 16:12:42

Domains: cs.LG,cs.CV,cs.HC

Download: http://arxiv.org/abs/2503.10488v2

SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement

In the interaction between agents and their environments, agents expand their capabilities by planning and executing actions. However, LLM-based agents face substantial challenges when deployed in novel environments or required to navigate unconventional action spaces. To empower agents to autonomously explore environments, optimize workflows, and enhance their understanding of actions, we propose SynWorld, a framework that allows agents to synthesize possible scenarios with multi-step action invocation within the action space and perform Monte Carlo Tree Search (MCTS) exploration to effectively refine their action knowledge in the current environment. Our experiments demonstrate that SynWorld is an effective and general approach to learning action knowledge in new environments. Code is available at https://github.com/zjunlp/SynWorld.
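
SynWorld's exploration relies on MCTS; as background, the selection step of a generic MCTS can be sketched with the standard UCB1 rule. This is an illustrative snippet, not the SynWorld implementation, and the action names are invented:

```python
import math

def ucb1(mean_reward, visits, parent_visits, c=1.4):
    """UCB1 score used in MCTS selection: exploitation plus exploration terms.
    Unvisited actions get +inf so each is tried at least once."""
    if visits == 0:
        return math.inf
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)

def select(children):
    """Pick the child action maximizing UCB1.
    `children` maps action -> (mean_reward, visits)."""
    parent_visits = sum(v for _, v in children.values()) or 1
    return max(children, key=lambda a: ucb1(*children[a], parent_visits))

stats = {"call_api": (0.8, 10), "read_doc": (0.5, 2), "retry": (0.0, 0)}
choice = select(stats)        # unvisited 'retry' wins via the +inf bonus

stats["retry"] = (0.1, 5)     # after a few visits its bonus shrinks
choice2 = select(stats)       # now the rarely tried 'read_doc' is favoured
```

Unvisited actions score +inf and are tried first; afterwards the score trades off mean reward against an exploration bonus that shrinks with visit count.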

Updated: 2025-04-04 16:10:57

Domains: cs.CL,cs.AI,cs.CV,cs.LG,cs.MA

Download: http://arxiv.org/abs/2504.03561v1

Stochastic Optimization with Optimal Importance Sampling

Importance Sampling (IS) is a widely used variance reduction technique for enhancing the efficiency of Monte Carlo methods, particularly in rare-event simulation and related applications. Despite its power, the performance of IS is often highly sensitive to the choice of the proposal distribution and frequently requires stochastic calibration techniques. While the design and analysis of IS have been extensively studied in estimation settings, applying IS within stochastic optimization introduces a unique challenge: the decision and the IS distribution are mutually dependent, creating a circular optimization structure. This interdependence complicates both the analysis of convergence for decision iterates and the efficiency of the IS scheme. In this paper, we propose an iterative gradient-based algorithm that jointly updates the decision variable and the IS distribution without requiring time-scale separation between the two. Our method achieves the lowest possible asymptotic variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family. Furthermore, we show that these properties are preserved under linear constraints by incorporating a recent variant of Nesterov's dual averaging method.
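
For readers unfamiliar with the technique, a minimal self-normalized importance-sampling estimator looks like the following. This is a generic textbook sketch, not the paper's joint decision/IS algorithm, and all names are illustrative:

```python
import math, random

def is_estimate(f, target_logpdf, proposal_sample, proposal_logpdf, n=50_000):
    """Self-normalized importance sampling estimate of E_p[f(X)]
    using samples from a proposal q and weights w = p(x)/q(x)."""
    xs = [proposal_sample() for _ in range(n)]
    ws = [math.exp(target_logpdf(x) - proposal_logpdf(x)) for x in xs]
    return sum(w * f(x) for w, x in zip(ws, xs)) / sum(ws)

# Target p = N(0, 1); estimate E[x^2] = 1 using a wider proposal q = N(0, 2^2).
random.seed(0)
logn = lambda x, s: -0.5 * (x / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
est = is_estimate(f=lambda x: x * x,
                  target_logpdf=lambda x: logn(x, 1.0),
                  proposal_sample=lambda: random.gauss(0.0, 2.0),
                  proposal_logpdf=lambda x: logn(x, 2.0))
```

A proposal broader than the target keeps the weights bounded; the paper's point is that the best proposal itself depends on the decision variable being optimized, which is what creates the circular structure.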

Updated: 2025-04-04 16:10:18

Domains: math.OC,cs.LG,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2504.03560v1

Quantifying Knowledge Distillation Using Partial Information Decomposition

Knowledge distillation deploys complex machine learning models in resource-constrained environments by training a smaller student model to emulate internal representations of a complex teacher model. However, the teacher's representations can also encode nuisance or additional information not relevant to the downstream task. Distilling such irrelevant information can actually impede the performance of a capacity-limited student model. This observation motivates our primary question: What are the information-theoretic limits of knowledge distillation? To this end, we leverage Partial Information Decomposition to quantify and explain the transferred knowledge and knowledge left to distill for a downstream task. We theoretically demonstrate that the task-relevant transferred knowledge is succinctly captured by the measure of redundant information about the task between the teacher and student. We propose a novel multi-level optimization to incorporate redundant information as a regularizer, leading to our framework of Redundant Information Distillation (RID). RID leads to more resilient and effective distillation under nuisance teachers as it succinctly quantifies task-relevant knowledge rather than simply aligning student and teacher representations.

Updated: 2025-04-04 16:08:36

Domains: stat.ML,cs.CV,cs.IT,cs.LG,eess.IV,math.IT

Download: http://arxiv.org/abs/2411.07483v2

Agentic Knowledgeable Self-awareness

Large Language Models (LLMs) have achieved considerable performance across various agentic planning tasks. However, traditional agent planning approaches adopt a "flood irrigation" methodology that indiscriminately injects gold trajectories, external feedback, and domain knowledge into agent models. This practice overlooks the fundamental human cognitive principle of situational self-awareness during decision-making: the ability to dynamically assess situational demands and strategically employ resources. We propose agentic knowledgeable self-awareness to address this gap, a novel paradigm enabling LLM-based agents to autonomously regulate knowledge utilization. Specifically, we propose KnowSelf, a data-centric approach that equips agents with knowledgeable self-awareness like humans. Concretely, we devise a heuristic situation judgement criterion to mark special tokens on the agent's self-explored trajectories for collecting training data. Through a two-stage training process, the agent model can switch between different situations by generating specific special tokens, achieving optimal planning effects with minimal costs. Our experiments demonstrate that KnowSelf can outperform various strong baselines on different tasks and models with minimal use of external knowledge. Code is available at https://github.com/zjunlp/KnowSelf.

Updated: 2025-04-04 16:03:38

Domains: cs.CL,cs.AI,cs.CV,cs.LG,cs.MA

Download: http://arxiv.org/abs/2504.03553v1

Can Learning Be Explained By Local Optimality In Robust Low-rank Matrix Recovery?

We explore the local landscape of low-rank matrix recovery, focusing on reconstructing a $d_1\times d_2$ matrix $X^\star$ with rank $r$ from $m$ linear measurements, some potentially noisy. When the noise is distributed according to an outlier model, minimizing a nonsmooth $\ell_1$-loss with a simple sub-gradient method can often perfectly recover the ground truth matrix $X^\star$. Given this, a natural question is what optimization property (if any) enables such learning behavior. The most plausible answer is that the ground truth $X^\star$ manifests as a local optimum of the loss function. In this paper, we provide a strong negative answer to this question, showing that, under moderate assumptions, the true solutions corresponding to $X^\star$ do not emerge as local optima, but rather as strict saddle points -- critical points with strictly negative curvature in at least one direction. Our findings challenge the conventional belief that all strict saddle points are undesirable and should be avoided.
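
The nonsmooth objective discussed above is easy to write down. The toy sketch below (illustrative only, with hypothetical dimensions and a rank-1 ground truth) checks that at the ground truth the residual reduces to the sparse outlier noise, which is what makes the $\ell_1$ loss robust, and shows the sign-based subgradient a simple subgradient method would use:

```python
import random

random.seed(0)
d1, d2, m = 5, 4, 30

u_star = [random.gauss(0, 1) for _ in range(d1)]
v_star = [random.gauss(0, 1) for _ in range(d2)]   # ground truth X* = u v^T

A = [[[random.gauss(0, 1) for _ in range(d2)] for _ in range(d1)]
     for _ in range(m)]                            # m linear measurement matrices

def measure(Ai, u, v):
    """<A_i, u v^T> = sum_jk A_i[j][k] * u[j] * v[k]."""
    return sum(Ai[j][k] * u[j] * v[k] for j in range(d1) for k in range(d2))

noise = [0.0] * m
noise[3], noise[17] = 5.0, -2.5                    # sparse outlier corruption
y = [measure(Ai, u_star, v_star) + e for Ai, e in zip(A, noise)]

def l1_loss(u, v):
    """Nonsmooth robust objective: sum_i |<A_i, u v^T> - y_i|."""
    return sum(abs(measure(Ai, u, v) - yi) for Ai, yi in zip(A, y))

def subgradient_wrt_X(u, v):
    """sign(residual)-weighted combination of the A_i: a valid subgradient."""
    sgn = [(1.0 if measure(Ai, u, v) > yi else -1.0) for Ai, yi in zip(A, y)]
    return [[sum(s * Ai[j][k] for s, Ai in zip(sgn, A)) for k in range(d2)]
            for j in range(d1)]

loss_at_truth = l1_loss(u_star, v_star)  # residual there is exactly the noise
```

At the ground truth the loss equals |5.0| + |-2.5| = 7.5, the total outlier magnitude; the paper's finding is that such points are strict saddles of this landscape rather than local optima, even though the subgradient method still recovers them.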

Updated: 2025-04-04 15:57:51

Domains: cs.LG,math.OC

Download: http://arxiv.org/abs/2302.10963v3

Exploring Various Sequential Learning Methods for Deformation History Modeling

Current neural network (NN) models can learn patterns from data points with historical dependence. Specifically, in natural language processing (NLP), sequential learning has transitioned from recurrence-based architectures to transformer-based architectures. However, it is unknown which NN architectures will perform the best on datasets containing deformation history due to mechanical loading. Thus, this study ascertains the appropriateness of 1D-convolutional, recurrent, and transformer-based architectures for predicting deformation localization based on the earlier states in the form of deformation history. Following this investigation, the crucial incompatibility issues between the mathematical computation of the prediction process in the best-performing NN architectures and the actual values derived from the natural physical properties of the deformation paths are examined in detail.

Updated: 2025-04-04 15:52:24

Domains: cs.LG,cs.AI,cs.CE

Download: http://arxiv.org/abs/2504.03818v1

Revisiting MAE pre-training for 3D medical image segmentation

Self-Supervised Learning (SSL) presents an exciting opportunity to unlock the potential of vast, untapped clinical datasets, for various downstream applications that suffer from the scarcity of labeled data. While SSL has revolutionized fields like natural language processing and computer vision, its adoption in 3D medical image computing has been limited by three key pitfalls: Small pre-training dataset sizes, architectures inadequate for 3D medical image analysis, and insufficient evaluation practices. In this paper, we address these issues by i) leveraging a large-scale dataset of 39k 3D brain MRI volumes and ii) using a Residual Encoder U-Net architecture within the state-of-the-art nnU-Net framework. iii) A robust development framework, incorporating 5 development and 8 testing brain MRI segmentation datasets, allowed performance-driven design decisions to optimize the simple concept of Masked Auto Encoders (MAEs) for 3D CNNs. The resulting model not only surpasses previous SSL methods but also outperforms the strong nnU-Net baseline by an average of approximately 3 Dice points setting a new state-of-the-art. Our code and models are made available here.
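
The "simple concept of Masked Auto Encoders" mentioned above boils down to hiding most of the input and reconstructing it. Below is a generic sketch of the masking step, not nnU-Net code; the 75% masking ratio is the common MAE default and is assumed here:

```python
import random

def mae_mask(num_patches, mask_ratio=0.75, seed=0):
    """Split patch indices into a small visible set (fed to the encoder)
    and a large masked set (reconstructed by the decoder)."""
    rng = random.Random(seed)
    idx = list(range(num_patches))
    rng.shuffle(idx)
    n_masked = int(num_patches * mask_ratio)
    return sorted(idx[n_masked:]), sorted(idx[:n_masked])

# A 3D volume tokenized into 4 x 4 x 4 = 64 patches:
visible, masked = mae_mask(64)
```

The pre-training objective then reconstructs the masked patches from the visible ones; the paper's contribution is adapting this recipe to 3D CNNs at scale rather than changing the masking itself.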

Updated: 2025-04-04 15:51:37

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.23132v3

MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation

Multilingual speech translation (ST) in the medical domain enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we present the first systematic study on medical ST, to our best knowledge, by releasing MultiMed-ST, a large-scale ST dataset for the medical domain, spanning all translation directions in five languages: Vietnamese, English, German, French, Traditional Chinese and Simplified Chinese, together with the models. With 290,000 samples, our dataset is the largest medical machine translation (MT) dataset and the largest many-to-many multilingual ST among all domains. Secondly, we present the most extensive analysis study in ST research to date, including: empirical baselines, bilingual-multilingual comparative study, end-to-end vs. cascaded comparative study, task-specific vs. multi-task sequence-to-sequence (seq2seq) comparative study, code-switch analysis, and quantitative-qualitative error analysis. All code, data, and models are available online: https://github.com/leduckhai/MultiMed-ST.

Updated: 2025-04-04 15:49:17

Domains: cs.CL,cs.AI,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2504.03546v1

AutoML Benchmark with shorter time constraints and early stopping

Automated Machine Learning (AutoML) automatically builds machine learning (ML) models on data. The de facto standard for evaluating new AutoML frameworks for tabular data is the AutoML Benchmark (AMLB). AMLB proposed to evaluate AutoML frameworks using 1- and 4-hour time budgets across 104 tasks. We argue that shorter time constraints should be considered for the benchmark because of their practical value, such as when models need to be retrained with high frequency, and to make AMLB more accessible. This work considers two ways in which to reduce the overall computation used in the benchmark: smaller time constraints and the use of early stopping. We conduct evaluations of 11 AutoML frameworks on 104 tasks with different time constraints and find the relative ranking of AutoML frameworks is fairly consistent across time constraints, but that using early-stopping leads to a greater variety in model performance.
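
As a sketch of the early-stopping mechanism studied here, a generic patience-based criterion halts the search once validation scores stop improving (the benchmarked frameworks each implement their own variant; the scores below are invented):

```python
def run_with_early_stopping(scores, patience=3):
    """Stop a model search once `patience` successive evaluations fail to
    improve the best validation score; return (best, evaluations used)."""
    best, since_improve = float("-inf"), 0
    for n, s in enumerate(scores, start=1):
        if s > best:
            best, since_improve = s, 0
        else:
            since_improve += 1
        if since_improve >= patience:
            break
    return best, n

# A search whose validation score plateaus after the 4th candidate:
best, used = run_with_early_stopping(
    [0.71, 0.74, 0.74, 0.78, 0.77, 0.78, 0.76, 0.79])
```

Here the search stops after seven of the eight candidates, keeping the best score of 0.78; the paper's observation is that such truncation saves compute but increases the variance of the resulting model quality.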

Updated: 2025-04-04 15:38:37

Domains: cs.LG

Download: http://arxiv.org/abs/2504.01222v2

A Memristive Based Design of a Core Digital Circuit for Elliptic Curve Cryptography

The newly emerging non-volatile memory (NVM) devices known as memristors are promising candidates for future digital architectures, owing to their nanoscale size and their ability to integrate with existing CMOS technology. In this paper, memristor devices and CMOS transistors work together to form a hybrid CMOS-memristor circuit for the XAX-Module, a core element of the finite field multiplier. The proposed design was implemented using a Pt/TaOx/Ta memristor device and simulated in Cadence Virtuoso. The simulation results demonstrate the design's functionality. The proposed module appears to be efficient in terms of layout area, delay, and power consumption, since the design utilizes hybrid CMOS/memristor gates.

Updated: 2025-04-04 15:33:24

Domains: cs.CR,cs.ET

Download: http://arxiv.org/abs/2203.14358v2

Performance Modeling of Data Storage Systems using Generative Models

High-precision modeling of systems is one of the main areas of industrial data analysis. Models of systems, their digital twins, are used to predict their behavior under various conditions. We have developed several models of a storage system using machine learning-based generative models. The system consists of several components: hard disk drive (HDD) and solid-state drive (SSD) storage pools with different RAID schemes, and a cache. Each storage component is represented by a probabilistic model that describes the probability distribution of the component's performance, in terms of IOPS and latency, depending on its configuration and external data load parameters. The experiments show errors of 4-10% for IOPS and 3-16% for latency predictions, depending on the components and models of the system. The predictions show up to 0.99 Pearson correlation with Little's law, which can be used for unsupervised reliability checks of the models. In addition, we present novel datasets that can be used for benchmarking regression algorithms, conditional generative models, and uncertainty estimation methods in machine learning.
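
The Little's-law check mentioned above is straightforward to reproduce: for a closed-loop workload, queue depth = IOPS x latency, so consistent predictions reconstruct the offered concurrency. The sketch below uses hypothetical numbers, not the paper's dataset:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Little's law: queue_depth = IOPS * latency. If a model's predicted IOPS and
# latency are mutually consistent, their product should track the queue depth.
queue_depth = [1, 2, 4, 8, 16, 32]
pred_iops = [980, 1900, 3600, 6400, 10000, 13000]               # hypothetical
pred_latency = [0.00102, 0.00106, 0.00114, 0.00123, 0.00166, 0.00251]  # sec
reconstructed = [i * l for i, l in zip(pred_iops, pred_latency)]
r = pearson(queue_depth, reconstructed)
```

A correlation near 1 indicates the two predicted quantities obey the queueing identity, which is the unsupervised reliability check the abstract refers to.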

Updated: 2025-04-04 15:30:20

Domains: cs.LG,cs.AI,cs.PF

Download: http://arxiv.org/abs/2307.02073v2

Dense Neural Network Based Arrhythmia Classification on Low-cost and Low-compute Micro-controller

The electrocardiogram (ECG) monitoring device is an expensive albeit essential device for the treatment and diagnosis of cardiovascular diseases (CVD). The cost of this device typically ranges from $2000 to $10000. Several studies have implemented ECG monitoring systems in micro-controller units (MCU) to reduce industrial development costs by up to 20 times. However, to match industry-grade systems and display heartbeats effectively, it is essential to develop an efficient algorithm for detecting arrhythmia (irregular heartbeat). Hence, in this study, a dense neural network is developed to detect arrhythmia on the Arduino Nano. The Nano consists of the ATMega328 microcontroller with a 16MHz clock, 2KB of SRAM, and 32KB of program memory. Additionally, the AD8232 SparkFun Single-Lead Heart Rate Monitor is used as the ECG sensor. The implemented neural network model consists of two layers (excluding the input) with 10 and 4 neurons, respectively, with a sigmoid activation function; four approaches are explored to choose the appropriate activation functions. The model has a size of 1.267 KB, achieves a macro-average F1 score of 78.3% for classifying four types of arrhythmia, reaches an accuracy of 96.38%, and requires 0.001314 MOps of floating-point operations (FLOPs).
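
The described network is small enough to sketch end to end: two dense layers (10 and 4 neurons) with sigmoid activations. The input dimension of 32 features per beat is a hypothetical stand-in, since the abstract does not state it:

```python
import math, random

def dense(x, W, b):
    """Fully connected layer followed by a sigmoid activation."""
    return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + bi)))
            for w, bi in zip(W, b)]

def init_layer(n_in, n_out, rng):
    """Random weights, zero biases (initialization scheme is illustrative)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

rng = random.Random(0)
n_in = 32                            # hypothetical number of beat features
W1, b1 = init_layer(n_in, 10, rng)   # hidden layer: 10 neurons
W2, b2 = init_layer(10, 4, rng)      # output layer: 4 arrhythmia classes

probs = dense(dense([0.1] * n_in, W1, b1), W2, b2)
n_params = n_in * 10 + 10 + 10 * 4 + 4   # = 374 weights and biases
```

With so few parameters the weight table fits comfortably in the Nano's 2KB of SRAM, which is the point of the dense architecture chosen in the paper.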

Updated: 2025-04-04 15:30:02

Domains: cs.LG,cs.AI,I.2.1; I.2.6; C.3

Download: http://arxiv.org/abs/2504.03531v1

Lightweight Learning for Grant-Free Activity Detection in Cell-Free Massive MIMO Networks

Grant-free random access (GF-RA) is a promising access technique for massive machine-type communications (mMTC) in future wireless networks, particularly in the context of 5G and beyond (6G) systems. Within the context of GF-RA, this study investigates the efficiency of employing supervised machine learning techniques to tackle the challenges of device activity detection (AD). GF-RA addresses scalability by employing non-orthogonal pilot sequences, providing an efficient alternative compared to the conventional grant-based random access (GB-RA) technique, which is constrained by the scarcity of orthogonal preamble resources. In this paper, we propose a novel lightweight data-driven algorithmic framework specifically designed for activity detection in GF-RA for mMTC in cell-free massive multiple-input multiple-output (CF-mMIMO) networks. We propose two distinct framework deployment strategies, centralized and decentralized, both tailored to streamline the implementation of the proposed approach across network infrastructures. Moreover, we introduce optimized post-detection methodologies, complemented by a clustering stage, to enhance overall detection performance. Our 3GPP-compliant simulations validate that the proposed algorithm achieves state-of-the-art model-based activity detection accuracy while significantly reducing complexity. Achieving 99% accuracy, it demonstrates real-world viability and effectiveness.

Updated: 2025-04-04 15:24:36

Domains: eess.SP,cs.LG

Download: http://arxiv.org/abs/2503.11305v2

Programming Distributed Collective Processes in the eXchange Calculus

Recent trends like the Internet of Things (IoT) suggest a vision of dense and multi-scale deployments of computing devices in nearly all kinds of environments. A prominent engineering challenge revolves around programming the collective adaptive behaviour of such computational ecosystems. This requires abstractions able to capture concepts like ensembles (dynamic groups of cooperating devices) and collective tasks (joint activities carried out by ensembles). In this work, we consider collections of devices that interact with neighbours and execute in nearly-synchronised sense-compute-interact rounds, where the computation is given by a single program mapping sensing values and incoming messages to outputs and outgoing messages. To support programming whole computational collectives, we propose the abstraction of a distributed collective process, which can be used to define at once the ensemble formation logic and its collective task. We formalise the abstraction in the eXchange Calculus (XC), a core functional language based on neighbouring values (maps from neighbours to values) where state and interaction are handled through a single primitive, exchange, and provide a corresponding implementation in the FCPP language. Then, we exercise distributed collective processes using two case studies: multi-hop message propagation and distributed monitoring of spatial properties. Finally, we discuss the features of the abstraction and its suitability for different kinds of distributed computing applications.
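
The multi-hop propagation case study can be sketched in round-based style: plain Python dicts stand in for XC's neighbouring values, and in each round every device folds over the values exported by its neighbours. This illustrates the execution model only; it is not XC or FCPP code:

```python
import math

# Devices on a line: 0 - 1 - 2 - 3, with device 0 as the source.
neighbours = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
source = {0}

def round_step(state):
    """One sense-compute-interact round: every device folds over the
    neighbouring values it received and exports its new value."""
    return {d: 0.0 if d in source
            else 1.0 + min(state[n] for n in neighbours[d])
            for d in neighbours}

state = {d: math.inf for d in neighbours}
for _ in range(len(neighbours)):     # repeat rounds until the field stabilizes
    state = round_step(state)
```

After one round per device the hop-count field has propagated across the whole line, mirroring how a distributed collective process spreads through an ensemble one neighbourhood per round.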

Updated: 2025-04-04 15:23:08

Domains: cs.DC,cs.AI,cs.MA,cs.PL,D.1.3; F.1.1; F.4.3; I.2.11; J.7

Download: http://arxiv.org/abs/2401.11212v3

Dexterous Manipulation through Imitation Learning: A Survey

Dexterous manipulation, which refers to the ability of a robotic hand or multi-fingered end-effector to skillfully control, reorient, and manipulate objects through precise, coordinated finger movements and adaptive force modulation, enables complex interactions similar to human hand dexterity. With recent advances in robotics and machine learning, there is a growing demand for these systems to operate in complex and unstructured environments. Traditional model-based approaches struggle to generalize across tasks and object variations due to the high-dimensionality and complex contact dynamics of dexterous manipulation. Although model-free methods such as reinforcement learning (RL) show promise, they require extensive training, large-scale interaction data, and carefully designed rewards for stability and effectiveness. Imitation learning (IL) offers an alternative by allowing robots to acquire dexterous manipulation skills directly from expert demonstrations, capturing fine-grained coordination and contact dynamics while bypassing the need for explicit modeling and large-scale trial-and-error. This survey provides an overview of dexterous manipulation methods based on imitation learning (IL), details recent advances, and addresses key challenges in the field. Additionally, it explores potential research directions to enhance IL-driven dexterous manipulation. Our goal is to offer researchers and practitioners a comprehensive introduction to this rapidly evolving domain.

Updated: 2025-04-04 15:14:38

Domains: cs.RO,cs.LG

Download: http://arxiv.org/abs/2504.03515v1

Data-Efficient Kernel Methods for Learning Differential Equations and Their Solution Operators: Algorithms and Error Analysis

We introduce a novel kernel-based framework for learning differential equations and their solution maps that is efficient in data requirements, in terms of solution examples and amount of measurements from each example, and computational cost, in terms of training procedures. Our approach is mathematically interpretable and backed by rigorous theoretical guarantees in the form of quantitative worst-case error bounds for the learned equation. Numerical benchmarks demonstrate significant improvements in computational complexity and robustness while achieving one to two orders of magnitude improvements in terms of accuracy compared to state-of-the-art algorithms.
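
As background on the kernel machinery, a minimal kernel interpolant of a solution from a handful of measurements looks as follows. This is a generic kernel ridge sketch with an RBF kernel; it illustrates the data efficiency of kernel methods but is not the paper's operator-learning algorithm, and all parameter values are assumptions:

```python
import math

def rbf(x, y, ell=1.0):
    """Gaussian (RBF) kernel with length scale `ell`."""
    return math.exp(-(x - y) ** 2 / (2 * ell ** 2))

def solve(Amat, rhs):
    """Gaussian elimination with partial pivoting, for small dense systems."""
    n = len(rhs)
    M = [row[:] + [bi] for row, bi in zip(Amat, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Kernel interpolant of a solution u from only five measurements:
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
us = [math.sin(x) for x in xs]            # measured solution values
lam = 1e-8                                # small ridge regularizer
K = [[rbf(xi, xj) + (lam if i == j else 0.0) for j, xj in enumerate(xs)]
     for i, xi in enumerate(xs)]
alpha = solve(K, us)
u_hat = lambda x: sum(a * rbf(x, xi) for a, xi in zip(alpha, xs))
err = abs(u_hat(0.75) - math.sin(0.75))   # error at an unmeasured point
```

Five samples already pin the smooth solution down between the measurement points; the paper builds on this representer-theorem structure to learn the equation and its solution operator with worst-case error guarantees.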

Updated: 2025-04-04 15:13:38

Domains: stat.ML,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2503.01036v2

The Central Role of the Loss Function in Reinforcement Learning

This paper illustrates the central role of loss functions in data-driven decision making, providing a comprehensive survey on their influence in cost-sensitive classification (CSC) and reinforcement learning (RL). We demonstrate how different regression loss functions affect the sample efficiency and adaptivity of value-based decision making algorithms. Across multiple settings, we prove that algorithms using the binary cross-entropy loss achieve first-order bounds scaling with the optimal policy's cost and are much more efficient than the commonly used squared loss. Moreover, we prove that distributional algorithms using the maximum likelihood loss achieve second-order bounds scaling with the policy variance and are even sharper than first-order bounds. This in particular proves the benefits of distributional RL. We hope that this paper serves as a guide analyzing decision making algorithms with varying loss functions, and can inspire the reader to seek out better loss functions to improve any decision making algorithm.
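
One intuition behind the contrast above can be seen from the gradients with respect to the logit: squaring a sigmoid output multiplies the gradient by a vanishing sigma'(z) factor, while binary cross-entropy does not. This sketch illustrates that mechanism only; it is not the paper's first/second-order bound analysis:

```python
import math

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

def grad_bce(z, y):
    """d/dz of binary cross-entropy applied to sigmoid(z): sigma(z) - y."""
    return sigmoid(z) - y

def grad_squared(z, y):
    """d/dz of (sigmoid(z) - y)^2: the gradient carries a sigma'(z) factor."""
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)

# A confidently wrong prediction (large logit, label 0): the squared-loss
# gradient is crushed by the vanishing sigma'(z), while BCE's is not.
z, y = 8.0, 0.0
g_bce, g_sq = grad_bce(z, y), grad_squared(z, y)
```

Near saturation the squared loss barely corrects a confidently wrong value estimate, one concrete way a regression loss can hurt the sample efficiency of value-based methods.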

Updated: 2025-04-04 15:09:19

Domains: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2409.12799v3

VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

Due to privacy restrictions, there's a shortage of publicly available speech recognition datasets in the medical domain. In this work, we present VietMed - a Vietnamese speech recognition dataset in the medical domain comprising 16h of labeled medical speech, 1000h of unlabeled medical speech and 1200h of unlabeled general-domain speech. To our best knowledge, VietMed is by far the world's largest public medical speech recognition dataset in 7 aspects: total duration, number of speakers, diseases, recording conditions, speaker roles, unique medical terms and accents. VietMed is also by far the largest public Vietnamese speech dataset in terms of total duration. Additionally, we are the first to present a medical ASR dataset covering all ICD-10 disease groups and all accents within a country. Moreover, we release the first public large-scale pre-trained models for Vietnamese ASR, w2v2-Viet and XLSR-53-Viet, along with the first public large-scale fine-tuned models for medical ASR. Even without any medical data in unsupervised pre-training, our best pre-trained model XLSR-53-Viet generalizes very well to the medical domain by outperforming state-of-the-art XLSR-53, from 51.8% to 29.6% WER on test set (a relative reduction of more than 40%). All code, data and models are made publicly available: https://github.com/leduckhai/MultiMed/tree/master/VietMed.

Updated: 2025-04-04 15:06:21

Categories: cs.CL,cs.AI,eess.AS

Download: http://arxiv.org/abs/2404.05659v3

Hierarchical Knowledge Structuring for Effective Federated Learning in Heterogeneous Environments

Federated learning enables collaborative model training across distributed entities while maintaining individual data privacy. A key challenge in federated learning is balancing the personalization of models for local clients with generalization for the global model. Recent efforts leverage logit-based knowledge aggregation and distillation to overcome these issues. However, due to the non-IID nature of data across diverse clients and the imbalance in the client's data distribution, directly aggregating the logits often produces biased knowledge that fails to apply to individual clients and obstructs the convergence of local training. To solve this issue, we propose a Hierarchical Knowledge Structuring (HKS) framework that formulates sample logits into a multi-granularity codebook to represent logits from personalized per-sample insights to globalized per-class knowledge. The unsupervised bottom-up clustering method is leveraged to enable the global server to provide multi-granularity responses to local clients. These responses allow local training to integrate supervised learning objectives with global generalization constraints, which results in more robust representations and improved knowledge sharing in subsequent training rounds. The proposed framework's effectiveness is validated across various benchmarks and model architectures.
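
A hypothetical sketch of the coarsest level of such a codebook only: collapsing per-sample logits into globalized per-class knowledge by averaging. The intermediate granularities, which the paper builds with unsupervised bottom-up clustering, are elided, and the function name is invented:

```python
def per_class_codebook(samples):
    """samples: list of (label, logit_vector) pairs."""
    sums, counts = {}, {}
    for label, logits in samples:
        if label not in sums:
            sums[label] = [0.0] * len(logits)
            counts[label] = 0
        # Accumulate logits per class, then average below.
        sums[label] = [s + v for s, v in zip(sums[label], logits)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

book = per_class_codebook([("cat", [2.0, 0.0]), ("cat", [4.0, 2.0]),
                           ("dog", [0.0, 3.0])])
print(book["cat"])  # [3.0, 1.0]
```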

Updated: 2025-04-04 15:06:02

Categories: cs.LG

Download: http://arxiv.org/abs/2504.03505v1

Structured Matrix Learning under Arbitrary Entrywise Dependence and Estimation of Markov Transition Kernel

The problem of structured matrix estimation has been studied mostly under strong noise dependence assumptions. This paper considers a general framework of noisy low-rank-plus-sparse matrix recovery, where the noise matrix may come from any joint distribution with arbitrary dependence across entries. We propose an incoherent-constrained least-square estimator and prove its tightness both in the sense of deterministic lower bound and matching minimax risks under various noise distributions. To attain this, we establish a novel result asserting that the difference between two arbitrary low-rank incoherent matrices must spread energy out across its entries; in other words, it cannot be too sparse, which sheds light on the structure of incoherent low-rank matrices and may be of independent interest. We then showcase the applications of our framework to several important statistical machine learning problems. In the problem of estimating a structured Markov transition kernel, the proposed method achieves the minimax optimality and the result can be extended to estimating the conditional mean operator, a crucial component in reinforcement learning. The applications to multitask regression and structured covariance estimation are also presented. We propose an alternating minimization algorithm to approximately solve the potentially hard optimization problem. Numerical results corroborate the effectiveness of our method which typically converges in a few steps.

Updated: 2025-04-04 15:02:24

Categories: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2401.02520v2

FEASE: Shallow AutoEncoding Recommender with Cold Start Handling via Side Features

User and item cold starts present significant challenges in industrial applications of recommendation systems. Supplementing user-item interaction data with metadata is a common solution, but often at the cost of introducing additional biases. In this work, we introduce an augmented EASE model, i.e. FEASE, that seamlessly integrates both user and item side information to address these cold start issues. Our straightforward, autoencoder-based method produces a closed-form solution that leverages rich content signals for cold items while refining user representations in data-sparse environments. Importantly, our method strikes a balance by effectively recommending cold start items and handling cold start users without incurring extra bias, and it maintains strong performance in warm settings. Experimental results demonstrate improved recommendation accuracy and robustness compared to previous collaborative filtering approaches. Moreover, our model serves as a strong baseline for future comparative studies.
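
FEASE builds on EASE (Steck's shallow autoencoder), whose item-weight matrix has the well-known closed form B = I - P * diagMat(1/diag(P)) with P = (X^T X + lam*I)^-1 and a zero diagonal. A minimal sketch of that plain EASE core on a toy interaction matrix, without the side-feature augmentation the paper adds:

```python
def inverse(M):
    # Small Gauss-Jordan inverse so the sketch has no dependencies.
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return [row[n:] for row in A]

def ease_weights(X, lam=1.0):
    # B = I - P * diagMat(1 / diag(P)), with P = (X^T X + lam*I)^-1.
    n_users, n_items = len(X), len(X[0])
    G = [[sum(X[u][i] * X[u][j] for u in range(n_users))
          + (lam if i == j else 0.0)
          for j in range(n_items)] for i in range(n_items)]
    P = inverse(G)
    return [[0.0 if i == j else -P[i][j] / P[j][j]
             for j in range(n_items)] for i in range(n_items)]

X = [[1, 1, 0], [1, 0, 1], [1, 1, 0]]  # 3 users x 3 items
B = ease_weights(X)
print(B[0][0])  # the learned diagonal is constrained to zero
```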

Updated: 2025-04-04 15:01:46

Categories: cs.IR,cs.LG

Download: http://arxiv.org/abs/2504.02288v2

When is using AI the rational choice? The importance of counterfactuals in AI deployment decisions

Decisions to deploy AI capabilities are often driven by counterfactuals - a comparison of decisions made using AI to decisions that would have been made if the AI were not used. Counterfactual misses, which are poor decisions that are attributable to using AI, may have disproportionate disutility to AI deployment decision makers. Counterfactual hits, which are good decisions attributable to AI usage, may provide little benefit beyond the benefit of better decisions. This paper explores how to include counterfactual outcomes into usage decision expected utility assessments. Several properties emerge when counterfactuals are explicitly included. First, there are many contexts where the expected utility of AI usage is positive for intended beneficiaries and strongly negative for stakeholders and deployment decision makers. Second, high levels of complementarity, where differing AI and user assessments are merged beneficially, often leads to substantial disutility for stakeholders. Third, apparently small changes in how users interact with an AI capability can substantially impact stakeholder utility. Fourth, cognitive biases such as expert overconfidence and hindsight bias exacerbate the perceived frequency of costly counterfactual misses. The expected utility assessment approach presented here is intended to help AI developers and deployment decision makers to navigate the subtle but substantial impact of counterfactuals so as to better ensure that beneficial AI capabilities are used.
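
The core accounting can be sketched with invented numbers: split outcomes into both-right, counterfactual hits (AI right where the unaided human would have erred), counterfactual misses (the reverse), and both-wrong, and give misses extra disutility. All probabilities and utilities below are illustrative assumptions, not the paper's:

```python
def deployment_utility(p_both_right, p_cf_hit, p_cf_miss, p_both_wrong,
                       u_right=1.0, u_wrong=-1.0, cf_miss_penalty=-4.0):
    # A counterfactual miss (a poor decision attributable to the AI) carries
    # extra disutility beyond an ordinary wrong decision.
    return (p_both_right * u_right
            + p_cf_hit * u_right
            + p_cf_miss * (u_wrong + cf_miss_penalty)
            + p_both_wrong * u_wrong)

# Accuracy-only view vs counterfactual-aware view of the same deployment:
plain = deployment_utility(0.70, 0.08, 0.05, 0.17, cf_miss_penalty=0.0)
aware = deployment_utility(0.70, 0.08, 0.05, 0.17)
print(plain, aware)  # the miss penalty erodes the apparent benefit
```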

Updated: 2025-04-04 14:59:29

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2504.05333v1

Operator Learning: A Statistical Perspective

Operator learning has emerged as a powerful tool in scientific computing for approximating mappings between infinite-dimensional function spaces. A primary application of operator learning is the development of surrogate models for the solution operators of partial differential equations (PDEs). These methods can also be used to develop black-box simulators to model system behavior from experimental data, even without a known mathematical model. In this article, we begin by formalizing operator learning as a function-to-function regression problem and review some recent developments in the field. We also discuss PDE-specific operator learning, outlining strategies for incorporating physical and mathematical constraints into architecture design and training processes. Finally, we end by highlighting key future directions such as active data collection and the development of rigorous uncertainty quantification frameworks.

Updated: 2025-04-04 14:58:45

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2504.03503v1

Learning Dual-Arm Coordination for Grasping Large Flat Objects

Grasping large flat objects, such as books or keyboards lying horizontally, presents significant challenges for single-arm robotic systems, often requiring extra actions like pushing objects against walls or moving them to the edge of a surface to facilitate grasping. In contrast, dual-arm manipulation, inspired by human dexterity, offers a more refined solution by directly coordinating both arms to lift and grasp the object without the need for complex repositioning. In this paper, we propose a model-free deep reinforcement learning (DRL) framework to enable dual-arm coordination for grasping large flat objects. We utilize a large-scale grasp pose detection model as a backbone to extract high-dimensional features from input images, which are then used as the state representation in a reinforcement learning (RL) model. A CNN-based Proximal Policy Optimization (PPO) algorithm with shared Actor-Critic layers is employed to learn coordinated dual-arm grasp actions. The system is trained and tested in Isaac Gym and deployed to real robots. Experimental results demonstrate that our policy can effectively grasp large flat objects without requiring additional maneuvers. Furthermore, the policy exhibits strong generalization capabilities, successfully handling unseen objects. Importantly, it can be directly transferred to real robots without fine-tuning, consistently outperforming baseline methods.

Updated: 2025-04-04 14:55:46

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2504.03500v1

Optimistic Learning for Communication Networks

AI/ML-based tools are at the forefront of resource management solutions for communication networks. Deep learning, in particular, is highly effective in facilitating fast and high-performing decision-making whenever representative training data is available to build offline accurate models. Conversely, online learning solutions do not require training and enable adaptive decisions based on runtime observations, but are often overly conservative. This extensive tutorial proposes the use of optimistic learning (OpL) as a decision engine for resource management frameworks in modern communication systems. When properly designed, such solutions can achieve fast and high-performing decisions -- comparable to offline-trained models -- while preserving the robustness and performance guarantees of the respective online learning approaches. We introduce the fundamental concepts, algorithms and results of OpL, discuss the roots of this theory and present different approaches to defining and achieving optimism. We proceed to showcase how OpL can enhance resource management in communication networks for several key problems such as caching, edge computing, network slicing, and workload assignment in decentralized O-RAN platforms. Finally, we discuss the open challenges that must be addressed to unlock the full potential of this new resource management approach.
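
One of the simplest optimistic algorithms, sketched here under the common assumption that the hint for round t is the gradient from round t-1 (the tutorial covers far more general designs; the problem here is unconstrained, so projections are omitted):

```python
def optimistic_ogd(gradients, eta=0.1):
    # Optimistic online gradient descent: play using a hint (the predicted
    # next gradient), then do the standard OGD bookkeeping update.
    y, hint = 0.0, 0.0
    decisions = []
    for g in gradients:
        x = y - eta * hint   # act optimistically using the hint
        decisions.append(x)
        y = y - eta * g      # standard OGD iterate update
        hint = g             # next hint: the last observed gradient
    return decisions

# With slowly varying gradients the hint is accurate, so the played points
# track the OGD iterate closely.
print(optimistic_ogd([1.0, 1.0, 1.0]))
```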

Updated: 2025-04-04 14:55:27

Categories: cs.NI,cs.LG

Download: http://arxiv.org/abs/2504.03499v1

Hybrid Real- and Complex-valued Neural Network Architecture

We propose a hybrid real- and complex-valued neural network (HNN) architecture, designed to combine the computational efficiency of real-valued processing with the ability to effectively handle complex-valued data. We illustrate the limitations of using real-valued neural networks (RVNNs) for inherently complex-valued problems by showing how it learnt to perform complex-valued convolution, but with notable inefficiencies stemming from its real-valued constraints. To create the HNN, we propose to use building blocks containing both real- and complex-valued paths, where information between domains is exchanged through domain conversion functions. We also introduce novel complex-valued activation functions, with higher generalisation and parameterisation efficiency. HNN-specific architecture search techniques are described to navigate the larger solution space. Experiments with the AudioMNIST dataset demonstrate that the HNN reduces cross-entropy loss and consumes less parameters compared to an RVNN for all considered cases. Such results highlight the potential for the use of partially complex-valued processing in neural networks and applications for HNNs in many signal processing domains.
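
The inefficiency the authors observe comes down to real-valued arithmetic having to emulate the complex product; each tap of a complex-valued convolution is an instance of this identity:

```python
def complex_mul(ar, ai, br, bi):
    # (ar + ai*i) * (br + bi*i) = (ar*br - ai*bi) + (ar*bi + ai*br)*i
    return ar * br - ai * bi, ar * bi + ai * br

# Four real multiplies and two adds per complex product; an RVNN that cannot
# share this structure must approximate it with redundant parameters.
print(complex_mul(1.0, 2.0, 3.0, 4.0))  # (-5.0, 10.0)
```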

Updated: 2025-04-04 14:52:44

Categories: cs.LG

Download: http://arxiv.org/abs/2504.03497v1

Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems

Cyber-Physical Systems (CPS) in domains such as manufacturing and energy distribution generate complex time series data crucial for Prognostics and Health Management (PHM). While Deep Learning (DL) methods have demonstrated strong forecasting capabilities, their adoption in industrial CPS remains limited due to insufficient robustness. Existing robustness evaluations primarily focus on formal verification or adversarial perturbations, inadequately representing the complexities encountered in real-world CPS scenarios. To address this, we introduce a practical robustness definition grounded in distributional robustness, explicitly tailored to industrial CPS, and propose a systematic framework for robustness evaluation. Our framework simulates realistic disturbances, such as sensor drift, noise and irregular sampling, enabling thorough robustness analyses of forecasting models on real-world CPS datasets. The robustness definition provides a standardized score to quantify and compare model performance across diverse datasets, assisting in informed model selection and architecture design. Through extensive empirical studies evaluating prominent DL architectures (including recurrent, convolutional, attention-based, modular, and structured state-space models), we demonstrate the applicability and effectiveness of our approach. We publicly release our robustness benchmark to encourage further research and reproducibility.
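
Disturbance generators in the spirit of the framework might look like the following; the slope, sigma and drop rate are invented stand-ins for the benchmark's actual perturbation models:

```python
import random

def add_drift(series, slope=0.01):
    # Linear sensor drift that grows with the time index t.
    return [x + slope * t for t, x in enumerate(series)]

def add_noise(series, sigma=0.1, seed=0):
    # Additive Gaussian measurement noise.
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in series]

def irregular_sampling(series, keep_prob=0.8, seed=0):
    # Randomly drop time steps, keeping (t, value) pairs.
    rng = random.Random(seed)
    return [(t, x) for t, x in enumerate(series) if rng.random() < keep_prob]

clean = [0.0] * 5
drifted = add_drift(clean)
print(drifted[-1])  # 4 * slope = 0.04
```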

Updated: 2025-04-04 14:50:48

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2504.03494v1

Diffusion Active Learning: Towards Data-Driven Experimental Design in Computed Tomography

We introduce Diffusion Active Learning, a novel approach that combines generative diffusion modeling with data-driven sequential experimental design to adaptively acquire data for inverse problems. Although the approach is broadly applicable, we focus on scientific computed tomography (CT) for experimental validation, where structured prior datasets are available, and reducing data requirements directly translates to shorter measurement times and lower X-ray doses. We first pre-train an unconditional diffusion model on domain-specific CT reconstructions. The diffusion model acts as a learned prior that is data-dependent and captures the structure of the underlying data distribution, which is then used in two ways: It drives the active learning process and also improves the quality of the reconstructions. During the active learning loop, we employ a variant of diffusion posterior sampling to generate conditional data samples from the posterior distribution, ensuring consistency with the current measurements. Using these samples, we quantify the uncertainty in the current estimate to select the most informative next measurement. Our results show substantial reductions in data acquisition requirements, corresponding to lower X-ray doses, while simultaneously improving image reconstruction quality across multiple real-world tomography datasets.
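
Stripped of the diffusion machinery, the selection step reduces to measuring where the posterior samples disagree most. A sketch with plain lists standing in for sampled reconstructions:

```python
from statistics import pvariance

def next_measurement(posterior_samples):
    # posterior_samples: list of sampled reconstructions, one value per
    # candidate measurement location. Pick the location with the highest
    # sample variance, i.e. the most informative next measurement.
    n_locs = len(posterior_samples[0])
    uncert = [pvariance([s[i] for s in posterior_samples])
              for i in range(n_locs)]
    return max(range(n_locs), key=lambda i: uncert[i])

samples = [[1.0, 5.0, 2.0],
           [1.1, 1.0, 2.0],
           [0.9, 9.0, 2.1]]
print(next_measurement(samples))  # location 1, where samples disagree most
```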

Updated: 2025-04-04 14:46:48

Categories: cs.LG

Download: http://arxiv.org/abs/2504.03491v1

BUFF: Bayesian Uncertainty Guided Diffusion Probabilistic Model for Single Image Super-Resolution

Super-resolution (SR) techniques are critical for enhancing image quality, particularly in scenarios where high-resolution imagery is essential yet limited by hardware constraints. Existing diffusion models for SR have relied predominantly on Gaussian models for noise generation, which often fall short when dealing with the complex and variable texture inherent in natural scenes. To address these deficiencies, we introduce the Bayesian Uncertainty Guided Diffusion Probabilistic Model (BUFF). BUFF distinguishes itself by incorporating a Bayesian network to generate high-resolution uncertainty masks. These masks guide the diffusion process, allowing for the adjustment of noise intensity in a manner that is both context-aware and adaptive. This novel approach not only enhances the fidelity of super-resolved images to their original high-resolution counterparts but also significantly mitigates artifacts and blurring in areas characterized by complex textures and fine details. The model demonstrates exceptional robustness against complex noise patterns and showcases superior adaptability in handling textures and edges within images. Empirical evidence, supported by visual results, illustrates the model's robustness, especially in challenging scenarios, and its effectiveness in addressing common SR issues such as blurring. Experimental evaluations conducted on the DIV2K dataset reveal that BUFF achieves a notable improvement, with a +0.61 increase compared to baseline in SSIM on BSD100, surpassing traditional diffusion approaches by an average additional +0.20dB PSNR gain. These findings underscore the potential of Bayesian methods in enhancing diffusion processes for SR, paving the way for future advancements in the field.

Updated: 2025-04-04 14:43:45

Categories: cs.CV,cs.AI,68T45,I.2.10; J.0

Download: http://arxiv.org/abs/2504.03490v1

Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej

Automating legal document drafting can significantly enhance efficiency, reduce manual effort, and streamline legal workflows. While prior research has explored tasks such as judgment prediction and case summarization, the structured generation of private legal documents in the Indian legal domain remains largely unaddressed. To bridge this gap, we introduce VidhikDastaavej, a novel, anonymized dataset of private legal documents, and develop NyayaShilp, a fine-tuned legal document generation model specifically adapted to Indian legal texts. We propose a Model-Agnostic Wrapper (MAW), a two-step framework that first generates structured section titles and then iteratively produces content while leveraging retrieval-based mechanisms to ensure coherence and factual accuracy. We benchmark multiple open-source LLMs, including instruction-tuned and domain-adapted versions, alongside proprietary models for comparison. Our findings indicate that while direct fine-tuning on small datasets does not always yield improvements, our structured wrapper significantly enhances coherence, factual adherence, and overall document quality while mitigating hallucinations. To ensure real-world applicability, we developed a Human-in-the-Loop (HITL) Document Generation System, an interactive user interface that enables users to specify document types, refine section details, and generate structured legal drafts. This tool allows legal professionals and researchers to generate, validate, and refine AI-generated legal documents efficiently. Extensive evaluations, including expert assessments, confirm that our framework achieves high reliability in structured legal drafting. This research establishes a scalable and adaptable foundation for AI-assisted legal drafting in India, offering an effective approach to structured legal document generation.

Updated: 2025-04-04 14:41:50

Categories: cs.CL,cs.AI,cs.IR,cs.LG

Download: http://arxiv.org/abs/2504.03486v1

Gaussian Process Tilted Nonparametric Density Estimation using Fisher Divergence Score Matching

We present three Fisher divergence (FD) minimization algorithms for learning Gaussian process (GP) based score models for lower dimensional density estimation problems. The density is formed by multiplying a base multivariate normal distribution with an exponentiated GP refinement, and so we refer to it as a GP-tilted nonparametric density. By representing the GP part of the score as a linear function using the random Fourier feature (RFF) approximation, we show that all learning problems can be solved in closed form. This includes the basic and noise conditional versions of the Fisher divergence, as well as a novel alternative to noise conditional FD models based on variational inference (VI). Here, we propose using an ELBO-like optimization of the approximate posterior with which we derive a Fisher variational predictive distribution. The RFF representation of the GP, which is functionally equivalent to a single layer neural network score model with cosine activation, provides a unique linear form for which all expectations are in closed form. The Gaussian base also helps with tractability of the VI approximation. We demonstrate our three learning algorithms, as well as a MAP baseline algorithm, on several low dimensional density estimation problems. The closed-form nature of the learning problem removes the reliance on iterative algorithms, making this technique particularly well-suited to large data sets.
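
The RFF approximation the paper relies on is standard: draw frequencies from the kernel's spectral density and use cosine features, so inner products of features approximate the RBF kernel. A dependency-free sketch:

```python
import math
import random

def make_rff(dim, n_features, lengthscale=1.0, seed=0):
    # z(x) = sqrt(2/D) * cos(W x + b), W ~ N(0, 1/lengthscale^2),
    # b ~ Uniform[0, 2*pi]; then z(x) . z(y) approximates the RBF kernel.
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
         for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    def phi(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
                for w, bj in zip(W, b)]
    return phi

phi = make_rff(dim=1, n_features=4000)
approx = sum(u * v for u, v in zip(phi([0.0]), phi([0.5])))
exact = math.exp(-0.5 * 0.5 ** 2)   # RBF kernel value k(0, 0.5)
print(abs(approx - exact))          # small Monte Carlo error
```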

Updated: 2025-04-04 14:41:41

Categories: cs.LG

Download: http://arxiv.org/abs/2504.03485v1

Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?

Large language models (LLMs) are increasingly contributing to the creation of content on the Internet. This creates a feedback loop as subsequent generations of models will be trained on this generated, synthetic data. This phenomenon is receiving increasing interest, in particular because previous studies have shown that it may lead to distribution shift - models misrepresent and forget the true underlying distributions of human data they are expected to approximate (e.g. resulting in a drastic loss of quality). In this study, we study the impact of human data properties on distribution shift dynamics in iterated training loops. We first confirm that the distribution shift dynamics greatly vary depending on the human data by comparing four datasets (two based on Twitter and two on Reddit). We then test whether data quality may influence the rate of this shift. We find that it does on the Twitter datasets, but not on the Reddit datasets. We then focus on a Reddit dataset and conduct a more exhaustive evaluation of a large set of dataset properties. This experiment associated lexical diversity with larger, and semantic diversity with smaller detrimental shifts, suggesting that incorporating text with high lexical (but limited semantic) diversity could exacerbate the degradation of generated text. We then focus on the evolution of political bias, and find that the type of shift observed (bias reduction, amplification or inversion) depends on the political lean of the human (true) distribution. Overall, our work extends the existing literature on the consequences of recursive fine-tuning by showing that this phenomenon is highly dependent on features of the human data on which training occurs. This suggests that different parts of internet (e.g. GitHub, Reddit) may undergo different types of shift depending on their properties.
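
As one invented proxy for the lexical-diversity property studied (the paper's actual metrics may be defined differently), a type-token ratio over whitespace tokens:

```python
def type_token_ratio(text):
    # Lexical diversity proxy: unique tokens divided by total tokens.
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens)

print(type_token_ratio("the cat sat on the mat"))  # 5 unique / 6 tokens
```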

Updated: 2025-04-04 14:41:41

Categories: cs.LG,cs.AI,cs.CL,68T50,I.2.7

Download: http://arxiv.org/abs/2504.03814v1

Discovering Partially Known Ordinary Differential Equations: a Case Study on the Chemical Kinetics of Cellulose Degradation

The degree of polymerization (DP) is one of the methods for estimating the aging of polymer-based insulation systems, such as cellulose insulation in power components. The main degradation mechanisms in polymers are hydrolysis, pyrolysis, and oxidation. These mechanisms combined cause a reduction of the DP. However, the data availability for these types of problems is usually scarce. This study analyzes insulation aging using cellulose degradation data from power transformers. The aging problem for the cellulose immersed in mineral oil inside power transformers is modeled with ordinary differential equations (ODEs). We recover the governing equations of the degradation system using Physics-Informed Neural Networks (PINNs) and symbolic regression. We apply PINNs to discover the Arrhenius equation's unknown parameters in the Ekenstam ODE describing cellulose contamination content and the material aging process related to temperature for synthetic data and real DP values. A modification of the Ekenstam ODE is given by Emsley's system of ODEs, where the rate constant expressed by the Arrhenius equation decreases in time with the new formulation. We use PINNs and symbolic regression to recover the functional form of one of the ODEs of the system and to identify an unknown parameter.
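
The named ingredients are classical: the integrated Ekenstam relation 1/DP(t) - 1/DP0 = k*t with an Arrhenius rate k = A * exp(-Ea / (R*T)). A sketch with invented parameter values, purely for illustration:

```python
import math

R = 8.314  # gas constant, J/(mol K)

def arrhenius_k(A, Ea, T):
    # Arrhenius rate constant for pre-exponential factor A, activation
    # energy Ea (J/mol) and absolute temperature T (K).
    return A * math.exp(-Ea / (R * T))

def dp_ekenstam(t, DP0, k):
    # Integrated Ekenstam relation: 1/DP(t) - 1/DP0 = k * t.
    return 1.0 / (1.0 / DP0 + k * t)

# Invented parameters; real values would be fitted from DP measurements.
k = arrhenius_k(A=1e8, Ea=111e3, T=363.0)  # roughly 90 degrees Celsius
print(dp_ekenstam(0.0, 1000.0, k))          # DP0 recovered at t = 0
```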

Updated: 2025-04-04 14:41:24

Categories: cs.LG

Download: http://arxiv.org/abs/2504.03484v1

Online Traffic Density Estimation using Physics-Informed Neural Networks

Recent works on the application of Physics-Informed Neural Networks to traffic density estimation have shown to be promising for future developments due to their robustness to model errors and noisy data. In this paper, we introduce a methodology for online approximation of the traffic density using measurements from probe vehicles in two settings: one using the Greenshield model and the other considering a high-fidelity traffic simulation. The proposed method continuously estimates the real-time traffic density in space and performs model identification with each new set of measurements. The density estimate is updated in almost real-time using gradient descent and adaptive weights. In the case of full model knowledge, the resulting algorithm has similar performance to the classical open-loop one. However, in the case of model mismatch, the iterative solution behaves as a closed-loop observer and outperforms the baseline method. Similarly, in the high-fidelity setting, the proposed algorithm correctly reproduces the traffic characteristics.
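
For reference, the Greenshield model mentioned above ties speed linearly to density, v(rho) = v_f * (1 - rho/rho_max), and the flux rho * v(rho) entering the LWR conservation law peaks at half the jam density. Constants below are illustrative:

```python
V_F, RHO_MAX = 30.0, 120.0  # free-flow speed and jam density (illustrative)

def speed(rho):
    # Greenshield: speed falls linearly from V_F to 0 as density rises.
    return V_F * (1.0 - rho / RHO_MAX)

def flux(rho):
    # Traffic flux q(rho) = rho * v(rho); the PDE is rho_t + q(rho)_x = 0.
    return rho * speed(rho)

# Flux is maximized at half the jam density and symmetric around it.
print(flux(60.0), flux(30.0), flux(90.0))  # 900.0 675.0 675.0
```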

Updated: 2025-04-04 14:41:22

Domains: cs.LG,math.OC

Download: http://arxiv.org/abs/2504.03483v1

Probabilistic Machine Learning for Noisy Labels in Earth Observation

Label noise poses a significant challenge in Earth Observation (EO), often degrading the performance and reliability of supervised Machine Learning (ML) models. Yet, given the critical nature of several EO applications, developing robust and trustworthy ML solutions is essential. In this study, we take a step in this direction by leveraging probabilistic ML to model input-dependent label noise and quantify data uncertainty in EO tasks, accounting for the unique noise sources inherent in the domain. We train uncertainty-aware probabilistic models across a broad range of high-impact EO applications, spanning diverse noise sources, input modalities, and ML configurations, and introduce a dedicated pipeline to assess their accuracy and reliability. Our experimental results show that the uncertainty-aware models consistently outperform the standard deterministic approaches across most datasets and evaluation metrics. Moreover, through rigorous uncertainty evaluation, we validate the reliability of the predicted uncertainty estimates, enhancing the interpretability of model predictions. Our findings emphasize the importance of modeling label noise and incorporating uncertainty quantification in EO, paving the way for more accurate, reliable, and trustworthy ML solutions in the field.
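
Input-dependent label noise of the kind described above is commonly modeled with a heteroscedastic Gaussian likelihood, where the model predicts both a mean and a per-input variance. A minimal sketch (a generic illustration, not the paper's exact models):

```python
import math

def gaussian_nll(y, mu, log_var):
    # Heteroscedastic negative log-likelihood: the model predicts a mean mu and an
    # input-dependent log-variance, so noisy labels can be down-weighted automatically
    return 0.5 * (log_var + (y - mu) ** 2 / math.exp(log_var)) + 0.5 * math.log(2 * math.pi)

# Same residual, different predicted noise level: admitting higher variance on a
# noisy label yields a lower loss than claiming to be certain
confident_wrong = gaussian_nll(y=3.0, mu=1.0, log_var=0.0)
hedged = gaussian_nll(y=3.0, mu=1.0, log_var=1.5)
```

The log-variance term penalizes blanket overestimation of noise, so the optimum is a calibrated, input-dependent uncertainty rather than uniform hedging.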

Updated: 2025-04-04 14:36:33

Domains: cs.LG,cs.CV

Download: http://arxiv.org/abs/2504.03478v1

Asymptotically efficient adaptive identification under saturated output observation

As saturated output observations are ubiquitous in practice, identifying stochastic systems with such nonlinear observations is a fundamental problem across various fields. This paper investigates the asymptotically efficient identification problem for stochastic dynamical systems with saturated output observations. In contrast to most of the existing results, our results do not need the commonly used but stringent conditions such as periodicity or independence assumptions on the system signals, and thus do not exclude applications to stochastic feedback systems. To be specific, we introduce a new adaptive Newton-type algorithm on the negative log-likelihood of the partially observed samples using a two-step design technique. Under some general excitation data conditions, we show that the parameter estimate is strongly consistent and asymptotically normal by employing the stochastic Lyapunov function method and limit theories for martingales. Furthermore, we show that the mean square error of the estimates can achieve the Cramér-Rao bound asymptotically without resorting to i.i.d. data assumptions. This indicates that the performance of the proposed algorithm is the best possible that one can expect in general. A numerical example is provided to illustrate the superiority of our new adaptive algorithm over existing related ones in the literature.

Updated: 2025-04-04 14:29:51

Domains: cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2309.09454v3

Predictive Traffic Rule Compliance using Reinforcement Learning

Autonomous vehicle path planning has reached a stage where safety and regulatory compliance are crucial. This paper presents an approach that integrates a motion planner with a deep reinforcement learning model to predict potential traffic rule violations. Our main innovation is replacing the standard actor network in an actor-critic method with a motion planning module, which ensures both stable and interpretable trajectory generation. In this setup, we use traffic rule robustness as the reward to train a reinforcement learning agent's critic, and the output of the critic is directly used as the cost function of the motion planner, which guides the choices of the trajectory. We incorporate some key interstate rules from the German Road Traffic Regulation into a rule book and use a graph-based state representation to handle complex traffic information. Experiments on an open German highway dataset show that the model can predict and prevent traffic rule violations beyond the planning horizon, increasing safety and rule compliance in challenging traffic scenarios.

Updated: 2025-04-04 14:28:47

Domains: cs.RO,cs.AI,I.2.9; I.2.6

Download: http://arxiv.org/abs/2503.22925v2

Trust Me, I Know the Way: Predictive Uncertainty in the Presence of Shortcut Learning

The correct way to quantify predictive uncertainty in neural networks remains a topic of active discussion. In particular, it is unclear whether the state-of-the art entropy decomposition leads to a meaningful representation of model, or epistemic, uncertainty (EU) in the light of a debate that pits ignorance against disagreement perspectives. We aim to reconcile the conflicting viewpoints by arguing that both are valid but arise from different learning situations. Notably, we show that the presence of shortcuts is decisive for EU manifesting as disagreement.
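
The disagreement perspective on epistemic uncertainty referenced above corresponds to the standard ensemble entropy decomposition, which can be sketched directly (a generic illustration, not the paper's experiments):

```python
import math

def entropy(p):
    # Shannon entropy of a discrete distribution (natural log)
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def decompose(member_probs):
    # Total uncertainty: entropy of the ensemble-averaged predictive distribution
    n = len(member_probs)
    mean_p = [sum(m[c] for m in member_probs) / n for c in range(len(member_probs[0]))]
    total = entropy(mean_p)
    # Aleatoric part: average entropy of the individual members
    aleatoric = sum(entropy(m) for m in member_probs) / n
    # Epistemic part (mutual information): what remains, i.e., member disagreement
    return total, aleatoric, total - aleatoric

# Confident but disagreeing members: epistemic uncertainty shows up as disagreement
tu, au, eu = decompose([[0.99, 0.01], [0.01, 0.99]])
```

When members agree, the epistemic term vanishes even if each prediction is uncertain; the paper's point is that whether shortcuts are present decides which of these regimes materializes.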

Updated: 2025-04-04 14:22:46

Domains: cs.LG

Download: http://arxiv.org/abs/2502.09137v2

Physics-informed 4D X-ray image reconstruction from ultra-sparse spatiotemporal data

The unprecedented X-ray flux density provided by modern X-ray sources offers new spatiotemporal possibilities for X-ray imaging of fast dynamic processes. Approaches to exploit such possibilities often result in either i) a limited number of projections or spatial information due to limited scanning speed, as in time-resolved tomography, or ii) a limited number of time points, as in stroboscopic imaging, making the reconstruction problem ill-posed and unlikely to be solved by classical reconstruction approaches. 4D reconstruction from such data requires sample priors, which can be included via deep learning (DL). State-of-the-art 4D reconstruction methods for X-ray imaging combine the power of AI and the physics of X-ray propagation to tackle the challenge of sparse views. However, most approaches do not constrain the physics of the studied process, i.e., a full physical model. Here we present 4D physics-informed optimized neural implicit X-ray imaging (4D-PIONIX), a novel physics-informed 4D X-ray image reconstruction method combining the full physical model and a state-of-the-art DL-based reconstruction method for 4D X-ray imaging from sparse views. We demonstrate and evaluate the potential of our approach by retrieving 4D information from ultra-sparse spatiotemporal acquisitions of simulated binary droplet collisions, a relevant fluid dynamic process. We envision that this work will open new spatiotemporal possibilities for various 4D X-ray imaging modalities, such as time-resolved X-ray tomography and more novel sparse acquisition approaches like X-ray multi-projection imaging, which will pave the way for investigations of various rapid 4D dynamics, such as fluid dynamics and composite testing.

Updated: 2025-04-04 14:18:51

Domains: eess.IV,cs.AI,physics.data-an

Download: http://arxiv.org/abs/2504.03469v1

Real-time Speech Summarization for Medical Conversations

In doctor-patient conversations, identifying medically relevant information is crucial, motivating the need for conversation summarization. In this work, we propose the first deployable real-time speech summarization system for real-world applications in industry, which generates a local summary after every N speech utterances within a conversation and a global summary after the end of a conversation. Our system could enhance user experience from a business standpoint, while also reducing computational costs from a technical perspective. Secondly, we present VietMed-Sum which, to our knowledge, is the first speech summarization dataset for medical conversations. Thirdly, we are the first to utilize LLMs and human annotators collaboratively to create gold-standard and synthetic summaries for medical conversation summarization. Finally, we present baseline results of state-of-the-art models on VietMed-Sum. All code, data (English-translated and Vietnamese) and models are available online: https://github.com/leduckhai/MultiMed/tree/master/VietMed-Sum

Updated: 2025-04-04 14:12:54

Domains: cs.CL,cs.AI,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2406.15888v2

Generating ensembles of spatially-coherent in-situ forecasts using flow matching

We propose a machine-learning-based methodology for in-situ weather forecast postprocessing that is both spatially coherent and multivariate. Compared to previous work, our Flow MAtching Postprocessing (FMAP) better represents the correlation structures of the observations distribution, while also improving marginal performance at the stations. FMAP generates forecasts that are not bound to what is already modeled by the underlying gridded prediction and can infer new correlation structures from data. The resulting model can generate an arbitrary number of forecasts from a limited number of numerical simulations, allowing for low-cost forecasting systems. A single training is sufficient to perform postprocessing at multiple lead times, in contrast with other methods which use multiple trained networks at generation time. This work details our methodology, including a spatial attention transformer backbone trained within a flow matching generative modeling framework. FMAP shows promising performance in experiments on the EUPPBench dataset, forecasting surface temperature and wind gust values at station locations in western Europe up to five-day lead times.
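
The flow matching objective underlying FMAP can be illustrated with a one-dimensional toy sketch (hypothetical data; the real method trains a spatial attention transformer rather than a fixed velocity field):

```python
import random

def cfm_loss(v_field, pairs, n_samples=2000, seed=0):
    # Monte-Carlo conditional flow matching loss: sample t ~ U(0, 1), interpolate
    # x_t = (1 - t) * x0 + t * x1, and regress v(x_t, t) onto the target x1 - x0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x0, x1 = rng.choice(pairs)
        t = rng.random()
        xt = (1 - t) * x0 + t * x1
        total += (v_field(xt, t) - (x1 - x0)) ** 2
    return total / n_samples

# 1-D toy coupling: each base sample x0 is paired with a target x1 shifted by +2
pairs = [(x, x + 2.0) for x in [-1.0, -0.5, 0.0, 0.5, 1.0]]

perfect = lambda x, t: 2.0   # the true constant velocity for this coupling
zero = lambda x, t: 0.0      # a velocity field that never moves the sample
```

Once trained, sampling means integrating the learned velocity field from base noise to a forecast, which is how a single model can emit an arbitrary number of ensemble members.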

Updated: 2025-04-04 14:12:53

Domains: physics.ao-ph,cs.LG

Download: http://arxiv.org/abs/2504.03463v1

Conditioning Diffusions Using Malliavin Calculus

In stochastic optimal control and conditional generative modelling, a central computational task is to modify a reference diffusion process to maximise a given terminal-time reward. Most existing methods require this reward to be differentiable, using gradients to steer the diffusion towards favourable outcomes. However, in many practical settings, like diffusion bridges, the reward is singular, taking an infinite value if the target is hit and zero otherwise. We introduce a novel framework, based on Malliavin calculus and path-space integration by parts, that enables the development of methods robust to such singular rewards. This allows our approach to handle a broad range of applications, including classification, diffusion bridges, and conditioning without the need for artificial observational noise. We demonstrate that our approach offers stable and reliable training, outperforming existing techniques.

Updated: 2025-04-04 14:10:21

Domains: stat.ML,cs.LG,math.PR,G.3

Download: http://arxiv.org/abs/2504.03461v1

Not someone, but something: Rethinking trust in the age of medical AI

As artificial intelligence (AI) becomes embedded in healthcare, trust in medical decision-making is changing fast. This opinion paper argues that trust in AI isn't a simple transfer from humans to machines; it is a dynamic, evolving relationship that must be built and maintained. Rather than debating whether AI belongs in medicine, this paper asks: what kind of trust must AI earn, and how? Drawing from philosophy, bioethics, and system design, it explores the key differences between human trust and machine reliability, emphasizing transparency, accountability, and alignment with the values of care. It argues that trust in AI shouldn't rely on mimicking empathy or intuition, but on thoughtful design, responsible deployment, and clear moral responsibility. The goal is a balanced view, one that avoids blind optimism and reflexive fear. Trust in AI must be treated not as a given, but as something to be earned over time.

Updated: 2025-04-04 14:09:20

Domains: cs.CY,cs.AI,cs.HC

Download: http://arxiv.org/abs/2504.05331v1

SpectR: Dynamically Composing LM Experts with Spectral Routing

Training large, general-purpose language models poses significant challenges. The growing availability of specialized expert models, fine-tuned from pretrained models for specific tasks or domains, offers a promising alternative. Leveraging the potential of these existing expert models in real-world applications requires effective methods to select or merge the models best suited for a given task. This paper introduces SPECTR, an approach for dynamically composing expert models at each time step during inference. Notably, our method requires no additional training and enables flexible, token- and layer-wise model combinations. Our experimental results demonstrate that SPECTR improves routing accuracy over alternative training-free methods, increasing task performance across expert domains.
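
Token-wise, training-free composition can be sketched generically as a score-weighted mixture of expert predictions. The softmax weighting below is a placeholder; SPECTR's actual spectral routing scores are not reproduced here:

```python
import math

def mix_experts(expert_probs, scores, temperature=1.0):
    # Training-free per-token composition: softmax the per-expert relevance scores
    # and average the experts' next-token distributions under those weights
    exp_s = [math.exp(s / temperature) for s in scores]
    z = sum(exp_s)
    weights = [e / z for e in exp_s]
    vocab = len(expert_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, expert_probs)) for c in range(vocab)]

# Two experts disagree on the next token; a strong score for expert 0 dominates
mixed = mix_experts([[0.9, 0.1], [0.1, 0.9]], scores=[10.0, 0.0])
```

Because the weights are recomputed at every time step (and can differ per layer), the effective model changes token by token without any additional training.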

Updated: 2025-04-04 13:58:44

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.03454v1

HeartBERT: A Self-Supervised ECG Embedding Model for Efficient and Effective Medical Signal Analysis

The HeartBERT model is introduced with three primary objectives: reducing the need for labeled data, minimizing computational resources, and simultaneously improving performance in machine learning systems that analyze Electrocardiogram (ECG) signals. Inspired by Bidirectional Encoder Representations from Transformers (BERT) in natural language processing and enhanced with a self-supervised learning approach, the HeartBERT model, built on the RoBERTa architecture, generates sophisticated embeddings tailored for ECG-based projects in the medical domain. To demonstrate the versatility, generalizability, and efficiency of the proposed model, two key downstream tasks have been selected: sleep stage detection and heartbeat classification. HeartBERT-based systems, utilizing bidirectional LSTM heads, are designed to address complex challenges. A series of practical experiments have been conducted to demonstrate the superiority and advancements of HeartBERT, particularly in terms of its ability to perform well with smaller training datasets, reduced learning parameters, and effective performance compared to rival models. The code and data are publicly available at https://github.com/ecgResearch/HeartBert.

Updated: 2025-04-04 13:53:30

Domains: eess.SP,cs.LG

Download: http://arxiv.org/abs/2411.11896v3

How Feature Learning Can Improve Neural Scaling Laws

We develop a solvable model of neural scaling laws beyond the kernel limit. Theoretical analysis of this model shows how performance scales with model size, training time, and the total amount of available data. We identify three scaling regimes corresponding to varying task difficulties: hard, easy, and super easy tasks. For easy and super-easy target functions, which lie in the reproducing kernel Hilbert space (RKHS) defined by the initial infinite-width Neural Tangent Kernel (NTK), the scaling exponents remain unchanged between feature learning and kernel regime models. For hard tasks, defined as those outside the RKHS of the initial NTK, we demonstrate both analytically and empirically that feature learning can improve scaling with training time and compute, nearly doubling the exponent for hard tasks. This leads to a different compute optimal strategy to scale parameters and training time in the feature learning regime. We support our finding that feature learning improves the scaling law for hard tasks but not for easy and super-easy tasks with experiments of nonlinear MLPs fitting functions with power-law Fourier spectra on the circle and CNNs learning vision tasks.
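
The compute-optimal trade-off mentioned above can be illustrated with a toy scaling law; the exponents and prefactors below are hypothetical, not the paper's fitted values:

```python
def loss(N, D, A=400.0, B=600.0, alpha=0.34, beta=0.28):
    # Hypothetical scaling law: loss decays as a power law in model size N
    # and dataset size D, L(N, D) = A*N^-alpha + B*D^-beta
    return A * N ** (-alpha) + B * D ** (-beta)

def optimal_N(C, A=400.0, B=600.0, alpha=0.34, beta=0.28):
    # Under a compute budget C ~ N*D, substituting D = C/N and setting dL/dN = 0
    # gives N* = (alpha*A / (beta*B))^(1/(alpha+beta)) * C^(beta/(alpha+beta)).
    # Changing an exponent (as feature learning does for hard tasks) therefore
    # changes how the budget should be split between parameters and data.
    k = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    return k * C ** (beta / (alpha + beta))
```

Since L(N, C/N) is unimodal in N, the closed-form stationary point is the unique compute-optimal model size for this toy law.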

Updated: 2025-04-04 13:47:57

Domains: stat.ML,cond-mat.dis-nn,cs.LG

Download: http://arxiv.org/abs/2409.17858v2

What Can 240,000 New Credit Transactions Tell Us About the Impact of NGEU Funds?

Using a panel data local projections model and controlling for firm characteristics, procurement bid attributes, and macroeconomic conditions, the study estimates the dynamic effects of procurement awards on new lending, a more precise measure than the change in the stock of credit. The analysis further examines heterogeneity in credit responses based on firm size, industry, credit maturity, and value chain position of the firms. The empirical evidence confirms that public procurement awards significantly increase new lending, with NGEU-funded contracts generating stronger credit expansion than traditional procurement during the recent period. The results show that the impact of NGEU procurement programs aligns closely with historical procurement impacts, with differences driven mainly by lower utilization rates. Moreover, integrating high-frequency financial data with procurement records highlights the potential of Big Data in refining public policy design.

Updated: 2025-04-04 13:46:31

Domains: q-fin.ST,cs.LG

Download: http://arxiv.org/abs/2504.01964v2

Optimizing Specific and Shared Parameters for Efficient Parameter Tuning

Foundation models, with a vast number of parameters and pretraining on massive datasets, achieve state-of-the-art performance across various applications. However, efficiently adapting them to downstream tasks with minimal computational overhead remains a challenge. Parameter-Efficient Transfer Learning (PETL) addresses this by fine-tuning only a small subset of parameters while preserving pre-trained knowledge. In this paper, we propose SaS, a novel PETL method that effectively mitigates distributional shifts during fine-tuning. SaS integrates (1) a shared module that captures common statistical characteristics across layers using low-rank projections and (2) a layer-specific module that employs hypernetworks to generate tailored parameters for each layer. This dual design ensures an optimal balance between performance and parameter efficiency while introducing less than 0.05% additional parameters, making it significantly more compact than existing methods. Extensive experiments on diverse downstream tasks, few-shot settings and domain generalization demonstrate that SaS significantly enhances performance while maintaining superior parameter efficiency compared to existing methods, highlighting the importance of capturing both shared and layer-specific information in transfer learning. Code and data are available at https://anonymous.4open.science/r/SaS-PETL-3565.
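
The shared-plus-layer-specific decomposition can be sketched in miniature: a generic low-rank illustration with made-up dimensions, not the actual SaS parameterization or its hypernetwork:

```python
def matmul(A, B):
    # Plain list-of-lists matrix product
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def adapt(W, shared_down, layer_up):
    # Adapter update in the spirit of the two modules described above (a sketch):
    # a low-rank down-projection shared across layers, composed with a small
    # layer-specific up-projection, added to the frozen weight W.
    delta = matmul(layer_up, shared_down)   # (d x r) @ (r x d) -> rank-r update
    return [[w + dw for w, dw in zip(wr, dr)] for wr, dr in zip(W, delta)]

d, r = 4, 1
W = [[float(i == j) for j in range(d)] for i in range(d)]   # frozen identity weight
shared_down = [[1.0, 0.0, 0.0, 0.0]]                        # r x d, shared across layers
layer_up = [[0.1], [0.0], [0.0], [0.0]]                     # d x r, specific to this layer
W_adapted = adapt(W, shared_down, layer_up)
```

The trainable cost is r*d shared parameters plus d*r per layer, instead of d*d per layer, which is what keeps the added-parameter budget tiny.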

Updated: 2025-04-04 13:43:54

Domains: cs.LG

Download: http://arxiv.org/abs/2504.03450v1

Semi-Supervised Model-Free Bayesian State Estimation from Compressed Measurements

We consider data-driven Bayesian state estimation from compressed measurements (BSCM) of a model-free process. The dimension of the temporal measurement vector is lower than that of the temporal state vector to be estimated, leading to an under-determined inverse problem. The underlying dynamical model of the state's evolution is unknown for a 'model-free process.' Hence, it is difficult to use traditional model-driven methods, for example, Kalman and particle filters. Instead, we consider data-driven methods. We experimentally show that two existing unsupervised learning-based data-driven methods fail to address the BSCM problem in a model-free process. These methods are data-driven nonlinear state estimation (DANSE) and the deep Markov model (DMM). While DANSE provides good predictive/forecasting performance to model the temporal measurement data as a time series, its unsupervised learning lacks suitable regularization for tackling the BSCM task. We then propose a semi-supervised learning approach and develop a semi-supervised learning-based DANSE method, referred to as SemiDANSE. In SemiDANSE, we use a large amount of unlabelled data along with a limited amount of labelled data, i.e., pairwise measurement-and-state data, which provides the desired regularization. Using three benchmark dynamical systems, we show that the data-driven SemiDANSE provides competitive state estimation performance for BSCM against a hybrid method called KalmanNet and two model-driven methods (extended Kalman filter and unscented Kalman filter) that know the dynamical models exactly.

Updated: 2025-04-04 13:38:45

Domains: eess.SP,cs.LG

Download: http://arxiv.org/abs/2407.07368v3

Generalization of the Gibbs algorithm with high probability at low temperatures

The paper gives a bound on the generalization error of the Gibbs algorithm, which recovers known data-independent bounds for the high temperature range and extends to the low-temperature range, where generalization depends critically on the data-dependent loss landscape. It is shown that, with high probability, the generalization error of a single hypothesis drawn from the Gibbs posterior decreases with the total prior volume of all hypotheses with similar or smaller empirical error. This gives theoretical support to the belief in the benefit of flat minima. The zero-temperature limit is discussed and the bound is extended to a class of similar stochastic algorithms.
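
The role of prior volume can be illustrated with a Gibbs posterior over a finite hypothesis class (a toy sketch with made-up errors, not the paper's bound):

```python
import math

def gibbs_posterior(emp_errors, beta, prior=None):
    # Gibbs posterior over a finite hypothesis class:
    # p(h) proportional to prior(h) * exp(-beta * empirical_error(h))
    n = len(emp_errors)
    prior = prior or [1.0 / n] * n
    w = [p * math.exp(-beta * e) for p, e in zip(prior, emp_errors)]
    z = sum(w)
    return [wi / z for wi in w]

# Three "flat minimum" hypotheses with error 0.10 vs one sharper one with 0.09:
# at moderate inverse temperature, the flat region's larger total prior volume
# can outweigh the sharper hypothesis's slightly smaller error.
errors = [0.10, 0.10, 0.10, 0.09]
post = gibbs_posterior(errors, beta=50.0)
flat_mass = sum(post[:3])
```

As beta grows (temperature drops), the posterior concentrates on the empirical-error minimizer, which is the regime where the data-dependent loss landscape dominates the bound.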

Updated: 2025-04-04 13:38:13

Domains: cs.LG,stat.ML,68T05,G.3

Download: http://arxiv.org/abs/2502.11071v5

ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling

Optimization modeling plays a critical role in the application of Operations Research (OR) tools to address real-world problems, yet it poses challenges and requires extensive expertise from OR experts. With the advent of large language models (LLMs), new opportunities have emerged to streamline and automate such tasks. However, current research predominantly relies on closed-source LLMs such as GPT-4, along with extensive prompt engineering techniques. This reliance stems from the scarcity of high-quality training datasets for optimization modeling, resulting in elevated costs, prolonged processing times, and privacy concerns. To address these challenges, our work is the first to propose a viable path for training open-source LLMs that are capable of optimization modeling and developing solver codes, eventually leading to a superior ability for automating optimization modeling and solving. Particularly, we design OR-Instruct, a semi-automated data synthesis framework for optimization modeling that enables customizable enhancements for specific scenarios or model types. This work also introduces IndustryOR, the first industrial benchmark for evaluating LLMs in solving practical OR problems. We train several 7B-scale open-source LLMs using synthesized data (dubbed ORLMs; code at https://github.com/Cardinal-Operations/ORLM), which exhibit significantly enhanced optimization modeling capabilities, achieving competitive performance across the NL4OPT, MAMO, and IndustryOR benchmarks. Additionally, our experiments highlight the potential of scaling law and reinforcement learning to further enhance the performance of ORLMs. The workflows and human-machine interaction paradigms of ORLMs in practical industrial applications are also discussed in the paper.

Updated: 2025-04-04 13:31:38

Domains: cs.CL,cs.AI,cs.CE,cs.LG

Download: http://arxiv.org/abs/2405.17743v5

Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models

To leverage the full potential of Large Language Models (LLMs), it is crucial to have some information on their answers' uncertainty. This means that the model has to be able to quantify how certain it is in the correctness of a given response. Bad uncertainty estimates can lead to overconfident wrong answers, undermining trust in these models. Considerable research has been done on language models that work with text inputs and provide text outputs. Still, since visual capabilities have been added to these models only recently, there has not been much progress on the uncertainty of Vision-Language Models (VLMs). We tested three state-of-the-art VLMs on corrupted image data. We found that the severity of the corruption negatively impacted the models' ability to estimate their uncertainty, and the models also showed overconfidence in most of the experiments.

Updated: 2025-04-04 13:31:08

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2504.03440v1

A Polynomial-Time Algorithm for Variational Inequalities under the Minty Condition

Solving (Stampacchia) variational inequalities (SVIs) is a foundational problem at the heart of optimization, with a host of critical applications ranging from engineering to economics. However, this expressivity comes at the cost of computational hardness. As a result, most research has focused on carving out specific subclasses that elude those intractability barriers. A classical property that goes back to the 1960s is the Minty condition, which postulates that the Minty VI (MVI) problem, the weak dual of the SVI problem, admits a solution. In this paper, we establish the first polynomial-time algorithm (that is, with complexity growing polynomially in the dimension $d$ and $\log(1/\epsilon)$) for solving $\epsilon$-SVIs for Lipschitz continuous mappings under the Minty condition. Prior approaches either incurred an exponentially worse dependence on $1/\epsilon$ (and other natural parameters of the problem) or made overly restrictive assumptions, such as strong monotonicity. To do so, we introduce a new variant of the ellipsoid algorithm wherein separating hyperplanes are obtained after taking a gradient descent step from the center of the ellipsoid. It succeeds even though the set of SVIs can be nonconvex and not fully dimensional. Moreover, when our algorithm is applied to an instance with no MVI solution and fails to identify an SVI solution, it produces a succinct certificate of MVI infeasibility. We also show that deciding whether the Minty condition holds is $\mathsf{coNP}$-complete. We provide several extensions and new applications of our main results. Specifically, we obtain the first polynomial-time algorithms for i) solving monotone VIs, ii) globally minimizing a (potentially nonsmooth) quasar-convex function, and iii) computing Nash equilibria in multi-player harmonic games.
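
For contrast with the ellipsoid-based scheme above, the classical extragradient method, a standard baseline for monotone VIs, can be sketched on a monotone but not strongly monotone rotation field (this is not the paper's algorithm):

```python
def extragradient(F, x0, step=0.1, iters=2000):
    # Extragradient method (Korpelevich): take a trial step, then re-step from the
    # original point using the operator evaluated at the trial point:
    #   x_half = x - step * F(x);  x_next = x - step * F(x_half)
    x = list(x0)
    for _ in range(iters):
        fx = F(x)
        mid = [xi - step * fi for xi, fi in zip(x, fx)]
        fmid = F(mid)
        x = [xi - step * fi for xi, fi in zip(x, fmid)]
    return x

# Rotation field F(x, y) = (y, -x): monotone but not strongly monotone, so plain
# forward steps spiral outward, while extragradient converges to the solution (0, 0)
sol = extragradient(lambda v: [v[1], -v[0]], [1.0, 1.0])
```

The extra operator evaluation per iteration is exactly what damps the rotation that defeats plain gradient steps; the paper's contribution is handling the far broader Minty-condition class where even this baseline has no guarantees.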

Updated: 2025-04-04 13:24:41

Subjects: math.OC,cs.GT,cs.LG

Download: http://arxiv.org/abs/2504.03432v1

Comparison of SAT-based and ASP-based Algorithms for Inconsistency Measurement

We present algorithms based on satisfiability problem (SAT) solving, as well as answer set programming (ASP), for solving the problem of determining inconsistency degrees in propositional knowledge bases. We consider six different inconsistency measures whose respective decision problems lie on the first level of the polynomial hierarchy. Namely, these are the contension inconsistency measure, the forgetting-based inconsistency measure, the hitting set inconsistency measure, the max-distance inconsistency measure, the sum-distance inconsistency measure, and the hit-distance inconsistency measure. In an extensive experimental analysis, we compare the SAT-based and ASP-based approaches with each other, as well as with a set of naive baseline algorithms. Our results demonstrate that overall, both the SAT-based and the ASP-based approaches clearly outperform the naive baseline methods in terms of runtime. The results further show that the proposed ASP-based approaches perform superior to the SAT-based ones with regard to all six inconsistency measures considered in this work. Moreover, we conduct additional experiments to explain the aforementioned results in greater detail.
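
As a naive point of comparison to the SAT and ASP encodings, the contension measure can be computed by brute force over three-valued (LP) interpretations, in the spirit of the baseline algorithms the paper compares against. The clause encoding below is an illustrative convention of our own, not the paper's input format.

```python
from itertools import product

def contension(clauses, atoms):
    """Contension inconsistency measure: minimum number of atoms forced to the
    paraconsistent value 'B' in a three-valued (LP) model satisfying every
    clause. Clauses are lists of (atom, polarity) literals."""
    best = None
    for values in product("TFB", repeat=len(atoms)):
        v = dict(zip(atoms, values))
        def designated(atom, positive):
            # A literal holds in LP if its atom is B, or matches its polarity.
            return v[atom] == "B" or v[atom] == ("T" if positive else "F")
        if all(any(designated(a, pos) for a, pos in clause) for clause in clauses):
            cost = sum(1 for a in atoms if v[a] == "B")
            if best is None or cost < best:
                best = cost
    return best

kb = [[("a", True)], [("a", False)], [("b", True)]]   # {a, ¬a, b}
print(contension(kb, ["a", "b"]))  # → 1 (only 'a' must take value B)
```

The 3^n enumeration is exactly the kind of cost the SAT/ASP encodings avoid.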

Updated: 2025-04-04 13:21:03

Subjects: cs.AI

Download: http://arxiv.org/abs/2304.14832v2

Optimizing Quantum Circuits via ZX Diagrams using Reinforcement Learning and Graph Neural Networks

Quantum computing is currently strongly limited by the impact of noise, in particular introduced by the application of two-qubit gates. For this reason, reducing the number of two-qubit gates is of paramount importance on noisy intermediate-scale quantum hardware. To advance towards more reliable quantum computing, we introduce a framework based on ZX calculus, graph neural networks and reinforcement learning for quantum circuit optimization. By combining reinforcement learning and tree search, our method addresses the challenge of selecting optimal sequences of ZX calculus rewrite rules. Instead of relying on existing heuristic rules for minimizing circuits, our method trains a novel reinforcement learning policy that directly operates on ZX-graphs, therefore allowing us to search through the space of all possible circuit transformations to find a circuit significantly minimizing the number of CNOT gates. This way we can scale beyond hard-coded rules towards discovering arbitrary optimization rules. We demonstrate our method's competitiveness with state-of-the-art circuit optimizers and its generalization capabilities on large sets of diverse random circuits.
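
The full pipeline (ZX diagrams, a GNN policy, tree search) is beyond a short sketch. The toy below only illustrates the underlying idea of applying a rewrite rule until a fixpoint is reached, using a trivially simpler gate-level rule (adjacent identical CNOTs cancel, since CNOT is self-inverse) in place of ZX calculus rewrites; the list-of-tuples circuit format is a made-up convention.

```python
def cancel_cnots(circuit):
    """Greedily remove adjacent identical CNOT pairs until no rule applies --
    a stand-in for one hard-coded rewrite rule applied to fixpoint."""
    changed = True
    while changed:
        changed = False
        for i in range(len(circuit) - 1):
            if circuit[i] == circuit[i + 1] and circuit[i][0] == "CNOT":
                del circuit[i : i + 2]   # CNOT . CNOT = identity
                changed = True
                break
    return circuit

circ = [("CNOT", 0, 1), ("CNOT", 0, 1), ("H", 0), ("CNOT", 1, 2), ("CNOT", 1, 2)]
out = cancel_cnots(circ)
print(out)  # → [('H', 0)]
```

The paper's point is that a learned policy can decide *which* rewrite to apply *where*, rather than greedily applying a fixed rule as above.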

Updated: 2025-04-04 13:19:08

Subjects: cs.LG,quant-ph

Download: http://arxiv.org/abs/2504.03429v1

Dual Low-Rank Adaptation for Continual Learning with Pre-Trained Models

In the era of foundation models, we revisit continual learning~(CL), which aims to enable vision transformers (ViTs) to learn new tasks over time. However, as the scale of these models increases, catastrophic forgetting remains a persistent challenge, particularly in the presence of significant domain shifts across tasks. Recent studies highlight a crossover between CL techniques and parameter-efficient fine-tuning (PEFT), which focuses on fine-tuning only a small set of trainable parameters to adapt to downstream tasks, such as low-rank adaptation (LoRA). While LoRA achieves faster convergence and requires fewer trainable parameters, it has seldom been explored in the context of continual learning. To address this gap, we propose a novel PEFT-CL method called Dual Low-Rank Adaptation (DualLoRA), which introduces both an orthogonal LoRA adapter and a residual LoRA adapter parallel to pre-trained weights in each layer. These components are orchestrated by a dynamic memory mechanism to strike a balance between stability and plasticity. The orthogonal LoRA adapter's parameters are updated in an orthogonal subspace of previous tasks to mitigate catastrophic forgetting, while the residual LoRA adapter's parameters are updated in the residual subspace spanned by task-specific bases without interaction across tasks, offering complementary capabilities for fine-tuning new tasks. On ViT-based models, we demonstrate that DualLoRA offers significant advantages in accuracy, inference speed, and memory efficiency over existing CL methods across multiple benchmarks.
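
A minimal numpy sketch of the two parallel low-rank adapters and the orthogonal-subspace update idea. The shapes, the hypothetical gradient, and the basis construction are all illustrative assumptions (LoRA normally zero-initializes the up-projection; it is nonzero here purely for the demonstration), and this is not the authors' full dynamic-memory mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 8, 2
W0 = rng.normal(size=(d, d))              # frozen pre-trained weight
A_orth = 0.01 * rng.normal(size=(r, d))   # orthogonal LoRA adapter (down-proj)
B_orth = 0.01 * rng.normal(size=(d, r))   # up-proj (nonzero here for the demo)
A_res = 0.01 * rng.normal(size=(r, d))    # residual LoRA adapter
B_res = 0.01 * rng.normal(size=(d, r))

def forward(x):
    # Pre-trained path plus two parallel low-rank adapter paths.
    return W0 @ x + B_orth @ (A_orth @ x) + B_res @ (A_res @ x)

# Orthonormal basis U for the feature subspace of previous tasks.
prev_inputs = rng.normal(size=(d, 5))
U, _ = np.linalg.qr(prev_inputs)

# A hypothetical gradient for A_orth, projected onto span(U)^perp so the
# update cannot change the adapter's response to previous-task features.
g = rng.normal(size=(r, d))
g_proj = g - (g @ U) @ U.T
x_prev = prev_inputs[:, 0]
change_on_old_task = g_proj @ x_prev      # ~0 by construction
```

Because every previous-task input lies in span(U), the projected update direction is exactly orthogonal to it, which is the stability half of the stability/plasticity trade-off; the residual adapter supplies the plasticity half.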

Updated: 2025-04-04 13:16:35

Subjects: cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.00623v2

The AI Cosmologist I: An Agentic System for Automated Data Analysis

We present the AI Cosmologist, an agentic system designed to automate cosmological/astronomical data analysis and machine learning research workflows. This implements a complete pipeline from idea generation to experimental evaluation and research dissemination, mimicking the scientific process typically performed by human researchers. The system employs specialized agents for planning, coding, execution, analysis, and synthesis that work together to develop novel approaches. Unlike traditional auto machine-learning systems, the AI Cosmologist generates diverse implementation strategies, writes complete code, handles execution errors, analyzes results, and synthesizes new approaches based on experimental outcomes. We demonstrate the AI Cosmologist's capabilities across several machine learning tasks, showing how it can successfully explore solution spaces, iterate based on experimental results, and combine successful elements from different approaches. Our results indicate that agentic systems can automate portions of the research process, potentially accelerating scientific discovery. The code and experimental data used in this paper are available on GitHub at https://github.com/adammoss/aicosmologist. Example papers included in the appendix demonstrate the system's capability to autonomously produce complete scientific publications, starting from only the dataset and task description.

Updated: 2025-04-04 13:12:08

Subjects: astro-ph.IM,astro-ph.CO,astro-ph.GA,cs.AI,physics.data-an

Download: http://arxiv.org/abs/2504.03424v1

DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models

This paper presents a novel deep learning framework for robotic arm manipulation that integrates multimodal inputs using a late-fusion strategy. Unlike traditional end-to-end or reinforcement learning approaches, our method processes image sequences with pre-trained models and robot state data with machine learning algorithms, fusing their outputs to predict continuous action values for control. Evaluated on BridgeData V2 and Kuka datasets, the best configuration (VGG16 + Random Forest) achieved MSEs of 0.0021 and 0.0028, respectively, demonstrating strong predictive performance and robustness. The framework supports modularity, interpretability, and real-time decision-making, aligning with the goals of adaptive, human-in-the-loop cyber-physical systems.
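
The late-fusion recipe can be sketched in a few lines. Here synthetic features stand in for the frozen VGG16 embeddings and robot state, and ridge regression stands in for the paper's random-forest head, so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
img_feat = rng.normal(size=(n, 16))    # stand-in for frozen VGG16 embeddings
state = rng.normal(size=(n, 7))        # stand-in for robot joint/pose state

# Synthetic continuous action targets that depend on both modalities.
w_img = rng.normal(size=16)
w_state = rng.normal(size=7)
y = img_feat @ w_img + state @ w_state + 0.01 * rng.normal(size=n)

# Late fusion: per-modality features are extracted separately, then
# concatenated and fed to a single shallow regressor head.
X = np.concatenate([img_feat, state], axis=1)
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
mse = np.mean((X @ w - y) ** 2)
```

The design point is modularity: either modality branch (or the head) can be swapped without retraining the rest end-to-end.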

Updated: 2025-04-04 13:11:43

Subjects: cs.LG,cs.RO

Download: http://arxiv.org/abs/2504.03423v1

Autonomous state-space segmentation for Deep-RL sparse reward scenarios

Dealing with environments with sparse rewards has always been crucial for systems developed to operate in autonomous open-ended learning settings. Intrinsic Motivations could be an effective way to help Deep Reinforcement Learning algorithms learn in such scenarios. In fact, intrinsic reward signals, such as novelty or curiosity, are generally adopted to improve exploration when extrinsic rewards are delayed or absent. Building on previous works, we tackle the problem of learning policies in the presence of sparse rewards by proposing a two-level architecture that alternates an "intrinsically driven" phase of exploration and autonomous sub-goal generation with a phase of sparse-reward, goal-directed policy learning. The idea is to build several small networks, each one specialized on a particular sub-path, and use them as starting points for future exploration without the need to further explore from scratch previously learnt paths. Two versions of the system have been trained and tested in the Gym SuperMarioBros environment without considering any additional extrinsic reward. The results show the validity of our approach and the importance of autonomously segmenting the environment to generate an efficient path towards the final goal.

Updated: 2025-04-04 13:06:23

Subjects: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2504.03420v1

NeRFlex: Resource-aware Real-time High-quality Rendering of Complex Scenes on Mobile Devices

Neural Radiance Fields (NeRF) is a cutting-edge neural network-based technique for novel view synthesis in 3D reconstruction. However, its significant computational demands pose challenges for deployment on mobile devices. While mesh-based NeRF solutions have shown potential in achieving real-time rendering on mobile platforms, they often fail to deliver high-quality reconstructions when rendering practical complex scenes. Additionally, the non-negligible memory overhead caused by pre-computed intermediate results complicates their practical application. To overcome these challenges, we present NeRFlex, a resource-aware, high-resolution, real-time rendering framework for complex scenes on mobile devices. NeRFlex integrates mobile NeRF rendering with multi-NeRF representations that decompose a scene into multiple sub-scenes, each represented by an individual NeRF network. Crucially, NeRFlex considers both memory and computation constraints as first-class citizens and redesigns the reconstruction process accordingly. NeRFlex first designs a detail-oriented segmentation module to identify sub-scenes with high-frequency details. For each NeRF network, a lightweight profiler, built on domain knowledge, is used to accurately map configurations to visual quality and memory usage. Based on these insights and the resource constraints on mobile devices, NeRFlex presents a dynamic programming algorithm to efficiently determine configurations for all NeRF representations, despite the NP-hardness of the original decision problem. Extensive experiments on real-world datasets and mobile devices demonstrate that NeRFlex achieves real-time, high-quality rendering on commercial mobile devices.
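
Choosing one configuration per sub-NeRF under a memory budget is a multiple-choice knapsack, which explains the dynamic program. The sketch below uses toy (memory, quality) numbers, not the paper's profiler outputs.

```python
def best_total_quality(options, budget):
    """options[i] = list of (memory_cost, quality) configs for sub-NeRF i.
    Multiple-choice knapsack DP: exactly one config per sub-NeRF, total
    memory at most `budget`, total quality maximized."""
    NEG = float("-inf")
    dp = [NEG] * (budget + 1)     # dp[b] = best quality using exactly b memory
    dp[0] = 0.0
    for opts in options:
        new = [NEG] * (budget + 1)
        for b, val in enumerate(dp):
            if val == NEG:
                continue
            for cost, quality in opts:
                if b + cost <= budget and val + quality > new[b + cost]:
                    new[b + cost] = val + quality
        dp = new
    return max(dp)

subscenes = [
    [(2, 0.5), (4, 0.8)],   # (memory units, predicted visual quality)
    [(1, 0.2), (3, 0.6)],
    [(2, 0.4), (5, 0.9)],
]
best = best_total_quality(subscenes, budget=8)   # ≈ 1.6 (configs 0, 0, 1)
```

The DP runs in O(#subscenes x budget x #configs), which is what makes the selection tractable despite the NP-hardness of the general decision problem.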

Updated: 2025-04-04 12:53:33

Subjects: cs.GR,cs.CV,cs.LG,cs.MM,cs.PF

Download: http://arxiv.org/abs/2504.03415v1

PaperBench: Evaluating AI's Ability to Replicate AI Research

We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research. Agents must replicate 20 ICML 2024 Spotlight and Oral papers from scratch, including understanding paper contributions, developing a codebase, and successfully executing experiments. For objective evaluation, we develop rubrics that hierarchically decompose each replication task into smaller sub-tasks with clear grading criteria. In total, PaperBench contains 8,316 individually gradable tasks. Rubrics are co-developed with the author(s) of each ICML paper for accuracy and realism. To enable scalable evaluation, we also develop an LLM-based judge to automatically grade replication attempts against rubrics, and assess our judge's performance by creating a separate benchmark for judges. We evaluate several frontier models on PaperBench, finding that the best-performing tested agent, Claude 3.5 Sonnet (New) with open-source scaffolding, achieves an average replication score of 21.0%. Finally, we recruit top ML PhDs to attempt a subset of PaperBench, finding that models do not yet outperform the human baseline. We open-source our code (https://github.com/openai/preparedness) to facilitate future research in understanding the AI engineering capabilities of AI agents.
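
Hierarchical rubric grading as described above can be sketched as a weighted tree average. The node schema below is an assumption for illustration, not PaperBench's actual rubric format.

```python
def rubric_score(node):
    """Score a hierarchical rubric: leaf criteria carry a binary grade,
    inner nodes return the weight-averaged score of their children."""
    if "passed" in node:                     # leaf criterion
        return 1.0 if node["passed"] else 0.0
    total_w = sum(c["weight"] for c in node["children"])
    return sum(c["weight"] * rubric_score(c) for c in node["children"]) / total_w

rubric = {
    "children": [
        {"weight": 2, "children": [          # e.g. "reproduce experiments"
            {"weight": 1, "passed": True},   # sub-task graded pass
            {"weight": 1, "passed": False},  # sub-task graded fail
        ]},
        {"weight": 1, "passed": True},       # e.g. "codebase runs"
    ],
}
score = rubric_score(rubric)   # (2 * 0.5 + 1 * 1) / 3 = 2/3
```

Decomposing a replication into thousands of such leaves is what lets an LLM judge grade each criterion independently and still produce one aggregate score.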

Updated: 2025-04-04 12:44:57

Subjects: cs.AI,cs.CL

Download: http://arxiv.org/abs/2504.01848v2

An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation

We consider a bilevel learning framework for learning linear operators. In this framework, the learnable parameters are optimized via a loss function that also depends on the minimizer of a convex optimization problem (denoted lower-level problem). We utilize an iterative algorithm called `piggyback' to compute the gradient of the loss and minimizer of the lower-level problem. Given that the lower-level problem is solved numerically, the loss function and thus its gradient can only be computed inexactly. To estimate the accuracy of the computed hypergradient, we derive an a-posteriori error bound, which provides guidance for setting the tolerance of the lower-level problem, as well as of the piggyback algorithm. To efficiently solve the upper-level optimization, we also propose an adaptive method for choosing a suitable step-size. To illustrate the proposed method, we consider a few learned regularizer problems, such as training an input-convex neural network.
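
For a quadratic lower-level problem the inexact-hypergradient idea can be illustrated directly: both the lower-level solve and an adjoint solve (a simplification of piggyback-style differentiation, not the paper's algorithm) are run as truncated gradient iterations and checked against the closed form. Problem sizes, step sizes, and iteration counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
A = np.diag([1.0, 2.0, 4.0])       # SPD Hessian of the quadratic lower level
x_target = rng.normal(size=d)
theta = rng.normal(size=d)

# Lower level: x*(theta) = argmin_x 0.5 x^T A x - theta^T x, solved
# inexactly by gradient descent (the numerical solve whose accuracy
# an a-posteriori bound would control).
x = np.zeros(d)
for _ in range(500):
    x -= 0.2 * (A @ x - theta)

# Adjoint solve for the hypergradient of loss(theta) = 0.5 ||x*(theta) - x_target||^2:
# solve A^T p = x* - x_target, so that p = d loss / d theta.
p = np.zeros(d)
for _ in range(500):
    p -= 0.2 * (A.T @ p - (x - x_target))

exact = np.linalg.solve(A.T, np.linalg.solve(A, theta) - x_target)
err = np.linalg.norm(p - exact)
```

Truncating either loop earlier makes `err` grow, which is exactly the inexactness the paper's error bound quantifies and its adaptive tolerances manage.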

Updated: 2025-04-04 12:27:11

Subjects: math.OC,cs.LG

Download: http://arxiv.org/abs/2412.06436v2

Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

To build safe and reliable graph machine learning systems, unsupervised graph-level anomaly detection (GLAD) and unsupervised graph-level out-of-distribution (OOD) detection (GLOD) have received significant attention in recent years. Though those two lines of research indeed share the same objective, they have been studied independently in the community due to distinct evaluation setups, creating a gap that hinders the application and evaluation of methods from one to the other. To bridge the gap, in this work, we present a Unified Benchmark for unsupervised Graph-level OOD and anomaLy Detection (UB-GOLD), a comprehensive evaluation framework that unifies GLAD and GLOD under the concept of generalized graph-level OOD detection. Our benchmark encompasses 35 datasets spanning four practical anomaly and OOD detection scenarios, facilitating the comparison of 18 representative GLAD/GLOD methods. We conduct multi-dimensional analyses to explore the effectiveness, OOD sensitivity spectrum, robustness, and efficiency of existing methods, shedding light on their strengths and limitations. Furthermore, we provide an open-source codebase (https://github.com/UB-GOLD/UB-GOLD) of UB-GOLD to foster reproducible research and outline potential directions for future investigations based on our insights.
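
Benchmarks of this kind typically score detectors by the AUROC of an anomaly/OOD score; a tiny rank-based sketch (synthetic scores, not UB-GOLD's evaluation code):

```python
def auroc(scores_in, scores_out):
    """AUROC of an anomaly score: the probability that a random OOD sample
    scores higher than a random in-distribution one (ties count half)."""
    wins = 0.0
    for so in scores_out:
        for si in scores_in:
            wins += 1.0 if so > si else (0.5 if so == si else 0.0)
    return wins / (len(scores_in) * len(scores_out))

auc = auroc([0.1, 0.2, 0.3], [0.25, 0.4, 0.5])   # 8/9: one inlier outranks one outlier
```

AUROC is threshold-free, which is why it suits comparing GLAD and GLOD methods whose raw score scales differ.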

Updated: 2025-04-04 12:19:21

Subjects: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.15523v2

Hierarchically Encapsulated Representation for Protocol Design in Self-Driving Labs

Self-driving laboratories have begun to replace human experimenters in performing single experimental skills or predetermined experimental protocols. However, as the pace of idea iteration in scientific research has been intensified by Artificial Intelligence, the demand for rapid design of new protocols for new discoveries becomes evident. Efforts to automate protocol design have been initiated, but the capabilities of knowledge-based machine designers, such as Large Language Models, have not been fully elicited, probably due to the absence of a systematic representation of experimental knowledge, as opposed to isolated, flattened pieces of information. To tackle this issue, we propose a multi-faceted, multi-scale representation, where instance actions, generalized operations, and product flow models are hierarchically encapsulated using Domain-Specific Languages. We further develop a data-driven algorithm based on non-parametric modeling that autonomously customizes these representations for specific domains. The proposed representation is equipped with various machine designers to manage protocol design tasks, including planning, modification, and adjustment. The results demonstrate that the proposed method could effectively complement Large Language Models in the protocol design process, serving as an auxiliary module in the realm of machine-assisted scientific exploration.

Updated: 2025-04-04 12:05:15

Subjects: cs.AI,cs.RO

Download: http://arxiv.org/abs/2504.03810v1

Assessing SPARQL capabilities of Large Language Models

The integration of Large Language Models (LLMs) with Knowledge Graphs (KGs) offers significant synergistic potential for knowledge-driven applications. One possible integration is the interpretation and generation of formal languages, such as those used in the Semantic Web, with SPARQL being a core technology for accessing KGs. In this paper, we focus on measuring out-of-the-box capabilities of LLMs to work with SPARQL and more specifically with SPARQL SELECT queries applying a quantitative approach. We implemented various benchmarking tasks in the LLM-KG-Bench framework for automated execution and evaluation with several LLMs. The tasks assess capabilities along the dimensions of syntax, semantic read, semantic create, and the role of knowledge graph prompt inclusion. With these new benchmarking tasks, we evaluated a selection of GPT, Gemini, and Claude models. Our findings indicate that working with SPARQL SELECT queries is still challenging for LLMs and heavily depends on the specific LLM as well as the complexity of the task. While fixing basic syntax errors seems to pose no problems for the best of the current LLMs evaluated, creating semantically correct SPARQL SELECT queries is difficult in several cases.

Updated: 2025-04-04 11:59:49

Subjects: cs.DB,cs.AI,cs.CL,cs.IR

Download: http://arxiv.org/abs/2409.05925v2

BitHEP -- The Limits of Low-Precision ML in HEP

The increasing complexity of modern neural network architectures demands fast and memory-efficient implementations to mitigate computational bottlenecks. In this work, we evaluate the recently proposed BitNet architecture in HEP applications, assessing its performance in classification, regression, and generative modeling tasks. Specifically, we investigate its suitability for quark-gluon discrimination, SMEFT parameter estimation, and detector simulation, comparing its efficiency and accuracy to state-of-the-art methods. Our results show that while BitNet consistently performs competitively in classification tasks, its performance in regression and generation varies with the size and type of the network, highlighting key limitations and potential areas for improvement.
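
BitNet's b1.58 variant constrains weights to {-1, 0, +1} via absmean scaling; a standalone numpy sketch of that quantizer as publicly described (not the authors' HEP training pipeline):

```python
import numpy as np

def quantize_ternary(W, eps=1e-8):
    """BitNet-b1.58-style weight quantization: scale by the mean absolute
    weight, round, and clip to {-1, 0, +1}. Returns (W_q, scale) such that
    scale * W_q approximates W."""
    scale = np.mean(np.abs(W)) + eps
    W_q = np.clip(np.rint(W / scale), -1, 1)
    return W_q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
W_q, s = quantize_ternary(W)
dequantized = s * W_q            # low-precision reconstruction of W
```

The reconstruction error of `dequantized` is exactly the information loss whose impact on classification, regression, and generation the paper measures.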

Updated: 2025-04-04 11:57:59

Subjects: cs.LG,hep-ex,hep-ph

Download: http://arxiv.org/abs/2504.03387v1

Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination

The growth in prominence of large language models (LLMs) in everyday life can be largely attributed to their generative abilities, yet some of this is also owed to the risks and costs associated with their use. On one front is their tendency to hallucinate false or misleading information, limiting their reliability. On another is the increasing focus on the computational limitations associated with traditional self-attention based LLMs, which has brought about new alternatives, in particular recurrent models, meant to overcome them. Yet it remains uncommon to consider these two concerns simultaneously. Do changes in architecture exacerbate/alleviate existing concerns about hallucinations? Do they affect how and where they occur? Through an extensive evaluation, we study how these architecture-based inductive biases affect the propensity to hallucinate. While hallucination remains a general phenomenon not limited to specific architectures, the situations in which they occur and the ease with which specific types of hallucinations can be induced can significantly differ based on the model architecture. These findings highlight the need for better understanding both these problems in conjunction with each other, as well as consider how to design more universal techniques for handling hallucinations.

Updated: 2025-04-04 11:55:58

Subjects: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.17477v4

A Survey on Moral Foundation Theory and Pre-Trained Language Models: Current Advances and Challenges

Moral values have deep roots in early civilizations, codified within norms and laws that regulated societal order and the common good. They play a crucial role in understanding the psychological basis of human behavior and cultural orientation. The Moral Foundation Theory (MFT) is a well-established framework that identifies the core moral foundations underlying the manner in which different cultures shape individual and social lives. Recent advancements in natural language processing, particularly Pre-trained Language Models (PLMs), have enabled the extraction and analysis of moral dimensions from textual data. This survey presents a comprehensive review of MFT-informed PLMs, providing an analysis of moral tendencies in PLMs and their application in the context of the MFT. We also review relevant datasets and lexicons and discuss trends, limitations, and future directions. By providing a structured overview of the intersection between PLMs and MFT, this work bridges moral psychology insights within the realm of PLMs, paving the way for further research and development in creating morally aware AI systems.

Updated: 2025-04-04 11:52:55

Subjects: cs.CL,cs.AI,cs.CY,cs.DL,cs.HC

Download: http://arxiv.org/abs/2409.13521v2

Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning

Reasoning-Oriented Reinforcement Learning (RORL) enhances the reasoning ability of Large Language Models (LLMs). However, due to the sparsity of rewards in RORL, effective training is highly dependent on the selection of problems of appropriate difficulty. Although curriculum learning attempts to address this by adjusting difficulty, it often relies on static schedules, and even recent online filtering methods lack theoretical grounding and a systematic understanding of their effectiveness. In this work, we theoretically and empirically show that curating the batch on the fly with the problems on which the training model achieves intermediate accuracy can maximize the effectiveness of RORL training, namely balanced online difficulty filtering. We first derive that the lower bound of the KL divergence between the initial and the optimal policy can be expressed with the variance of the sampled accuracy. Building on those insights, we show that balanced filtering can maximize the lower bound, leading to better performance. Experimental results across five challenging math reasoning benchmarks show that balanced online filtering yields an additional 10% on AIME and a 4% average improvement over plain GRPO. Moreover, further analysis shows gains in sample efficiency and training-time efficiency, exceeding the maximum reward of plain GRPO within 60% of the training time and of the training-set volume.
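
The batch-curation rule can be sketched directly: estimate per-problem accuracy from a few rollouts and keep only problems in an intermediate band, where the sampled-accuracy variance acc * (1 - acc) is largest. The toy policy, thresholds, and rollout count are hypothetical, not the paper's settings.

```python
import random

def rollout_accuracy(problem, policy, k=8):
    """Estimate the model's accuracy on a problem from k sampled rollouts."""
    return sum(policy(problem) for _ in range(k)) / k

def filter_batch(problems, policy, lo=0.2, hi=0.8, k=8):
    """Balanced online difficulty filtering: keep problems on which the
    current policy achieves intermediate accuracy -- neither trivially
    solved nor hopeless, so the reward signal carries the most variance."""
    kept = []
    for p in problems:
        acc = rollout_accuracy(p, policy, k)
        if lo <= acc <= hi:
            kept.append((p, acc))
    return kept

# Toy policy: success probability falls with a problem's 'difficulty' in [0, 1].
random.seed(0)
problems = [{"difficulty": d / 10} for d in range(11)]
policy = lambda p: random.random() > p["difficulty"]
batch = filter_batch(problems, policy)
```

Because the filter re-runs every batch, the kept band tracks the improving policy, unlike a static curriculum schedule.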

Updated: 2025-04-04 11:52:05

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2504.03380v1

Drawing a Map of Elections

Our main contribution is the introduction of the map of elections framework. A map of elections consists of three main elements: (1) a dataset of elections (i.e., collections of ordinal votes over given sets of candidates), (2) a way of measuring similarities between these elections, and (3) a representation of the elections in the 2D Euclidean space as points, so that the more similar two elections are, the closer are their points. In our maps, we mostly focus on datasets of synthetic elections, but we also show an example of a map over real-life ones. To measure similarities, we would have preferred to use, e.g., the isomorphic swap distance, but this is infeasible due to its high computational complexity. Hence, we propose polynomial-time computable positionwise distance and use it instead. Regarding the representations in 2D Euclidean space, we mostly use the Kamada-Kawai algorithm, but we also show two alternatives. We develop the necessary theoretical results to form our maps and argue experimentally that they are accurate and credible. Further, we show how coloring the elections in a map according to various criteria helps in analyzing results of a number of experiments. In particular, we show colorings according to the scores of winning candidates or committees, running times of ILP-based winner determination algorithms, and approximation ratios achieved by particular algorithms.
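
A brute-force sketch of a positionwise-style distance: build each candidate's position-frequency vector, then minimize the total distance over candidate matchings. For simplicity this uses L1 between position vectors and enumerates matchings, whereas the paper's positionwise distance uses per-candidate earth-mover's distances and a polynomial-time matching.

```python
from itertools import permutations

def position_matrix(votes, m):
    """freq[c][p] = fraction of votes that rank candidate c in position p."""
    freq = [[0.0] * m for _ in range(m)]
    for vote in votes:
        for pos, cand in enumerate(vote):
            freq[cand][pos] += 1 / len(votes)
    return freq

def positionwise_distance(votes1, votes2, m):
    """Minimum, over candidate matchings sigma, of the total L1 distance
    between matched candidates' position-frequency vectors."""
    f1, f2 = position_matrix(votes1, m), position_matrix(votes2, m)
    return min(
        sum(abs(f1[c][p] - f2[sigma[c]][p]) for c in range(m) for p in range(m))
        for sigma in permutations(range(m))
    )

id_votes = [[0, 1, 2]] * 4    # everyone agrees on the order 0 > 1 > 2
rev_votes = [[2, 1, 0]] * 4   # everyone agrees on the reversed order
print(positionwise_distance(id_votes, rev_votes, 3))  # → 0.0 (a mere relabeling)
```

The zero distance between the two unanimous elections shows why minimizing over candidate matchings matters: the measure is neutral, i.e. invariant to renaming candidates.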

Updated: 2025-04-04 11:44:56

Categories: cs.MA,cs.AI,cs.GT

Download: http://arxiv.org/abs/2504.03809v1

Are clinicians ethically obligated to disclose their use of medical machine learning systems to patients?

It is commonly accepted that clinicians are ethically obligated to disclose their use of medical machine learning systems to patients, and that failure to do so would amount to a moral fault for which clinicians ought to be held accountable. Call this "the disclosure thesis." Four main arguments have been, or could be, given to support the disclosure thesis in the ethics literature: the risk-based argument, the rights-based argument, the materiality argument, and the autonomy argument. In this article, I argue that each of these four arguments is unconvincing, and therefore, that the disclosure thesis ought to be rejected. I suggest that mandating disclosure may even risk harming patients by providing stakeholders with a way to avoid accountability for harm that results from improper applications or uses of these systems.

Updated: 2025-04-04 11:35:58

Categories: cs.CY,cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.01043v2

SoK: Attacks on Modern Card Payments

EMV is the global standard for smart card payments and its NFC-based version, EMV contactless, is widely used, also for mobile payments. In this systematization of knowledge, we examine attacks on the EMV contactless protocol. We provide a comprehensive framework encompassing its desired security properties and adversary models. We also identify and categorize a comprehensive collection of protocol flaws and show how different subsets thereof can be combined into attacks. In addition to this systematization, we examine the underlying reasons for the many attacks against EMV and point to a better way forward.

Updated: 2025-04-04 11:34:01

Categories: cs.CR

Download: http://arxiv.org/abs/2504.03363v1

GPTQv2: Efficient Finetuning-Free Quantization for Asymmetric Calibration

We introduce GPTQv2, a novel finetuning-free quantization method for compressing large-scale transformer architectures. Unlike the previous GPTQ method, which calibrates each layer independently, we always match the quantized layer's output to the exact output of the full-precision model, resulting in a scheme that we call asymmetric calibration. Such a scheme can effectively reduce the quantization error accumulated in previous layers. We analyze this problem using optimal brain compression to derive a closed-form solution. The new solution explicitly minimizes the quantization error as well as the accumulated asymmetry error. Furthermore, we utilize various techniques to parallelize the solution calculation, including channel parallelization, neuron decomposition, and Cholesky reformulation for matrix fusion. As a result, GPTQv2 is easy to implement, requiring only about 20 more lines of code than GPTQ while improving its performance under low-bit quantization. Remarkably, on a single GPU, we quantize a 405B language transformer as well as EVA-02, the top-ranked vision transformer, which achieves 90% ImageNet accuracy. Code is available at github.com/Intelligent-Computing-Lab-Yale/GPTQv2.
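
To make the asymmetric-calibration objective concrete, the toy sketch below quantizes a single weight vector against calibration inputs, comparing plain round-to-nearest (weights calibrated in isolation) with an output-aware greedy pass that accepts a rounding flip only when the layer's output error on calibration data drops. This is a hand-rolled stand-in for intuition only: GPTQv2 itself derives a closed-form solution via optimal brain compression, and every name and value here is invented:

```python
import math

def matvec_outputs(X, w):
    return [sum(xi * wi for xi, wi in zip(x, w)) for x in X]

def output_err(X, w, q):
    """Squared error between full-precision and quantized layer outputs."""
    ref, got = matvec_outputs(X, w), matvec_outputs(X, q)
    return sum((a - b) ** 2 for a, b in zip(ref, got))

def quantize_rtn(w, scale):
    """Round-to-nearest: calibrated against the weights alone."""
    return [round(wi / scale) * scale for wi in w]

def quantize_output_aware(w, X, scale):
    """Start from round-to-nearest, then per weight try the other rounding
    direction and keep it if the *layer output* error on calibration
    inputs X drops -- calibrating against the full-precision output."""
    q = quantize_rtn(w, scale)
    for i, wi in enumerate(w):
        lo = math.floor(wi / scale) * scale
        hi = math.ceil(wi / scale) * scale
        for cand in (lo, hi):
            trial = q[:i] + [cand] + q[i + 1:]
            if output_err(X, w, trial) < output_err(X, w, q):
                q = trial
    return q

w = [0.34, -0.71, 0.49, 0.12]                       # toy weight vector
X = [[1.0, 0.5, -0.2, 0.8],                         # toy calibration inputs
     [0.3, -1.0, 0.6, 0.1],
     [-0.5, 0.2, 1.0, -0.4]]
scale = 0.25
e_rtn = output_err(X, w, quantize_rtn(w, scale))
e_out = output_err(X, w, quantize_output_aware(w, X, scale))
assert e_out <= e_rtn  # output-aware calibration never does worse
```

Since the greedy pass starts from the round-to-nearest solution and only accepts improvements, its output error is never worse, which is the point of calibrating against outputs instead of weights.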

Updated: 2025-04-04 11:31:54

Categories: cs.LG

Download: http://arxiv.org/abs/2504.02692v2

Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency

Deploying Large Language Models (LLMs) on edge devices presents significant challenges due to computational constraints, memory limitations, inference speed, and energy consumption. Model quantization has emerged as a key technique to enable efficient LLM inference by reducing model size and computational overhead. In this study, we conduct a comprehensive analysis of 28 quantized LLMs from the Ollama library, which by default applies Post-Training Quantization (PTQ) and weight-only quantization techniques, deployed on an edge device (a Raspberry Pi 4 with 4GB RAM). We evaluate energy efficiency, inference performance, and output accuracy across multiple quantization levels and task types. Models are benchmarked on five standardized datasets (CommonsenseQA, BIG-Bench Hard, TruthfulQA, GSM8K, and HumanEval), and we employ a high-resolution, hardware-based energy measurement tool to capture real-world power consumption. Our findings reveal the trade-offs between energy efficiency, inference speed, and accuracy in different quantization settings, highlighting configurations that optimize LLM deployment for resource-constrained environments. By integrating hardware-level energy profiling with LLM benchmarking, this study provides actionable insights for sustainable AI, bridging a critical gap in existing research on energy-aware LLM deployment.

Updated: 2025-04-04 11:29:30

Categories: cs.CY,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2504.03360v1

A metrological framework for uncertainty evaluation in machine learning classification models

Machine learning (ML) classification models are increasingly being used in a wide range of applications where it is important that predictions are accompanied by uncertainties, including in climate and earth observation, medical diagnosis and bioaerosol monitoring. The output of an ML classification model is a type of categorical variable known as a nominal property in the International Vocabulary of Metrology (VIM). However, concepts related to uncertainty evaluation for nominal properties are not defined in the VIM, nor is such evaluation addressed by the Guide to the Expression of Uncertainty in Measurement (GUM). In this paper we propose a metrological conceptual uncertainty evaluation framework for ML classification, and illustrate its use in the context of two applications that exemplify the issues and have significant societal impact, namely, climate and earth observation and medical diagnosis. Our framework would enable an extension of the VIM and GUM to uncertainty for nominal properties, which would make both applicable to ML classification models.

Updated: 2025-04-04 11:28:48

Categories: cs.LG

Download: http://arxiv.org/abs/2504.03359v1

Decentralized Collective World Model for Emergent Communication and Coordination

We propose a fully decentralized multi-agent world model that enables both symbol emergence for communication and coordinated behavior through temporal extension of collective predictive coding. Unlike previous research that focuses on either communication or coordination separately, our approach achieves both simultaneously. Our method integrates world models with communication channels, enabling agents to predict environmental dynamics, estimate states from partial observations, and share critical information through bidirectional message exchange with contrastive learning for message alignment. Using a two-agent trajectory drawing task, we demonstrate that our communication-based approach outperforms non-communicative models when agents have divergent perceptual capabilities, achieving the second-best coordination after centralized models. Importantly, our distributed approach with constraints preventing direct access to other agents' internal states facilitates the emergence of more meaningful symbol systems that accurately reflect environmental states. These findings demonstrate the effectiveness of decentralized communication for supporting coordination while developing shared representations of the environment.

Updated: 2025-04-04 11:17:52

Categories: cs.MA,cs.AI

Download: http://arxiv.org/abs/2504.03353v1

End-to-end Learning of Sparse Interventions on Activations to Steer Generation

The growing use of generative models in daily life calls for efficient mechanisms to control their generation, to e.g., produce safe content or provide users with tools to explore style changes. Ideally, such mechanisms should be cheap, both at train and inference time, while preserving output quality. Recent research has shown that such mechanisms can be obtained by intervening exclusively on model activations, with the goal of correcting distributional differences between activations seen when using prompts from a source vs. a target set (e.g., toxic and non-toxic sentences). While cheap, these fast methods are inherently crude: their maps are tuned locally, not accounting for their impact on downstream layers, resulting in interventions that cause unintended shifts when used out-of-sample. We propose in this work linear end-to-end activation steering (LinEAS), an approach trained with a global loss that accounts simultaneously for all layerwise distributional shifts. In addition to being more robust, the loss used to train LinEAS can be regularized with sparsifying norms, which can automatically carry out neuron and layer selection. Empirically, LinEAS only requires a handful of samples to be effective, and beats similar baselines on toxicity mitigation, while performing on par with far more involved finetuning approaches. We show that LinEAS interventions can be composed, study the impact of sparsity on their performance, and showcase applications in text-to-image diffusions.

Updated: 2025-04-04 11:17:20

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2503.10679v2

AI Alignment: A Comprehensive Survey

AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey, we delve into the core concepts, methodology, and practice of alignment. First, we identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality (RICE). Guided by these four principles, we outline the landscape of current alignment research and decompose them into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks. On forward alignment, we discuss techniques for learning from feedback and learning under distribution shift. On backward alignment, we discuss assurance techniques and governance practices. We also release and continually update the website (www.alignmentsurvey.com) which features tutorials, collections of papers, blog posts, and other resources.

Updated: 2025-04-04 11:14:49

Categories: cs.AI

Download: http://arxiv.org/abs/2310.19852v6

Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs

We study the problem of computing deterministic optimal policies for constrained Markov decision processes (MDPs) with continuous state and action spaces, which are widely encountered in constrained dynamical systems. Designing deterministic policy gradient methods in continuous state and action spaces is particularly challenging due to the lack of enumerable state-action pairs and the adoption of deterministic policies, hindering the application of existing policy gradient methods. To this end, we develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence. Specifically, we leverage regularization of the Lagrangian of the constrained MDP to propose a deterministic policy gradient primal-dual (D-PGPD) algorithm that updates the deterministic policy via a quadratic-regularized gradient ascent step and the dual variable via a quadratic-regularized gradient descent step. We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair. We instantiate D-PGPD with function approximation and prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair, up to a function approximation error. Furthermore, we demonstrate the effectiveness of our method in two continuous control problems: robot navigation and fluid control. This appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
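
The primal-dual update pattern can be illustrated on a scalar toy problem (not an MDP): regularized gradient ascent on the primal variable and regularized, projected gradient descent on the dual variable of the Lagrangian. The objective, step size, and regularizer tau below are illustrative assumptions, not the paper's algorithm:

```python
def primal_dual(steps=5000, eta=0.01, tau=1e-3):
    """Regularized primal ascent / dual descent on a toy problem:
    maximize -(theta - 2)^2  subject to  theta - 1 <= 0  (optimum: theta = 1).
    tau plays the role of the quadratic regularization of the Lagrangian.
    """
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Ascent on L(theta, lam) = -(theta-2)^2 - lam*(theta-1) - tau/2*theta^2
        grad_theta = -2.0 * (theta - 2.0) - lam - tau * theta
        theta += eta * grad_theta
        # Descent on -L in lam, projected onto lam >= 0
        grad_lam = (theta - 1.0) - tau * lam
        lam = max(0.0, lam + eta * grad_lam)
    return theta, lam

theta, lam = primal_dual()
print(round(theta, 2))  # ~1.0: the constraint is active at the optimum
```

At convergence the constraint binds (theta near 1) and the multiplier settles near 2, the value that makes the primal gradient vanish at the boundary.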

Updated: 2025-04-04 11:14:35

Categories: cs.AI,math.OC

Download: http://arxiv.org/abs/2408.10015v2

Solving Finite-Horizon MDPs via Low-Rank Tensors

We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs, the policies, and therefore the value functions (VFs) are not stationary. This aggravates the challenges of high-dimensional MDPs, as they suffer from the curse of dimensionality and high sample complexity. To address these issues, we propose modeling the VFs of finite-horizon MDPs as low-rank tensors, enabling a scalable representation that renders the problem of learning optimal policies tractable. We introduce an optimization-based framework for solving the Bellman equations with low-rank constraints, along with block-coordinate descent (BCD) and block-coordinate gradient descent (BCGD) algorithms, both with theoretical convergence guarantees. For scenarios where the system dynamics are unknown, we adapt the proposed BCGD method to estimate the VFs using sampled trajectories. Numerical experiments further demonstrate that the proposed framework reduces computational demands in controlled synthetic scenarios and more realistic resource allocation problems.
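
The low-rank idea can be shown in miniature: a rank-1, two-dimensional instance in which a "value table" indexed by horizon step and state is fit by block-coordinate least squares, alternating between the two factor blocks. This is a toy sketch of the modeling idea with invented numbers, not the paper's BCD/BCGD algorithms:

```python
def bcd_rank1(V, iters=50):
    """Fit V[h][s] ~ a[h] * b[s] by block-coordinate least squares:
    alternately solve for the a-block with b fixed, then b with a fixed."""
    H, S = len(V), len(V[0])
    a = [1.0] * H
    b = [1.0] * S
    for _ in range(iters):
        for h in range(H):  # a-block update (closed-form least squares)
            denom = sum(bs * bs for bs in b)
            a[h] = sum(V[h][s] * b[s] for s in range(S)) / denom
        for s in range(S):  # b-block update
            denom = sum(ah * ah for ah in a)
            b[s] = sum(V[h][s] * a[h] for h in range(H)) / denom
    return a, b

# An exactly rank-1 "value table": V[h][s] = gamma**h * r[s]
gamma, r = 0.9, [1.0, 2.0, 0.5]
V = [[gamma ** h * rs for rs in r] for h in range(4)]
a, b = bcd_rank1(V)
err = max(abs(V[h][s] - a[h] * b[s]) for h in range(4) for s in range(3))
print(err < 1e-6)  # True: BCD recovers the rank-1 structure
```

The non-stationarity of finite-horizon VFs is what the extra horizon index h captures; the factorization stores H + S numbers instead of H * S, which is the scalability argument in tensor form.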

Updated: 2025-04-04 11:10:09

Categories: cs.LG

Download: http://arxiv.org/abs/2501.10598v2

Bayesian LSTM for indoor temperature modeling

Improving energy efficiency of building heating systems is essential for reducing global energy consumption and greenhouse gas emissions. Traditional control methods in buildings rely on static heating curves based solely on outdoor temperature measurements, neglecting system state and free heat sources like solar gain. Model predictive control (MPC) not only addresses these limitations but further optimizes heating control by incorporating weather forecasts and system state predictions. However, current industrial MPC solutions often use simplified physics-inspired models, which compromise accuracy for interpretability. While purely data-driven models offer better predictive performance, they face challenges like overfitting and lack of transparency. To bridge this gap, we propose a Bayesian Long Short-Term Memory (LSTM) architecture for indoor temperature modeling. Our experiments across 100 real-world buildings demonstrate that the Bayesian LSTM outperforms an industrial physics-based model in predictive accuracy, enabling potential for improved energy efficiency and thermal comfort if deployed in heating MPC solutions. Over deterministic black-box approaches, the Bayesian framework provides additional advantages by improving generalization ability and allowing interpretation of predictions via uncertainty quantification. This work advances data-driven heating control by balancing predictive performance with the transparency and reliability required for real-world heating MPC applications.

Updated: 2025-04-04 11:07:23

Categories: stat.AP,cs.LG

Download: http://arxiv.org/abs/2504.03350v1

Optimizing Password Cracking for Digital Investigations

Efficient password cracking is a critical aspect of digital forensics, enabling investigators to decrypt protected content during criminal investigations. Traditional password cracking methods, including brute-force, dictionary, and rule-based attacks, face challenges in balancing efficiency with increasing computational complexity. This study explores rule-based optimisation strategies to enhance the effectiveness of password cracking while minimising resource consumption. By analysing publicly available password datasets, we propose an optimised rule set that reduces computational iterations by approximately 40%, significantly improving the speed of password recovery. Additionally, the impact of national password recommendations was examined, specifically the UK National Cyber Security Centre's three-word password guideline and its effect on password security and forensic recovery. Through user-generated password surveys, we evaluate the crackability of three-word passwords using dictionaries of varying common-word proportions. Results indicate that while three-word passwords provide improved memorability and usability, they remain vulnerable when common word combinations are used, with up to 77.5% of passwords cracked using a 30% common-word dictionary subset. The study underscores the importance of dynamic password-cracking strategies that account for evolving user behaviours and policy-driven password structures. The findings contribute to both forensic efficiency and cyber security awareness, highlighting the dual impact of password policies on security and investigative capabilities. Future work will focus on refining rule-based cracking techniques and expanding research on password composition trends.
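
The leverage a common-word dictionary gives an attacker against three-word passwords comes down to keyspace arithmetic. The dictionary sizes and guess rate below are invented for illustration and are not the study's figures:

```python
def three_word_keyspace(dict_size):
    """Number of ordered three-word combinations from a dictionary."""
    return dict_size ** 3

def crack_time_days(keyspace, guesses_per_sec=1e6):
    """Worst-case exhaustive search time at a given (slow-hash) guess rate."""
    return keyspace / guesses_per_sec / 86400

full = three_word_keyspace(40_000)    # hypothetical full wordlist
common = three_word_keyspace(12_000)  # hypothetical 30% most-common subset
print(f"full wordlist:  {crack_time_days(full):.0f} days")
print(f"common subset:  {crack_time_days(common):.1f} days")
print(f"search-space reduction: {full / common:.0f}x")  # (40/12)^3 ~ 37x
```

Because the keyspace is cubic in dictionary size, shrinking the candidate dictionary to its common-word core shrinks the search space by the cube of the ratio, which is why common-word three-word passwords fall so quickly.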

Updated: 2025-04-04 11:03:39

Categories: cs.CR

Download: http://arxiv.org/abs/2504.03347v1

Talk2X -- An Open-Source Toolkit Facilitating Deployment of LLM-Powered Chatbots on the Web

Integrated into websites, LLM-powered chatbots offer alternative means of navigation and information retrieval, leading to a shift in how users access information on the web. Yet predominantly closed-source solutions limit proliferation among web hosts and suffer from a lack of transparency with regard to implementation details and energy efficiency. In this work, we propose our openly available agent Talk2X, which leverages an adapted retrieval-augmented generation (RAG) approach combined with an automatically generated vector database, benefiting energy efficiency. Talk2X's architecture is generalizable to arbitrary websites, offering developers a ready-to-use tool for integration. Using a mixed-methods approach, we evaluated Talk2X's usability by tasking users to acquire specific assets from an open science repository. Talk2X significantly improved task completion time, correctness, and user experience, supporting users in quickly pinpointing specific information as compared to standard user-website interaction. Our findings contribute technical advancements to an ongoing paradigm shift in how we access information on the web.
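
The retrieval step of a RAG pipeline like the one Talk2X builds on can be sketched in stdlib Python. The bag-of-words "embedding" and the example chunks below are toy stand-ins for the learned embeddings and the automatically generated vector database the paper describes:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'. A real deployment would use a
    learned embedding model feeding the vector database."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Rank site chunks by similarity to the query; the top chunks would
    be passed to the LLM as grounding context."""
    qv = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)
    return ranked[:k]

site_chunks = [  # hypothetical chunks scraped from a website
    "dataset download links for the open science repository",
    "contact page and office hours of the lab",
    "installation guide for the simulation toolkit",
]
top = retrieve("where can I download the dataset", site_chunks)
print(top[0])  # the download-links chunk
```

Precomputing the chunk vectors once (the "automatically generated vector database") is what keeps per-query cost low, which is where the energy-efficiency argument comes from.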

Updated: 2025-04-04 10:58:57

Categories: cs.AI,cs.HC,cs.IR

Download: http://arxiv.org/abs/2504.03343v1

LLMs Prompted for Graphs: Hallucinations and Generative Capabilities

Large Language Models (LLMs) are nowadays prompted for a wide variety of tasks. In this article, we investigate their ability in reciting and generating graphs. We first study the ability of LLMs to regurgitate well-known graphs from the literature (e.g., the Karate club graph or the graph atlas). Secondly, we question the generative capabilities of LLMs by asking for Erdos-Renyi random graphs. As opposed to the possibility that they could memorize some Erdos-Renyi graphs included in their scraped training set, this second investigation aims at studying a possible emergent property of LLMs. For both tasks, we propose a metric to assess their errors with the lens of hallucination (i.e., incorrect information returned as facts). We most notably find that the amplitude of graph hallucinations can characterize the superiority of some LLMs. Indeed, for the recitation task, we observe that graph hallucinations correlate with the Hallucination Leaderboard, a hallucination rank that leverages 10,000 times more prompts to obtain its ranking. For the generation task, we find surprisingly good and reproducible results in most LLMs. We believe this constitutes a starting point for more in-depth studies of this emergent capability and a challenging benchmark for their improvements. Altogether, these two aspects of LLM capabilities bridge a gap between the network science and machine learning communities.
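
One simple way to probe the generation task described above is to compare a returned edge list against G(n, p) statistics. The sketch below pairs a reference Erdos-Renyi sampler with a crude edge-count sanity check; the threshold and sizes are illustrative assumptions, not the paper's metric:

```python
import random

def erdos_renyi(n, p, rng):
    """Reference G(n, p) sampler: each of the n*(n-1)/2 pairs is an edge
    independently with probability p."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p}

def looks_like_gnp(edges, n, p, tol=6.0):
    """Crude sanity check: is the edge count within tol standard deviations
    of the binomial expectation? Applied to an LLM-returned edge list, a
    failure flags a likely hallucinated 'random' graph."""
    pairs = n * (n - 1) / 2
    mean = pairs * p
    std = (pairs * p * (1 - p)) ** 0.5
    return abs(len(edges) - mean) <= tol * std

rng = random.Random(42)
g = erdos_renyi(100, 0.1, rng)
print(looks_like_gnp(g, 100, 0.1))      # a faithful sampler passes
print(looks_like_gnp(set(), 100, 0.1))  # an empty graph is not G(100, 0.1)
```

A real battery would of course also check degree distribution and edge independence; the edge count is just the cheapest statistic to start from.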

Updated: 2025-04-04 10:58:40

Categories: cs.CL,cs.AI,cs.SI

Download: http://arxiv.org/abs/2409.00159v3

EOOD: Entropy-based Out-of-distribution Detection

Deep neural networks (DNNs) often exhibit overconfidence when encountering out-of-distribution (OOD) samples, posing significant challenges for deployment. Since DNNs are trained on in-distribution (ID) datasets, the information flow of ID samples through DNNs inevitably differs from that of OOD samples. In this paper, we propose an Entropy-based Out-Of-distribution Detection (EOOD) framework. EOOD first identifies the specific block where the information-flow differences between ID and OOD samples are most pronounced, using both ID and pseudo-OOD samples. It then calculates the conditional entropy on the selected block as the OOD confidence score. Comprehensive experiments conducted across various ID and OOD settings demonstrate the effectiveness of EOOD in OOD detection and its superiority over state-of-the-art methods.
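
The entropy-as-score idea can be sketched in stdlib Python. Note that EOOD computes a conditional entropy on the activations of a selected internal block; the sketch below uses softmax entropy at the output, the simplest member of the same family, purely to show the scoring mechanics:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shift for numerical stability
    s = sum(exps)
    return [e / s for e in exps]

def entropy_score(logits):
    """Shannon entropy of the softmax; a higher score is more OOD-like,
    since OOD inputs tend to produce diffuse rather than peaked outputs."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)

id_logits = [9.0, 1.0, 0.5, 0.2]   # confident, peaked -> low entropy
ood_logits = [2.1, 2.0, 1.9, 2.2]  # diffuse -> high entropy
assert entropy_score(ood_logits) > entropy_score(id_logits)
```

Thresholding such a score (calibrated on ID data) turns it into a detector; EOOD's contribution is choosing *where* in the network to measure it.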

Updated: 2025-04-04 10:57:03

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2504.03342v1

PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction

We propose PyTorchGeoNodes, a differentiable module for reconstructing 3D objects and their parameters from images using interpretable shape programs. Unlike traditional CAD model retrieval, shape programs allow reasoning about semantic parameters, support editing, and have a low memory footprint. Despite their potential, shape programs for 3D scene understanding have been largely overlooked. Our key contribution is enabling gradient-based optimization by parsing shape programs, or more precisely procedural models designed in Blender, into efficient PyTorch code. While there are many possible applications of PyTorchGeoNodes, we show that combining PyTorchGeoNodes with a genetic algorithm is the method of choice for optimizing both discrete and continuous shape program parameters for 3D reconstruction and understanding of 3D object parameters. Our modular framework can be further integrated with other reconstruction algorithms, and we demonstrate one such integration to enable procedural Gaussian splatting. Our experiments on the ScanNet dataset show that our method achieves accurate reconstructions while enabling a previously unattained level of 3D scene understanding.

Updated: 2025-04-04 10:54:29

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.10620v2

Temporal-contextual Event Learning for Pedestrian Crossing Intent Prediction

Ensuring the safety of vulnerable road users through accurate prediction of pedestrian crossing intention (PCI) plays a crucial role in the context of autonomous and assisted driving. Analyzing the set of observation video frames in ego-view has been widely used in most PCI prediction methods to forecast the cross intent. However, they struggle to capture the critical events related to pedestrian behaviour along the temporal dimension due to the high redundancy of the video frames, which results in the sub-optimal performance of PCI prediction. Our research addresses the challenge by introducing a novel approach called Temporal-contextual Event Learning (TCL). The TCL is composed of the Temporal Merging Module (TMM), which aims to manage the redundancy by clustering the observed video frames into multiple key temporal events. Then, the Contextual Attention Block (CAB) is employed to adaptively aggregate multiple event features along with visual and non-visual data. By synthesizing the temporal feature extraction and contextual attention on the key information across the critical events, TCL can learn expressive representations for PCI prediction. Extensive experiments are carried out on three widely adopted datasets, including PIE, JAAD-beh, and JAAD-all. The results show that TCL substantially surpasses the state-of-the-art methods. Our code can be accessed at https://github.com/dadaguailhb/TCL.

Updated: 2025-04-04 10:44:24

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2504.06292v1

Minority-Focused Text-to-Image Generation via Prompt Optimization

We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models. Minority instances, in the context of T2I generation, can be defined as ones living on low-density regions of text-conditional data distributions. They are valuable for various applications of modern T2I generators, such as data augmentation and creative AI. Unfortunately, existing pretrained T2I diffusion models primarily focus on high-density regions, largely due to the influence of guided samplers (like CFG) that are essential for high-quality generation. To address this, we present a novel framework to counter the high-density-focus of T2I diffusion models. Specifically, we first develop an online prompt optimization framework that encourages emergence of desired properties during inference while preserving semantic contents of user-provided prompts. We subsequently tailor this generic prompt optimizer into a specialized solver that promotes generation of minority features by incorporating a carefully-crafted likelihood objective. Extensive experiments conducted across various types of T2I models demonstrate that our approach significantly enhances the capability to produce high-quality minority instances compared to existing samplers. Code is available at https://github.com/soobin-um/MinorityPrompt.

Updated: 2025-04-04 10:37:36

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.07838v3

Data Augmentation of Time-Series Data in Human Movement Biomechanics: A Scoping Review

The integration of machine learning and deep learning has transformed data analytics in biomechanics, enabled by extensive wearable sensor data. However, the field faces challenges such as limited large-scale datasets and high data acquisition costs, which hinder the development of robust algorithms. Data augmentation techniques show promise in addressing these issues, but their application to biomechanical time-series data requires comprehensive evaluation. This scoping review investigates data augmentation methods for time-series data in the biomechanics domain. It analyzes current approaches for augmenting and generating time-series datasets, evaluates their effectiveness, and offers recommendations for applying these techniques in biomechanics. Four databases, PubMed, IEEE Xplore, Scopus, and Web of Science, were searched for studies published between 2013 and 2024. Following PRISMA-ScR guidelines, a two-stage screening identified 21 relevant publications. Results show that there is no universally preferred method for augmenting biomechanical time-series data; instead, methods vary based on study objectives. A major issue identified is the absence of soft tissue artifacts in synthetic data, leading to discrepancies referred to as the synthetic gap. Moreover, many studies lack proper evaluation of augmentation methods, making it difficult to assess their effects on model performance and data quality. This review highlights the critical role of data augmentation in addressing limited dataset availability and improving model generalization in biomechanics. Tailoring augmentation strategies to the characteristics of biomechanical data is essential for advancing predictive modeling. A better understanding of how different augmentation methods impact data quality and downstream tasks will be key to developing more effective and realistic techniques.
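Two of the simplest augmentation families the review covers, jittering and magnitude scaling, can be sketched in a few lines. This is a generic illustration for multichannel sensor series, not code from any reviewed study; all function names are made up:

```python
import numpy as np

def jitter(x, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise to every sample of a (T, C) series."""
    rng = rng or np.random.default_rng(0)
    return x + rng.normal(0.0, sigma, size=x.shape)

def magnitude_scale(x, sigma=0.1, rng=None):
    """Scale each channel of a (T, C) series by a random factor near 1."""
    rng = rng or np.random.default_rng(1)
    factors = rng.normal(1.0, sigma, size=(1, x.shape[1]))
    return x * factors

# Example: augment a synthetic 2-channel "gait" signal of 100 frames.
t = np.linspace(0, 4 * np.pi, 100)
x = np.stack([np.sin(t), np.cos(t)], axis=1)
x_aug = magnitude_scale(jitter(x))
print(x_aug.shape)  # (100, 2)
```

Note that neither transform models soft tissue artifacts, which is exactly the "synthetic gap" the review identifies.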

Updated: 2025-04-04 10:31:44

Domains: cs.LG,cs.HC

Download: http://arxiv.org/abs/2504.03334v1

Mind the Prompt: Prompting Strategies in Audio Generations for Improving Sound Classification

This paper investigates the design of effective prompt strategies for generating realistic datasets using Text-To-Audio (TTA) models. We also analyze different techniques for efficiently combining these datasets to enhance their utility in sound classification tasks. By evaluating two sound classification datasets with two TTA models, we apply a range of prompt strategies. Our findings reveal that task-specific prompt strategies significantly outperform basic prompt approaches in data generation. Furthermore, merging datasets generated using different TTA models proves to enhance classification performance more effectively than merely increasing the training dataset size. Overall, our results underscore the advantages of these methods as effective data augmentation techniques using synthetic data.

Updated: 2025-04-04 10:14:11

Domains: eess.AS,cs.AI,cs.SD,eess.SP

Download: http://arxiv.org/abs/2504.03329v1

Policy Optimization Algorithms in a Unified Framework

Policy optimization algorithms are crucial in many fields but challenging to grasp and implement, often due to complex calculations related to Markov decision processes and varying use of discount and average reward setups. This paper presents a unified framework that applies generalized ergodicity theory and perturbation analysis to clarify and enhance the application of these algorithms. Generalized ergodicity theory sheds light on the steady-state behavior of stochastic processes, aiding understanding of both discounted and average rewards. Perturbation analysis provides in-depth insights into the fundamental principles of policy optimization algorithms. We use this framework to identify common implementation errors and demonstrate the correct approaches. Through a case study on Linear Quadratic Regulator problems, we illustrate how slight variations in algorithm design affect implementation outcomes. We aim to make policy optimization algorithms more accessible and reduce their misuse in practice.
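The two reward setups the framework unifies can be made concrete on a toy reward stream. These are the standard MDP definitions in generic notation, not the paper's formulation:

```python
import numpy as np

def discounted_return(rewards, gamma=0.9):
    """Sum of gamma^t * r_t over a finite reward stream."""
    rewards = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(len(rewards))
    return float(np.dot(discounts, rewards))

def average_reward(rewards):
    """Long-run average reward, estimated from a finite stream."""
    return float(np.mean(rewards))

r = [1.0, 0.0, 2.0, 1.0]
print(discounted_return(r))  # 1 + 0 + 2*0.81 + 1*0.729 = 3.349
print(average_reward(r))     # 1.0
```

Mixing up which of these two objectives an implementation optimizes is precisely the kind of subtle error the paper's unified treatment is meant to expose.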

Updated: 2025-04-04 10:14:01

Domains: eess.SY,cs.AI,cs.LG,cs.SY

Download: http://arxiv.org/abs/2504.03328v1

Task Singular Vectors: Reducing Task Interference in Model Merging

Task Arithmetic has emerged as a simple yet effective method to merge models without additional training. However, by treating entire networks as flat parameter vectors, it overlooks key structural information and is susceptible to task interference. In this paper, we study task vectors at the layer level, focusing on task layer matrices and their singular value decomposition. In particular, we concentrate on the resulting singular vectors, which we refer to as Task Singular Vectors (TSV). Recognizing that layer task matrices are often low-rank, we propose TSV-Compress (TSV-C), a simple procedure that compresses them to 10% of their original size while retaining 99% of accuracy. We further leverage this low-rank space to define a new measure of task interference based on the interaction of singular vectors from different tasks. Building on these findings, we introduce TSV-Merge (TSV-M), a novel model merging approach that combines compression with interference reduction, significantly outperforming existing methods.
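The low-rank compression behind TSV-C can be illustrated with a plain SVD on a synthetic layer task matrix. This is a sketch of the general idea under a 10% storage budget (the paper's reported ratio), not the authors' implementation:

```python
import numpy as np

def compress_task_matrix(delta, keep_ratio=0.1):
    """Keep the top singular triplets of a layer task matrix
    (delta = W_finetuned - W_pretrained) within a storage budget of
    roughly keep_ratio times the dense entry count."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    m, n = delta.shape
    # Each rank-1 term (u_k, s_k, v_k) costs m + n + 1 stored numbers.
    k = max(1, int(keep_ratio * m * n / (m + n + 1)))
    return u[:, :k], s[:k], vt[:k]

def reconstruct(u, s, vt):
    """Low-rank approximation assembled from the kept triplets."""
    return (u * s) @ vt

rng = np.random.default_rng(0)
# A synthetic rank-3 "task matrix": low-rank, as the paper observes.
delta = rng.normal(size=(64, 3)) @ rng.normal(size=(3, 64))
u, s, vt = compress_task_matrix(delta, keep_ratio=0.1)
rel_err = (np.linalg.norm(delta - reconstruct(u, s, vt))
           / np.linalg.norm(delta))
print(u.shape, s.shape)   # (64, 3) (3,): ~10% of the 64x64 entries
print(rel_err < 1e-8)     # True: rank-3 structure fits the budget
```

When the task matrix really is near-low-rank, the truncation loses almost nothing, which is the observation that makes 10% storage with 99% accuracy plausible.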

Updated: 2025-04-04 10:10:41

Domains: cs.LG,stat.ML,I.5.1; I.4.2; I.2.10

Download: http://arxiv.org/abs/2412.00081v3

Optimal Embedding Guided Negative Sample Generation for Knowledge Graph Link Prediction

Knowledge graph embedding (KGE) models encode the structural information of knowledge graphs to predict new links. Effective training of these models requires distinguishing between positive and negative samples with high precision. Although prior research has shown that improving the quality of negative samples can significantly enhance model accuracy, identifying high-quality negative samples remains a challenging problem. This paper theoretically investigates the condition under which negative samples lead to optimal KG embedding and identifies a sufficient condition for an effective negative sample distribution. Based on this theoretical foundation, we propose \textbf{E}mbedding \textbf{MU}tation (\textsc{EMU}), a novel framework that \emph{generates} negative samples satisfying this condition, in contrast to conventional methods that focus on \emph{identifying} challenging negative samples within the training data. Importantly, the simplicity of \textsc{EMU} ensures seamless integration with existing KGE models and negative sampling methods. To evaluate its efficacy, we conducted comprehensive experiments across multiple datasets. The results consistently demonstrate significant improvements in link prediction performance across various KGE models and negative sampling methods. Notably, \textsc{EMU} enables performance improvements comparable to those achieved by models with an embedding dimension five times larger. An implementation of the method and experiments are available at https://github.com/nec-research/EMU-KG.
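For contrast with EMU's generated negatives, the conventional baseline it departs from, uniform corruption of tails in the training triples, fits in a few lines. This is standard KGE practice, not the paper's method, and the identifiers are arbitrary:

```python
import numpy as np

def corrupt_tails(triples, num_entities, rng=None):
    """Uniform negative sampling: replace each tail entity id with a
    random one (may occasionally produce false negatives)."""
    rng = rng or np.random.default_rng(0)
    negatives = triples.copy()
    negatives[:, 2] = rng.integers(0, num_entities, size=len(triples))
    return negatives

pos = np.array([[0, 0, 1], [2, 1, 3]])   # (head, relation, tail) ids
neg = corrupt_tails(pos, num_entities=10)
print(neg.shape)  # (2, 3)
```

EMU instead works in embedding space, generating negatives that satisfy the paper's sufficient condition rather than drawing them from existing entities.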

Updated: 2025-04-04 10:10:18

Domains: cs.LG,cs.CL,cs.IR

Download: http://arxiv.org/abs/2504.03327v1

Enabling Low-Resource Language Retrieval: Establishing Baselines for Urdu MS MARCO

As the Information Retrieval (IR) field increasingly recognizes the importance of inclusivity, addressing the needs of low-resource languages remains a significant challenge. This paper introduces the first large-scale Urdu IR dataset, created by machine-translating the MS MARCO dataset. We establish baseline results through zero-shot learning for IR in Urdu and subsequently apply the mMARCO multilingual IR methodology to this newly translated dataset. Our findings demonstrate that the fine-tuned model (Urdu-mT5-mMARCO) achieves a Mean Reciprocal Rank (MRR@10) of 0.247 and a Recall@10 of 0.439, representing significant improvements over zero-shot results and showing the potential for expanding IR access for Urdu speakers. By bridging access gaps for speakers of low-resource languages, this work not only advances multilingual IR research but also emphasizes the ethical and societal importance of inclusive IR technologies. This work provides valuable insights into the challenges and solutions for improving language representation and lays the groundwork for future research, especially in South Asian languages, which can benefit from the adaptable methods used in this study.
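The two reported metrics follow their standard definitions; a minimal sketch with made-up rankings and relevance judgments:

```python
def mrr_at_k(ranked_lists, relevant, k=10):
    """Mean reciprocal rank of the first relevant document in the top k."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        for pos, doc in enumerate(ranking[:k], start=1):
            if doc in rel:
                total += 1.0 / pos
                break
    return total / len(ranked_lists)

def recall_at_k(ranked_lists, relevant, k=10):
    """Fraction of relevant documents retrieved in the top k, per query."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        total += len(set(ranking[:k]) & rel) / len(rel)
    return total / len(ranked_lists)

runs = [["d3", "d1", "d9"], ["d2", "d7", "d4"]]
qrels = [{"d1"}, {"d5"}]
print(mrr_at_k(runs, qrels))     # (1/2 + 0) / 2 = 0.25
print(recall_at_k(runs, qrels))  # (1 + 0) / 2 = 0.5
```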

Updated: 2025-04-04 10:07:23

Domains: cs.CL,cs.AI,cs.IR,68T50,I.2.7

Download: http://arxiv.org/abs/2412.12997v3

Block Toeplitz Sparse Precision Matrix Estimation for Large-Scale Interval-Valued Time Series Forecasting

Modeling and forecasting interval-valued time series (ITS) have attracted considerable attention due to their growing presence in various contexts. To the best of our knowledge, there have been no efforts to model large-scale ITS. In this paper, we propose a feature extraction procedure for large-scale ITS, which involves key steps such as auto-segmentation and clustering, and feature transfer learning. This procedure can be seamlessly integrated with any suitable prediction model for forecasting purposes. Specifically, we transform the automatic segmentation and clustering of ITS into the estimation of Toeplitz sparse precision matrices and an assignment set. The majorization-minimization algorithm is employed to convert this highly non-convex optimization problem into two subproblems. We derive an efficient dynamic programming method and an alternating direction method to solve these two subproblems alternately and establish their convergence properties. By employing the Joint Recurrence Plot (JRP) to image subsequences and assigning a class label to each cluster, an image dataset is constructed. Then, an appropriate neural network is chosen to train on this image dataset and used to extract features for the next step of forecasting. Real data applications demonstrate that the proposed method can effectively obtain invariant representations of the raw data and enhance forecasting performance.
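The JRP imaging step can be sketched for point-valued series: each series yields a thresholded distance matrix, and the joint plot is their elementwise product. This is the textbook definition of a joint recurrence plot, not the paper's full pipeline:

```python
import numpy as np

def recurrence_plot(x, eps):
    """Binary matrix R[i, j] = 1 where |x_i - x_j| <= eps."""
    d = np.abs(x[:, None] - x[None, :])
    return (d <= eps).astype(int)

def joint_recurrence_plot(series, eps):
    """Elementwise product of the recurrence plots of several series."""
    jrp = recurrence_plot(series[0], eps)
    for s in series[1:]:
        jrp *= recurrence_plot(s, eps)
    return jrp

t = np.linspace(0, 2 * np.pi, 50)
jrp = joint_recurrence_plot([np.sin(t), np.cos(t)], eps=0.1)
print(jrp.shape)  # (50, 50): one grayscale "image" per subsequence
```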

Updated: 2025-04-04 09:57:05

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2504.03322v1

Low-Resource Transliteration for Roman-Urdu and Urdu Using Transformer-Based Models

As the Information Retrieval (IR) field increasingly recognizes the importance of inclusivity, addressing the needs of low-resource languages remains a significant challenge. Transliteration between Urdu and its Romanized form, Roman Urdu, remains underexplored despite the widespread use of both scripts in South Asia. Prior work using RNNs on the Roman-Urdu-Parl dataset showed promising results but suffered from poor domain adaptability and limited evaluation. We propose a transformer-based approach using the m2m100 multilingual translation model, enhanced with masked language modeling (MLM) pretraining and fine-tuning on both Roman-Urdu-Parl and the domain-diverse Dakshina dataset. To address previous evaluation flaws, we introduce rigorous dataset splits and assess performance using BLEU, character-level BLEU, and CHRF. Our model achieves strong transliteration performance, with Char-BLEU scores of 96.37 for Urdu->Roman-Urdu and 97.44 for Roman-Urdu->Urdu. These results outperform both RNN baselines and GPT-4o Mini and demonstrate the effectiveness of multilingual transfer learning for low-resource transliteration tasks.

Updated: 2025-04-04 09:55:38

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2503.21530v2

Adaptive Classification of Interval-Valued Time Series

In recent years, the modeling and analysis of interval-valued time series have garnered significant attention in the fields of econometrics and statistics. However, the existing literature primarily focuses on regression tasks while neglecting classification aspects. In this paper, we propose an adaptive approach for interval-valued time series classification. Specifically, we represent interval-valued time series using convex combinations of upper and lower bounds of intervals and transform these representations into images based on point-valued time series imaging methods. We utilize a fine-grained image classification neural network to classify these images, to achieve the goal of classifying the original interval-valued time series. This proposed method is applicable to both univariate and multivariate interval-valued time series. On the optimization front, we treat the convex combination coefficients as learnable parameters similar to the parameters of the neural network and provide an efficient estimation method based on the alternating direction method of multipliers (ADMM). On the theoretical front, under specific conditions, we establish a margin-based multiclass generalization bound for generic CNNs composed of basic blocks involving convolution, pooling, and fully connected layers. Through simulation studies and real data applications, we validate the effectiveness of the proposed method and compare its performance against a wide range of point-valued time series classification methods.
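The representation step, a convex combination of the interval bounds, is a one-liner. In the paper the coefficients are learned jointly with the network via ADMM; this sketch fixes alpha by hand:

```python
import numpy as np

def interval_to_point(lower, upper, alpha):
    """Represent an interval-valued series by alpha*upper + (1-alpha)*lower,
    with alpha in [0, 1] (alpha = 0.5 recovers the interval midpoint)."""
    assert 0.0 <= alpha <= 1.0
    return alpha * np.asarray(upper) + (1.0 - alpha) * np.asarray(lower)

lo = np.array([1.0, 2.0, 3.0])
hi = np.array([2.0, 4.0, 5.0])
print(interval_to_point(lo, hi, 0.5))  # midpoints: [1.5 3.  4. ]
```

The resulting point-valued series can then be imaged (e.g., via recurrence-style plots) and fed to a fine-grained image classifier, as the abstract describes.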

Updated: 2025-04-04 09:52:40

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2504.03318v1

Incremental Outlier Detection Modelling Using Streaming Analytics in Finance & Health Care

In the era of real-time data, traditional methods often struggle to keep pace with the dynamic nature of streaming environments. In this paper, we propose a hybrid framework in which (i) stage-I follows a traditional approach where the model is built once and evaluated in a real-time environment, and (ii) stage-II employs an incremental learning approach where the model is continuously retrained as new data arrives, enabling it to adapt and stay up to date. To implement these frameworks, we employed 8 distinct state-of-the-art outlier detection models, including one-class support vector machine (OCSVM), isolation forest adaptive sliding window approach (IForest ASD), exact storm (ES), angle-based outlier detection (ABOD), local outlier factor (LOF), Kitsune's online algorithm (KitNet), and K-nearest neighbour conformal density and distance based (KNN CAD). We evaluated the performance of these models across seven financial and healthcare prediction tasks, including credit card fraud detection, churn prediction, Ethereum fraud detection, heart stroke prediction, and diabetes prediction. The results indicate that our proposed incremental learning framework significantly improves performance, particularly on highly imbalanced datasets. Among all models, the IForest ASD model consistently ranked among the top three best-performing models, demonstrating superior effectiveness across various datasets.
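Stage-II's test-then-train (prequential) loop can be sketched with a sliding-window z-score detector standing in for the real models. This is purely illustrative; the paper uses OCSVM, IForest ASD, and the other detectors listed above:

```python
import numpy as np
from collections import deque

def prequential_detect(stream, window=50, warmup=10, z_thresh=3.0):
    """Test-then-train loop: score each point against statistics of a
    sliding window of recent history, then add it to the history."""
    history = deque(maxlen=window)
    flags = []
    for x in stream:
        if len(history) >= warmup:
            mu = np.mean(history)
            sd = np.std(history) + 1e-9
            flags.append(bool(abs(x - mu) / sd > z_thresh))
        else:
            flags.append(False)   # warm-up: no verdict yet
        history.append(x)         # incremental model update
    return flags

rng = np.random.default_rng(0)
stream = rng.normal(0.0, 1.0, 200).tolist()
stream[120] = 15.0                # inject an obvious outlier
flags = prequential_detect(stream)
print(flags[120])  # True
```

The key property is that each point is scored before the model sees it, so the evaluation mirrors a genuine streaming deployment.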

Updated: 2025-04-04 09:52:35

Domains: cs.LG

Download: http://arxiv.org/abs/2305.09907v2

Detecting underdetermination in parameterized quantum circuits

A central question in machine learning is how reliable the predictions of a trained model are. Reliability includes the identification of instances for which a model is likely not to be trusted based on an analysis of the learning system itself. Such unreliability for an input may arise from the model family providing a variety of hypotheses consistent with the training data, which can vastly disagree in their predictions on that particular input point. This is called the underdetermination problem, and it is important to develop methods to detect it. With the emergence of quantum machine learning (QML) as a prospective alternative to classical methods for certain learning problems, the question arises to what extent they are subject to underdetermination and whether similar techniques as those developed for classical models can be employed for its detection. In this work, we first provide an overview of concepts from Safe AI and reliability, which in particular received little attention in QML. We then explore the use of a method based on local second-order information for the detection of underdetermination in parameterized quantum circuits through numerical experiments. We further demonstrate that the approach is robust to certain levels of shot noise. Our work contributes to the body of literature on Safe Quantum AI, which is an emerging field of growing importance.

Updated: 2025-04-04 09:50:13

Domains: quant-ph,cs.LG

Download: http://arxiv.org/abs/2504.03315v1

Evaluating Compact LLMs for Zero-Shot Iberian Language Tasks on End-User Devices

Large Language Models have significantly advanced natural language processing, achieving remarkable performance in tasks such as language generation, translation, and reasoning. However, their substantial computational requirements restrict deployment to high-end systems, limiting accessibility on consumer-grade devices. This challenge is especially pronounced for under-resourced languages like those spoken in the Iberian Peninsula, where relatively limited linguistic resources and benchmarks hinder effective evaluation. This work presents a comprehensive evaluation of compact state-of-the-art LLMs across several essential NLP tasks tailored for Iberian languages. The results reveal that while some models consistently excel in certain tasks, significant performance gaps remain, particularly for languages such as Basque. These findings highlight the need for further research on balancing model compactness with robust multilingual performance.

Updated: 2025-04-04 09:47:58

Domains: cs.CL,cs.LG

Download: http://arxiv.org/abs/2504.03312v1

Roto-Translation Invariant Metrics on Position-Orientation Space

Riemannian metrics on the position-orientation space M(3) that are roto-translation group SE(3) invariant play a key role in image analysis tasks like enhancement, denoising, and segmentation. These metrics enable roto-translation equivariant algorithms, with the associated Riemannian distance often used in implementation. However, computing the Riemannian distance is costly, which makes it unsuitable in situations where constant recomputation is needed. We propose the mav (minimal angular velocity) distance, defined as the Riemannian length of a geometrically meaningful curve, as a practical alternative. We see an application of the mav distance in geometric deep learning: neural network architectures such as PONITA rely on geometric invariants to create their roto-translation equivariant models. The mav distance offers a trainable invariant, with the parameters that determine the Riemannian metric acting as learnable weights. In this paper we: 1) classify and parametrize all SE(3) invariant metrics on M(3), 2) describe how to efficiently calculate the mav distance, and 3) investigate whether including the mav distance within PONITA can positively impact its accuracy in predicting molecular properties.

Updated: 2025-04-04 09:36:11

Domains: math.DG,cs.LG

Download: http://arxiv.org/abs/2504.03309v1

On the algebraic degree stability of vectorial Boolean functions when restricted to affine subspaces

We study the behaviour of the algebraic degree of vectorial Boolean functions when their inputs are restricted to an affine subspace of their domain. Functions which maintain their degree on all subspaces of as high a codimension as possible are particularly interesting for cryptographic applications. For functions which are power functions $x^d$ in their univariate representation, we fully characterize the exponents $d$ for which the algebraic degree of the function stays unchanged when the input is restricted to spaces of codimension 1 or 2. For codimensions $k\ge 3$, we give a sufficient condition for the algebraic degree to stay unchanged. We apply these results to the multiplicative inverse function, as well as to the Kasami functions. We define an optimality notion regarding the stability of the degree on subspaces, and determine a number of optimal functions, including the multiplicative inverse function and the quadratic APN functions. We also give an explicit formula for counting the functions that keep their algebraic degree unchanged when restricted to hyperplanes.

Updated: 2025-04-04 09:33:03

Domains: math.AC,cs.CR,06E30, 94D10

Download: http://arxiv.org/abs/2504.03307v1

Multi-Flow: Multi-View-Enriched Normalizing Flows for Industrial Anomaly Detection

As more well-performing anomaly detection methods have been proposed, many single-view tasks have been solved to a relatively good degree. However, real-world production scenarios often involve complex industrial products, whose properties may not be fully captured by one single image. While normalizing flow based approaches already work well in single-camera scenarios, they currently do not make use of the priors in multi-view data. We aim to bridge this gap by using these flow-based models as a strong foundation and propose Multi-Flow, a novel multi-view anomaly detection method. Multi-Flow makes use of a novel multi-view architecture, whose exact likelihood estimation is enhanced by fusing information across different views. For this, we propose a new cross-view message-passing scheme, letting information flow between neighboring views. We empirically validate it on the real-world multi-view data set Real-IAD and reach a new state-of-the-art, surpassing current baselines in both image-wise and sample-wise anomaly detection tasks.

Updated: 2025-04-04 09:32:01

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2504.03306v1

Noise Augmented Fine Tuning for Mitigating Hallucinations in Large Language Models

Large language models (LLMs) often produce inaccurate or misleading content, known as hallucinations. To address this challenge, we introduce Noise-Augmented Fine-Tuning (NoiseFiT), a novel framework that leverages adaptive noise injection based on the signal-to-noise ratio (SNR) to enhance model robustness. In particular, NoiseFiT selectively perturbs layers identified as either high-SNR (more robust) or low-SNR (potentially under-regularized) using a dynamically scaled Gaussian noise. We further propose a hybrid loss that combines standard cross-entropy, soft cross-entropy, and consistency regularization to ensure stable and accurate outputs under noisy training conditions. Our theoretical analysis shows that adaptive noise injection is both unbiased and variance-preserving, providing strong guarantees for convergence in expectation. Empirical results on multiple test and benchmark datasets demonstrate that NoiseFiT significantly reduces hallucination rates, often improving or matching baseline performance in key tasks. These findings highlight the promise of noise-driven strategies for achieving robust, trustworthy language modeling without incurring prohibitive computational overhead. Given the comprehensive and detailed nature of our experiments, we have publicly released the fine-tuning logs, benchmark evaluation artifacts, and source code online at W&B, Hugging Face, and GitHub, respectively, to foster further research, accessibility and reproducibility.
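The core perturbation, Gaussian noise whose scale adapts to a per-layer signal-to-noise ratio, can be sketched in NumPy. Everything here is a loose illustration: the SNR proxy and the 1/(1+SNR) scaling are assumptions made for the sketch, not NoiseFiT's actual definitions or layer-selection rule:

```python
import numpy as np

def layer_snr(w):
    """Crude SNR proxy for a weight tensor: |mean| over std."""
    return abs(np.mean(w)) / (np.std(w) + 1e-12)

def inject_adaptive_noise(w, base_sigma=0.01, rng=None):
    """Perturb weights with Gaussian noise scaled by 1/(1 + SNR), so
    high-SNR layers receive gentler noise (an assumed scaling rule)."""
    rng = rng or np.random.default_rng(0)
    sigma = base_sigma / (1.0 + layer_snr(w))
    return w + rng.normal(0.0, sigma, size=w.shape)

w = np.random.default_rng(1).normal(0.5, 0.1, size=(4, 4))
w_noisy = inject_adaptive_noise(w)
print(w_noisy.shape)  # (4, 4)
```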

Updated: 2025-04-04 09:27:19

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2504.03302v1

Universal Collection of Euclidean Invariants between Pairs of Position-Orientations

Euclidean E(3) equivariant neural networks that employ scalar fields on position-orientation space M(3) have been effectively applied to tasks such as predicting molecular dynamics and properties. To perform equivariant convolutional-like operations in these architectures one needs Euclidean invariant kernels on M(3) x M(3). In practice, a handcrafted collection of invariants is selected, and this collection is then fed into multilayer perceptrons to parametrize the kernels. We rigorously describe an optimal collection of 4 smooth scalar invariants on the whole of M(3) x M(3). With optimal we mean that the collection is independent and universal, meaning that all invariants are pertinent, and any invariant kernel is a function of them. We evaluate two collections of invariants, one universal and one not, using the PONITA neural network architecture. Our experiments show that using a collection of invariants that is universal positively impacts the accuracy of PONITA significantly.

Updated: 2025-04-04 09:25:59

领域: math.DG,cs.LG

下载: http://arxiv.org/abs/2504.03299v1

Offline and Distributional Reinforcement Learning for Wireless Communications

The rapid growth of heterogeneous and massive wireless connectivity in 6G networks demands intelligent solutions to ensure scalability, reliability, privacy, ultra-low latency, and effective control. Although artificial intelligence (AI) and machine learning (ML) have demonstrated their potential in this domain, traditional online reinforcement learning (RL) and deep RL methods face limitations in real-time wireless networks. For instance, these methods rely on online interaction with the environment, which might be unfeasible, costly, or unsafe. In addition, they cannot handle the inherent uncertainties in real-time wireless applications. We focus on offline and distributional RL, two advanced RL techniques that can overcome these challenges by training on static datasets and accounting for network uncertainties. We introduce a novel framework that combines offline and distributional RL for wireless communication applications. Through case studies on unmanned aerial vehicle (UAV) trajectory optimization and radio resource management (RRM), we demonstrate that our proposed Conservative Quantile Regression (CQR) algorithm outperforms conventional RL approaches regarding convergence speed and risk management. Finally, we discuss open challenges and potential future directions for applying these techniques in 6G networks, paving the way for safer and more efficient real-time wireless systems.
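The distributional side of such methods typically builds on quantile regression: a predicted value is trained with the pinball loss, whose minimizer is the target quantile of the return distribution. The sketch below shows only this generic building block; the conservative penalty that CQR adds on top is not specified in the abstract and is omitted here.

```python
import numpy as np

def pinball_loss(pred, targets, tau):
    """Quantile (pinball) loss: minimized when `pred` equals the
    tau-quantile of the target samples."""
    u = targets - pred
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

def fit_quantile(targets, tau, grid):
    """Brute-force minimizer over a grid, to show the loss recovers quantiles."""
    losses = [pinball_loss(g, targets, tau) for g in grid]
    return grid[int(np.argmin(losses))]
```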

Updated: 2025-04-04 09:24:39

标题: 面向无线通信的离线与分布型强化学习

摘要: 6G网络中异构和大规模无线连接的快速增长需要智能解决方案来确保可扩展性、可靠性、隐私性、超低延迟和有效控制。尽管人工智能(AI)和机器学习(ML)在这一领域展示出了潜力,但传统的在线强化学习(RL)和深度RL方法在实时无线网络中面临限制。例如,这些方法依赖于与环境的在线交互,这可能是不可行的、昂贵的或不安全的。此外,它们无法处理实时无线应用中固有的不确定性。我们专注于离线和分布型RL,这两种先进的RL技术可以通过在静态数据集上进行训练并考虑网络不确定性来克服这些挑战。我们引入了一个将离线和分布型RL结合起来应用于无线通信应用的新框架。通过对无人机轨迹优化和无线资源管理(RRM)的案例研究,我们证明了我们提出的保守分位回归(CQR)算法在收敛速度和风险管理方面优于传统的RL方法。最后,我们讨论了在6G网络中应用这些技术的开放挑战和潜在未来方向,为更安全和更高效的实时无线系统铺平道路。

更新时间: 2025-04-04 09:24:39

领域: cs.LG,cs.MA,cs.NI

下载: http://arxiv.org/abs/2504.03804v1

Proximal Policy Optimization with Graph Neural Networks for Optimal Power Flow

Optimal Power Flow (OPF) is a very traditional research area within the power systems field that seeks the optimal operating point of electric power plants, and which needs to be solved every few minutes in real-world scenarios. However, due to the nonconvexities that arise in power generation systems, there is not yet a fast, robust solution technique for the full Alternating Current Optimal Power Flow (ACOPF). In the last decades, power grids have evolved into a typical dynamic, non-linear and large-scale control system, known as the power system, so searching for better and faster ACOPF solutions is becoming crucial. The appearance of Graph Neural Networks (GNNs) has allowed the natural use of Machine Learning (ML) algorithms on graph data, such as power networks. On the other hand, Deep Reinforcement Learning (DRL) is known for its powerful capability to solve complex decision-making problems. Although solutions that use these two methods separately are beginning to appear in the literature, none has yet combined the advantages of both. We propose a novel architecture based on the Proximal Policy Optimization algorithm with Graph Neural Networks to solve the Optimal Power Flow. The objective is to design an architecture that learns how to solve the optimization problem and that is at the same time able to generalize to unseen scenarios. We compare our solution with the DCOPF in terms of cost after having trained our DRL agent on the IEEE 30 bus system and then computing the OPF on that base network with topology changes.

Updated: 2025-04-04 09:22:05

标题: 使用图神经网络的近端策略优化在最优潮流中的应用

摘要: 最优潮流(OPF)是电力系统领域一个非常传统的研究方向,旨在寻找发电厂的最佳运行点,并且在现实场景中需要每隔几分钟求解一次。然而,由于发电系统中出现的非凸性,目前还没有一种快速、稳健的求解完整交流最优潮流(ACOPF)的技术。在过去的几十年里,电网已经发展成为一个典型的动态、非线性和大规模控制系统,即电力系统,因此寻找更好、更快的ACOPF解决方案变得至关重要。图神经网络(GNN)的出现使得机器学习(ML)算法可以自然地应用于图数据(如电力网络)。另一方面,深度强化学习(DRL)以解决复杂决策问题的强大能力而闻名。尽管分别使用这两种方法的解决方案已开始出现在文献中,但尚未有工作将两者的优势结合起来。我们提出了一种基于Proximal Policy Optimization算法和图神经网络的新型架构来解决最优潮流问题。目标是设计一种能够学习求解该优化问题,同时又能推广到未见过场景的架构。我们首先在IEEE 30母线系统上训练我们的DRL代理,然后在该基础网络发生拓扑变化的情况下计算OPF,并在成本方面将我们的解决方案与DCOPF进行比较。

更新时间: 2025-04-04 09:22:05

领域: cs.AI

下载: http://arxiv.org/abs/2212.12470v3

Stance-Driven Multimodal Controlled Statement Generation: New Dataset and Task

Formulating statements that support diverse or controversial stances on specific topics is vital for platforms that enable user expression, reshape political discourse, and drive social critique and information dissemination. With the rise of Large Language Models (LLMs), controllable text generation towards specific stances has become a promising research area with applications in shaping public opinion and commercial marketing. However, current datasets often focus solely on pure texts, lacking multimodal content and effective context, particularly in the context of stance detection. In this paper, we formally define and study the new problem of stance-driven controllable content generation for tweets with text and images, where given a multimodal post (text and image/video), a model generates a stance-controlled response. To this end, we create the Multimodal Stance Generation Dataset (StanceGen2024), the first resource explicitly designed for multimodal stance-controllable text generation in political discourse. It includes posts and user comments from the 2024 U.S. presidential election, featuring text, images, videos, and stance annotations to explore how multimodal political content shapes stance expression. Furthermore, we propose a Stance-Driven Multimodal Generation (SDMG) framework that integrates weighted fusion of multimodal features and stance guidance to improve semantic consistency and stance control. We release the dataset and code (https://anonymous.4open.science/r/StanceGen-BE9D) for public use and further research.

Updated: 2025-04-04 09:20:19

标题: 立场驱动的多模态控制声明生成:新数据集和任务

摘要: 制定支持多元或有争议立场的陈述对于能够促进用户表达、重塑政治话语并推动社会批判和信息传播的平台至关重要。随着大型语言模型(LLMs)的兴起,针对特定立场的可控文本生成已成为一个具有潜力的研究领域,可在塑造公众舆论和商业营销方面应用。然而,目前的数据集通常仅关注纯文本,缺乏多模态内容和有效上下文,特别是在立场检测的背景下。在本文中,我们正式定义并研究了针对带有文本和图像的推文的立场驱动可控内容生成这一新问题:给定一个多模态帖子(文本和图像/视频),模型生成一个受立场控制的回应。为此,我们创建了多模态立场生成数据集(StanceGen2024),这是第一个专门为政治话语中的多模态立场可控文本生成而设计的资源,其中包括来自2024年美国总统选举的帖子和用户评论,涵盖文本、图像、视频和立场注释,以探讨多模态政治内容如何塑造立场表达。此外,我们提出了一个立场驱动的多模态生成(SDMG)框架,整合了多模态特征的加权融合和立场指导,以改善语义一致性和立场控制。我们发布了数据集和代码(https://anonymous.4open.science/r/StanceGen-BE9D)供公众使用和进一步研究。

更新时间: 2025-04-04 09:20:19

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.03295v1

Towards Effective EU E-Participation: The Development of AskThePublic

E-participation platforms can be an important asset for governments in increasing trust and fostering democratic societies. By engaging non-governmental and private institutions, domain experts, and even the general public, policymakers can make informed and inclusive decisions. Drawing on the Media Richness Theory and applying the Design Science Research method, we explore how a chatbot can be designed to improve the effectiveness of the policy-making process of existing citizen involvement platforms. Leveraging the Have Your Say platform, which solicits feedback on European Commission initiatives and regulations, a Large Language Model based chatbot, called AskThePublic is created, providing policymakers, journalists, researchers, and interested citizens with a convenient channel to explore and engage with public input. By conducting 11 semistructured interviews, the results show that the participants value the interactive and structured responses as well as enhanced language capabilities, thus increasing their likelihood of engaging with AskThePublic over the existing platform. An outlook for future iterations is provided and discussed with regard to the perspectives of the different stakeholders.

Updated: 2025-04-04 09:15:06

标题: 朝着有效的欧盟电子参与:AskThePublic的发展

摘要: 电子参与平台可以成为政府增加信任和促进民主社会的重要资产。通过吸引非政府和私人机构、领域专家,甚至普通公众,决策者可以做出明智和包容的决策。借鉴媒体丰富性理论并应用设计科学研究方法,我们探讨了如何设计聊天机器人以提高现有市民参与平台的政策制定过程的效果。利用“Have Your Say”平台,该平台征求对欧盟委员会倡议和法规的反馈意见,我们创建了一款基于大型语言模型的聊天机器人,名为AskThePublic,为决策者、记者、研究人员和感兴趣的公民提供了一个便捷的渠道来探索和参与公众意见。通过进行11次半结构化访谈,结果显示参与者重视互动和结构化回应以及增强的语言能力,因此相比现有平台,他们更有可能使用AskThePublic。最后,我们展望了未来迭代,并就不同利益相关者的视角进行了讨论。

更新时间: 2025-04-04 09:15:06

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2504.03287v1

What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices

Large Language Models (LLMs) are increasingly deployed as gateways to information, yet their content moderation practices remain underexplored. This work investigates the extent to which LLMs refuse to answer or omit information when prompted on political topics. To do so, we distinguish between hard censorship (i.e., generated refusals, error messages, or canned denial responses) and soft censorship (i.e., selective omission or downplaying of key elements), which we identify in LLMs' responses when asked to provide information on a broad range of political figures. Our analysis covers 14 state-of-the-art models from Western countries, China, and Russia, prompted in all six official United Nations (UN) languages. Our analysis suggests that although censorship is observed across the board, it is predominantly tailored to an LLM provider's domestic audience and typically manifests as either hard censorship or soft censorship (though rarely both concurrently). These findings underscore the need for ideological and geographic diversity among publicly available LLMs, and greater transparency in LLM moderation strategies to facilitate informed user choices. All data are made freely available.

Updated: 2025-04-04 09:09:06

标题: 大型语言模型不谈论的内容:对内容审核与审查实践的实证研究

摘要: 大型语言模型(LLMs)越来越多地被用作获取信息的入口,然而它们的内容审核实践仍未被充分探讨。本研究调查了LLMs在政治话题上被提示时拒绝回答或省略信息的程度。为此,我们区分了硬性审查(即生成的拒绝、错误消息或套话式的否认回应)和软性审查(即选择性省略或淡化关键元素),当要求模型提供各类政治人物的信息时,我们在LLMs的回应中识别这些情况。我们的分析涵盖了来自西方国家、中国和俄罗斯的14个最先进的模型,并以全部六种联合国(UN)官方语言对其进行提示。我们的分析表明,尽管审查在各类模型中都有观察到,但它主要是针对LLM提供者的国内受众量身定制的,并且通常表现为硬性审查或软性审查(两者很少同时出现)。这些发现强调了公开可用的LLMs需要意识形态和地理多样性,以及LLM审核策略需要更大的透明度,以帮助用户做出知情选择。所有数据均可免费获取。

更新时间: 2025-04-04 09:09:06

领域: cs.CL,cs.CY,cs.LG

下载: http://arxiv.org/abs/2504.03803v1

JanusDDG: A Thermodynamics-Compliant Model for Sequence-Based Protein Stability via Two-Fronts Multi-Head Attention

Understanding how residue variations affect protein stability is crucial for designing functional proteins and deciphering the molecular mechanisms underlying disease-related mutations. Recent advances in protein language models (PLMs) have revolutionized computational protein analysis, enabling, among other things, more accurate predictions of mutational effects. In this work, we introduce JanusDDG, a deep learning framework that leverages PLM-derived embeddings and a bidirectional cross-attention transformer architecture to predict $\Delta \Delta G$ of single and multiple-residue mutations while simultaneously being constrained to respect fundamental thermodynamic properties, such as antisymmetry and transitivity. Unlike conventional self-attention, JanusDDG computes queries (Q) and values (V) as the difference between wild-type and mutant embeddings, while keys (K) alternate between the two. This cross-interleaved attention mechanism enables the model to capture mutation-induced perturbations while preserving essential contextual information. Experimental results show that JanusDDG achieves state-of-the-art performance in predicting $\Delta \Delta G$ from sequence alone, matching or exceeding the accuracy of structure-based methods for both single and multiple mutations.
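The cross-interleaved attention described above can be sketched with plain NumPy. This is a simplified illustration of the mechanism stated in the abstract (queries and values from the wild-type/mutant difference, keys alternating between the two sequences); the single head and the averaging of the two "fronts" are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def two_front_attention(wt, mut, W_q, W_k, W_v):
    """Queries and values come from the wild-type/mutant embedding difference;
    keys come alternately from the wild-type and the mutant embeddings."""
    diff = wt - mut                          # mutation-induced perturbation, (L, d)
    Q, V = diff @ W_q, diff @ W_v
    outputs = []
    for K_src in (wt, mut):                  # one "front" per key source
        K = K_src @ W_k
        A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
        outputs.append(A @ V)
    return sum(outputs) / 2.0
```

Note that when the wild type and mutant coincide, the difference (and hence the output) vanishes, which is consistent with a zero predicted perturbation.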

Updated: 2025-04-04 09:02:32

标题: JanusDDG:基于序列的蛋白稳定性的热力学一致模型,通过双前端多头注意力

摘要: 了解残基变化如何影响蛋白质稳定性对于设计功能性蛋白质和解析与疾病相关突变的分子机制至关重要。最近在蛋白质语言模型(PLMs)方面取得的进展彻底改变了计算蛋白质分析的方式,使得(除其他能力外)更准确地预测突变效应成为可能。在这项工作中,我们介绍了JanusDDG,这是一个深度学习框架,利用PLM导出的嵌入和双向交叉注意力变换器架构来预测单个和多个残基突变的ΔΔG,同时受到基本热力学性质的约束,如反对称性和传递性。与传统的自注意力不同,JanusDDG将查询(Q)和值(V)计算为野生型和突变型嵌入之间的差异,而键(K)则在两者之间交替。这种交叉交错的注意力机制使模型能够捕捉突变引起的扰动,同时保留基本的上下文信息。实验结果显示,JanusDDG在仅从序列预测ΔΔG方面达到了最先进的性能,对于单个和多个突变,匹配或超过了基于结构的方法的准确性。

更新时间: 2025-04-04 09:02:32

领域: q-bio.QM,cs.AI,cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2504.03278v1

Privacy-Preserving Logistic Regression Training on Large Datasets

Privacy-preserving machine learning is one class of cryptographic methods that aim to analyze private and sensitive data while keeping privacy, such as homomorphic logistic regression training over large encrypted data. In this paper, we propose an efficient algorithm for logistic regression training on large encrypted data using Homomorphic Encryption (HE), which is the mini-batch version of recent methods using a faster gradient variant called $\texttt{quadratic gradient}$. It is claimed that $\texttt{quadratic gradient}$ can integrate curve information (Hessian matrix) into the gradient and therefore can effectively accelerate first-order gradient (descent) algorithms. We also implement the full-batch version of their method when the encrypted dataset is so large that it has to be encrypted in the mini-batch manner. We compare our mini-batch algorithm with our full-batch implementation method on real financial data consisting of 422,108 samples with 200 features. Given the inefficiency of HEs, our results are inspiring and demonstrate that logistic regression training on large encrypted datasets is practically feasible, marking a significant milestone in our understanding.
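The quadratic-gradient idea can be sketched in plaintext (ignoring the encryption entirely): fold a diagonal bound on the Hessian into the gradient as a per-coordinate scaling. The classic $X^T X / 4$ Hessian bound and the exact form of the diagonal preconditioner below are assumptions based on standard treatments, not necessarily the paper's formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    p = sigmoid(X @ w)
    return np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def quadratic_gradient_step(w, X, y, lr=1.0, eps=1e-8):
    """One plaintext ascent step with a 'quadratic gradient': the ordinary
    gradient is rescaled per coordinate by a diagonal factor derived from
    the Hessian bound |H| <= X^T X / 4 (illustrative sketch)."""
    g = X.T @ (y - sigmoid(X @ w))                   # gradient of the log-likelihood
    H_bound = 0.25 * (X.T @ X)
    B = 1.0 / (eps + np.abs(H_bound).sum(axis=1))    # diagonal preconditioner
    return w + lr * B * g
```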

Updated: 2025-04-04 08:57:16

标题: 隐私保护的大型数据集上的逻辑回归训练

摘要: 隐私保护机器学习是密码学方法的一类,旨在分析私密和敏感数据的同时保护隐私,例如在大规模加密数据上进行同态逻辑回归训练。本文提出了一种使用同态加密(HE)在大规模加密数据上进行逻辑回归训练的高效算法,这是最近方法的小批量版本,使用了一种称为“二次梯度”的更快的梯度变体。据称,“二次梯度”可以将曲率信息(Hessian矩阵)整合到梯度中,因此可以有效加速一阶梯度(下降)算法。当加密数据集太大而必须以小批量方式加密时,我们还实现了其方法的全批量版本。我们将我们的小批量算法与我们的全批量实现方法在由422,108个样本和200个特征组成的真实金融数据上进行比较。鉴于HE的低效性,我们的结果令人鼓舞,并表明在大规模加密数据集上进行逻辑回归训练是切实可行的,标志着我们认识上的一个重要里程碑。

更新时间: 2025-04-04 08:57:16

领域: cs.CR

下载: http://arxiv.org/abs/2406.13221v4

Monte Carlo Graph Coloring

Graph Coloring is probably one of the most studied and famous problems in graph algorithms. Exact methods fail to solve instances with more than a few hundred vertices; therefore, a large number of heuristics have been proposed. Nested Monte Carlo Search (NMCS) and Nested Rollout Policy Adaptation (NRPA) are Monte Carlo search algorithms for single player games. Surprisingly, little work has been dedicated to evaluating Monte Carlo search algorithms on combinatorial graph problems. In this paper we show how to efficiently apply Monte Carlo search to Graph Coloring and compare this approach to existing ones.
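A level-1 NMCS applied to graph coloring can be sketched as follows: colors are assigned vertex by vertex, and each candidate color is scored by the best of several random completions. The playout count and the conflict-count objective are illustrative choices, not the paper's exact setup.

```python
import random

def conflicts(adj, coloring):
    """Number of edges whose endpoints share a color (adj = adjacency lists)."""
    return sum(1 for u in range(len(adj)) for v in adj[u]
               if u < v and coloring[u] == coloring[v])

def playout(adj, k, partial, rng):
    """Complete a partial coloring uniformly at random and score it."""
    coloring = partial + [rng.randrange(k) for _ in range(len(adj) - len(partial))]
    return conflicts(adj, coloring)

def nmcs_coloring(adj, k, n_playouts=64, seed=0):
    """Level-1 Nested Monte Carlo Search: at each vertex, keep the color whose
    best random completion has the fewest conflicts (illustrative sketch)."""
    rng = random.Random(seed)
    partial = []
    for _ in range(len(adj)):
        best_c, best_s = 0, float("inf")
        for c in range(k):
            s = min(playout(adj, k, partial + [c], rng) for _ in range(n_playouts))
            if s < best_s:
                best_s, best_c = s, c
        partial.append(best_c)
    return partial
```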

Updated: 2025-04-04 08:57:01

标题: 蒙特卡洛图着色

摘要: 图着色可能是图算法中研究最多、最著名的问题之一。精确方法无法解决具有数百个顶点以上的实例,因此提出了大量启发式方法。嵌套蒙特卡洛搜索(NMCS)和嵌套滚动策略调整(NRPA)是单人游戏的蒙特卡洛搜索算法。令人惊讶的是,很少有工作专门评估蒙特卡洛搜索算法对组合图问题的应用。在本文中,我们揭示了如何有效地将蒙特卡洛搜索应用于图着色,并将此方法与现有方法进行比较。

更新时间: 2025-04-04 08:57:01

领域: cs.AI

下载: http://arxiv.org/abs/2504.03277v1

Do Large Language Models Solve the Problems of Agent-Based Modeling? A Critical Review of Generative Social Simulations

Recent advancements in AI have reinvigorated Agent-Based Models (ABMs), as the integration of Large Language Models (LLMs) has led to the emergence of ``generative ABMs'' as a novel approach to simulating social systems. While ABMs offer means to bridge micro-level interactions with macro-level patterns, they have long faced criticisms from social scientists, pointing to, e.g., lack of realism, computational complexity, and challenges of calibrating and validating against empirical data. This paper reviews the generative ABM literature to assess whether this new approach adequately addresses these long-standing criticisms. Our findings show that studies exhibit limited awareness of historical debates. Validation remains poorly addressed, with many studies relying solely on subjective assessments of model `believability', and even the most rigorous validation failing to adequately evidence operational validity. We argue that there are reasons to believe that LLMs will exacerbate rather than resolve the long-standing challenges of ABMs. The black-box nature of LLMs moreover limits their usefulness for disentangling complex emergent causal mechanisms. While generative ABMs are still in a stage of early experimentation, these findings raise the question of whether and how the field can transition to the type of rigorous modeling needed to contribute to social scientific theory.

Updated: 2025-04-04 08:48:43

标题: 大型语言模型能解决基于Agent建模的问题吗?对生成式社会模拟的批判性综述

摘要: 最近人工智能的进展重新激发了基于Agent的模型(ABMs)的兴趣,因为大型语言模型(LLMs)的整合导致了“生成式ABMs”的出现,作为模拟社会系统的一种新方法。虽然ABMs提供了将微观层面的互动与宏观层面模式联系起来的手段,但长期以来受到社会科学家的批评,指出缺乏现实性、计算复杂性以及针对经验数据进行校准和验证的挑战等问题。本文回顾了生成式ABM文献,评估这种新方法是否充分回应了这些长期存在的批评。我们的研究发现,相关研究对历史争论的认识有限。验证问题仍未得到妥善处理,许多研究仅依赖于对模型“可信度”的主观评估,即使最严格的验证也未能充分证明操作有效性。我们认为,有理由相信LLMs将加剧而不是解决ABMs长期存在的挑战。此外,LLMs的黑箱特性限制了它们在解开复杂涌现因果机制方面的作用。虽然生成式ABMs仍处于早期实验阶段,这些发现使人质疑该领域能否以及如何转向所需的严格建模,以对社会科学理论作出贡献。

更新时间: 2025-04-04 08:48:43

领域: cs.MA,cs.AI

下载: http://arxiv.org/abs/2504.03274v1

Verification of Autonomous Neural Car Control with KeYmaera X

This article presents a formal model and formal safety proofs for the ABZ'25 case study in differential dynamic logic (dL). The case study considers an autonomous car driving on a highway avoiding collisions with neighbouring cars. Using KeYmaera X's dL implementation, we prove absence of collision on an infinite time horizon which ensures that safety is preserved independently of trip length. The safety guarantees hold for time-varying reaction time and brake force. Our dL model considers the single lane scenario with cars ahead or behind. We demonstrate that dL with its tools is a rigorous foundation for runtime monitoring, shielding, and neural network verification. Doing so sheds light on inconsistencies between the provided specification and simulation environment highway-env of the ABZ'25 study. We attempt to fix these inconsistencies and uncover numerous counterexamples which also indicate issues in the provided reinforcement learning environment.

Updated: 2025-04-04 08:43:31

标题: 使用KeYmaera X验证自主神经汽车控制

摘要: 本文使用微分动态逻辑(dL)为ABZ'25案例研究给出了形式化模型和形式化安全证明。该案例研究考虑一辆在高速公路上行驶并避免与邻近车辆碰撞的自动驾驶汽车。通过使用KeYmaera X的dL实现,我们证明了在无限时间范围内不存在碰撞,这确保了安全性与行程长度无关。安全性保证适用于时变的反应时间和制动力。我们的dL模型考虑了前后有车的单车道情景。我们展示了dL及其工具是运行时监控、屏蔽和神经网络验证的严格基础。通过这样做,我们揭示了ABZ'25研究中提供的规范与模拟环境highway-env之间的不一致之处。我们尝试修复这些不一致之处,并发现了许多反例,这些反例也表明所提供的强化学习环境存在问题。

更新时间: 2025-04-04 08:43:31

领域: eess.SY,cs.AI,cs.LG,cs.LO,cs.SY

下载: http://arxiv.org/abs/2504.03272v1

An Extended Symbolic-Arithmetic Model for Teaching Double-Black Removal with Rotation in Red-Black Trees

Double-black (DB) nodes have no place in red-black (RB) trees. So when DB nodes are formed, they are immediately removed. The removal of DB nodes that cause rotation and recoloring of other connected nodes poses greater challenges in the teaching and learning of RB trees. To ease this difficulty, this paper extends our previous work on the symbolic arithmetic algebraic (SA) method for removing DB nodes. The SA operations that are given as, Red + Black = Black; Black - Black = Red; Black + Black = DB; and DB - Black = Black removes DB nodes and rebalances black heights in RB trees. By extension, this paper projects three SA mathematical equations, namely, general symbolic arithmetic rule; partial symbolic arithmetic rule1; and partial symbolic arithmetic rule2. The removal of a DB node ultimately affects black heights in RB trees. To balance black heights using the SA equations, all the RB tree cases, namely, LR, RL, LL, and RR, were considered in this work; and the position of the nodes connected directly or indirectly to the DB node was also tested. In this study, to balance a RB tree, the issues considered w.r.t. the different cases of the RB tree were i) whether a DB node has an inner, outer, or both inner and outer black nephews; or ii) whether a DB node has an inner, outer or both inner and outer red nephews. The nephews r and x in this work are the children of the sibling s to a DB, and further up the tree, the parent p of a DB is their grandparent g. Thus, r and x have indirect relationships to a DB at the point of formation of the DB node. The novelty of the SA equations is in their effectiveness in the removal of DB that involves rotation of nodes as well as the recoloring of nodes along any simple path so as to balance black heights in a tree.
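The four SA operations quoted above translate directly into a small lookup table; the following sketch encodes only this color algebra, not the rotation and recoloring machinery of the full method.

```python
# The four symbolic-arithmetic (SA) operations from the abstract:
#   Red + Black = Black;  Black - Black = Red;
#   Black + Black = DB;   DB - Black = Black.
RED, BLACK, DB = "red", "black", "double-black"

SA_ADD = {
    frozenset([RED, BLACK]): BLACK,    # Red + Black = Black (commutative)
    frozenset([BLACK]): DB,            # Black + Black = DB
}
SA_SUB = {
    (BLACK, BLACK): RED,               # Black - Black = Red
    (DB, BLACK): BLACK,                # DB - Black = Black
}

def sa_add(a, b):
    return SA_ADD[frozenset([a, b])]

def sa_sub(a, b):
    return SA_SUB[(a, b)]
```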

Updated: 2025-04-04 08:19:26

标题: 一个扩展的符号-算术模型,用于在红黑树中教授带旋转的双黑节点移除

摘要: 双黑(DB)节点在红黑(RB)树中没有位置。因此,当形成DB节点时,它们会立即被移除。删除导致其他连接节点旋转和重新着色的DB节点在教学和学习RB树中提出了更大的挑战。为了缓解这一困难,本文扩展了我们先前关于符号算术代数(SA)方法用于移除DB节点的工作。给出的SA操作为:红+黑=黑;黑-黑=红;黑+黑=DB;DB-黑=黑,这些操作可移除DB节点并重新平衡RB树中的黑高度。此外,本文提出了三个SA数学方程,即一般符号算术规则;部分符号算术规则1;和部分符号算术规则2。移除DB节点最终会影响RB树中的黑高度。为了使用SA方程平衡黑高度,本研究考虑了所有RB树情况,包括LR、RL、LL和RR,并测试了与DB节点直接或间接连接的节点的位置。在本研究中,为了平衡RB树,考虑了以下问题:i)DB节点是否具有内部、外部或内部和外部黑色侄子;或ii)DB节点是否具有内部、外部或内部和外部红色侄子。本文中的侄子r和x是DB的兄弟s的子节点,并且在树的更高层次中,DB的父节点p是它们的祖父节点g。因此,在DB节点形成时,r和x与DB之间存在间接关系。SA方程的新颖之处在于,它们能够有效地移除需要节点旋转的DB节点,并沿任意简单路径重新着色节点,从而平衡树中的黑高度。

更新时间: 2025-04-04 08:19:26

领域: cs.DS,cs.AI

下载: http://arxiv.org/abs/2504.03259v1

Semantic-guided Representation Learning for Multi-Label Recognition

Multi-label Recognition (MLR) involves assigning multiple labels to each data instance in an image, offering advantages over single-label classification in complex scenarios. However, it faces the challenge of annotating all relevant categories, often leading to uncertain annotations, such as unseen or incomplete labels. Recent Vision and Language Pre-training (VLP) based methods have made significant progress in tackling zero-shot MLR tasks by leveraging rich vision-language correlations. However, the correlation between multi-label semantics has not been fully explored, and the learned visual features often lack essential semantic information. To overcome these limitations, we introduce a Semantic-guided Representation Learning approach (SigRL) that enables the model to learn effective visual and textual representations, thereby improving the downstream alignment of visual images and categories. Specifically, we first introduce a graph-based multi-label correlation module (GMC) to facilitate information exchange between labels, enriching the semantic representation across the multi-label texts. Next, we propose a Semantic Visual Feature Reconstruction module (SVFR) to enhance the semantic information in the visual representation by integrating the learned textual representation during reconstruction. Finally, we optimize the image-text matching capability of the VLP model using both local and global features to achieve zero-shot MLR. Comprehensive experiments are conducted on several MLR benchmarks, encompassing both zero-shot MLR (with unseen labels) and single positive multi-label learning (with limited labels), demonstrating the superior performance of our approach compared to state-of-the-art methods. The code is available at https://github.com/MVL-Lab/SigRL.

Updated: 2025-04-04 08:15:08

标题: 多标签识别的语义引导表示学习

摘要: 多标签识别(MLR)涉及为图像中的每个数据实例分配多个标签,相较于单标签分类在复杂情境下具有优势。然而,它面临着注释所有相关类别的挑战,通常导致不确定的注释,如未见或不完整的标签。最近基于视觉和语言预训练(VLP)的方法在处理零样本MLR任务方面取得了显著进展,利用了丰富的视觉-语言相关性。然而,多标签语义之间的相关性尚未得到充分探索,学习到的视觉特征通常缺乏必要的语义信息。为了克服这些限制,我们引入了一种语义引导的表示学习方法(SigRL),使模型能够学习有效的视觉和文本表示,从而提高视觉图像和类别之间的下游对齐。具体地,我们首先引入了基于图的多标签相关性模块(GMC)来促进标签之间的信息交换,丰富跨多标签文本的语义表示。其次,我们提出了语义视觉特征重建模块(SVFR),通过在重建过程中整合学习到的文本表示来增强视觉表示中的语义信息。最后,我们利用局部和全局特征优化VLP模型的图像-文本匹配能力,实现零样本MLR。我们在几个MLR基准上进行了全面的实验,涵盖了零样本MLR(包括未见标签)和单正多标签学习(包括有限标签),展示了我们方法相较于最先进方法的卓越性能。代码可在https://github.com/MVL-Lab/SigRL获取。

更新时间: 2025-04-04 08:15:08

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.03801v1

UniGEM: A Unified Approach to Generation and Property Prediction for Molecules

Molecular generation and molecular property prediction are both crucial for drug discovery, but they are often developed independently. Inspired by recent studies, which demonstrate that diffusion model, a prominent generative approach, can learn meaningful data representations that enhance predictive tasks, we explore the potential for developing a unified generative model in the molecular domain that effectively addresses both molecular generation and property prediction tasks. However, the integration of these tasks is challenging due to inherent inconsistencies, making simple multi-task learning ineffective. To address this, we propose UniGEM, the first unified model to successfully integrate molecular generation and property prediction, delivering superior performance in both tasks. Our key innovation lies in a novel two-phase generative process, where predictive tasks are activated in the later stages, after the molecular scaffold is formed. We further enhance task balance through innovative training strategies. Rigorous theoretical analysis and comprehensive experiments demonstrate our significant improvements in both tasks. The principles behind UniGEM hold promise for broader applications, including natural language processing and computer vision.

Updated: 2025-04-04 07:57:36

标题: UniGEM:一种统一的分子生成和属性预测方法

摘要: 分子生成和分子属性预测对于药物发现都至关重要,但它们通常是独立开发的。最近的研究表明,作为一种重要的生成方法,扩散模型能够学习到可增强预测任务的有意义数据表示。受此启发,我们探索了在分子领域开发统一生成模型的潜力,以有效地同时解决分子生成和属性预测任务。然而,由于固有的不一致性,这些任务的整合具有挑战性,使得简单的多任务学习效果不佳。为了解决这个问题,我们提出了UniGEM,这是第一个成功整合分子生成和属性预测的统一模型,在两个任务中都取得了出色的性能。我们的关键创新在于一种新颖的两阶段生成过程:在分子骨架形成之后,预测任务才被激活。我们通过创新的训练策略进一步增强任务平衡。严格的理论分析和全面的实验证明了我们在两个任务中的显著改进。UniGEM背后的原则有望推广到更广泛的应用,包括自然语言处理和计算机视觉。

更新时间: 2025-04-04 07:57:36

领域: cs.LG,cs.AI,q-bio.BM

下载: http://arxiv.org/abs/2410.10516v3

Seeing is Believing: Belief-Space Planning with Foundation Models as Uncertainty Estimators

Generalizable robotic mobile manipulation in open-world environments poses significant challenges due to long horizons, complex goals, and partial observability. A promising approach to address these challenges involves planning with a library of parameterized skills, where a task planner sequences these skills to achieve goals specified in structured languages, such as logical expressions over symbolic facts. While vision-language models (VLMs) can be used to ground these expressions, they often assume full observability, leading to suboptimal behavior when the agent lacks sufficient information to evaluate facts with certainty. This paper introduces a novel framework that leverages VLMs as a perception module to estimate uncertainty and facilitate symbolic grounding. Our approach constructs a symbolic belief representation and uses a belief-space planner to generate uncertainty-aware plans that incorporate strategic information gathering. This enables the agent to effectively reason about partial observability and property uncertainty. We demonstrate our system on a range of challenging real-world tasks that require reasoning in partially observable environments. Simulated evaluations show that our approach outperforms both vanilla VLM-based end-to-end planning or VLM-based state estimation baselines by planning for and executing strategic information gathering. This work highlights the potential of VLMs to construct belief-space symbolic scene representations, enabling downstream tasks such as uncertainty-aware planning.

Updated: 2025-04-04 07:48:53

标题: Seeing is Believing: 以基础模型作为不确定性估计器的信念空间规划

摘要: 在开放世界环境中实现可泛化的机器人移动操作面临重大挑战,原因在于长时间跨度、复杂目标和部分可观察性。解决这些挑战的一个有前途的方法是使用参数化技能库进行规划,其中任务规划器对这些技能进行排序,以实现用结构化语言(例如关于符号事实的逻辑表达式)指定的目标。虽然视觉语言模型(VLMs)可用于为这些表达式提供基础(grounding),但它们通常假设完全可观察性,当代理缺乏足够信息以确定性地评估事实时,就会导致次优行为。本文介绍了一个新颖的框架,利用VLMs作为感知模块来估计不确定性并促进符号落地。我们的方法构建了一个符号信念表示,并使用信念空间规划器生成考虑不确定性的计划,其中包含战略性信息收集。这使得代理能够有效地推理部分可观察性和属性不确定性。我们在一系列需要在部分可观察环境中进行推理的具有挑战性的真实任务上展示了我们的系统。模拟评估表明,通过规划并执行战略性信息收集,我们的方法优于基于VLM的端到端规划基线和基于VLM的状态估计基线。这项工作突出了VLMs构建信念空间符号场景表示的潜力,从而支持诸如不确定性感知规划等下游任务。

更新时间: 2025-04-04 07:48:53

领域: cs.AI,cs.RO

下载: http://arxiv.org/abs/2504.03245v1

Tuning-Free Structured Sparse PCA via Deep Unfolding Networks

Sparse principal component analysis (PCA) is a well-established dimensionality reduction technique that is often used for unsupervised feature selection (UFS). However, determining the regularization parameters is rather challenging, and conventional approaches, including grid search and Bayesian optimization, not only bring great computational costs but also exhibit high sensitivity. To address these limitations, we first establish a structured sparse PCA formulation by integrating $\ell_1$-norm and $\ell_{2,1}$-norm to capture the local and global structures, respectively. Building upon the off-the-shelf alternating direction method of multipliers (ADMM) optimization framework, we then design an interpretable deep unfolding network that translates iterative optimization steps into trainable neural architectures. This innovation enables automatic learning of the regularization parameters, effectively bypassing the empirical tuning requirements of conventional methods. Numerical experiments on benchmark datasets validate the advantages of our proposed method over the existing state-of-the-art methods. Our code will be accessible at https://github.com/xianchaoxiu/SPCA-Net.
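In an ADMM splitting, the $\ell_1$ and $\ell_{2,1}$ penalties enter through their proximal operators: element-wise soft-thresholding and row-wise group shrinkage. The sketch below shows these two standard building blocks only; the unfolded network itself, which turns the thresholds into trainable parameters, is not reproduced here.

```python
import numpy as np

def prox_l1(X, t):
    # Soft-thresholding: proximal operator of t * ||X||_1 (element-wise sparsity)
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def prox_l21(X, t):
    # Row-wise shrinkage: proximal operator of t * ||X||_{2,1} (row sparsity)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return X * scale
```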

Updated: 2025-04-04 07:47:35

标题: 通过深度展开网络实现的无调整结构稀疏PCA

摘要: 稀疏主成分分析(PCA)是一种成熟的降维技术,通常用于无监督特征选择(UFS)。然而,确定正则化参数相当具有挑战性,传统方法包括网格搜索和贝叶斯优化不仅带来巨大的计算成本,而且表现出很高的敏感性。为了解决这些限制,我们首先通过将$\ell_1$范数和$\ell_{2,1}$范数整合,建立了一个结构化的稀疏PCA公式,分别捕捉局部和全局结构。在现成的交替方向乘子方法(ADMM)优化框架的基础上,我们设计了一个可解释的深度展开网络,将迭代优化步骤转化为可训练的神经结构。这一创新使得能够自动学习正则化参数,有效地绕过传统方法的经验调整要求。在基准数据集上的数值实验验证了我们提出的方法相对于现有最先进方法的优势。我们的代码将在https://github.com/xianchaoxiu/SPCA-Net 上可访问。

更新时间: 2025-04-04 07:47:35

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2502.20837v2

Rotation Invariance in Floor Plan Digitization using Zernike Moments

Nowadays, a lot of old floor plans exist in printed form or are stored as scanned raster images. Slight rotations or shifts may occur during scanning. Bringing floor plans of this form into a machine-readable form to enable further use still poses a problem. Therefore, we propose an end-to-end pipeline that pre-processes the image and leverages a novel approach to create a region adjacency graph (RAG) from the pre-processed image and predict its nodes. By incorporating normalization steps into the RAG feature extraction, we significantly improve the rotation invariance of the RAG feature calculation. Moreover, applying our method leads to an improved F1 score and IoU on rotated data. Furthermore, we propose a wall splitting algorithm for partitioning walls into segments associated with the corresponding rooms.
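The rotation invariance referred to in the title comes from the magnitudes of Zernike moments: rotating the image by an angle multiplies $Z_{n,m}$ by a unit-modulus phase, leaving $|Z_{n,m}|$ unchanged. The following is a minimal illustrative implementation over the unit disk, not the paper's feature extractor.

```python
import numpy as np
from math import factorial

def zernike_moment(img, n, m):
    """Zernike moment Z_{n,m} of a square grayscale image mapped onto the
    unit disk; |Z_{n,m}| is invariant to in-plane rotation."""
    N = img.shape[0]
    ys, xs = np.mgrid[0:N, 0:N]
    x = (2 * xs - N + 1) / (N - 1)          # map pixel grid to [-1, 1]^2
    y = (2 * ys - N + 1) / (N - 1)
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    mask = rho <= 1.0
    # Radial polynomial R_{n,|m|}
    R = np.zeros_like(rho)
    for s in range((n - abs(m)) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + abs(m)) // 2 - s)
                * factorial((n - abs(m)) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    V = R * np.exp(-1j * m * theta)
    return (n + 1) / np.pi * np.sum(img[mask] * V[mask])
```

A 90-degree rotation maps the pixel grid onto itself exactly, so the moment magnitudes agree to floating-point precision.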

Updated: 2025-04-04 07:44:07

标题: 使用Zernike矩实现楼层平面数字化的旋转不变性

摘要: 如今,许多旧的平面图以印刷形式存在,或者存储为扫描的光栅图像。在扫描过程中可能会发生轻微的旋转或移位。将这种形式的平面图转换为机器可读形式以便进一步使用仍然是一个问题。因此,我们提出了一个端到端的流程,对图像进行预处理,并利用一种新颖的方法从预处理图像中创建区域相邻图(RAG)并预测其节点。通过将归一化步骤纳入RAG特征提取,我们显著改善了RAG特征计算的旋转不变性。此外,应用我们的方法在旋转数据上获得了更高的F1分数和IoU。我们还提出了一种墙壁分割算法,用于将墙壁划分为与相应房间相关联的墙段。

更新时间: 2025-04-04 07:44:07

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.03241v1

Decision SpikeFormer: Spike-Driven Transformer for Decision Making

Offline reinforcement learning (RL) enables policy training solely on pre-collected data, avoiding direct environment interaction - a crucial benefit for energy-constrained embodied AI applications. Although Artificial Neural Networks (ANN)-based methods perform well in offline RL, their high computational and energy demands motivate exploration of more efficient alternatives. Spiking Neural Networks (SNNs) show promise for such tasks, given their low power consumption. In this work, we introduce DSFormer, the first spike-driven transformer model designed to tackle offline RL via sequence modeling. Unlike existing SNN transformers focused on spatial dimensions for vision tasks, we develop Temporal Spiking Self-Attention (TSSA) and Positional Spiking Self-Attention (PSSA) in DSFormer to capture the temporal and positional dependencies essential for sequence modeling in RL. Additionally, we propose Progressive Threshold-dependent Batch Normalization (PTBN), which combines the benefits of LayerNorm and BatchNorm to preserve temporal dependencies while maintaining the spiking nature of SNNs. Comprehensive results in the D4RL benchmark show DSFormer's superiority over both SNN and ANN counterparts, achieving 78.4% energy savings, highlighting DSFormer's advantages not only in energy efficiency but also in competitive performance. Code and models are public at https://wei-nijuan.github.io/DecisionSpikeFormer.

Updated: 2025-04-04 07:42:36

标题: Decision SpikeFormer:用于决策制定的脉冲驱动Transformer

摘要: 离线强化学习(RL)使策略训练仅基于预先收集的数据,避免直接与环境交互--这对于能源受限的具身人工智能应用而言是一个关键优势。虽然基于人工神经网络(ANN)的方法在离线RL中表现出色,但它们的高计算和能量需求促使我们探索更高效的替代方案。脉冲神经网络(SNN)因其低功耗而在此类任务中展现出潜力。在这项工作中,我们介绍了DSFormer,这是第一个通过序列建模来解决离线RL的脉冲驱动Transformer模型。与现有的专注于视觉任务空间维度的SNN Transformer不同,我们在DSFormer中开发了时间脉冲自注意力(TSSA)和位置脉冲自注意力(PSSA),以捕捉RL序列建模所必需的时间和位置依赖关系。此外,我们提出了渐进式阈值相关批量归一化(PTBN),结合LayerNorm和BatchNorm的优点,在保持SNN脉冲特性的同时保留时间依赖性。在D4RL基准测试中的全面结果显示DSFormer优于SNN和ANN对应模型,实现了78.4%的节能,突显了DSFormer不仅在能源效率方面具有优势,而且具有有竞争力的性能。代码和模型可在https://wei-nijuan.github.io/DecisionSpikeFormer公开获取。

更新时间: 2025-04-04 07:42:36

领域: cs.LG,cs.AI,cs.NE

下载: http://arxiv.org/abs/2504.03800v1

Malware Detection in Docker Containers: An Image is Worth a Thousand Logs

Malware detection is increasingly challenged by evolving techniques like obfuscation and polymorphism, limiting the effectiveness of traditional methods. Meanwhile, the widespread adoption of software containers has introduced new security challenges, including the growing threat of malicious software injection, where a container, once compromised, can serve as an entry point for further cyberattacks. In this work, we address these security issues by introducing a method to identify compromised containers through machine learning analysis of their file systems. We cast entire software containers into large RGB images via their tarball representations, and propose to use established Convolutional Neural Network architectures in a streaming, patch-based manner. To support our experiments, we release the COSOCO dataset--the first of its kind--containing 3364 large-scale RGB images of benign and compromised software containers at https://huggingface.co/datasets/k3ylabs/cosoco-image-dataset. Our method detects more malware and achieves higher F1 and Recall scores than all individual and ensembles of VirusTotal engines, demonstrating its effectiveness and setting a new standard for identifying malware-compromised software containers.
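
The "container as image" step can be sketched in a few lines: zero-pad the raw tarball bytes to a whole number of pixel rows and reshape them into an H x W x 3 array. The width of 256 below is an arbitrary illustrative choice, not a value taken from the paper, and the patch-based CNN stage is omitted.

```python
import numpy as np

def bytes_to_rgb(data: bytes, width: int = 256) -> np.ndarray:
    """Zero-pad raw bytes to a whole number of rows and reshape to (H, width, 3)."""
    buf = np.frombuffer(data, dtype=np.uint8)
    row = width * 3                    # bytes per image row (3 channels)
    pad = (-len(buf)) % row
    buf = np.concatenate([buf, np.zeros(pad, dtype=np.uint8)])
    return buf.reshape(-1, width, 3)
```

In practice the resulting image can then be tiled into fixed-size patches and fed to a standard CNN in a streaming fashion, as the abstract describes.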

Updated: 2025-04-04 07:38:16

标题: 在Docker容器中的恶意软件检测:一张镜像抵得上一千条日志

摘要: 恶意软件检测日益受到混淆和多态性等不断演变的技术的挑战,这限制了传统方法的有效性。同时,软件容器的广泛采用带来了新的安全挑战,包括日益增长的恶意软件注入威胁:容器一旦被攻破,就可能成为进一步网络攻击的入口点。在这项工作中,我们通过对容器文件系统进行机器学习分析,提出了一种识别受损容器的方法。我们通过tarball表示将整个软件容器转换成大型RGB图像,并建议以流式、基于补丁的方式使用已有的卷积神经网络架构。为了支持我们的实验,我们发布了COSOCO数据集--这是第一个此类数据集--其中包含3364个良性和受损软件容器的大规模RGB图像。我们的方法检测到更多恶意软件,并获得比所有单个和组合的VirusTotal引擎更高的F1和召回率得分,展示了其有效性,并为识别受恶意软件损害的软件容器设定了新的标准。

更新时间: 2025-04-04 07:38:16

领域: cs.CR,cs.AI,cs.CV

下载: http://arxiv.org/abs/2504.03238v1

Crash Time Matters: HybridMamba for Fine-Grained Temporal Localization in Traffic Surveillance Footage

Traffic crash detection in long-form surveillance videos is critical for emergency response and infrastructure planning but remains difficult due to the brief and rare nature of crash events. We introduce HybridMamba, a novel architecture that combines visual transformers with state-space temporal modeling to achieve accurate crash time localization. Our method uses multi-level token compression and hierarchical temporal processing to remain computationally efficient without sacrificing temporal resolution. Evaluated on a large-scale dataset from the Iowa Department of Transportation, HybridMamba achieves a mean absolute error of 1.50 seconds, with 65.2 percent of predictions within one second of the ground truth. It outperforms recent video-language models such as TimeChat and VideoLLaMA2 by up to 2.8 seconds, while using significantly fewer parameters. Our results demonstrate strong generalization across videos ranging from 2 to 40 minutes in diverse conditions. HybridMamba offers a robust and efficient solution for fine-grained temporal localization in traffic surveillance. The code will be released upon publication.

Updated: 2025-04-04 07:35:11

标题: 碰撞时间很重要:HybridMamba用于交通监控视频中细粒度时间定位

摘要: 长时间监控视频中的交通事故检测对于紧急响应和基础设施规划至关重要,但由于事故事件短暂且罕见,这一任务仍然具有挑战性。我们提出了一种新颖的架构HybridMamba,它将视觉Transformer与状态空间时间建模相结合,以实现准确的事故时间定位。我们的方法使用多级标记压缩和分层时间处理,在不牺牲时间分辨率的情况下保持计算效率。在爱荷华州交通部的大规模数据集上评估,HybridMamba实现了1.50秒的平均绝对误差,65.2%的预测与真实值相差不到一秒。它比TimeChat和VideoLLaMA2等最近的视频-语言模型误差最多低2.8秒,同时使用的参数显著更少。我们的结果表明,该方法在不同条件下、时长从2分钟到40分钟的视频上具有很强的泛化能力。HybridMamba为交通监控中的细粒度时间定位提供了一种强大且高效的解决方案。代码将在发表后发布。

更新时间: 2025-04-04 07:35:11

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.03235v1

Think When You Need: Self-Adaptive Chain-of-Thought Learning

Chain of Thought (CoT) reasoning enhances language models' performance but often leads to inefficient "overthinking" on simple problems. We identify that existing approaches directly penalizing reasoning length fail to account for varying problem complexity. Our approach constructs rewards through length and quality comparisons, guided by theoretical assumptions that jointly enhance solution correctness and conciseness. Moreover, we extend our method to fuzzy tasks where ground truth is unavailable. Experiments across multiple reasoning benchmarks demonstrate that our method maintains accuracy while generating significantly more concise explanations, effectively teaching models to "think when needed."
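
The reward construction is described only at a high level; as a hypothetical illustration (not the paper's actual formula), one can score a group of sampled answers by correctness first and relative brevity second, so that length is only penalized relative to other correct answers for the same problem:

```python
def adaptive_reward(candidates):
    """Score sampled answers: wrong answers get -1; correct answers get a
    bonus for being shorter than the average correct answer.

    candidates: list of (is_correct, n_tokens) pairs for one problem.
    """
    correct_lens = [n for ok, n in candidates if ok]
    mean_len = sum(correct_lens) / len(correct_lens) if correct_lens else 0.0
    rewards = []
    for ok, n in candidates:
        if not ok:
            rewards.append(-1.0)
        else:
            rewards.append(1.0 + 0.5 * (mean_len - n) / max(mean_len, 1.0))
    return rewards
```

Because the length term is computed per problem, a hard problem with uniformly long correct answers is not penalized, which matches the abstract's point about varying problem complexity.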

Updated: 2025-04-04 07:34:01

标题: 当你需要思考时:自适应的思维链学习

摘要: 思维链(CoT)推理提高了语言模型的性能,但往往会导致在简单问题上的低效“过度思考”。我们发现,现有方法直接惩罚推理长度未能考虑到问题复杂性的变化。我们的方法通过长度和质量比较构建奖励,受到理论假设的指导,共同增强解决方案的正确性和简洁性。此外,我们进一步证明了我们的方法适用于模糊任务,其中真相不可得知。跨多个推理基准的实验证明,我们的方法在保持准确性的同时生成更简洁的解释,有效地教导模型“在需要时思考”。

更新时间: 2025-04-04 07:34:01

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.03234v1

Unlocking Neural Transparency: Jacobian Maps for Explainable AI in Alzheimer's Detection

Alzheimer's disease (AD) leads to progressive cognitive decline, making early detection crucial for effective intervention. While deep learning models have shown high accuracy in AD diagnosis, their lack of interpretability limits clinical trust and adoption. This paper introduces a novel pre-model approach leveraging Jacobian Maps (JMs) within a multi-modal framework to enhance explainability and trustworthiness in AD detection. By capturing localized brain volume changes, JMs establish meaningful correlations between model predictions and well-known neuroanatomical biomarkers of AD. We validate JMs through experiments comparing a 3D CNN trained on JMs versus on traditional preprocessed data, which demonstrates superior accuracy. We also employ 3D Grad-CAM analysis to provide both visual and quantitative insights, further showcasing improved interpretability and diagnostic reliability.

Updated: 2025-04-04 07:24:52

标题: 揭示神经透明性:雅各比映射在阿尔茨海默病检测中的可解释人工智能

摘要: 阿尔茨海默病(AD)导致认知能力逐渐下降,使得早期检测对有效干预至关重要。虽然深度学习模型在AD诊断中表现出高准确性,但它们缺乏可解释性限制了临床的信任和采用。本文介绍了一种新颖的预模型方法,利用雅可比映射(JMs)在多模态框架内增强AD检测的可解释性和可信度。通过捕捉局部脑容积变化,JMs建立了模型预测与已知AD神经解剖生物标记之间的有意义的相关性。我们通过比较在JMs上训练的3D CNN与传统预处理数据上训练的3D CNN的实验证实了JMs的有效性,表现出更高的准确性。我们还采用了3D Grad-CAM分析,提供了视觉和数量上的见解,进一步展示了改进后的可解释性和诊断可靠性。

更新时间: 2025-04-04 07:24:52

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2504.03230v1

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

Video diffusion models (VDMs) have advanced significantly in recent years, enabling the generation of highly realistic videos and drawing the attention of the community in their potential as world simulators. However, despite their capabilities, VDMs often fail to produce physically plausible videos due to an inherent lack of understanding of physics, resulting in incorrect dynamics and event sequences. To address this limitation, we propose a novel two-stage image-to-video generation framework that explicitly incorporates physics with a vision- and language-informed physical prior. In the first stage, we employ a Vision Language Model (VLM) as a coarse-grained motion planner, integrating chain-of-thought and physics-aware reasoning to predict rough motion trajectories/changes that approximate real-world physical dynamics while ensuring inter-frame consistency. In the second stage, we use the predicted motion trajectories/changes to guide the video generation of a VDM. As the predicted motion trajectories/changes are rough, noise is added during inference to give the VDM freedom to generate motion with finer details. Extensive experimental results demonstrate that our framework can produce physically plausible motion, and comparative evaluations highlight the notable superiority of our approach over existing methods. More video results are available on our Project Page: https://madaoer.github.io/projects/physically_plausible_video_generation.

Updated: 2025-04-04 07:23:21

标题: VLIPP: 通过视觉和语言信息的物理先验实现物理合理的视频生成

摘要: 视频扩散模型(VDMs)近年来取得了显著进展,使得生成高度逼真的视频成为可能,并引起了社区对其作为世界模拟器潜力的关注。然而,尽管具有这些能力,由于对物理学固有的理解缺乏,VDMs通常无法生成物理合理的视频,导致错误的动态和事件序列。为了解决这一限制,我们提出了一种新颖的两阶段图像到视频生成框架,明确地引入由视觉和语言信息提供的物理先验。在第一阶段,我们将视觉语言模型(VLM)用作粗粒度的运动规划器,整合思维链和物理感知推理,以预测近似真实世界物理动态的粗略运动轨迹/变化,同时确保帧间一致性。在第二阶段,我们使用预测的运动轨迹/变化来指导VDM的视频生成。由于预测的运动轨迹/变化是粗略的,在推理过程中加入噪声,以便为VDM生成更精细的运动提供自由度。大量实验结果证明,我们的框架可以生成物理合理的运动,比较评估突出了我们的方法相对于现有方法的显著优势。更多视频结果可在我们的项目页面上找到:https://madaoer.github.io/projects/physically_plausible_video_generation。

更新时间: 2025-04-04 07:23:21

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.23368v3

Learning Lie Group Generators from Trajectories

This work investigates the inverse problem of generator recovery in matrix Lie groups from discretized trajectories. Let $G$ be a real matrix Lie group and $\mathfrak{g} = \text{Lie}(G)$ its corresponding Lie algebra. A smooth trajectory $\gamma(t)$ generated by a fixed Lie algebra element $\xi \in \mathfrak{g}$ follows the exponential flow $\gamma(t) = g_0 \cdot \exp(t \xi)$. The central task addressed in this work is the reconstruction of such a latent generator $\xi$ from a discretized sequence of poses $\{g_0, g_1, \dots, g_T\} \subset G$, sampled at uniform time intervals. This problem is formulated as a data-driven regression from normalized sequences of discrete Lie algebra increments $\log\left(g_{t}^{-1} g_{t+1}\right)$ to the constant generator $\xi \in \mathfrak{g}$. A feedforward neural network is trained to learn this mapping across several groups, including $\text{SE}(2)$, $\text{SE}(3)$, $\text{SO}(3)$, and $\text{SL}(2,\mathbb{R})$. It demonstrates strong empirical accuracy under both clean and noisy conditions, which validates the viability of data-driven recovery of Lie group generators using shallow neural architectures. The Lie-RL GitHub repository is available at https://github.com/Anormalm/LieRL-on-Trajectories. Suggestions and collaborations are welcome!
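
The paper trains a network to regress $\xi$ from the increments $\log(g_t^{-1} g_{t+1})$; for the commutative group SO(2), the same recovery works in closed form by averaging the relative rotation angles. The sketch below is that simple baseline (not the paper's neural method, which targets SE(2), SE(3), SO(3), and SL(2, R)):

```python
import numpy as np

def so2_exp(omega, t):
    """Exponential flow on SO(2): rotation by angle omega * t."""
    th = omega * t
    return np.array([[np.cos(th), -np.sin(th)],
                     [np.sin(th),  np.cos(th)]])

def recover_generator(poses, dt):
    """Average the log-increments log(g_t^{-1} g_{t+1}); on SO(2) the
    matrix logarithm reduces to the relative rotation angle."""
    angles = []
    for g0, g1 in zip(poses[:-1], poses[1:]):
        rel = g0.T @ g1                       # g_t^{-1} g_{t+1} (orthogonal inverse)
        angles.append(np.arctan2(rel[1, 0], rel[0, 0]))
    return float(np.mean(angles)) / dt
```

For non-commutative groups the increments no longer average exactly, which is one motivation for learning the mapping with a network instead.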

Updated: 2025-04-04 07:08:59

标题: 从轨迹学习李群生成器

摘要: 这项工作研究了从离散轨迹中恢复矩阵Lie群生成器的逆问题。设$G$为一个实矩阵Lie群,$\mathfrak{g} = \text{Lie}(G)$为其对应的Lie代数。由固定Lie代数元素$\xi \in \mathfrak{g}$生成的平滑轨迹$\gamma(t)$遵循指数流$\gamma(t) = g_0 \cdot \exp(t \xi)$。本工作中的核心任务是从离散姿势序列$\{g_0, g_1, \dots, g_T\} \subset G$中重建这样一个潜在生成器$\xi$,这些姿势在均匀时间间隔内采样。该问题被表述为从规范化的离散Lie代数增量序列$\log\left(g_{t}^{-1} g_{t+1}\right)$到常数生成器$\xi \in \mathfrak{g}$的数据驱动回归。一个前馈神经网络被训练用于学习这种映射,跨越几个群,包括$\text{SE}(2)$、$\text{SE}(3)$、$\text{SO}(3)$和$\text{SL}(2,\mathbb{R})$。它在干净和嘈杂条件下展现出强大的经验准确性,验证了使用浅层神经结构进行Lie群生成器的数据驱动恢复的可行性。这是Lie-RL GitHub Repo https://github.com/Anormalm/LieRL-on-Trajectories。欢迎提出建议和合作!

更新时间: 2025-04-04 07:08:59

领域: cs.LG

下载: http://arxiv.org/abs/2504.03220v1

Structured Knowledge Accumulation: The Principle of Entropic Least Action in Forward-Only Neural Learning

This paper aims to extend the Structured Knowledge Accumulation (SKA) framework recently proposed by \cite{mahi2025ska}. We introduce two core concepts: the Tensor Net function and the characteristic time property of neural learning. First, we reinterpret the learning rate as a time step in a continuous system. This transforms neural learning from discrete optimization into continuous-time evolution. We show that learning dynamics remain consistent when the product of learning rate and iteration steps stays constant. This reveals a time-invariant behavior and identifies an intrinsic timescale of the network. Second, we define the Tensor Net function as a measure that captures the relationship between decision probabilities, entropy gradients, and knowledge change. Additionally, we define its zero-crossing as the equilibrium state between decision probabilities and entropy gradients. We show that the convergence of entropy and knowledge flow provides a natural stopping condition, replacing arbitrary thresholds with an information-theoretic criterion. We also establish that SKA dynamics satisfy a variational principle based on the Euler-Lagrange equation. These findings extend SKA into a continuous and self-organizing learning model. The framework links computational learning with physical systems that evolve by natural laws. By understanding learning as a time-based process, we open new directions for building efficient, robust, and biologically-inspired AI systems.
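
The claim that dynamics are preserved when the product of learning rate and iteration steps stays constant can be checked on a toy quadratic loss, where gradient descent approximates the continuous flow $x(T) = x_0 e^{-2T}$ with $T = \text{lr} \times \text{steps}$. This is an illustration of the time-step idea only, not the SKA framework itself:

```python
import math

def gd(x0, lr, steps):
    """Gradient descent on f(x) = x^2 (gradient 2x)."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

# Two schedules with the same product lr * steps = 2.0 reach nearly the
# same point, matching the continuous-time gradient flow at T = 2.0.
a = gd(1.0, 0.01, 200)
b = gd(1.0, 0.005, 400)
flow = math.exp(-2 * 2.0)
```

Halving the learning rate while doubling the step count leaves the trajectory's endpoint essentially unchanged, which is the time-invariant behavior the abstract describes.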

Updated: 2025-04-04 07:00:27

标题: 结构化知识积累:仅前向神经学习中的熵最小作用量原理

摘要: 本文旨在扩展最近由\cite{mahi2025ska}提出的结构化知识积累(SKA)框架。我们引入了两个核心概念:张量网络函数和神经学习的特征时间属性。首先,我们将学习率重新解释为连续系统中的时间步长。这将神经学习从离散优化转变为连续时间演化。我们展示了当学习率和迭代步数的乘积保持恒定时,学习动态仍然保持一致。这揭示了一种时间不变行为,并确定了网络的内在时间尺度。其次,我们将张量网络函数定义为捕捉决策概率、熵梯度和知识变化之间关系的度量。此外,我们将其零点交叉定义为决策概率和熵梯度之间的平衡状态。我们展示了熵和知识流的收敛提供了一种自然的停止条件,用信息论准则替代了任意阈值。我们还建立了SKA动态满足基于欧拉-拉格朗日方程的变分原理。这些发现将SKA扩展为一个连续和自组织的学习模型。该框架将计算学习与依据自然法则演化的物理系统联系起来。通过将学习理解为基于时间的过程,我们为构建高效、稳健和生物启发的人工智能系统打开了新的方向。

更新时间: 2025-04-04 07:00:27

领域: cs.LG,68T07

下载: http://arxiv.org/abs/2504.03214v1

Persuasive Calibration

We introduce and study the persuasive calibration problem, where a principal aims to provide trustworthy predictions about underlying events to a downstream agent to make desired decisions. We adopt the standard calibration framework that regulates predictions to be unbiased conditional on their own value, and thus, they can reliably be interpreted at face value by the agent. Allowing a small calibration error budget, we aim to answer the following question: what is, and how to compute, the optimal predictor under this calibration error budget, especially when there exists incentive misalignment between the principal and the agent? We focus on the standard $L_t$-norm Expected Calibration Error (ECE) metric. We develop a general framework by viewing predictors as post-processed versions of perfectly calibrated predictors. Using this framework, we first characterize the structure of the optimal predictor. Specifically, when the principal's utility is event-independent and for the $L_1$-norm ECE, we show: (1) the optimal predictor is over- (resp. under-) confident for high (resp. low) true expected outcomes, while remaining perfectly calibrated in the middle; (2) the miscalibrated predictions exhibit a collinearity structure with the principal's utility function. On the algorithmic side, we provide an FPTAS for computing an approximately optimal predictor for general principal utility and general $L_t$-norm ECE. Moreover, for the $L_1$- and $L_\infty$-norm ECE, we provide polynomial-time algorithms that compute the exact optimal predictor.
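
The $L_1$-norm ECE that the paper budgets against is the bin-weighted average gap between confidence and accuracy. Below is a minimal binned estimator of that quantity (a standard calibration metric, not the paper's optimization algorithm):

```python
import numpy as np

def ece_l1(probs, labels, n_bins=10):
    """Binned L1 expected calibration error: sum over bins of
    (bin weight) * |accuracy - confidence|."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        # half-open bins (lo, hi], with the first bin closed at 0
        mask = (probs > lo) & (probs <= hi) if i > 0 else (probs >= lo) & (probs <= hi)
        if mask.any():
            ece += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return ece
```

A perfectly calibrated predictor scores 0; the paper's persuasive predictor is allowed to spend a small budget of this quantity to steer the agent.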

Updated: 2025-04-04 06:49:56

标题: 说服性校准

摘要: 我们介绍并研究了说服校准问题,其中一个委托人旨在为下游代理提供关于潜在事件的可信预测,以便做出期望的决策。我们采用了标准的校准框架,该框架规定预测在其自身价值条件下是无偏的,因此,它们可以可靠地被代理人按照面值解释。在允许一定的校准误差预算的情况下,我们的目标是回答以下问题:在这种校准误差预算下,什么是最佳预测器以及如何计算它,特别是当委托人和代理人之间存在动机不一致时?我们专注于标准的Lt-范数期望校准误差(ECE)度量。 我们通过将预测器视为经过后处理的完全校准预测器的版本,开发了一个通用框架。利用这一框架,我们首先表征了最佳预测器的结构。具体地,在委托人的效用与事件无关且对于L1-范数ECE时,我们展示:(1)最佳预测器对于高(低)真实预期结果过度自信(或不足),同时在中间保持完全校准;(2)错误校准的预测表现出与委托人效用函数的共线性结构。在算法方面,我们为计算一般委托人效用和一般Lt-范数ECE的近似最佳预测器提供了一个FPTAS。此外,对于L1-和L-Infinity-范数ECE,我们提供了计算准确最佳预测器的多项式时间算法。

更新时间: 2025-04-04 06:49:56

领域: cs.LG,cs.AI,cs.GT,econ.TH

下载: http://arxiv.org/abs/2504.03211v1

PIONM: A Generalized Approach to Solving Density-Constrained Mean-Field Games Equilibrium under Modified Boundary Conditions

Neural network-based methods are effective for solving equilibria in Mean-Field Games (MFGs), particularly in high-dimensional settings. However, the need to solve the coupled partial differential equations (PDEs) of MFGs limits their applicability, since doing so is computationally expensive. Additionally, modifying boundary conditions, such as the initial state distribution or terminal value function, necessitates extensive retraining, reducing scalability. To address these challenges, we propose a generalized framework, PIONM (Physics-Informed Neural Operator NF-MKV Net), which leverages physics-informed neural operators to solve the MFGs equations. PIONM utilizes neural operators to compute MFGs equilibria for arbitrary boundary conditions. The method encodes boundary conditions as input features and trains the model to align them with density evolution, modeled using discrete-time normalizing flows. Once trained, the algorithm efficiently computes the density distribution at any time step for modified boundary conditions, ensuring efficient adaptation to different boundary conditions in MFGs equilibria. Unlike traditional MFGs methods constrained by fixed coefficients, PIONM efficiently computes equilibria under varying boundary conditions, including obstacles, diffusion coefficients, initial densities, and terminal functions. PIONM can adapt to modified conditions while preserving density distribution constraints, demonstrating superior scalability and generalization capabilities compared to existing methods.

Updated: 2025-04-04 06:46:09

标题: PIONM:一种解决修改边界条件下密度约束均场博弈均衡的广义方法

摘要: 基于神经网络的方法在解决均场博弈(MFGs)中的平衡问题方面非常有效,特别是在高维环境中。然而,解决MFGs中耦合的偏微分方程(PDEs)限制了它们的适用性,因为解决耦合的PDEs计算成本高昂。此外,修改边界条件,如初始状态分布或终端值函数,需要进行大量的重新训练,降低了可扩展性。为了解决这些挑战,我们提出了一个广义框架PIONM(基于物理信息的神经算子NF-MKV网络),利用物理信息的神经算子来解决MFGs方程。PIONM利用神经算子计算各种边界条件下的MFGs平衡。该方法将边界条件编码为输入特征,并训练模型使其与密度演化对齐,使用离散时间标准化流进行建模。一旦训练完成,该算法可以高效地计算任何时间步长下的密度分布,以确保对MFGs平衡中不同边界条件的有效适应。与传统的受固定系数约束的MFGs方法不同,PIONM可以在不同边界条件下高效地计算平衡,包括障碍物、扩散系数、初始密度和终端函数。PIONM可以在保持密度分布约束的同时适应修改后的条件,相比现有方法展现出更好的可扩展性和泛化能力。

更新时间: 2025-04-04 06:46:09

领域: cs.LG

下载: http://arxiv.org/abs/2504.03209v1

Augmenting Human Cognition With Generative AI: Lessons From AI-Assisted Decision-Making

How can we use generative AI to design tools that augment rather than replace human cognition? In this position paper, we review our own research on AI-assisted decision-making for lessons learned. We observe that in both AI-assisted decision-making and generative AI, a popular approach is to suggest AI-generated end-to-end solutions to users, which users can then accept, reject, or edit. Alternatively, AI tools could offer more incremental support to help users solve tasks themselves, which we call process-oriented support. We describe findings on the challenges of end-to-end solutions, and how process-oriented support can address them. We also discuss the applicability of these findings to generative AI based on a recent study in which we compared both approaches to assist users in a complex decision-making task with LLMs.

Updated: 2025-04-04 06:40:03

标题: 用生成式人工智能增强人类认知:从AI辅助决策中学到的经验

摘要: 我们如何利用生成式人工智能设计工具,以增强而非取代人类认知?在这篇立场论文中,我们回顾了自己在AI辅助决策方面的研究,以获得经验教训。我们观察到,在AI辅助决策和生成式人工智能中,一种流行的方法是向用户建议由AI生成的端到端解决方案,用户可以接受、拒绝或编辑。另外,AI工具可以提供更加渐进的支持,帮助用户自行解决任务,我们称之为过程导向支持。我们描述了有关端到端解决方案的挑战,并说明过程导向支持如何应对这些挑战。我们还讨论了这些发现在生成式人工智能中的适用性,基于最近一项研究,我们比较了这两种方法在LLMs帮助用户进行复杂决策任务中的效果。

更新时间: 2025-04-04 06:40:03

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2504.03207v1

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

Effective conversational agents must be able to personalize their behavior to suit a user's preferences, personality, and attributes, whether they are assisting with writing tasks or operating in domains like education or healthcare. Current training methods like Reinforcement Learning from Human Feedback (RLHF) prioritize helpfulness and safety but fall short in fostering truly empathetic, adaptive, and personalized interactions. Traditional approaches to personalization often rely on extensive user history, limiting their effectiveness for new or context-limited users. To overcome these limitations, we propose to incorporate an intrinsic motivation to improve the conversational agent's model of the user as an additional reward alongside multi-turn RLHF. This reward mechanism encourages the agent to actively elicit user traits by optimizing conversations to increase the accuracy of its user model. Consequently, the policy agent can deliver more personalized interactions by obtaining more information about the user. We applied our method in both education and fitness settings, where LLMs teach concepts or recommend personalized strategies based on users' hidden learning style or lifestyle attributes. Using LLM-simulated users, our approach outperformed a multi-turn RLHF baseline in revealing information about the users' preferences and adapting to them.
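
The intrinsic reward can be thought of as the improvement in the agent's user model after a dialogue turn. The toy version below is hypothetical and hand-specified for illustration only; in the paper the signal comes from the accuracy of a learned user model, not a formula like this:

```python
def curiosity_reward(prior, posterior, true_trait):
    """Reward the agent for probability mass gained on the user's true trait.

    prior / posterior: dicts mapping trait name -> probability, taken from
    the agent's user model before and after a dialogue turn.
    """
    return posterior[true_trait] - prior[true_trait]
```

A turn that elicits a revealing answer shifts the user model toward the true trait and earns a positive reward; an uninformative turn earns roughly zero, which pushes the policy toward trait-eliciting questions.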

Updated: 2025-04-04 06:35:02

标题: 用好奇奖励增强个性化多轮对话

摘要: 有效的对话代理必须能够个性化其行为,以适应用户的偏好、个性和属性,无论是在协助写作任务还是在教育或医疗等领域运作。当前的训练方法,如基于人类反馈的强化学习(RLHF),优先考虑有用性和安全性,但在促进真正具有同理心、适应性和个性化的交互方面表现不佳。传统的个性化方法通常依赖于大量的用户历史,限制了它们对新用户或上下文受限用户的有效性。为了克服这些局限性,我们建议引入一种改进对话代理的用户模型的内在动机,作为多轮RLHF之外的额外奖励。这种奖励机制鼓励代理通过优化对话来主动引出用户特征,以提高其用户模型的准确性。因此,策略代理可以通过获取更多关于用户的信息来提供更加个性化的交互。我们将该方法应用于教育和健身场景,其中LLMs根据用户隐藏的学习风格或生活方式属性来讲授概念或推荐个性化策略。在LLM模拟的用户上,我们的方法在揭示用户偏好信息并适应这些偏好方面优于多轮RLHF基线。

更新时间: 2025-04-04 06:35:02

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.03206v1

BondMatcher: H-Bond Stability Analysis in Molecular Systems

This application paper investigates the stability of hydrogen bonds (H-bonds), as characterized by the Quantum Theory of Atoms in Molecules (QTAIM). First, we contribute a database of 4544 electron densities associated to four isomers of water hexamers (the so-called Ring, Book, Cage and Prism), generated by distorting their equilibrium geometry under various structural perturbations, modeling the natural dynamic behavior of molecular systems. Second, we present a new stability measure, called bond occurrence rate, associating each bond path present at equilibrium with its rate of occurrence within the input ensemble. We also provide an algorithm, called BondMatcher, for its automatic computation, based on a tailored, geometry-aware partial isomorphism estimation between the extremum graphs of the considered electron densities. Our new stability measure allows for the automatic identification of densities lacking H-bond paths, enabling further visual inspections. Specifically, the topological analysis enabled by our framework corroborates experimental observations and provides refined geometrical criteria for characterizing the disappearance of H-bond paths. Our electron density database and our C++ implementation are available at this address: https://github.com/thom-dani/BondMatcher.
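
The bond occurrence rate itself is a normalized count once bond paths have been matched across densities (the hard part, which BondMatcher handles via geometry-aware partial isomorphism estimation between extremum graphs). Assuming matched bond-path identifiers, a sketch of the rate computation:

```python
from collections import Counter

def bond_occurrence_rates(equilibrium_bonds, ensemble):
    """Rate of occurrence of each equilibrium bond path across an ensemble.

    equilibrium_bonds: set of bond-path identifiers present at equilibrium.
    ensemble: one set of observed bond-path identifiers per perturbed density.
    """
    counts = Counter()
    for bonds in ensemble:
        counts.update(b for b in equilibrium_bonds if b in bonds)
    n = len(ensemble)
    return {b: counts[b] / n for b in equilibrium_bonds}
```

Bond paths with rates well below 1.0 are the unstable H-bonds the paper flags for further visual inspection.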

Updated: 2025-04-04 06:29:29

标题: BondMatcher: 分子系统中氢键稳定性分析

摘要: 这篇应用论文研究了氢键(H键)的稳定性,以分子中的原子量子理论(QTAIM)对其进行表征。首先,我们贡献了一个包含4544个电子密度的数据库,对应于水六聚体的四个异构体(所谓的环形、书形、笼形和棱柱形),这些密度通过在各种结构扰动下扭曲其平衡几何结构生成,以模拟分子系统的自然动态行为。其次,我们提出了一种新的稳定性度量,称为键出现率,将平衡状态下存在的每条键路径与其在输入集合中的出现率相关联。我们还提供了一个名为BondMatcher的算法用于自动计算,该算法基于对所考虑的电子密度的极值图之间定制的、几何感知的部分同构估计。我们的新稳定性度量允许自动识别缺乏H键路径的密度,从而支持进一步的视觉检查。具体而言,我们框架所实现的拓扑分析证实了实验观察结果,并为表征H键路径的消失提供了更精细的几何标准。我们的电子密度数据库和C++实现可在以下地址找到:https://github.com/thom-dani/BondMatcher。

更新时间: 2025-04-04 06:29:29

领域: cs.LG,eess.IV,physics.chem-ph

下载: http://arxiv.org/abs/2504.03205v1

Supervised contrastive learning from weakly-labeled audio segments for musical version matching

Detecting musical versions (different renditions of the same piece) is a challenging task with important applications. Because of the nature of the available ground truth, existing approaches match musical versions at the track level (e.g., whole song). However, most applications require matching them at the segment level (e.g., 20 s chunks). In addition, existing approaches resort to classification and triplet losses, disregarding more recent losses that could bring meaningful improvements. In this paper, we propose a method to learn from weakly annotated segments, together with a contrastive loss variant that outperforms well-studied alternatives. The former is based on pairwise segment distance reductions, while the latter modifies an existing loss following decoupling, hyper-parameter, and geometric considerations. With these two elements, we not only achieve state-of-the-art results in the standard track-level evaluation, but also obtain a breakthrough performance in a segment-level evaluation. We believe that, due to the generality of the challenges addressed here, the proposed methods may find utility in domains beyond audio or musical version matching.

Updated: 2025-04-04 06:20:46

标题: 通过弱标记音频片段进行音乐版本匹配的监督对比学习

摘要: 检测音乐版本(同一乐曲的不同演绎)是一项具有重要应用的挑战性任务。由于基本事实的性质,现有方法在曲目级别(例如整首歌曲)上匹配音乐版本。然而,大多数应用程序需要在片段级别(例如20秒的块)上匹配它们。此外,现有方法采用分类和三元组损失,忽略了可能带来有意义改进的更近期的损失。在本文中,我们提出了一种从弱注释段学习的方法,以及一种优于经过深入研究的替代方法的对比损失变体。前者基于成对段距离减少,而后者修改了一个现有损失,遵循解耦、超参数和几何考虑。通过这两个元素,我们不仅在标准曲目级别评估中取得了最先进的结果,而且在片段级别评估中获得了突破性的表现。我们相信,由于所涉及的挑战的普遍性,所提出的方法可能在音频或音乐版本匹配之外的领域中找到用途。

更新时间: 2025-04-04 06:20:46

领域: cs.SD,cs.AI,cs.LG,eess.AS,stat.ML

下载: http://arxiv.org/abs/2502.16936v2

FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation

Diffusion-based audio-driven talking avatar methods have recently gained attention for their high-fidelity, vivid, and expressive results. However, their slow inference speed limits practical applications. Despite the development of various distillation techniques for diffusion models, we found that naive diffusion distillation methods do not yield satisfactory results. Distilled models exhibit reduced robustness with open-set input images and a decreased correlation between audio and video compared to teacher models, undermining the advantages of diffusion models. To address this, we propose FADA (Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation). We first designed a mixed-supervised loss to leverage data of varying quality and enhance the overall model capability as well as robustness. Additionally, we propose a multi-CFG distillation with learnable tokens to utilize the correlation between audio and reference image conditions, reducing the threefold inference runs caused by multi-CFG with acceptable quality degradation. Extensive experiments across multiple datasets show that FADA generates vivid videos comparable to recent diffusion model-based methods while achieving an NFE speedup of 4.17-12.5 times. Demos are available at our webpage http://fadavatar.github.io.

Updated: 2025-04-04 06:07:56

标题: FADA:混合监督多CFG蒸馏的快速扩散头像合成

摘要: 最近,基于扩散的音频驱动的说话头像方法引起了关注,因为其具有高保真度、生动和富有表现力的结果。然而,它们的推理速度慢限制了实际应用。尽管开发了各种扩散模型的蒸馏技术,我们发现天真的扩散蒸馏方法并未产生令人满意的结果。蒸馏模型在面对开放式输入图像时表现出降低的稳健性,以及音频和视频之间的相关性降低,与教师模型相比,削弱了扩散模型的优势。为了解决这个问题,我们提出了FADA(Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation)。我们首先设计了一个混合监督损失,以利用不同质量的数据,增强整体模型能力和稳健性。此外,我们提出了一个带有可学习标记的多CFG蒸馏,以利用音频和参考图像条件之间的相关性,减少由多CFG引起的三倍推理运行,同时有可接受的质量降级。在多个数据集上进行的广泛实验表明,FADA生成的生动视频与最近基于扩散模型的方法相媲美,同时实现了NFE加速4.17-12.5倍。演示可在我们的网页http://fadavatar.github.io 上找到。

更新时间: 2025-04-04 06:07:56

领域: cs.CV,cs.AI,cs.GR,cs.SD,eess.AS

下载: http://arxiv.org/abs/2412.16915v2

Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video

Reconstructing 3D scenes from monocular surgical videos can enhance surgeon's perception and therefore plays a vital role in various computer-assisted surgery tasks. However, achieving scale-consistent reconstruction remains an open challenge due to inherent issues in endoscopic videos, such as dynamic deformations and textureless surfaces. Despite recent advances, current methods either rely on calibration or instrument priors to estimate scale, or employ SfM-like multi-stage pipelines, leading to error accumulation and requiring offline optimization. In this paper, we present Endo3R, a unified 3D foundation model for online scale-consistent reconstruction from monocular surgical video, without any priors or extra optimization. Our model unifies the tasks by predicting globally aligned pointmaps, scale-consistent video depths, and camera parameters without any offline optimization. The core contribution of our method is expanding the capability of the recent pairwise reconstruction model to long-term incremental dynamic reconstruction by an uncertainty-aware dual memory mechanism. The mechanism maintains history tokens of both short-term dynamics and long-term spatial consistency. Notably, to tackle the highly dynamic nature of surgical scenes, we measure the uncertainty of tokens via Sampson distance and filter out tokens with high uncertainty. Regarding the scarcity of endoscopic datasets with ground-truth depth and camera poses, we further devise a self-supervised mechanism with a novel dynamics-aware flow loss. Abundant experiments on SCARED and Hamlyn datasets demonstrate our superior performance in zero-shot surgical video depth prediction and camera pose estimation with online efficiency. Project page: https://wrld.github.io/Endo3R/.

Updated: 2025-04-04 06:05:22

标题: Endo3R:来自动态单目内窥镜视频的统一在线重建

摘要: 从单眼手术视频中重建3D场景可以增强外科医生的感知能力,因此在各种计算机辅助手术任务中发挥着至关重要的作用。然而,由于内窥镜视频中固有的问题,如动态变形和无纹理表面,实现尺度一致的重建仍然是一个挑战。尽管最近取得了进展,但当前的方法要么依赖于校准或仪器先验来估计尺度,要么采用类似SfM的多阶段管道,导致误差累积并需要离线优化。在本文中,我们提出了Endo3R,一个统一的3D基础模型,用于从单眼手术视频中进行在线尺度一致的重建,无需任何先验知识或额外的优化。我们的模型通过预测全局对齐的点地图、尺度一致的视频深度和相机参数来统一任务,而无需任何离线优化。我们方法的核心贡献是通过一个基于不确定性的双存储器机制,将最近的成对重建模型的能力扩展到长期增量动态重建。该机制保持短期动态和长期空间一致性的历史令牌。值得注意的是,为了应对外科手术场景的高度动态性质,我们通过Sampson距离来测量令牌的不确定性,并滤除具有高不确定性的令牌。考虑到内窥镜数据集中的地面真实深度和相机姿态的稀缺性,我们进一步设计了一个自监督机制,具有新颖的动态感知流损失。在SCARED和Hamlyn数据集上丰富的实验表明,我们在零样本外科手术视频深度预测和相机姿态估计方面具有卓越的在线效率表现。项目页面:https://wrld.github.io/Endo3R/。

更新时间: 2025-04-04 06:05:22

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.03198v1

Orthogonal Nonnegative Matrix Factorization with Sparsity Constraints

This article presents a novel approach to solving the sparsity-constrained Orthogonal Nonnegative Matrix Factorization (SCONMF) problem, which requires decomposing a non-negative data matrix into the product of two lower-rank non-negative matrices, $X = WH$, where the mixing matrix $H$ has orthogonal rows ($HH^T = I$) while also satisfying an upper bound on the number of nonzero elements in each row. By reformulating SCONMF as a capacity-constrained facility-location problem (CCFLP), the proposed method naturally integrates non-negativity, orthogonality, and sparsity constraints. Specifically, our approach integrates the control-barrier function (CBF) based framework used for dynamic optimal control design problems with the maximum-entropy-principle-based framework used for facility location problems to enforce these constraints while ensuring robust factorization. Additionally, this work introduces a quantitative approach for determining the "true" rank of $W$ or $H$, equivalent to the number of "true" features - a critical aspect in ONMF applications where the number of features is unknown. Simulations on various datasets demonstrate significantly improved factorizations with low reconstruction errors (up to 150 times smaller) while strictly satisfying all constraints, outperforming existing methods that struggle with balancing accuracy and constraint adherence.
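
The paper's CCFLP/maximum-entropy solver is involved; as a point of contrast, here is a naive projected-gradient baseline that handles the same three constraints (non-negativity by clipping, row orthogonality by a quadratic penalty, sparsity by hard top-k thresholding per row). All hyper-parameters are illustrative, and this is not the method the paper proposes.

```python
import numpy as np

def sconmf_baseline(X, r, k, lam=1.0, lr=1e-3, iters=500, seed=0):
    """Sketch: minimize 0.5*||WH - X||^2 + (lam/2)*||HH^T - I||^2
    with W, H >= 0 and at most k nonzeros per row of H."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(iters):
        R = W @ H - X                                    # factorization residual
        gW = R @ H.T                                     # gradient wrt W
        gH = W.T @ R + lam * (H @ H.T - np.eye(r)) @ H   # + orthogonality penalty
        W = np.maximum(W - lr * gW, 0.0)                 # non-negativity projection
        H = np.maximum(H - lr * gH, 0.0)
        for i in range(r):                               # keep k largest entries per row
            H[i, np.argsort(H[i])[: n - k]] = 0.0
    return W, H
```

Unlike the paper's approach, this baseline enforces orthogonality only softly and offers no guarantees, which illustrates why a principled formulation such as CCFLP is attractive.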

Updated: 2025-04-04 05:59:30

标题: 具有稀疏约束的正交非负矩阵分解

摘要: 本文提出了一种新颖的方法来解决稀疏约束正交非负矩阵分解(SCONMF)问题,该问题需要将一个非负数据矩阵分解为两个低秩非负矩阵的乘积,X=WH,其中混合矩阵H具有正交行HH^T=I,同时满足每行中非零元素数量的上限。通过将SCONMF重新制定为容量约束设施选址问题(CCFLP),提出的方法自然地整合了非负性、正交性和稀疏性约束。具体而言,我们的方法将用于动态最优控制设计问题的基于控制屏障函数(CBF)的框架与用于设施选址问题的基于最大熵原理的框架相结合,以强制执行这些约束并确保鲁棒分解。此外,本文介绍了一种确定W或H的“真实”秩的定量方法,等同于“真实”特征的数量 - 在ONMF应用中,特征数量未知是一个关键方面。对各种数据集的模拟表明,我们的方法明显改善了因子分解,重建误差较小(最多减少150倍),同时严格满足所有约束,优于现有方法在准确性和约束遵守之间的平衡。

更新时间: 2025-04-04 05:59:30

领域: cs.DS,cs.IT,cs.LG,math.IT,math.PR

下载: http://arxiv.org/abs/2210.02672v4

PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models

Text-to-image (T2I) models have been shown to be vulnerable to misuse, particularly in generating not-safe-for-work (NSFW) content, raising serious ethical concerns. In this work, we present PromptGuard, a novel content moderation technique that draws inspiration from the system prompt mechanism in large language models (LLMs) for safety alignment. Unlike LLMs, T2I models lack a direct interface for enforcing behavioral guidelines. Our key idea is to optimize a safety soft prompt that functions as an implicit system prompt within the T2I model's textual embedding space. This universal soft prompt (P*) directly moderates NSFW inputs, enabling safe yet realistic image generation without altering the inference efficiency or requiring proxy models. Extensive experiments across three datasets demonstrate that PromptGuard effectively mitigates NSFW content generation while preserving high-quality benign outputs. PromptGuard runs 7.8 times faster than prior content moderation methods, surpassing eight state-of-the-art defenses and reducing the unsafe generation ratio to as low as 5.84%.

Updated: 2025-04-04 05:56:04

标题: PromptGuard:针对文本到图像模型的软提示引导的不安全内容审核

摘要: 文本到图像(T2I)模型已被证明容易被滥用,特别是在生成不适宜的内容方面,引发了严重的道德担忧。在这项工作中,我们提出了PromptGuard,一种新颖的内容审查技术,灵感来自于大型语言模型(LLMs)中的系统提示机制,用于安全对齐。与LLMs不同,T2I模型缺乏直接接口来执行行为准则。我们的关键思想是优化一个安全软提示,作为T2I模型文本嵌入空间内的隐式系统提示。这种通用软提示(P*)直接审查不安全的输入,实现安全而又逼真的图像生成,而无需改变推理效率或需要代理模型。通过三个数据集上的大量实验,证明了PromptGuard有效地减少了不适宜内容的生成,同时保留了高质量的良性输出。PromptGuard的速度比以前的内容审查方法快7.8倍,超过了八种最先进的防御措施,将不安全比例优化到5.84%。

更新时间: 2025-04-04 05:56:04

领域: cs.CV,cs.AI,cs.CR

下载: http://arxiv.org/abs/2501.03544v2

Experimental Study on Time Series Analysis of Lower Limb Rehabilitation Exercise Data Driven by Novel Model Architecture and Large Models

This study investigates the application of novel model architectures and large-scale foundational models in temporal series analysis of lower limb rehabilitation motion data, aiming to leverage advancements in machine learning and artificial intelligence to empower active rehabilitation guidance strategies for post-stroke patients in limb motor function recovery. Utilizing the SIAT-LLMD dataset of lower limb movement data proposed by the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, we systematically elucidate the implementation and analytical outcomes of the innovative xLSTM architecture and the foundational model Lag-Llama in short-term temporal prediction tasks involving joint kinematics and dynamics parameters. The research provides novel insights for AI-enabled medical rehabilitation applications, demonstrating the potential of cutting-edge model architectures and large-scale models in rehabilitation medicine temporal prediction. These findings establish theoretical foundations for future applications of personalized rehabilitation regimens, offering significant implications for the development of customized therapeutic interventions in clinical practice.

Updated: 2025-04-04 05:40:13

标题: 新模型体系结构和大模型驱动的下肢康复运动数据时间序列分析的实验研究

摘要: 这项研究探讨了新型模型架构和大规模基础模型在下肢康复运动数据的时间序列分析中的应用,旨在利用机器学习和人工智能的进展,为中风后患者在肢体运动功能恢复中提供积极的康复指导策略。利用由中国科学院深圳先进技术研究院提出的下肢运动数据SIAT-LLMD数据集,系统地阐明了创新的xLSTM架构和基础模型Lag-Llama在涉及关节运动学和动力学参数的短期时间预测任务中的实施和分析结果。该研究为基于人工智能的医学康复应用提供了新的见解,展示了先进模型架构和大规模模型在康复医学时间预测中的潜力。这些发现为未来个性化康复方案的应用奠定了理论基础,在临床实践中为定制治疗干预的发展提供了重要启示。

更新时间: 2025-04-04 05:40:13

领域: eess.SP,cs.AI

下载: http://arxiv.org/abs/2504.03799v1

The Ground Cost for Optimal Transport of Angular Velocity

We revisit the optimal transport problem over angular velocity dynamics given by the controlled Euler equation. The solution of this problem enables stochastic guidance of the spin states of a rigid body (e.g., a spacecraft) under a hard deadline constraint by transferring given initial state statistics to desired terminal state statistics. This is an instance of generalized optimal transport over a nonlinear dynamical system. While prior work has reported existence-uniqueness and numerical solution of this dynamical optimal transport problem, here we present structural results about the equivalent Kantorovich, a.k.a. optimal coupling, formulation. Specifically, we focus on deriving the ground cost for the associated Kantorovich optimal coupling formulation. The ground cost equals the cost of transporting a unit amount of mass from a specific realization of the initial or source joint probability measure to a realization of the terminal or target joint probability measure, and determines the Kantorovich formulation. Finding the ground cost leads to solving a structured deterministic nonlinear optimal control problem, which is shown to be amenable to an analysis technique pioneered by Athans et al. We show that such techniques have broader applicability in determining the ground cost (and thus the Kantorovich formulation) for a class of generalized optimal mass transport problems involving nonlinear dynamics with translated norm-invariant drift.
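In Kantorovich (optimal coupling) form, the structure sketched above can be written as follows; the specific notation (drift f, input matrix B, horizon T, quadratic control effort) is an illustrative assumption rather than the paper's exact formulation.

```latex
% Kantorovich formulation with a ground cost defined by a deterministic
% optimal control problem over the controlled dynamics (notation assumed).
\min_{\pi \in \Pi(\mu_0, \mu_T)} \int c(x_0, x_T)\, \mathrm{d}\pi(x_0, x_T),
\qquad
c(x_0, x_T) := \min_{u(\cdot)} \int_0^T \|u(t)\|^2 \,\mathrm{d}t
\;\; \text{s.t.} \;\; \dot{x} = f(x) + B u,\quad x(0) = x_0,\quad x(T) = x_T,
```

where \(\Pi(\mu_0, \mu_T)\) is the set of couplings of the source and target measures; evaluating \(c\) is exactly the "structured deterministic nonlinear optimal control problem" the abstract refers to.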

Updated: 2025-04-04 05:38:00

标题: 角速度的最优传输的基本成本

摘要: 我们重新审视了在受控Euler方程给出的角速度动力学上的最优输运问题。解决这个问题使得刚体(例如,航天器)的自旋状态能够在严格的截止期限约束下进行随机引导,通过将给定的初始状态统计数据转移到期望的终端状态统计数据。这是关于非线性动态系统的广义最优输运的一个实例。虽然先前的研究报告了这个动态最优输运问题的存在唯一性和数值解,但在这里我们提出了关于等价Kantorovich(又名最优耦合)公式的结构性结果。具体来说,我们专注于推导相关Kantorovich最优耦合公式的地面成本。地面成本等于从初始或源联合概率测度的特定实现到终端或目标联合概率测度的实现中输送单位质量的成本,并确定Kantorovich公式。找到地面成本导致解决一个结构化的确定性非线性最优控制问题,该问题被证明适用于Athans等人开创的分析技术。我们展示了这样的技术在确定涉及具有平移不变漂移的非线性动力学的广义最优质量输运问题的地面成本(因此Kantorovich公式)方面具有更广泛的适用性。

更新时间: 2025-04-04 05:38:00

领域: math.OC,cs.LG,cs.SY,eess.SY,stat.ML

下载: http://arxiv.org/abs/2504.03190v1

An Intelligent and Privacy-Preserving Digital Twin Model for Aging-in-Place

The population of older adults is steadily increasing, with a strong preference for aging-in-place rather than moving to care facilities. Consequently, supporting this growing demographic has become a significant global challenge. However, facilitating successful aging-in-place is challenging, requiring consideration of multiple factors such as data privacy, health status monitoring, and living environments to improve health outcomes. In this paper, we propose an unobtrusive sensor system designed for installation in older adults' homes. Using data from the sensors, our system constructs a digital twin, a virtual representation of events and activities that occurred in the home. The system uses neural network models and decision rules to capture residents' activities and living environments. This digital twin enables continuous health monitoring by providing actionable insights into residents' well-being. Our system is designed to be low-cost and privacy-preserving, with the aim of providing green and safe monitoring for the health of older adults. We have successfully deployed our system in two homes over a time period of two months, and our findings demonstrate the feasibility and effectiveness of digital twin technology in supporting independent living for older adults. This study highlights that our system could revolutionize elder care by enabling personalized interventions, such as lifestyle adjustments, medical treatments, or modifications to the residential environment, to enhance health outcomes.

Updated: 2025-04-04 05:37:08

标题: 一个智能且保护隐私的老年人居家养老数字孪生模型

摘要: 老年人口正在稳步增长,他们更倾向于在家养老而不是搬到照料设施。因此,支持这一日益增长的人群已成为一个重要的全球挑战。然而,促进成功的在家养老是具有挑战性的,需要考虑多个因素,如数据隐私、健康状况监测和改善健康结果的生活环境。 本文提出了一个设计用于老年人家中安装的不显眼的传感器系统。利用传感器的数据,我们的系统构建了数字孪生体,即家中发生的事件和活动的虚拟表示。该系统利用神经网络模型和决策规则捕捉居民的活动和生活环境。这个数字孪生体通过为居民的幸福提供可操作的见解,实现了持续的健康监测。我们的系统设计成低成本和保护隐私,旨在为老年人的健康提供绿色和安全的监测。我们已经成功在两个家庭中部署了我们的系统,时长为两个月,我们的研究结果表明数字孪生技术在支持老年人独立生活方面的可行性和有效性。 这项研究强调我们的系统可以通过实现个性化干预,如生活方式调整、医疗治疗或对居住环境的修改,来改善健康结果,从而革新老年护理。

更新时间: 2025-04-04 05:37:08

领域: cs.CY,cs.AI,68T05,,I.2; J.3

下载: http://arxiv.org/abs/2504.03798v1

Simultaneous Learning of Optimal Transports for Training All-to-All Flow-Based Condition Transfer Model

In this paper, we propose a flow-based method for learning all-to-all transfer maps among conditional distributions, approximating pairwise optimal transport. The proposed method addresses the challenge of handling continuous conditions, which often involve a large set of conditions with sparse empirical observations per condition. We introduce a novel cost function that enables simultaneous learning of optimal transports for all pairs of conditional distributions. Our method is supported by a theoretical guarantee that, in the limit, it converges to pairwise optimal transports among infinitely many pairs of conditional distributions. The learned transport maps are subsequently used to couple data points in conditional flow matching. We demonstrate the effectiveness of this method on synthetic and benchmark datasets, as well as on chemical datasets where continuous physical properties are defined as conditions.

Updated: 2025-04-04 05:32:54

标题: 同时学习用于训练全对全流基条件转移模型的最佳传输

摘要: 在本文中,我们提出了一种基于流的方法,用于学习条件分布之间的全对全传输映射,从而逼近成对最优输运。所提出的方法解决了处理连续条件的挑战,这些条件通常涉及一大组具有稀疏经验观测的条件。我们引入了一种新颖的成本函数,使得能够同时学习所有对条件分布的最优输运。我们的方法得到了理论保证,在极限情况下,它收敛到无限对条件分布之间的最优输运。学习到的传输映射随后被用于在条件流匹配中耦合数据点。我们在合成和基准数据集以及化学数据集上展示了这种方法的有效性,其中连续物理属性被定义为条件。

更新时间: 2025-04-04 05:32:54

领域: cs.LG

下载: http://arxiv.org/abs/2504.03188v1

On the Connection Between Diffusion Models and Molecular Dynamics

Neural Network Potentials (NNPs) have emerged as a powerful tool for modelling atomic interactions with high accuracy and computational efficiency. Recently, denoising diffusion models have shown promise in NNPs by training networks to remove noise added to stable configurations, eliminating the need for force data during training. In this work, we explore the connection between noise and forces by providing a new, simplified mathematical derivation of their relationship. We also demonstrate how a denoising model can be implemented using a conventional MD software package interfaced with a standard NNP architecture. We demonstrate the approach by training a diffusion-based NNP to simulate a coarse-grained lithium chloride solution and employ data duplication to enhance model performance.
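The noise-force relationship the abstract refers to can be sketched numerically: for a Boltzmann density p(x) ∝ exp(-U(x)/kT), the force is F = -∇U = kT ∇log p, and the score of a Gaussian-smoothed density at a noised point x = x₀ + σε is approximately -ε/σ, so a denoiser that predicts ε implicitly predicts a force. The helper names below are assumptions for illustration, not the paper's derivation verbatim.

```python
import numpy as np

# Hedged sketch of the noise-force connection: convert a denoiser's
# predicted noise into a score estimate, then into a force estimate.
def score_from_predicted_noise(eps_pred, sigma):
    # Score of the sigma-smoothed density at the noised point: -eps / sigma.
    return -eps_pred / sigma

def force_from_score(score, kT=1.0):
    # Boltzmann relation: F = kT * grad log p.
    return kT * score

eps = np.array([0.2, -0.1])                 # noise a denoiser would predict
print(force_from_score(score_from_predicted_noise(eps, sigma=0.5)))
```

This is why a diffusion-style NNP can be trained from stable configurations alone: the noise targets stand in for force labels.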

Updated: 2025-04-04 05:32:38

标题: 关于扩散模型与分子动力学之间的联系

摘要: 神经网络势(NNPs)已经成为一种具有高精度和计算效率的建模原子相互作用的强大工具。最近,去噪扩散模型在NNPs中显示出潜力,通过训练网络去除添加到稳定构型的噪音,消除了训练过程中需要力数据的需求。在这项工作中,我们探讨了噪音和力之间的关系,提供了一个新的简化数学推导。我们还展示了如何使用传统MD软件包与标准NNP架构接口实现去噪模型。我们通过训练一个基于扩散的NNP来模拟粗粒度的氯化锂溶液,并使用数据复制来增强模型性能。

更新时间: 2025-04-04 05:32:38

领域: cs.LG

下载: http://arxiv.org/abs/2504.03187v1

Learning Natural Language Constraints for Safe Reinforcement Learning of Language Agents

Generalizable alignment is a core challenge for deploying Large Language Models (LLMs) safely in real-world NLP applications. Current alignment methods, including Reinforcement Learning from Human Feedback (RLHF), often fail to guarantee constraint satisfaction outside their training distribution due to their reliance on implicit, post-hoc preferences. Inspired by a paradigm shift to first curate data before tuning, we introduce a new framework for safe language alignment that learns natural language constraints from positive and negative demonstrations as a primary step. From inferring both a task-specific reward function and latent constraint functions, our approach fosters adaptation to novel safety requirements and robust generalization under domain shifts and adversarial inputs. We formalize the framework within a Constrained Markov Decision Process (CMDP) and validate it via a text-based navigation environment, demonstrating safe adaptation to changing danger zones. Our experiments show fewer violations upon domain shift when following a safe navigation path, and we achieve zero violations by applying learned constraints to a distilled BERT model as a fine-tuning technique. This work offers a promising path toward building safety-critical and more generalizable LLMs for practical NLP settings.

Updated: 2025-04-04 05:26:28

标题: 学习自然语言约束以确保语言代理的安全强化学习

摘要: 广泛的对齐是在实际自然语言处理应用中安全部署大型语言模型(LLMs)的核心挑战。当前的对齐方法,包括从人类反馈中强化学习(RLHF),通常由于依赖隐含的事后偏好而在训练分布之外无法保证约束满足。受到在调整之前首先筛选数据的范式转变的启发,我们引入了一个新的安全语言对齐框架,该框架通过正面和负面示范学习自然语言约束作为首要步骤。通过推断任务特定的奖励函数和潜在约束函数,我们的方法促进了对新的安全要求的适应,并在领域转移和对抗性输入下实现了稳健的泛化。我们在一个受约束的马尔可夫决策过程(CMDP)框架内形式化这个框架,并通过一个基于文本的导航环境进行验证,展示对危险区域变化的安全适应性。我们的实验表明,在遵循安全导航路径时,在领域转移时违规较少,并且通过将学习到的约束应用于精炼的BERT模型作为微调技术,我们实现了零违规。这项工作为在实际NLP环境中构建关键安全且更具泛化性的LLMs提供了一条有希望的道路。

更新时间: 2025-04-04 05:26:28

领域: cs.CL,cs.AI,I.2.7; I.2.4; I.2.6; I.2.8

下载: http://arxiv.org/abs/2504.03185v1

Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration

Learning diverse and high-performance behaviors from a limited set of demonstrations is a grand challenge. Traditional imitation learning methods usually fail in this task because most of them are designed to learn one specific behavior even with multiple demonstrations. Therefore, novel techniques for quality diversity imitation learning, which bridge quality diversity optimization and imitation learning, are needed to solve this challenge. This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL), which 1) improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE), and 2) mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus. Empirically, our method significantly outperforms state-of-the-art IL methods, achieving near-expert or beyond-expert QD performance on challenging continuous control tasks derived from MuJoCo environments.

Updated: 2025-04-04 05:10:15

标题: 从多样行为中模仿:具有单步档案探索的Wasserstein质量多样性模仿学习

摘要: 从有限的一组演示中学习多样化和高性能行为是一个重大挑战。传统的模仿学习方法通常在这项任务中失败,因为它们大多设计成即使有多个演示也只学习一个特定的行为。因此,需要新颖的技术来解决这一挑战,即“质量多样性模仿学习”,它连接了质量多样性优化和模仿学习方法。本文介绍了Wasserstein质量多样性模仿学习(WQDIL),该方法通过基于Wasserstein自编码器(WAE)的潜在对抗训练提高了质量多样性环境下模仿学习的稳定性,并使用基于度量条件的奖励函数和单步归档探索奖励来缓解行为过拟合问题。从实证上看,我们的方法明显优于最先进的模仿学习方法,在从MuJoCo环境衍生出的具有挑战性的连续控制任务中实现了接近专家或超越专家的QD表现。

更新时间: 2025-04-04 05:10:15

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.06965v2

Mathematical Modeling of Option Pricing with an Extended Black-Scholes Framework

This study investigates enhancing option pricing by extending the Black-Scholes model to include stochastic volatility and interest rate variability within the Partial Differential Equation (PDE). The PDE is solved using the finite difference method. The extended Black-Scholes model and a machine learning-based LSTM model are developed and evaluated for pricing Google stock options. Both models were backtested using historical market data. While the LSTM model exhibited higher predictive accuracy, the finite difference method demonstrated superior computational efficiency. This work provides insights into model performance under varying market conditions and emphasizes the potential of hybrid approaches for robust financial modeling.
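The finite-difference approach mentioned above can be sketched for the standard Black-Scholes PDE (constant volatility and rate, i.e., a simplification of the paper's extended model) pricing a European call; grid sizes and boundary choices below are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: explicit finite-difference scheme for the Black-Scholes
# PDE, marching backward in time from the terminal payoff.
def bs_call_fd(S0, K, T, r, sigma, Smax=200.0, M=100, N=1000):
    dS, dt = Smax / M, T / N
    S = np.linspace(0.0, Smax, M + 1)
    V = np.maximum(S - K, 0.0)                      # terminal payoff
    for n in range(N):                              # backward time-march
        tau = (n + 1) * dt                          # time to maturity
        Vn = V.copy()
        i = np.arange(1, M)
        dVdS = (Vn[i + 1] - Vn[i - 1]) / (2 * dS)   # central differences
        d2VdS2 = (Vn[i + 1] - 2 * Vn[i] + Vn[i - 1]) / dS**2
        V[i] = Vn[i] + dt * (0.5 * sigma**2 * S[i]**2 * d2VdS2
                             + r * S[i] * dVdS - r * Vn[i])
        V[0], V[M] = 0.0, Smax - K * np.exp(-r * tau)   # boundary values
    return float(np.interp(S0, S, V))

price = bs_call_fd(S0=100, K=100, T=1.0, r=0.05, sigma=0.2)
print(round(price, 2))  # close to the analytic value of about 10.45
```

Note the explicit scheme is only conditionally stable (dt must be small relative to dS²/(σ²·Smax²)), which is one reason implicit or Crank-Nicolson variants are common in practice.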

Updated: 2025-04-04 05:06:55

标题: 使用扩展的Black-Scholes框架对期权定价进行数学建模

摘要: 这项研究通过将Black-Scholes模型扩展到在偏微分方程(PDE)中包含随机波动率和利率变动来增强期权定价。该PDE通过有限差分方法求解。开发并评估了扩展的Black-Scholes模型和基于机器学习的LSTM模型,用于定价谷歌股票期权。两种模型都使用历史市场数据进行了回测。虽然LSTM模型表现出更高的预测准确性,但有限差分方法展示出更高的计算效率。这项工作提供了在不同市场条件下模型表现的见解,并强调了混合方法在稳健金融建模中的潜力。

更新时间: 2025-04-04 05:06:55

领域: math.NA,cs.LG,cs.NA,math.PR,q-fin.CP,60G07,G.1.0; G.1.8; G.1.7; G.3; I.2.0

下载: http://arxiv.org/abs/2504.03175v1

PPFPL: Cross-silo Privacy-preserving Federated Prototype Learning Against Data Poisoning Attacks on Non-IID Data

Privacy-Preserving Federated Learning (PPFL) allows multiple clients to collaboratively train a deep learning model by submitting hidden model updates. Nonetheless, PPFL is vulnerable to data poisoning attacks due to the distributed training nature of clients. Existing solutions have struggled to improve the performance of cross-silo PPFL in poisoned Non-IID data. To address the issues, this paper proposes a privacy-preserving federated prototype learning framework, named PPFPL, which enhances the cross-silo FL performance in poisoned Non-IID data while effectively resisting data poisoning attacks. Specifically, we adopt prototypes as client-submitted model updates to eliminate the impact of tampered data distribution on federated learning. Moreover, we utilize two servers to achieve Byzantine-robust aggregation by secure aggregation protocol, which greatly reduces the impact of malicious clients. Theoretical analyses confirm the convergence of PPFPL, and experimental results on publicly available datasets show that PPFPL is effective for resisting data poisoning attacks with Non-IID conditions.
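The core idea above, clients submitting per-class prototypes rather than raw model updates, can be sketched as follows; the function name and two-dimensional toy features are illustrative assumptions, not PPFPL's implementation.

```python
import numpy as np

# Hedged sketch: a client's per-class prototypes are the mean feature
# vectors of its local samples, which is the quantity the framework
# aggregates in place of gradient-style model updates.
def class_prototypes(features, labels, num_classes):
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

feats = np.array([[0.0, 1.0], [2.0, 1.0], [4.0, 4.0]])
labs = np.array([0, 0, 1])
print(class_prototypes(feats, labs, 2))  # per-class mean feature vectors
```

Aggregating such low-dimensional summaries (here, via the paper's two-server secure aggregation, not shown) limits how much a single poisoned client's data distribution can distort the global model.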

Updated: 2025-04-04 05:05:24

标题: PPFPL:针对非IID数据的数据投毒攻击的跨领域隐私保护联邦原型学习

摘要: 隐私保护的联邦学习(PPFL)允许多个客户端通过提交隐藏的模型更新来协同训练深度学习模型。然而,由于客户端的分布式训练特性,PPFL容易受到数据投毒攻击的影响。现有解决方案在提高受污染的非独立同分布(Non-IID)数据中跨平台PPFL的性能方面存在困难。为了解决这些问题,本文提出了一个名为PPFPL的隐私保护联邦原型学习框架,增强了受污染的Non-IID数据中的跨平台FL性能,同时有效抵抗数据投毒攻击。具体来说,我们采用原型作为客户端提交的模型更新,消除了篡改数据分布对联邦学习的影响。此外,我们利用两个服务器通过安全聚合协议实现拜占庭鲁棒聚合,大大减少了恶意客户端的影响。理论分析证实了PPFPL的收敛性,对公开可用数据集上的实验结果表明,PPFPL对于具有Non-IID条件的数据投毒攻击是有效的。

更新时间: 2025-04-04 05:05:24

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2504.03173v1

Bayesian Optimization of Robustness Measures Using Randomized GP-UCB-based Algorithms under Input Uncertainty

Bayesian optimization based on Gaussian process upper confidence bound (GP-UCB) has a theoretical guarantee for optimizing black-box functions. Black-box functions often have input uncertainty, but even in this case, GP-UCB can be extended to optimize evaluation measures called robustness measures. However, GP-UCB-based methods for robustness measures include a trade-off parameter $\beta$, which must be excessively large to achieve theoretical validity, just like the original GP-UCB. In this study, we propose a new method called randomized robustness measure GP-UCB (RRGP-UCB), which samples the trade-off parameter $\beta$ from a probability distribution based on a chi-squared distribution and avoids explicitly specifying $\beta$. The expected value of $\beta$ is not excessively large. Furthermore, we show that RRGP-UCB provides tight bounds on the expected value of regret based on the optimal solution and estimated solutions. Finally, we demonstrate the usefulness of the proposed method through numerical experiments.
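The randomized trade-off parameter can be sketched in a few lines; note the exact sampling distribution in the paper is "based on a chi-squared distribution", so the 2-degree-of-freedom chi-squared draw and the toy posterior below are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of RRGP-UCB's selection rule: sample beta each round
# instead of fixing it at an excessively large constant, then pick the
# candidate maximizing the GP upper confidence bound mu + sqrt(beta)*sigma.
def rrgp_ucb_select(mu, sigma, rng):
    beta = rng.chisquare(df=2)            # randomized trade-off parameter
    return int(np.argmax(mu + np.sqrt(beta) * sigma))

rng = np.random.default_rng(0)
mu = np.array([0.1, 0.5, 0.3])            # GP posterior means per candidate
sigma = np.array([0.2, 0.1, 0.4])         # GP posterior stds per candidate
print(rrgp_ucb_select(mu, sigma, rng))    # index of the UCB maximizer
```

Because E[beta] stays moderate, exploration is not forced to be as aggressive as with the theory-mandated deterministic beta of standard GP-UCB.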

Updated: 2025-04-04 05:01:54

标题: 贝叶斯优化在输入不确定性下使用基于随机GP-UCB算法的鲁棒性度量

摘要: 基于高斯过程上置信界的贝叶斯优化(GP-UCB)在优化黑盒函数方面具有理论保证。黑盒函数通常存在输入不确定性,但即使在这种情况下,GP-UCB也可以扩展到优化称为鲁棒性度量的评估指标。然而,基于GP-UCB的鲁棒性度量方法包括一个权衡参数β,必须过大才能实现理论有效性,就像原始的GP-UCB一样。在本研究中,我们提出了一种新方法,称为随机鲁棒性度量GP-UCB(RRGP-UCB),它从基于卡方分布的概率分布中对权衡参数β进行采样,避免明确指定β。β的期望值并不过大。此外,我们表明RRGP-UCB基于最优解和估计解提供了紧密的遗憾期望值上界。最后,我们通过数值实验展示了所提方法的实用性。

更新时间: 2025-04-04 05:01:54

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2504.03172v1

Real-Time Roadway Obstacle Detection for Electric Scooters Using Deep Learning and Multi-Sensor Fusion

The increasing adoption of electric scooters (e-scooters) in urban areas has coincided with a rise in traffic accidents and injuries, largely due to their small wheels, lack of suspension, and sensitivity to uneven surfaces. While deep learning-based object detection has been widely used to improve automobile safety, its application for e-scooter obstacle detection remains unexplored. This study introduces a novel ground obstacle detection system for e-scooters, integrating an RGB camera, and a depth camera to enhance real-time road hazard detection. Additionally, the Inertial Measurement Unit (IMU) measures linear vertical acceleration to identify surface vibrations, guiding the selection of six obstacle categories: tree branches, manhole covers, potholes, pine cones, non-directional cracks, and truncated domes. All sensors, including the RGB camera, depth camera, and IMU, are integrated within the Intel RealSense Camera D435i. A deep learning model powered by YOLO detects road hazards and utilizes depth data to estimate obstacle proximity. Evaluated on the seven hours of naturalistic riding dataset, the system achieves a high mean average precision (mAP) of 0.827 and demonstrates excellent real-time performance. This approach provides an effective solution to enhance e-scooter safety through advanced computer vision and data fusion. The dataset is accessible at https://zenodo.org/records/14583718, and the project code is hosted on https://github.com/Zeyang-Zheng/Real-Time-Roadway-Obstacle-Detection-for-Electric-Scooters.
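The depth-based proximity estimation described above can be sketched as a robust statistic over the depth pixels inside a detection box; the function name, median choice, and synthetic frame are illustrative assumptions, not the released project code.

```python
import numpy as np

# Illustrative sketch: estimate an obstacle's distance from a depth frame
# (in meters) using the median depth inside a detector's bounding box
# (x1, y1, x2, y2 in pixel coordinates).
def obstacle_distance(depth_m, box):
    x1, y1, x2, y2 = box
    patch = depth_m[y1:y2, x1:x2]
    valid = patch[patch > 0]              # zeros mark missing depth readings
    return float(np.median(valid)) if valid.size else float("nan")

depth = np.full((120, 160), 5.0)          # synthetic frame: 5 m background
depth[40:80, 60:100] = 1.5                # obstacle roughly 1.5 m ahead
print(obstacle_distance(depth, (60, 40, 100, 80)))  # 1.5
```

The median is preferred over the mean here because stereo depth frames (such as those from the D435i) often contain dropout pixels and speckle outliers.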

Updated: 2025-04-04 05:01:16

标题: 实时道路障碍物检测技术在电动滑板车中的应用:基于深度学习和多传感器融合的方法.

摘要: 城市地区电动滑板车(e-scooters)的日益普及与交通事故和受伤的增加同时发生,主要原因是它们的小车轮、缺乏悬挂装置以及对不平整表面的敏感性。尽管基于深度学习的物体检测已被广泛用于改善汽车安全,但其在电动滑板车障碍物检测方面的应用尚未被探索。本研究介绍了一种新颖的电动滑板车地面障碍物检测系统,集成了RGB相机和深度相机,以增强实时道路危险检测。此外,惯性测量单元(IMU)测量线性垂直加速度以识别表面振动,指导选择六种障碍物类别:树枝、排水井盖、坑洼、松果、非定向裂缝和截断圆顶。所有传感器,包括RGB相机、深度相机和IMU,都集成在英特尔RealSense相机D435i内。由YOLO提供动力的深度学习模型检测道路危险,并利用深度数据估计障碍物的接近程度。在七个小时的自然骑行数据集上评估,系统实现了高平均精度(mAP)为0.827,并表现出优秀的实时性能。这种方法通过先进的计算机视觉和数据融合提供了增强电动滑板车安全性的有效解决方案。数据集可在https://zenodo.org/records/14583718获得,项目代码托管在https://github.com/Zeyang-Zheng/Real-Time-Roadway-Obstacle-Detection-for-Electric-Scooters。

更新时间: 2025-04-04 05:01:16

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2504.03171v1

Certified Defense on the Fairness of Graph Neural Networks

Graph Neural Networks (GNNs) have emerged as a prominent graph learning model in various graph-based tasks over the years. Nevertheless, due to the vulnerabilities of GNNs, it has been empirically proved that malicious attackers could easily corrupt the fairness level of their predictions by adding perturbations to the input graph data. In this paper, we take crucial steps to study a novel problem of certifiable defense on the fairness level of GNNs. Specifically, we propose a principled framework named ELEGANT and present a detailed theoretical certification analysis for the fairness of GNNs. ELEGANT takes any GNNs as its backbone, and the fairness level of such a backbone is theoretically impossible to be corrupted under certain perturbation budgets for attackers. Notably, ELEGANT does not have any assumption over the GNN structure or parameters, and does not require re-training the GNNs to realize certification. Hence it can serve as a plug-and-play framework for any optimized GNNs ready to be deployed. We verify the satisfactory effectiveness of ELEGANT in practice through extensive experiments on real-world datasets across different backbones of GNNs, where ELEGANT is also demonstrated to be beneficial for GNN debiasing. Open-source code can be found at https://github.com/yushundong/ELEGANT.

Updated: 2025-04-04 05:00:42

标题: 图神经网络公平性的认证防御

摘要: 图神经网络(GNNs)已经成为多年来各种基于图的任务中突出的图学习模型。然而,由于GNN的脆弱性,经验性地证明了恶意攻击者可以通过向输入图数据添加扰动轻易地破坏其预测的公平水平。在本文中,我们采取关键步骤研究了GNN的公平水平上的认证防御的新问题。具体来说,我们提出了一个名为ELEGANT的原则性框架,并为GNN的公平性提供了详细的理论认证分析。ELEGANT将任何GNN作为其骨干,并且在某些攻击者的扰动预算下,这样一个骨干的公平水平在理论上是不可能被破坏的。值得注意的是,ELEGANT对GNN的结构或参数没有任何假设,并且不需要重新训练GNN来实现认证。因此,它可以作为一个即插即用的框架,适用于任何准备部署的优化过的GNN。我们通过在不同GNN骨干上进行广泛实验验证了ELEGANT在实践中的令人满意的有效性,其中还证明了ELEGANT对于GNN去偏倚也是有益的。开源代码可以在https://github.com/yushundong/ELEGANT 找到。

更新时间: 2025-04-04 05:00:42

领域: cs.LG,cs.CR,stat.ML

下载: http://arxiv.org/abs/2311.02757v2

Water Mapping and Change Detection Using Time Series Derived from the Continuous Monitoring of Land Disturbance Algorithm

Given the growing environmental challenges, accurate monitoring and prediction of changes in water bodies are essential for sustainable management and conservation. The Continuous Monitoring of Land Disturbance (COLD) algorithm provides a valuable tool for real-time analysis of land changes, such as deforestation, urban expansion, agricultural activities, and natural disasters. This capability enables timely interventions and more informed decision-making. This paper assesses the effectiveness of the algorithm to estimate water bodies and track pixel-level water trends over time. Our findings indicate that COLD-derived data can reliably estimate water frequency during stable periods and delineate water bodies. Furthermore, it enables the evaluation of trends in water areas after disturbances, allowing for the determination of whether water frequency increases, decreases, or remains constant.
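The pixel-level quantity the abstract tracks, water frequency, can be sketched from a time series of binary water masks; the array shapes and function name are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: per-pixel water frequency is the fraction of
# observation dates on which a pixel is classified as water, the quantity
# whose stability and post-disturbance trend the study evaluates.
def water_frequency(masks):
    # masks: (T, H, W) boolean stack of per-date water classifications
    return masks.mean(axis=0)

masks = np.array([[[True, False]],
                  [[True, False]],
                  [[True, True]]])
print(water_frequency(masks))  # per-pixel fractions of water observations
```

Comparing this frequency map before and after a COLD-detected disturbance gives the increase/decrease/constant trend classification described above.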

Updated: 2025-04-04 04:59:46

标题: 水资源映射和变化检测:利用基于连续监测土地扰动算法的时间序列

摘要: 鉴于不断增长的环境挑战,准确监测和预测水体变化对于可持续管理和保护至关重要。连续监测土地扰动(COLD)算法提供了一个有价值的工具,用于实时分析土地变化,如森林砍伐、城市扩张、农业活动和自然灾害。这种能力使得及时干预和更明智的决策成为可能。本文评估了该算法在估计水体并跟踪像素级水体趋势方面的有效性。我们的研究结果表明,COLD衍生的数据可以在稳定时期可靠地估计水体频率并勾画出水体。此外,它还可以评估扰动后水域面积的趋势,从而确定水体频率是增加、减少还是保持不变。

更新时间: 2025-04-04 04:59:46

领域: cs.LG

下载: http://arxiv.org/abs/2504.03170v1

NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving

Recent advancements in Vision-Language Models (VLMs) have demonstrated strong potential for autonomous driving tasks. However, their spatial understanding and reasoning, key capabilities for autonomous driving, still exhibit significant limitations. Notably, none of the existing benchmarks systematically evaluate VLMs' spatial reasoning capabilities in driving scenarios. To fill this gap, we propose NuScenes-SpatialQA, the first large-scale ground-truth-based Question-Answer (QA) benchmark specifically designed to evaluate the spatial understanding and reasoning capabilities of VLMs in autonomous driving. Built upon the NuScenes dataset, the benchmark is constructed through an automated 3D scene graph generation pipeline and a QA generation pipeline. The benchmark systematically evaluates VLMs' performance in both spatial understanding and reasoning across multiple dimensions. Using this benchmark, we conduct extensive experiments on diverse VLMs, including both general and spatial-enhanced models, providing the first comprehensive evaluation of their spatial capabilities in autonomous driving. Surprisingly, the experimental results show that spatial-enhanced VLMs outperform on qualitative QA but are not competitive on quantitative QA. In general, VLMs still face considerable challenges in spatial understanding and reasoning.

Updated: 2025-04-04 04:43:10

标题: NuScenes-SpatialQA:自动驾驶中视觉语言模型的空间理解和推理基准

摘要: 最近对视觉语言模型(VLMs)的进展表明其在自动驾驶任务中具有强大潜力。然而,它们的空间理解和推理-自动驾驶的关键能力仍然存在显著限制。值得注意的是,目前没有现有的基准系统地评估VLMs在驾驶场景中的空间推理能力。为了填补这一空白,我们提出了NuScenes-SpatialQA,这是第一个大规模基于实地真实数据的问题-答案(QA)基准,专门设计用于评估VLMs在自动驾驶中的空间理解和推理能力。该基准建立在NuScenes数据集上,通过自动化的3D场景图生成管道和QA生成管道构建而成。该基准系统地评估了VLMs在空间理解和推理方面的性能,跨越多个维度。利用这一基准,我们对各种VLMs进行了广泛的实验,包括一般模型和空间增强模型,首次全面评估了它们在自动驾驶中的空间能力。令人惊讶的是,实验结果显示,空间增强型VLM在定性QA方面表现优异,但在定量QA方面并不具备竞争力。总体而言,VLMs在空间理解和推理方面仍面临相当大的挑战。

更新时间: 2025-04-04 04:43:10

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.03164v1

Enhanced Penalty-based Bidirectional Reinforcement Learning Algorithms

This research focuses on enhancing reinforcement learning (RL) algorithms by integrating penalty functions to guide agents in avoiding unwanted actions while optimizing rewards. The goal is to improve the learning process by ensuring that agents learn not only suitable actions but also which actions to avoid. Additionally, we reintroduce a bidirectional learning approach that enables agents to learn from both initial and terminal states, thereby improving speed and robustness in complex environments. Our proposed Penalty-Based Bidirectional methodology is tested against ManiSkill benchmark environments, demonstrating an improvement in success rate of approximately 4% compared to existing RL implementations. The findings indicate that this integrated strategy enhances policy learning, adaptability, and overall performance in challenging scenarios.
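The penalty integration described above can be sketched as simple reward shaping; the penalty magnitude and the notion of an "unwanted" action set are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of penalty-based reward shaping: unwanted actions incur a
# fixed penalty subtracted from the environment reward, so the agent learns
# both which actions to take and which to avoid.
def shaped_reward(reward, action, unwanted_actions, penalty=1.0):
    return reward - penalty if action in unwanted_actions else reward

print(shaped_reward(0.5, action=3, unwanted_actions={3, 7}))  # -0.5
print(shaped_reward(0.5, action=1, unwanted_actions={3, 7}))  # 0.5
```

In a full implementation this shaping would wrap the environment's step function, leaving the underlying RL algorithm unchanged.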

Updated: 2025-04-04 04:43:07

标题: 增强型基于惩罚的双向强化学习算法

摘要: 这项研究专注于通过整合惩罚函数来增强强化学习(RL)算法,以指导代理程序避免不需要的行为同时优化奖励。其目标是通过确保代理程序不仅学习适当的行为,还学习应避免的行为,从而改进学习过程。此外,我们重新引入了一种双向学习方法,使代理程序能够从初始状态和终端状态中学习,从而提高在复杂环境中的速度和稳健性。我们提出的基于惩罚的双向方法在Mani技能基准环境中进行了测试,与现有的RL实现相比,成功率的最优改进约为4%。研究结果表明,这种综合策略增强了策略学习、适应性和在挑战性场景中的整体表现。

更新时间: 2025-04-04 04:43:07

领域: cs.LG

下载: http://arxiv.org/abs/2504.03163v1

Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking

Grokking, referring to the abrupt improvement in test accuracy after extended overfitting, offers valuable insights into the mechanisms of model generalization. Existing research based on progress measures implies that grokking relies on understanding the optimization dynamics when the loss function is dominated solely by the weight decay term. However, we find that this optimization merely leads to token uniformity, which is not a sufficient condition for grokking. In this work, we investigate the grokking mechanism underlying the Transformer in the task of prime number operations. Based on theoretical analysis and experimental validation, we present the following insights: (i) The weight decay term encourages uniformity across all tokens in the embedding space when it is minimized. (ii) The occurrence of grokking is jointly determined by the uniformity of the embedding space and the distribution of the training dataset. Building on these insights, we provide a unified perspective for understanding various previously proposed progress measures and introduce a novel, concise, and effective progress measure that could trace the changes in test loss more accurately. Finally, to demonstrate the versatility of our theoretical framework, we design a dedicated dataset to validate our theory on ResNet-18, successfully showcasing the occurrence of grokking.

Updated: 2025-04-04 04:42:38

标题: 超越进展测量:揭示理解机制的理论洞见

摘要: 摸索(Grokking),指的是在过度拟合后测试准确性突然改善,为我们提供了宝贵的洞察力,可以帮助我们理解模型泛化的机制。现有的基于进展指标的研究表明,摸索依赖于理解当损失函数仅由权重衰减项主导时的优化动态。然而,我们发现这种优化仅仅导致了令牌的均匀性,并不足以满足摸索的条件。在这项工作中,我们研究了Transformer在素数运算任务中的摸索机制。基于理论分析和实验验证,我们提出以下见解:(i) 权重衰减项在最小化时鼓励嵌入空间中所有令牌的均匀性。(ii) 摸索的发生是由嵌入空间的均匀性和训练数据集的分布共同决定的。基于这些见解,我们提供了一个统一的视角,以更好地理解先前提出的各种进展指标,并引入了一种新颖、简洁且有效的进展指标,可以更准确地追踪测试损失的变化。最后,为了展示我们理论框架的多功能性,我们设计了一个专门的数据集来验证我们在ResNet-18上的理论,成功展示了摸索的发生。

更新时间: 2025-04-04 04:42:38

领域: cs.LG

下载: http://arxiv.org/abs/2504.03162v1

DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments

Large Language Models (LLMs) equipped with web search capabilities have demonstrated impressive potential for deep research tasks. However, current approaches predominantly rely on either manually engineered prompts (prompt engineering-based) with brittle performance or reinforcement learning within controlled Retrieval-Augmented Generation (RAG) environments (RAG-based) that fail to capture the complexities of real-world interaction. In this paper, we introduce DeepResearcher, the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. Unlike RAG-based approaches that assume all necessary information exists within a fixed corpus, our method trains agents to navigate the noisy, unstructured, and dynamic nature of the open web. We implement a specialized multi-agent architecture where browsing agents extract relevant information from various webpage structures, overcoming significant technical challenges. Extensive experiments on open-domain research tasks demonstrate that DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines and up to 7.2 points over RAG-based RL agents. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers. Our results highlight that end-to-end training in real-world web environments is not merely an implementation detail but a fundamental requirement for developing robust research capabilities aligned with real-world applications. We release DeepResearcher at https://github.com/GAIR-NLP/DeepResearcher.

Updated: 2025-04-04 04:41:28

标题: DeepResearcher:通过强化学习在现实环境中扩展深度研究

摘要: 大型语言模型(LLMs)配备了网络搜索功能,展示了深度研究任务的令人印象深刻的潜力。然而,当前的方法主要依赖于手动设计的提示(基于提示工程)具有脆弱的性能,或者在受控的检索增强生成(RAG)环境中通过强化学习(RAG-based)来捕捉真实世界互动的复杂性。在本文中,我们介绍了DeepResearcher,这是第一个通过扩展强化学习(RL)在真实世界环境中进行训练的基于LLM的深度研究代理的全面框架,具有真实网络搜索相互作用。与假设所有必要信息都存在于固定语料库中的RAG-based方法不同,我们的方法训练代理程序来浏览开放网络的嘈杂、非结构化和动态性质。我们实施了一个专门的多代理架构,其中浏览代理从各种网页结构中提取相关信息,并克服了重大的技术挑战。对开放领域研究任务的广泛实验表明,DeepResearcher相对于基于提示工程的基线和基于RAG的RL代理实现了高达28.9个点的实质性改进。我们的定性分析揭示了端到端RL训练中新兴的认知行为,包括制定计划的能力,从多个来源交叉验证信息,进行自我反思以重新定向研究,以及在无法找到明确答案时保持诚实。我们的结果强调,端到端在真实网络环境中的训练不仅仅是一个实现细节,而且是开发与真实世界应用一致的强大研究能力的基本要求。我们在https://github.com/GAIR-NLP/DeepResearcher上发布了DeepResearcher。

更新时间: 2025-04-04 04:41:28

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2504.03160v1

sudo rm -rf agentic_security

Large Language Models (LLMs) are increasingly deployed as computer-use agents, autonomously performing tasks within real desktop or web environments. While this evolution greatly expands practical use cases for humans, it also creates serious security exposures. We present SUDO (Screen-based Universal Detox2Tox Offense), a novel attack framework that systematically bypasses refusal-trained safeguards in commercial computer-use agents, such as Claude Computer Use. The core mechanism, Detox2Tox, transforms harmful requests (that agents initially reject) into seemingly benign requests via detoxification, secures detailed instructions from advanced vision language models (VLMs), and then reintroduces malicious content via toxification just before execution. Unlike conventional jailbreaks, SUDO iteratively refines its attacks based on built-in refusal feedback, making it increasingly effective against robust policy filters. In extensive tests spanning 50 real-world tasks and multiple state-of-the-art VLMs, SUDO achieves a stark attack success rate of 24% (with no refinement), and up to 41% (with iterative refinement) against Claude Computer Use. By revealing these vulnerabilities and demonstrating the ease with which they can be exploited in real-world computing environments, this paper highlights an immediate need for robust, context-aware safeguards. WARNING: This paper includes harmful or offensive model outputs. Our code is available at: https://github.com/AIM-Intelligence/SUDO.git

Updated: 2025-04-04 04:36:20

Categories: cs.CL,cs.AI,cs.CR

Download: http://arxiv.org/abs/2503.20279v2

Accelerating Particle-based Energetic Variational Inference

In this work, we propose a novel particle-based variational inference (ParVI) method that accelerates the EVI-Im. Inspired by energy quadratization (EQ) and operator splitting techniques for gradient flows, our approach efficiently drives particles towards the target distribution. Unlike EVI-Im, which employs the implicit Euler method to solve variational-preserving particle dynamics for minimizing the KL divergence, derived using a "discretize-then-variational" approach, the proposed algorithm avoids repeated evaluation of inter-particle interaction terms, significantly reducing computational cost. The framework is also extensible to other gradient-based sampling techniques. Through several numerical experiments, we demonstrate that our method outperforms existing ParVI approaches in efficiency, robustness, and accuracy.
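The abstract does not reproduce the particle update itself. For readers unfamiliar with ParVI, the sketch below implements a classic baseline of the family (Stein variational gradient descent, not the paper's EVI-Im accelerator) to show the attraction/repulsion structure that particle-based variational methods share; the kernel bandwidth, step size, and target density are illustrative choices.

```python
import numpy as np

def svgd_step(x, grad_logp, h=0.5, eps=0.05):
    """One SVGD update for 1-D particles x, targeting density p."""
    diff = x[:, None] - x[None, :]              # diff[i, j] = x_i - x_j
    k = np.exp(-diff ** 2 / h)                  # RBF kernel matrix
    attract = k @ grad_logp(x)                  # kernel-smoothed gradient: pulls toward high density
    repulse = (2.0 * diff / h * k).sum(axis=1)  # kernel gradient: pushes particles apart
    return x + eps * (attract + repulse) / len(x)

# drive particles from N(5, 0.5^2) toward a standard normal target
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=0.5, size=50)
for _ in range(500):
    x = svgd_step(x, grad_logp=lambda z: -z)    # for N(0,1), grad log p(z) = -z
```

After the loop, the empirical mean and spread of `x` approximate the target's; accelerated schemes like the one proposed here aim to reach this regime while avoiding repeated evaluation of the pairwise interaction terms.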

Updated: 2025-04-04 04:31:19

Categories: stat.ML,cs.LG,62G05, 65K10, 65L05

Download: http://arxiv.org/abs/2504.03158v1

A Structure-Preserving Kernel Method for Learning Hamiltonian Systems

A structure-preserving kernel ridge regression method is presented that allows the recovery of nonlinear Hamiltonian functions out of datasets made of noisy observations of Hamiltonian vector fields. The method proposes a closed-form solution that yields excellent numerical performance, surpassing other techniques proposed in the literature in this setup. From the methodological point of view, the paper extends kernel regression methods to problems in which loss functions involving linear functions of gradients are required and, in particular, a differential reproducing property and a Representer Theorem are proved in this context. The relation between the structure-preserving kernel estimator and the Gaussian posterior mean estimator is analyzed. A full error analysis is conducted that provides convergence rates using fixed and adaptive regularization parameters. The good performance of the proposed estimator, together with the convergence rates, is illustrated with various numerical experiments.

Updated: 2025-04-04 04:28:27

Categories: stat.ML,cs.LG,math.DS

Download: http://arxiv.org/abs/2403.10070v2

MORAL: A Multimodal Reinforcement Learning Framework for Decision Making in Autonomous Laboratories

We propose MORAL (a multimodal reinforcement learning framework for decision making in autonomous laboratories) that enhances sequential decision-making in autonomous robotic laboratories through the integration of visual and textual inputs. Using the BridgeData V2 dataset, we generate fine-tuned image captions with a pretrained BLIP-2 vision-language model and combine them with visual features through an early fusion strategy. The fused representations are processed using Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) agents. Experimental results demonstrate that multimodal agents achieve a 20% improvement in task completion rates and significantly outperform visual-only and textual-only baselines after sufficient training. Compared to transformer-based and recurrent multimodal RL models, our approach achieves superior performance in cumulative reward and caption quality metrics (BLEU, METEOR, ROUGE-L). These results highlight the impact of semantically aligned language cues in enhancing agent learning efficiency and generalization. The proposed framework contributes to the advancement of multimodal reinforcement learning and embodied AI systems in dynamic, real-world environments.
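The early-fusion step described above can be sketched as follows. The feature dimensions, the toy Q-network, and all weights are illustrative assumptions, not the paper's architecture; in MORAL the caption comes from a BLIP-2 model and the agent is a full DQN/PPO learner.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-ins for the two modalities (dimensions are assumptions)
visual_feat = rng.normal(size=512)   # visual features of the current observation
caption_emb = rng.normal(size=256)   # embedding of a generated image caption

# early fusion: concatenate modalities *before* the policy/value network
fused = np.concatenate([visual_feat, caption_emb])          # shape (768,)

# toy Q-network head: fused input -> one Q-value per discrete action
W1 = rng.normal(scale=0.05, size=(128, 768)); b1 = np.zeros(128)
W2 = rng.normal(scale=0.05, size=(4, 128));   b2 = np.zeros(4)
hidden = np.maximum(0.0, W1 @ fused + b1)                   # ReLU layer
q_values = W2 @ hidden + b2                                 # one value per action
action = int(np.argmax(q_values))                           # greedy action
```

The design point is simply that fusion happens before the agent's network, so the Q-function conditions jointly on pixels and language.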

Updated: 2025-04-04 04:15:52

Categories: cs.LG

Download: http://arxiv.org/abs/2504.03153v1

New User Event Prediction Through the Lens of Causal Inference

Modeling and analysis for event series generated by users of heterogeneous behavioral patterns are closely involved in our daily lives, including credit card fraud detection, online platform user recommendation, and social network analysis. The most commonly adopted approach to this task is to assign users to behavior-based categories and analyze each of them separately. However, this requires extensive data to fully understand the user behavior, presenting challenges in modeling newcomers without significant historical knowledge. In this work, we propose a novel discrete event prediction framework for new users with limited history, without needing to know the user's category. We treat the user event history as the "treatment" for future events and the user category as the key confounder. Thus, the prediction problem can be framed as counterfactual outcome estimation, where each event is re-weighted by its inverse propensity score. We demonstrate the improved performance of the proposed framework with a numerical simulation study and two real-world applications, including Netflix rating prediction and seller contact prediction for customer support at Amazon.
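The re-weighting step at the heart of this framework is standard inverse propensity weighting; a minimal sketch (with made-up propensities, and clipping added as a common stabilizer, not necessarily the paper's choice) looks like:

```python
import numpy as np

def ipw_weights(propensity, treated, clip=1e-3):
    """Inverse-propensity weights: 1/e(x) for treated, 1/(1-e(x)) for control."""
    p = np.clip(propensity, clip, 1.0 - clip)   # avoid exploding weights
    return np.where(treated == 1, 1.0 / p, 1.0 / (1.0 - p))

# toy example: four events with estimated propensities of their observed history
propensity = np.array([0.8, 0.8, 0.2, 0.2])
treated    = np.array([1,   0,   1,   0])
w = ipw_weights(propensity, treated)   # -> [1.25, 5.0, 5.0, 1.25]
# during training, each event's loss term would be multiplied by its weight w
```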

Updated: 2025-04-04 04:07:39

Categories: stat.ME,cs.LG

Download: http://arxiv.org/abs/2407.05625v3

Safe Screening Rules for Group OWL Models

Group Ordered Weighted $L_{1}$-Norm (Group OWL) regularized models have emerged as a useful procedure for high-dimensional sparse multi-task learning with correlated features. Proximal gradient methods are used as standard approaches to solving Group OWL models. However, Group OWL models usually suffer huge computational costs and memory usage when the feature size is large in the high-dimensional scenario. To address this challenge, in this paper, we are the first to propose the safe screening rule for Group OWL models by effectively tackling the structured non-separable penalty, which can quickly identify the inactive features that have zero coefficients across all the tasks. Thus, by removing the inactive features during the training process, we may achieve substantial computational gain and memory savings. More importantly, the proposed screening rule can be directly integrated with the existing solvers both in the batch and stochastic settings. Theoretically, we prove our screening rule is safe and also can be safely applied to the existing iterative optimization algorithms. Our experimental results demonstrate that our screening rule can effectively identify the inactive features and leads to a significant computational speedup without any loss of accuracy.

Updated: 2025-04-04 04:07:37

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2504.03152v1

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Reasoning is central to human intelligence, enabling structured problem-solving across diverse tasks. Recent advances in large language models (LLMs) have greatly enhanced their reasoning abilities in arithmetic, commonsense, and symbolic domains. However, effectively extending these capabilities into multimodal contexts, where models must integrate both visual and textual inputs, continues to be a significant challenge. Multimodal reasoning introduces complexities, such as handling conflicting information across modalities, which require models to adopt advanced interpretative strategies. Addressing these challenges involves not only sophisticated algorithms but also robust methodologies for evaluating reasoning accuracy and coherence. This paper offers a concise yet insightful overview of reasoning techniques in both textual and multimodal LLMs. Through a thorough and up-to-date comparison, we clearly formulate core reasoning challenges and opportunities, highlighting practical methods for post-training optimization and test-time inference. Our work provides valuable insights and guidance, bridging theoretical frameworks and practical implementations, and sets clear directions for future research.

Updated: 2025-04-04 04:04:56

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2504.03151v1

CSF: Fixed-outline Floorplanning Based on the Conjugate Subgradient Algorithm Assisted by Q-Learning

To perform the fixed-outline floorplanning problem efficiently, we propose to solve the original nonsmooth analytic optimization model via the conjugate subgradient algorithm (CSA), which is further accelerated by adaptively regulating the step size with the assistance of Q-learning. The objective for global floorplanning is a weighted sum of the half-perimeter wirelength, the overlapping area, and the out-of-bound width, and the legalization is implemented by optimizing the weighted sum of the overlapping area and the out-of-bound width. Meanwhile, we also propose two improved variants of the legalization algorithm based on constraint graphs (CGs). Experimental results demonstrate that the CSA assisted by Q-learning (CSAQ) can address both global floorplanning and legalization efficiently, and the two stages jointly contribute to competitive results on the optimization of wirelength. Meanwhile, the improved CG-based legalization methods also outperform the original one in terms of runtime and success rate.

Updated: 2025-04-04 04:01:26

Categories: cs.LG

Download: http://arxiv.org/abs/2504.03796v1

A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations

Recent developments in Artificial Intelligence (AI) and Machine Learning (ML) are creating new opportunities for Human-Autonomy Teaming (HAT) in tasks, missions, and continuous coordinated activities. A major challenge is enabling humans to maintain awareness and control over autonomous assets, while also building trust and supporting shared contextual understanding. To address this, we present a real-time Human Digital Twin (HDT) architecture that integrates Large Language Models (LLMs) for knowledge reporting, answering, and recommendation, embodied in a visual interface. The system applies a metacognitive approach to enable personalized, context-aware responses aligned with the human teammate's expectations. The HDT acts as a visually and behaviorally realistic team member, integrated throughout the mission lifecycle, from training to deployment to after-action review. Our architecture includes speech recognition, context processing, AI-driven dialogue, emotion modeling, lip-syncing, and multimodal feedback. We describe the system design, performance metrics, and future development directions for more adaptive and realistic HAT systems.

Updated: 2025-04-04 03:56:26

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2504.03147v1

Comparative Analysis of Unsupervised and Supervised Autoencoders for Nuclei Classification in Clear Cell Renal Cell Carcinoma Images

This study explores the application of supervised and unsupervised autoencoders (AEs) to automate nuclei classification in clear cell renal cell carcinoma (ccRCC) images, a diagnostic task traditionally reliant on subjective visual grading by pathologists. We evaluate various AE architectures, including standard AEs, contractive AEs (CAEs), and discriminative AEs (DAEs), as well as a classifier-based discriminative AE (CDAE), optimized using the hyperparameter tuning tool Optuna. Bhattacharyya distance is selected from several metrics to assess class separability in the latent space, revealing challenges in distinguishing adjacent grades using unsupervised models. CDAE, integrating a supervised classifier branch, demonstrated superior performance in both latent space separation and classification accuracy. Given that CDAE-CNN achieved notable improvements in classification metrics, affirming the value of supervised learning for class-specific feature extraction, F1 score was incorporated into the tuning process to optimize classification performance. Results show significant improvements in identifying aggressive ccRCC grades by leveraging the classification capability of AE through latent clustering followed by fine-grained classification. Our model outperforms the current state of the art, CHR-Network, across all evaluated metrics. These findings suggest that integrating a classifier branch in AEs, combined with neural architecture search and contrastive learning, enhances grading automation in ccRCC pathology, particularly in detecting aggressive tumor grades, and may improve diagnostic accuracy.
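The Bhattacharyya distance used here to score latent-space class separability has a closed form when each class cluster is summarized as a Gaussian; a small sketch (the Gaussian summary is an assumption of this illustration, not a detail from the paper):

```python
import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.atleast_2d(cov1), np.atleast_2d(cov2)
    cov = 0.5 * (cov1 + cov2)                      # average covariance
    diff = mu1 - mu2
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    term_cov = 0.5 * np.log(np.linalg.det(cov) /
                            np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return float(term_mean + term_cov)

# identical clusters -> distance 0; separated means -> positive distance
d_same = bhattacharyya_gaussian([0, 0], np.eye(2), [0, 0], np.eye(2))
d_apart = bhattacharyya_gaussian([0, 0], np.eye(2), [1, 0], np.eye(2))  # 0.125
```

Adjacent tumor grades with overlapping latent clusters yield small distances, which is exactly the difficulty with unsupervised models that the study reports.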

Updated: 2025-04-04 03:52:32

Categories: eess.IV,cs.CV,cs.LG,I.2.10; I.4.9

Download: http://arxiv.org/abs/2504.03146v1

One-Shot Heterogeneous Federated Learning with Local Model-Guided Diffusion Models

In recent years, One-shot Federated Learning methods based on Diffusion Models have garnered increasing attention due to their remarkable performance. However, most of these methods require the deployment of foundation models on client devices, which significantly raises the computational requirements and reduces their adaptability to heterogeneous client models compared to traditional FL methods. In this paper, we propose FedLMG, a heterogeneous one-shot Federated learning method with Local Model-Guided diffusion models. Briefly speaking, in FedLMG, clients do not need access to any foundation models but only train and upload their local models, which is consistent with traditional FL methods. On the clients, we employ classification loss and BN loss to capture the broad category features and detailed contextual features of the client distributions. On the server, based on the uploaded client models, we utilize backpropagation to guide the server's DM in generating synthetic datasets that comply with the client distributions, which are then used to train the aggregated model. By using the locally trained client models as a medium to transfer client knowledge, our method significantly reduces the computational requirements on client devices and effectively adapts to scenarios with heterogeneous clients. Extensive quantitative and visualization experiments on three large-scale real-world datasets, along with theoretical analysis, demonstrate that the synthetic datasets generated by FedLMG exhibit comparable quality and diversity to the client datasets, which leads to an aggregated model that outperforms all compared methods and even the performance ceiling, further elucidating the significant potential of utilizing DMs in FL.

Updated: 2025-04-04 03:46:28

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2311.08870v3

Entropy-Based Block Pruning for Efficient Large Language Models

As large language models continue to scale, their growing computational and storage demands pose significant challenges for real-world deployment. In this work, we investigate redundancy within Transformer-based models and propose an entropy-based pruning strategy to enhance efficiency while maintaining performance. Empirical analysis reveals that the entropy of hidden representations decreases in the early blocks but progressively increases across most subsequent blocks. This trend suggests that entropy serves as a more effective measure of information richness within computation blocks. Unlike cosine similarity, which primarily captures geometric relationships, entropy directly quantifies uncertainty and information content, making it a more reliable criterion for pruning. Extensive experiments demonstrate that our entropy-based pruning approach surpasses cosine similarity-based methods in reducing model size while preserving accuracy, offering a promising direction for efficient model deployment.
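The abstract does not specify the entropy estimator or the pruning rule in detail; one simple reading, sketched below with a histogram-based Shannon entropy over a fixed value range, scores each block's output representation and marks the lowest-entropy blocks for pruning. The toy "block outputs", the bin count, and the value range are all illustrative assumptions.

```python
import numpy as np

def representation_entropy(h, bins=32, value_range=(-10.0, 10.0)):
    """Histogram estimate of the Shannon entropy of a hidden representation."""
    counts, _ = np.histogram(h, bins=bins, range=value_range)
    p = counts / counts.sum()
    p = p[p > 0]                       # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log(p)).sum())

def blocks_to_prune(block_outputs, n_prune):
    """Rank blocks by output entropy; return indices of the least informative."""
    scores = [representation_entropy(h) for h in block_outputs]
    return sorted(np.argsort(scores)[:n_prune].tolist())

rng = np.random.default_rng(0)
# toy stand-in: widening output spread plays the role of rising entropy
block_outputs = [rng.normal(scale=s, size=4096) for s in (0.1, 1.0, 2.0, 4.0)]
pruned = blocks_to_prune(block_outputs, n_prune=1)   # the narrowest block wins
```

Unlike a cosine-similarity score between consecutive blocks, this criterion looks at the information content of each representation directly, which is the distinction the paper draws.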

Updated: 2025-04-04 03:42:34

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2504.03794v1

Outlook Towards Deployable Continual Learning for Particle Accelerators

Particle accelerators are high-power, complex machines. To ensure uninterrupted operation of these machines, thousands of pieces of equipment need to be synchronized, which requires addressing many challenges including design, optimization and control, anomaly detection, and machine protection. With recent advancements, Machine Learning (ML) holds promise to assist in more advanced prognostics, optimization, and control. While ML-based solutions have been developed for several applications in particle accelerators, only a few have reached deployment, and even fewer long-term usage, due to particle accelerator data distribution drifts caused by changes in both measurable and non-measurable parameters. In this paper, we identify some of the key areas within particle accelerators where continual learning can allow maintenance of ML model performance under distribution drifts. In particular, we first discuss existing applications of ML in particle accelerators and their limitations due to distribution drift. Next, we review existing continual learning techniques and investigate their potential applications to address data distribution drifts in accelerators. By identifying the opportunities and challenges in applying continual learning, this paper seeks to open up this new field and inspire more research efforts towards deployable continual learning for particle accelerators.

Updated: 2025-04-04 03:34:39

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2504.03793v1

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level

In this paper, we introduce Motion-Grounded Video Reasoning, a new motion understanding task that requires generating visual answers (video segmentation masks) according to the input question, and hence needs implicit spatiotemporal reasoning and grounding. This task extends existing spatiotemporal grounding work, which focuses on explicit action/motion grounding, to a more general format by enabling implicit reasoning via questions. To facilitate the development of the new task, we collect a large-scale dataset called GROUNDMORE, which comprises 1,715 video clips and 249K object masks deliberately designed with 4 question types (Causal, Sequential, Counterfactual, and Descriptive) for benchmarking deep and comprehensive motion reasoning abilities. GROUNDMORE uniquely requires models to generate visual answers, providing a more concrete and visually interpretable response than plain text. It evaluates models on both spatiotemporal grounding and reasoning, fostering progress on complex challenges in motion-related video reasoning, temporal perception, and pixel-level understanding. Furthermore, we introduce a novel baseline model named Motion-Grounded Video Reasoning Assistant (MORA). MORA incorporates the multimodal reasoning ability of a Multimodal LLM, the pixel-level perception capability of a grounding model (SAM), and the temporal perception ability of a lightweight localization head. MORA achieves respectable performance on GROUNDMORE, outperforming the best existing visual grounding baseline model by a relative average of 21.5%. We hope this novel and challenging task will pave the way for future advancements in robust and general motion understanding via video reasoning segmentation.

Updated: 2025-04-04 03:20:03

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.09921v2

LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph

Large Language Models (LLMs) have impressive capabilities in text understanding and zero-shot reasoning. However, delays in knowledge updates may cause them to reason incorrectly or produce harmful results. Knowledge Graphs (KGs) provide rich and reliable contextual information for the reasoning process of LLMs by structurally organizing and connecting a wide range of entities and relations. Existing KG-based LLM reasoning methods only inject KGs' knowledge into prompts in textual form, ignoring their structural information. Moreover, they mostly rely on closed-source models or open-source models with large parameter counts, which leads to high resource consumption. To address this, we propose a novel Lightweight and efficient Prompt learning-ReasOning Framework for KGQA (LightPROF), which leverages the full potential of LLMs to tackle complex reasoning tasks in a parameter-efficient manner. Specifically, LightPROF follows a "Retrieve-Embed-Reason" process: it first accurately and stably retrieves the corresponding reasoning graph from the KG through a retrieval module. Next, through a Transformer-based Knowledge Adapter, it finely extracts and integrates factual and structural information from the KG, then maps this information to the LLM's token embedding space, creating an LLM-friendly prompt to be used by the LLM for the final reasoning. Additionally, LightPROF only requires training the Knowledge Adapter and is compatible with any open-source LLM. Extensive experiments on two public KGQA benchmarks demonstrate that LightPROF achieves superior performance with small-scale LLMs. Furthermore, LightPROF shows significant advantages in terms of input token count and reasoning time.

Updated: 2025-04-04 03:03:47

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2504.03137v1

Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion

Medical Visual Question Answering (Med-VQA) answers clinical questions using medical images, aiding diagnosis. Designing the MedVQA system holds profound importance in assisting clinical diagnosis and enhancing diagnostic accuracy. Building upon this foundation, Hierarchical Medical VQA extends Medical VQA by organizing medical questions into a hierarchical structure and making level-specific predictions to handle fine-grained distinctions. Recently, many studies have proposed hierarchical MedVQA tasks and established datasets. However, several issues remain: (1) imperfect hierarchical modeling leads to poor differentiation between question levels, causing semantic fragmentation across hierarchies; (2) excessive reliance on implicit learning in Transformer-based cross-modal self-attention fusion methods obscures crucial local semantic correlations in medical scenarios. To address these issues, this study proposes HiCA-VQA, a method comprising two modules: Hierarchical Prompting for fine-grained medical questions and Hierarchical Answer Decoders. The hierarchical prompting module pre-aligns hierarchical text prompts with image features to guide the model in focusing on specific image regions according to question types, while the hierarchical decoder performs separate predictions for questions at different levels to improve accuracy across granularities. The framework also incorporates a cross-attention fusion module where images serve as queries and text as key-value pairs. Experiments on the Rad-Restruct benchmark demonstrate that the HiCA-VQA framework outperforms existing state-of-the-art methods in answering hierarchical fine-grained questions. This study provides an effective pathway for hierarchical visual question answering systems, advancing medical image understanding.
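The fusion described in the last step, image tokens attending over text tokens, reduces to standard single-head cross-attention with the image as query and the text as key/value. A minimal sketch (token counts, dimensions, and random weights are illustrative, not the paper's configuration):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_tokens, txt_tokens, Wq, Wk, Wv):
    """Image tokens (queries) attend over text tokens (keys/values)."""
    q = img_tokens @ Wq                       # (n_img, d)
    k = txt_tokens @ Wk                       # (n_txt, d)
    v = txt_tokens @ Wv                       # (n_txt, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product
    attn = softmax(scores, axis=-1)           # per image token: weights over text
    return attn @ v                           # text-conditioned image features

rng = np.random.default_rng(0)
d = 32
img = rng.normal(size=(49, d))                # e.g. a 7x7 grid of image patches
txt = rng.normal(size=(12, d))                # e.g. 12 question tokens
Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
fused = cross_attention(img, txt, Wq, Wk, Wv)  # one fused vector per image token
```

Making the image the query side keeps one output vector per image region, so the question explicitly modulates local image features rather than being mixed in implicitly by self-attention.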

Updated: 2025-04-04 03:03:12

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2504.03135v1

Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks

Sinusoidal neural networks have been shown effective as implicit neural representations (INRs) of low-dimensional signals, due to their smoothness and high representation capacity. However, initializing and training them remain empirical tasks that lack a deeper understanding to guide the learning process. To fill this gap, our work introduces a theoretical framework that explains the capacity property of sinusoidal networks and offers robust control mechanisms for initialization and training. Our analysis is based on a novel amplitude-phase expansion of the sinusoidal multilayer perceptron, showing how its layer compositions produce a large number of new frequencies expressed as integer combinations of the input frequencies. This relationship can be directly used to initialize the input neurons, as a form of spectral sampling, and to bound the network's spectrum while training. Our method, referred to as TUNER (TUNing sinusoidal nEtwoRks), greatly improves the stability and convergence of sinusoidal INR training, leading to detailed reconstructions, while preventing overfitting.
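The integer-combination claim is easy to verify numerically in one dimension: composing two sine layers spreads the energy of a single input frequency ω onto integer multiples of ω. The frequencies, amplitude, and phase below are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

omega = 3.0            # input-layer frequency
a, phase = 2.0, 0.4    # second-layer amplitude and phase
n = 2048
t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)

# a two-layer "sinusoidal network" on scalar input t
hidden = np.sin(omega * t)
y = np.sin(a * hidden + phase)

# spectrum: energy appears only at integer multiples of omega (0, 3, 6, 9, ...)
spec = np.abs(np.fft.rfft(y)) / n
peaks = np.nonzero(spec > 1e-3)[0]   # frequency bins carrying real energy
```

By the Jacobi-Anger expansion, sin(a·sin(ωt) + φ) contains only harmonics kω; this is the one-dimensional case of the integer-combination structure that the paper's amplitude-phase expansion generalizes and exploits for initialization and spectral control.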

Updated: 2025-04-04 02:54:01

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2407.21121v3

Steering Large Agent Populations using Mean-Field Schrodinger Bridges with Gaussian Mixture Models

The Mean-Field Schrodinger Bridge (MFSB) problem is an optimization problem aiming to find the minimum-effort control policy to drive a McKean-Vlasov stochastic differential equation from one probability measure to another. In the context of multiagent control, the objective is to control the configuration of a swarm of identical, interacting cooperative agents, as captured by the time-varying probability measure of their state. Available methods for solving this problem for distributions with continuous support rely either on spatial discretizations of the problem's domain or on approximating optimal solutions using neural networks trained through stochastic optimization schemes. For agents following Linear Time-Varying dynamics, and for Gaussian Mixture Model boundary distributions, we propose a highly efficient parameterization to approximate the solutions of the corresponding MFSB in closed form, without any learning steps. Our proposed approach consists of a mixture of elementary policies, each solving a Gaussian-to-Gaussian Covariance Steering problem from the components of the initial to the components of the terminal mixture. Leveraging the semidefinite formulation of the Covariance Steering problem, our proposed solver can handle probabilistic hard constraints on the system's state, while maintaining numerical tractability. We illustrate our approach on a variety of numerical examples.

Updated: 2025-04-04 02:53:28

标题: 使用均场薛定谔桥和高斯混合模型引导大型代理群体

摘要: Mean-Field Schrodinger Bridge(MFSB)问题是一个优化问题,旨在找到驱动McKean-Vlasov随机微分方程从一个概率测度到另一个概率测度的最小控制策略。在多智能体控制的背景下,目标是控制一群相互作用的合作智能体的配置,如它们状态的时变概率测度所捕捉的那样。解决具有连续支持的分布的这个问题的可用方法依赖于问题域的空间离散化或通过训练神经网络来近似最优解,这些神经网络是通过随机优化方案训练的。对于遵循线性时变动力学的智能体,以及高斯混合模型边界分布,我们提出了一种高效的参数化方法来以封闭形式近似相应MFSB的解,而无需任何学习步骤。我们提出的方法由一系列基本策略的混合组成,每个策略解决从初始混合的分量到终端混合的分量的高斯到高斯协方差控制问题。利用协方差控制问题的半定式表述,我们提出的求解器可以处理系统状态上的概率硬约束,同时保持数值可处理性。我们通过各种数值例子展示了我们的方法。

更新时间: 2025-04-04 02:53:28

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2503.23705v2

DP-LET: An Efficient Spatio-Temporal Network Traffic Prediction Framework

Accurately predicting spatio-temporal network traffic is essential for dynamically managing computing resources in modern communication systems and minimizing energy consumption. Although spatio-temporal traffic prediction has received extensive research attention, further improvements in prediction accuracy and computational efficiency remain necessary. In particular, existing decomposition-based methods or hybrid architectures often incur heavy overhead when capturing local and global feature correlations, necessitating novel approaches that optimize accuracy and complexity. In this paper, we propose an efficient spatio-temporal network traffic prediction framework, DP-LET, which consists of a data processing module, a local feature enhancement module, and a Transformer-based prediction module. The data processing module is designed for high-efficiency denoising of network data and spatial decoupling. The local feature enhancement module leverages multiple Temporal Convolutional Networks (TCNs) to capture fine-grained local features. Meanwhile, the prediction module utilizes a Transformer encoder to model long-term dependencies and assess feature relevance. A case study on real-world cellular traffic prediction demonstrates the practicality of DP-LET, which maintains low computational complexity while achieving state-of-the-art performance, significantly reducing MSE by 31.8% and MAE by 23.1% compared to baseline models.

Updated: 2025-04-04 02:52:43

标题: DP-LET:一种高效的时空网络流量预测框架

摘要: 准确预测时空网络流量对于动态管理现代通信系统中的计算资源和最小化能耗至关重要。尽管时空流量预测已经得到广泛研究,但仍需要进一步提高预测准确性和计算效率。特别是,现有的基于分解的方法或混合架构在捕获局部和全局特征相关性时往往会产生重大开销,因此需要优化准确性和复杂度的新方法。本文提出了一种高效的时空网络流量预测框架DP-LET,包括数据处理模块、局部特征增强模块和基于Transformer的预测模块。数据处理模块设计用于高效去噪网络数据和空间解耦。相反,局部特征增强模块利用多个时间卷积网络(TCNs)来捕获精细的局部特征。同时,预测模块利用Transformer编码器来建模长期依赖关系和评估特征相关性。对真实世界的蜂窝流量预测案例研究显示了DP-LET的实用性,该框架保持低计算复杂度的同时实现了最先进的性能,将均方误差降低了31.8%,平均绝对误差降低了23.1%,与基线模型相比。

更新时间: 2025-04-04 02:52:43

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.03792v1

Memorization in In-Context Learning

In-context learning (ICL) has proven to be an effective strategy for improving the performance of large language models (LLMs) with no additional training. However, the exact mechanism behind this performance improvement remains unclear. This study is the first to show how ICL surfaces memorized training data and to explore the correlation between this memorization and performance on downstream tasks across various ICL regimes: zero-shot, few-shot, and many-shot. Our most notable findings include: (1) ICL significantly surfaces memorization compared to zero-shot learning in most cases; (2) demonstrations, without their labels, are the most effective element in surfacing memorization; (3) ICL improves performance when the surfaced memorization in few-shot regimes reaches a high level (about 40%); and (4) there is a very strong correlation between performance and memorization in ICL when it outperforms zero-shot learning. Overall, our study uncovers memorization as a new factor impacting ICL, raising an important question: to what extent do LLMs truly generalize from demonstrations in ICL, and how much of their success is due to memorization?

Updated: 2025-04-04 02:50:14

标题: 在上下文学习中的记忆化

摘要: 上下文学习(ICL)已被证明是一种有效的策略,可以提高大型语言模型(LLMs)的性能,而无需额外训练。然而,这种性能改进背后的确切机制仍不清楚。本研究首次展示了ICL如何呈现出被记忆的训练数据,并探讨了在各种ICL制度下这种记忆与下游任务表现之间的相关性:零次、少次和多次。我们最显著的发现包括:(1)在大多数情况下,ICL相对于零次学习显著呈现出记忆;(2)没有标签的演示是呈现记忆最有效的元素;(3)当少次制度中呈现的记忆达到较高水平(约40%)时,ICL可以提高性能;以及(4)在ICL表现优于零次学习时,性能与记忆之间存在非常强的相关性。总的来说,我们的研究揭示了记忆作为影响ICL的一个新因素,引发了一个重要问题:LLMs在ICL中真正从演示中推广到何种程度,他们的成功多大程度上归因于记忆?

更新时间: 2025-04-04 02:50:14

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2408.11546v3

GraphSeg: Segmented 3D Representations via Graph Edge Addition and Contraction

Robots operating in unstructured environments often require accurate and consistent object-level representations. This typically requires segmenting individual objects from the robot's surroundings. While recent large models such as Segment Anything (SAM) offer strong performance in 2D image segmentation, these advances do not translate directly to the physical 3D world, where they often over-segment objects and fail to produce consistent mask correspondences across views. In this paper, we present GraphSeg, a framework for generating consistent 3D object segmentations from a sparse set of 2D images of the environment without any depth information. GraphSeg adds edges to graphs and constructs dual correspondence graphs: one from 2D pixel-level similarities and one from inferred 3D structure. We formulate segmentation as a problem of edge addition, then subsequent graph contraction, which merges multiple 2D masks into unified object-level segmentations. We can then leverage \emph{3D foundation models} to produce segmented 3D representations. GraphSeg achieves robust segmentation with significantly fewer images and greater accuracy than prior methods. We demonstrate state-of-the-art performance on tabletop scenes and show that GraphSeg enables improved performance on downstream robotic manipulation tasks. Code available at https://github.com/tomtang502/graphseg.git.
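The edge-addition-then-contraction formulation can be illustrated with a toy sketch (ours; the mask indices, similarity scores, and threshold below are hypothetical, and the real system derives correspondences from both 2D similarity and inferred 3D structure): each 2D mask is a node, edges join corresponding masks, and contracting connected components yields one object ID per component.

```python
# Illustrative sketch (not the GraphSeg implementation): treat each 2D mask
# from each view as a graph node, add an edge whenever two masks are deemed
# to correspond (here: a user-supplied similarity score), then "contract"
# the graph by merging connected components into one object ID per component.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def contract_masks(num_masks, similarities, threshold=0.5):
    """similarities: dict {(i, j): score}; returns a mask -> object-ID list."""
    uf = UnionFind(num_masks)
    for (i, j), score in similarities.items():   # edge-addition step
        if score >= threshold:
            uf.union(i, j)
    roots, labels = {}, []
    for m in range(num_masks):                    # contraction step
        r = uf.find(m)
        labels.append(roots.setdefault(r, len(roots)))
    return labels

# Five masks across two views; masks 0/2/4 show one object, masks 1/3 another.
sims = {(0, 2): 0.9, (2, 4): 0.8, (1, 3): 0.7, (0, 1): 0.1}
print(contract_masks(5, sims))  # [0, 1, 0, 1, 0]
```

The low-similarity pair (0, 1) never gets an edge, so the two objects stay separate after contraction.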

Updated: 2025-04-04 02:42:45

标题: GraphSeg:通过图边增加和收缩实现分段3D表示

摘要: 在非结构化环境中运行的机器人通常需要准确和一致的对象级表示。这通常需要将单个对象从机器人周围的环境中分割出来。尽管最近的大型模型如Segment Anything (SAM)在2D图像分割中表现出色,但这些进展并不能直接转化为在物理3D世界中的表现:它们经常过度分割对象,且无法在不同视角下产生一致的掩模对应关系。在本文中,我们提出了GraphSeg,这是一个从环境的稀疏2D图像中生成一致的3D对象分割的框架,而不需要任何深度信息。GraphSeg向图中添加边并构建双重对应关系图:一个从2D像素级相似性中获得,另一个从推断的3D结构中获得。我们将分割问题形式化为边添加问题,然后进行后续的图收缩,将多个2D掩模合并为统一的对象级分割。然后我们可以利用\emph{3D基础模型}来生成分割的3D表示。GraphSeg以比以前方法更少的图像实现了强大的分割,并具有更高的准确性。我们在桌面场景上展示了最先进的性能,并展示了GraphSeg在下游机器人操作任务中实现了改进的性能。代码可在https://github.com/tomtang502/graphseg.git找到。

更新时间: 2025-04-04 02:42:45

领域: cs.RO,cs.AI,cs.CV,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2504.03129v1

From Observation to Orientation: an Adaptive Integer Programming Approach to Intervention Design

Using both observational and experimental data, a causal discovery process can identify the causal relationships between variables. A unique adaptive intervention design paradigm is presented in this work, where causal directed acyclic graphs (DAGs) are effectively recovered under practical budgetary considerations. In order to choose treatments that optimize information gain under these considerations, an iterative integer programming (IP) approach is proposed, which drastically reduces the number of experiments required. Simulations over a broad range of graph sizes and edge densities are used to assess the effectiveness of the suggested approach. Results show that the proposed adaptive IP approach achieves full causal graph recovery with fewer intervention iterations and variable manipulations than random intervention baselines, and it is also flexible enough to accommodate a variety of practical constraints.

Updated: 2025-04-04 02:35:35

标题: 从观察到定向:一种自适应整数规划方法用于干预设计

摘要: 使用观察和实验数据,因果发现过程可以确定变量之间的因果关系。本文介绍了一种独特的自适应干预设计范式,其中因果有向无环图(DAGs)通过实际预算考虑得到有效恢复。为了在这些考虑下选择优化信息增益的治疗方案,提出了一种迭代整数规划(IP)方法,该方法显著减少了所需实验的数量。通过对各种图的大小和边密度进行模拟,评估了建议方法的有效性。结果表明,所提出的自适应IP方法在比随机干预基线更少的干预迭代和变量操作下实现了完整的因果图恢复,并且还足够灵活以适应各种实际约束条件。

更新时间: 2025-04-04 02:35:35

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2504.03122v1

Variational Search Distributions

We develop VSD, a method for conditioning a generative model of discrete, combinatorial designs on a rare desired class by efficiently evaluating a black-box (e.g. experiment, simulation) in a batch sequential manner. We call this task active generation; we formalize active generation's requirements and desiderata, and formulate a solution via variational inference. VSD uses off-the-shelf gradient based optimization routines, can learn powerful generative models for desirable designs, and can take advantage of scalable predictive models. We derive asymptotic convergence rates for learning the true conditional generative distribution of designs with certain configurations of our method. After illustrating the generative model on images, we empirically demonstrate that VSD can outperform existing baseline methods on a set of real sequence-design problems in various protein and DNA/RNA engineering tasks.

Updated: 2025-04-04 02:32:31

标题: 变分搜索分布

摘要: 我们开发了VSD,这是一种通过以批量顺序方式高效评估黑盒(例如实验、模拟),将离散组合设计的生成模型条件化到稀有目标类别上的方法。我们称这种任务为主动生成;我们形式化了主动生成的需求和期望,并通过变分推断提出了解决方案。VSD使用现成的基于梯度的优化例程,可以为理想设计学习出强大的生成模型,并可以利用可扩展的预测模型。我们推导了学习具有我们方法特定配置的设计真实条件生成分布的渐近收敛速率。在展示了图像的生成模型后,我们在一系列真实序列设计问题中通过实证证明VSD可以在各种蛋白质和DNA/RNA工程任务中胜过现有基准方法。

更新时间: 2025-04-04 02:32:31

领域: stat.ML,cs.LG,G.3; G.2.1; I.2.6

下载: http://arxiv.org/abs/2409.06142v5

Graph Network Modeling Techniques for Visualizing Human Mobility Patterns

Human mobility analysis at urban scale requires models to represent the complex nature of human movements, which in turn are affected by accessibility to nearby points of interest, underlying socioeconomic factors of a place, and local transport choices for people living in a geographic region. In this work, we represent human mobility and the associated flow of movements as a graph. Graph-based approaches for mobility analysis are still in their early stages of adoption and are actively being researched. The challenges of graph-based mobility analysis are multifaceted: the lack of sufficiently high-quality data to represent flows at high spatial and temporal resolution, the limited computational resources available to translate large volumes of mobility data into a network structure, and the scaling issues inherent in graph models. The current study develops a methodology that embeds graphs into a continuous space, which alleviates issues related to fast graph matching, graph time-series modeling, and visualization of mobility dynamics. Through experiments, we demonstrate how mobility data collected from taxicab trajectories can be transformed into network structures and patterns of mobility flow changes, and can be used for downstream tasks, with an approximately 40% decrease in error on average for matched graphs versus unmatched ones.

Updated: 2025-04-04 02:21:44

标题: 图网络建模技术用于可视化人类移动模式

摘要: 城市尺度上的人类流动分析需要模型来表示人类移动的复杂性质,这又受到附近兴趣点的可达性、地点的潜在社会经济因素以及地理区域居民的本地交通选择的影响。在这项工作中,我们将人类流动及其相关的移动流量表示为一个图。基于图的移动性分析方法仍处于早期阶段,并正在积极研究中。基于图的移动性分析面临多方面挑战 - 缺乏足够高质量的数据来表示高空间和时间分辨率下的流动,同时计算资源有限,无法将大量的流动数据转换为网络结构,并且图模型固有的扩展问题等等。本研究通过将图嵌入连续空间的方法开发出一种方法论,这可以缓解与快速图匹配、图时间序列建模以及流动性动态可视化相关的问题。通过实验,我们展示了如何将出租车轨迹收集的流动数据转换为网络结构和流动流量变化模式,并可用于下游任务,报告匹配图与不匹配图相比平均误差减少约40%。

更新时间: 2025-04-04 02:21:44

领域: cs.SI,cs.AI,stat.ML

下载: http://arxiv.org/abs/2504.03119v1

NuWa: Deriving Lightweight Task-Specific Vision Transformers for Edge Devices

Vision Transformers (ViTs) excel in computer vision tasks but lack flexibility for edge devices' diverse needs. A vital issue is that ViTs pre-trained to cover a broad range of tasks are \textit{over-qualified} for edge devices that usually demand only part of a ViT's knowledge for specific tasks. Their task-specific accuracy on these edge devices is suboptimal. We discovered that small ViTs that focus on device-specific tasks can improve model accuracy and in the meantime, accelerate model inference. This paper presents NuWa, an approach that derives small ViTs from the base ViT for edge devices with specific task requirements. NuWa can transfer task-specific knowledge extracted from the base ViT into small ViTs that fully leverage constrained resources on edge devices to maximize model accuracy with inference latency assurance. Experiments with three base ViTs on three public datasets demonstrate that compared with state-of-the-art solutions, NuWa improves model accuracy by up to $\text{11.83}\%$ and accelerates model inference by 1.29$\times$ - 2.79$\times$. Code for reproduction is available at https://anonymous.4open.science/r/Task_Specific-3A5E.

Updated: 2025-04-04 02:19:01

标题: NuWa:为边缘设备推导轻量级任务特定视觉Transformer

摘要: Vision Transformers (ViTs)在计算机视觉任务中表现出色,但缺乏对边缘设备多样化需求的灵活性。一个关键问题是,ViTs经过预训练以涵盖广泛任务范围,对于通常仅需要ViT部分知识的边缘设备来说是“过度合格”的。它们在这些边缘设备上的特定任务准确性不佳。我们发现,专注于设备特定任务的小型ViTs可以提高模型准确性,并同时加速模型推理。本文提出了NuWa,一种从基础ViT派生出适用于特定任务要求的边缘设备的小型ViTs的方法。NuWa可以将从基础ViT中提取的特定任务知识转移至小型ViTs,充分利用边缘设备上的有限资源,以确保最大化模型准确性并保证推理延迟。在三个公共数据集上进行的实验表明,与最先进的解决方案相比,NuWa将模型准确性提高了最多11.83%,并将模型推理加速了1.29倍至2.79倍。可用于重现的代码位于https://anonymous.4open.science/r/Task_Specific-3A5E。

更新时间: 2025-04-04 02:19:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.03118v1

The Amenability Framework: Rethinking Causal Ordering Without Estimating Causal Effects

Who should we prioritize for intervention when we cannot estimate intervention effects? In many applied domains (e.g., advertising, customer retention, and behavioral nudging) prioritization is guided by predictive models that estimate outcome probabilities rather than causal effects. This paper investigates when these predictions (scores) can effectively rank individuals by their intervention effects, particularly when direct effect estimation is infeasible or unreliable. We propose a conceptual framework based on amenability: an individual's latent proclivity to be influenced by an intervention. We then formalize conditions under which predictive scores serve as effective proxies for amenability. These conditions justify using non-causal scores for intervention prioritization, even when the scores do not directly estimate effects. We further show that, under plausible assumptions, predictive models can outperform causal effect estimators in ranking individuals by intervention effects. Empirical evidence from an advertising context supports our theoretical findings, demonstrating that predictive modeling can offer a more robust approach to targeting than effect estimation. Our framework suggests a shift in focus, from estimating effects to inferring who is amenable, as a practical and theoretically grounded strategy for prioritizing interventions in resource-constrained environments.

Updated: 2025-04-04 02:16:54

标题: 《可适应性框架:重新思考无需估计因果效应的因果排序》

摘要: 在我们无法估计干预效果时,我们应该优先考虑谁进行干预?在许多应用领域(如广告、客户保留和行为引导)中,优先考虑是由估计结果概率而不是因果效应的预测模型所指导的。本文调查了这些预测(分数)何时可以有效地按其干预效果对个体进行排名,特别是当直接效应估计不可行或不可靠时。我们提出了一个基于可接受性的概念框架:一个个体被干预影响的潜在倾向。然后,我们形式化了预测分数作为有效可接受性代理的条件。这些条件证明了即使分数不直接估计效果,也可以使用非因果分数进行干预优先排序。我们进一步表明,在合理的假设下,预测模型可以在排名个体干预效果方面胜过因果效应估计器。来自广告环境的实证证据支持我们的理论发现,表明预测建模可以提供比效果估计更稳健的定位方法。我们的框架提议将注意力从估计效果转向推断谁是可接受的,作为在资源受限环境中优先考虑干预的实用和理论基础的策略。

更新时间: 2025-04-04 02:16:54

领域: cs.LG,stat.ME

下载: http://arxiv.org/abs/2504.02456v2

A User-Tunable Machine Learning Framework for Step-Wise Synthesis Planning

We introduce MHNpath, a machine learning-driven retrosynthetic tool designed for computer-aided synthesis planning. Leveraging modern Hopfield networks and novel comparative metrics, MHNpath efficiently prioritizes reaction templates, improving the scalability and accuracy of retrosynthetic predictions. The tool incorporates a tunable scoring system that allows users to prioritize pathways based on cost, reaction temperature, and toxicity, thereby facilitating the design of greener, more cost-effective reaction routes. We demonstrate its effectiveness through case studies involving complex molecules from ChemByDesign, showcasing its ability to predict novel synthetic and enzymatic pathways. Furthermore, we benchmark MHNpath against existing frameworks, replicating experimentally validated "gold-standard" pathways from PaRoutes. Our case studies reveal that the tool can generate shorter, cheaper, moderate-temperature routes employing green solvents, as exemplified by compounds such as dronabinol, arformoterol, and lupinine.
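As a hypothetical sketch of the tunable scoring idea (the field names, weights, and [0, 1] normalization are our assumptions, not MHNpath's actual interface), a weighted sum over cost, temperature, and toxicity lets different users rank the same candidate routes differently:

```python
# Hypothetical sketch of a tunable route-scoring scheme in the spirit the
# abstract describes: each candidate route carries cost, temperature, and
# toxicity terms, and user-supplied weights decide the trade-off.
def score_route(route, w_cost=1.0, w_temp=1.0, w_tox=1.0):
    """Lower is better; each term is assumed pre-normalized to [0, 1]."""
    return (w_cost * route["cost"]
            + w_temp * route["temperature"]
            + w_tox * route["toxicity"])

routes = [
    {"name": "A", "cost": 0.2, "temperature": 0.9, "toxicity": 0.1},
    {"name": "B", "cost": 0.6, "temperature": 0.3, "toxicity": 0.2},
]

# A cost-focused user prefers route A; a temperature-focused user prefers B.
cheap = min(routes, key=lambda r: score_route(r, w_cost=5.0))
mild = min(routes, key=lambda r: score_route(r, w_temp=5.0))
print(cheap["name"], mild["name"])  # A B
```

The same candidate set thus yields different "best" routes depending on whether the user up-weights cost or reaction temperature.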

Updated: 2025-04-04 01:56:47

标题: 一个用户可调的机器学习框架用于逐步合成规划

摘要: 我们介绍了MHNpath,这是一个基于机器学习的逆合成工具,旨在用于计算辅助合成规划。利用现代Hopfield网络和新颖的比较指标,MHNpath能够有效地优先考虑反应模板,提高逆合成预测的可扩展性和准确性。该工具结合了可调节的评分系统,允许用户根据成本、反应温度和毒性优先考虑途径,从而促进更环保和成本效益的反应路径设计。我们通过涉及ChemByDesign中复杂分子的案例研究展示了其有效性,展示了其预测新颖合成和酶途径的能力。此外,我们将MHNpath与现有框架进行了基准测试,复制了实验验证的PaRoutes中的“黄金标准”途径。我们的案例研究表明,该工具可以生成更短、更便宜、中等温度的途径,采用绿色溶剂,如dronabinol、arformoterol和lupinine等化合物。

更新时间: 2025-04-04 01:56:47

领域: cs.CE,cs.LG

下载: http://arxiv.org/abs/2504.02191v2

Les Dissonances: Cross-Tool Harvesting and Polluting in Multi-Tool Empowered LLM Agents

Large Language Model (LLM) agents are autonomous systems powered by LLMs, capable of reasoning and planning to solve problems by leveraging a set of tools. However, the integration of multi-tool capabilities in LLM agents introduces challenges in securely managing tools, ensuring their compatibility, handling dependency relationships, and protecting control flows within LLM agent workflows. In this paper, we present the first systematic security analysis of task control flows in multi-tool-enabled LLM agents. We identify a novel threat, Cross-Tool Harvesting and Polluting (XTHP), which includes multiple attack vectors to first hijack the normal control flows of agent tasks, and then collect and pollute confidential or private information within LLM agent systems. To understand the impact of this threat, we developed Chord, a dynamic scanning tool designed to automatically detect real-world agent tools susceptible to XTHP attacks. Our evaluation of 73 real-world tools from the repositories of two major LLM agent development frameworks, LangChain and LlamaIndex, revealed a significant security concern: 80% of the tools are vulnerable to hijacking attacks, 78% to XTH attacks, and 41% to XTP attacks, highlighting the prevalence of this threat.

Updated: 2025-04-04 01:41:06

标题: 《Les Dissonances:多工具增强的LLM代理中的跨工具收集和污染》

摘要: 大型语言模型(LLM)代理是由LLM驱动的自主系统,能够通过利用一组工具进行推理和规划来解决问题。然而,在LLM代理中集成多工具功能会引入一些挑战,如安全管理工具、确保其兼容性、处理依赖关系以及保护LLM代理工作流程中的控制流。本文首次系统地对多工具启用的LLM代理中的任务控制流进行了安全分析。我们确定了一种新型威胁,即跨工具收割和污染(XTHP),其中包括多个攻击向量,首先劫持代理任务的正常控制流,然后在LLM代理系统中收集和污染机密或私人信息。为了了解这种威胁的影响,我们开发了Chord,一个动态扫描工具,旨在自动检测真实世界中容易受到XTHP攻击的代理工具。我们对来自两个主要LLM代理开发框架LangChain和LlamaIndex存储库的73个真实工具进行了评估,结果显示一个重要的安全问题:80%的工具容易受到劫持攻击,78%容易受到XTH攻击,41%容易受到XTP攻击,突出了这种威胁的普遍存在。

更新时间: 2025-04-04 01:41:06

领域: cs.CR

下载: http://arxiv.org/abs/2504.03111v1

MT2ST: Adaptive Multi-Task to Single-Task Learning

Efficient machine learning (ML) has become increasingly important as models grow larger and data volumes expand. In this work, we address the trade-off between generalization in multi-task learning (MTL) and precision in single-task learning (STL) by introducing the Multi-Task to Single-Task (MT2ST) framework. MT2ST is designed to enhance training efficiency and accuracy in multi-modal tasks, showcasing its value as a practical application of efficient ML.

Updated: 2025-04-04 01:40:33

标题: MT2ST:自适应多任务到单任务学习

摘要: 高效的机器学习(ML)在模型变得更大和数据量扩大的情况下变得越来越重要。在这项工作中,我们通过引入多任务学习(MTL)中的泛化和单任务学习(STL)中的精度之间的权衡来介绍多任务到单任务(MT2ST)框架。MT2ST旨在增强多模态任务中的训练效率和准确性,展示其作为高效ML实际应用的价值。

更新时间: 2025-04-04 01:40:33

领域: cs.LG

下载: http://arxiv.org/abs/2406.18038v5

Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints

We study contextual dynamic pricing problems where a firm sells products to $T$ sequentially-arriving consumers, behaving according to an unknown demand model. The firm aims to minimize its regret over a clairvoyant that knows the model in advance. The demand follows a generalized linear model (GLM), allowing for stochastic feature vectors in $\mathbb R^d$ encoding product and consumer information. We first show the optimal regret is of order $\sqrt{dT}$, up to logarithmic factors, improving existing upper bounds by a $\sqrt{d}$ factor. This optimal rate is materialized by two algorithms: a confidence bound-type algorithm and an explore-then-commit (ETC) algorithm. A key insight is an intrinsic connection between dynamic pricing and contextual multi-armed bandit problems with many arms with a careful discretization. We further study contextual dynamic pricing under local differential privacy (LDP) constraints. We propose a stochastic gradient descent-based ETC algorithm achieving regret upper bounds of order $d\sqrt{T}/\epsilon$, up to logarithmic factors, where $\epsilon>0$ is the privacy parameter. The upper bounds with and without LDP constraints are matched by newly constructed minimax lower bounds, characterizing costs of privacy. Moreover, we extend our study to dynamic pricing under mixed privacy constraints, improving the privacy-utility tradeoff by leveraging public data. This is the first time such setting is studied in the dynamic pricing literature and our theoretical results seamlessly bridge dynamic pricing with and without LDP. Extensive numerical experiments and real data applications are conducted to illustrate the efficiency and practical value of our algorithms.
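To make the explore-then-commit idea concrete, here is a toy, context-free sketch (ours; the paper's algorithm handles GLM demand with stochastic feature vectors and tunes the exploration length): explore with random prices, fit a linear purchase-probability model, then commit to the revenue-maximizing price.

```python
import numpy as np

# Toy explore-then-commit (ETC) pricing sketch, illustrative only.
# Demand is Bernoulli with purchase probability a - b * price; we explore with
# random prices, fit (a, b) by least squares, then commit to the price that
# maximizes expected revenue price * (a - b * price), i.e. p* = a / (2b).
rng = np.random.default_rng(0)
a_true, b_true = 0.9, 0.3

def buy(price):
    return rng.random() < a_true - b_true * price

T_explore = 2000
prices = rng.uniform(0.0, 2.0, size=T_explore)
sales = np.array([buy(p) for p in prices], dtype=float)

# Least-squares fit of E[sale] = a - b * price over the exploration phase.
X = np.column_stack([np.ones(T_explore), -prices])
a_hat, b_hat = np.linalg.lstsq(X, sales, rcond=None)[0]

p_commit = a_hat / (2.0 * b_hat)   # commit phase: play this price from now on
print(round(p_commit, 2))          # close to the true optimum a/(2b) = 1.5
```

The regret of such a scheme is driven by the exploration budget and the estimation error in (a, b); the paper's contribution is getting this trade-off optimal in the contextual GLM setting, with and without privacy constraints.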

Updated: 2025-04-04 01:39:21

标题: 上下文动态定价:算法,最优性和局部差分隐私约束

摘要: 我们研究了上下文动态定价问题,其中一家公司向$T$个顺序到达的消费者销售产品,这些消费者根据未知的需求模型行事。该公司旨在最小化其对一个能提前知道模型的预见者的后悔。需求遵循广义线性模型(GLM),允许在$\mathbb R^d$中进行编码的产品和消费者信息的随机特征向量。我们首先表明,最佳后悔率是$O(\sqrt{dT})$,在对数因子方面进行改进,将现有的上限提高了$\sqrt{d}$倍。这一最佳速率通过两种算法实现:一种置信度界类型算法和一种探索-然后承诺(ETC)算法。一个关键的见解是动态定价与具有精心离散化的许多臂的上下文多臂老虎机问题之间的内在联系。我们进一步研究了在局部差分隐私(LDP)约束下的上下文动态定价。我们提出了一种基于随机梯度下降的ETC算法,实现了$O(d\sqrt{T}/\epsilon)$的后悔上限,直到对数因子,其中$\epsilon>0$是隐私参数。具有和没有LDP约束的上限由新构造的极小下限匹配,表征了隐私成本。此外,我们将研究扩展到混合隐私约束下的动态定价,通过利用公共数据来改进隐私和效用的权衡。这是第一次在动态定价文献中研究这种设置,我们的理论结果无缝地将具有和没有LDP的动态定价联系起来。我们进行了大量的数值实验和实际数据应用,以说明我们的算法的效率和实际价值。

更新时间: 2025-04-04 01:39:21

领域: cs.LG,math.ST,stat.ME,stat.TH

下载: http://arxiv.org/abs/2406.02424v3

Towards More Accurate US Presidential Election via Multi-step Reasoning with Large Language Models

Can Large Language Models (LLMs) accurately predict election outcomes? While LLMs have demonstrated impressive performance in various domains, including healthcare, legal analysis, and creative tasks, their ability to forecast elections remains unknown. Election prediction poses unique challenges, such as limited voter-level data, rapidly changing political landscapes, and the need to model complex human behavior. To address these challenges, we introduce a multi-step reasoning framework designed for political analysis. Our approach is validated on real-world data from the American National Election Studies (ANES) 2016 and 2020, as well as synthetic personas generated by the leading machine learning framework, offering scalable datasets for voter behavior modeling. To capture temporal dynamics, we incorporate candidates' policy positions and biographical details, ensuring that the model adapts to evolving political contexts. Drawing on Chain of Thought prompting, our multi-step reasoning pipeline systematically integrates demographic, ideological, and time-dependent factors, enhancing the model's predictive power.

Updated: 2025-04-04 01:33:20

标题: 朝向更准确的美国总统选举:通过大型语言模型的多步推理

摘要: 大型语言模型(LLMs)能够准确预测选举结果吗?虽然LLMs在各个领域展示出了令人印象深刻的表现,包括医疗保健、法律分析和创意任务,但它们对于预测选举的能力仍然未知。选举预测面临着独特的挑战,例如有限的选民级别数据、政治景观的快速变化以及需要对复杂人类行为进行建模。为了解决这些挑战,我们引入了一种为政治分析而设计的多步推理框架。我们的方法在美国国家选举研究(ANES)2016年和2020年的真实数据以及由主要机器学习框架生成的虚拟人物上进行了验证,为选民行为建模提供了可扩展的数据集。为了捕捉时间动态,我们结合了候选人的政策立场和个人资料,确保模型适应不断变化的政治背景。借鉴Chain of Thought提示,我们的多步推理流程系统地整合了人口统计、意识形态和时间依赖因素,增强了模型的预测能力。

更新时间: 2025-04-04 01:33:20

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.03321v3

Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation

Background: Convolutional Neural Networks (CNN) and Vision Transformers (ViT) are the main techniques used in medical image segmentation. However, CNN is limited to local contextual information, and ViT's quadratic complexity results in significant computational costs. At the same time, equipping the model to distinguish lesion boundaries with varying degrees of severity is also a challenge encountered in skin lesion segmentation. Purpose: This research aims to optimize the balance between computational costs and long-range dependency modelling and achieve excellent generalization across lesions with different degrees of severity. Methods: We propose a lightweight U-shape network that utilizes Vision Fastformer with Fusion Mechanism (VFFM-UNet). We inherit the advantages of Fastformer's additive attention mechanism, combining element-wise product and matrix product for comprehensive feature extraction and channel reduction to save computational costs. In order to accurately identify the lesion boundaries with varying degrees of severity, we designed Fusion Mechanism including Multi-Granularity Fusion and Channel Fusion, which can process the feature maps in the granularity and channel levels to obtain different contextual information. Results: Comprehensive experiments on the ISIC2017, ISIC2018 and PH2 datasets demonstrate that VFFM-UNet outperforms existing state-of-the-art models regarding parameter numbers, computational complexity and segmentation performance. In short, compared to MISSFormer, our model achieves superior segmentation performance while reducing parameter and computation costs by 101x and 15x, respectively. Conclusions: Both quantitative and qualitative analyses show that VFFM-UNet sets a new benchmark by reaching an ideal balance between parameter numbers, computational complexity, and segmentation performance compared to existing state-of-the-art models.
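The additive attention the abstract inherits from Fastformer can be sketched in a few lines of numpy (shapes are simplified and the learned projections and final transform are omitted relative to the original Fastformer block): per-token scores pool the queries into a single global query, which interacts with the keys element-wise; the pooled key then interacts with the values, keeping the cost linear in sequence length.

```python
import numpy as np

# Minimal numpy sketch of Fastformer-style additive attention (linear in
# sequence length), the mechanism the abstract builds on. Simplified: real
# implementations add learned projections and a final output transform.
def additive_attention(Q, K, V, w_q, w_k):
    d = Q.shape[-1]
    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()
    alpha = softmax(Q @ w_q / np.sqrt(d))   # per-token query weights
    g = alpha @ Q                           # global query, shape (d,)
    P = g * K                               # element-wise query-key product
    beta = softmax(P @ w_k / np.sqrt(d))
    k = beta @ P                            # global key, shape (d,)
    return k * V + Q                        # value interaction + residual

rng = np.random.default_rng(0)
n, d = 6, 4
Q, K, V = rng.normal(size=(3, n, d))
out = additive_attention(Q, K, V, rng.normal(size=d), rng.normal(size=d))
print(out.shape)  # (6, 4)
```

Every step is a length-n weighted sum or an element-wise product, so the cost is O(n·d) rather than the O(n²·d) of standard self-attention.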

Updated: 2025-04-04 01:27:43

标题: 多粒度视觉快速变压器与融合机制用于皮肤病变分割

摘要: 背景: 卷积神经网络(CNN)和视觉Transformer(ViT)是医学图像分割中主要使用的技术。然而,CNN受限于局部上下文信息,ViT的二次复杂度导致了显著的计算成本。同时,装备模型以区分具有不同严重程度的病变边界也是皮肤病变分割中遇到的挑战。目的: 本研究旨在优化计算成本和长程依赖建模之间的平衡,并在不同严重程度的病变之间实现出色的泛化。方法:我们提出了一种轻量级的U形网络,利用Vision Fastformer with Fusion Mechanism (VFFM-UNet)。我们继承了Fastformer的加性注意机制的优势,结合逐元素乘积和矩阵乘积进行全面特征提取和通道减少以节省计算成本。为了准确识别具有不同严重程度的病变边界,我们设计了融合机制,包括多粒度融合和通道融合,可以在粒度和通道级别处理特征图以获得不同的上下文信息。结果:对ISIC2017、ISIC2018和PH2数据集的综合实验表明,VFFM-UNet在参数数量、计算复杂性和分割性能方面优于现有的最先进模型。简而言之,与MISSFormer相比,我们的模型在实现更优分割性能的同时,将参数量和计算成本分别降低了101倍和15倍。结论:定量和定性分析表明,与现有最先进模型相比,VFFM-UNet在参数数量、计算复杂性和分割性能之间达到了理想的平衡,树立了新的基准。

更新时间: 2025-04-04 01:27:43

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.03108v1

RBT4DNN: Requirements-based Testing of Neural Networks

Deep neural network (DNN) testing is crucial for the reliability and safety of critical systems, where failures can have severe consequences. Although various techniques have been developed to create robustness test suites, requirements-based testing for DNNs remains largely unexplored - yet such tests are recognized as an essential component of software validation of critical systems. In this work, we propose a requirements-based test suite generation method that uses structured natural language requirements formulated in a semantic feature space to create test suites by prompting text-conditional latent diffusion models with the requirement precondition and then using the associated postcondition to define a test oracle to judge outputs of the DNN under test. We investigate the approach using fine-tuned variants of pre-trained generative models. Our experiments on the MNIST, CelebA-HQ, ImageNet, and autonomous car driving datasets demonstrate that the generated test suites are realistic, diverse, consistent with preconditions, and capable of revealing faults.

Updated: 2025-04-04 01:24:07

标题: RBT4DNN:基于需求的神经网络测试

摘要: 深度神经网络(DNN)测试对于关键系统的可靠性和安全性至关重要,其中故障可能产生严重后果。尽管已经开发了各种技术来创建健壮性测试套件,但基于需求的DNN测试仍然较少被探索 - 然而这些测试被认为是关键系统软件验证的必要组成部分。在这项工作中,我们提出了一种基于需求的测试套件生成方法,该方法使用结构化自然语言需求在语义特征空间中制定测试套件,通过提示文本条件潜在扩散模型与需求前提,并使用相关后置条件定义测试预言,以判断被测试的DNN的输出。我们使用微调的预训练生成模型的变体来研究这种方法。我们在MNIST、CelebA-HQ、ImageNet和自动驾驶数据集上的实验表明,生成的测试套件具有现实性、多样性、与前提一致,并能够揭示故障。

更新时间: 2025-04-04 01:24:07

领域: cs.SE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.02737v2

Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme

Reinforcement learning (RL) has recently shown strong potential in improving the reasoning capabilities of large language models and is now being actively extended to vision-language models (VLMs). However, existing RL applications in VLMs often rely on heavily engineered frameworks that hinder reproducibility and accessibility, while lacking standardized evaluation protocols, making it difficult to compare results or interpret training dynamics. This work introduces a transparent, from-scratch framework for RL in VLMs, offering a minimal yet functional four-step pipeline validated across multiple models and datasets. In addition, a standardized evaluation scheme is proposed to assess training dynamics and reflective behaviors. Extensive experiments on visual reasoning tasks uncover key empirical findings: response length is sensitive to random seeds, reflection correlates with output length, and RL consistently outperforms supervised fine-tuning (SFT) in generalization, even with high-quality data. These findings, together with the proposed framework, aim to establish a reproducible baseline and support broader engagement in RL-based VLM research.

Updated: 2025-04-04 01:07:06

标题: 重新思考视觉语言模型的强化学习规模:一个透明、从头开始的框架和全面评估方案

摘要: 强化学习(RL)最近在提高大型语言模型的推理能力方面表现出了强大的潜力,并且现在正积极地扩展到视觉-语言模型(VLMs)中。然而,在VLMs中现有的RL应用通常依赖于繁重的工程框架,这些框架阻碍了可重现性和可访问性,同时缺乏标准化的评估协议,使结果比较或解释训练动态变得困难。本研究引入了一个透明的,从零开始的RL框架,为VLMs提供了一个最小但功能完备的四步流程,经过多个模型和数据集的验证。此外,提出了一个标准化的评估方案,用于评估训练动态和反思行为。在视觉推理任务上进行的大量实验揭示了关键的实证发现:响应长度对随机种子敏感,反思与输出长度相关,RL在泛化上始终优于监督微调(SFT),即使使用高质量数据也是如此。这些发现,以及提出的框架,旨在建立一个可重现的基线,并支持更广泛地参与基于RL的VLM研究。

更新时间: 2025-04-04 01:07:06

领域: cs.LG,cs.CL,cs.CV

下载: http://arxiv.org/abs/2504.02587v2

Learning Broken Symmetries with Approximate Invariance

Recognizing symmetries in data allows for significant boosts in neural network training, which is especially important where training data are limited. In many cases, however, the exact underlying symmetry is present only in an idealized dataset, and is broken in actual data, due to asymmetries in the detector, or varying response resolution as a function of particle momentum. Standard approaches, such as data augmentation or equivariant networks fail to represent the nature of the full, broken symmetry, effectively overconstraining the response of the neural network. We propose a learning model which balances the generality and asymptotic performance of unconstrained networks with the rapid learning of constrained networks. This is achieved through a dual-subnet structure, where one network is constrained by the symmetry and the other is not, along with a learned symmetry factor. In a simplified toy example that demonstrates violation of Lorentz invariance, our model learns as rapidly as symmetry-constrained networks but escapes its performance limitations.
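A minimal sketch of the dual-subnet idea as we read it (toy sign-flip invariance, fixed weights, and an untrained gate; the paper learns the symmetry factor jointly with both subnets): one subnet is constrained to be symmetric, the other is unconstrained, and a sigmoid gate blends their outputs. With the gate open the blended output is (approximately) symmetric; with it closed, the unconstrained subnet dominates and the symmetry is broken.

```python
import numpy as np

# Toy sketch of a dual-subnet model with a learned symmetry factor.
# f_sym is constrained to a symmetry (here: invariance under x -> -x),
# f_free is unconstrained, and a scalar gate blends them. The gate would be
# trained jointly; here it is just set by hand to show the two regimes.
def f_sym(x, w):
    return np.tanh(np.abs(x) @ w)      # invariant under x -> -x

def f_free(x, w):
    return np.tanh(x @ w)              # no constraint

def dual_net(x, w_s, w_f, gate_logit):
    lam = 1.0 / (1.0 + np.exp(-gate_logit))   # learned symmetry factor
    return lam * f_sym(x, w_s) + (1.0 - lam) * f_free(x, w_f)

w = np.array([0.3, -0.7, 0.2])
x = np.array([0.5, -1.0, 2.0])

# Gate fully open (large logit): the model is (numerically) symmetric.
y_pos = dual_net(x, w, w, gate_logit=10.0)
y_neg = dual_net(-x, w, w, gate_logit=10.0)
print(abs(y_pos - y_neg) < 1e-3)
```

Intermediate gate values interpolate between the two regimes, which is how such a model could represent an approximately broken symmetry.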

Updated: 2025-04-04 00:58:59

标题: 学习具有近似不变性的破缺对称性

摘要: 识别数据中的对称性可以显著提升神经网络训练效果,特别是在训练数据有限的情况下尤为重要。然而,在许多情况下,精确的基础对称性仅存在于理想化的数据集中,在实际数据中由于探测器的不对称性或者粒子动量函数响应分辨率的变化而被打破。标准方法,如数据增强或等变网络都无法有效地表示完整、破碎对称性的性质,实际上过度约束了神经网络的响应。我们提出了一种学习模型,它平衡了无约束网络的一般性和渐近性能与受约束网络的快速学习。通过一个双子网络结构实现,其中一个网络受到对称性的约束,另一个网络则没有,并具有一个学习到的对称因子。在一个简化的玩具示例中,展示了洛伦兹不变性的违反,我们的模型学习速度与受对称性约束的网络相当,但避开了性能限制。

更新时间: 2025-04-04 00:58:59

领域: hep-ph,cs.LG,hep-ex

下载: http://arxiv.org/abs/2412.18773v2

FoundationStereo: Zero-Shot Stereo Matching

Tremendous progress has been made in deep stereo matching to excel on benchmark datasets through per-domain fine-tuning. However, achieving strong zero-shot generalization - a hallmark of foundation models in other computer vision tasks - remains challenging for stereo matching. We introduce FoundationStereo, a foundation model for stereo depth estimation designed to achieve strong zero-shot generalization. To this end, we first construct a large-scale (1M stereo pairs) synthetic training dataset featuring large diversity and high photorealism, followed by an automatic self-curation pipeline to remove ambiguous samples. We then design a number of network architecture components to enhance scalability, including a side-tuning feature backbone that adapts rich monocular priors from vision foundation models to mitigate the sim-to-real gap, and long-range context reasoning for effective cost volume filtering. Together, these components lead to strong robustness and accuracy across domains, establishing a new standard in zero-shot stereo depth estimation. Project page: https://nvlabs.github.io/FoundationStereo/

Updated: 2025-04-04 00:51:17

Domains: cs.CV,cs.LG,cs.RO

Download: http://arxiv.org/abs/2501.09898v4

Sample, Don't Search: Rethinking Test-Time Alignment for Language Models

Increasing test-time computation has emerged as a promising direction for improving language model performance, particularly in scenarios where model finetuning is impractical or impossible due to computational constraints or private model weights. However, existing test-time search methods using a reward model (RM) often degrade in quality as compute scales, due to the over-optimization of what are inherently imperfect reward proxies. We introduce QAlign, a new test-time alignment approach. As we scale test-time compute, QAlign converges to sampling from the optimal aligned distribution for each individual prompt. By adopting recent advances in Markov chain Monte Carlo for text generation, our method enables better-aligned outputs without modifying the underlying model or even requiring logit access. We demonstrate the effectiveness of QAlign on mathematical reasoning benchmarks (GSM8K and GSM-Symbolic) using a task-specific RM, showing consistent improvements over existing test-time compute methods like best-of-n and majority voting. Furthermore, when applied with more realistic RMs trained on the Tulu 3 preference dataset, QAlign outperforms direct preference optimization (DPO), best-of-n, majority voting, and weighted majority voting on a diverse range of datasets (GSM8K, MATH500, IFEval, MMLU-Redux, and TruthfulQA). A practical solution to aligning language models at test time using additional computation without degradation, our approach expands the limits of the capability that can be obtained from off-the-shelf language models without further training.
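
The sampling idea can be sketched with an independence Metropolis-Hastings chain targeting a reward-tilted distribution pi(y) proportional to p(y) * exp(r(y)/beta); the function names and the scalar beta are our illustrative assumptions, not the paper's exact sampler. Because proposals come from the base model p itself, the acceptance ratio reduces to the reward difference.

```python
import math, random

def qalign_sample(base_sample, reward, beta, steps, seed=0):
    """Toy independence Metropolis-Hastings sketch: proposals come from the
    base model, and the chain targets pi(y) ~ p(y) * exp(reward(y) / beta).
    With the base model as proposal, the acceptance probability is
    min(1, exp((reward(y_new) - reward(y)) / beta))."""
    rng = random.Random(seed)
    y = base_sample(rng)
    for _ in range(steps):
        y_new = base_sample(rng)
        log_accept = (reward(y_new) - reward(y)) / beta
        if math.log(rng.random() + 1e-300) < log_accept:
            y = y_new
    return y
```

With more steps (more test-time compute), the chain concentrates on high-reward outputs without ever modifying the base sampler, mirroring the "no finetuning, no logit access" setting.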

Updated: 2025-04-04 00:41:40

Domains: cs.CL,cs.LG,stat.ML

Download: http://arxiv.org/abs/2504.03790v1

A computational transition for detecting multivariate shuffled linear regression by low-degree polynomials

In this paper, we study the problem of multivariate shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we investigate the model $Y=\tfrac{1}{\sqrt{1+\sigma^2}}(\Pi_* X Q_* + \sigma Z)$, where $X$ is an $n\times d$ standard Gaussian design matrix, $Z$ is an $n\times m$ Gaussian noise matrix, $\Pi_*$ is an unknown $n\times n$ permutation matrix, and $Q_*$ is an unknown $d\times m$ matrix on the Grassmannian manifold satisfying $Q_*^{\top} Q_* = \mathbb{I}_m$. Consider the hypothesis testing problem of distinguishing this model from the case where $X$ and $Y$ are independent Gaussian random matrices of sizes $n\times d$ and $n\times m$, respectively. Our results reveal a phase transition phenomenon in the performance of low-degree polynomial algorithms for this task. (1) When $m=o(d)$, we show that all degree-$D$ polynomials fail to distinguish these two models even when $\sigma=0$, provided that $D^4=o\big( \tfrac{d}{m} \big)$. (2) When $m=d$ and $\sigma=\omega(1)$, we show that all degree-$D$ polynomials fail to distinguish these two models provided that $D=o(\sigma)$. (3) When $m=d$ and $\sigma=o(1)$, we show that there exists a constant-degree polynomial that strongly distinguishes these two models. These results establish a smooth transition in the effectiveness of low-degree polynomial algorithms for this problem, highlighting the interplay between the dimensions $m$ and $d$, the noise level $\sigma$, and the computational complexity of the testing task.
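
The planted model is straightforward to simulate, which can help build intuition for the testing problem. The sketch below draws $(X, Y)$ exactly as in the display above (the helper names are ours; $Q_*$ is built by Gram-Schmidt on Gaussian columns, a standard way to sample orthonormal columns).

```python
import math, random

def gaussian_matrix(rng, n, d):
    """n x d matrix of independent standard Gaussians."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

def orthonormal_columns(rng, d, m):
    """Random d x m matrix Q with Q^T Q = I_m, via Gram-Schmidt."""
    cols = []
    while len(cols) < m:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in cols:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:
            cols.append([a / norm for a in v])
    return [[cols[j][i] for j in range(m)] for i in range(d)]

def shuffled_regression_sample(rng, n, d, m, sigma):
    """Draw (X, Y) from the planted model
    Y = (Pi X Q + sigma * Z) / sqrt(1 + sigma^2)."""
    X = gaussian_matrix(rng, n, d)
    Q = orthonormal_columns(rng, d, m)
    Z = gaussian_matrix(rng, n, m)
    perm = list(range(n))
    rng.shuffle(perm)  # the latent permutation Pi
    scale = 1.0 / math.sqrt(1.0 + sigma ** 2)
    XQ = [[sum(X[i][k] * Q[k][j] for k in range(d)) for j in range(m)]
          for i in range(n)]
    Y = [[scale * (XQ[perm[i]][j] + sigma * Z[i][j]) for j in range(m)]
         for i in range(n)]
    return X, Y
```

The normalization by $\sqrt{1+\sigma^2}$ makes each entry of $Y$ marginally standard Gaussian, which is what makes the null (independent $X$, $Y$) hard to rule out.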

Updated: 2025-04-04 00:32:38

Domains: stat.ML,cs.LG,math.PR,math.ST,stat.TH

Download: http://arxiv.org/abs/2504.03097v1

Learning Visual Composition through Improved Semantic Guidance

Visual imagery does not consist of solitary objects, but instead reflects the composition of a multitude of fluid concepts. While there have been great advances in visual representation learning, such advances have focused on building better representations for a small number of discrete objects bereft of an understanding of how these objects are interacting. One can observe this limitation in representations learned through captions or contrastive learning -- where the learned model treats an image essentially as a bag of words. Several works have attempted to address this limitation through the development of bespoke learned architectures to directly address the shortcomings in compositional learning. In this work, we focus on simple, and scalable approaches. In particular, we demonstrate that by substantially improving weakly labeled data, i.e. captions, we can vastly improve the performance of standard contrastive learning approaches. Previous CLIP models achieved near chance rate on challenging tasks probing compositional learning. However, our simple approach boosts performance of CLIP substantially and surpasses all bespoke architectures. Furthermore, we showcase our results on a relatively new captioning benchmark derived from DOCCI. We demonstrate through a series of ablations that a standard CLIP model trained with enhanced data may demonstrate impressive performance on image retrieval tasks.
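
The "bag of words" behavior arises in the standard contrastive objective that CLIP-style models optimize. A minimal plain-Python sketch of the symmetric InfoNCE loss over a batch of paired (image, caption) embeddings (names and the temperature value are illustrative):

```python
import math

def softmax_row_logloss(sims, i):
    """-log softmax(sims)[i], computed stably for one row of similarities."""
    mx = max(sims)
    logsum = mx + math.log(sum(math.exp(s - mx) for s in sims))
    return logsum - sims[i]

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over paired embeddings: each image should match its
    own caption against all others in the batch, and vice versa. Embeddings
    are assumed L2-normalized; similarity is the dot product / temperature."""
    n = len(img_emb)
    sim = [[sum(a * b for a, b in zip(img_emb[i], txt_emb[j])) / temperature
            for j in range(n)] for i in range(n)]
    img_to_txt = sum(softmax_row_logloss(sim[i], i) for i in range(n)) / n
    txt_to_img = sum(softmax_row_logloss([sim[i][j] for i in range(n)], j)
                     for j in range(n)) / n
    return 0.5 * (img_to_txt + txt_to_img)
```

Nothing in this objective itself rewards compositional structure; the paper's point is that enriching the captions (the weak labels) is enough to make this standard loss learn composition far better.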

Updated: 2025-04-04 00:14:26

Domains: cs.CV,cs.AI,cs.CL,cs.IR

Download: http://arxiv.org/abs/2412.15396v2

Post-processing for Fair Regression via Explainable SVD

This paper presents a post-processing algorithm for training fair neural network regression models that satisfy statistical parity, utilizing an explainable singular value decomposition (SVD) of the weight matrix. We propose a linear transformation of the weight matrix, whereby the singular values derived from the SVD of the transformed matrix directly correspond to the differences in the first and second moments of the output distributions across two groups. Consequently, we can convert the fairness constraints into constraints on the singular values. We analytically solve the problem of finding the optimal weights under these constraints. Experimental validation on various datasets demonstrates that our method achieves a similar or superior fairness-accuracy trade-off compared to the baselines without using the sensitive attribute at the inference time.
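
As a deliberately simplified illustration of turning a parity constraint into a weight-space constraint, the sketch below enforces only first-moment statistical parity for a linear head, by projecting the weight vector orthogonal to the gap between group feature means; the paper's SVD-based method is more general (it handles both first and second moments via singular values), and this substitute is ours, not the paper's algorithm.

```python
def project_out_group_gap(w, feats, groups):
    """Toy first-moment parity post-processing for a linear head: remove from
    the weight vector w its component along the gap between the two groups'
    feature means, so both groups get equal mean predictions."""
    def mean(rows):
        k = len(rows[0])
        return [sum(r[i] for r in rows) / len(rows) for i in range(k)]
    g0 = [f for f, g in zip(feats, groups) if g == 0]
    g1 = [f for f, g in zip(feats, groups) if g == 1]
    gap = [a - b for a, b in zip(mean(g0), mean(g1))]
    nrm2 = sum(a * a for a in gap)
    if nrm2 == 0.0:
        return list(w)  # means already equal; nothing to remove
    coef = sum(a * b for a, b in zip(w, gap)) / nrm2
    return [wi - coef * gi for wi, gi in zip(w, gap)]
```

Like the paper's method, this is post-processing: the sensitive attribute is used once to compute the correction, and inference afterwards does not need it.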

Updated: 2025-04-04 00:10:01

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2504.03093v1

Machine Learning-Based Detection and Analysis of Suspicious Activities in Bitcoin Wallet Transactions in the USA

The rapid adoption of Bitcoin and other cryptocurrencies in the USA has transformed the financial landscape and created unprecedented opportunities for investment and transaction efficiency. The primary objective of this research project is to develop machine learning algorithms capable of effectively identifying and tracking suspicious activity in Bitcoin wallet transactions. The study aims to create a model that identifies trends and outliers which can expose illicit activity. It focuses specifically on Bitcoin transaction information in America, with a strong emphasis on understanding the immediate environment through which such transactions pass. The dataset comprises detailed Bitcoin wallet transaction records, including transaction values, timestamps, network flows, and wallet addresses. Each entry describes financial transfers between wallets, both received and sent, information that is significant for identifying patterns that can indicate suspicious activity. The study deployed three well-established algorithms: Logistic Regression, Random Forest, and Support Vector Machines. Random Forest emerged as the best model with the highest F1 score, showcasing its ability to handle non-linear relationships in the data. The analysis revealed significant patterns in wallet activity, such as the correlation between unredeemed transactions and final balances. Applying machine learning algorithms to cryptocurrency tracking is a tool for creating transparent and secure U.S. markets.
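
Since the model comparison rests on the F1 score, here is a small reference implementation of that metric for the positive ("suspicious") class (ours, for illustration; the study's own evaluation code is not shown in the abstract):

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1).
    Returns 0.0 when there are no true positives, by convention."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

F1 is preferred over plain accuracy here because suspicious transactions are rare: a classifier that flags nothing can have high accuracy but scores 0 on F1.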

Updated: 2025-04-04 00:07:32

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2504.03092v1

By Xinhai (Sean) Zou.